<div style="text-align: center; margin-left: 0em; font-weight: bold; font-size: 20px; font-family: TimesNewRoman;">
    TIME SERIES DATA PROCESSING | RESERVOIRS LEVEL - Main Notebook
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Each part of the following script was used to proccess the raw data for the Reservoirs Level Time Series Raw Data for all the european countries of the Dispa-SET_Unleash project.
<br>
Read explanation text cells to follow and understand all the process until final results were got stept by step.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    1. Notebook Set Up
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Importing needed libraries
</div>

In [259]:
import pandas as pd
import re
import csv
import os
import requests
from urllib.parse import urlparse, parse_qs
import datetime
from bs4 import BeautifulSoup
import http.client
from multiprocessing import Pool
import shutil
import numpy as np

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    2. Dispa-SET_Unleash Folder Path
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Determinning dynamically the zone_folder_path based on the location of the "Dispa-SET_Unleash" folder relative to the current working directory.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- If the "Dispa-SET_Unleash" folder is copied to a different machine or location, the dispaSET_unleash_folder_path variable will automatically adjust accordingly.
</div>

In [235]:
# Get the current working directory
current_directory = os.getcwd()

# Navigate to the parent directory of "Dispa-SET_Unleash"
dispaSET_unleash_parent_directory = os.path.dirname(current_directory)

# Get the path to the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_path = os.path.dirname(dispaSET_unleash_parent_directory)

# Construct the dispaSET_unleash_folder_name variable
dispaSET_unleash_folder_name = os.path.basename(dispaSET_unleash_folder_path)

print("dispaSET_unleash_folder_name:", dispaSET_unleash_folder_name)
print("dispaSET_unleash_folder_path:", dispaSET_unleash_folder_path)

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    3. Usefull Variable Definition
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Entering a value to all the variables which content are going to be used in some of the next stages of this script. 
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Indicate the year of all data is referring to in the variable data_year.
<br>
- The universal_standar_time variable is going to be used to download all the time series data in this horary zone. Additionally as each european country belongs a particular time sector the corresponding time series data related to its time sector are going to be downloaded as well but in a different file.
</div>

In [236]:
# Year to which data refers to:
data_year = 2023

# Universal standad time:
universal_standard_time = 'UTC'

# Western European Time:
western_european_time = 'WET_WEST'

# Central European Time:
central_european_time = 'CET_CEST'

# Eastern European Time:
eastern_european_time = 'EET_EST'

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
4. Reservoirs Level Directories Definition
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the folders that are going to content all the data realted to the Reservoirs Level time series.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Futher the downloaded raw data is going to be used to get the Reservoirs Level time series. 
</div>

In [237]:
# Additional string to be appended
additional_path = "/RawData/HydroData/ReservoirLevel/"
additional_path_1 = "/RawData/HydroData/ScaledInflows/"

# Construct the standard_time_data_folder_path variable
reference_data_folder_path = dispaSET_unleash_folder_path + additional_path

# Construct the Reservoir_Level_folder_path variable
reservoir_level_folder_path = dispaSET_unleash_folder_path + additional_path

# Construct the Scalled_Inflow_folder_path variable
scaled_inflows_folder_path = dispaSET_unleash_folder_path + additional_path_1


print("reference_data_folder_path:", reference_data_folder_path)
print("reservoitr_level_folder_path:", reservoir_level_folder_path)
print("scaled_inflows_folder_path:", scaled_inflows_folder_path)

reference_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
reservoitr_level_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
scaled_inflows_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
4. European Standard Time
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Each of the Europeand Countries Modeled in Dispa-SET are in a particular european standar time zone. So in order to get the appropiated time series data frame, the correspondign european time zone is going to be identificated for each country.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Additionally all the time series are also going to be downloaded under the UTC <em>(The World’s Time Standard)</em>.
<br>
- This is going to be done for each country as well.
    <br>
- All this features are going to be saved in a csv file called Reference_Data.csv where all additional characters to the download process are going to be written.
</div>

In [238]:
# Define lists of countries and standard times
countries = [
    "Austria", "Belgium", "Bulgaria", "Switzerland", "Cyprus", "Czech Republic",
    "Germany", "Denmark", "Estonia", "Greece", "Spain", "Finland", "France",
    "Croatia", "Hungary", "Ireland", "Italy", "Lithuania", "Luxembourg", "Latvia",
    "Malta", "Netherlands", "Norway", "Poland", "Portugal", "Romania", "Sweden",
    "Slovenia", "Slovakia", "United Kingdom"
]

dispaSET_codes = ["AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "EL", "ES", "FI", "FR", "HR", "HU", 
                  "IE", "IT", "LT", "LU", "LV", "MT", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK", "UK"
]

standard_times_url_country_base = [
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YAT-APG------L!CTY|10YAT-APG------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YBE----------2!CTY|10YBE----------2&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YCA-BULGARIA-R!CTY|10YCA-BULGARIA-R&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YCH-SWISSGRIDZ!CTY|10YCH-SWISSGRIDZ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YCY-1001A0003J!CTY|10YCY-1001A0003J&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YCZ-CEPS-----N!CTY|10YCZ-CEPS-----N&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A65H!CTY|10Y1001A1001A65H&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A39I!CTY|10Y1001A1001A39I&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YGR-HTSO-----Y!CTY|10YGR-HTSO-----Y&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YES-REE------0!CTY|10YES-REE------0&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YFI-1--------U!CTY|10YFI-1--------U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YFR-RTE------C!CTY|10YFR-RTE------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YHR-HEP------M!CTY|10YHR-HEP------M&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YHU-MAVIR----U!CTY|10YHU-MAVIR----U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YIE-1001A00010!CTY|10YIE-1001A00010&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YIT-GRTN-----B!CTY|10YIT-GRTN-----B&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YLT-1001A0008Q!CTY|10YLT-1001A0008Q&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YLU-CEGEDEL-NQ!CTY|10YLU-CEGEDEL-NQ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YLV-1001A00074!CTY|10YLV-1001A00074&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A93C!CTY|10Y1001A1001A93C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YNL----------L!CTY|10YNL----------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YNO-0--------C!CTY|10YNO-0--------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|WET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|WET|DAYTIMERANGE&area.values=CTY|10YPT-REN------W!CTY|10YPT-REN------W&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=WET_WEST&dateTime.timezone_input=WET+(UTC)+/+WEST+(UTC+1)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YRO-TEL------P!CTY|10YRO-TEL------P&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSE-1--------K!CTY|10YSE-1--------K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSI-ELES-----O!CTY|10YSI-ELES-----O&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSK-SEPS-----K!CTY|10YSK-SEPS-----K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|WET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|WET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A92E!CTY|10Y1001A1001A92E&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=WET_WEST&dateTime.timezone_input=WET+(UTC)+/+WEST+(UTC+1)"
]

# Create DataFrame
df = pd.DataFrame({'Country': countries, 'Dispa-SET_Code': dispaSET_codes, 'Full_Standard_Time_Url_Country_Base': standard_times_url_country_base})

# Add new empty column
df['Time_Zone_Country_Code'] = ''

df

reference_data_file_name = 'Reference_Data.csv'

# Construct the full file path
reference_data_file_path = os.path.join(reference_data_folder_path, reference_data_file_name)

# Create the CSV file with the specified name
with open(reference_data_file_path, 'w') as f:
    # Optional: Write a header if needed
    # f.write("header1,header2,header3\n")

# Save DataFrame to the CSV file
    df.to_csv(reference_data_file_path, index=False)

In [239]:
# Load the CSV file into a pandas DataFrame
data = pd.read_csv(reference_data_file_path)

# Function to extract time zone country code from the URL
def extract_time_zone_country_code(url):
    start_index = url.find("&dateTime.timezone_input=") + len("&dateTime.timezone_input=")
    if start_index != -1:  # Check if delimiter is found
        return url[start_index:]  # Extract from start_index till the end
    else:
        return None

# Apply the function to extract time zone country code for each row
data['Time_Zone_Country_Code'] = data['Full_Standard_Time_Url_Country_Base'].apply(extract_time_zone_country_code)

# Write the DataFrame back to the CSV file with the new column included
data.to_csv(reference_data_file_path, index=False)

data

Unnamed: 0,Country,Dispa-SET_Code,Full_Standard_Time_Url_Country_Base,Time_Zone_Country_Code
0,Austria,AT,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
1,Belgium,BE,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
2,Bulgaria,BG,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
3,Switzerland,CH,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
4,Cyprus,CY,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
5,Czech Republic,CZ,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
6,Germany,DE,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
7,Denmark,DK,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
8,Estonia,EE,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
9,Greece,EL,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)


In [240]:
# Open the existing CSV file in read mode
with open(reference_data_file_path, 'r') as csvfile:
    # Read existing data
    reader = csv.reader(csvfile)
    rows = list(reader)

# Add new columns to the header row
header_row = rows[0]
header_row.extend(['Data_Year', 'Universal_Standard_Time', 'Western_European_Time', 'Central_European_Time', 'Eastern_European_Time'])

# Add data to each row
for row in rows[1:]:
    row.extend([data_year, universal_standard_time, western_european_time, central_european_time, eastern_european_time])


# Write back to the CSV file
with open(reference_data_file_path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)

print("Columns added successfully to the CSV file at:", reference_data_file_path)

Columns added successfully to the CSV file at: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    This cells are just to confirm all the file names, file paths and other information related to the data being processed.
    <br>
  Also are used to ensure the inputs for next cells in order to avoid to re-enter the same information each time.
</div>

In [241]:
print (f"dispaSET_unleash_folder_name:                              {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path:                              {dispaSET_unleash_folder_path}")
print (f"data_year:                                                 {data_year}")
print (f"universal_standard_time:                                   {universal_standard_time}")
print (f"western_european_time:                                     {western_european_time}")
print (f"central_european_time:                                     {central_european_time}")
print (f"eastern_european_time:                                     {eastern_european_time}")
print (f"reference_data_folder_path:                                {reference_data_folder_path}")
print (f"reference_data_file_name:                                  {reference_data_file_name}")
print (f"reference_data_file_path:                                  {reference_data_file_path}")
print (f"reservoir_level_folder_path:                               {reservoir_level_folder_path}")

dispaSET_unleash_folder_name:                              Dispa-SET_Unleash
dispaSET_unleash_folder_path:                              /home/ray/Dispa-SET_Unleash
data_year:                                                 2023
universal_standard_time:                                   UTC
western_european_time:                                     WET_WEST
central_european_time:                                     CET_CEST
eastern_european_time:                                     EET_EST
reference_data_folder_path:                                /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
reference_data_file_name:                                  Reference_Data.csv
reference_data_file_path:                                  /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv
reservoir_level_folder_path:                               /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
5. Main Data Source
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Sellecting the main source to get the raw data of the time series.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the data to be processed is going to be downloaded from one main source:
<br>
<em><strong>ENTSOE Transparency Platform:</strong></em> Which the main url link is the following
<div style="text-align: left; margin-left: 3.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
https://transparency.entsoe.eu/dashboard/show
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1. Url Donwload Link
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the dowload links to access the web page where the data is goning to be extracted.
    </div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the downloaded data use a base url structure as download link.
<br>
- The URL base is lightly modified to get the correponding for each country for all the year from 2016 till 2023.
     <br>
- Additionally and despite the ENTSO E web page manages four Time Zones (UTC, CET/CEST, WET/WEST and EER/EEST), the corresponding values of the hydro resources are not defined under an specific time zone, so it is going to be assumed the UTC to give the final Dispa-SET format time step.
    <br>
- Each URL_country_base link is added to the csv file reference_data_file_path.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1.1. Conutry Code
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Adding the Entso E Country Code Nomenclature.
    </div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The codification that the Entso E web page use to identify each european country was already extracted using the Availability Factor Notebook. So the same is going to be used in this step to form the link for the hydro storage levels data.
</div>

In [242]:
# Read the CSV file into a DataFrame
df = pd.read_csv(reference_data_file_path)

# Define the new column name and data
new_column_name = "Country_Code"
new_column_data = [
"CTY|10YAT-APG------L!CTY|10YAT-APG------L",
"CTY|10YBE----------2!CTY|10YBE----------2",
"CTY|10YCA-BULGARIA-R!CTY|10YCA-BULGARIA-R",
"CTY|10YCH-SWISSGRIDZ!CTY|10YCH-SWISSGRIDZ",
"CTY|10YCY-1001A0003J!CTY|10YCY-1001A0003J",
"CTY|10YCZ-CEPS-----N!CTY|10YCZ-CEPS-----N",
"CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F",
"CTY|10Y1001A1001A65H!CTY|10Y1001A1001A65H",
"CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F",
"CTY|10YGR-HTSO-----Y!CTY|10YGR-HTSO-----Y",
"CTY|10YES-REE------0!CTY|10YES-REE------0",
"CTY|10YFI-1--------U!CTY|10YFI-1--------U",
"CTY|10YFR-RTE------C!CTY|10YFR-RTE------C",
"CTY|10YHR-HEP------M!CTY|10YHR-HEP------M",
"CTY|10YHU-MAVIR----U!CTY|10YHU-MAVIR----U",
"CTY|10YIE-1001A00010!CTY|10YIE-1001A00010",
"CTY|10YIT-GRTN-----B!CTY|10YIT-GRTN-----B",
"CTY|10YLT-1001A0008Q!CTY|10YLT-1001A0008Q",
"CTY|10YLU-CEGEDEL-NQ!CTY|10YLU-CEGEDEL-NQ",
"CTY|10YLV-1001A00074!CTY|10YLV-1001A00074",
"CTY|10Y1001A1001A93C!CTY|10Y1001A1001A93C",
"CTY|10YNL----------L!CTY|10YNL----------L",
"CTY|10YNO-0--------C!CTY|10YNO-0--------C",
"CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S",
"CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S",
"CTY|10YRO-TEL------P!CTY|10YRO-TEL------P",
"CTY|10YSE-1--------K!CTY|10YSE-1--------K",
"CTY|10YSI-ELES-----O!CTY|10YSI-ELES-----O",
"CTY|10YSK-SEPS-----K!CTY|10YSK-SEPS-----K",
"CTY|10Y1001A1001A92E!CTY|10Y1001A1001A92E"
]  # Replace with your data

# Add the new column to the DataFrame (assuming same length as existing data)
df[new_column_name] = new_column_data

# Save the updated DataFrame back to the same CSV file
df.to_csv(reference_data_file_path, index=False)

print(f"Added new column '{new_column_name}' to CSV file '{reference_data_file_path}'")

Added new column 'Country_Code' to CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv'


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1.2. Links Source 
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Forming all the links raw data source.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The country code of each country is going to be used to form the corresponding raw data link for all specified years (2016-2023).
</div>
<div style="text-align: left; margin-left: 4.0em; font-weight: unbold; font-size: 15px; font-family: TimesNewRoman;">
- The if another period of time is required, just change the years into the variables start_time and end_time.
</div>

In [243]:
# Define start and end time variables
start_time = "01.01.2016+00:00"
end_time = "01.01.2024+00:00"

# Base URL
utc_urls_base = "https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime={start_time}|UTC|YEAR&dateTime.endDateTime={end_time}|UTC|YEAR&area.values="

# Format the base URL with start and end time variables
formatted_url = utc_urls_base.format(start_time=start_time, end_time=end_time)

# Print the formatted URL
print(formatted_url)


https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2016+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=


In [244]:
# Define the file paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Define the URL base
#utc_urls_base = "https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2016+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values="


# Read the CSV file into a DataFrame
df = pd.read_csv(reference_data_file_path)

# Create a new column called URL_Base_Link
df['URL_Base_Link'] = formatted_url + df['Country_Code']

df.to_csv(reference_data_file_path, index=False)

print(f"Added new column 'URL_Base_Link' to CSV file '{reference_data_file_path}'")

Added new column 'URL_Base_Link' to CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv'


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.2. Country / Zone Data  Folders
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating all the contening folders for each European country modelled in Dispa-SET.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- This folders/directories are going to be used to storage all the time series data downloaded from ENTSO E web resource.
<br>
- Additionally all the paths of the future created files are going to be written in the reference_data_file in order to use them for next download stages.
</div>

In [245]:
# Define file paths and variables
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#reference_data_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/"
#data_year = "2023"

# Read the reference data file
df = pd.read_csv(reference_data_file_path)

# Create a new column 'Zone_Folder_Path'
df['Zone_Folder_Path'] = reference_data_folder_path + df['Dispa-SET_Code'].astype(str)

# Iterate over each row and create a CSV file
for index, row in df.iterrows():
    zone_folder_path = row['Zone_Folder_Path']
    output_csv_file_path = os.path.join(zone_folder_path, f"{data_year}_1.csv")
    
    # Create directories if they don't exist
    os.makedirs(os.path.dirname(output_csv_file_path), exist_ok=True)
    
    # Write the CSV file
    with open(output_csv_file_path, 'w') as f:
        # Write header if needed
         f.write("Header1,")
        
        # Write data (if needed)
        # f.write("Data1,Data2,...\n")
        
        # Here, you can write any data you want into the CSV file
    
    print(f"CSV file created: {output_csv_file_path}")

df.to_csv(reference_data_file_path, index=False)

CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/Reserv

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3. Raw Data Download
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Downloading the Reservoirs Level Raw Data for each European country modelled in Dispa-SET.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The data of all recently created links are extracted and saved in a csv file called with the name of the value of the data_year variable pluss a sufix '_1'.
</div>

In [246]:
# Set max_headers to a higher value
http.client._MAXHEADERS = 1000

# Function to download table data from a URL
def download_table_data(url, output_file_path):
    # Send a GET request to the webpage
    response = requests.get(url)

    # Raise an error if there are too many headers
    if len(response.headers) > 100:
        raise Exception("Too many headers in the response")

    # Parse the HTML content of the webpage
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table element
    table = soup.find('table')

    # Extract data from the table
    data = []
    for row in table.find_all('tr'):
        row_data = []
        for cell in row.find_all('td'):
            row_data.append(cell.text.strip())
        if row_data:  # Ensures we don't add empty rows
            data.append(row_data)

    # Write the downloaded data to a CSV file
    with open(output_file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data)

    return output_file_path

# Load reference data CSV
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
reference_data = pd.read_csv(reference_data_file_path)

# Iterate over each row in the reference data
for index, row in reference_data.iterrows():
    # Extract URL and zone folder path from the current row
    url = row['URL_Base_Link']
    zone_folder_path = row['Zone_Folder_Path']

    # Create the corresponding folder if it doesn't exist
    output_folder_path = zone_folder_path
    #os.makedirs(output_folder_path, exist_ok=True)

    # Download table data and save it to a CSV file
    output_file_name = f"{data_year}_1.csv"
    output_file_path = os.path.join(output_folder_path, output_file_name)
    downloaded_file_path = download_table_data(url, output_file_path)

    # Update the Raw_Data_File_Path column with the path of the downloaded CSV file
    reference_data.at[index, 'Raw_Data_File_Path'] = downloaded_file_path

# Save the updated reference data CSV
reference_data.to_csv(reference_data_file_path, index=False)


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3.1. Raw Data Headers
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Adding column names.
<br>
- All the raw data were downloaded without columns identification, the corresponding header to each column is added.
<br>
-Additionally an extra row is going to be added to the recently downloaded data for future interpolation processes.
</div>

In [247]:
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(reference_data_file_path)

success = True

# Iterate through each file path in the column 'Time_Series_Raw_Data_File_Path'
for index, row in df.iterrows():
    file_path = row['Raw_Data_File_Path']
    
    # Open the existing CSV file in read mode
    with open(file_path, 'r', newline='') as file:
        # Read the existing content
        reader = csv.reader(file)
        rows = list(reader)

    # Insert a new empty row at the beginning
    rows.insert(0, [])

    # Write the updated content back to the CSV file
    with open(file_path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(rows)
        
    print(f"Empty row added at the beginning of file '{file_path}'.")

print("All empty rows added successfully.")

Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv'.
Empty row added at the beginning of file '/home/

In [248]:
start_year = int(start_time.split('.')[2].split('+')[0])  # Extract the year from start_time
end_year = int(end_time.split('.')[2].split('+')[0])  # Extract the year from start_time

year_range = list(range(start_year+1, end_year+1))

# Convert the integer values to strings
start_year_str = str(start_year)
year_range_str = [str(year) for year in year_range]

# Create the headers list
headers = ['Week', start_year_str] + year_range_str

# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(reference_data_file_path)

success = True

# Iterate through each file path in the column 'Time_Series_Raw_Data_File_Path'
for index, row in df.iterrows():
    file_path = row['Raw_Data_File_Path']
    
    # Check if the file exists
    if os.path.exists(file_path):
        # Open the file for reading
        with open(file_path, 'r') as file:
            lines = file.readlines()

        # Find the index of the first empty row
        empty_row_index = next((i for i, line in enumerate(lines) if line.strip() == ""), None)

        # If an empty row is found, copy the headers to it
        if empty_row_index is not None:
            lines[empty_row_index] = ','.join(headers) + '\n'

            # Write the updated lines back to the file
            with open(file_path, 'w') as file:
                file.writelines(lines)
        else:
            print(f"No empty row found in file '{file_path}'. Headers not copied.")
    else:
        success = False
        print(f"Error: File '{file_path}' does not exist.")

if success:
    print("Headers copied successfully to the first empty row of all files.")
else:
    print("Some errors occurred while copying headers.")

Headers copied successfully to the first empty row of all files.


In [249]:
# Read the reference data CSV file
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
df_reference = pd.read_csv(reference_data_file_path)

# Iterate through each zone and its folder path
for index, row in df_reference.iterrows():
    raw_data_file_path = row["Raw_Data_File_Path"]

    # Read the zone's CSV data (handle potential file not found)
    if os.path.exists(raw_data_file_path):
        df = pd.read_csv(raw_data_file_path)
    else:
        print(f"File not found: {raw_data_file_path}")
        continue  # Skip to the next zone if file is missing

    # Process the data (add new row, fill missing values)
    df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
    for col in df.columns[1:]:
        penultimate_value = df.iloc[-2][col]
        last_value = df.iloc[-1][col]

        if pd.isna(penultimate_value):
            next_col_index = df.columns.get_loc(col) + 1
            if next_col_index < len(df.columns):
                next_col_name = df.columns[next_col_index]
                next_col_first_value = df.iloc[0][next_col_name]
                df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value

        if pd.isna(last_value):
            next_col_index = df.columns.get_loc(col) + 1
            if next_col_index < len(df.columns):
                next_col_name = df.columns[next_col_index]
                next_col_second_value = df.iloc[1][next_col_name]
                df.iloc[-1, df.columns.get_loc(col)] = next_col_second_value

    # Save the modified DataFrame back to the CSV file
    df.to_csv(raw_data_file_path, index=False)

    print(f"Processing completed successfully for file: {raw_data_file_path}")

print("Overall processing finished.")

Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
Processing completed successfully for file: /hom

  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_

In [250]:
# Read the CSV file containing the file paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
df = pd.read_csv(reference_data_file_path)

# Iterate over each file path in the 'Raw_Data_File_Path' column
for file_path in df['Raw_Data_File_Path']:
    # Check if the file exists
    if os.path.exists(file_path):
        # Define the new file name
        new_file_name = f"{start_year}_{end_year}.csv"
        new_file_path = os.path.join(os.path.dirname(file_path), new_file_name)

        # Rename the original file
        os.rename(file_path, new_file_path)

        # Read the original CSV file
        df_original = pd.read_csv(new_file_path)

        # Extract the first column and the column with the same value as data_year
        new_df = df_original[['Week', str(data_year)]]

        # Define the new file name for the extracted data
        new_file_name_extracted = f"{data_year}_1.csv"
        new_file_path_extracted = os.path.join(os.path.dirname(file_path), new_file_name_extracted)

        # Save the extracted data to a new CSV file
        new_df.to_csv(new_file_path_extracted, index=False)

        print(f"Data extracted and saved to {new_file_path_extracted}")

    else:
        print(f"File '{file_path}' does not exist.")

Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData

In [251]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#scaled_inflows_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Add a new column called 'Scalled_Inflows_Folder_Path'
df['Scalled_Inflows_Folder_Path'] = scaled_inflows_folder_path + df['Dispa-SET_Code']

# Save the updated DataFrame back to the CSV file
df.to_csv(reference_data_file_path, index=False)

print("Scalled_Inflows_Folder_Path column added and saved to the reference data file.")

Scalled_Inflows_Folder_Path column added and saved to the reference data file.


In [252]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Add new headers
new_headers = ['1h', '30min', '15min']
df = pd.concat([df, pd.DataFrame(columns=new_headers)], axis=1)

# Save the updated DataFrame back to the CSV file
df.to_csv(reference_data_file_path, index=False)

print("New headers added to the reference data file.")

New headers added to the reference data file.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3.2. Reservoir Level Factor
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the corresponding reservoir level factor.
<br>
- As the full storage capacity of each HPSP and HDAM unit it is not provided by the web source. It is going to work with the maximum Storage Energy Value of the corresponding year i.e. defined in the data_year variable.
</div>

In [253]:
# Define the variables
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Function to normalize column values
def normalize_column(column):
    # Fill non-numeric, "n/e", "N/A", and 0 values with 0
    column = column.replace(["n/e", "N/A", 0], pd.NA)
    column = column.fillna(0)
    
    # Check if column values are numeric
    if pd.api.types.is_numeric_dtype(column):
        # Find the maximum value
        max_value = column.max()
        # Check if the maximum value is not 0
        if max_value != 0:
            # Divide each value by the maximum value
            return column.divide(max_value)
    return column

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    raw_data_file_path = row['Raw_Data_File_Path']
    
    # Read the corresponding CSV file
    df_raw_data = pd.read_csv(raw_data_file_path)
    
    # Normalize the second column
    column_to_normalize = df_raw_data.iloc[:, 1]
    df_raw_data.iloc[:, 1] = normalize_column(column_to_normalize)
    
    # Save the modified DataFrame back to the CSV file
    df_raw_data.to_csv(raw_data_file_path, index=False)
    print(f"Values in CSV file '{raw_data_file_path}' normalized.")

print("Process completed successfully.")

Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv' normalized.
Values in CSV file '/home/ra

  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)


<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the corresponding time format.
<br>
- As the full storage capacity data for all the countries are given by a time resolution of a week i.e. 53/54 weeks per year. So it is going to add the corresponding date to each week according to the analized year i.e. value specified in the data_year variable.
</div>

In [254]:
# Define the reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    raw_data_file_path = row['Raw_Data_File_Path']

    # Read the CSV file
    df = pd.read_csv(raw_data_file_path)

    # Extract week number from the 'Week' column
    df['Week'] = df['Week'].str.extract(r'(\d+)').astype(int)  # Extract digits from the string and convert to int

    # Calculate the dates
    start_date = pd.to_datetime(f'{data_year}-01-01')  # Start from January 1st of the specified year
    df['Dispa-SET_Date'] = start_date + pd.to_timedelta((df['Week'] - 1) * 7, unit='D')  # Add the corresponding number of weeks

    # Convert to string in the desired format
    df['Dispa-SET_Date'] = df['Dispa-SET_Date'].dt.strftime('%Y-%m-%d 00:00:00+00:00')

    # Save the modified DataFrame back to the CSV file
    df.to_csv(raw_data_file_path, index=False)

    print(f"Processed CSV file '{raw_data_file_path}'")

print("Processing completed for all CSV files.")

Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.4. Reservoirs Level Files
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.4.1. Reservoirs Level Clean Files Creation
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating all the clean csv files where the final data of the reservoir levels have to be gathered.
    <br>
- The next three cells are use to create the content folders (1h, 30 min and/or 15min) and the cleaned csv files according the time step provided by the web source (ENTSO E).
</div>

In [255]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#data_year = "2023"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    scaled_inflows_folder_path = row['Scalled_Inflows_Folder_Path']
    zone_folder_path = row['Zone_Folder_Path']
    
    # Check if the scaled inflows folder path exists
    if os.path.exists(scaled_inflows_folder_path):
        # Iterate over each required folder name
        for folder_name in ['1h', '30min', '15min']:
            folder_path = os.path.join(scaled_inflows_folder_path, folder_name)
            
            # Check if the folder exists
            if os.path.exists(folder_path):
                # Create the destination folder path
                destination_path = os.path.join(zone_folder_path, folder_name)
                
                # Copy the folder and its contents to the destination
                shutil.copytree(folder_path, destination_path)
                
                print(f"Folder '{folder_name}' copied to '{destination_path}'.")
            else:
                print(f"Folder '{folder_name}' does not exist in '{scaled_inflows_folder_path}'.")
    else:
        print(f"Scalled inflows folder path '{scaled_inflows_folder_path}' does not exist.")

print("Copying complete.")

Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h'.
Folder '30min' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min'.
Folder '15min' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h'.
Folder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE'.
Folder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h'.
Folder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG'.
Folder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h'.
Folder '30min' does not exist in '/home/

In [256]:
# Define the paths
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    zone_folder_path = row['Zone_Folder_Path']
    
    # Check if the zone folder path exists
    if os.path.exists(zone_folder_path):
        # Iterate over the required subfolders
        for subfolder in ['1h', '30min', '15min']:
            subfolder_path = os.path.join(zone_folder_path, subfolder)
            
            # Check if the subfolder exists
            if os.path.exists(subfolder_path):
                # Check if the CSV file for the data year exists in the subfolder
                csv_file_path = os.path.join(subfolder_path, f"{data_year}.csv")
                if os.path.exists(csv_file_path):
                    # Write the CSV file path to the corresponding column in the DataFrame
                    df.at[index, subfolder] = csv_file_path
                    print(f"CSV file '{data_year}.csv' found in '{subfolder_path}'.")
                else:
                    print(f"CSV file '{data_year}.csv' not found in '{subfolder_path}'.")
            else:
                print(f"Subfolder '{subfolder}' does not exist in '{zone_folder_path}'.")
    else:
        print(f"Zone folder path '{zone_folder_path}' does not exist.")

# Save the updated DataFrame to the reference data CSV file
df.to_csv(reference_data_file_path, index=False)
print("CSV file updated successfully.")

CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h'.
Subfolder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE'.
Subfolder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h'.
Subfolder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG'.
Subfolder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLe

  df.at[index, subfolder] = csv_file_path
  df.at[index, subfolder] = csv_file_path
  df.at[index, subfolder] = csv_file_path


In [257]:
# Define the paths
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    for column in ['1h', '30min', '15min']:
        csv_file_path = row[column]
        
        # Check if the CSV file path exists
        if pd.notna(csv_file_path):
            # Read the CSV file
            df_csv = pd.read_csv(csv_file_path)
            
            # Erase all the values of the second and third column
            df_csv.iloc[:, 1:3] = ''
            
            # Save the modified DataFrame back to the CSV file
            df_csv.to_csv(csv_file_path, index=False)
            print(f"Values erased in CSV file '{csv_file_path}'.")

print("Process completed successfully.")

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/FI/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/FR/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/HR/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IE/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LV/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/MT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/NL/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.4.2. Reservoirs Level Clean Files Interpolation
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Completing the time data to the clean Reservoirs Level files.
    <br>
- For interpolation purposes it is needed to add new dates from the next year to the corresponding Reservoirs Level Clean files.
<br>
- Additionally is needed remove duplicated dates if there are any.
</div>

In [258]:
def add_new_rows(time_step):
    # Define reference data file path
    #reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

    # Read the reference data CSV file
    df_reference = pd.read_csv(reference_data_file_path)

    # Iterate over each row in the DataFrame
    for index, row in df_reference.iterrows():
        # Get the file paths from the specified columns
        file_path_1 = row[time_step]
        file_path_2 = row['Raw_Data_File_Path']

        # Check if file_path_1 is not NaN
        if isinstance(file_path_1, str):  # Check if file_path_1 is a string
            # Read the last field of the first column from file_path_1
            df1 = pd.read_csv(file_path_1)
            last_date_file1 = pd.to_datetime(df1.iloc[-1, 0])

            # Read the last field of the 'Dispa-SET_Date' column from file_path_2
            df2 = pd.read_csv(file_path_2)
            last_date_file2 = pd.to_datetime(df2['Dispa-SET_Date'].iloc[-1])

            # Check if last date from file_path_1 is less than last date from file_path_2
            if last_date_file1 < last_date_file2:
                # Calculate time steps until reaching the last date from file_path_2
                if time_step == '1h':
                    freq = '1H'
                elif time_step == '30min':
                    freq = '30Min'
                elif time_step == '15min':
                    freq = '15Min'
                else:
                    print("Invalid time step frequency.")
                    return
                
                time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)

                # Create a new DataFrame with time steps
                df_new_rows = pd.DataFrame({'Unnamed: 0': time_steps})

                # Append the new rows to file_path_1 DataFrame
                df1 = pd.concat([df1, df_new_rows], ignore_index=True)

                # Save the updated DataFrame back to the original CSV file
                df1.to_csv(file_path_1, index=False)
                print(f"New rows added successfully to {file_path_1}")
            else:
                print(f"No new rows need to be added to {file_path_1}")
        else:
            print(f"Invalid file path for {time_step} in row {index}: NaN")

    print("Overall processing finished.")

# Add new rows with 1 hour time step
add_new_rows('1h')

# Add new rows with 30 minute time step
add_new_rows('30min')

# Add new rows with 15 minute time step
add_new_rows('15min')

  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, fre

New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/1h/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/1h/2023.csv
New rows added successfully to /home/

  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, freq=freq)
  time_steps = pd.date_range(start=last_date_file1, end=last_date_file2, fre

New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/30min/2023.csv
Invalid file path for 30min in row 5: NaN
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/30min/2023.csv
Invalid file path for 30min in row 7: NaN
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/30min/2023.csv
Invalid file path for 30min in row 9: NaN
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/30min/2023.csv
Invalid file path for 30min in row 11: NaN
Invalid file path for 30min in row 12: NaN
Invalid file path for 30min in row 13: NaN
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/HU/30min/2023.csv
New rows added successfully to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IE/30min/2023.csv
Invalid file path for 30min in row 16: NaN
Invalid file path for 30min in row 17: NaN

In [260]:
# Define reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    # Get the file paths from the specified columns
    file_paths = {'1h': row['1h'], '30min': row['30min'], '15min': row['15min']}

    for time_step, file_path in file_paths.items():
        # Check if file_path is not NaN and is a string
        if isinstance(file_path, str):  
            # Read the CSV file
            df = pd.read_csv(file_path)

            # Remove rows with duplicated values in the first column
            if not df.empty:  # Check if the DataFrame is not empty
                df = df.drop_duplicates(subset=df.columns[0], keep='first')

                # Save the updated DataFrame back to the original CSV file
                df.to_csv(file_path, index=False)

                print(f"Duplicates removed successfully from {time_step} file: {file_path}")
            else:
                print(f"Empty DataFrame for {time_step} file: {file_path}. No duplicates to remove.")
        else:
            print(f"Invalid file path for {time_step} in row {index}: NaN")

print("Overall processing finished.")

Duplicates removed successfully from 1h file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv
Duplicates removed successfully from 30min file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv
Duplicates removed successfully from 15min file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv
Duplicates removed successfully from 1h file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv
Invalid file path for 30min in row 1: NaN
Invalid file path for 15min in row 1: NaN
Duplicates removed successfully from 1h file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv
Invalid file path for 30min in row 2: NaN
Invalid file path for 15min in row 2: NaN
Duplicates removed successfully from 1h file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv
Invalid file path for 30min in row 3: NaN
Invalid file path for 15min in row 3: NaN
Dupl

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Copying the weekly data to the clean csv Reservoirs Level files.
    <br>
- For interpolation purposes it is needed copy the data per week to the corresponding date in the clean csv Reservoirs Level files.
<br>
- This is done for all the 30 countries modelled in Dispa-SET and for each corresponding time step as well (1 hou, 30 minutes and 15 minutes)
</div>

In [265]:
# Define reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    for time_interval in ['1h', '30min', '15min']:
        # Get the file paths from the specified columns
        file_path_1 = row[time_interval]
        file_path_2 = row['Raw_Data_File_Path']

        # Check if both file paths are not NaN
        if isinstance(file_path_1, str) and isinstance(file_path_2, str):
            # Read CSV files
            df_1 = pd.read_csv(file_path_1)
            df_2 = pd.read_csv(file_path_2)

            # Get the data year from the file path
            data_year = int(file_path_1.split('/')[-1].split('.')[0])

            # Get the column name corresponding to the data year
            year_column = str(data_year)

            # Iterate over each row in df_2
            for _, row_2 in df_2.iterrows():
                # Get the value from the 'Dispa-SET_Date' column in df_2
                date_value = row_2['Dispa-SET_Date']

                # Check if the value exists in the first column of df_1
                if date_value in df_1[df_1.columns[0]].values:
                    # Get the index where the value is found in df_1
                    index_value = df_1.index[df_1[df_1.columns[0]] == date_value][0]
                    # Copy corresponding value from df_2 and paste it to the corresponding fields of the second and third columns in df_1
                    df_1.at[index_value, df_1.columns[1]] = row_2[year_column]
                    df_1.at[index_value, df_1.columns[2]] = row_2[year_column]

            # Save the updated DataFrame back to the original CSV file
            df_1.to_csv(file_path_1, index=False)
            print(f"Values copied successfully from {file_path_2} to {file_path_1}")

        else:
            print(f"Invalid file paths in row {index}: NaN")

print("Overall processing finished.")

Values copied successfully from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv
Values copied successfully from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv
Values copied successfully from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv
Values copied successfully from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv
Invalid file paths in row 1: NaN
Invalid file paths in row 1: NaN
Values copied successfully from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.cs

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Interpolating the clean csv Reservoirs Level fiels.
    <br>
- A linear interpolation process is used to get the values needed between each weekly data.
<br>
- This is done for all the 30 countries modelled in Dispa-SET and for each corresponding time step as well (1 hou, 30 minutes and 15 minutes)
</div>

In [266]:
# Define reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Define the columns to be interpolated
columns_to_interpolate = ['HPHS', 'HDAM']  # Replace with actual column names

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    # Get the file paths from the specified columns
    file_path_1h = row['1h']
    file_path_30min = row['30min']
    file_path_15min = row['15min']

    # Check if any of the file paths are not NaN
    if isinstance(file_path_1h, str):
        # Read CSV file for 1h
        df_1h = pd.read_csv(file_path_1h)
        # Perform linear interpolation for 1h columns
        df_1h[columns_to_interpolate] = df_1h[columns_to_interpolate].interpolate(method='linear')
        # Save the interpolated DataFrame back to the CSV file
        df_1h.to_csv(file_path_1h, index=False)
        print(f"Linear interpolation completed for {file_path_1h}")

    if isinstance(file_path_30min, str):
        # Read CSV file for 30min
        df_30min = pd.read_csv(file_path_30min)
        # Perform linear interpolation for 30min columns
        df_30min[columns_to_interpolate] = df_30min[columns_to_interpolate].interpolate(method='linear')
        # Save the interpolated DataFrame back to the CSV file
        df_30min.to_csv(file_path_30min, index=False)
        print(f"Linear interpolation completed for {file_path_30min}")

    if isinstance(file_path_15min, str):
        # Read CSV file for 15min
        df_15min = pd.read_csv(file_path_15min)
        # Perform linear interpolation for 15min columns
        df_15min[columns_to_interpolate] = df_15min[columns_to_interpolate].interpolate(method='linear')
        # Save the interpolated DataFrame back to the CSV file
        df_15min.to_csv(file_path_15min, index=False)
        print(f"Linear interpolation completed for {file_path_15min}")

print("Overall processing finished.")

Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/1h/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/30min/2023.csv
Linear interpolation completed for /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/1h/2

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the final Reservoirs Level file.
    <br>
- A process of dates filtering is done, since just the analized year (specified in the data_year variable) is required.
<br>
- This is done for all the 30 countries modelled in Dispa-SET and for each corresponding time step as well (1 hou, 30 minutes and 15 minutes)
</div>

In [268]:
# Define the reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Define a function to process each file
def process_file(file_path):
    if not os.path.isfile(file_path):
        print(f"No file found at {file_path}")
        return None

    # Read the CSV file
    df = pd.read_csv(file_path)

    # Convert the first column to datetime
    df['Unnamed: 0'] = pd.to_datetime(df['Unnamed: 0'])

    # Extract the year from the date values
    df['Year'] = df['Unnamed: 0'].dt.year

    # Filter out rows that belong to the specified year
    data_year = int(os.path.basename(file_path).split('.')[0])
    df_filtered = df[df['Year'] == data_year].copy()  # Create a copy to avoid SettingWithCopyWarning

    # Drop the 'Year' column
    df_filtered.drop(columns=['Year'], inplace=True)

    return df_filtered

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    # Get the file paths from the specified columns
    file_path_1h = row['1h']
    file_path_30min = row['30min']
    file_path_15min = row['15min']

    # Process the file path for '1h'
    if isinstance(file_path_1h, str):
        df_filtered_1h = process_file(file_path_1h)
        if df_filtered_1h is not None:
            df_filtered_1h.to_csv(file_path_1h, index=False)
            print(f"Rows not belonging to {data_year} have been removed from {file_path_1h}")
        else:
            print(f"No file found for '1h' at index {index}")

    # Process the file path for '30min'
    if isinstance(file_path_30min, str):
        df_filtered_30min = process_file(file_path_30min)
        if df_filtered_30min is not None:
            df_filtered_30min.to_csv(file_path_30min, index=False)
            print(f"Rows not belonging to {data_year} have been removed from {file_path_30min}")
        else:
            print(f"No file found for '30min' at index {index}")

    # Process the file path for '15min'
    if isinstance(file_path_15min, str):
        df_filtered_15min = process_file(file_path_15min)
        if df_filtered_15min is not None:
            df_filtered_15min.to_csv(file_path_15min, index=False)
            print(f"Rows not belonging to {data_year} have been removed from {file_path_15min}")
        else:
            print(f"No file found for '15min' at index {index}")

print("Overall processing finished.")

Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/1h/2023.csv
Rows not belonging to 2023 have been removed from /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/C

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
The final Reservoirs Level Time Series for each Country modeled in Dispa-SET is located in the following local directory:
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- /Local/Path/to/Dispas-SET/RawData/HydroData/ReservoirLevel/
<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Inside this path there are folders with the acronym of each country modelled in Dipsa-SET. i.e. AT, BE, CH.... UK
<br>
Inside each of this folders, there are sub folders named by the time stept of the time series. i.e. 1h, 30min and/or 15min.
<br>
Inside these sub folders, it is going to be found the corresponding time series .csv file named with the year of the data. e.g. 2023.csv
</div>