<div style="text-align: center; margin-left: 0em; font-weight: bold; font-size: 20px; font-family: TimesNewRoman;">
    TIME SERIES DATA PROCESSING | RESERVOIRS LEVEL - Main Notebook
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Each part of the following script was used to proccess the raw data for the Reservoirs Level Time Series Raw Data for all the european countries of the Dispa-SET_Unleash project.
<br>
Read explanation text cells to follow and understand all the process until final results were got stept by step.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    1. Notebook Set Up
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Importing needed libraries
</div>

In [92]:
import pandas as pd
import re
import csv
import os
import requests
from urllib.parse import urlparse, parse_qs
import datetime
from bs4 import BeautifulSoup
import http.client
from multiprocessing import Pool
import shutil

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    2. Dispa-SET_Unleash Folder Path
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Determinning dynamically the zone_folder_path based on the location of the "Dispa-SET_Unleash" folder relative to the current working directory.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- If the "Dispa-SET_Unleash" folder is copied to a different machine or location, the dispaSET_unleash_folder_path variable will automatically adjust accordingly.
</div>

In [93]:
# Get the current working directory
current_directory = os.getcwd()

# Navigate to the parent directory of "Dispa-SET_Unleash"
dispaSET_unleash_parent_directory = os.path.dirname(current_directory)

# Get the path to the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_path = os.path.dirname(dispaSET_unleash_parent_directory)

# Construct the dispaSET_unleash_folder_name variable
dispaSET_unleash_folder_name = os.path.basename(dispaSET_unleash_folder_path)

print("dispaSET_unleash_folder_name:", dispaSET_unleash_folder_name)
print("dispaSET_unleash_folder_path:", dispaSET_unleash_folder_path)

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    3. Usefull Variable Definition
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Entering a value to all the variables which content are going to be used in some of the next stages of this script. 
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Indicate the year of all data is referring to in the variable data_year.
<br>
- The universal_standar_time variable is going to be used to download all the time series data in this horary zone. Additionally as each european country belongs a particular time sector the corresponding time series data related to its time sector are going to be downloaded as well but in a different file.
</div>

In [94]:
# Year to which data refers to:
data_year = 2023

# Universal standad time:
universal_standard_time = 'UTC'

# Western European Time:
western_european_time = 'WET_WEST'

# Central European Time:
central_european_time = 'CET_CEST'

# Eastern European Time:
eastern_european_time = 'EET_EST'

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
4. Reservoirs Level Directories Definition
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the folders that are going to content all the data realted to the Reservoirs Level time series.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Futher the downloaded raw data is going to be used to get the Reservoirs Level time series. 
</div>

In [95]:
# Additional string to be appended
additional_path = "/RawData/HydroData/ReservoirLevel/"
additional_path_1 = "/RawData/HydroData/ScaledInflows/"

# Construct the standard_time_data_folder_path variable
reference_data_folder_path = dispaSET_unleash_folder_path + additional_path

# Construct the Reservoir_Level_folder_path variable
reservoir_level_folder_path = dispaSET_unleash_folder_path + additional_path

# Construct the Scalled_Inflow_folder_path variable
scaled_inflows_folder_path = dispaSET_unleash_folder_path + additional_path_1


print("reference_data_folder_path:", reference_data_folder_path)
print("reservoitr_level_folder_path:", reservoir_level_folder_path)
print("scaled_inflows_folder_path:", scaled_inflows_folder_path)

reference_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
reservoitr_level_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
scaled_inflows_folder_path: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
4. European Standard Time
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Each of the Europeand Countries Modeled in Dispa-SET are in a particular european standar time zone. So in order to get the appropiated time series data frame, the correspondign european time zone is going to be identificated for each country.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Additionally all the time series are also going to be downloaded under the UTC <em>(The World’s Time Standard)</em>.
<br>
- This is going to be done for each country as well.
    <br>
- All this features are going to be saved in a csv file called Reference_Data.csv where all additional characters to the download process are going to be written.
</div>

In [96]:
# Define lists of countries and standard times
countries = [
    "Austria", "Belgium", "Bulgaria", "Switzerland", "Cyprus", "Czech Republic",
    "Germany", "Denmark", "Estonia", "Greece", "Spain", "Finland", "France",
    "Croatia", "Hungary", "Ireland", "Italy", "Lithuania", "Luxembourg", "Latvia",
    "Malta", "Netherlands", "Norway", "Poland", "Portugal", "Romania", "Sweden",
    "Slovenia", "Slovakia", "United Kingdom"
]

dispaSET_codes = ["AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "EL", "ES", "FI", "FR", "HR", "HU", 
                  "IE", "IT", "LT", "LU", "LV", "MT", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK", "UK"
]

standard_times_url_country_base = [
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YAT-APG------L!CTY|10YAT-APG------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YBE----------2!CTY|10YBE----------2&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YCA-BULGARIA-R!CTY|10YCA-BULGARIA-R&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YCH-SWISSGRIDZ!CTY|10YCH-SWISSGRIDZ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YCY-1001A0003J!CTY|10YCY-1001A0003J&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YCZ-CEPS-----N!CTY|10YCZ-CEPS-----N&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A65H!CTY|10Y1001A1001A65H&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A39I!CTY|10Y1001A1001A39I&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YGR-HTSO-----Y!CTY|10YGR-HTSO-----Y&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YES-REE------0!CTY|10YES-REE------0&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YFI-1--------U!CTY|10YFI-1--------U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YFR-RTE------C!CTY|10YFR-RTE------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YHR-HEP------M!CTY|10YHR-HEP------M&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YHU-MAVIR----U!CTY|10YHU-MAVIR----U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YIE-1001A00010!CTY|10YIE-1001A00010&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YIT-GRTN-----B!CTY|10YIT-GRTN-----B&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YLT-1001A0008Q!CTY|10YLT-1001A0008Q&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YLU-CEGEDEL-NQ!CTY|10YLU-CEGEDEL-NQ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YLV-1001A00074!CTY|10YLV-1001A00074&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A93C!CTY|10Y1001A1001A93C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YNL----------L!CTY|10YNL----------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YNO-0--------C!CTY|10YNO-0--------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|WET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|WET|DAYTIMERANGE&area.values=CTY|10YPT-REN------W!CTY|10YPT-REN------W&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=WET_WEST&dateTime.timezone_input=WET+(UTC)+/+WEST+(UTC+1)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|EET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|EET|DAYTIMERANGE&area.values=CTY|10YRO-TEL------P!CTY|10YRO-TEL------P&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=EET_EEST&dateTime.timezone_input=EET+(UTC+2)+/+EEST+(UTC+3)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSE-1--------K!CTY|10YSE-1--------K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSI-ELES-----O!CTY|10YSI-ELES-----O&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|CET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|CET|DAYTIMERANGE&area.values=CTY|10YSK-SEPS-----K!CTY|10YSK-SEPS-----K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)",
    "https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&datepicker-day-offset-select-dv-date-from_input=D&dateTime.dateTime=01.01.2023+00:00|WET|DAYTIMERANGE&dateTime.endDateTime=01.01.2023+00:00|WET|DAYTIMERANGE&area.values=CTY|10Y1001A1001A92E!CTY|10Y1001A1001A92E&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&showConsumption.values=SHOW_CONSUMPTION&dateTime.timezone=WET_WEST&dateTime.timezone_input=WET+(UTC)+/+WEST+(UTC+1)"
]

# Create DataFrame
df = pd.DataFrame({'Country': countries, 'Dispa-SET_Code': dispaSET_codes, 'Full_Standard_Time_Url_Country_Base': standard_times_url_country_base})

# Add new empty column
df['Time_Zone_Country_Code'] = ''

df

reference_data_file_name = 'Reference_Data.csv'

# Construct the full file path
reference_data_file_path = os.path.join(reference_data_folder_path, reference_data_file_name)

# Create the CSV file with the specified name
with open(reference_data_file_path, 'w') as f:
    # Optional: Write a header if needed
    # f.write("header1,header2,header3\n")

# Save DataFrame to the CSV file
    df.to_csv(reference_data_file_path, index=False)

In [97]:
# Load the CSV file into a pandas DataFrame
data = pd.read_csv(reference_data_file_path)

# Function to extract time zone country code from the URL
def extract_time_zone_country_code(url):
    start_index = url.find("&dateTime.timezone_input=") + len("&dateTime.timezone_input=")
    if start_index != -1:  # Check if delimiter is found
        return url[start_index:]  # Extract from start_index till the end
    else:
        return None

# Apply the function to extract time zone country code for each row
data['Time_Zone_Country_Code'] = data['Full_Standard_Time_Url_Country_Base'].apply(extract_time_zone_country_code)

# Write the DataFrame back to the CSV file with the new column included
data.to_csv(reference_data_file_path, index=False)

data

Unnamed: 0,Country,Dispa-SET_Code,Full_Standard_Time_Url_Country_Base,Time_Zone_Country_Code
0,Austria,AT,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
1,Belgium,BE,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
2,Bulgaria,BG,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
3,Switzerland,CH,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
4,Cyprus,CY,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
5,Czech Republic,CZ,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
6,Germany,DE,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
7,Denmark,DK,https://transparency.entsoe.eu/generation/r2/a...,CET+(UTC+1)+/+CEST+(UTC+2)
8,Estonia,EE,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)
9,Greece,EL,https://transparency.entsoe.eu/generation/r2/a...,EET+(UTC+2)+/+EEST+(UTC+3)


In [98]:
# Open the existing CSV file in read mode
with open(reference_data_file_path, 'r') as csvfile:
    # Read existing data
    reader = csv.reader(csvfile)
    rows = list(reader)

# Add new columns to the header row
header_row = rows[0]
header_row.extend(['Data_Year', 'Universal_Standard_Time', 'Western_European_Time', 'Central_European_Time', 'Eastern_European_Time'])

# Add data to each row
for row in rows[1:]:
    row.extend([data_year, universal_standard_time, western_european_time, central_european_time, eastern_european_time])


# Write back to the CSV file
with open(reference_data_file_path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)

print("Columns added successfully to the CSV file at:", reference_data_file_path)

Columns added successfully to the CSV file at: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    This cells are just to confirm all the file names, file paths and other information related to the data being processed.
    <br>
  Also are used to ensure the inputs for next cells in order to avoid to re-enter the same information each time.
</div>

In [99]:
print (f"dispaSET_unleash_folder_name:                              {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path:                              {dispaSET_unleash_folder_path}")
print (f"data_year:                                                 {data_year}")
print (f"universal_standard_time:                                   {universal_standard_time}")
print (f"western_european_time:                                     {western_european_time}")
print (f"central_european_time:                                     {central_european_time}")
print (f"eastern_european_time:                                     {eastern_european_time}")
print (f"reference_data_folder_path:                                {reference_data_folder_path}")
print (f"reference_data_file_name:                                  {reference_data_file_name}")
print (f"reference_data_file_path:                                  {reference_data_file_path}")
print (f"reservoir_level_folder_path:                               {reservoir_level_folder_path}")

dispaSET_unleash_folder_name:                              Dispa-SET_Unleash
dispaSET_unleash_folder_path:                              /home/ray/Dispa-SET_Unleash
data_year:                                                 2023
universal_standard_time:                                   UTC
western_european_time:                                     WET_WEST
central_european_time:                                     CET_CEST
eastern_european_time:                                     EET_EST
reference_data_folder_path:                                /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/
reference_data_file_name:                                  Reference_Data.csv
reference_data_file_path:                                  /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv
reservoir_level_folder_path:                               /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
5. Main Data Source
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Sellecting the main source to get the raw data of the time series.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the data to be processed is going to be downloaded from one main source:
<br>
<em><strong>ENTSOE Transparency Platform:</strong></em> Which the main url link is the following
<div style="text-align: left; margin-left: 3.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
https://transparency.entsoe.eu/dashboard/show
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1. Url Donwload Link
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the dowload links to access the web page where the data is goning to be extracted.
    </div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the downloaded data use a base url structure as download link.
<br>
- The URL base is lightly modified to get the correponding for each country for all the year from 2016 till 2023.
     <br>
- Additionally and despite the ENTSO E web page manages four Time Zones (UTC, CET/CEST, WET/WEST and EER/EEST), the corresponding values of the hydro resources are not defined under an specific time zone, so it is going to be assumed the UTC to give the final Dispa-SET format time step.
    <br>
- Each URL_country_base link is added to the csv file reference_data_file_path.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1.1. Conutry Code
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Adding the Entso E Country Code Nomenclature.
    </div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The codification that the Entso E web page use to identify each european country was already extracted using the Availability Factor Notebook. So the same is going to be used in this step to form the link for the hydro storage levels data.
</div>

In [100]:
# Read the CSV file into a DataFrame
df = pd.read_csv(reference_data_file_path)

# Define the new column name and data
new_column_name = "Country_Code"
new_column_data = [
"CTY|10YAT-APG------L!CTY|10YAT-APG------L",
"CTY|10YBE----------2!CTY|10YBE----------2",
"CTY|10YCA-BULGARIA-R!CTY|10YCA-BULGARIA-R",
"CTY|10YCH-SWISSGRIDZ!CTY|10YCH-SWISSGRIDZ",
"CTY|10YCY-1001A0003J!CTY|10YCY-1001A0003J",
"CTY|10YCZ-CEPS-----N!CTY|10YCZ-CEPS-----N",
"CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F",
"CTY|10Y1001A1001A65H!CTY|10Y1001A1001A65H",
"CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F",
"CTY|10YGR-HTSO-----Y!CTY|10YGR-HTSO-----Y",
"CTY|10YES-REE------0!CTY|10YES-REE------0",
"CTY|10YFI-1--------U!CTY|10YFI-1--------U",
"CTY|10YFR-RTE------C!CTY|10YFR-RTE------C",
"CTY|10YHR-HEP------M!CTY|10YHR-HEP------M",
"CTY|10YHU-MAVIR----U!CTY|10YHU-MAVIR----U",
"CTY|10YIE-1001A00010!CTY|10YIE-1001A00010",
"CTY|10YIT-GRTN-----B!CTY|10YIT-GRTN-----B",
"CTY|10YLT-1001A0008Q!CTY|10YLT-1001A0008Q",
"CTY|10YLU-CEGEDEL-NQ!CTY|10YLU-CEGEDEL-NQ",
"CTY|10YLV-1001A00074!CTY|10YLV-1001A00074",
"CTY|10Y1001A1001A93C!CTY|10Y1001A1001A93C",
"CTY|10YNL----------L!CTY|10YNL----------L",
"CTY|10YNO-0--------C!CTY|10YNO-0--------C",
"CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S",
"CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S",
"CTY|10YRO-TEL------P!CTY|10YRO-TEL------P",
"CTY|10YSE-1--------K!CTY|10YSE-1--------K",
"CTY|10YSI-ELES-----O!CTY|10YSI-ELES-----O",
"CTY|10YSK-SEPS-----K!CTY|10YSK-SEPS-----K",
"CTY|10Y1001A1001A92E!CTY|10Y1001A1001A92E"
]  # Replace with your data

# Add the new column to the DataFrame (assuming same length as existing data)
df[new_column_name] = new_column_data

# Save the updated DataFrame back to the same CSV file
df.to_csv(reference_data_file_path, index=False)

print(f"Added new column '{new_column_name}' to CSV file '{reference_data_file_path}'")

Added new column 'Country_Code' to CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv'


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.1.2. Links Source 
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Forming all the links raw data source.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The country code of each country is going to be used to form the corresponding raw data link for all specified years (2016-2023).
</div>
<div style="text-align: left; margin-left: 4.0em; font-weight: unbold; font-size: 15px; font-family: TimesNewRoman;">
- The if another period of time is required, just change the years into the variables start_time and end_time.
</div>

In [101]:
# Define start and end time variables
start_time = "01.01.2016+00:00"
end_time = "01.01.2024+00:00"

# Base URL
utc_urls_base = "https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime={start_time}|UTC|YEAR&dateTime.endDateTime={end_time}|UTC|YEAR&area.values="

# Format the base URL with start and end time variables
formatted_url = utc_urls_base.format(start_time=start_time, end_time=end_time)

# Print the formatted URL
print(formatted_url)


https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2016+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=


In [102]:
# Define the file paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Define the URL base
#utc_urls_base = "https://transparency.entsoe.eu/generation/r2/waterReservoirsAndHydroStoragePlants/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2016+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values="


# Read the CSV file into a DataFrame
df = pd.read_csv(reference_data_file_path)

# Create a new column called URL_Base_Link
df['URL_Base_Link'] = formatted_url + df['Country_Code']

df.to_csv(reference_data_file_path, index=False)

print(f"Added new column 'URL_Base_Link' to CSV file '{reference_data_file_path}'")

Added new column 'URL_Base_Link' to CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv'


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.2. Country / Zone Data  Folders
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating all the contening folders for each European country modelled in Dispa-SET.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- This folders/directories are going to be used to storage all the time series data downloaded from ENTSO E web resource.
<br>
- Additionally all the paths of the future created files are going to be written in the reference_data_file in order to use them for next download stages.
</div>

In [103]:
# Define file paths and variables
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#reference_data_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/"
#data_year = "2023"

# Read the reference data file
df = pd.read_csv(reference_data_file_path)

# Create a new column 'Zone_Folder_Path'
df['Zone_Folder_Path'] = reference_data_folder_path + df['Dispa-SET_Code'].astype(str)

# Iterate over each row and create a CSV file
for index, row in df.iterrows():
    zone_folder_path = row['Zone_Folder_Path']
    output_csv_file_path = os.path.join(zone_folder_path, f"{data_year}_1.csv")
    
    # Create directories if they don't exist
    os.makedirs(os.path.dirname(output_csv_file_path), exist_ok=True)
    
    # Write the CSV file
    with open(output_csv_file_path, 'w') as f:
        # Write header if needed
         f.write("Header1,")
        
        # Write data (if needed)
        # f.write("Data1,Data2,...\n")
        
        # Here, you can write any data you want into the CSV file
    
    print(f"CSV file created: {output_csv_file_path}")

df.to_csv(reference_data_file_path, index=False)

CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/2023_1.csv
CSV file created: /home/ray/Dispa-SET_Unleash/RawData/HydroData/Reserv

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3. Raw Data Download
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Downloading the Reservoirs Level Raw Data for each European country modelled in Dispa-SET.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The data of all recently created links are extracted and saved in a csv file called with the name of the value of the data_year variable pluss a sufix '_1'.
</div>

In [104]:
# Set max_headers to a higher value
http.client._MAXHEADERS = 1000

# Function to download table data from a URL
def download_table_data(url, output_file_path):
    # Send a GET request to the webpage
    response = requests.get(url)

    # Raise an error if there are too many headers
    if len(response.headers) > 100:
        raise Exception("Too many headers in the response")

    # Parse the HTML content of the webpage
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table element
    table = soup.find('table')

    # Extract data from the table
    data = []
    for row in table.find_all('tr'):
        row_data = []
        for cell in row.find_all('td'):
            row_data.append(cell.text.strip())
        if row_data:  # Ensures we don't add empty rows
            data.append(row_data)

    # Write the downloaded data to a CSV file
    with open(output_file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data)

    return output_file_path

# Load reference data CSV
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
reference_data = pd.read_csv(reference_data_file_path)

# Iterate over each row in the reference data
for index, row in reference_data.iterrows():
    # Extract URL and zone folder path from the current row
    url = row['URL_Base_Link']
    zone_folder_path = row['Zone_Folder_Path']

    # Create the corresponding folder if it doesn't exist
    output_folder_path = zone_folder_path
    #os.makedirs(output_folder_path, exist_ok=True)

    # Download table data and save it to a CSV file
    output_file_name = f"{data_year}_1.csv"
    output_file_path = os.path.join(output_folder_path, output_file_name)
    downloaded_file_path = download_table_data(url, output_file_path)

    # Update the Raw_Data_File_Path column with the path of the downloaded CSV file
    reference_data.at[index, 'Raw_Data_File_Path'] = downloaded_file_path

# Save the updated reference data CSV
reference_data.to_csv(reference_data_file_path, index=False)


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3.1. Raw Data Headers
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Adding column names.
<br>
- All the raw data were downloaded without columns identification, the corresponding header to each column is added.
<br>
-Additionally an extra row is going to be added to the recently downloaded data for future interpolation processes.
</div>

In [105]:
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(reference_data_file_path)

success = True

# Iterate through each file path in the column 'Time_Series_Raw_Data_File_Path'
for index, row in df.iterrows():
    file_path = row['Raw_Data_File_Path']
    
    # Open the existing CSV file in read mode
    with open(file_path, 'r', newline='') as file:
        # Read the existing content
        reader = csv.reader(file)
        rows = list(reader)

    # Insert a new empty row at the beginning
    rows.insert(0, [])

    # Write the updated content back to the CSV file
    with open(file_path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(rows)
        
    print(f"Empty row added at the beginning of file '{file_path}'.")

print("All empty rows added successfully.")

Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv'.
Empty row added at the beginning of file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv'.
Empty row added at the beginning of file '/home/

In [106]:
start_year = int(start_time.split('.')[2].split('+')[0])  # Extract the year from start_time
end_year = int(end_time.split('.')[2].split('+')[0])  # Extract the year from start_time

year_range = list(range(start_year+1, end_year+1))

# Convert the integer values to strings
start_year_str = str(start_year)
year_range_str = [str(year) for year in year_range]

# Create the headers list
headers = ['Week', start_year_str] + year_range_str

# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(reference_data_file_path)

success = True

# Iterate through each file path in the column 'Time_Series_Raw_Data_File_Path'
for index, row in df.iterrows():
    file_path = row['Raw_Data_File_Path']
    
    # Check if the file exists
    if os.path.exists(file_path):
        # Open the file for reading
        with open(file_path, 'r') as file:
            lines = file.readlines()

        # Find the index of the first empty row
        empty_row_index = next((i for i, line in enumerate(lines) if line.strip() == ""), None)

        # If an empty row is found, copy the headers to it
        if empty_row_index is not None:
            lines[empty_row_index] = ','.join(headers) + '\n'

            # Write the updated lines back to the file
            with open(file_path, 'w') as file:
                file.writelines(lines)
        else:
            print(f"No empty row found in file '{file_path}'. Headers not copied.")
    else:
        success = False
        print(f"Error: File '{file_path}' does not exist.")

if success:
    print("Headers copied successfully to the first empty row of all files.")
else:
    print("Some errors occurred while copying headers.")

Headers copied successfully to the first empty row of all files.


In [107]:
# Read the reference data CSV file
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
df_reference = pd.read_csv(reference_data_file_path)

# Iterate through each zone and its folder path
for index, row in df_reference.iterrows():
    raw_data_file_path = row["Raw_Data_File_Path"]

    # Read the zone's CSV data (handle potential file not found)
    if os.path.exists(raw_data_file_path):
        df = pd.read_csv(raw_data_file_path)
    else:
        print(f"File not found: {raw_data_file_path}")
        continue  # Skip to the next zone if file is missing

    # Process the data (add new row, fill missing values)
    df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
    for col in df.columns[1:]:
        penultimate_value = df.iloc[-2][col]
        last_value = df.iloc[-1][col]

        if pd.isna(penultimate_value):
            next_col_index = df.columns.get_loc(col) + 1
            if next_col_index < len(df.columns):
                next_col_name = df.columns[next_col_index]
                next_col_first_value = df.iloc[0][next_col_name]
                df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value

        if pd.isna(last_value):
            next_col_index = df.columns.get_loc(col) + 1
            if next_col_index < len(df.columns):
                next_col_name = df.columns[next_col_index]
                next_col_second_value = df.iloc[1][next_col_name]
                df.iloc[-1, df.columns.get_loc(col)] = next_col_second_value

    # Save the modified DataFrame back to the CSV file
    df.to_csv(raw_data_file_path, index=False)

    print(f"Processing completed successfully for file: {raw_data_file_path}")

print("Overall processing finished.")

Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
Processing completed successfully for file: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
Processing completed successfully for file: /hom

  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_col_first_value
  df.loc[len(df)] = ['Week 54'] + [None] * (len(df.columns) - 1)
  df.iloc[-2, df.columns.get_loc(col)] = next_

In [108]:
# Read the CSV file containing the file paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
df = pd.read_csv(reference_data_file_path)

# Iterate over each file path in the 'Raw_Data_File_Path' column
for file_path in df['Raw_Data_File_Path']:
    # Check if the file exists
    if os.path.exists(file_path):
        # Define the new file name
        new_file_name = f"{start_year}_{end_year}.csv"
        new_file_path = os.path.join(os.path.dirname(file_path), new_file_name)

        # Rename the original file
        os.rename(file_path, new_file_path)

        # Read the original CSV file
        df_original = pd.read_csv(new_file_path)

        # Extract the first column and the column with the same value as data_year
        new_df = df_original[['Week', str(data_year)]]

        # Define the new file name for the extracted data
        new_file_name_extracted = f"{data_year}_1.csv"
        new_file_path_extracted = os.path.join(os.path.dirname(file_path), new_file_name_extracted)

        # Save the extracted data to a new CSV file
        new_df.to_csv(new_file_path_extracted, index=False)

        print(f"Data extracted and saved to {new_file_path_extracted}")

    else:
        print(f"File '{file_path}' does not exist.")

Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv
Data extracted and saved to /home/ray/Dispa-SET_Unleash/RawData/HydroData

In [109]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#scaled_inflows_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Add a new column called 'Scalled_Inflows_Folder_Path'
df['Scalled_Inflows_Folder_Path'] = scaled_inflows_folder_path + df['Dispa-SET_Code']

# Save the updated DataFrame back to the CSV file
df.to_csv(reference_data_file_path, index=False)

print("Scalled_Inflows_Folder_Path column added and saved to the reference data file.")

Scalled_Inflows_Folder_Path column added and saved to the reference data file.


In [110]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Add new headers
new_headers = ['1h', '30min', '15min']
df = pd.concat([df, pd.DataFrame(columns=new_headers)], axis=1)

# Save the updated DataFrame back to the CSV file
df.to_csv(reference_data_file_path, index=False)

print("New headers added to the reference data file.")

New headers added to the reference data file.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.3.2. Reservoir Level Factor
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the corresponding reservoir level factor.
<br>
- As the full storage capacity of each HPSP and HDAM unit it is not provided by the web source. It is going to work with the maximum Storage Energy Value of the corresponding year i.e. defined in the data_year variable.
</div>

In [111]:
# Define the variables
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Function to normalize column values
def normalize_column(column):
    # Fill non-numeric, "n/e", "N/A", and 0 values with 0
    column = column.replace(["n/e", "N/A", 0], pd.NA)
    column = column.fillna(0)
    
    # Check if column values are numeric
    if pd.api.types.is_numeric_dtype(column):
        # Find the maximum value
        max_value = column.max()
        # Check if the maximum value is not 0
        if max_value != 0:
            # Divide each value by the maximum value
            return column.divide(max_value)
    return column

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    raw_data_file_path = row['Raw_Data_File_Path']
    
    # Read the corresponding CSV file
    df_raw_data = pd.read_csv(raw_data_file_path)
    
    # Normalize the second column
    column_to_normalize = df_raw_data.iloc[:, 1]
    df_raw_data.iloc[:, 1] = normalize_column(column_to_normalize)
    
    # Save the modified DataFrame back to the CSV file
    df_raw_data.to_csv(raw_data_file_path, index=False)
    print(f"Values in CSV file '{raw_data_file_path}' normalized.")

print("Process completed successfully.")

Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv' normalized.
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv' normalized.
Values in CSV file '/home/ra

  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)
  column = column.fillna(0)


<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the corresponding time format.
<br>
- As the full storage capacity data for all the countries are given by a time resolution of a week i.e. 53/54 weeks per year. So it is going to add the corresponding date to each week according to the analized year i.e. value specified in the data_year variable.
</div>

In [115]:
# Define the reference data file path
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df_reference = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_reference.iterrows():
    raw_data_file_path = row['Raw_Data_File_Path']

    # Read the CSV file
    df = pd.read_csv(raw_data_file_path)

    # Extract week number from the 'Week' column
    df['Week'] = df['Week'].str.extract(r'(\d+)').astype(int)  # Extract digits from the string and convert to int

    # Calculate the dates
    start_date = pd.to_datetime(f'{data_year}-01-01')  # Start from January 1st of the specified year
    df['Dispa-SET_Date'] = start_date + pd.to_timedelta((df['Week'] - 1) * 7, unit='D')  # Add the corresponding number of weeks

    # Convert to string in the desired format
    df['Dispa-SET_Date'] = df['Dispa-SET_Date'].dt.strftime('%Y-%m-%d 00:00:00+00:00')

    # Save the modified DataFrame back to the CSV file
    df.to_csv(raw_data_file_path, index=False)

    print(f"Processed CSV file '{raw_data_file_path}'")

print("Processing completed for all CSV files.")

Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/2023_1.csv'
Processed CSV file '/home/ray/Dispa-SET_

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.4. Reservoirs Level Files
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.4.1. Reservoirs Level Clean Files Creation
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating all the clean csv files where the final data of the reservoir levels have to be gathered.
    <br>
- The next three cell are use to create the content folders (1h, 30 min and/or 15min) and the cleaned csv files according the time step provided by the web source (ENTSO E).
</div>

In [112]:
# Define the paths
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"
#data_year = "2023"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    scaled_inflows_folder_path = row['Scalled_Inflows_Folder_Path']
    zone_folder_path = row['Zone_Folder_Path']
    
    # Check if the scaled inflows folder path exists
    if os.path.exists(scaled_inflows_folder_path):
        # Iterate over each required folder name
        for folder_name in ['1h', '30min', '15min']:
            folder_path = os.path.join(scaled_inflows_folder_path, folder_name)
            
            # Check if the folder exists
            if os.path.exists(folder_path):
                # Create the destination folder path
                destination_path = os.path.join(zone_folder_path, folder_name)
                
                # Copy the folder and its contents to the destination
                shutil.copytree(folder_path, destination_path)
                
                print(f"Folder '{folder_name}' copied to '{destination_path}'.")
            else:
                print(f"Folder '{folder_name}' does not exist in '{scaled_inflows_folder_path}'.")
    else:
        print(f"Scalled inflows folder path '{scaled_inflows_folder_path}' does not exist.")

print("Copying complete.")

Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h'.
Folder '30min' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min'.
Folder '15min' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h'.
Folder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE'.
Folder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h'.
Folder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG'.
Folder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG'.
Folder '1h' copied to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h'.
Folder '30min' does not exist in '/home/

In [113]:
# Define the paths
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    zone_folder_path = row['Zone_Folder_Path']
    
    # Check if the zone folder path exists
    if os.path.exists(zone_folder_path):
        # Iterate over the required subfolders
        for subfolder in ['1h', '30min', '15min']:
            subfolder_path = os.path.join(zone_folder_path, subfolder)
            
            # Check if the subfolder exists
            if os.path.exists(subfolder_path):
                # Check if the CSV file for the data year exists in the subfolder
                csv_file_path = os.path.join(subfolder_path, f"{data_year}.csv")
                if os.path.exists(csv_file_path):
                    # Write the CSV file path to the corresponding column in the DataFrame
                    df.at[index, subfolder] = csv_file_path
                    print(f"CSV file '{data_year}.csv' found in '{subfolder_path}'.")
                else:
                    print(f"CSV file '{data_year}.csv' not found in '{subfolder_path}'.")
            else:
                print(f"Subfolder '{subfolder}' does not exist in '{zone_folder_path}'.")
    else:
        print(f"Zone folder path '{zone_folder_path}' does not exist.")

# Save the updated DataFrame to the reference data CSV file
df.to_csv(reference_data_file_path, index=False)
print("CSV file updated successfully.")

CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h'.
Subfolder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE'.
Subfolder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h'.
Subfolder '30min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG'.
Subfolder '15min' does not exist in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG'.
CSV file '2023.csv' found in '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLe

  df.at[index, subfolder] = csv_file_path
  df.at[index, subfolder] = csv_file_path
  df.at[index, subfolder] = csv_file_path


In [114]:
# Define the paths
#data_year = "2023"
#reference_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/Reference_Data.csv"

# Read the reference data CSV file
df = pd.read_csv(reference_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    for column in ['1h', '30min', '15min']:
        csv_file_path = row[column]
        
        # Check if the CSV file path exists
        if pd.notna(csv_file_path):
            # Read the CSV file
            df_csv = pd.read_csv(csv_file_path)
            
            # Erase all the values of the second and third column
            df_csv.iloc[:, 1:3] = ''
            
            # Save the modified DataFrame back to the CSV file
            df_csv.to_csv(csv_file_path, index=False)
            print(f"Values erased in CSV file '{csv_file_path}'.")

print("Process completed successfully.")

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/AT/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BE/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/BG/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CH/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CY/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/CZ/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/DK/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EE/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/EL/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/ES/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/FI/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/HU/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/HU/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IE/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IE/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/IT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LT/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/1h/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/30min/2023.csv'.
Values erased in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ReservoirLevel/LU/15min/2023.csv'.
Values erased in CSV file '/home/ray/Dis

  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''
  df_csv.iloc[:, 1:3] = ''


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.5.2. Raw Data Time Resolution
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Identifying the time spept of all downloaded files.
<br>
- Additional columns that indicates the Year, Mounth, Day, Hour and minute of the data are added to the files.
</div>

In [20]:
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(standard_time_data_file_path)

# Iterate through each file path in the 'Time_Series_Raw_Data_File_Path' column
for file_path in df['Time_Series_Raw_Data_File_Path']:
    # Read the CSV file
    file_df = pd.read_csv(file_path)

    # Add new columns 'Year', 'Month', 'Day', and 'Time_1', 'Time_2'
    file_df['Year'] = data_year
    file_df['Month'] = ''
    file_df['Day'] = ''
    file_df['Hour'] = file_df['MTU'].str[:2]  # Extract the first five characters of 'MTU' column
    file_df['Minute'] = file_df['MTU'].str[3:5]  # Extract the fourth and fifth characters of 'MTU' column

    # Set the value of the first row in the 'Month' column to '01'
    #file_df.loc[0, 'Month'] = '01'
    #file_df.loc[0, 'Day'] = '01'

    # Reorder the columns to have 'Day' before 'Time'
    file_df = file_df[['Year', 'Month', 'Day', 'Hour', 'Minute'] + [col for col in file_df.columns if col not in ['Year', 'Month', 'Day', 'Hour', 'Minute']]]

    # Write the updated DataFrame back to the CSV file
    file_df.to_csv(file_path, index=False)

    print(f"Updated file: {file_path}")

print("All files updated successfully.")


Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CZ/2023_1.csv


  file_df = pd.read_csv(file_path)


Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DK/2023_1.csv


  file_df = pd.read_csv(file_path)


Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EL/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FI/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FR/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HR/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HU/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/IE/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/IT/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LT/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LU/2023_1.csv
Updated file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LV/2023_1.csv
Upda

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Extracting the time stept data.
<br>
- The rows corresponding to a time resolution of 1 hour, 30 minutes and 15 minutes are extrated and added to new files with the suffix _1h, _30min, and _15min.
</div>

In [21]:
def process_csv_file(csv_file_path):
    print(f"Processing file: {csv_file_path}")
    
    # Load the CSV file
    df = pd.read_csv(csv_file_path)
    
    # Count the number of rows
    num_rows = len(df)
    
    # Define file name without extension
    file_name_no_ext = os.path.splitext(os.path.basename(csv_file_path))[0]
    
    # Get the directory of the current CSV file
    base_dir = os.path.dirname(csv_file_path)
    
    # Check conditions and process accordingly
    if num_rows in [35040, 34544]:
        print(f"File has 35040 or 34544 rows. Processing...")
        process_35040_34544(df, file_name_no_ext, base_dir)
    elif num_rows in [17520, 17522]:
        print(f"File has 17520 or 17522 rows. Processing...")
        process_17520_17522(df, file_name_no_ext, base_dir)
    elif num_rows in [8760, 8761]:
        print(f"File has 8760 or 8761 rows. Processing...")
        process_8760_8761(df, file_name_no_ext, base_dir)
    else:
        print(f"File has {num_rows} rows. No specific conditions matched. Processing as default...")
        process_default(df, file_name_no_ext, base_dir)

def process_35040_34544(df, file_name_no_ext, base_dir):
    # Create new file paths
    suffixes = ['_1h', '_30min', '_15min']
    new_file_paths = [os.path.join(base_dir, file_name_no_ext + suffix + '.csv') for suffix in suffixes]
    
    # Write data to new files
    for new_file_path in new_file_paths:
        df.to_csv(new_file_path, index=False)
    
    # Filter rows and write to corresponding files
    df_minute_zero = df[df['Minute'] == 0]
    df_minute_zero.to_csv(os.path.join(base_dir, file_name_no_ext + '_1h.csv'), index=False)
    
    df_minute_zero_or_thirty = df[df['Minute'].isin([0, 30])]
    df_minute_zero_or_thirty.to_csv(os.path.join(base_dir, file_name_no_ext + '_30min.csv'), index=False)
    
    print("Processed successfully.")

def process_17520_17522(df, file_name_no_ext, base_dir):
    # Create new file paths
    suffixes = ['_1h', '_30min']
    new_file_paths = [os.path.join(base_dir, file_name_no_ext + suffix + '.csv') for suffix in suffixes]
    
    # Write data to new files
    for new_file_path in new_file_paths:
        df.to_csv(new_file_path, index=False)
    
    # Filter rows and write to corresponding files
    df_minute_zero = df[df['Minute'] == 0]
    df_minute_zero.to_csv(os.path.join(base_dir, file_name_no_ext + '_1h.csv'), index=False)
    
    print("Processed successfully.")

def process_8760_8761(df, file_name_no_ext, base_dir):
    # Create new file path
    new_file_path = os.path.join(base_dir, file_name_no_ext + '_1h.csv')
    
    # Write data to new file
    df.to_csv(new_file_path, index=False)
    
    print("Processed successfully.")

def process_default(df, file_name_no_ext, base_dir):
    # Create new file path
    new_file_path = os.path.join(base_dir, file_name_no_ext + '_1h.csv')
    
    # Filter rows and write to corresponding files
    df_minute_zero = df[df['Minute'] == 0]
    df_minute_zero.to_csv(os.path.join(base_dir, file_name_no_ext + '_1h.csv'), index=False)
    
    print("Processed successfully.")

# File path
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Load the CSV file
standard_time_data_df = pd.read_csv(standard_time_data_file_path)

# Iterate over each CSV file specified in 'Time_Series_Raw_Data_File_Path'
for csv_file_path in standard_time_data_df['Time_Series_Raw_Data_File_Path']:
    process_csv_file(csv_file_path)

Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1.csv
File has 35040 or 34544 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/2023_1.csv
File has 17520 or 17522 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CZ/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1.cs

  df = pd.read_csv(csv_file_path)


Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DK/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1.csv
File has 35040 or 34544 rows. Processing...


  df = pd.read_csv(csv_file_path)


Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EL/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1.csv
File has 35040 or 34544 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FI/2023_1.csv
File has 24960 rows. No specific conditions matched. Processing as default...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FR/2023_1.csv
File has 8760 or 8761 rows. Processing...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HR/2023_1.csv
File has 11064 rows. No specific conditions matched. Processing as default...
Processed successfully.
Processing file: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HU/2023_1.csv
File has 35040 or 34544 rows. Processing...
Processed su

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Filling the correspondig time stept.
<br>
- For the next stepts of the formating process it is necesary identify the path of the files where all the data of the 30 Dispa-SET countries are going to be storaged differencing them into time stepts (1 hour, 30 min, and 15 min)
</div>

In [22]:
# Define file paths and variables
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
#data_year = 2023

# Read the standard time data CSV file
df = pd.read_csv(standard_time_data_file_path)

# Define function to search for files in directory
def search_files(directory, prefix):
    for file in os.listdir(directory):
        if file.startswith(prefix):
            return os.path.join(directory, file)
    return None

# Add new columns
df[f"{data_year}_1h_File_Path"] = ""
df[f"{data_year}_30min_File_Path"] = ""
df[f"{data_year}_15min_File_Path"] = ""

# Iterate over Zone_Folder_Path column
for index, row in df.iterrows():
    zone_folder_path = row['Zone_Folder_Path']
    if os.path.exists(zone_folder_path):
        # Search for files in the directory
        hour_file_path = search_files(zone_folder_path, f"{data_year}_1_1h.csv")
        if hour_file_path:
            df.at[index, f"{data_year}_1h_File_Path"] = hour_file_path
            
        min30_file_path = search_files(zone_folder_path, f"{data_year}_1_30min.csv")
        if min30_file_path:
            df.at[index, f"{data_year}_30min_File_Path"] = min30_file_path
            
        min15_file_path = search_files(zone_folder_path, f"{data_year}_1_15min.csv")
        if min15_file_path:
            df.at[index, f"{data_year}_15min_File_Path"] = min15_file_path

# Save the updated DataFrame back to the same CSV file
df.to_csv(standard_time_data_file_path, index=False)

print("CSV file updated successfully.")

CSV file updated successfully.


<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Fulling the corresponding time step to each file.
<br>
- Not all the countries have raw data in a resolution of 15 mimuntes, 30 minutes or 1 hour, so, all the next three cells all the files of the corresponding time stept are going to be fullfilling in their corresponding columns of Minute, Hour, Day and Month of the year specified in the variable data_year 
</div>

<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
1 Hour time stept
</div>

In [23]:
# Specify the year
year = data_year

# Load the CSV file containing paths
df_paths = pd.read_csv(standard_time_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_paths.iterrows():
    # Get the path from the corresponding column
    file_path_column_name = f"{year}_1h_File_Path"  # Dynamically construct the column name based on the year
    file_path = row[file_path_column_name]
    
    # Check if the path exists and is not NaN
    if isinstance(file_path, str) and os.path.exists(file_path):
        # Load the existing CSV file
        df = pd.read_csv(file_path)

        # Generate a date range for the entire year
        dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')

        # Extract month, day, and hour from the date range
        months = [date.month for date in dates]
        days = [date.day for date in dates]
        hours = [date.hour % 24 for date in dates]  # Cycle through hours (0-23)

        # Update the DataFrame with the generated data
        df['Month'] = months
        df['Day'] = days
        df['Hour'] = hours

        # Save the updated DataFrame back to the same CSV file, overwriting the original file
        df.to_csv(file_path, index=False)

        print(f"CSV file updated successfully: {file_path}")
    else:
        print(f"No valid path specified in row {index + 1}. Skipping...")

  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CZ/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DK/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EL/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FI/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FR/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HR/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HU/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/IE/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/IT/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LT/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LU/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LV/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/MT/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/NL/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/NO/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/PL/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/PT/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/RO/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/SE/2023_1_1h.csv


  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')
  dates = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:00:00', freq='H')


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/SI/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/SK/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/UK/2023_1_1h.csv


<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
30 Minutes time stept
</div>

In [24]:
# Specify the year
year = data_year

# Load the CSV file containing paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
df_paths = pd.read_csv(standard_time_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_paths.iterrows():
    # Dynamically construct the column name based on the year
    file_path_column_name = f"{year}_30min_File_Path"
    
    # Get the path from the corresponding column
    file_path = row[file_path_column_name]
    
    # Check if the path exists and is not NaN
    if isinstance(file_path, str) and os.path.exists(file_path):
        # Load the existing CSV file
        df = pd.read_csv(file_path)

        # Generate a date range for the entire year with a time step of 30 minutes
        dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')

        # Extract month, day, and hour from the date range with 30-minute time step
        months_30min = [date.month for date in dates_30min]
        days_30min = [date.day for date in dates_30min]
        hours_30min = [date.hour % 24 for date in dates_30min]  # Cycle through hours (0-23)

        # Update the DataFrame with the generated data for 30-minute time step
        df['Month'] = months_30min
        df['Day'] = days_30min
        df['Hour'] = hours_30min

        # Save the updated DataFrame back to the same CSV file, overwriting the original file
        df.to_csv(file_path, index=False)

        print(f"CSV file updated successfully with 30-minute time step: {file_path}")
    else:
        print(f"No valid path specified in row {index + 1}. Skipping...")


  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_30min.csv
No valid path specified in row 2. Skipping...
No valid path specified in row 3. Skipping...
No valid path specified in row 4. Skipping...
CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/2023_1_30min.csv
No valid path specified in row 6. Skipping...


  df = pd.read_csv(file_path)
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  df = pd.read_csv(file_path)


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1_30min.csv
No valid path specified in row 8. Skipping...
CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_30min.csv
No valid path specified in row 10. Skipping...


  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_30min.csv
No valid path specified in row 12. Skipping...
No valid path specified in row 13. Skipping...
No valid path specified in row 14. Skipping...
CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HU/2023_1_30min.csv


  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/IE/2023_1_30min.csv
No valid path specified in row 17. Skipping...
No valid path specified in row 18. Skipping...
CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LU/2023_1_30min.csv
No valid path specified in row 20. Skipping...
No valid path specified in row 21. Skipping...


  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/NL/2023_1_30min.csv
No valid path specified in row 23. Skipping...
No valid path specified in row 24. Skipping...
No valid path specified in row 25. Skipping...
CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/RO/2023_1_30min.csv
No valid path specified in row 27. Skipping...
No valid path specified in row 28. Skipping...
No valid path specified in row 29. Skipping...


  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')
  dates_30min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='30T')


CSV file updated successfully with 30-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/UK/2023_1_30min.csv


<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
15 Minutes time stept
</div>

In [25]:
# Specify the year
year = data_year

# Load the CSV file containing paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
df_paths = pd.read_csv(standard_time_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_paths.iterrows():
    # Dynamically construct the column name based on the year
    file_path_column_name = f"{year}_15min_File_Path"
    
    # Get the path from the corresponding column
    file_path = row[file_path_column_name]
    
    # Check if the path exists and is not NaN
    if isinstance(file_path, str) and os.path.exists(file_path):
        # Load the existing CSV file
        df = pd.read_csv(file_path)

        # Generate a date range for the entire year with a time step of 15 minutes
        dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')

        # Extract month, day, and hour from the date range with 15-minute time step
        months_15min = [date.month for date in dates_15min]
        days_15min = [date.day for date in dates_15min]
        hours_15min = [date.hour % 24 for date in dates_15min]  # Cycle through hours (0-23)

        # Update the DataFrame with the generated data for 15-minute time step
        df['Month'] = months_15min
        df['Day'] = days_15min
        df['Hour'] = hours_15min

        # Save the updated DataFrame back to the same CSV file, overwriting the original file
        df.to_csv(file_path, index=False)

        print(f"CSV file updated successfully with 15-minute time step: {file_path}")
    else:
        print(f"No valid path specified in row {index + 1}. Skipping...")

  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_15min.csv
No valid path specified in row 2. Skipping...
No valid path specified in row 3. Skipping...
No valid path specified in row 4. Skipping...
No valid path specified in row 5. Skipping...
No valid path specified in row 6. Skipping...


  df = pd.read_csv(file_path)
  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1_15min.csv
No valid path specified in row 8. Skipping...


  df = pd.read_csv(file_path)
  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_15min.csv
No valid path specified in row 10. Skipping...


  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_15min.csv
No valid path specified in row 12. Skipping...
No valid path specified in row 13. Skipping...
No valid path specified in row 14. Skipping...


  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/HU/2023_1_15min.csv
No valid path specified in row 16. Skipping...
No valid path specified in row 17. Skipping...
No valid path specified in row 18. Skipping...


  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/LU/2023_1_15min.csv
No valid path specified in row 20. Skipping...
No valid path specified in row 21. Skipping...


  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/NL/2023_1_15min.csv
No valid path specified in row 23. Skipping...
No valid path specified in row 24. Skipping...
No valid path specified in row 25. Skipping...


  dates_15min = pd.date_range(start=f'{year}-01-01 00:00:00', end=f'{year}-12-31 23:59:59', freq='15T')


CSV file updated successfully with 15-minute time step: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/RO/2023_1_15min.csv
No valid path specified in row 27. Skipping...
No valid path specified in row 28. Skipping...
No valid path specified in row 29. Skipping...
No valid path specified in row 30. Skipping...


<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Adding the Dispa-SET Time Stept format.
<br>
- The format that Dispa-SET reads the time stept is the following, 0000-00-00 00:00:00+00:00 where the first part represents the date and the second part represents the time, so the next script add to each row of the correspongind file (1 hour file, 30 minutes file and 15 minutes file for each country) the time stept acording this format.
</div>

In [26]:
# Define a function to create Dispa_SET_Time_Step column
def create_Dispa_SET_Time_Step(row):
    # Ensure two-digit format for Day, Month, Hour, and Minute
    date_part = f"{row['Year']:04d}-{row['Month']:02d}-{row['Day']:02d}"
    time_part = f"{row['Hour']:02d}:{row['Minute']:02d}:00+00:00"
    return date_part + " " + time_part

# Specify the year
#data_year = 2023

# Load the CSV file containing file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
df_paths = pd.read_csv(standard_time_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_paths.iterrows():
    # Get the file paths from the corresponding columns
    hour_file_path = row[f"{data_year}_1h_File_Path"]
    min_30_file_path = row[f"{data_year}_30min_File_Path"]
    min_15_file_path = row[f"{data_year}_15min_File_Path"]
    
    # Process each file path if it exists
    for file_path in [hour_file_path, min_30_file_path, min_15_file_path]:
        # Check if the path exists and is not NaN
        if isinstance(file_path, str) and os.path.exists(file_path):
            # Read the CSV file into a DataFrame
            df = pd.read_csv(file_path)
            
            # Apply the function to create the Dispa_SET_Time_Step column
            df['Dispa_SET_Time_Step'] = df.apply(create_Dispa_SET_Time_Step, axis=1)
            
            # Save the modified DataFrame back to the CSV file
            df.to_csv(file_path, index=False)
            
            print(f"CSV file updated successfully: {file_path}")
        else:
            print(f"No valid path specified in row {index + 1}. Skipping...")

CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_30min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_1_15min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/2023_1_1h.csv
No valid path specified in row 2. Skipping...
No valid path specified in row 2. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/2023_1_1h.csv
No valid path specified in row 3. Skipping...
No valid path specified in row 3. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/2023_1_1h.csv
No valid path specified in row 4. Skipping...
No valid path specified in row 4. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/2023_1_1h.c

  df = pd.read_csv(file_path)


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1_30min.csv


  df = pd.read_csv(file_path)


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/2023_1_15min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DK/2023_1_1h.csv
No valid path specified in row 8. Skipping...
No valid path specified in row 8. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_1h.csv


  df = pd.read_csv(file_path)


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_30min.csv


  df = pd.read_csv(file_path)


CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EE/2023_1_15min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/EL/2023_1_1h.csv
No valid path specified in row 10. Skipping...
No valid path specified in row 10. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_1h.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_30min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/ES/2023_1_15min.csv
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FI/2023_1_1h.csv
No valid path specified in row 12. Skipping...
No valid path specified in row 12. Skipping...
CSV file updated successfully: /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/FR/2023_1_1h.csv
No valid path specified in row 13. Skipping...
No valid path specified in row 13. 

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Erasing the unnecesary columns and ordering.
</div>

In [27]:
# Define the function to process each CSV file
def process_csv_file(row, time_step_column):
    csv_file = row[time_step_column]
    if pd.notnull(csv_file) and os.path.exists(csv_file):
        # Read the CSV file into a DataFrame
        df_csv = pd.read_csv(csv_file)
        
        # Drop specified columns: Year, Month, Day, Hour, Minute, MTU
        columns_to_drop = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'MTU']
        df_csv = df_csv.drop(columns=columns_to_drop, errors='ignore')
        
        # Move the column 'Dispa_SET_Time_Step' to the first position
        if 'Dispa_SET_Time_Step' in df_csv.columns:
            columns = list(df_csv.columns)
            columns.remove('Dispa_SET_Time_Step')
            df_csv = df_csv[['Dispa_SET_Time_Step'] + columns]
        
        # Save the changes back to the CSV file
        df_csv.to_csv(csv_file, index=False)
        
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(standard_time_data_file_path)

# Convert data_year to a string
data_year = str(data_year)

# Define the column names for the file paths
hour_file_column = data_year + '_1h_File_Path'
thirty_min_file_column = data_year + '_30min_File_Path'
fifteen_min_file_column = data_year + '_15min_File_Path'

# Process the one-hour file
df.apply(lambda row: process_csv_file(row, hour_file_column), axis=1)

# Process the thirty-minute file
df.apply(lambda row: process_csv_file(row, thirty_min_file_column), axis=1)

# Process the fifteen-minute file
df.apply(lambda row: process_csv_file(row, fifteen_min_file_column), axis=1)

  df_csv = pd.read_csv(csv_file)
  df_csv = pd.read_csv(csv_file)
  df_csv = pd.read_csv(csv_file)
  df_csv = pd.read_csv(csv_file)


0     None
1     None
2     None
3     None
4     None
5     None
6     None
7     None
8     None
9     None
10    None
11    None
12    None
13    None
14    None
15    None
16    None
17    None
18    None
19    None
20    None
21    None
22    None
23    None
24    None
25    None
26    None
27    None
28    None
29    None
dtype: object

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.5.3. Raw Data Subdirectories Creation
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the sub directories where the raw data is going to be splited by technology type.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Once the raw data file has the appropiaded time stept format, the same has to be divided to has the corresponding time serie of each technology type isolated.
<br>
- A sub folder named as the corresponding technology type are goint to be created to each country.
</div>

In [28]:
# Define the headers to be added
new_headers = [
    "Biomass_Actual_Aggregated", "Biomass_Actual_Consumption",
    "Fossil_Brown_coal-Lignite_Actual_Aggregated", "Fossil_Brown_coal-Lignite_Actual_Consumption",
    "Fossil_Coal-derived_gas_Actual_Aggregated", "Fossil_Coal-derived_gas_Actual_Consumption",
    "Fossil_Gas_Actual_Aggregated", "Fossil_Gas_Actual_Consumption",
    "Fossil_Hard_coal_Actual_Aggregated", "Fossil_Hard_coal_Actual_Consumption",
    "Fossil_Oil_Actual_Aggregated", "Fossil_Oil_Actual_Consumption",
    "Fossil_Oil_shale_Actual_Aggregated", "Fossil_Oil_shale_Actual_Consumption",
    "Fossil_Peat_Actual_Aggregated", "Fossil_Peat_Actual_Consumption",
    "Geothermal_Actual_Aggregated", "Geothermal_Actual_Consumption",
    "Hydro_Pumped_Storage_Actual_Aggregated", "Hydro_Pumped_Storage_Actual_Consumption",
    "Hydro_Run-of-river_and_poundage_Actual_Aggregated", "Hydro_Run-of-river_and_poundage_Actual_Consumption",
    "Hydro_Water_Reservoir_Actual_Aggregated", "Hydro_Water_Reservoir_Actual_Consumption",
    "Marine_Actual_Aggregated", "Marine_Actual_Consumption",
    "Nuclear_Actual_Aggregated", "Nuclear_Actual_Consumption",
    "Other_Actual_Aggregated", "Other_Actual_Consumption",
    "Other_renewable_Actual_Aggregated", "Other_renewable_Actual_Consumption",
    "Solar_Actual_Aggregated", "Solar_Actual_Consumption",
    "Waste_Actual_Aggregated", "Waste_Actual_Consumption",
    "Wind_Offshore_Actual_Aggregated", "Wind_Offshore_Actual_Consumption",
    "Wind_Onshore_Actual_Aggregated", "Wind_Onshore_Actual_Consumption"
]

# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file containing the paths
df = pd.read_csv(standard_time_data_file_path)

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Extract the zone folder path from the row
    zone_folder_path = row['Zone_Folder_Path']
    
    # Create a new folder for each zone folder path
    for header in new_headers:
        new_folder_path = os.path.join(zone_folder_path, header)
        if not os.path.exists(new_folder_path):
            os.makedirs(new_folder_path)
        else:
            print(f"Folder '{new_folder_path}' already exists.")
            
        # Add the new column with the folder path to the DataFrame
        df.at[index, header] = new_folder_path

# Save the modified DataFrame back to the CSV file
df.to_csv(standard_time_data_file_path, index=False)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.5.4. Raw Data Subdirectories Divition
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Splitting each raw data file of each country into a single file by technology type.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Each new csv file is goning to be saved in the corresponding folder created in the previous cell.
</div>

In [29]:
# Path to the standard time data CSV file
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Load the standard time data CSV file
standard_time_df = pd.read_csv(standard_time_data_file_path)

# Get the data year from the variable
#data_year = "2023"  # Example value

# Iterate over each row in the standard time data
for index, row in standard_time_df.iterrows():
    # Extract the file paths from the specified columns
    hour_file_path = row[data_year + "_1h_File_Path"]
    thirty_min_file_path = row[data_year + "_30min_File_Path"]
    fifteen_min_file_path = row[data_year + "_15min_File_Path"]
    
    # Iterate over the file paths and apply the processing code
    for file_path in [hour_file_path, thirty_min_file_path, fifteen_min_file_path]:
        # Skip if the file path is empty
        if pd.isna(file_path):
            continue
        
        # Load the CSV file
        csv_df = pd.read_csv(file_path)
        
        # Get the headers
        headers = csv_df.columns
        
        # Create a folder for each second column
        for i in range(1, len(headers)):
            second_column_name = headers[i]
            folder_name = second_column_name.strip().replace(' ', '_')
            folder_path = os.path.join(os.path.dirname(file_path), folder_name)
            os.makedirs(folder_path, exist_ok=True)
        
        # Iterate over each pair of columns and save them into corresponding folders
        for i in range(len(headers) - 1):
            first_column = headers[0]
            second_column = headers[i + 1]
            new_df = csv_df[[first_column, second_column]]
            
            # Get the folder name for the second column
            folder_name = second_column.strip().replace(' ', '_')
            folder_path = os.path.join(os.path.dirname(file_path), folder_name)
            
            # Get the base file name without extension
            base_file_name = os.path.splitext(os.path.basename(file_path))[0]
            
            # Create the new file path
            new_file_path = os.path.join(folder_path, f"{base_file_name}.csv")
            
            # Save the new DataFrame to a CSV file inside the folder
            new_df.to_csv(new_file_path, index=False)


  csv_df = pd.read_csv(file_path)
  csv_df = pd.read_csv(file_path)
  csv_df = pd.read_csv(file_path)
  csv_df = pd.read_csv(file_path)


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.6. Total Installed Capacity per Production Type
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the files contents of the total installed capacity per production type for each one of the coutries modelled in Dispa-SET.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The source of the data belongs the ENTSO E portal web. The information is extracted from the tables of the corresponging ENTSO E country web page links.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
5.6.1. Total Installed Capacity per Production Type Sources
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting and saving the links where the data of the total installed capacity per production type for each country is available.
</div>

In [30]:
# Read the CSV file
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
df = pd.read_csv(standard_time_data_file_path)

# Define the links
links = [
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YAT-APG------L!CTY|10YAT-APG------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YBE----------2!CTY|10YBE----------2&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YCA-BULGARIA-R!CTY|10YCA-BULGARIA-R&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YCH-SWISSGRIDZ!CTY|10YCH-SWISSGRIDZ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YCY-1001A0003J!CTY|10YCY-1001A0003J&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YCZ-CEPS-----N!CTY|10YCZ-CEPS-----N&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10Y1001A1001A83F!CTY|10Y1001A1001A83F&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10Y1001A1001A65H!CTY|10Y1001A1001A65H&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10Y1001A1001A39I!CTY|10Y1001A1001A39I&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YGR-HTSO-----Y!CTY|10YGR-HTSO-----Y&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YES-REE------0!CTY|10YES-REE------0&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YFI-1--------U!CTY|10YFI-1--------U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YFR-RTE------C!CTY|10YFR-RTE------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YHR-HEP------M!CTY|10YHR-HEP------M&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YHU-MAVIR----U!CTY|10YHU-MAVIR----U&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YIE-1001A00010!CTY|10YIE-1001A00010&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YIT-GRTN-----B!CTY|10YIT-GRTN-----B&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YLT-1001A0008Q!CTY|10YLT-1001A0008Q&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YLU-CEGEDEL-NQ!CTY|10YLU-CEGEDEL-NQ&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YLV-1001A00074!CTY|10YLV-1001A00074&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10Y1001A1001A93C!CTY|10Y1001A1001A93C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YNL----------L!CTY|10YNL----------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YNO-0--------C!CTY|10YNO-0--------C&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YPL-AREA-----S!CTY|10YPL-AREA-----S&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YPT-REN------W!CTY|10YPT-REN------W&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YRO-TEL------P!CTY|10YRO-TEL------P&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YSE-1--------K!CTY|10YSE-1--------K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YSI-ELES-----O!CTY|10YSI-ELES-----O&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YSK-SEPS-----K!CTY|10YSK-SEPS-----K&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19",
    "https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10Y1001A1001A92E!CTY|10Y1001A1001A92E&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19"
]

# Add the links to the DataFrame
df['Production_Type_Total_Installed_Capacity'] = links

# Write the DataFrame back to the CSV file
df.to_csv(standard_time_data_file_path, index=False)

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Downloading the total installed capacity for each country in a new csv file.
</div>

In [31]:
# Set max_headers to a higher value
http.client._MAXHEADERS = 1000

# Function to download table data from a URL
def download_table_data(url):
    # Send a GET request to the webpage
    response = requests.get(url)

    # Raise an error if there are too many headers
    if len(response.headers) > 100:
        print("Too many headers in the response")
        return None

    # Parse the HTML content of the webpage
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table element
    table = soup.find('table')

    # Extract data from the table
    data = []
    for row in table.find_all('tr'):
        row_data = []
        for cell in row.find_all('td'):
            row_data.append(cell.text.strip())
        if row_data:  # Ensures we don't add empty rows
            data.append(row_data)

    return data

# Open the standard_time_data_file_path and add a new column called Production_Type_Total_Installed_Capacity_File_Path
def process_standard_time_data(standard_time_data_file_path, data_year):
    # Read the standard time data CSV file
    with open(standard_time_data_file_path, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        fieldnames = reader.fieldnames + ['Production_Type_Total_Installed_Capacity_File_Path']
        rows = list(reader)

    # Process each row
    for row in rows:
        url = row['Production_Type_Total_Installed_Capacity']
        if url:
            print(f"Processing URL: {url}")
            table_data = download_table_data(url)
            if table_data:
                # Specify the CSV file name
                csv_file_name = f"{data_year}_Total_Installed_Capacity_per_Production_Type.csv"
                # Construct the full file path
                csv_file_path = os.path.join(row['Zone_Folder_Path'], csv_file_name)
                # Write the table data to CSV file
                with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile:
                    writer = csv.writer(csvfile)
                    writer.writerows(table_data)
                print(f"Table data has been saved to {csv_file_path}.")
                # Update the CSV file with the file path
                row['Production_Type_Total_Installed_Capacity_File_Path'] = csv_file_path
            else:
                print("Failed to download table data.")

    # Rewrite the CSV file with the updated file paths for each URL
    with open(standard_time_data_file_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Specify the path to the CSV file containing the standard time data
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"
# Specify the data year
#data_year = "2024"

# Process the standard time data
process_standard_time_data(standard_time_data_file_path, data_year)

Processing URL: https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show?name=&defaultValue=false&viewType=TABLE&areaType=CTY&atch=false&dateTime.dateTime=01.01.2015+00:00|UTC|YEAR&dateTime.endDateTime=01.01.2024+00:00|UTC|YEAR&area.values=CTY|10YAT-APG------L!CTY|10YAT-APG------L&productionType.values=B01&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19
Table data has been saved to /home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/2023_Total_Installed_Capacity_per_Production_Type.csv.
Processing URL: https://

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Giving the corresponding headers to the columns of each downloaded file.
</div>

In [32]:
# Function to add headers to CSV files
def add_headers_to_csv(csv_file_path):
    # Specify the headers
    headers = ['Production_Type', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']
    
    # Read the existing data from the CSV file
    with open(csv_file_path, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        rows = list(reader)

    # Add headers to the existing data
    rows.insert(0, headers)

    # Write the updated data back to the CSV file
    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(rows)

# Function to process the standard time data
def process_standard_time_data(standard_time_data_file_path):
    # Open the standard time data file
    with open(standard_time_data_file_path, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        
        # Process each row in the standard time data
        for row in reader:
            # Get the path of the CSV file
            csv_file_path = row['Production_Type_Total_Installed_Capacity_File_Path']
            
            # Add headers to the CSV file
            add_headers_to_csv(csv_file_path)

# Specify the path to the CSV file containing the standard time data
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Process the standard time data
process_standard_time_data(standard_time_data_file_path)

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Harmonizing the generation type column fields 
</div>

In [33]:
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file
df_paths = pd.read_csv(standard_time_data_file_path)

# Iterate over each row to process the files
for index, row in df_paths.iterrows():
    folder_path = row['Zone_Folder_Path']
    data_year = row['Data_Year']

    # Construct the file path
    file_name = f"{data_year}_Total_Installed_Capacity_per_Production_Type.csv"
    file_path = os.path.join(folder_path, file_name)

    # Read the CSV file
    df = pd.read_csv(file_path)

    # Replace ' ' with '_' and '/' with '-'
    df['Production_Type'] = df['Production_Type'].str.replace(' ', '_').str.replace('/', '-')

    # Save the modified DataFrame back to the original CSV file
    df.to_csv(file_path, index=False)

    # Print a confirmation message for each file
    print(f"Production types in {file_name} have been processed and saved back to the original file.")

Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been processed and saved back to the original file.
Production types in 2023_Total_Installed_Capacity_per_Production_Type.csv have been

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the paths to each total installed capacity file for each country.
</div>

In [34]:
# Define variables
#data_year = 2024
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the standard time data CSV file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Function to get the file path for each row
def get_csv_file_path(row):
    zone_folder_path = row['Zone_Folder_Path']
    csv_file_name = f"{data_year}_Total_Installed_Capacity_per_Production_Type.csv"
    csv_file_path = os.path.join(zone_folder_path, csv_file_name)
    return csv_file_path

# Add a new column 'Production_Type_Total_Installed_Capacity_File_Path' with file paths
standard_time_data['Production_Type_Total_Installed_Capacity_File_Path'] = standard_time_data.apply(get_csv_file_path, axis=1)

# Save the modified DataFrame back to the CSV file
standard_time_data.to_csv(standard_time_data_file_path, index=False)

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Fullfilling missing years to each total installed capacity file for each country.
</div>

In [35]:
# Define the path to the CSV file containing the file paths
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Iterate over each file path in the 'Production_Type_Total_Installed_Capacity_file_path' column
for index, row in standard_time_data.iterrows():
    # Get the file path
    csv_file_path = row['Production_Type_Total_Installed_Capacity_File_Path']
    
    # Read the CSV file
    df = pd.read_csv(csv_file_path)

    # Iterate over columns from '2015' to '2024'
    for column in df.columns[1:]:
        # Iterate over rows in the column
        for index, value in df[column].items():
            # Check if the value is non-numeric or empty
            if str(value).strip() in ('n/e', 'N/A', '', '0'):
                # Get the corresponding value from the previous column
                prev_value = df.at[index, df.columns[df.columns.get_loc(column) - 1]]
                # Check if the previous value is numeric
                if str(prev_value).strip().replace('.', '').isdigit():
                    # Copy the previous value to the current field
                    df.at[index, column] = prev_value

    # Save the modified DataFrame back to the CSV file
    df.to_csv(csv_file_path, index=False)

# Print a completion message
print("Missing or non-numeric values in the CSV files have been filled based on the corresponding values from the previous columns where applicable.")


Missing or non-numeric values in the CSV files have been filled based on the corresponding values from the previous columns where applicable.


  df.at[index, column] = prev_value
  df.at[index, column] = prev_value


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
5. Availability Factor File
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the time series value per technology type.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the raw data files  of generation and already separated by technology type is divided by the total installed capacity of the corresponding technology.
</div>

In [36]:
# Define variables
# Convert data_year to a string
data_year = str(data_year)
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'

# Read the CSV file containing file paths
df_production_type = pd.read_csv(standard_time_data_file_path)

# Iterate over each row in the DataFrame
for index, row in df_production_type.iterrows():
    # Extract the file path for the current production type
    file_path = row['Production_Type_Total_Installed_Capacity_File_Path']
    
    # Read the CSV file
    df = pd.read_csv(file_path)

    # Convert 'data_year' column to numeric type
    df[data_year] = pd.to_numeric(df[data_year], errors='coerce').fillna(0)

    # Iterate over each row
    for index, row in df.iterrows():
        value = row[data_year]
        production_type = row['Production_Type']

        # Find the corresponding folder
        folder_name = production_type.replace(' ', '_')
        aggregated_folder = os.path.join(os.path.dirname(file_path), f"{folder_name}_Actual_Aggregated")
        consumption_folder = os.path.join(os.path.dirname(file_path), f"{folder_name}_Actual_Consumption")

        # Process files in aggregated folder
        if os.path.exists(aggregated_folder):
            for filename in os.listdir(aggregated_folder):
                if filename.endswith('.csv'):
                    file_path_aggregated = os.path.join(aggregated_folder, filename)
                    df_aggregated = pd.read_csv(file_path_aggregated)

                    # Convert second column to numeric type
                    df_aggregated.iloc[:, 1] = pd.to_numeric(df_aggregated.iloc[:, 1], errors='coerce').fillna(0)

                    # Check if value is not 'n/e', 'N/A', 0, or empty
                    if value not in ['n/e', 'N/A', 0, '']:
                        # Perform division operation
                        if value != 0:
                            df_aggregated.iloc[:, 1] /= value
                        else:
                            # Handle division by zero
                            pass
                    else:
                        # Fill second column with 0
                        df_aggregated.iloc[:, 1] = 0

                    new_file_name = f"Availabilty_Factor_{filename}"
                    df_aggregated.to_csv(os.path.join(aggregated_folder, new_file_name), index=False)

        # Process files in consumption folder
        if os.path.exists(consumption_folder):
            for filename in os.listdir(consumption_folder):
                if filename.endswith('.csv'):
                    file_path_consumption = os.path.join(consumption_folder, filename)
                    df_consumption = pd.read_csv(file_path_consumption)

                    # Convert second column to numeric type
                    df_consumption.iloc[:, 1] = pd.to_numeric(df_consumption.iloc[:, 1], errors='coerce').fillna(0)

                    # Check if value is not 'n/e', 'N/A', 0, or empty
                    if value not in ['n/e', 'N/A', 0, '']:
                        # Perform division operation
                        if value != 0:
                            df_consumption.iloc[:, 1] /= value
                        else:
                            # Handle division by zero
                            pass
                    else:
                        # Fill second column with 0
                        df_consumption.iloc[:, 1] = 0

                    new_file_name = f"Availabilty_Factor_{filename}"
                    df_consumption.to_csv(os.path.join(consumption_folder, new_file_name), index=False)

1        0.0
2        0.0
3        0.0
4        0.0
        ... 
35035    0.0
35036    0.0
35037    0.0
35038    0.0
35039    0.0
Name: Fossil_Hard_coal_Actual_Aggregated, Length: 35040, dtype: float64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df_aggregated.iloc[:, 1] /= value
1       0.0
2       0.0
3       0.0
4       0.0
       ... 
8755    0.0
8756    0.0
8757    0.0
8758    0.0
8759    0.0
Name: Fossil_Hard_coal_Actual_Aggregated, Length: 8760, dtype: float64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df_aggregated.iloc[:, 1] /= value
1        0.0
2        0.0
3        0.0
4        0.0
        ... 
17515    0.0
17516    0.0
17517    0.0
17518    0.0
17519    0.0
Name: Fossil_Hard_coal_Actual_Aggregated, Length: 17520, dtype: float64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df_aggregated.iloc[:, 1] /= value
1        0.0
2        0.0
3      

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the subfolders and the csv files that will content the final data.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- Into each country folder, it will be created a subfolder named under the corresponding time step of the data. e.g. '1h', and '15min' or '30min' being possible that some countries has the 3 files or just 1 or 2. This is due to the availability of the data in the sources.
</div>

In [44]:
# Define variables
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'
#data_year = '2023'
path_columns = ['Zone_Folder_Path', 'Biomass_Actual_Aggregated', 'Biomass_Actual_Consumption', 'Fossil_Brown_coal-Lignite_Actual_Aggregated',
                'Fossil_Brown_coal-Lignite_Actual_Consumption', 'Fossil_Coal-derived_gas_Actual_Aggregated', 'Fossil_Coal-derived_gas_Actual_Consumption',
                'Fossil_Gas_Actual_Aggregated',	'Fossil_Gas_Actual_Consumption', 'Fossil_Hard_coal_Actual_Aggregated', 'Fossil_Hard_coal_Actual_Consumption',
                'Fossil_Oil_Actual_Aggregated',	'Fossil_Oil_Actual_Consumption', 'Fossil_Oil_shale_Actual_Aggregated', 'Fossil_Oil_shale_Actual_Consumption',
                'Fossil_Peat_Actual_Aggregated', 'Fossil_Peat_Actual_Consumption', 'Geothermal_Actual_Aggregated', 'Geothermal_Actual_Consumption', 
                'Hydro_Pumped_Storage_Actual_Aggregated', 'Hydro_Pumped_Storage_Actual_Consumption', 'Hydro_Run-of-river_and_poundage_Actual_Aggregated',
                'Hydro_Run-of-river_and_poundage_Actual_Consumption', 'Hydro_Water_Reservoir_Actual_Aggregated', 'Hydro_Water_Reservoir_Actual_Consumption',
                'Marine_Actual_Aggregated',	'Marine_Actual_Consumption', 'Nuclear_Actual_Aggregated', 'Nuclear_Actual_Consumption', 'Other_Actual_Aggregated',
                'Other_Actual_Consumption',	'Other_renewable_Actual_Aggregated', 'Other_renewable_Actual_Consumption', 'Solar_Actual_Aggregated',
                'Solar_Actual_Consumption',	'Waste_Actual_Aggregated', 'Waste_Actual_Consumption', 'Wind_Offshore_Actual_Aggregated', 'Wind_Offshore_Actual_Consumption',
                'Wind_Onshore_Actual_Aggregated', 'Wind_Onshore_Actual_Consumption'
    
]  # Replace with your column names

# Read the standard time data CSV file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Iterate over each path column
for column in path_columns:
    # Iterate over each row in the DataFrame
    for index, row in standard_time_data.iterrows():
        # Extract the folder path
        folder_path = row[column]
        
        # Check if the folder exists
        if os.path.exists(folder_path):
            # Count the number of existing CSV files in the folder
            existing_files = []
            for time_interval in ['1h', '15min', '30min']:
                file_name = f"{data_year}_1_{time_interval}.csv"
                file_path = os.path.join(folder_path, file_name)
                if os.path.exists(file_path):
                    existing_files.append(time_interval)
            
            # Create subfolders based on the number of existing files
            for time_interval in existing_files:
                subfolder_path = os.path.join(folder_path, time_interval)
                os.makedirs(subfolder_path, exist_ok=True)
                
                # Create a CSV file inside the subfolder with the name of data_year
                csv_file_path = os.path.join(subfolder_path, f"{data_year}.csv")
                with open(csv_file_path, 'w') as f:
                    f.write("This is a sample CSV file.")
                    # Print a message indicating the creation of the file
                print(f"File '{data_year}.csv' created in '{subfolder_path}'")

File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/15min'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/30min'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/30min'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CZ/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/1h'
File '2023.csv' created in '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/DE/15min

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Giving the correspondind headers to the csv files that will content the final data.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The Availability Factor Time Series have a determined header according the renewable techology type (HROR, PHOT, WTON, WOTF) , those are used to be read by Dispa-SET.
</div>

In [75]:
def replace_headers(data_year, standard_time_data_file_path):
    """
    Replace headers in CSV files based on specified conditions.

    Args:
        data_year (str): The year string.
        standard_time_data_file_path (str): Path to the standard time data CSV file.
    """
    # Read the standard time data CSV file
    standard_time_data = pd.read_csv(standard_time_data_file_path)

    # Define header mappings
    header_mapping = {
        'Hydro_Run-of-river_and_poundage_Actual_Aggregated': ['', 'HROR'],
        'Solar_Actual_Aggregated': ['', 'PHOT'],
        'Wind_Offshore_Actual_Aggregated': ['', 'WTOF'],
        'Wind_Onshore_Actual_Aggregated': ['', 'WTON']
    }

    # Iterate over each column in the dataframe
    for column in standard_time_data.columns:
        if column in header_mapping:
            # Iterate over each folder path in the column
            for path in standard_time_data[column]:
                # Iterate over each subfolder
                for subfolder in ['1h', '15min', '30min']:
                    # Construct the path to the CSV file
                    csv_file_path = os.path.join(path, subfolder, f'{data_year}.csv')
                    
                    # Check if the CSV file exists
                    if os.path.exists(csv_file_path):
                        # Read the CSV file
                        df = pd.read_csv(csv_file_path)
                        
                        # Replace headers
                        headers = header_mapping[column]
                        new_headers = headers[:2] + list(df.columns[2:])
                        df.columns = new_headers
                        
                        # Write back to the CSV file
                        df.to_csv(csv_file_path, index=False)
                        
                        print(f"Headers replaced for '{csv_file_path}'")

# Example usage
#data_year = '2023'
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'
replace_headers(data_year, standard_time_data_file_path)


Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Run-of-river_and_poundage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Run-of-river_and_poundage_Actual_Aggregated/15min/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Run-of-river_and_poundage_Actual_Aggregated/30min/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/Hydro_Run-of-river_and_poundage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/Hydro_Run-of-river_and_poundage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/Hydro_Run-of-river_and_poundage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/Hydro_Run-of-river_and_poundage_Actual_Aggregated/1h/202

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the final Availability Factor Time Series file.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the data is copied to the single csv file in a sub-folder with the time stept inside the country folder (AT, BE, CH... etc) .
</div>

In [95]:
#data_year = "2023"
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file to get the source file paths
df_standard_time = pd.read_csv(standard_time_data_file_path)

# Iterate over each row of the specified columns
for index, row in df_standard_time.iterrows():
    # Extract the source file paths from the current row
    file_path_1 = row['Zone_Folder_Path']
    file_path_2 = row['Wind_Onshore_Actual_Aggregated']
    file_path_3 = row['Wind_Offshore_Actual_Aggregated']
    file_path_4 = row['Solar_Actual_Aggregated']
    file_path_5 = row['Hydro_Run-of-river_and_poundage_Actual_Aggregated']
    
    # Define the subfolders to iterate over
    subfolders = ['1h', '15min', '30min']
    
    for subfolder in subfolders:
        # Construct the file paths for each subfolder
        csv_file_path_1 = os.path.join(file_path_1, subfolder, f"{data_year}.csv")
        csv_file_path_2 = os.path.join(file_path_2, subfolder, f"{data_year}.csv")
        csv_file_path_3 = os.path.join(file_path_3, subfolder, f"{data_year}.csv")
        csv_file_path_4 = os.path.join(file_path_4, subfolder, f"{data_year}.csv")
        csv_file_path_5 = os.path.join(file_path_5, subfolder, f"{data_year}.csv")
        
        # Check if CSV files exist in file_path_2, file_path_3, file_path_4, and file_path_5
        if os.path.exists(csv_file_path_2) and os.path.exists(csv_file_path_3) \
            and os.path.exists(csv_file_path_4) and os.path.exists(csv_file_path_5):
            
            # Read the contents of the CSV files
            df_2 = pd.read_csv(csv_file_path_2)
            df_3 = pd.read_csv(csv_file_path_3)
            df_4 = pd.read_csv(csv_file_path_4)
            df_5 = pd.read_csv(csv_file_path_5)
            
            # Combine the contents of the CSV files
            combined_df = pd.concat([df_2.iloc[:, :2], df_3.iloc[:, 1], df_4.iloc[:, 1], df_5.iloc[:, 1]], axis=1)
            
            # Delete the destination CSV file if it exists
            if os.path.exists(csv_file_path_1):
                os.remove(csv_file_path_1)
            
            # Write the combined DataFrame to the CSV file in file_path_1
            combined_df.to_csv(csv_file_path_1, index=False)
            
            print(f"Contents from CSV files written to '{csv_file_path_1}'")
        else:
            print(f"One or more CSV files do not exist at the specified paths")

Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/1h/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/15min/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/30min/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exist at the specified paths
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exist at the specified paths
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exist at the specified paths
Cont

In [99]:
# Define the path to the CSV file
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'

# Read the CSV file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Assuming data_year is already defined
#data_year = "2023"

# Columns to check
columns_to_check = ['WTON', 'WTOF', 'PHOT', 'HROR']

# List of subfolders to check
subfolders = ['1h', '15min', '30min']

# Iterate over each row in the DataFrame
for index, row in standard_time_data.iterrows():
    # Get the folder path from the 'Zone_Folder_Path' column
    folder_path = row['Zone_Folder_Path']
    
    # Iterate over subfolders
    for subfolder in subfolders:
        csv_file_path = os.path.join(folder_path, subfolder, f"{data_year}.csv")
        
        # Check if CSV file exists
        if os.path.exists(csv_file_path):
            # Read the CSV file
            df = pd.read_csv(csv_file_path)
            
            # Iterate over specified columns
            for column in columns_to_check:
                # Convert column values to numeric
                df[column] = pd.to_numeric(df[column], errors='coerce')
                
                # Check if the value is greater than 1
                df.loc[df[column] > 1, column] = 1
            
            # Write the modified DataFrame back to the CSV file
            df.to_csv(csv_file_path, index=False)
            
            print(f"Values in CSV file '{csv_file_path}' modified successfully")
        else:
            print(f"CSV file '{csv_file_path}' not found")

Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/1h/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/15min/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/30min/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/1h/2023.csv' modified successfully
CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/15min/2023.csv' not found
CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/30min/2023.csv' not found
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/1h/2023.csv' modified successfully
CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/15min/2023.csv' not found
CSV file '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/30min/2023.csv' not found
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawDa

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
The final Availability Factor Time Series is for each Country modeled in Dispa-SET is located in the following local directory:
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- /Local/Path/to/Dispas-SET/RawData/AvailabiltyFactors/
<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Inside this path there are folders with the acronym of each country modelled in Dipsa-SET. i.e. AT, BE, CH.... UK
<br>
Inside each of this folders, there are sub folders named by the time stept of the time series. i.e. 1h, 30min and/or 15min.
<br>
Inside these sub folders, it is going to be found the corresponding time series .csv file named with the year of the data. e.g. 2023.csv
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
6. Scalled Inflows File
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Creating the content folders for the scalled inflow data for each country.
</div>

In [195]:
# Define paths and variables
#additional_path = "/RawData/AvailabiltyFactors/"
#additional_path_1 = "/HydroData/ScaledInflows/"
#scaled_inflows_folder_path = dispaSET_unleash_folder_path + additional_path_1
#availability_factors_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/"
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the standard time data file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Create a new column 'Scalled_Inflow_Folder_Path' with default values
standard_time_data['Scalled_Inflow_Folder_Path'] = ''

# Create a new column 'Availability_Factors_Folder_Path' with default values
standard_time_data['Availability_Factors_Folder_Path'] = ''

# Iterate over each row in the DataFrame
for index, row in standard_time_data.iterrows():
    # Get the value of the 'Dispa-SET_Code' column
    dispa_set_code = row['Dispa-SET_Code']
    
    # Create the scaled inflows folder path
    scaled_inflows_path = os.path.join(scaled_inflows_folder_path, dispa_set_code)
    
    # Create the availability factors folder path
    availability_factors_path = os.path.join(availability_factors_folder_path, dispa_set_code)
    
    # Create the folder for scaled inflows if it doesn't exist
    if not os.path.exists(scaled_inflows_path):
        os.makedirs(scaled_inflows_path)
        print(f"Scaled inflows folder created: {scaled_inflows_path}")
    
    # Update the 'Scalled_Inflow_Folder_Path' column with the folder path
    standard_time_data.at[index, 'Scalled_Inflow_Folder_Path'] = scaled_inflows_path
    
    # Update the 'Availability_Factors_Folder_Path' column with the availability factors path
    standard_time_data.at[index, 'Availability_Factors_Folder_Path'] = availability_factors_path

# Save the modified DataFrame back to the CSV file
standard_time_data.to_csv(standard_time_data_file_path, index=False)

print("Folder paths added to the CSV file.")

Folder paths added to the CSV file.


In [197]:
# Define the standard_time_data_file_path
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(standard_time_data_file_path)

# Function to extract folders from a path
def extract_folders(path):
    if pd.notnull(path):
        return set(os.listdir(path))
    else:
        return set()

# Function to create folders for each path in a given column
def create_folders_for_paths(row):
    availability_folders = extract_folders(row['Availability_Factors_Folder_Path'])
    for folder in ['1h', '15min', '30min']:
        if folder in availability_folders:
            folder_path = os.path.join(row['Scalled_Inflow_Folder_Path'], folder)
            if not os.path.exists(folder_path):
                os.makedirs(folder_path)
                print(f"Created folder: {folder_path}")

# Apply the function to each row
df.apply(create_folders_for_paths, axis=1)

Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/15min
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/30min
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/CH/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/CY/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/CY/30min
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/CZ/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/DE/1h
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/DE/15min
Created folder: /home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/DE/30min
C

0     None
1     None
2     None
3     None
4     None
5     None
6     None
7     None
8     None
9     None
10    None
11    None
12    None
13    None
14    None
15    None
16    None
17    None
18    None
19    None
20    None
21    None
22    None
23    None
24    None
25    None
26    None
27    None
28    None
29    None
dtype: object

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Giving the correspondind headers to the csv files that will content the final data.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- The Scalled Inflows Time Series have a determined header according the techology type (HPHS, HDAM), those are used to be read by Dispa-SET.
</div>

In [191]:
def replace_headers(data_year, standard_time_data_file_path):
    """
    Replace headers in CSV files based on specified conditions.

    Args:
        data_year (str): The year string.
        standard_time_data_file_path (str): Path to the standard time data CSV file.
    """
    # Read the standard time data CSV file
    standard_time_data = pd.read_csv(standard_time_data_file_path)

    # Define header mappings
    header_mapping = {
        'Hydro_Pumped_Storage_Actual_Aggregated': ['', 'HPHS'],
        'Hydro_Water_Reservoir_Actual_Aggregated': ['', 'HDAM']
    }

    # Iterate over each column in the dataframe
    for column in standard_time_data.columns:
        if column in header_mapping:
            # Iterate over each folder path in the column
            for path in standard_time_data[column]:
                # Iterate over each subfolder
                for subfolder in ['1h', '15min', '30min']:
                    # Construct the path to the CSV file
                    csv_file_path = os.path.join(path, subfolder, f'{data_year}.csv')
                    
                    # Check if the CSV file exists
                    if os.path.exists(csv_file_path):
                        # Read the CSV file
                        df = pd.read_csv(csv_file_path)
                        
                        # Replace headers
                        headers = header_mapping[column]
                        new_headers = headers[:2] + list(df.columns[2:])
                        df.columns = new_headers
                        
                        # Write back to the CSV file
                        df.to_csv(csv_file_path, index=False)
                        
                        print(f"Headers replaced for '{csv_file_path}'")

# Example usage
#data_year = '2023'
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'
replace_headers(data_year, standard_time_data_file_path)


Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Pumped_Storage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Pumped_Storage_Actual_Aggregated/15min/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/AT/Hydro_Pumped_Storage_Actual_Aggregated/30min/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BE/Hydro_Pumped_Storage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/BG/Hydro_Pumped_Storage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CH/Hydro_Pumped_Storage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/CY/Hydro_Pumped_Storage_Actual_Aggregated/1h/2023.csv'
Headers replaced for '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyF

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Getting the final Scalled Inflows Time Series file.
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- All the data is copied to the single csv file in a sub-folder with the time stept inside the country folder (AT, BE, CH... etc) .
</div>

In [199]:
#data_year = "2023"
#standard_time_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv"

# Read the CSV file to get the source file paths
df_standard_time = pd.read_csv(standard_time_data_file_path)

# Iterate over each row of the specified columns
for index, row in df_standard_time.iterrows():
    # Extract the source file paths from the current row
    file_path_1 = row['Scalled_Inflow_Folder_Path']
    file_path_2 = row['Hydro_Pumped_Storage_Actual_Aggregated']
    file_path_3 = row['Hydro_Water_Reservoir_Actual_Aggregated']
    
    # Define the subfolders to iterate over
    subfolders = ['1h', '15min', '30min']
    
    for subfolder in subfolders:
        # Construct the file paths for each subfolder
        csv_file_path_1 = os.path.join(file_path_1, subfolder, f"{data_year}.csv")
        csv_file_path_2 = os.path.join(file_path_2, subfolder, f"{data_year}.csv")
        csv_file_path_3 = os.path.join(file_path_3, subfolder, f"{data_year}.csv")
        
        # Check if CSV files exist in file_path_2, file_path_3
        if os.path.exists(csv_file_path_2) and os.path.exists(csv_file_path_3):
            
            # Read the contents of the CSV files
            df_2 = pd.read_csv(csv_file_path_2)
            df_3 = pd.read_csv(csv_file_path_3)
           
            
            # Combine the contents of the CSV files
            combined_df = pd.concat([df_2.iloc[:, :2], df_3.iloc[:, 1]], axis=1)
            
            # Delete the destination CSV file if it exists
            if os.path.exists(csv_file_path_1):
                os.remove(csv_file_path_1)
            
            # Write the combined DataFrame to the CSV file in file_path_1
            combined_df.to_csv(csv_file_path_1, index=False)
            
            print(f"Contents from CSV files written to '{csv_file_path_1}'")
        else:
            print(f"One or more CSV files do not exist at the specified paths")

Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/1h/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/15min/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/30min/2023.csv'
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exist at the specified paths
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exist at the specified paths
Contents from CSV files written to '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/CH/1h/2023.csv'
One or more CSV files do not exist at the specified paths
One or more CSV files do not exi

In [200]:
# Define the path to the CSV file
#standard_time_data_file_path = '/home/ray/Dispa-SET_Unleash/RawData/AvailabiltyFactors/Standard_Time_Data.csv'

# Read the CSV file
standard_time_data = pd.read_csv(standard_time_data_file_path)

# Assuming data_year is already defined
#data_year = "2023"

# Columns to check
columns_to_check = ['HPHS', 'HDAM']

# List of subfolders to check
subfolders = ['1h', '15min', '30min']

# Iterate over each row in the DataFrame
for index, row in standard_time_data.iterrows():
    # Get the folder path from the 'Zone_Folder_Path' column
    folder_path = row['Scalled_Inflow_Folder_Path']
    
    # Iterate over subfolders
    for subfolder in subfolders:
        csv_file_path = os.path.join(folder_path, subfolder, f"{data_year}.csv")
        
        # Check if CSV file exists
        if os.path.exists(csv_file_path):
            # Read the CSV file
            df = pd.read_csv(csv_file_path)
            
            # Iterate over specified columns
            for column in columns_to_check:
                # Convert column values to numeric
                df[column] = pd.to_numeric(df[column], errors='coerce')
                
                # Check if the value is greater than 1
                df.loc[df[column] > 1, column] = 1
            
            # Write the modified DataFrame back to the CSV file
            df.to_csv(csv_file_path, index=False)
            
            print(f"Values in CSV file '{csv_file_path}' modified successfully")
        else:
            print(f"CSV file '{csv_file_path}' not found")

Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/1h/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/15min/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/AT/30min/2023.csv' modified successfully
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE/1h/2023.csv' modified successfully
CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE/15min/2023.csv' not found
CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BE/30min/2023.csv' not found
Values in CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG/1h/2023.csv' modified successfully
CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG/15min/2023.csv' not found
CSV file '/home/ray/Dispa-SET_Unleash/RawData/HydroData/ScaledInflows/BG/30min/2023.csv' not found
Values i

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
The final Scalled Inflows Time Series is for each Country modeled in Dispa-SET is located in the following local directory:
</div>
<div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
- /Local/Path/to/Dispas-SET/RawData/HydroData/ScaledInflows/
<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
Inside this path there are folders with the acronym of each country modelled in Dipsa-SET. i.e. AT, BE, CH.... UK
<br>
Inside each of this folders, there are sub folders named by the time stept of the time series. i.e. 1h, 30min and/or 15min.
<br>
Inside these sub folders, it is going to be found the corresponding time series .csv file named with the year of the data. e.g. 2023.csv
</div>