<div style="text-align: center; margin-left: 0em; font-weight: bold; font-size: 20px; font-family: TimesNewRoman;">
    POWER PLANTS DATA PROCESSING - Main Notebook
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Each part of the following script was used to process the raw data for the power plant units of the Dispa-SET_Unleash project.
    <br>
    Read the explanatory text cells to follow the whole process step by step, up to the final results.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    1. Notebook Set Up
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Importing needed libraries
</div>

In [1]:
import os
import csv
from datetime import datetime
import requests
import pandas as pd
from shutil import move
import numpy as np
import shutil

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    2. Dispa-SET_Unleash Folder Path
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Dynamically determining dispaSET_unleash_folder_path from the location of the "Dispa-SET_Unleash" folder relative to the current working directory. The code assumes that the notebook is run from a folder two levels below "Dispa-SET_Unleash".
<br>
    If the "Dispa-SET_Unleash" folder is copied to a different machine or location, the dispaSET_unleash_folder_path variable will automatically adjust accordingly.
</div>

In [2]:
# Get the current working directory
current_directory = os.getcwd()

# Go one level up from the current working directory
dispaSET_unleash_parent_directory = os.path.dirname(current_directory)

# Go one more level up to reach the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_path = os.path.dirname(dispaSET_unleash_parent_directory)

# Construct the dispaSET_unleash_folder_name variable
dispaSET_unleash_folder_name = os.path.basename(dispaSET_unleash_folder_path)

print("dispaSET_unleash_folder_name:", dispaSET_unleash_folder_name)
print("dispaSET_unleash_folder_path:", dispaSET_unleash_folder_path)

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
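
<div style="text-align: left; margin-left: 0.0em; font-weight: normal; font-size: 16px; font-family: TimesNewRoman;">
    The cell above depends on the notebook sitting exactly two levels below the project folder. A minimal alternative sketch, assuming only that the notebook runs somewhere inside the "Dispa-SET_Unleash" tree, walks up the ancestors until it finds a folder with that name. The helper find_project_root is hypothetical and not part of the original notebook.
</div>

```python
from pathlib import Path

def find_project_root(start, folder_name="Dispa-SET_Unleash"):
    """Return `start` or its closest ancestor whose name is `folder_name`."""
    start = Path(start).resolve()
    # Check the starting folder first, then each parent up to the filesystem root
    for candidate in (start, *start.parents):
        if candidate.name == folder_name:
            return candidate
    raise FileNotFoundError(f"No folder named '{folder_name}' found at or above {start}")

# Possible usage in the notebook (assumption: cwd is inside the project tree):
# dispaSET_unleash_folder_path = str(find_project_root(Path.cwd()))
```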


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    3. Zone(s) Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering the zone name, or names if more than one zone is to be modelled, to create the folder where all data related to the corresponding zone will be stored.
</div>
<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    For European countries, use two-letter country codes to give the zone_name, i.e. AT, BE, BG, CH, etc. The list below largely matches ISO 3166-1 alpha-2, except that it uses EL for Greece and UK for the United Kingdom (the ISO codes are GR and GB).
<br>
    For non-European countries, it is preferable to set zone_name to the same word used in the data to be downloaded and processed, e.g.
</div>
    <div style="text-align: left; margin-left: 2.00em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        If a csv file with all the power plants of Spain is being downloaded but only the units of the city of Pamplona are wanted, and Pamplona is referred to in the downloaded file by the acronym "PMPLN", set the zone_name variable to "PMPLN".
</div>

In [3]:
# List of folder names to create
zone_names = [
"AT",
"BE",
"BG",
"CH",
"CY",
"CZ",
"DE",
"DK",
"EE",
"EL",
"ES",
"FI",
"FR",
"HR",
"HU",
"IE",
"IT",
"LT",
"LU",
"LV",
"MT",
"NL",
"NO",
"PL",
"PT",
"RO",
"SE",
"SI",
"SK",
"UK"
]

In [4]:
# Original value of dispaSET_unleash_folder_path
#dispaSET_unleash_folder_path = "/home/ray/Dispa-SET_Unleash"

# Additional string to be appended
additional_path = "/RawData/PowerPlants/"

# Construct the power_plants_raw_data_folder_path variable
power_plants_raw_data_folder_path = dispaSET_unleash_folder_path + additional_path
print("power_plants_raw_data_folder_path:", power_plants_raw_data_folder_path)

# Dictionary to store created zone paths
created_zones = {}

# Create the zone
for zone_name in zone_names:
    zone_path = os.path.join(power_plants_raw_data_folder_path, zone_name)
    os.makedirs(zone_path, exist_ok=True)
    created_zones[zone_name] = zone_path
    print(f"Created zone: {zone_path}")

# Print the created zone paths
print("Created zones:")
for zone_name, zone_path in created_zones.items():
    print(f"{zone_name}: {zone_path}")
    
created_zones

power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EE
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EL
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/ES
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FI
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FR
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/HR
Create

{'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT',
 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE',
 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG',
 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY',
 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ',
 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
 'EE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EE',
 'EL': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EL',
 'ES': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/ES',
 'FI': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FI',
 'FR': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FR',
 'HR': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/HR',
 'HU': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/HU',
 'IE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/IE',
 'IT': '/home/ray/Dispa-

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also fix the inputs for the next cells, so that the same information does not have to be re-entered each time.
</div>

In [5]:
print (f"dispaSET_unleash_folder_name:                              {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path:                              {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path:                         {power_plants_raw_data_folder_path}")
print (f"zone_names:                                                {zone_names}")
print (f"created_zones:                                             {created_zones}")

dispaSET_unleash_folder_name:                              Dispa-SET_Unleash
dispaSET_unleash_folder_path:                              /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path:                         /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names:                                                ['AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK']
created_zones:                                             {'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE', 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY', 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ', 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    4. Raw Data Sources Path
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering the paths of all files that contain the raw data.
     <br>
        If the power plants raw data is spread over a set of files, put them all in the local folder whose path is:
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 15px; font-family: TimesNewRoman;">
/Local/Dispa_Set/Path/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_Raw_Data_Source/
        </div>
     Inside that folder, the cells below look for the files in a subfolder named after data_year. Order the files by adding a number in parentheses () to each file name; the resulting order is stored in the variable sources_list.
      <br>
        That list is saved and used as input for the next stages.
      <br>
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        Note that all source files must be .csv files in order to be processed.
</div>
    <div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    On the other hand, it is important to define which zone each source file refers to.
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    If a source file contains data that belongs to only one zone, specify that zone in the variable download_links_zone_related, in the same order as the variable sources_list.
    <br>
    If a source file contains data that refers to several zones at once, specify it with the word "General" in the variable download_links_zone_related, again in the same order as the variable sources_list.
</div>
    <div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    Remember that the next filtering stages depend on the correct setting of this step.
</div>
    <div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Additionally, indicate the year that the data refers to.
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    This is used as the root name under which all subsequent files are created.
</div>

In [6]:
# Year to which the data refers:
data_year = '2020'

In [7]:
# Additional string to be appended
#additional_path_1 = "/RawData/PowerPlants/EU_Power_Units_Raw_Data_Source/"

# Construct the power_plants_raw_data_folder_path variable
#power_plants_raw_data_sources_folder_path = dispaSET_unleash_folder_path + additional_path_1
#print("power_plants_raw_data_sources_folder_path:", power_plants_raw_data_sources_folder_path)


additional_path_1 = "/RawData/PowerPlants/EU_Power_Units_Raw_Data_Source/"

# Create the power_plants_raw_data_sources_folder_path
power_plants_raw_data_sources_folder_path = dispaSET_unleash_folder_path + additional_path_1 + str(data_year) + "/"
print("power_plants_raw_data_sources_folder_path:", power_plants_raw_data_sources_folder_path)

power_plants_raw_data_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_Raw_Data_Source/2020/


In [8]:
#power_plants_raw_data_sources_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_ENTSOE_Raw_Data/"

# List to store the file paths
sources_list = []

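# Function to extract the number from within the last pair of parentheses
def extract_number(filename):
    start = filename.rfind('(') + 1
    end = filename.rfind(')')
    # Fail with a clear message when the file name carries no "(number)" tag
    if start == 0 or end < start:
        raise ValueError(f"No '(number)' found in file name: {filename}")
    return int(filename[start:end])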

# Iterate over all files in the folder
for root, dirs, files in os.walk(power_plants_raw_data_sources_folder_path):
    for file in files:
        # Construct the full file path
        file_path = os.path.join(root, file)
        # Append the file path to the source list
        sources_list.append(file_path)

# Sort the source list based on the numbers within parentheses
sources_list.sort(key=lambda x: extract_number(os.path.basename(x)))

# Print the sorted list of file paths
print(sources_list)

sources_list

[]


[]
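
<div style="text-align: left; margin-left: 0.0em; font-weight: normal; font-size: 16px; font-family: TimesNewRoman;">
    An empty list, as in the output above, suggests that the sources folder contained no files when the cell was run. The ordering step itself can be sketched in a slightly more defensive form, assuming that only .csv files carrying a "(number)" tag should be kept. The helpers extract_order and sort_sources and the example file names are hypothetical.
</div>

```python
import os
import re

def extract_order(file_name):
    """Return the integer inside the last '(n)' of file_name, or None if absent."""
    matches = re.findall(r"\((\d+)\)", file_name)
    return int(matches[-1]) if matches else None

def sort_sources(paths):
    """Keep the .csv paths that carry an order number and sort them by it."""
    numbered = []
    for path in paths:
        name = os.path.basename(path)
        order = extract_order(name)
        if name.lower().endswith(".csv") and order is not None:
            numbered.append((order, path))
    # Sorting the (order, path) pairs sorts numerically, so (10) follows (2)
    return [path for _, path in sorted(numbered)]

# Hypothetical file names, for illustration only
example = ["/data/BE (2).csv", "/data/notes.txt", "/data/DE (10).csv", "/data/AT (1).csv"]
print(sort_sources(example))
```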

In [9]:
# List of the download links:
download_links = sources_list

In [10]:
# List of zones related to the download links:
download_links_zone_related = [
"AT",
"BE",
"BG",
"CH",
"CY",
"CZ",
"DE",
"DE",
"DE",
"DE",
"DK",
"EE",
"EL",
"ES",
"FI",
"FR",
"HR",
"HU",
"IE",
"IT",
"LT",
"LU",
"LV",
"MT",
"NL",
"NO",
"PL",
"PT",
"RO",
"SE",
"SI",
"SK",
"UK",
"UK"
]

In [11]:
def save_download_links_to_csv(links, zones, folder_path, data_year):
    # Create the filename using the data year, current date, and time
    now = datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    file_name = f"{data_year}_power_plants_raw_data_sources_{timestamp}.csv"
    
    # Create a folder with the same name as the file (without extension)
    folder_name = os.path.splitext(file_name)[0]
    folder_path = os.path.join(folder_path, folder_name)
    os.makedirs(folder_path, exist_ok=True)
    
    # Combine the folder path and filename
    file_path = os.path.join(folder_path, file_name)
    
    # Write links to CSV file
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        
        # Write the header
        writer.writerow(['Download_Link_Sources', 'Zone', 'File_Name'])
        
        # Write the links, zones, and file names
        for i, (link, zone) in enumerate(zip(links, zones), start=1):
            writer.writerow([link, zone, i])
    
    print(f"Download links saved to: {file_path}")
    
    return file_path, file_name

# Save the download links to a CSV file and get the file path and name
power_plants_raw_data_sources_file_path, power_plants_raw_data_sources_file_name = save_download_links_to_csv(download_links, download_links_zone_related, power_plants_raw_data_folder_path, data_year)

print("File path:", power_plants_raw_data_sources_file_path)
print("File name:", power_plants_raw_data_sources_file_name)

Download links saved to: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240711_080920/2020_power_plants_raw_data_sources_20240711_080920.csv
File path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240711_080920/2020_power_plants_raw_data_sources_20240711_080920.csv
File name: 2020_power_plants_raw_data_sources_20240711_080920.csv
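
<div style="text-align: left; margin-left: 0.0em; font-weight: normal; font-size: 16px; font-family: TimesNewRoman;">
    Note that zip stops at the shorter of its inputs. In the run shown here sources_list is empty while download_links_zone_related holds 34 entries, so the saved CSV file would contain only its header row. A minimal sketch of a length check that could be run before saving is given below; the helper pair_links_with_zones is hypothetical.
</div>

```python
def pair_links_with_zones(links, zones):
    """Pair each source with its zone, refusing lists of different lengths."""
    if len(links) != len(zones):
        raise ValueError(
            f"{len(links)} source file(s) but {len(zones)} zone label(s); "
            "both lists must have the same length and the same order."
        )
    return list(zip(links, zones))

# Possible usage before calling save_download_links_to_csv:
# pair_links_with_zones(download_links, download_links_zone_related)
```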


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also fix the inputs for the next cells, so that the same information does not have to be re-entered each time.
</div>

In [12]:
print (f"dispaSET_unleash_folder_name:                            {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path:                            {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path:                       {power_plants_raw_data_folder_path}")
print (f"zone_names:                                              {zone_names}")
print (f"created_zones:                                           {created_zones}")
print (f"power_plants_raw_data_sources_file_path:                 {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name:                 {power_plants_raw_data_sources_file_name}")
print (f"data_year:                                               {data_year}")
print (f"download_links_zone_related:                             {download_links_zone_related}")

dispaSET_unleash_folder_name:                            Dispa-SET_Unleash
dispaSET_unleash_folder_path:                            /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path:                       /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names:                                              ['AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK']
created_zones:                                           {'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE', 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY', 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ', 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/D

In [13]:
created_zones

{'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT',
 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE',
 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG',
 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY',
 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ',
 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
 'EE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EE',
 'EL': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EL',
 'ES': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/ES',
 'FI': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FI',
 'FR': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/FR',
 'HR': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/HR',
 'HU': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/HU',
 'IE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/IE',
 'IT': '/home/ray/Dispa-

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    5. Power Plants Raw Data Download Files
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The download list given previously is used to copy all the power units raw data files into the folder that contains the sources file (see the variable power_plants_raw_data_sources_file_path).
    <br>
    Each copied file is named after its position in the download_links list.
    <br>
    Additionally, as some files use different column delimiters, e.g. ","; ";"; "/"; etc., the code verifies and changes this so that it is acceptable for the next steps.
</div>

In [14]:
#power_plants_raw_data_sources_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2023_power_plants_raw_data_sources_20240418_185453/2023_power_plants_raw_data_sources_20240418_185453.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    download_link = row['Download_Link_Sources']
    file_name = str(row['File_Name'])  # Convert to string
    
    # Check if the download link is valid
    if pd.notnull(download_link) and pd.notnull(file_name):
        # Construct the destination file path
        destination_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
        
        # Copy the content from the download link to the destination file path
        shutil.copy(download_link, destination_path)
        print(f"File '{file_name}' copied successfully.")
    else:
        print("Download link or file name is missing.")
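
<div style="text-align: left; margin-left: 0.0em; font-weight: normal; font-size: 16px; font-family: TimesNewRoman;">
    The delimiter handling mentioned in the text is not part of the cell shown here. A minimal sketch of how such a check could look, using csv.Sniffer from the standard library, is given below. The helper normalize_delimiter and its default list of candidate delimiters are assumptions; "/" is left out of the default because it often appears inside values such as dates, and can be passed explicitly if needed.
</div>

```python
import csv
import io

def normalize_delimiter(csv_path, candidates=",;\t|"):
    """Detect the column delimiter of csv_path and rewrite the file comma-separated."""
    with open(csv_path, newline="") as f:
        text = f.read()
    # csv.Sniffer raises csv.Error if none of the candidates fits the sample
    delimiter = csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    if delimiter != ",":
        rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
    return delimiter

# Possible usage right after copying a file:
# normalize_delimiter(destination_path)
```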


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    6. Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filtering the data contained in each downloaded file according to the zone previously specified.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.1. Zone Definition
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Adding a new column named "Country" to each downloaded file that has been related to a zone key in the list download_links_zone_related filled in previously.
    <br>
    For all files set with the key "General", it is assumed that the file contains data from various zones, so it will be filtered in a different way.
</div>

In [15]:
# Read the CSV file specified in power_plants_raw_data_sources_file_path
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])  # Convert to string
    zone = row['Zone']
    
    # Check if the zone is not "General"
    if zone != "General":
        # Construct the path to the corresponding CSV file
        csv_file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
        
        # Check if the CSV file exists
        if os.path.exists(csv_file_path):
            # Read the CSV file
            df_csv = pd.read_csv(csv_file_path)
            
            # Add a new column "Country" with the value from the "Zone" column
            df_csv['Country'] = zone
            
            # Write the updated DataFrame back to the CSV file
            df_csv.to_csv(csv_file_path, index=False)
            
            print(f"Added 'Country' column to {file_name} with value '{zone}'")
        else:
            print(f"CSV file {file_name} does not exist.")
    else:
        print(f"No action needed for {file_name}")


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also fix the inputs for the next cells, so that the same information does not have to be re-entered each time.
</div>

In [16]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK']
created_zones: {'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE', 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY', 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ', 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'EE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EE', 'EL': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EL', '

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.2. Raw Data File Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Moving each downloaded file to its corresponding zone folder according to the download_links_zone_related list.
    <br>
    The files related to the key "General" simply keep their current location.
</div>

In [17]:
# Read the raw data sources file
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Create a new column Final_File_Path
df_sources['Final_File_Path'] = ''

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])
    file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
    
    # Check if the file exists
    if os.path.exists(file_path):
        # Open and read the file
        df_csv = pd.read_csv(file_path)
        
        # Check if the file has the header 'Country'
        if 'Country' in df_csv.columns:
            # Get the corresponding value of 'Zone'
            zone_value = row['Zone']
            
            # Check if the zone folder exists
            if zone_value in zone_names:
                # Construct the destination folder path
                destination_folder = os.path.join(power_plants_raw_data_folder_path, zone_value)
                
                # Move the file to the destination folder
                destination_file_path = os.path.join(destination_folder, file_name)
                move(file_path, destination_file_path)
                print(f"Moved file '{file_name}' to '{destination_folder}'")
                
                # Get the current path of the moved file
                final_file_path = os.path.abspath(destination_file_path)
                df_sources.at[index, 'Final_File_Path'] = final_file_path
            else:
                print(f"Destination folder for zone '{zone_value}' does not exist.")
        else:
            print(f"No 'Country' header found in file '{file_name}'. No action needed.")
    else:
        print(f"File '{file_name}' does not exist.")
        
    # If file was not moved, update Final_File_Path with current path
    if not df_sources.at[index, 'Final_File_Path']:
        df_sources.at[index, 'Final_File_Path'] = os.path.abspath(file_path)

# Save the DataFrame back to the CSV file
df_sources.to_csv(power_plants_raw_data_sources_file_path, index=False)
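
<div style="text-align: left; margin-left: 0.0em; font-weight: normal; font-size: 16px; font-family: TimesNewRoman;">
    The cell above loads each file completely just to check whether it has a "Country" header. For large raw data files, a lighter check that reads only the header row could be sketched as follows. The helper has_column is hypothetical and assumes comma-separated files.
</div>

```python
import csv

def has_column(csv_path, column_name):
    """Return True if the header row of csv_path contains column_name."""
    with open(csv_path, newline="") as f:
        # Read only the first row; an empty file yields an empty header
        header = next(csv.reader(f), [])
    return column_name in header

# Possible usage inside the loop:
# if has_column(file_path, "Country"): ...
```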

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Extracting a list of the paths of all the power plants raw data files.
    <br>
    This list is going to be used as a reference for future filtering steps.
</div>

In [18]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Create the list power_plants_raw_data_file_list
power_plants_raw_data_file_list = []

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Final_File_Path already holds the full path (folder plus file name),
    # so it is appended as-is instead of being joined with File_Name again
    final_file_path = str(row['Final_File_Path'])
    power_plants_raw_data_file_list.append(final_file_path)

# Print the list
print(power_plants_raw_data_file_list)

[]


In [19]:
# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Extract the folder path
    folder_path = os.path.dirname(row['Final_File_Path'])
    
    # Update the DataFrame with the folder path
    df.at[index, 'Folder_Path'] = folder_path

# Save the DataFrame back to the CSV file
df.to_csv(power_plants_raw_data_sources_file_path, index=False)


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also fix the inputs for the next cells, so that the same information does not have to be re-entered each time.
</div>

In [20]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK']
created_zones: {'AT': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/AT', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE', 'BG': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BG', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'CY': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CY', 'CZ': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CZ', 'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'EE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EE', 'EL': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EL', '

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    7. Data Formatting
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.1. Clean Data File Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Creating for each zone an empty csv file with all the technical features needed for Dispa-SET simulations as headers.
    <br>
    This file will be named after the value of the variable data_year previously specified.
    <br>
    On this csv file all the filtered data in the following steps will be written.
</div>

In [21]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Make sure the bookkeeping columns exist before they are filled in or read;
# df['Final_Clean_File_Path'] raises a KeyError when the column is missing
for column in ('Clean_File_Name', 'Final_Clean_File_Path'):
    if column not in df.columns:
        df[column] = ""
    df[column] = df[column].astype(object)

# Headers of the clean data files (technical features needed by Dispa-SET)
headers = ["", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
           "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
           "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
           "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
           "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
           "Company", "Lat", "Lon"]

# The clean data file of every zone is named after the data year
file_name = f"{data_year}.csv"

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    folder_path = row['Folder_Path']
    file_path = os.path.join(folder_path, file_name)

    if os.path.exists(file_path):
        print(f"File '{file_name}' already exists in '{folder_path}'.")
    else:
        # Create the new CSV file containing only the header row
        with open(file_path, 'w', newline='') as f:
            csv.writer(f).writerow(headers)
        print(f"Created file '{file_name}' in '{folder_path}'.")

    # Record the clean file name and path, whether the file is new or not
    df.at[index, 'Clean_File_Name'] = file_name
    df.at[index, 'Final_Clean_File_Path'] = file_path

# Create the list of clean data file paths.
# tolist() returns only the row values (no column header), so nothing is dropped
# and the list stays aligned row by row with the other per-zone lists
power_plants_clean_data_file_list = df['Final_Clean_File_Path'].tolist()

# Save the DataFrame back to the CSV file
df.to_csv(power_plants_raw_data_sources_file_path, index=False)

In [None]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# 'Final_Clean_File_Path' already holds the full path of each clean data file,
# so it is used directly. os.path.join(file_name, final_file_path) would either
# put the raw file name in front of the path or, when the path is absolute,
# ignore the file name altogether.
power_plants_clean_data_file_list = df['Final_Clean_File_Path'].astype(str).tolist()

# Print the list
print(power_plants_clean_data_file_list)

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.2. Dictionary Files
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Getting the current paths of all the dictionary files that are used as reference in the following data filtering steps.
        <br>
    The dictionary files are .csv files that contain a small database of the technical terms commonly used for power units, e.g. 'PowerCapacity', 'Energy Source', 'Technology', etc.
</div>

In [None]:
#dispaSET_unleash_folder_path = "/home/ray/Dispa-SET_Unleash"  # Replace with your actual path

# List of file names
file_names = [
    "ENTSOE_Production_Type.csv",
    "power_plants_clean_data_equivalent_headers.csv",
    "power_plants_all_data_equivalent_technologies.csv",
    "power_plants_all_data_equivalent_fuels.csv",
    "power_plants_all_data_equivalent_CHPTypes.csv",
    "EU_Power_Units_Technical_Features.csv"
]

# Dictionary to store file paths
file_paths = {}

# Construct file paths
for file_name in file_names:
    # Construct the file path
    file_path = os.path.join(dispaSET_unleash_folder_path, "RawData", "PowerPlants", file_name)

    # Store the file path in the dictionary
    file_paths[file_name] = file_path

# Define variables explicitly using the file paths
EU_Power_Units_Technical_Features_file_path = file_paths["EU_Power_Units_Technical_Features.csv"]
power_plants_clean_data_equivalent_headers_file_path = file_paths["power_plants_clean_data_equivalent_headers.csv"]
power_plants_all_data_equivalent_technologies_file_path = file_paths["power_plants_all_data_equivalent_technologies.csv"]
power_plants_all_data_equivalent_fuels_file_path = file_paths["power_plants_all_data_equivalent_fuels.csv"]
power_plants_all_data_equivalent_CHPTypes_file_path = file_paths["power_plants_all_data_equivalent_CHPTypes.csv"]
ENTSOE_Production_Type_file_path = file_paths["ENTSOE_Production_Type.csv"]

# Print the variables (optional)
for file_name, file_path in file_paths.items():
    print(f"{file_name.split('.')[0]}_file_path: {file_path}")

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")
print (f"power_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")
print (f"power_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}")
print (f"power_plants_all_data_equivalent_fuels_file_path: {power_plants_all_data_equivalent_fuels_file_path}")
print (f"power_plants_all_data_equivalent_CHPTypes_file_path: {power_plants_all_data_equivalent_CHPTypes_file_path}")
print (f"EU_Power_Units_Technical_Features_file_path: {EU_Power_Units_Technical_Features_file_path}")
print (f"ENTSOE_Production_Type_file_path: {ENTSOE_Production_Type_file_path}")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.3. ENTSO-E Source Files Data Production Technology Type
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Getting the technology and fuel classification for all the raw data from the ENTSO-E source.
        <br>
    Since the data obtained from the ENTSO-E web portal is not classified by generation technology and fuel source, the dictionary file "ENTSOE_Production_Type.csv" is used to look up the equivalent technology and fuel according to the "Production Type" classification used by ENTSO-E.
</div>

In [None]:
# Read the CSV files
power_plants_df = pd.read_csv(power_plants_raw_data_sources_file_path)
entsoe_production_type_df = pd.read_csv(ENTSOE_Production_Type_file_path)

# Lookup table: ENTSO-E "Production Type" -> (technology, fuel).
# setdefault keeps the first entry when a term appears more than once.
entsoe_lookup = {}
for _, dictionary_row in entsoe_production_type_df.iterrows():
    entsoe_lookup.setdefault(
        dictionary_row['Complete Term'],
        (dictionary_row['technology'], dictionary_row['fuel'])
    )

# Function to process each text file
def process_text_file(text_file_path):
    # Parse with the csv module so quoted fields that contain commas stay intact
    with open(text_file_path, 'r', newline='') as file:
        rows = list(csv.reader(file))
    if not rows:
        return

    header = rows[0]
    # Skip files without a 'Production Type' column, and files that already
    # have a 'Technology' column (re-running the cell must not append twice)
    if 'Production Type' not in header or 'Technology' in header:
        return

    production_type_index = header.index('Production Type')
    header.extend(['Technology', 'Fuel'])

    # Process each data row
    for fields in rows[1:]:
        if not fields:
            continue  # leave blank lines untouched
        production_type = fields[production_type_index] if len(fields) > production_type_index else ''
        # Unmatched production types get empty values, so every row keeps
        # the same number of columns as the header
        technology, fuel = entsoe_lookup.get(production_type, ('', ''))
        fields.extend([technology, fuel])

    # Write back to the text file
    with open(text_file_path, 'w', newline='') as file:
        csv.writer(file).writerows(rows)

    # Print message when the file is done
    print(f"Processed {text_file_path}")

# Process each row in power_plants_df
for index, row in power_plants_df.iterrows():
    text_file_path = row['Final_File_Path']
    # Rows without a path are read as NaN (not a string) and are skipped
    if isinstance(text_file_path, str) and os.path.exists(text_file_path):
        process_text_file(text_file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.4. Raw Data Files Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Copying all the needed columns from the raw data files to the clean data files, following the order given by the lists power_plants_raw_data_file_list and power_plants_clean_data_file_list.
        <br>
    This process is carried out for each zone by calling the function copy_columns_to_clean_data from copy_columns_to_clean_data.py.
</div>

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
power_plants_raw_data_file_list

In [None]:
power_plants_clean_data_file_list

In [None]:
from copy_columns_to_clean_data import copy_columns_to_clean_data
# Iterate over each pair of files
for raw_data_file, clean_data_file in zip(power_plants_raw_data_file_list, power_plants_clean_data_file_list):
    # Call the function with appropriate arguments
    copy_columns_to_clean_data(raw_data_file, clean_data_file, power_plants_clean_data_equivalent_headers_file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.5. Unit Name Field Filling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Unit" field in all the rows of each clean data file.
        <br>
    If there is no value, the corresponding latitude and longitude values are used as the identifier of the unit.
        <br>
    If there is no value in the latitude and longitude fields, the corresponding "Company" value is used.
    <br>
    Finally, if there is no company data either, the "Unit" field is built from the zone name plus an increasing number.
        <br>
    This process is carried out for each zone by calling the function fill_unit_field from Data_Processing_Functions.
</div>
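
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The fallback order described above can be sketched as follows. This is only an illustration: sketch_fill_unit is a hypothetical helper, not the project's fill_unit_field, and the "Lat_Lon" identifier format and the "zone_number" pattern are assumptions.
</div>

```python
import pandas as pd

def sketch_fill_unit(df, zone):
    """Hypothetical illustration of the naming fallback; not the project's fill_unit_field."""
    df = df.copy()
    counter = 0
    for idx, row in df.iterrows():
        unit = row["Unit"]
        if pd.notna(unit) and str(unit).strip():
            continue  # the unit already has a name
        if pd.notna(row["Lat"]) and pd.notna(row["Lon"]):
            name = f"{row['Lat']}_{row['Lon']}"  # assumed identifier format
        elif pd.notna(row["Company"]) and str(row["Company"]).strip():
            name = str(row["Company"])
        else:
            counter += 1
            name = f"{zone}_{counter}"  # zone name plus an increasing number
        df.at[idx, "Unit"] = name
    return df

# Small made-up example covering the four cases
example = pd.DataFrame({
    "Unit": ["Plant A", None, None, None],
    "Lat": [50.1, 48.2, None, None],
    "Lon": [4.3, 16.4, None, None],
    "Company": ["X", "Y", "Acme", None],
})
filled = sketch_fill_unit(example, "AT")
print(filled["Unit"].tolist())
```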

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
power_plants_clean_data_file_list

In [None]:
from Data_Processing_Functions import fill_unit_field

# List of file paths
#power_plants_clean_data_file_list = ['path_to_file1.csv', 'path_to_file2.csv', 'path_to_file3.csv']

# Iterate over each file path
for file_path in power_plants_clean_data_file_list:
    fill_unit_field(file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.6. Nunits Field Filling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Nunits" field in all the rows of each clean data file.
        <br>
    If there is no value, the field is filled with 1.
        <br>
    This process is carried out for each zone by calling the function fill_nunits_column from Data_Processing_Functions.
</div>
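
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    A minimal sketch of this default, assuming the clean data is loaded into a pandas DataFrame. sketch_fill_nunits is a hypothetical helper, not the project's fill_nunits_column; treating non-numeric entries as missing is an assumption.
</div>

```python
import pandas as pd

def sketch_fill_nunits(df):
    """Hypothetical illustration; not the project's fill_nunits_column."""
    df = df.copy()
    # Non-numeric or missing entries become NaN, then default to 1 (assumption)
    df["Nunits"] = pd.to_numeric(df["Nunits"], errors="coerce").fillna(1).astype(int)
    return df

# Made-up example: the second unit has no Nunits value
example_nunits = pd.DataFrame({"Unit": ["A", "B", "C"], "Nunits": [2, None, 4]})
filled_nunits = sketch_fill_nunits(example_nunits)
print(filled_nunits["Nunits"].tolist())
```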

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
power_plants_clean_data_file_list

In [None]:
from Data_Processing_Functions import fill_nunits_column

# List of file paths
#clean_data_file_paths = ["path/to/your/clean_data1.csv", "path/to/your/clean_data2.csv", ...]

# Apply the function to each file
for file_path in power_plants_clean_data_file_list:
    fill_nunits_column(file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.7. Zone Field Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Zone" field in all the rows of each clean data file.
        <br>
    If there is no value, the row is removed from the file.
    <br>
    Additionally, all the removed rows are stored in a CSV file called power_plants_all_data_not_defined_units.
        <br>
    This process is carried out for each zone by calling the function move_rows_with_empty_zone from Data_Processing_Functions.
</div>

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")
print (f"power_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")
print (f"power_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}")
print (f"power_plants_all_data_equivalent_fuels_file_path: {power_plants_all_data_equivalent_fuels_file_path}")
print (f"power_plants_all_data_equivalent_CHPTypes_file_path: {power_plants_all_data_equivalent_CHPTypes_file_path}")
print (f"EU_Power_Units_Technical_Features_file_path: {EU_Power_Units_Technical_Features_file_path}")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 16px; font-family: TimesNewRoman;">
    7.7.1. Not Defined Units File Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Creating, for each zone, a .csv file that collects all the units with missing data in one of their fields, as well as all the units removed from the final clean data file for any other reason.

</div>

In [None]:
# Function to create the CSV file with specified headers
def create_csv_file(file_path, headers):
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)

# Define the headers for the CSV file
headers = ["", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
           "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
           "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
           "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
           "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
           "Company", "Lat", "Lon"]

# List to store the paths of created CSV files
power_plants_all_data_not_defined_units_file_list = []

# Read the CSV file and create CSV files in each folder path
with open(power_plants_raw_data_sources_file_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        folder_path = row['Folder_Path']
        file_name = 'power_plants_all_data_not_defined_units.csv'
        file_path = os.path.join(folder_path, file_name)
        create_csv_file(file_path, headers)
        print(f"CSV file created at {file_path}")
        power_plants_all_data_not_defined_units_file_list.append(file_path)

print("All CSV files created successfully.")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 16px; font-family: TimesNewRoman;">
    7.7.2. No Data Zone Field Elimination
</div>
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Eliminating all the rows that do not have a zone, country, region, etc. in their Zone field.
</div>
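
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The split between kept and removed rows can be sketched as below. sketch_split_empty_zone is a hypothetical helper working on an in-memory DataFrame; the project's move_rows_with_empty_zone works on file paths and may differ. Treating whitespace-only values as empty is an assumption.
</div>

```python
import pandas as pd

def sketch_split_empty_zone(df):
    """Hypothetical illustration; not the project's move_rows_with_empty_zone."""
    # Missing values and whitespace-only strings both count as "no zone" (assumption)
    zone = df["Zone"].fillna("").astype(str).str.strip()
    has_zone = zone != ""
    kept = df[has_zone].reset_index(drop=True)
    removed = df[~has_zone].reset_index(drop=True)
    return kept, removed

# Made-up example: units B and C have no usable zone
example_zone = pd.DataFrame({"Unit": ["A", "B", "C"], "Zone": ["BE", None, "  "]})
kept, removed = sketch_split_empty_zone(example_zone)
print(kept["Unit"].tolist(), removed["Unit"].tolist())
```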

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
power_plants_all_data_not_defined_units_file_list

In [None]:
power_plants_clean_data_file_list

In [None]:
from Data_Processing_Functions import move_rows_with_empty_zone

# Iterate over each pair of files
for clean_data_file, all_data_file in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    move_rows_with_empty_zone([clean_data_file], [all_data_file])

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.8. Technology Field Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Renaming the current value of every field in the Technology column.
    <br>
    Since the generation technology names of the power units may not be homogeneous across data sources, the current value of this field is compared with the technology equivalence dictionary to obtain the one corresponding to the Dispa-SET format.
        <br>
    Additionally, all the rows whose Technology field has not been matched with any equivalent are removed from the clean data file and added to the not defined units file.
    <br>
    This process is done for each zone.
</div>
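
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The renaming and the separation of unmatched rows can be sketched as follows. sketch_map_equivalents is a hypothetical helper, not the project's update_technology_equivalents. The equivalences are passed as a plain dict because the layout of the dictionary CSV is not shown here, and the three entries are illustrative only.
</div>

```python
import pandas as pd

def sketch_map_equivalents(df, equivalents, column):
    """Hypothetical illustration; not the project's update_technology_equivalents."""
    # Terms missing from the dictionary map to NaN
    mapped = df[column].map(equivalents)
    matched = mapped.notna()
    clean = df[matched].copy()
    clean[column] = mapped[matched]      # replace the source term with its equivalent
    not_defined = df[~matched].copy()    # unmatched rows keep their original term
    return clean.reset_index(drop=True), not_defined.reset_index(drop=True)

# Illustrative equivalences (source term -> Dispa-SET technology code)
equivalents = {"Combined cycle": "COMC", "Steam turbine": "STUR", "Wind Onshore": "WTON"}
example_tech = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["Combined cycle", "Wind Onshore", "Unknown plant"],
})
clean, not_defined = sketch_map_equivalents(example_tech, equivalents, "Technology")
print(clean["Technology"].tolist(), not_defined["Unit"].tolist())
```

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The same pattern applies to any column that is checked against an equivalence dictionary.
</div>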

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables.
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells at hand, so the same information does not have to be re-entered each time.
    </div>
</div>

In [None]:
print (f"dispaSET_unleash_folder_name:                                {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path:                                {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path:                           {power_plants_raw_data_folder_path}")
print (f"zone_names:                                                  {zone_names}")
print (f"created_zones:                                               {created_zones}")
print (f"power_plants_raw_data_sources_file_path:                     {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name:                     {power_plants_raw_data_sources_file_name}")
print (f"data_year:                                                   {data_year}")
print (f"download_links_zone_related:                                 {download_links_zone_related}")
print (f"power_plants_raw_data_file_list:                             {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list:                           {power_plants_clean_data_file_list}")
print (f"power_plants_clean_data_equivalent_headers_file_path:        {power_plants_clean_data_equivalent_headers_file_path}")
print (f"power_plants_all_data_equivalent_technologies_file_path:     {power_plants_all_data_equivalent_technologies_file_path}")
print (f"power_plants_all_data_equivalent_fuels_file_path:            {power_plants_all_data_equivalent_fuels_file_path}")
print (f"power_plants_all_data_equivalent_CHPTypes_file_path:         {power_plants_all_data_equivalent_CHPTypes_file_path}")
print (f"EU_Power_Units_Technical_Features_file_path:                 {EU_Power_Units_Technical_Features_file_path}")
print (f"power_plants_clean_data_file_list:                           {power_plants_clean_data_file_list}")
print (f"power_plants_raw_data_file_list:                             {power_plants_raw_data_file_list}")
print (f"power_plants_all_data_not_defined_units_file_list:           {power_plants_all_data_not_defined_units_file_list}")

In [None]:
from Data_Processing_Functions import update_technology_equivalents

# Iterate over each pair of files
for clean_data_file, not_defined_units_file in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    update_technology_equivalents(clean_data_file, power_plants_all_data_equivalent_technologies_file_path, not_defined_units_file)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.9. Fuel Field Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Renaming the current value of every field in the Fuel column.
    <br>
    Since the fuel source names of the power units may not be homogeneous across data sources, the current value of this field is compared with the fuel equivalence dictionary to obtain the one corresponding to the Dispa-SET format.
        <br>
    Additionally, all the rows whose Fuel field has not been matched with any equivalent are removed from the clean data file and added to the not defined units file.
    <br>
    This process is done for each zone.
</div>

In [None]:
from Data_Processing_Functions import update_fuel_equivalents

# Iterate over each pair of files
for clean_data_file_path, not_defined_units_file_path in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    update_fuel_equivalents(clean_data_file_path, power_plants_all_data_equivalent_fuels_file_path, not_defined_units_file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.10. CHPType, CHPPowerToHeat, CHPPowerLossFactor and CHPMaxHeat Fields Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Renaming the current value of every field in the CHPType column.
    <br>
    Since the CHP type names of the power units may not be homogeneous across data sources, the current value of the CHPType field is compared with the CHP equivalence dictionary to obtain the one corresponding to the Dispa-SET format.
        <br>
    For each unit, this step reads the current value of the "CHPMaxHeat" column of the clean data file. If the unit has a value in this field, its value in the CHPType column is checked against the table power_plants_all_data_equivalent_CHPTypes. If there is no match, the row is copied to the power_plants_all_data_not_defined_units file, leaving this field empty but keeping the information of the other related fields.
     <br>
    This process is done for each zone.
</div>

In [None]:
from Data_Processing_Functions import update_chp_types

# Iterate over each pair of files
for clean_data_file_path, not_defined_units_file_path in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    update_chp_types(clean_data_file_path, power_plants_all_data_equivalent_CHPTypes_file_path, not_defined_units_file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.11. Power Capacity Field Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Removing all the units whose PowerCapacity field is empty or zero.
         <br>
    This process is done for each zone.
</div>

In [None]:
from Data_Processing_Functions import remove_zero_power_capacity

# Iterate over each pair of files
for clean_data_file_path, not_defined_units_file_path in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    remove_zero_power_capacity(clean_data_file_path, not_defined_units_file_path)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.12. Efficiency Field Filling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Finding the closest efficiency value for all the units whose Efficiency field is empty or zero.
    <br>
    The Efficiency technical feature is taken from the database EU_Power_Units_Technical_Features.csv, previously aggregated with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb, and is selected based on the Technology, Fuel and PowerCapacity features.
     <br>
    This process is done for each zone.
</div>
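
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    One way to read "closest value" is sketched below: among the reference units with the same Technology and Fuel, take the one whose PowerCapacity is nearest. This matching rule is an assumption, and sketch_closest_value is a hypothetical helper, not the project's copy_technical_values. The reference numbers are made up.
</div>

```python
import pandas as pd

def sketch_closest_value(units, reference, column):
    """Hypothetical illustration; not the project's copy_technical_values."""
    units = units.copy()
    for idx, row in units.iterrows():
        value = row[column]
        if pd.notna(value) and value != 0:
            continue  # keep values that are already defined
        if pd.isna(row["PowerCapacity"]):
            continue  # nothing to compare against
        # Reference units with the same technology and fuel that have a value
        candidates = reference[
            (reference["Technology"] == row["Technology"])
            & (reference["Fuel"] == row["Fuel"])
        ].dropna(subset=[column])
        if candidates.empty:
            continue  # no comparable reference unit: leave the field as is
        # Pick the reference unit with the nearest power capacity
        nearest = (candidates["PowerCapacity"] - row["PowerCapacity"]).abs().idxmin()
        units.at[idx, column] = candidates.at[nearest, column]
    return units

# Made-up reference table and units
reference = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100, 400, 300],
    "Efficiency": [0.50, 0.58, 0.40],
})
units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["COMC", "COMC", "HDAM"],
    "Fuel": ["GAS", "GAS", "WAT"],
    "PowerCapacity": [350, 120, 50],
    "Efficiency": [None, 0.0, None],
})
filled_eff = sketch_closest_value(units, reference, "Efficiency")
print(filled_eff["Efficiency"].tolist())
```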

In [None]:
power_plants_clean_data_file_list

In [None]:
EU_Power_Units_Technical_Features_file_path

In [None]:
# Import the function from the external script
from Data_Processing_Functions import copy_technical_values

# Define the paths and variables
#EU_Power_Units_Technical_Features_file_path = '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_Technical_Features.csv'
#power_plants_clean_data_file_list = [
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240407_172323/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv'
#]
common_columns = ['PowerCapacity', 'Technology', 'Fuel']
column_to_copy = 'Efficiency'

copy_technical_values(EU_Power_Units_Technical_Features_file_path, power_plants_clean_data_file_list, common_columns, column_to_copy)

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filling the Efficiency field with a value of 1 for all the units where it is still empty or zero.
    <br>
    Additionally, all the units found without any efficiency value are copied to the corresponding not defined units file.
     <br>
    This process is done for each zone.
</div>

In [None]:
from Data_Processing_Functions import fill_empty_values_with_specified

# Define the columns to be filled
Column = 'Efficiency'
Value = '1'  # Change this to the desired value

# Iterate over each pair of files
for input_file, output_file in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function for the current pair of files
    fill_empty_values_with_specified(input_file, output_file, Column, Value)
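
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The body of fill_empty_values_with_specified is defined in Data_Processing_Functions and is not reproduced here. The following sketch only illustrates the behaviour described above on an in-memory DataFrame: empty or zero entries receive the default value, and the affected rows are returned separately. The helper name fill_missing_with_default and the sample data are assumptions.
</div>

```python
import pandas as pd


def fill_missing_with_default(df, column, value):
    """Fill empty or zero entries of `column` with `value`.

    Hypothetical helper. Returns the filled DataFrame together with a copy
    of the rows that had no usable value, so they can be logged elsewhere.
    """
    numeric = pd.to_numeric(df[column], errors='coerce')
    missing = numeric.isna() | (numeric == 0)
    undefined_units = df[missing].copy()
    filled = df.copy()
    # Keep existing values, replace the missing ones with the default
    filled[column] = numeric.where(~missing, float(value))
    return filled, undefined_units


# Illustrative data (made-up values)
units = pd.DataFrame({'Unit': ['A', 'B', 'C'],
                      'Efficiency': [0.45, None, 0.0]})
filled, undefined_units = fill_missing_with_default(units, 'Efficiency', '1')
print(filled)
print(undefined_units)
```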

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.13.  MinUpTime | MinDownTime | RampUpRate | RampDownRate | StartUpCost | NoLoadCost_pu | RampingCost | PartLoadMin | MinEfficiency | StartUpTime | CO2Intensity Fields Fulfilling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Finding the closest value for the following technical features:
    <br>
    MinUpTime, MinDownTime, RampUpRate, RampDownRate, StartUpCost, NoLoadCost_pu, RampingCost, PartLoadMin, MinEfficiency, StartUpTime, CO2Intensity.
    <br>
    This is done for all the units whose corresponding field is empty or zero.
    <br>
    All the technical features are taken from the aggregated database EU_Power_Units_Technical_Features.csv, which was built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The values are selected based on the Technology, Fuel and PowerCapacity features.
     <br>
    This process is done for each zone.
</div>

In [None]:
columns_to_copy = [
    'MinUpTime',
    'MinDownTime',
    'RampUpRate',
    'RampDownRate',
    'StartUpCost',
    'NoLoadCost_pu',
    'RampingCost',
    'PartLoadMin',
    'MinEfficiency',
    'StartUpTime',
    'CO2Intensity'
]
common_columns = ['PowerCapacity', 'Technology', 'Fuel']

In [None]:
for column_to_copy in columns_to_copy:
    copy_technical_values(EU_Power_Units_Technical_Features_file_path, power_plants_clean_data_file_list, common_columns, column_to_copy)

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    8. Zone Final Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filtering by zone all the formatted files that result from the downloaded files assigned the "General" key at the beginning of the process.
    <br>
    Each row of these files is copied, according to its zone classification, to the corresponding zone folder.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    8.1. Zone Field Homogenization
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Removing leading and trailing space characters from the Zone fields to avoid errors in the zone classification process.
    <br>
    This process is done for each zone.
</div>

In [None]:
# List of file paths
#power_plants_clean_data_file_list = [
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240407_201458/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
#    '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv'
#]

# Iterate over each file
for file_path in power_plants_clean_data_file_list:
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)

    # Trim values in the 'Zone' column
    if 'Zone' in df.columns:
        df['Zone'] = df['Zone'].str.strip()

        # Write the modified DataFrame back to the CSV file
        df.to_csv(file_path, index=False)

        print(f"Trimmed 'Zone' column values in {file_path}")
    else:
        print(f"'Zone' column not found in {file_path}")
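
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    A small illustration of why the trimming matters: the zone classification relies on exact string comparison, so a padded value such as "DE " would not match "DE". The sample values below are made up.
</div>

```python
import pandas as pd

# Zone codes as they might appear in a raw file, two of them padded
zones = pd.Series(['DE ', ' DK', 'CH'])

# Exact comparison misses the padded value
matches_before = int((zones == 'DE').sum())

# After trimming, the same comparison finds it
matches_after = int((zones.str.strip() == 'DE').sum())

print(matches_before, matches_after)
```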


<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filtering by zone all the formatted files that result from the downloaded files assigned the "General" key at the beginning of the process.
    <br>
    Each row of these files is copied, according to its zone classification, to the corresponding zone folder.
</div>

In [None]:
# File paths
#power_plants_raw_data_sources_file_path = '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240407_191506/2020_power_plants_raw_data_sources_20240407_191506.csv'

# Dictionary of created zones
#created_zones = {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
#                 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
#                 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
#                'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}

# Read the first file
first_df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Filter rows with Zone as "General"
general_zones_df = first_df[first_df['Zone'] == 'General']

# Iterate over rows with Zone as "General"
for index, row in general_zones_df.iterrows():
    # Get the path of the corresponding file
    file_path = row['Final_Clean_File_Path']
    
    # Read the second CSV file
    second_df = pd.read_csv(file_path)
    
    # Iterate over zone names
    for zone in zone_names:
        # Filter rows in the second file with matching Zone
        zone_rows = second_df[second_df['Zone'] == zone]
        
        # Check if there are matching rows
        if not zone_rows.empty:
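            # Get the corresponding zone path
            zone_path = created_zones.get(zone)

            # Skip zones that have no folder registered in created_zones
            if zone_path is None:
                print(f"No folder registered for zone {zone}; its rows stay in {file_path}.")
                continue

            # Create the directory if it does not exist
            os.makedirs(zone_path, exist_ok=True)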
            
            # Get the file name
            file_name = os.path.basename(file_path)
            zone_file_path = os.path.join(zone_path, file_name)
            
            # Copy the rows to the corresponding zone file
            if not os.path.exists(zone_file_path):
                zone_rows.to_csv(zone_file_path, index=False)
            else:
                zone_rows.to_csv(zone_file_path, mode='a', index=False, header=False)
            
            # Remove the copied rows from the original DataFrame
            second_df.drop(zone_rows.index, inplace=True)
            
            # Print message
            print(f"Copied rows to {zone_file_path} for zone {zone}.")
    
    # Write the updated DataFrame back to the second file
    second_df.to_csv(file_path, index=False)
    print(f"Original file {file_path} updated after row deletion.")
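
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The row-by-zone split performed above can also be pictured on an in-memory DataFrame. The sketch below is an assumption-laden illustration: the helper name split_by_zone and the sample units are hypothetical, and no files are read or written.
</div>

```python
import pandas as pd


def split_by_zone(df, zone_names):
    """Return ({zone: rows}, remaining rows) for the zones in zone_names.

    Hypothetical helper: rows whose Zone is not listed stay in `remaining`,
    mirroring the rows left behind in the "General" file.
    """
    known = df['Zone'].isin(zone_names)
    per_zone = {zone: rows for zone, rows in df[known].groupby('Zone')}
    return per_zone, df[~known]


# Illustrative data (made-up units)
general = pd.DataFrame({'Unit': ['u1', 'u2', 'u3', 'u4'],
                        'Zone': ['DE', 'BE', 'DE', 'FR']})
per_zone, remaining = split_by_zone(general, ['DE', 'DK', 'CH', 'BE'])

for zone, rows in per_zone.items():
    print(zone, rows['Unit'].tolist())
print('remaining', remaining['Unit'].tolist())
```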

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    9. Zone Clustering
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Grouping power units according to their location, company or name.
    <br>
    By default, this part clusters all the units that share the same latitude and longitude (if the data is provided).
     <br>
    However, it is also possible to cluster the units that share the same name or the same company (if that data is provided).
    <br>
    To do so, uncomment the corresponding line in the code below.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    9.1. Lat/Lon Column Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Joining the fields of the Lat and Lon columns into a new column named Lat/Lon for clustering purposes.
</div>

In [None]:
#created_zones = {
#    'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
#    'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
#    'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
#    'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'
#}

#data_year = '2020'

power_plants_clean_data_file_path_list = []

# Extract and format paths
for zone, path in created_zones.items():
  full_path = f"{path}/{data_year}.csv"
  power_plants_clean_data_file_path_list.append(full_path)

# Print the list of paths
print(power_plants_clean_data_file_path_list)

In [None]:
power_plants_clean_data_file_path_list

In [None]:
for file_path in power_plants_clean_data_file_path_list:
  # Read the CSV file
  df = pd.read_csv(file_path)

  # Check if "Lat" and "Lon" columns exist
  if 'Lat' in df.columns and 'Lon' in df.columns:
    # Create a new column named "Lat/Lon"; rows missing either coordinate get an empty string
    df['Lat/Lon'] = np.where((df['Lat'].notna()) & (df['Lon'].notna()), df['Lat'].astype(str) + "/" + df['Lon'].astype(str), '')

    # Save the modified DataFrame back to the CSV file
    df.to_csv(file_path, index=False)
  else:
    print(f"Warning: 'Lat' or 'Lon' column not found in {file_path}")

print("Lat/Lon column added to files (if columns exist).")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    9.2. Fields Cluster
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Clustering the units with repeated values in the Lat/Lon column.
     <br>
    The following code only clusters units with the same latitude and longitude if they also share the same Technology, Fuel and CHPType.
    <br>
    This process is done for each zone.
    <br>
    It is also possible to cluster by duplicate values in the unit name or the company (if the data exists). For this purpose, uncomment the corresponding lines.
</div>

In [None]:
# Column to be looked for
column_to_look_for = 'Lat/Lon'  # Default: cluster by location
#column_to_look_for = 'Unit'
#column_to_look_for = 'Company'


# Iterate over each file
for file_path in power_plants_clean_data_file_path_list:
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)

    # Check if the column to look for exists
    if column_to_look_for in df.columns:
        # Iterate over each value in the specified column
        for value in df[column_to_look_for].unique():
            # Check if the field has some value
            if pd.notnull(value):
                # Filter rows where the specified column has the exact same value
                matched_rows = df[df[column_to_look_for] == value]

                # Check if there are more than one row with the same value
                if len(matched_rows) > 1:
                    # Perform additional matching based on Technology, Fuel, and CHPType
                    # You can adjust the conditions as per your requirement
                    matched_rows = matched_rows.groupby(['Technology', 'Fuel', 'CHPType']).filter(lambda x: len(x) > 1)

                    if len(matched_rows) > 1:
                        # Fields that add up across the clustered units
                        sum_columns = ['PowerCapacity', 'Nunits', 'StartUpCost', 'NoLoadCost_pu',
                                       'RampingCost', 'CHPPowerToHeat', 'CHPMaxHeat', 'STOCapacity',
                                       'WaterWithdrawal', 'WaterConsumption']

                        # Fields that are averaged across the clustered units
                        # ('CHPMaxHeat' is summed above, so it is not averaged here)
                        mean_columns = ['Efficiency', 'MinUpTime', 'MinDownTime', 'RampUpRate',
                                        'RampDownRate', 'PartLoadMin', 'MinEfficiency', 'StartUpTime',
                                        'CO2Intensity', 'CHPPowerLossFactor', 'COP',
                                        'Tnominal', 'coef_COP_a', 'coef_COP_b', 'STOSelfDischarge',
                                        'STOMaxChargingPower', 'STOChargingEfficiency']

                        # Sum and average the corresponding fields
                        summed_fields = matched_rows[sum_columns].sum()
                        averaged_fields = matched_rows[mean_columns].mean()

                        # Keep the first row and update its values with the sums and averages
                        first_row_index = matched_rows.index[0]
                        df.loc[first_row_index, sum_columns] = summed_fields
                        df.loc[first_row_index, mean_columns] = averaged_fields

                        # Copy the matched rows to a new DataFrame
                        clustered_df = matched_rows.copy()

                        # Write the matched rows to a new CSV file
                        output_file_path = os.path.splitext(file_path)[0] + '_clustered.csv'
                        clustered_df.to_csv(output_file_path, index=False)

                        # Erase the merged rows from the original DataFrame, keeping the aggregated first row
                        df.drop(matched_rows.index[1:], inplace=True)

    # Write the modified DataFrame back to the original CSV file
    df.to_csv(file_path, index=False)
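
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The clustering rule described in this section (sum the additive fields, average the rate-like fields for units that share location, Technology, Fuel and CHPType) can also be expressed with a pandas groupby. The sketch below is an alternative illustration, not the cell above: cluster_units is a hypothetical helper, each combination of the four keys is merged into exactly one row, and units with an empty CHPType are grouped together (dropna=False, which needs pandas 1.1 or later). The sample values are made up.
</div>

```python
import pandas as pd


def cluster_units(df, key_column, sum_columns, mean_columns):
    """Merge units sharing key_column, Technology, Fuel and CHPType.

    Hypothetical helper; rows without a key value are left untouched.
    """
    group_columns = [key_column, 'Technology', 'Fuel', 'CHPType']
    has_key = df[key_column].notna()
    # Default: keep the first value; override for summed and averaged fields
    aggregation = {c: 'first' for c in df.columns if c not in group_columns}
    aggregation.update({c: 'sum' for c in sum_columns if c in df.columns})
    aggregation.update({c: 'mean' for c in mean_columns if c in df.columns})
    clustered = (df[has_key]
                 .groupby(group_columns, dropna=False, sort=False, as_index=False)
                 .agg(aggregation))
    # Restore the original column order and append the rows without a key
    return pd.concat([clustered[df.columns], df[~has_key]], ignore_index=True)


units = pd.DataFrame({
    'Unit': ['A', 'B', 'C', 'D'],
    'Lat/Lon': ['50.1/4.3', '50.1/4.3', '50.1/4.3', None],
    'Technology': ['COMC', 'COMC', 'STUR', 'COMC'],
    'Fuel': ['GAS', 'GAS', 'GAS', 'GAS'],
    'CHPType': [None, None, None, None],
    'PowerCapacity': [100.0, 200.0, 50.0, 80.0],
    'Efficiency': [0.5, 0.6, 0.4, 0.55],
})
result = cluster_units(units, 'Lat/Lon', ['PowerCapacity'], ['Efficiency'])
print(result)
```

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    In this example units A and B collapse into one row, while C (different Technology) and D (no location) are kept as they are.
</div>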

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
10. Copying Power Plants Formatted Data
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Copying the clean, formatted power unit data to the main Dispa-SET database directory.
     <br>
    <div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 15px; font-family: TimesNewRoman;">
    - The database directory has to be identified first.
    </div>
</div>

In [None]:
additional_path_1 = "/Database/PowerPlants/"

# Construct the power_plants_data_base_folder_path variable
power_plants_data_base_folder_path = dispaSET_unleash_folder_path + additional_path_1

In [None]:
# Iterate over each zone in the list
for zone in zone_names:
    # Define the source file path
    source_file_path = os.path.join(power_plants_raw_data_folder_path, zone, f"{data_year}.csv")
    
    # Define the destination folder path
    destination_folder_path = os.path.join(power_plants_data_base_folder_path, zone)
    
    # Define the destination file path
    destination_file_path = os.path.join(destination_folder_path, f"{data_year}.csv")
    
    # Check if the source file exists
    if os.path.exists(source_file_path):
        # Ensure the destination folder exists
        os.makedirs(destination_folder_path, exist_ok=True)
        
        # Copy the file
        shutil.copyfile(source_file_path, destination_file_path)
        print(f"Copied {source_file_path} to {destination_file_path}")
    else:
        print(f"Source file does not exist: {source_file_path}")

print("Task completed.")

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    11. Final considerations
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    All the files in the format required for Dispa-SET simulations are located in the folders named after the zones created in section 3, following this path pattern:
    Local/path/to/dispa-SET/folder/RawData/PowerPlants/zone_created
    <br>
    All the formatted files are named after the value of the variable "data_year" entered in section 4.
     <br>
    The files can be copied to wherever they are needed within the Dispa-SET directory environment.
    <br>
    On the other hand, the formatted files still contain columns that are not useful for Dispa-SET runs. Since these columns are not taken into account in the simulation, removing them manually is optional. A future version of this code will include a final step to eliminate these columns automatically.
</div>