<div style="text-align: center; margin-left: 0em; font-weight: bold; font-size: 20px; font-family: TimesNewRoman;">
    POWER PLANTS DATA PROCESSING - Main Notebook
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Each part of this notebook was used to process the raw data for the power plant units of the Dispa-SET_Unleash project. Read the explanatory text cells to follow the whole process step by step, up to the final results.
</div>

<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    1. Notebook Set Up
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Importing needed libraries
</div>

In [157]:
import os
import csv
from datetime import datetime
import requests
import pandas as pd
from shutil import move

<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    2. Dispa-SET_Unleash Folder Path
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Dynamically determining the dispaSET_unleash_folder_path from the location of the "Dispa-SET_Unleash" folder relative to the current working directory. The code assumes that the notebook runs from a folder two levels below "Dispa-SET_Unleash". If the "Dispa-SET_Unleash" folder is copied to a different machine or location, the dispaSET_unleash_folder_path variable adjusts automatically.
</div>
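
<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The path logic depends on how deep the working directory sits inside the project. As an alternative, a minimal sketch (not part of the original notebook) can search upward for a folder with the expected name. The helper find_project_root and the example subfolders "Scripts/PowerPlants" are hypothetical.
</div>

```python
from pathlib import Path

def find_project_root(start, folder_name="Dispa-SET_Unleash"):
    """Return the nearest ancestor of `start` (or `start` itself) named `folder_name`."""
    start = Path(start)
    for candidate in [start, *start.parents]:
        if candidate.name == folder_name:
            return candidate
    raise FileNotFoundError(f"No folder named '{folder_name}' above {start}")

# Example with a hypothetical working directory two levels below the project folder
example_root = find_project_root("/home/ray/Dispa-SET_Unleash/Scripts/PowerPlants")
print(example_root)
```

Searching by name keeps working if the notebook is later moved to a different depth inside the project, at the cost of failing when the project folder is renamed.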

In [140]:
# Get the current working directory
current_directory = os.getcwd()

# Go one level up from the current working directory
dispaSET_unleash_parent_directory = os.path.dirname(current_directory)

# Go one more level up to reach the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_path = os.path.dirname(dispaSET_unleash_parent_directory)

# Get the name of the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_name = os.path.basename(dispaSET_unleash_folder_path)

print("dispaSET_unleash_folder_name:", dispaSET_unleash_folder_name)
print("dispaSET_unleash_folder_path:", dispaSET_unleash_folder_path)

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash


<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    3. Zone(s) Creation
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering the zone name, or names if more than one zone is to be modelled, to create the folders where all data related to each zone will be stored.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    For European countries, use the ISO 3166-1 alpha-2 codes (AT, BE, BG, CH, etc.) as the zone_name.
    <br>
    For non-European countries, it is better to set the zone_name to the same label used in the data to be downloaded and processed, e.g.
    <br>
    <div style="text-align: left; margin-left: 1.50em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        If a csv file with all the power plants of Spain is downloaded but only the units of the city of Pamplona are wanted, and the downloaded file refers to Pamplona with the acronym "PMPLN", set the zone_name variable to "PMPLN".
</div>
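
<div style="text-align: left; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    The reason for reusing the label found in the data is that later filtering can then be a plain equality test. The sketch below illustrates this with made-up data; the helper rows_for_zone, the column name "city_code" and all values are hypothetical.
</div>

```python
import csv
import io

def rows_for_zone(csv_text, zone_column, zone_name):
    """Return the rows whose `zone_column` value equals `zone_name`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[zone_column] == zone_name]

# Made-up sample data: the column name "city_code" and all values are hypothetical
sample = "name,city_code\nPlant A,PMPLN\nPlant B,MDRD\nPlant C,PMPLN\n"
zone_name = "PMPLN"
selected = rows_for_zone(sample, "city_code", zone_name)
print([row["name"] for row in selected])  # ['Plant A', 'Plant C']
```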

In [141]:
# List of folder names to create
zone_names = ["DE", 
              "DK", 
              "CH",
              "BE"]

In [142]:
# Original value of dispaSET_unleash_folder_path
#dispaSET_unleash_folder_path = "/home/ray/Dispa-SET_Unleash"

# Additional string to be appended
additional_path = "/RawData/PowerPlants/"

# Construct the power_plants_raw_data_folder_path variable
power_plants_raw_data_folder_path = dispaSET_unleash_folder_path + additional_path
print("power_plants_raw_data_folder_path:", power_plants_raw_data_folder_path)

# Dictionary to store created zone paths
created_zones = {}

# Create the zone
for zone_name in zone_names:
    zone_path = os.path.join(power_plants_raw_data_folder_path, zone_name)
    os.makedirs(zone_path, exist_ok=True)
    created_zones[zone_name] = zone_path
    print(f"Created zone: {zone_path}")

# Print the created zone paths
print("Created zones:")
for zone_name, zone_path in created_zones.items():
    print(f"{zone_name}: {zone_path}")
    
created_zones

power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE
Created zones:
DE: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
DK: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK
CH: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH
BE: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE


{'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [154]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}


<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    4. Download Link Sources
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering all the download links where the raw data is located.
    <br>
        This list is saved and used as input for the next stages.
      <br>
    <div style="text-align: left; margin-left: 1.50em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        Note that, for the data to be processed, every link has to download a .csv file.
    <br>
    <div style="text-align: left; margin-left: -1.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    On the other hand, it is important to define which zone each download link refers to.
    <br>
    <div style="text-align: left; margin-left: 1.5em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    If a downloaded file contains data that belongs to only one zone, specify that zone in the variable download_links_zone_related, using the same order as in the variable download_links.
    <br>
    If a downloaded file contains data for several zones at the same time, mark it with the word "General" in the variable download_links_zone_related, using the same order as in the variable download_links.
    <br>
    <div style="text-align: left; margin-left: 2.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Remember that the next filtering stages depend on the correct setting of this step.
    <br>
    <div style="text-align: left; margin-left: -1.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Additionally, indicate the year that the data refers to.
    <br>
    <div style="text-align: left; margin-left: 1.5em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    This year is used as the root of the names of all the files created next.
</div>
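
<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Because the two lists are matched by position, a missing or misspelled entry would silently pair a link with the wrong zone. A minimal sketch of a consistency check is given below; check_link_zone_pairs is a hypothetical helper that is not part of the original notebook, and the URLs are placeholders.
</div>

```python
def check_link_zone_pairs(links, zones, known_zones):
    """Raise ValueError if the two lists are misaligned or name an unknown zone."""
    if len(links) != len(zones):
        raise ValueError(f"{len(links)} links but {len(zones)} zone labels")
    unknown = [z for z in zones if z != "General" and z not in known_zones]
    if unknown:
        raise ValueError(f"Zone labels without a zone folder: {unknown}")
    return list(zip(links, zones))

# Placeholder URLs, used only for illustration
pairs = check_link_zone_pairs(
    ["https://example.org/all.csv", "https://example.org/de.csv"],
    ["General", "DE"],
    known_zones=["DE", "DK", "CH", "BE"],
)
print(pairs)
```

In the notebook, such a check could be called with download_links, download_links_zone_related and zone_names before the list is saved.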

In [144]:
# List of the download links:
download_links = [
    'https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_EU.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_DE.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_DK.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_CH.csv',
    'https://opendata.elia.be/api/explore/v2.1/catalog/datasets/ods036/exports/csv?lang=en&timezone=Europe%2FBrussels&use_labels=true&delimiter=%3B'
]

In [145]:
# List of zones related to the download links:
download_links_zone_related = [
    'General',
    'DE',
    'DK',
    'CH',
    'BE'
]

In [146]:
# Year the data refers to:
data_year = '2020'

In [147]:
def save_download_links_to_csv(links, zones, folder_path, data_year):
    # Create the filename using the data year, current date, and time
    now = datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    file_name = f"{data_year}_power_plants_raw_data_sources_{timestamp}.csv"
    
    # Create a folder with the same name as the file (without extension)
    folder_name = os.path.splitext(file_name)[0]
    folder_path = os.path.join(folder_path, folder_name)
    os.makedirs(folder_path, exist_ok=True)
    
    # Combine the folder path and filename
    file_path = os.path.join(folder_path, file_name)
    
    # Write links to CSV file
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        
        # Write the header
        writer.writerow(['Download_Link_Sources', 'Zone', 'File_Name'])
        
        # Write the links, zones, and file names
        for i, (link, zone) in enumerate(zip(links, zones), start=1):
            writer.writerow([link, zone, i])
    
    print(f"Download links saved to: {file_path}")
    
    return file_path, file_name

# Save the download links to a CSV file and get the file path and name
power_plants_raw_data_sources_file_path, power_plants_raw_data_sources_file_name = save_download_links_to_csv(download_links, download_links_zone_related, power_plants_raw_data_folder_path, data_year)

print("File path:", power_plants_raw_data_sources_file_path)
print("File name:", power_plants_raw_data_sources_file_name)

Download links saved to: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020_power_plants_raw_data_sources_20240405_083658.csv
File path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020_power_plants_raw_data_sources_20240405_083658.csv
File name: 2020_power_plants_raw_data_sources_20240405_083658.csv


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [155]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020_power_plants_raw_data_sources_20240405_083658.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240405_083658.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']


<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    5. Power Plants Raw Data Download Files
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Using the download list saved previously to download all the power plant raw data files and save them inside the folder that contains the file given by the variable power_plants_raw_data_sources_file_path.
    <br>
    Each downloaded file is named after the position of its link in the download_links list (1, 2, 3, ...).
</div>
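
<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    A request that hangs or raises an exception would interrupt the whole download loop. The sketch below shows one possible variant that uses a timeout and collects failures instead of stopping. It is an illustration only: it uses urllib.request from the standard library instead of requests, and the helpers fetch_bytes, download_all and fake_fetch, as well as the URLs, are hypothetical.
</div>

```python
import os
import tempfile
import urllib.request

def fetch_bytes(url, timeout=60):
    """Download `url` and return its body as bytes (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def download_all(rows, download_folder, fetch=fetch_bytes):
    """Save each row's link under its File_Name; return (saved, failed) lists."""
    saved, failed = [], []
    for row in rows:
        link = row["Download_Link_Sources"]
        target = os.path.join(download_folder, str(row["File_Name"]))
        try:
            content = fetch(link)
        except Exception as error:  # keep going so one bad link does not stop the rest
            failed.append((link, str(error)))
            continue
        with open(target, "wb") as handle:
            handle.write(content)
        saved.append(target)
    return saved, failed

# Offline demonstration with a stand-in fetch function and placeholder URLs
def fake_fetch(url):
    if "missing" in url:
        raise OSError("simulated network error")
    return b"a,b\n1,2\n"

demo_rows = [
    {"Download_Link_Sources": "https://example.org/ok.csv", "File_Name": 1},
    {"Download_Link_Sources": "https://example.org/missing.csv", "File_Name": 2},
]
with tempfile.TemporaryDirectory() as demo_folder:
    saved, failed = download_all(demo_rows, demo_folder, fetch=fake_fetch)
    print(len(saved), len(failed))  # 1 1
```

Passing the fetch function as a parameter keeps the loop testable without network access; with the default fetch_bytes, the rows could come from csv.DictReader over the sources file.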

In [149]:
def download_files_from_csv(csv_file_path):
    # Save the downloaded files in the same folder as the CSV file
    download_folder = os.path.dirname(csv_file_path)
    
    # Open and read the CSV file
    with open(csv_file_path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        
        # Iterate over each row
        for row in reader:
            download_link = row['Download_Link_Sources']
            file_name = row['File_Name']
            
            # Download the file from the URL
            response = requests.get(download_link)
            
            # Check if the request was successful
            if response.status_code == 200:
                # Save the downloaded file
                file_path = os.path.join(download_folder, file_name)
                with open(file_path, 'wb') as f:
                    f.write(response.content)
                print(f"File '{file_name}' downloaded and saved successfully.")
            else:
                print(f"Failed to download file from '{download_link}'.")

# Path to the recently created CSV file
recently_created_csv_file_path = power_plants_raw_data_sources_file_path

# Call the function to download files from the CSV
download_files_from_csv(recently_created_csv_file_path)

File '1' downloaded and saved successfully.
File '2' downloaded and saved successfully.
File '3' downloaded and saved successfully.
File '4' downloaded and saved successfully.
File '5' downloaded and saved successfully.


<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    6. Zone Classification
</div>

<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filtering the data contained in each downloaded file according to the zone previously specified.
</div>

<div style="text-align: left; margin-left: 1.5em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.1. Zone Definition
</div>

<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Adding a new column named "Country" to each downloaded file that was associated with a zone key in the download_links_zone_related list filled in previously.
    <br>
    Files set with the key "General" are assumed to contain data from several zones, so they are filtered in a different way.
</div>
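
<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    One of the download links requests a semicolon delimiter (delimiter=%3B), while a comma is the default separator when reading csv files, so that file may be read as a single column. The sketch below shows one way to detect the delimiter before adding the "Country" column. It uses only the standard library, and the helpers detect_delimiter and add_country_column and the sample data are hypothetical.
</div>

```python
import csv
import io

def detect_delimiter(header_line, candidates=(",", ";", "\t")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)

def add_country_column(csv_text, zone):
    """Return comma-separated CSV text with a constant 'Country' column appended."""
    delimiter = detect_delimiter(csv_text.splitlines()[0])
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(rows[0] + ["Country"])
    for row in rows[1:]:
        writer.writerow(row + [zone])
    return output.getvalue()

# Made-up semicolon-separated sample; column names are hypothetical
sample = "Unit;Capacity\nPlant A;100\nPlant B;250\n"
print(add_country_column(sample, "BE"))
```

Counting candidate characters in the header is a simple heuristic; it can pick the wrong delimiter when header names themselves contain commas or semicolons.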

In [150]:
# Read the CSV file specified in power_plants_raw_data_sources_file_path
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])  # Convert to string
    zone = row['Zone']
    
    # Check if the zone is not "General"
    if zone != "General":
        # Construct the path to the corresponding CSV file
        csv_file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
        
        # Check if the CSV file exists
        if os.path.exists(csv_file_path):
            # Read the CSV file
            df_csv = pd.read_csv(csv_file_path)
            
            # Add a new column "Country" with the value from the "Zone" column
            df_csv['Country'] = zone
            
            # Write the updated DataFrame back to the CSV file
            df_csv.to_csv(csv_file_path, index=False)
            
            print(f"Added 'Country' column to {file_name} with value '{zone}'")
        else:
            print(f"CSV file {file_name} does not exist.")
    else:
        print(f"No action needed for {file_name}")


No action needed for 1


Added 'Country' column to 2 with value 'DE'
Added 'Country' column to 3 with value 'DK'
Added 'Country' column to 4 with value 'CH'
Added 'Country' column to 5 with value 'BE'


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [156]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020_power_plants_raw_data_sources_20240405_083658.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240405_083658.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']


<div style="text-align: left; margin-left: 1.5em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.2. Raw Data File Zone Classification
</div>

<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Moving each downloaded file to its corresponding zone folder according to the download_links_zone_related list.
    <br>
    The files associated with the key "General" keep their current location.
</div>

In [None]:
# Read the CSV file specified in power_plants_raw_data_sources_file_path
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])  # Convert to string
    zone = row['Zone']
    file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
    
    # Check if the file exists
    if os.path.exists(file_path):
        # Read only the header row; the data rows are not needed for this check
        df_csv = pd.read_csv(file_path, nrows=0)
        
        # Check if the file has the header 'Country'
        if 'Country' in df_csv.columns:
            # Get the corresponding value of 'Zone'
            zone_value = row['Zone']
            
            # Check if the zone folder exists
            if zone_value in created_zones:
                # Construct the destination folder path
                destination_folder = created_zones[zone_value]
                
                # Move the file to the destination folder
                destination_file_path = os.path.join(destination_folder, file_name)
                move(file_path, destination_file_path)
                print(f"Moved file '{file_name}' to '{destination_folder}'")
            else:
                print(f"Destination folder for zone '{zone_value}' does not exist.")
        else:
            print(f"No 'Country' header found in file '{file_name}'. No action needed.")
    else:
        print(f"File '{file_name}' does not exist.")

<div style="text-align: left; margin-left: 0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    7. Data Formatting
</div>

<div style="text-align: left; margin-left: 1.5em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.1. Clean Data File Creation
</div>

<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Creating, for each zone, an empty csv file whose headers are all the technical features needed for Dispa-SET simulations.
    <br>
    Each file is named after the value of the variable data_year specified previously.
    <br>
    All the data filtered in the following steps will be written to this csv file.
</div>

In [160]:
# Define the headers for the new CSV files
headers = [
    "", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
    "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
    "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
    "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
    "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
    "Company"
]

# Function to create CSV file in a given folder
def create_csv_file(folder_path, data_year):
    # Construct the file path for the new CSV file
    csv_file_path = os.path.join(folder_path, f"{data_year}.csv")
    
    # Write the headers to the new CSV file
    with open(csv_file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)
    
    print(f"Created CSV file at: {csv_file_path}")
    return csv_file_path

# Iterate over each zone name and its corresponding folder path
for zone_name, folder_path in created_zones.items():
    create_csv_file(folder_path, data_year)

# Create a CSV file in the folder of power_plants_raw_data_sources_file_path
create_csv_file(os.path.dirname(power_plants_raw_data_sources_file_path), data_year)


Created CSV file at: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
Created CSV file at: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv
Created CSV file at: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv
Created CSV file at: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv
Created CSV file at: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020.csv


'/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240405_083658/2020.csv'

<div style="text-align: left; margin-left: 1.5em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.2. General Raw Data Files Zone Classification
</div>

<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Copying all the data from the downloaded files set with the key "General" to the clean data file in their corresponding zone folder.
</div>
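
<div style="text-align: left; margin-left: 3.5em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    A possible first step for this stage is to group the rows of a "General" file by zone and keep only the zones that were created. The sketch below illustrates the idea with made-up data; the helper split_rows_by_zone is hypothetical, and the column name "country" is an assumption about the source file.
</div>

```python
import csv
import io
from collections import defaultdict

def split_rows_by_zone(csv_text, zone_column, zone_names):
    """Group the rows of a multi-zone table by zone, keeping only the requested zones."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        zone = row[zone_column]
        if zone in zone_names:
            groups[zone].append(row)
    return dict(groups)

# Made-up sample; the column name "country" is an assumption about the source file
sample = "name,country,capacity\nA,DE,100\nB,FR,50\nC,DK,20\nD,DE,300\n"
by_zone = split_rows_by_zone(sample, "country", ["DE", "DK", "CH", "BE"])
for zone, rows in by_zone.items():
    print(zone, [row["name"] for row in rows])
```

Each group could then be appended to the clean data file of its zone folder, for example through the paths stored in created_zones.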

<div style="text-align: left; margin-left: 1.5em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.5. Clean Data File Creation
</div>

            3. POWER PLANTS DATA FORMATTING

3.1. Defining the file name of the clean data and the directory where it will be stored.
    
    It is recommended to name the power plants clean data file after the year the data refers to.
    Additionally, make sure to give the file name the appropriate extension, .csv in this case:

    power_plants_clean_data_file_name = "20##.csv"
        e.g.
    power_plants_clean_data_file_name = "2020.csv"

In [12]:
# Define the power plants data file name
#power_plants_raw_data_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/0"
power_plants_clean_data_file_name = "2020.csv"  # Replace with the desired file name
# Call the external Python script with defined variables
#%run -i Power_Plants_Data_Formating.py {power_plants_raw_data_folder_path} {power_plants_data_file_name} 

3.2. Creating .csv clean data file with its respective headers.

In [13]:
import sys
import pandas as pd
import os

def create_csv_file(folder_path, file_name, data=None):
  """
  Creates a CSV file with predefined headers using pandas.

  Args:
      folder_path (str): The path of the folder where the CSV file will be created.
      file_name (str): The name of the CSV file.
      data (list of lists or pandas.DataFrame, optional): Data to write to the CSV (default: None).

  Returns:
      str or None: The path of the created CSV file, or None if an error was reported.
  """

  headers = [
      "", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
      "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
      "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
      "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
      "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
      "Company"
  ]

  # Create empty DataFrame if no data provided
  if data is None:
      data = []
  elif not isinstance(data, (list, pd.DataFrame)):
      raise TypeError("Data must be a list of lists or a pandas.DataFrame")

  df = pd.DataFrame(data, columns=headers)

  # Combine the folder path with the file name
  file_path = os.path.join(folder_path, file_name)

  # Save the DataFrame to CSV; writing fails if file_path points to a directory
  try:
      df.to_csv(file_path, index=False)  # Don't include row index in CSV
  except IsADirectoryError:
      print(f"Error: '{file_path}' is a directory. Please provide a file name within the directory.")
      return None

  print(f"CSV file '{file_name}' created with headers: {', '.join(headers)}")
  return file_path

#power_plants_raw_data_folder_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/0"
#power_plants_clean_data_file_name = "2020.csv"

# Input variables from command line arguments
#power_plants_raw_data_folder_path = sys.argv[1]
#power_plants_data_file_name = sys.argv[2]


file_path = create_csv_file(power_plants_raw_data_folder_path, power_plants_clean_data_file_name, data=None)

power_plants_clean_data_folder_path = power_plants_raw_data_folder_path  # Copy the path to a new variable
power_plants_clean_data_file_path = file_path

print("power_plants_clean_data_folder_path:", power_plants_clean_data_folder_path)
print("power_plants_clean_data_file_path:", power_plants_clean_data_file_path)

CSV file '2020.csv' created with headers: , Unit, PowerCapacity, Nunits, Zone, Zone_th, Zone_h2, Technology, Fuel, Efficiency, MinUpTime, MinDownTime, RampUpRate, RampDownRate, StartUpCost, NoLoadCost_pu, RampingCost, PartLoadMin, MinEfficiency, StartUpTime, CO2Intensity, CHPType, CHPPowerToHeat, CHPPowerLossFactor, CHPMaxHeat, COP, Tnominal, coef_COP_a, coef_COP_b, STOCapacity, STOSelfDischarge, STOMaxChargingPower, STOChargingEfficiency, WaterWithdrawal, WaterConsumption, Status, Source, Company

- Verifying variables

  These cells only confirm the file names, file paths and other information related to the data being processed.
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.

In [14]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv


3.3. Copying the corresponding data to the power_plants_clean_data_file 

    This part compares the headers of power_plants_raw_data_file and power_plants_clean_data_file, selects the ones they have in common according to the equivalences listed in power_plants_clean_data_equivalent_headers_file, and then copies all the data of the selected columns.

In [15]:
power_plants_clean_data_equivalent_headers_file_name = 'power_plants_clean_data_equivalent_headers.csv'
# Build the path from the PowerPlants folder instead of hard-coding it, so it follows the project location
power_plants_clean_data_equivalent_headers_file_path = os.path.join(zone_folder_path, power_plants_clean_data_equivalent_headers_file_name)

In [16]:
import pandas as pd

# Define file paths
power_plants_raw_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv"
power_plants_clean_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv"
power_plants_clean_data_equivalent_headers_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_clean_data_equivalent_headers.csv"

# Read the headers from the raw data file
raw_data_headers = pd.read_csv(power_plants_raw_data_file_path, nrows=0).columns.tolist()

# Read the equivalent headers file
equivalent_headers_df = pd.read_csv(power_plants_clean_data_equivalent_headers_file_path)

# Initialize an empty dictionary to store mapping of equivalent headers to dispaset headers
equivalent_to_dispaset_mapping = {}

# Iterate over the rows in the equivalent headers DataFrame
for index, row in equivalent_headers_df.iterrows():
    # Get the equivalent headers from all columns
    equivalent_headers = []
    for i in range(1, 9):  # Assuming there are 8 columns for equivalent headers
        col_name = f'Equivalent_Headers_{i}'
        if col_name in row and isinstance(row[col_name], str):
            equivalent_headers.extend(row[col_name].split(','))
    
    # Get the dispaset headers
    dispaset_headers = row['Dispaset_Headers'].split(',')
    
    # Map each equivalent header to its corresponding dispaset header
    for eq_header, disp_header in zip(equivalent_headers, dispaset_headers):
        equivalent_to_dispaset_mapping[eq_header] = disp_header

# Read the clean data file to get all existing headers
existing_headers_df = pd.read_csv(power_plants_clean_data_file_path, nrows=0)
existing_headers = existing_headers_df.columns.tolist()

# Read the raw data file
raw_data_df = pd.read_csv(power_plants_raw_data_file_path)

# Initialize a new DataFrame to store the copied columns along with existing headers
clean_data_df = pd.DataFrame()

# Iterate over the existing headers in the clean data file
for existing_header in existing_headers:
    # Find the raw-data headers that are equivalent to this clean-data header (possibly none)
    equivalent_headers = [key for key, value in equivalent_to_dispaset_mapping.items() if value == existing_header]
    for equivalent_header in equivalent_headers:
        if equivalent_header in raw_data_df.columns:
            # Copy the corresponding column from the raw data file to the clean data DataFrame
            clean_data_df[existing_header] = raw_data_df[equivalent_header]
            break  # Copy only the first matching column
    else:
        # No equivalent column was found in the raw data: keep the header as an empty column,
        # so the clean data file never loses one of its template headers
        clean_data_df[existing_header] = None

# Write the clean data DataFrame to the clean data file
clean_data_df.to_csv(power_plants_clean_data_file_path, index=False)

print("Columns copied successfully.")


Columns copied successfully.
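
The copy step above can also be expressed with pandas' own column operations. The following is a minimal sketch on in-memory data, not the notebook's method: the raw column names, the mapping, and the template headers are illustrative assumptions. It assumes that each Dispa-SET header has at most one matching raw column, whereas the cell above takes the first match when there are several.

```python
import pandas as pd

# Hypothetical raw data with source-specific column names
raw_df = pd.DataFrame({
    "name_bnetza": ["Plant A", "Plant B"],
    "capacity_net_bnetza": [100.0, 250.0],
    "comment": ["x", "y"],
})

# Hypothetical mapping: raw header -> Dispa-SET header
raw_to_dispaset = {"name_bnetza": "Unit", "capacity_net_bnetza": "PowerCapacity"}

# Hypothetical subset of the template headers of the clean file
clean_headers = ["Unit", "PowerCapacity", "Nunits", "Zone"]

# Rename the mapped columns, then reindex to the template: mapped columns are
# copied, template headers without a match become empty (NaN) columns, and
# raw columns outside the template are dropped
clean_df = raw_df.rename(columns=raw_to_dispaset).reindex(columns=clean_headers)
print(clean_df)
```

Reindexing against the template guarantees that the output always has exactly the template's columns, in the template's order.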


- Verifying variables

  These cells simply confirm the file names, file paths, and other information related to the data being processed.
  They also make sure the inputs for the following cells are already defined, so the same information does not have to be re-entered each time.

In [17]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.4. Zone Field Filter

    This part removes all the data that does not belong to the zone (country, region, state, etc.) to be modeled.
    This step can be skipped if you would rather use a single file containing the units of all zones. However, it is recommended to keep the Dispa-SET directory structure, i.e. to store all the power plant data related to a zone in a separate folder.
    Additionally, if the name of the zone to be filtered differs from the zone_folder_name variable, uncomment the first line of the next cell, enter the corresponding zone name there, and comment out the line that follows it.

In [18]:
#zone_name= "Set the name of zone to be filtered"
zone_name = zone_folder_name

In [19]:
import pandas as pd

# Variables
#power_plants_clean_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv"
#zone_name = zone_folder_name

# Read CSV file
df = pd.read_csv(power_plants_clean_data_file_path)

# Filter rows where the "Zone" column matches zone_name
filtered_df = df[df["Zone"] == zone_name]

# Overwrite the CSV file with the filtered DataFrame
filtered_df.to_csv(power_plants_clean_data_file_path, index=False)

print("Rows filtered successfully.")

Rows filtered successfully.
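
Because this step overwrites the clean data file in place, a zone name that does not match any row would silently leave an empty file. As a minimal sketch on made-up data (the unit names and zones are illustrative assumptions), the filter can report how many rows it keeps and removes before anything is written:

```python
import pandas as pd

# Illustrative data; in the notebook, df comes from the clean data file
df = pd.DataFrame({
    "Unit": ["Plant A", "Plant B", "Plant C"],
    "Zone": ["DE", "FR", "DE"],
})
zone_name = "DE"

# Boolean mask that is True for the rows belonging to the requested zone
mask = df["Zone"] == zone_name
filtered_df = df[mask]

removed_rows = len(df) - len(filtered_df)
print(f"Kept {len(filtered_df)} rows, removed {removed_rows}")
```

Checking that `len(filtered_df)` is greater than zero before calling `to_csv` is one simple way to avoid overwriting the file with an empty table.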


- Verifying variables

  These cells simply confirm the file names, file paths, and other information related to the data being processed.
  They also make sure the inputs for the following cells are already defined, so the same information does not have to be re-entered each time.

In [20]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.5. Nunits Field Filter

    This part assigns the value 1 to every cell of the Nunits column that does not contain any data.

In [21]:
import pandas as pd

# Variables
#power_plants_clean_data_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv"
Header_3 = "Nunits"

# Read CSV file
df = pd.read_csv(power_plants_clean_data_file_path)

# Fill empty cells in the "Nunits" column with 1
df[Header_3] = df[Header_3].fillna(1)

# Write the updated DataFrame back to the CSV file
df.to_csv(power_plants_clean_data_file_path, index=False)

print("Empty cells in the 'Nunits' column have been filled with 1.")

Empty cells in the 'Nunits' column have been filled with 1.
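
A minimal sketch of the same fill on made-up data (the unit names and values are illustrative assumptions), which also counts how many cells were empty so the effect of the step can be checked:

```python
import numpy as np
import pandas as pd

# Illustrative data: two units have no Nunits value
df = pd.DataFrame({
    "Unit": ["Plant A", "Plant B", "Plant C"],
    "Nunits": [2.0, np.nan, np.nan],
})

# Count the empty cells before filling them
missing_before = int(df["Nunits"].isna().sum())

# Empty cells are treated as a single unit
df["Nunits"] = df["Nunits"].fillna(1)
print(f"Filled {missing_before} empty cells")
```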


- Verifying variables

  These cells simply confirm the file names, file paths, and other information related to the data being processed.
  They also make sure the inputs for the following cells are already defined, so the same information does not have to be re-entered each time.

In [22]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.6. Unit Field Filter

    This part merges all the units that share the same name, Technology and Fuel, adding up their PowerCapacity, Nunits and CHPMaxHeat values.
    Additionally, it takes the average of the Efficiency field as the resulting value.

In [23]:
import pandas as pd

# Variables
Header_1 = "Unit"
Header_2 = "PowerCapacity"
Header_3 = "Nunits"
Header_4 = "Technology"
Header_5 = "Fuel"
Header_6 = "Efficiency"
Header_7 = "PowerCapacity"
Header_8 = "CHPType"
Header_9 = "CHPMaxHeat"

# Read CSV file
df = pd.read_csv(power_plants_clean_data_file_path)

# Find duplicated cells in the Header_1 column
duplicated_units = df[df.duplicated(subset=[Header_1], keep=False)]

# Process duplicated units
for unit in duplicated_units[Header_1].unique():
    # Filter rows with the current duplicated unit (explicit copy, so the CHPMaxHeat
    # conversion below does not raise pandas' SettingWithCopyWarning)
    unit_rows = df[df[Header_1] == unit].copy()
    
    # Check if there are matching values in Header_4 and Header_5
    if unit_rows[Header_4].nunique() == 1 and unit_rows[Header_5].nunique() == 1:
        # Calculate the average of Header_6
        average_efficiency = unit_rows[Header_6].mean()
        
        # Convert Header_9 to numeric and then sum its values
        unit_rows[Header_9] = pd.to_numeric(unit_rows[Header_9], errors='coerce')
        total_chp_max_heat = unit_rows[Header_9].sum()
        
        # Sum the values of Header_3 and Header_7
        total_nunits = unit_rows[Header_3].sum()
        total_power_capacity = unit_rows[Header_7].sum()
        
        # Put the sum and average results in the first duplicated row
        first_row_index = unit_rows.index[0]
        df.at[first_row_index, Header_3] = total_nunits
        df.at[first_row_index, Header_6] = average_efficiency
        df.at[first_row_index, Header_7] = total_power_capacity
        df.at[first_row_index, Header_9] = total_chp_max_heat
        
        # Update Header_8 with next non-empty value if necessary
        if pd.isnull(df.at[first_row_index, Header_8]):
            for index, row in unit_rows.iterrows():
                if not pd.isnull(row[Header_8]):
                    df.at[first_row_index, Header_8] = row[Header_8]
                    break
        
        # Drop other duplicated rows
        df.drop(unit_rows.index[1:], inplace=True)

# Write the updated DataFrame back to the CSV file
df.to_csv(power_plants_clean_data_file_path, index=False)

print("Duplicated cells in the 'Unit' column processed successfully.")


Duplicated cells in the 'Unit' column processed successfully.
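
The merge rule of this step (sum PowerCapacity, Nunits and CHPMaxHeat, average Efficiency) can also be sketched with a pandas group-by. This is an illustrative alternative on made-up data, not the notebook's method, and it differs in two respects. It groups directly by Unit, Technology and Fuel, so a unit name that appears with several technologies yields one row per combination instead of being left untouched. It also returns only the grouping and aggregated columns, so the remaining columns would have to be carried over separately.

```python
import pandas as pd

# Illustrative data: "Plant A" appears twice with the same Technology and Fuel
df = pd.DataFrame({
    "Unit": ["Plant A", "Plant A", "Plant B"],
    "PowerCapacity": [100.0, 50.0, 300.0],
    "Nunits": [1, 1, 1],
    "Technology": ["STUR", "STUR", "GTUR"],
    "Fuel": ["HRD", "HRD", "GAS"],
    "Efficiency": [0.40, 0.44, 0.35],
    "CHPMaxHeat": ["20", None, None],
})

# CHPMaxHeat may be read as text, so convert it before summing
df["CHPMaxHeat"] = pd.to_numeric(df["CHPMaxHeat"], errors="coerce")

# One output row per (Unit, Technology, Fuel); dropna=False keeps rows whose
# grouping keys are empty instead of silently discarding them
merged_df = (
    df.groupby(["Unit", "Technology", "Fuel"], as_index=False, sort=False, dropna=False)
      .agg(
          PowerCapacity=("PowerCapacity", "sum"),
          Nunits=("Nunits", "sum"),
          Efficiency=("Efficiency", "mean"),
          CHPMaxHeat=("CHPMaxHeat", "sum"),
      )
)
print(merged_df)
```

Note that the sum of a group without any CHPMaxHeat value is 0 here, while the loop in the cell leaves non-duplicated rows unchanged.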


- Verifying variables

  These cells simply confirm the file names, file paths, and other information related to the data being processed.
  They also make sure the inputs for the following cells are already defined, so the same information does not have to be re-entered each time.

In [24]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.8 Clustering Power Units Data

    This part merges all the units of the same "Company" and saves the result to a new, separate CSV file as a clustered version of the data. The same conditions as in the previous step apply: units are merged only when they share the same Technology and Fuel.

In [25]:
import pandas as pd
import os

# Variables
Header_1 = "Company"  # Change to the desired column name
Header_2 = "PowerCapacity"
Header_3 = "Nunits"
Header_4 = "Technology"
Header_5 = "Fuel"
Header_6 = "Efficiency"
Header_7 = "PowerCapacity"
Header_8 = "CHPType"
Header_9 = "CHPMaxHeat"

# Read CSV file
df = pd.read_csv(power_plants_clean_data_file_path)

# Find duplicated cells in the Header_1 column
duplicated_units = df[df.duplicated(subset=[Header_1], keep=False)]

# Process duplicated units
for unit in duplicated_units[Header_1].unique():
    # Filter rows with the current duplicated unit (explicit copy, so the CHPMaxHeat
    # conversion below does not raise pandas' SettingWithCopyWarning)
    unit_rows = df[df[Header_1] == unit].copy()
    
    # Check if there are matching values in Header_4 and Header_5
    if unit_rows[Header_4].nunique() == 1 and unit_rows[Header_5].nunique() == 1:
        # Calculate the average of Header_6
        average_efficiency = unit_rows[Header_6].mean()
        
        # Convert Header_9 to numeric and then sum its values
        unit_rows[Header_9] = pd.to_numeric(unit_rows[Header_9], errors='coerce')
        total_chp_max_heat = unit_rows[Header_9].sum()
        
        # Sum the values of Header_3 and Header_7
        total_nunits = unit_rows[Header_3].sum()
        total_power_capacity = unit_rows[Header_7].sum()
        
        # Put the sum and average results in the first duplicated row
        first_row_index = unit_rows.index[0]
        df.at[first_row_index, Header_3] = total_nunits
        df.at[first_row_index, Header_6] = average_efficiency
        df.at[first_row_index, Header_7] = total_power_capacity
        df.at[first_row_index, Header_9] = total_chp_max_heat
        
        # Drop other duplicated rows
        df.drop(unit_rows.index[1:], inplace=True)

# Add a new column "Clustered" and set its value to "yes" for rows where Nunits is more than 1
df['Clustered'] = 'no'
df.loc[df[Header_3] > 1, 'Clustered'] = 'yes'

# Save the updated DataFrame to a new CSV file
new_file_name = os.path.basename(power_plants_clean_data_file_path).replace(".csv", "-Clustered.csv")
new_csv_file_path = os.path.join(power_plants_clean_data_folder_path, new_file_name)
df.to_csv(new_csv_file_path, index=False)


print(f"Processed data saved to: {new_csv_file_path}")
print(f"Processed data saved under the name: {new_file_name}")

Processed data saved to: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-Clustered.csv
Processed data saved under the name: 2020-Clustered.csv
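
The last two operations of the clustering cell, flagging the clustered rows and deriving the name of the new file, can be sketched in isolation as follows. The company names and the relative path are illustrative assumptions. Using `os.path.splitext` instead of a text replacement is a design choice of this sketch: it only touches the file extension, even if ".csv" happened to appear elsewhere in the name.

```python
import os
import pandas as pd

# Illustrative data: the first row already represents three merged units
df = pd.DataFrame({"Company": ["Utility X", "Utility Y"], "Nunits": [3, 1]})

# Rows that represent more than one physical unit are marked as clustered
df["Clustered"] = (df["Nunits"] > 1).map({True: "yes", False: "no"})

# Derive "<name>-Clustered.<extension>" from the clean data file name
clean_file_path = os.path.join("DE", "2020.csv")
stem, extension = os.path.splitext(os.path.basename(clean_file_path))
new_file_name = f"{stem}-Clustered{extension}"
print(new_file_name)
```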


In [26]:
power_plants_clean_clustered_data_file_name = new_file_name
power_plants_clean_clustered_data_file_path = new_csv_file_path

In [27]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}\npower_plants_clean_clustered_data_file_name: {power_plants_clean_clustered_data_file_name}\npower_plants_clean_clustered_data_file_path: {power_plants_clean_clustered_data_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.7 Technology Field Filter

    This part reads the current values of the "Technology" column of power_plants_clean_data_file and power_plants_clean_clustered_data_file, compares them with the content of power_plants_all_data_equivalent_technologies_file, and replaces every value with the equivalent name required by Dispa-SET.
    Additionally, it copies to power_plants_all_data_not_defined_units_file all the rows of the units whose technology is not defined or whose Technology field contains any other kind of information.

In [28]:
# Build the paths from the PowerPlants folder instead of hard-coding them, so they follow the project location
power_plants_all_data_equivalent_technologies_file_name = "power_plants_all_data_equivalent_technologies.csv"
power_plants_all_data_equivalent_technologies_file_path = os.path.join(zone_folder_path, power_plants_all_data_equivalent_technologies_file_name)
power_plants_all_data_not_defined_units_file_name = "power_plants_all_data_not_defined_units.csv"
power_plants_all_data_not_defined_units_file_path = os.path.join(zone_folder_path, power_plants_all_data_not_defined_units_file_name)

In [29]:
print(f"power_plants_all_data_equivalent_technologies_file_name: {power_plants_all_data_equivalent_technologies_file_name}")
print(f"power_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}")
print(f"power_plants_all_data_not_defined_units_file_name: {power_plants_all_data_not_defined_units_file_name}")
print(f"power_plants_all_data_not_defined_units_file_path: {power_plants_all_data_not_defined_units_file_path}")

power_plants_all_data_equivalent_technologies_file_name: power_plants_all_data_equivalent_technologies.csv
power_plants_all_data_equivalent_technologies_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_technologies.csv
power_plants_all_data_not_defined_units_file_name: power_plants_all_data_not_defined_units.csv
power_plants_all_data_not_defined_units_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_not_defined_units.csv


In [30]:
import pandas as pd
import os

# Read data from CSV files
clean_data_df = pd.read_csv(power_plants_clean_data_file_path)
equivalent_technologies_df = pd.read_csv(power_plants_all_data_equivalent_technologies_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    technology = row["Technology"]
    found = False
    # Look for the technology in equivalent technologies
    for i in range(1, 9):
        equivalent_tech_col = f"Equivalent_Technology_{i}"
        dispaset_tech_col = "Dispaset_Technology"
        if technology in equivalent_technologies_df[equivalent_tech_col].values:
            found = True
            # Get the corresponding Dispaset technology
            dispaset_technology = equivalent_technologies_df.loc[equivalent_technologies_df[equivalent_tech_col] == technology, dispaset_tech_col].iloc[0]
            # Update the Technology column in clean data
            clean_data_df.at[index, "Technology"] = dispaset_technology
            break
    # If not found, add the row to the list of not defined units (it is dropped from clean_data_df below)
    if not found:
        not_defined_units.append(index)

# Create DataFrame for not defined units (the list holds index labels, so select with .loc)
not_defined_df = clean_data_df.loc[not_defined_units]

# Drop rows with unmatched fields from clean data DataFrame
clean_data_df.drop(not_defined_units, inplace=True)

# Write not defined units to a separate CSV file (append mode)
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.


In [31]:
import pandas as pd
import os

# Read data from CSV files
clean_data_df = pd.read_csv(power_plants_clean_clustered_data_file_path)
equivalent_technologies_df = pd.read_csv(power_plants_all_data_equivalent_technologies_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    technology = row["Technology"]
    found = False
    # Look for the technology in equivalent technologies
    for i in range(1, 9):
        equivalent_tech_col = f"Equivalent_Technology_{i}"
        dispaset_tech_col = "Dispaset_Technology"
        if technology in equivalent_technologies_df[equivalent_tech_col].values:
            found = True
            # Get the corresponding Dispaset technology
            dispaset_technology = equivalent_technologies_df.loc[equivalent_technologies_df[equivalent_tech_col] == technology, dispaset_tech_col].iloc[0]
            # Update the Technology column in clean data
            clean_data_df.at[index, "Technology"] = dispaset_technology
            break
    # If not found, add the row to the list of not defined units (it is dropped from clean_data_df below)
    if not found:
        not_defined_units.append(index)

# Create DataFrame for not defined units (the list holds index labels, so select with .loc)
not_defined_df = clean_data_df.loc[not_defined_units]

# Drop rows with unmatched fields from clean data DataFrame
clean_data_df.drop(not_defined_units, inplace=True)

# Write not defined units to a separate CSV file (append mode)
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_clustered_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.
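
The row-by-row lookup used in the two cells above can also be written as a single dictionary lookup. The sketch below is an illustrative alternative on made-up data, not the notebook's method. The table layout (one Dispaset_Technology column plus Equivalent_Technology_N columns) follows the cells above, while the technology names and units are assumptions.

```python
import pandas as pd

# Illustrative equivalence table with the same layout as the equivalents file
equivalent_technologies_df = pd.DataFrame({
    "Dispaset_Technology": ["STUR", "GTUR"],
    "Equivalent_Technology_1": ["Steam turbine", "Gas turbine"],
    "Equivalent_Technology_2": ["ST", None],
})

# Long format: one row per (Dispa-SET technology, equivalent spelling) pair.
# Keeping the first duplicate gives lower-numbered columns priority.
long_df = (
    equivalent_technologies_df
    .melt(id_vars="Dispaset_Technology", value_name="Equivalent")
    .dropna(subset=["Equivalent"])
    .drop_duplicates(subset="Equivalent", keep="first")
)
lookup = dict(zip(long_df["Equivalent"], long_df["Dispaset_Technology"]))

# Illustrative clean data; "Unknown" has no equivalent in the table
clean_data_df = pd.DataFrame({
    "Unit": ["Plant A", "Plant B", "Plant C"],
    "Technology": ["ST", "Gas turbine", "Unknown"],
})

# Values without an equivalent map to NaN and are split off as not defined
mapped = clean_data_df["Technology"].map(lookup)
not_defined_df = clean_data_df[mapped.isna()]
defined_df = clean_data_df[mapped.notna()].assign(Technology=mapped[mapped.notna()])
print(defined_df)
print(not_defined_df)
```

The same pattern applies to any equivalence table of this shape by changing the column names.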


- Verifying variables

  These cells simply confirm the file names, file paths, and other information related to the data being processed.
  They also make sure the inputs for the following cells are already defined, so the same information does not have to be re-entered each time.

In [32]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}\npower_plants_clean_clustered_data_file_name: {power_plants_clean_clustered_data_file_name}\npower_plants_clean_clustered_data_file_path: {power_plants_clean_clustered_data_file_path}\npower_plants_all_data_equivalent_technologies_file_name: {power_plants_all_data_equivalent_technologies_file_name}\npower_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}\npower_plants_all_data_not_defined_units_file_name: {power_plants_all_data_not_defined_units_file_name}\npower_plants_all_data_not_defined_units_file_path: {power_plants_all_data_not_defined_units_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.8. Fuel Field Filter

    This part reads the current value of the "Fuel" column of power_plants_clean_data_file and power_plants_clean_clustered_data_file, compares it with the content of power_plants_all_data_equivalent_fuels_file, and replaces each value with the equivalent format required by Dispa-SET.
    Additionally, it copies to power_plants_all_data_not_defined_units_file all the rows of units whose fuel is not defined or whose field contains any other kind of information.
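
    The lookup described above can also be sketched in vectorized form. The example below is only an illustration on toy data: the fuel names, the "Unit" column and the variable names units_df, fuel_lookup and clean_df are assumptions made for the sketch, and it is not the code used by the cells that follow.

```python
import pandas as pd

# Toy equivalence table (hypothetical values). The real file is expected to
# have the columns Dispaset_Fuel and Equivalent_Fuel_1 ... Equivalent_Fuel_8.
equivalent_fuels_df = pd.DataFrame({
    "Dispaset_Fuel": ["GAS", "HRD"],
    "Equivalent_Fuel_1": ["Natural gas", "Hard coal"],
    "Equivalent_Fuel_2": ["Erdgas", None],
})

# Toy unit table (hypothetical values).
units_df = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Fuel": ["Erdgas", "Hard coal", "Unknown fuel"],
})

# Reshape the wide table into one (alias, Dispa-SET fuel) pair per row.
alias_columns = [c for c in equivalent_fuels_df.columns
                 if c.startswith("Equivalent_Fuel_")]
pairs = equivalent_fuels_df.melt(
    id_vars="Dispaset_Fuel", value_vars=alias_columns, value_name="alias"
).dropna(subset=["alias"])
fuel_lookup = dict(zip(pairs["alias"], pairs["Dispaset_Fuel"]))

# Map every fuel at once; fuels without an equivalent become NaN.
mapped = units_df["Fuel"].map(fuel_lookup)
matched = mapped.notna()

# Unmatched rows keep their original fuel and go to the "not defined" table.
not_defined_df = units_df[~matched]
# Matched rows get the Dispa-SET fuel name.
clean_df = units_df[matched].assign(Fuel=mapped[matched])

print(clean_df)
print(not_defined_df)
```

    If the same alias appears more than once in the equivalence table, this sketch keeps the last pair it sees, whereas the loop in the cells below takes the first match, so the two approaches are only equivalent when aliases are unique.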

In [33]:
power_plants_all_data_equivalent_fuels_file_name = "power_plants_all_data_equivalent_fuels.csv"
power_plants_all_data_equivalent_fuels_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_fuels.csv"

In [34]:
import pandas as pd
import os

# Read data from CSV files
clean_data_df = pd.read_csv(power_plants_clean_data_file_path)
equivalent_fuels_df = pd.read_csv(power_plants_all_data_equivalent_fuels_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    fuel = row["Fuel"]
    found = False
    # Look for the fuel in equivalent fuels
    for i in range(1, 9):
        equivalent_fuel_col = f"Equivalent_Fuel_{i}"
        dispaset_fuel_col = "Dispaset_Fuel"
        if fuel in equivalent_fuels_df[equivalent_fuel_col].values:
            found = True
            # Get the corresponding Dispaset fuel
            dispaset_fuel = equivalent_fuels_df.loc[equivalent_fuels_df[equivalent_fuel_col] == fuel, dispaset_fuel_col].iloc[0]
            # Update the Fuel column in clean data
            clean_data_df.at[index, "Fuel"] = dispaset_fuel
            break
    # If not found, add the row to the list of not defined units and drop from clean_data_df
    if not found:
        not_defined_units.append(index)

# Create DataFrame for not defined units
not_defined_df = clean_data_df.loc[not_defined_units]

# Drop rows with unmatched fields from clean data DataFrame
clean_data_df.drop(not_defined_units, inplace=True)

# Write not defined units to a separate CSV file (append mode)
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.


In [35]:
import pandas as pd
import os

# Read data from CSV files
clean_data_df = pd.read_csv(power_plants_clean_clustered_data_file_path)
equivalent_fuels_df = pd.read_csv(power_plants_all_data_equivalent_fuels_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    fuel = row["Fuel"]
    found = False
    # Look for the fuel in equivalent fuels
    for i in range(1, 9):
        equivalent_fuel_col = f"Equivalent_Fuel_{i}"
        dispaset_fuel_col = "Dispaset_Fuel"
        if fuel in equivalent_fuels_df[equivalent_fuel_col].values:
            found = True
            # Get the corresponding Dispaset fuel
            dispaset_fuel = equivalent_fuels_df.loc[equivalent_fuels_df[equivalent_fuel_col] == fuel, dispaset_fuel_col].iloc[0]
            # Update the Fuel column in clean data
            clean_data_df.at[index, "Fuel"] = dispaset_fuel
            break
    # If not found, add the row to the list of not defined units and drop from clean_data_df
    if not found:
        not_defined_units.append(index)

# Create DataFrame for not defined units
not_defined_df = clean_data_df.loc[not_defined_units]

# Drop rows with unmatched fields from clean data DataFrame
clean_data_df.drop(not_defined_units, inplace=True)

# Write not defined units to a separate CSV file (append mode)
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_clustered_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.


- Verifying variables

  These cells just confirm all the file names, file paths and other information related to the data being processed.
  They are also used to make the inputs available to the next cells, so the same information does not have to be re-entered each time.
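
  As an alternative to one long f-string, the same check could be written as a small helper. This is only a sketch: the function name format_variables and the sample values are hypothetical, and the verification cells in this notebook do not use it.

```python
def format_variables(names, namespace):
    """Return one 'name: value' line for every requested variable name.

    Names missing from the namespace are reported instead of raising an error.
    """
    return "\n".join(
        f"{name}: {namespace.get(name, '<not defined>')}" for name in names
    )

# Hypothetical stand-in for the notebook variables.
sample_namespace = {
    "zone_folder_name": "DE",
    "power_plants_clean_data_file_name": "2020.csv",
}

summary = format_variables(
    ["zone_folder_name", "power_plants_clean_data_file_name", "missing_variable"],
    sample_namespace,
)
print(summary)
```

  Inside the notebook, globals() could be passed as the namespace so that the helper reads the variables defined by the previous cells.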

In [36]:
print (f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}\npower_plants_clean_clustered_data_file_name: {power_plants_clean_clustered_data_file_name}\npower_plants_clean_clustered_data_file_path: {power_plants_clean_clustered_data_file_path}\npower_plants_all_data_equivalent_technologies_file_name: {power_plants_all_data_equivalent_technologies_file_name}\npower_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}\npower_plants_all_data_not_defined_units_file_name: {power_plants_all_data_not_defined_units_file_name}\npower_plants_all_data_not_defined_units_file_path: {power_plants_all_data_not_defined_units_file_path}\npower_plants_all_data_equivalent_fuels_file_name {power_plants_all_data_equivalent_fuels_file_name}\npower_plants_all_data_equivalent_fuels_file_path {power_plants_all_data_equivalent_fuels_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.9. CHPType, CHPPowerToHeat, CHPPowerLossFactor and CHPMaxHeat Fields Filter

    This part reads, for each unit, the current value of the "CHPMaxHeat" column of power_plants_clean_data_file and power_plants_clean_clustered_data_file. If the unit has a value in this field, it checks the corresponding value in the "CHPType" column and compares it with the table power_plants_all_data_equivalent_CHPTypes_file. If there is no match, it copies the row to power_plants_all_data_not_defined_units_file and leaves the CHPType field empty, keeping the information in the other related fields.
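
    The rule described above, checking CHPType only for units that report a CHPMaxHeat value, can be sketched as follows. The data, the "Unit" column and the chp_lookup dictionary are hypothetical and only illustrate the logic; the sketch is not the code of the cells below.

```python
import pandas as pd

# Toy unit table (hypothetical values).
units_df = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "CHPType": ["Extraktion", "odd type", None],
    "CHPMaxHeat": [120.0, 80.0, None],
})

# Hypothetical alias -> Dispa-SET CHP type lookup.
chp_lookup = {"Extraktion": "Extraction", "Gegendruck": "back-pressure"}

# Only units that report a CHPMaxHeat value are checked.
has_heat = units_df["CHPMaxHeat"].notna()
mapped = units_df["CHPType"].map(chp_lookup)
unmatched = has_heat & mapped.isna()

# Unmatched CHP units are logged with their original values.
not_defined_df = units_df[unmatched].copy()

# For the checked units, CHPType becomes the mapped value, which is empty
# (NaN) when no equivalent exists. The other CHP fields are left untouched.
units_df.loc[has_heat, "CHPType"] = mapped[has_heat]

print(units_df)
print(not_defined_df)
```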

In [37]:
power_plants_all_data_equivalent_CHPTypes_file_name = "power_plants_all_data_equivalent_CHPTypes.csv"
power_plants_all_data_equivalent_CHPTypes_file_path = "/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_CHPTypes.csv"

In [38]:
import os
import pandas as pd

# Read CSV files
clean_data_df = pd.read_csv(power_plants_clean_data_file_path)
equivalent_chp_types_df = pd.read_csv(power_plants_all_data_equivalent_CHPTypes_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    chtype = row["CHPType"]
    found = False
    # Look for the CHPType in equivalent CHP Types
    for i in range(1, 9):
        equivalent_chp_type_col = f"Equivalent_CHPType_{i}"
        dispaset_chp_type_col = "Dispaset_CHPType"
        if chtype in equivalent_chp_types_df[equivalent_chp_type_col].values:
            found = True
            # Get the corresponding Dispaset CHP Type
            dispaset_chp_type = equivalent_chp_types_df.loc[equivalent_chp_types_df[equivalent_chp_type_col] == chtype, dispaset_chp_type_col].iloc[0]
            # Update the CHPType column in clean data
            clean_data_df.at[index, "CHPType"] = dispaset_chp_type
            break
    if not found:
        clean_data_df.at[index, "CHPType"] = ""
        not_defined_units.append(row)

# Create DataFrame for not defined units
not_defined_df = pd.DataFrame(not_defined_units)

# Write not defined units to a separate CSV file
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.


In [39]:
import pandas as pd
import re

# Read CSV files
df = pd.read_csv(power_plants_clean_data_file_path)
equiv_chp_types = pd.read_csv(power_plants_all_data_equivalent_CHPTypes_file_path)

# Columns to clean
columns_to_clean = ['CHPType', 'CHPMaxHeat', 'CHPPowerLossFactor', 'CHPPowerToHeat']

# Define regex pattern to match 'n.b.' and similar values
pattern = r'^n\.b\.|^\s*$'

# Replace matching values with empty strings
df[columns_to_clean] = df[columns_to_clean].replace(to_replace=pattern, value='', regex=True)

# Convert columns to numeric; values that cannot be parsed become NaN
df[columns_to_clean[1:]] = df[columns_to_clean[1:]].apply(pd.to_numeric, errors='coerce')

# Write the updated DataFrame back to the CSV file
df.to_csv(power_plants_clean_data_file_path, index=False)

print("Similar values to 'n.b.' and empty fields removed from specified columns.")


Similar values to 'n.b.' and empty fields removed from specified columns.
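
To make the effect of the cleaning step above easier to follow, here is a minimal sketch on toy data. The sample values are hypothetical; the pattern is the one used in the cell.

```python
import pandas as pd

# Toy data (hypothetical values) with the kinds of entries the pattern targets.
df = pd.DataFrame({
    "CHPType": ["Extraction", "n.b.", "   "],
    "CHPMaxHeat": ["12.5", "n.b.", ""],
})

# 'n.b.' at the start of a cell, or a cell that is empty or only whitespace.
pattern = r'^n\.b\.|^\s*$'

# Text matching the pattern is replaced by an empty string.
df = df.replace(to_replace=pattern, value='', regex=True)

# The numeric column is parsed; empty strings become NaN.
df["CHPMaxHeat"] = pd.to_numeric(df["CHPMaxHeat"], errors="coerce")

print(df)
```

The text column keeps empty strings, while the numeric column ends up with NaN, which pandas writes as an empty field when the file is saved.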


In [40]:
import os
import pandas as pd

# Read CSV files
clean_data_df = pd.read_csv(power_plants_clean_clustered_data_file_path)
equivalent_chp_types_df = pd.read_csv(power_plants_all_data_equivalent_CHPTypes_file_path)

# Initialize list to store not defined units
not_defined_units = []

# Iterate over each row in the clean data file
for index, row in clean_data_df.iterrows():
    chtype = row["CHPType"]
    found = False
    # Look for the CHPType in equivalent CHP Types
    for i in range(1, 9):
        equivalent_chp_type_col = f"Equivalent_CHPType_{i}"
        dispaset_chp_type_col = "Dispaset_CHPType"
        if chtype in equivalent_chp_types_df[equivalent_chp_type_col].values:
            found = True
            # Get the corresponding Dispaset CHP Type
            dispaset_chp_type = equivalent_chp_types_df.loc[equivalent_chp_types_df[equivalent_chp_type_col] == chtype, dispaset_chp_type_col].iloc[0]
            # Update the CHPType column in clean data
            clean_data_df.at[index, "CHPType"] = dispaset_chp_type
            break
    if not found:
        clean_data_df.at[index, "CHPType"] = ""
        not_defined_units.append(row)

# Create DataFrame for not defined units
not_defined_df = pd.DataFrame(not_defined_units)

# Write not defined units to a separate CSV file
not_defined_df.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))

# Overwrite the clean data CSV file with updated data
clean_data_df.to_csv(power_plants_clean_clustered_data_file_path, index=False)

print("Data processed successfully.")

Data processed successfully.


In [41]:
import pandas as pd
import re

# Read CSV files
df = pd.read_csv(power_plants_clean_clustered_data_file_path)
equiv_chp_types = pd.read_csv(power_plants_all_data_equivalent_CHPTypes_file_path)

# Columns to clean
columns_to_clean = ['CHPType', 'CHPMaxHeat', 'CHPPowerLossFactor', 'CHPPowerToHeat']

# Define regex pattern to match 'n.b.' and similar values
pattern = r'^n\.b\.|^\s*$'

# Replace matching values with empty strings
df[columns_to_clean] = df[columns_to_clean].replace(to_replace=pattern, value='', regex=True)

# Convert columns to numeric; values that cannot be parsed become NaN
df[columns_to_clean[1:]] = df[columns_to_clean[1:]].apply(pd.to_numeric, errors='coerce')

# Write the updated DataFrame back to the CSV file
df.to_csv(power_plants_clean_clustered_data_file_path, index=False)

print("Similar values to 'n.b.' and empty fields removed from specified columns.")


Similar values to 'n.b.' and empty fields removed from specified columns.


- Verifying variables

  These cells just confirm all the file names, file paths and other information related to the data being processed.
  They are also used to make the inputs available to the next cells, so the same information does not have to be re-entered each time.

In [42]:
print(f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}\npower_plants_clean_clustered_data_file_name: {power_plants_clean_clustered_data_file_name}\npower_plants_clean_clustered_data_file_path: {power_plants_clean_clustered_data_file_path}\npower_plants_all_data_equivalent_technologies_file_name: {power_plants_all_data_equivalent_technologies_file_name}\npower_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}\npower_plants_all_data_not_defined_units_file_name: {power_plants_all_data_not_defined_units_file_name}\npower_plants_all_data_not_defined_units_file_path: {power_plants_all_data_not_defined_units_file_path}\npower_plants_all_data_equivalent_fuels_file_name {power_plants_all_data_equivalent_fuels_file_name}\npower_plants_all_data_equivalent_fuels_file_path {power_plants_all_data_equivalent_fuels_file_path}\npower_plants_all_data_equivalent_CHPTypes_file_name {power_plants_all_data_equivalent_CHPTypes_file_name}\npower_plants_all_data_equivalent_CHPTypes_file_path {power_plants_all_data_equivalent_CHPTypes_file_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

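3.10. Power Capacity Field Filter

    This part is meant to remove all the units whose PowerCapacity field is empty or zero. The cells below check for empty values and copy any such rows to power_plants_all_data_not_defined_units_file.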
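
    A minimal sketch of the complete filter, covering both empty and zero values and separating the affected rows, could look like this. The data and the names invalid, removed_df and kept_df are hypothetical, and the sketch is an illustration rather than the code of the cells below.

```python
import pandas as pd

# Toy unit table (hypothetical values).
df = pd.DataFrame({
    "Unit": ["A", "B", "C", "D"],
    "PowerCapacity": [100.0, None, 0.0, 45.5],
})

# A unit is invalid when its capacity is missing or equal to zero.
invalid = df["PowerCapacity"].isna() | (df["PowerCapacity"] == 0)

removed_df = df[invalid]   # rows to log in the "not defined units" file
kept_df = df[~invalid]     # rows that stay in the data file

print(kept_df)
print(removed_df)
```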

In [43]:
import os
import pandas as pd

# Read the CSV file
df = pd.read_csv(power_plants_clean_data_file_path)

# Filter rows with empty 'PowerCapacity' column
empty_power_capacity_rows = df[df['PowerCapacity'].isna()]

if empty_power_capacity_rows.empty:
    print("No rows with empty 'PowerCapacity' column found.")
else:
    # Append the filtered rows to the not defined units file instead of overwriting it
    empty_power_capacity_rows.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))
    print(f"Rows with empty 'PowerCapacity' column saved to: {power_plants_all_data_not_defined_units_file_path}")

No rows with empty 'PowerCapacity' column found.


In [44]:
import os
import pandas as pd

# Read the CSV file
df = pd.read_csv(power_plants_clean_clustered_data_file_path)

# Filter rows with empty 'PowerCapacity' column
empty_power_capacity_rows = df[df['PowerCapacity'].isna()]

if empty_power_capacity_rows.empty:
    print("No rows with empty 'PowerCapacity' column found.")
else:
    # Append the filtered rows to the not defined units file instead of overwriting it
    empty_power_capacity_rows.to_csv(power_plants_all_data_not_defined_units_file_path, mode='a', index=False, header=not os.path.exists(power_plants_all_data_not_defined_units_file_path))
    print(f"Rows with empty 'PowerCapacity' column saved to: {power_plants_all_data_not_defined_units_file_path}")

No rows with empty 'PowerCapacity' column found.


3.11. Efficiency Field Fulfilling

    This part copies the closest value of the Efficiency technical feature from the already aggregated database EU_Power_Units_Technical_Features.csv, which was built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The value is selected based on the Technology, Fuel and PowerCapacity features, and only empty fields are filled.
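
    The selection rule can be illustrated on toy data. In the sketch below the tables, their values and the helper name closest_value are hypothetical. Unlike the cells that follow, the helper skips reference rows whose own value is empty, which is an assumption made for the example.

```python
import pandas as pd

# Toy reference database (hypothetical values).
reference_df = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100.0, 400.0, 500.0],
    "Efficiency": [0.50, 0.58, 0.40],
})

# Toy unit table (hypothetical values); two units lack an efficiency.
units_df = pd.DataFrame({
    "Technology": ["COMC", "COMC", "HDAM"],
    "Fuel": ["GAS", "GAS", "WAT"],
    "PowerCapacity": [350.0, 120.0, 30.0],
    "Efficiency": [None, 0.52, None],
})

def closest_value(unit, reference, column):
    """Return `column` from the reference row with the same Technology and
    Fuel whose PowerCapacity is nearest to the unit's, or None if no match."""
    candidates = reference[
        (reference["Technology"] == unit["Technology"])
        & (reference["Fuel"] == unit["Fuel"])
        & reference[column].notna()
    ]
    if candidates.empty:
        return None
    distance = (candidates["PowerCapacity"] - unit["PowerCapacity"]).abs()
    return candidates.loc[distance.idxmin(), column]

# Only units with an empty Efficiency field are filled.
missing = units_df["Efficiency"].isna()
for index, unit in units_df[missing].iterrows():
    value = closest_value(unit, reference_df, "Efficiency")
    if value is not None:
        units_df.at[index, "Efficiency"] = value

print(units_df)
```

    The first unit takes the efficiency of the 400 MW reference unit because it is the closest in capacity. The last unit stays empty because the reference table has no entry for its Technology and Fuel.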

In [45]:
EU_Power_Units_Technical_Features_File_name = 'EU_Power_Units_Technical_Features.csv'
EU_Power_Units_Technical_Features_File_path = '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_Technical_Features.csv'

In [46]:
import pandas as pd

def copy_efficiency_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying efficiency values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Iterate over rows in the second file
            for index, second_row in second_fuel_filtered.iterrows():
                # Check if "Efficiency" value is empty in the second file
                if pd.isnull(second_row['Efficiency']):
                    # Find the row in the first file with the closest "PowerCapacity" value
                    closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                    # Copy "Efficiency" value from the first file to the second file
                    if not closest_row.empty:
                        second_df.at[index, 'Efficiency'] = closest_row['Efficiency'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("Efficiency values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'Efficiency']

copy_efficiency_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying efficiency values...
Efficiency values copied successfully.


In [47]:
import pandas as pd

def copy_efficiency_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying efficiency values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Iterate over rows in the second file
            for index, second_row in second_fuel_filtered.iterrows():
                # Check if "Efficiency" value is empty in the second file
                if pd.isnull(second_row['Efficiency']):
                    # Find the row in the first file with the closest "PowerCapacity" value
                    closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                    # Copy "Efficiency" value from the first file to the second file
                    if not closest_row.empty:
                        second_df.at[index, 'Efficiency'] = closest_row['Efficiency'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("Efficiency values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'Efficiency']

copy_efficiency_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying efficiency values...
Efficiency values copied successfully.


        For all the units for which no Efficiency data was found, the value is set to 1 and the units are added to the not defined units file. This is done for both files, power_plants_clean_data_file and power_plants_clean_clustered_data_file.
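
        The order of the two operations matters: the affected rows must be captured before the gaps are filled. A minimal sketch on toy data, with hypothetical values and names:

```python
import pandas as pd

# Toy unit table (hypothetical values).
df = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Efficiency": [0.45, None, None],
})

# Capture the rows with a missing efficiency first, so they can be logged.
missing_df = df[df["Efficiency"].isna()]

# Then fill the gaps with the default value of 1 by assigning the result back.
df["Efficiency"] = df["Efficiency"].fillna(1)

print(df)
print(missing_df)
```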

In [48]:
import os
import pandas as pd

def fill_empty_efficiency_with_one(input_file_path, output_file_path):
    try:
        # Read the CSV file into a DataFrame
        df = pd.read_csv(input_file_path)
    except FileNotFoundError:
        print(f"File '{input_file_path}' not found.")
        return

    # Check if there are empty cells in the "Efficiency" column
    empty_efficiency_rows = df[df['Efficiency'].isnull()]

    if not empty_efficiency_rows.empty:
        # Replace empty cells in the "Efficiency" column with 1
        df['Efficiency'] = df['Efficiency'].fillna(1)

        # Write the updated DataFrame back to the original CSV file
        df.to_csv(input_file_path, index=False)

        print("Empty 'Efficiency' cells filled with 1 in the original file.")

        # Check if the output file exists
        if os.path.exists(output_file_path):
            # Append rows with empty "Efficiency" values to the specified output file without headers
            empty_efficiency_rows.to_csv(output_file_path, mode='a', index=False, header=False)
        else:
            # Write the rows with empty "Efficiency" values to the specified output file with headers
            empty_efficiency_rows.to_csv(output_file_path, index=False)

        print(f"Rows with empty 'Efficiency' values appended to '{output_file_path}'.")

    else:
        print("No empty 'Efficiency' cells found.")

# Example usage:
fill_empty_efficiency_with_one(power_plants_clean_data_file_path, power_plants_all_data_not_defined_units_file_path)

Empty 'Efficiency' cells filled with 1 in the original file.
Rows with empty 'Efficiency' values appended to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_not_defined_units.csv'.


In [49]:
import os
import pandas as pd

def fill_empty_efficiency_with_one(input_file_path, output_file_path):
    try:
        # Read the CSV file into a DataFrame
        df = pd.read_csv(input_file_path)
    except FileNotFoundError:
        print(f"File '{input_file_path}' not found.")
        return

    # Check if there are empty cells in the "Efficiency" column
    empty_efficiency_rows = df[df['Efficiency'].isnull()]

    if not empty_efficiency_rows.empty:
        # Replace empty cells in the "Efficiency" column with 1
        df['Efficiency'] = df['Efficiency'].fillna(1)

        # Write the updated DataFrame back to the original CSV file
        df.to_csv(input_file_path, index=False)

        print("Empty 'Efficiency' cells filled with 1 in the original file.")

        # Check if the output file exists
        if os.path.exists(output_file_path):
            # Append rows with empty "Efficiency" values to the specified output file without headers
            empty_efficiency_rows.to_csv(output_file_path, mode='a', index=False, header=False)
        else:
            # Write the rows with empty "Efficiency" values to the specified output file with headers
            empty_efficiency_rows.to_csv(output_file_path, index=False)

        print(f"Rows with empty 'Efficiency' values appended to '{output_file_path}'.")

    else:
        print("No empty 'Efficiency' cells found.")

# Example usage:
fill_empty_efficiency_with_one(power_plants_clean_clustered_data_file_path, power_plants_all_data_not_defined_units_file_path)

Empty 'Efficiency' cells filled with 1 in the original file.
Rows with empty 'Efficiency' values appended to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_not_defined_units.csv'.


- Verifying variables

  These cells just confirm all the file names, file paths and other information related to the data being processed.
  They are also used to make the inputs available to the next cells, so the same information does not have to be re-entered each time.

In [50]:
print(f"zone_folder_name: {zone_folder_name}\nzone_folder_path: {zone_folder_path}\npower_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}\npower_plants_raw_data_file_name: {power_plants_raw_data_file_name}\npower_plants_raw_data_download_link: {power_plants_raw_data_download_link}\npower_plants_raw_data_file_path: {power_plants_raw_data_file_path}\npower_plants_raw_data_download_link_sources_file_name: {power_plants_raw_data_download_link_sources_file_name}\npower_plants_raw_data_download_link_sources_folder_path: {power_plants_raw_data_download_link_sources_folder_path}\npower_plants_clean_data_file_name: {power_plants_clean_data_file_name}\npower_plants_clean_data_folder_path {power_plants_clean_data_folder_path}\npower_plants_clean_data_file_path {power_plants_clean_data_file_path}\npower_plants_clean_data_equivalent_headers_file_name: {power_plants_clean_data_equivalent_headers_file_name}\npower_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}\npower_plants_clean_clustered_data_file_name: {power_plants_clean_clustered_data_file_name}\npower_plants_clean_clustered_data_file_path: {power_plants_clean_clustered_data_file_path}\npower_plants_all_data_equivalent_technologies_file_name: {power_plants_all_data_equivalent_technologies_file_name}\npower_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}\npower_plants_all_data_not_defined_units_file_name: {power_plants_all_data_not_defined_units_file_name}\npower_plants_all_data_not_defined_units_file_path: {power_plants_all_data_not_defined_units_file_path}\npower_plants_all_data_equivalent_fuels_file_name {power_plants_all_data_equivalent_fuels_file_name}\npower_plants_all_data_equivalent_fuels_file_path {power_plants_all_data_equivalent_fuels_file_path}\npower_plants_all_data_equivalent_CHPTypes_file_name {power_plants_all_data_equivalent_CHPTypes_file_name}\npower_plants_all_data_equivalent_CHPTypes_file_path {power_plants_all_data_equivalent_CHPTypes_file_path}\nEU_Power_Units_Technical_Features_File_name: {EU_Power_Units_Technical_Features_File_name}\nEU_Power_Units_Technical_Features_File_path: {EU_Power_Units_Technical_Features_File_path}")

zone_folder_name: DE
zone_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_raw_data_file_name: 2020-01.csv
power_plants_raw_data_download_link: https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_DE.csv
power_plants_raw_data_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020-01.csv
power_plants_raw_data_download_link_sources_file_name: power_plants_raw_data_download_link_sources.csv
power_plants_raw_data_download_link_sources_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants
power_plants_clean_data_file_name: 2020.csv
power_plants_clean_data_folder_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
power_plants_clean_data_file_path /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
power_plants_clean_data_equivalent_headers_file_name: power_plants_clean_data_equivalent_headers.csv
power

3.12. MinUpTime Field Fulfilling

    This part copies the closest value of the MinUpTime technical feature from the already aggregated database EU_Power_Units_Technical_Features.csv, which was built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The value is selected based on the Technology, Fuel and PowerCapacity features, and only empty fields are filled.
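
    The same nearest-capacity rule can also be expressed with pandas.merge_asof, which matches each unit to the reference row with the nearest PowerCapacity within the same Technology and Fuel group. This is a sketch of an alternative technique on toy data, not the approach used in the cells below. The tables, the "Unit" column and the helper column MinUpTime_ref are hypothetical.

```python
import pandas as pd

# Toy reference database (hypothetical values).
reference_df = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100.0, 400.0, 500.0],
    "MinUpTime": [2.0, 4.0, 8.0],
})

# Toy unit table (hypothetical values); units A and B lack a MinUpTime.
units_df = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["COMC", "STUR", "COMC"],
    "Fuel": ["GAS", "HRD", "GAS"],
    "PowerCapacity": [350.0, 300.0, 120.0],
    "MinUpTime": [None, None, 3.0],
})

# merge_asof requires both tables to be sorted by the "on" key.
left = units_df.sort_values("PowerCapacity")
right = (
    reference_df.dropna(subset=["MinUpTime"])
    .sort_values("PowerCapacity")
    .rename(columns={"MinUpTime": "MinUpTime_ref"})
)

# Nearest PowerCapacity within each (Technology, Fuel) group.
matched = pd.merge_asof(
    left, right, on="PowerCapacity", by=["Technology", "Fuel"], direction="nearest"
)

# Only empty fields are filled; existing values are kept.
matched["MinUpTime"] = matched["MinUpTime"].fillna(matched["MinUpTime_ref"])
result = (
    matched.drop(columns="MinUpTime_ref")
    .sort_values("Unit")
    .reset_index(drop=True)
)

print(result)
```

    Units whose Technology and Fuel combination is absent from the reference table keep an empty MinUpTime, so they can still be identified and logged afterwards.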

In [51]:
import pandas as pd
import os

def copy_min_up_time_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinUpTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so append the entire rows from the second file to the not defined units file.
                # to_csv quotes fields that contain commas and writes missing values as empty fields.
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', index=False, header=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "MinUpTime" value is empty in the second file
                    if pd.isnull(second_row['MinUpTime']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "MinUpTime" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'MinUpTime'] = closest_row['MinUpTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinUpTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'MinUpTime']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_min_up_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying MinUpTime values...
MinUpTime values copied successfully.
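
    Because the cell above appends unmatched units to the not defined units file every time it runs, the file can collect repeated rows. The sketch below shows one possible way to append rows with proper CSV quoting and to drop the repeats when reading the file back. The helper name append_unmatched, the temporary file and the sample row are assumptions made for the example.

```python
import os
import tempfile
import pandas as pd

def append_unmatched(rows, path):
    """Append `rows` to the CSV at `path`; write the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    rows.to_csv(path, mode="a", header=write_header, index=False)

# Invented unit whose name contains a comma, to show that quoting is preserved.
unmatched = pd.DataFrame({
    "Unit": ["Plant, North"],
    "Technology": ["HDAM"],
    "Fuel": ["WAT"],
})

log_path = os.path.join(tempfile.mkdtemp(), "not_defined_units.csv")

# Running the same step twice appends the same unit twice.
append_unmatched(unmatched, log_path)
append_unmatched(unmatched, log_path)

# Reading back and dropping duplicates leaves one row per unmatched unit.
logged = pd.read_csv(log_path).drop_duplicates()
```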


In [52]:
import pandas as pd
import os

def copy_min_up_time_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinUpTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "MinUpTime" value is empty in the second file
            if pd.isnull(second_row['MinUpTime']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "MinUpTime" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'MinUpTime'] = closest_row['MinUpTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinUpTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'MinUpTime']

copy_min_up_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying MinUpTime values...
MinUpTime values copied successfully.


In [53]:
import pandas as pd
import os

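def copy_min_up_time_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinUpTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so append the entire rows from the second file to the not defined units file
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "MinUpTime" value is empty in the second file
                    if pd.isnull(second_row['MinUpTime']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "MinUpTime" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'MinUpTime'] = closest_row['MinUpTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinUpTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'MinUpTime']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_min_up_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)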

Copying MinUpTime values...
MinUpTime values copied successfully.


In [54]:
import pandas as pd
import os

def copy_min_up_time_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinUpTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "MinUpTime" value is empty in the second file
            if pd.isnull(second_row['MinUpTime']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "MinUpTime" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'MinUpTime'] = closest_row['MinUpTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinUpTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'MinUpTime']

copy_min_up_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying MinUpTime values...
MinUpTime values copied successfully.


3.13. MinDownTime Field Fulfilling

    This part fills the empty MinDownTime fields. Each missing value is copied from the unit in the aggregated database EU_Power_Units_Technical_Features.csv (built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb) that has the same Technology and Fuel and the closest PowerCapacity. Fields that already hold a value are left unchanged.
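
    As an alternative to a row-by-row loop, the same nearest-capacity lookup could be expressed with pandas.merge_asof, which joins each unit to the reference row with the nearest key inside each Technology/Fuel group. The sketch below assumes toy data; the tables, their values and the helper column MinDownTime_ref are invented for the example.

```python
import pandas as pd

# Toy stand-in for the reference database (all values are invented).
reference = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100.0, 400.0, 600.0],
    "MinDownTime": [1.0, 3.0, 6.0],
})

# Toy stand-in for the power plant table; unit C already has a value.
units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["COMC", "STUR", "COMC"],
    "Fuel": ["GAS", "HRD", "GAS"],
    "PowerCapacity": [120.0, 550.0, 390.0],
    "MinDownTime": [float("nan"), float("nan"), 5.0],
})

# merge_asof needs both sides sorted by the "on" key; keep the original row labels.
left = units.reset_index().sort_values("PowerCapacity")
right = (
    reference.dropna(subset=["PowerCapacity", "MinDownTime"])
    .sort_values("PowerCapacity")
    [["Technology", "Fuel", "PowerCapacity", "MinDownTime"]]
    .rename(columns={"MinDownTime": "MinDownTime_ref"})
)

# For every unit, pick the reference row with the nearest PowerCapacity
# among rows that share its Technology and Fuel.
matched = pd.merge_asof(
    left, right, on="PowerCapacity", by=["Technology", "Fuel"], direction="nearest"
)
matched = matched.set_index("index").sort_index()

# Only empty fields are filled; existing values are kept.
units["MinDownTime"] = units["MinDownTime"].fillna(matched["MinDownTime_ref"])
```

    Units with no reference row for their Technology/Fuel pair come back with an empty MinDownTime_ref, so they stay unfilled and can be handled by a later pass.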

In [55]:
import pandas as pd
import os

def copy_min_down_time_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinDownTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "MinDownTime" value is empty in the second file
                    if pd.isnull(second_row['MinDownTime']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "MinDownTime" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'MinDownTime'] = closest_row['MinDownTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinDownTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'MinDownTime']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_min_down_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying MinDownTime values...
MinDownTime values copied successfully.


In [56]:
import pandas as pd
import os

def copy_min_down_time_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinDownTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "MinDownTime" value is empty in the second file
            if pd.isnull(second_row['MinDownTime']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "MinDownTime" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'MinDownTime'] = closest_row['MinDownTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinDownTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'MinDownTime']

copy_min_down_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying MinDownTime values...
MinDownTime values copied successfully.


In [57]:
import pandas as pd
import os

def copy_min_down_time_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinDownTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "MinDownTime" value is empty in the second file
                    if pd.isnull(second_row['MinDownTime']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "MinDownTime" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'MinDownTime'] = closest_row['MinDownTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinDownTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'MinDownTime']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_min_down_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying MinDownTime values...
MinDownTime values copied successfully.


In [58]:
import pandas as pd
import os

def copy_min_down_time_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying MinDownTime values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "MinDownTime" value is empty in the second file
            if pd.isnull(second_row['MinDownTime']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "MinDownTime" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'MinDownTime'] = closest_row['MinDownTime'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("MinDownTime values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'MinDownTime']

copy_min_down_time_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying MinDownTime values...
MinDownTime values copied successfully.


3.14. RampUpRate Field Fulfilling

    This part fills the empty RampUpRate fields. Each missing value is copied from the unit in the aggregated database EU_Power_Units_Technical_Features.csv (built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb) that has the same Technology and Fuel and the closest PowerCapacity. Fields that already hold a value are left unchanged.
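
    The filling cells in this notebook differ mainly in the column they fill and in whether Fuel is part of the match. A single helper that takes the column and the matching keys as parameters could cover both passes. The following is a sketch under that assumption: the name fill_missing, the toy tables and the ramp rate values are invented for the example.

```python
import pandas as pd

def fill_missing(target, reference, column, keys):
    """Fill empty target[column] fields from the reference row that shares `keys`
    and has the nearest PowerCapacity. Returns the number of fields filled."""
    usable = reference.dropna(subset=["PowerCapacity", column])
    filled = 0
    for label, row in target[target[column].isna()].iterrows():
        candidates = usable
        for key in keys:
            candidates = candidates[candidates[key] == row[key]]
        # Skip units without a comparable reference row or without a capacity.
        if candidates.empty or pd.isna(row["PowerCapacity"]):
            continue
        nearest = (candidates["PowerCapacity"] - row["PowerCapacity"]).abs().idxmin()
        target.at[label, column] = candidates.at[nearest, column]
        filled += 1
    return filled

# Toy reference and unit tables (all values are invented).
reference = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100.0, 400.0, 600.0],
    "RampUpRate": [0.02, 0.04, 0.01],
})
units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["COMC", "COMC", "HDAM"],
    "Fuel": ["GAS", "OIL", "WAT"],
    "PowerCapacity": [350.0, 90.0, 50.0],
    "RampUpRate": [float("nan")] * 3,
})

# First pass: match on Technology and Fuel.
first_pass = fill_missing(units, reference, "RampUpRate", ["Technology", "Fuel"])
# Second pass: fall back to Technology only for fields that are still empty.
second_pass = fill_missing(units, reference, "RampUpRate", ["Technology"])
```

    In this toy run unit A is filled in the first pass, unit B only in the Technology-only pass, and unit C stays empty because its Technology is absent from the reference table.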

In [59]:
import pandas as pd
import os

def copy_ramp_up_rate_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampUpRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampUpRate" value is empty in the second file
                    if pd.isnull(second_row['RampUpRate']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampUpRate" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampUpRate'] = closest_row['RampUpRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampUpRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampUpRate']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramp_up_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying RampUpRate values...
RampUpRate values copied successfully.


In [60]:
import pandas as pd
import os

def copy_ramp_up_rate_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampUpRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampUpRate" value is empty in the second file
            if pd.isnull(second_row['RampUpRate']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampUpRate" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampUpRate'] = closest_row['RampUpRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampUpRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampUpRate']

copy_ramp_up_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying RampUpRate values...
RampUpRate values copied successfully.


In [61]:
import pandas as pd
import os

def copy_ramp_up_rate_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampUpRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampUpRate" value is empty in the second file
                    if pd.isnull(second_row['RampUpRate']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampUpRate" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampUpRate'] = closest_row['RampUpRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampUpRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampUpRate']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramp_up_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying RampUpRate values...
RampUpRate values copied successfully.


In [62]:
import pandas as pd
import os

def copy_ramp_up_rate_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampUpRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampUpRate" value is empty in the second file
            if pd.isnull(second_row['RampUpRate']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampUpRate" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampUpRate'] = closest_row['RampUpRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampUpRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampUpRate']

copy_ramp_up_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying RampUpRate values...
RampUpRate values copied successfully.


3.15. RampDownRate Field Fulfilling

    This part fills the empty RampDownRate fields. Each missing value is copied from the unit in the aggregated database EU_Power_Units_Technical_Features.csv (built with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb) that has the same Technology and Fuel and the closest PowerCapacity. Fields that already hold a value are left unchanged.
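
    Once the filling steps have run, it can be useful to check how many fields are still empty and which Technology/Fuel pairs they belong to. The sketch below shows one way to do that on a toy table. The table contents are invented, and the list of columns is an assumption based on the fields filled in sections 3.12 to 3.15.

```python
import pandas as pd

# Toy stand-in for the cleaned power plant table (all values are invented).
units = pd.DataFrame({
    "Technology": ["COMC", "STUR", "HDAM", "HDAM"],
    "Fuel": ["GAS", "HRD", "WAT", "WAT"],
    "MinUpTime": [4.0, 8.0, 1.0, 1.0],
    "MinDownTime": [3.0, 6.0, 1.0, 1.0],
    "RampUpRate": [0.04, 0.01, 0.1, 0.1],
    "RampDownRate": [0.04, 0.01, float("nan"), float("nan")],
})

filled_columns = ["MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate"]

# Number of fields still empty in each filled column.
remaining = units[filled_columns].isna().sum()

# Technology/Fuel combinations that still lack a RampDownRate value.
unfilled_pairs = (
    units.loc[units["RampDownRate"].isna(), ["Technology", "Fuel"]]
    .drop_duplicates()
    .values.tolist()
)

print(remaining)
print(unfilled_pairs)
```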

In [63]:
import pandas as pd
import os

def copy_ramp_down_rate_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampDownRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampDownRate" value is empty in the second file
                    if pd.isnull(second_row['RampDownRate']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampDownRate" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampDownRate'] = closest_row['RampDownRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampDownRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampDownRate']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramp_down_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying RampDownRate values...
RampDownRate values copied successfully.


In [64]:
import pandas as pd
import os

def copy_ramp_down_rate_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampDownRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampDownRate" value is empty in the second file
            if pd.isnull(second_row['RampDownRate']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampDownRate" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampDownRate'] = closest_row['RampDownRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampDownRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampDownRate']

copy_ramp_down_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying RampDownRate values...
RampDownRate values copied successfully.


In [65]:
import pandas as pd
import os

def copy_ramp_down_rate_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampDownRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # to_csv quotes fields that contain commas, which a plain ','.join would leave ambiguous
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampDownRate" value is empty in the second file
                    if pd.isnull(second_row['RampDownRate']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampDownRate" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampDownRate'] = closest_row['RampDownRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampDownRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampDownRate']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramp_down_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying RampDownRate values...
RampDownRate values copied successfully.


In [66]:
import pandas as pd
import os

def copy_ramp_down_rate_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampDownRate values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampDownRate" value is empty in the second file
            if pd.isnull(second_row['RampDownRate']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampDownRate" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampDownRate'] = closest_row['RampDownRate'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampDownRate values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampDownRate']

copy_ramp_down_rate_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying RampDownRate values...
RampDownRate values copied successfully.


3.16. StartUpCost Field Fulfilling

    This part copies the closest StartUpCost value from the database EU_Power_Units_Technical_Features.csv, previously aggregated with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The reference unit is selected by Technology, Fuel and PowerCapacity, and only empty fields are filled. A second pass matches on Technology and PowerCapacity alone, which covers units whose Fuel has no counterpart in the database. Both passes run on the clean data file and on the clustered data file.
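
    The lookup described above can be sketched on in-memory DataFrames. This is a minimal illustration rather than the notebook's own code: the helper name fill_from_nearest_capacity, its group_cols parameter and the sample values are hypothetical, and it assumes every unit has a PowerCapacity value and that the reference index is unique.

```python
import pandas as pd


def fill_from_nearest_capacity(reference_df, target_df, column,
                               group_cols=("Technology", "Fuel")):
    """Return a copy of target_df where empty `column` cells are filled from
    the reference row with the closest PowerCapacity in the same group."""
    filled = target_df.copy()
    for index, row in filled[filled[column].isna()].iterrows():
        # Keep only reference rows that share every grouping value
        mask = pd.Series(True, index=reference_df.index)
        for col in group_cols:
            mask &= reference_df[col] == row[col]
        candidates = reference_df[mask]
        if candidates.empty:
            continue  # no reference unit for this group: leave the cell empty
        # Label of the reference row with the smallest capacity difference
        nearest = (candidates["PowerCapacity"] - row["PowerCapacity"]).abs().idxmin()
        filled.at[index, column] = candidates.at[nearest, column]
    return filled


# Hypothetical sample data, for illustration only
reference = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100.0, 400.0, 300.0],
    "StartUpCost": [10.0, 40.0, 30.0],
})
target = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR", "HDAM"],
    "Fuel": ["GAS", "GAS", "HRD", "WAT"],
    "PowerCapacity": [350.0, 120.0, 500.0, 50.0],
    "StartUpCost": [float("nan"), 99.0, float("nan"), float("nan")],
})

filled = fill_from_nearest_capacity(reference, target, "StartUpCost")
```

    Passing group_cols=("Technology",) would correspond to the Technology-only pass. Taking the column name as a parameter means one helper could serve several technical features instead of one function per field.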

In [67]:
import pandas as pd
import os

def copy_start_up_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying StartUpCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "StartUpCost" value is empty in the second file
                    if pd.isnull(second_row['StartUpCost']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "StartUpCost" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'StartUpCost'] = closest_row['StartUpCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("StartUpCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'StartUpCost']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_start_up_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying StartUpCost values...
StartUpCost values copied successfully.


In [68]:
import pandas as pd
import os

def copy_start_up_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying StartUpCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "StartUpCost" value is empty in the second file
            if pd.isnull(second_row['StartUpCost']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "StartUpCost" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'StartUpCost'] = closest_row['StartUpCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("StartUpCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'StartUpCost']

copy_start_up_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying StartUpCost values...
StartUpCost values copied successfully.


In [69]:
import pandas as pd
import os

def copy_start_up_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying StartUpCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "StartUpCost" value is empty in the second file
                    if pd.isnull(second_row['StartUpCost']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "StartUpCost" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'StartUpCost'] = closest_row['StartUpCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("StartUpCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'StartUpCost']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_start_up_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying StartUpCost values...
StartUpCost values copied successfully.


In [70]:
import pandas as pd
import os

def copy_start_up_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying StartUpCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "StartUpCost" value is empty in the second file
            if pd.isnull(second_row['StartUpCost']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "StartUpCost" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'StartUpCost'] = closest_row['StartUpCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("StartUpCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'StartUpCost']

copy_start_up_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying StartUpCost values...
StartUpCost values copied successfully.


3.17. NoLoadCost_pu Field Fulfilling

    This part copies the closest NoLoadCost_pu value from the database EU_Power_Units_Technical_Features.csv, previously aggregated with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The reference unit is selected by Technology, Fuel and PowerCapacity, and only empty fields are filled. A second pass matches on Technology and PowerCapacity alone, which covers units whose Fuel has no counterpart in the database. Both passes run on the clean data file and on the clustered data file.
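
    As an alternative to a row-by-row loop, the same nearest-capacity lookup could be expressed with pandas.merge_asof. The sketch below is only an illustration under stated assumptions: the helper name nearest_capacity_lookup and the sample values are hypothetical, PowerCapacity is assumed to be present for every unit, and the target is assumed to have no column named "index". When two reference units are equally distant, the chosen one may differ from what the loop picks.

```python
import pandas as pd


def nearest_capacity_lookup(reference_df, target_df, column):
    """For every target row, return the reference `column` value of the unit
    with the closest PowerCapacity within the same Technology and Fuel."""
    keys = ["Technology", "Fuel"]
    # merge_asof needs both frames sorted by the "on" key, with matching dtypes
    left = (target_df.reset_index()
            .astype({"PowerCapacity": "float64"})
            .sort_values("PowerCapacity"))
    right = (reference_df[keys + ["PowerCapacity", column]]
             .astype({"PowerCapacity": "float64"})
             .sort_values("PowerCapacity")
             .rename(columns={column: "_reference_value"}))
    merged = pd.merge_asof(left, right, on="PowerCapacity", by=keys,
                           direction="nearest")
    # Restore the original row labels and order before returning the lookup
    return merged.set_index("index")["_reference_value"].reindex(target_df.index)


# Hypothetical sample data, for illustration only
reference = pd.DataFrame({
    "Technology": ["COMC", "COMC", "STUR"],
    "Fuel": ["GAS", "GAS", "HRD"],
    "PowerCapacity": [100, 400, 300],
    "NoLoadCost_pu": [0.1, 0.4, 0.3],
})
target = pd.DataFrame({
    "Technology": ["COMC", "STUR", "HDAM", "COMC"],
    "Fuel": ["GAS", "HRD", "WAT", "GAS"],
    "PowerCapacity": [350, 500, 50, 120],
    "NoLoadCost_pu": [float("nan"), float("nan"), float("nan"), 0.9],
})

lookup = nearest_capacity_lookup(reference, target, "NoLoadCost_pu")
# fillna only touches empty cells, so existing values are preserved
target["NoLoadCost_pu"] = target["NoLoadCost_pu"].fillna(lookup)
```

    Units with no reference match for their Technology and Fuel keep an empty value, so they can still be handled by a later Technology-only pass.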

In [71]:
import pandas as pd
import os

def copy_no_load_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying NoLoadCost_pu values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "NoLoadCost_pu" value is empty in the second file
                    if pd.isnull(second_row['NoLoadCost_pu']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "NoLoadCost_pu" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'NoLoadCost_pu'] = closest_row['NoLoadCost_pu'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("NoLoadCost_pu values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'NoLoadCost_pu']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_no_load_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying NoLoadCost_pu values...
NoLoadCost_pu values copied successfully.


In [72]:
import pandas as pd
import os

def copy_no_load_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying NoLoadCost_pu values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "NoLoadCost_pu" value is empty in the second file
            if pd.isnull(second_row['NoLoadCost_pu']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "NoLoadCost_pu" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'NoLoadCost_pu'] = closest_row['NoLoadCost_pu'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("NoLoadCost_pu values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'NoLoadCost_pu']

copy_no_load_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying NoLoadCost_pu values...
NoLoadCost_pu values copied successfully.


In [73]:
import pandas as pd
import os

def copy_no_load_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying NoLoadCost_pu values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "NoLoadCost_pu" value is empty in the second file
                    if pd.isnull(second_row['NoLoadCost_pu']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "NoLoadCost_pu" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'NoLoadCost_pu'] = closest_row['NoLoadCost_pu'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("NoLoadCost_pu values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'NoLoadCost_pu']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_no_load_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying NoLoadCost_pu values...
NoLoadCost_pu values copied successfully.


In [74]:
import pandas as pd
import os

def copy_no_load_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying NoLoadCost_pu values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "NoLoadCost_pu" value is empty in the second file
            if pd.isnull(second_row['NoLoadCost_pu']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "NoLoadCost_pu" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'NoLoadCost_pu'] = closest_row['NoLoadCost_pu'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("NoLoadCost_pu values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'NoLoadCost_pu']

copy_no_load_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying NoLoadCost_pu values...
NoLoadCost_pu values copied successfully.


3.18. RampingCost Field Fulfilling

    This part copies the closest RampingCost value from the database EU_Power_Units_Technical_Features.csv, previously aggregated with the notebook EU_Power_Plant_Technical_Data_Base_Gathering.ipynb. The reference unit is selected by Technology, Fuel and PowerCapacity, and only empty fields are filled. A second pass matches on Technology and PowerCapacity alone, which covers units whose Fuel has no counterpart in the database. Both passes run on the clean data file and on the clustered data file.
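
    Units whose Technology and Fuel combination is missing from the reference database are appended to a "not defined units" file. The sketch below shows one possible way to write such a log with a single header row and standard CSV quoting. The function name log_undefined_units, the Unit column and the sample values are hypothetical and only serve the illustration.

```python
import os
import tempfile

import pandas as pd


def log_undefined_units(target_df, reference_df, log_path,
                        keys=("Technology", "Fuel")):
    """Append target rows whose key combination is absent from reference_df
    to log_path, writing the header only when the file is first created."""
    keys = list(keys)
    known = reference_df[keys].drop_duplicates()
    # indicator=True adds a "_merge" column marking rows found only on the left
    flagged = target_df.merge(known, on=keys, how="left", indicator=True)
    undefined = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")
    if not undefined.empty:
        write_header = not os.path.exists(log_path)
        undefined.to_csv(log_path, mode="a", header=write_header, index=False)
    return undefined


# Hypothetical sample data, for illustration only
reference = pd.DataFrame({"Technology": ["COMC"], "Fuel": ["GAS"]})
target = pd.DataFrame({
    "Unit": ["Plant A, unit 1", "Plant B"],
    "Technology": ["HDAM", "COMC"],
    "Fuel": ["WAT", "GAS"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "not_defined_units.csv")
    log_undefined_units(target, reference, log_path)
    log_undefined_units(target, reference, log_path)  # appends without a second header
    logged = pd.read_csv(log_path)
```

    Because the log is opened in append mode, running it once per technical feature records the same unit several times. Dropping duplicates when the file is read back is one way to deal with that.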

In [75]:
import pandas as pd
import os

def copy_ramping_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampingCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampingCost" value is empty in the second file
                    if pd.isnull(second_row['RampingCost']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampingCost" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampingCost'] = closest_row['RampingCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampingCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampingCost']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramping_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying RampingCost values...
RampingCost values copied successfully.


In [76]:
import pandas as pd
import os

def copy_ramping_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampingCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampingCost" value is empty in the second file
            if pd.isnull(second_row['RampingCost']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampingCost" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampingCost'] = closest_row['RampingCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampingCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampingCost']

copy_ramping_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying RampingCost values...
RampingCost values copied successfully.


In [77]:
import pandas as pd
import os

def copy_ramping_cost_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampingCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found, so copy the entire rows from the second file to the not defined units file
                # Append with to_csv so values containing commas are quoted correctly (no header row)
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
            else:
                # Iterate over rows in the second file
                for index, second_row in second_fuel_filtered.iterrows():
                    # Check if "RampingCost" value is empty in the second file
                    if pd.isnull(second_row['RampingCost']):
                        # Find the row in the first file with the closest "PowerCapacity" value
                        closest_row = first_fuel_filtered.iloc[(first_fuel_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                        # Copy "RampingCost" value from the first file to the second file
                        if not closest_row.empty:
                            second_df.at[index, 'RampingCost'] = closest_row['RampingCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampingCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'RampingCost']
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')

copy_ramping_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns, not_defined_units_file_path)

Copying RampingCost values...
RampingCost values copied successfully.


In [78]:
import pandas as pd
import os

def copy_ramping_cost_values(first_file_path, second_file_path, common_columns):
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    print("Copying RampingCost values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Check if "RampingCost" value is empty in the second file
            if pd.isnull(second_row['RampingCost']):
                # Find the row in the first file with the closest "PowerCapacity" value
                closest_row = first_filtered.iloc[(first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().argsort()[:1]]

                # Copy "RampingCost" value from the first file to the second file
                if not closest_row.empty:
                    second_df.at[index, 'RampingCost'] = closest_row['RampingCost'].values[0]

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("RampingCost values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'RampingCost']

copy_ramping_cost_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_clustered_data_file_path, common_columns)

Copying RampingCost values...
RampingCost values copied successfully.


3.19. PartLoadMin Field

In [79]:
import pandas as pd
import os

def copy_part_load_min_values(first_file_path, second_file_path, common_columns, not_defined_units_file_path):
    """First pass: fill empty PartLoadMin cells by matching on Technology and Fuel.

    Each empty cell takes the PartLoadMin of the reference unit with the same
    Technology and Fuel whose PowerCapacity is closest. Units whose Technology/Fuel
    combination is absent from the reference file are appended to the not defined
    units file. The second file is overwritten.
    """
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # Fail early if either file lacks one of the columns used below
    for name, df in (("first", first_df), ("second", second_df)):
        missing = [column for column in common_columns if column not in df.columns]
        if missing:
            raise KeyError(f"Columns {missing} not found in the {name} file")

    print("Copying PartLoadMin values...")

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = first_df[first_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]

        # Iterate over unique values of "Fuel" in the second file
        for fuel in second_filtered['Fuel'].unique():
            # Filter rows with matching "Fuel" in both files
            first_fuel_filtered = first_filtered[first_filtered['Fuel'] == fuel]
            second_fuel_filtered = second_filtered[second_filtered['Fuel'] == fuel]

            # Check if any rows exist in the first file for this combination of "Technology" and "Fuel"
            if first_fuel_filtered.empty:
                # No matching rows found: append the entire rows to the not defined units file.
                # to_csv quotes fields that contain commas; no header is written, as before.
                second_fuel_filtered.to_csv(not_defined_units_file_path, mode='a', header=False, index=False)
                continue

            # Reference rows without "PowerCapacity" or "PartLoadMin" have nothing to copy
            candidates = first_fuel_filtered.dropna(subset=['PowerCapacity', 'PartLoadMin'])
            if candidates.empty:
                continue

            # Iterate over rows in the second file
            for index, second_row in second_fuel_filtered.iterrows():
                # Only fill rows whose "PartLoadMin" is empty and whose "PowerCapacity" is known
                if pd.isnull(second_row['PartLoadMin']) and pd.notnull(second_row['PowerCapacity']):
                    # Label of the reference row with the closest "PowerCapacity" value
                    closest_index = (candidates['PowerCapacity'] - second_row['PowerCapacity']).abs().idxmin()

                    # Copy "PartLoadMin" value from the first file to the second file
                    second_df.at[index, 'PartLoadMin'] = candidates.at[closest_index, 'PartLoadMin']

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("PartLoadMin values copied successfully.")

common_columns = ['PowerCapacity', 'Technology', 'Fuel', 'PartLoadMin']
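# Built from dispaSET_unleash_folder_path (section 2) instead of a machine-specific absolute path
not_defined_units_file_path = os.path.join(dispaSET_unleash_folder_path, 'RawData', 'PowerPlants', 'power_plants_all_data_not_defined_units.csv')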

copy_part_load_min_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns, not_defined_units_file_path)

Copying PartLoadMin values...
PartLoadMin values copied successfully.
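
The units that end up in the not defined units file are those whose Technology/Fuel combination never appears in the reference file. As a rough illustration only, the same split can be obtained with a left merge and pandas' `indicator` flag. The function name `split_defined_units` and the toy rows are hypothetical and are not part of the project data. Note that pandas matches missing keys to each other in a merge, so rows with an empty Fuel would need separate handling.

```python
import pandas as pd

# Toy reference combinations (hypothetical, stands in for the technical features file)
reference = pd.DataFrame({
    "Technology": ["COMC", "STUR"],
    "Fuel": ["GAS", "HRD"],
})

# Toy unit list (hypothetical, stands in for the clean power plants file)
units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Technology": ["COMC", "STUR", "STUR"],
    "Fuel": ["GAS", "BIO", "HRD"],
})

def split_defined_units(units_df, reference_df, keys=("Technology", "Fuel")):
    """Split units_df into rows whose key combination exists in reference_df
    and rows whose combination does not."""
    keys = list(keys)
    # One row per known combination, so the left merge cannot duplicate units
    known = reference_df[keys].drop_duplicates()
    flagged = units_df.merge(known, on=keys, how="left", indicator=True)
    # A left merge keeps the row order of units_df, so the mask lines up by position
    is_defined = (flagged["_merge"] == "both").to_numpy()
    return units_df[is_defined], units_df[~is_defined]

defined, not_defined = split_defined_units(units, reference)
print(not_defined)
```

Writing `not_defined` once with `to_csv` would replace the per-combination append and would avoid duplicate rows when the cell is run more than once.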


In [80]:
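import pandas as pd

# Note: this definition replaces the function of the same name from the previous cell
def copy_part_load_min_values(first_file_path, second_file_path, common_columns):
    """Second pass: fill the PartLoadMin cells that are still empty by matching on Technology only.

    Each empty cell takes the PartLoadMin of the reference unit with the same
    Technology whose PowerCapacity is closest, regardless of Fuel. The second file
    is overwritten.
    """
    # Read CSV files into DataFrames
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # Fail early if either file lacks one of the columns used below
    for name, df in (("first", first_df), ("second", second_df)):
        missing = [column for column in common_columns if column not in df.columns]
        if missing:
            raise KeyError(f"Columns {missing} not found in the {name} file")

    print("Copying PartLoadMin values...")

    # Reference rows without "PowerCapacity" or "PartLoadMin" have nothing to copy
    reference_df = first_df.dropna(subset=['PowerCapacity', 'PartLoadMin'])

    # Iterate over unique values of "Technology" in the second file
    for technology in second_df['Technology'].unique():
        # Filter rows in both files where "Technology" matches
        first_filtered = reference_df[reference_df['Technology'] == technology]
        second_filtered = second_df[second_df['Technology'] == technology]
        if first_filtered.empty:
            continue

        # Iterate over rows in the second file
        for index, second_row in second_filtered.iterrows():
            # Only fill rows whose "PartLoadMin" is empty and whose "PowerCapacity" is known
            if pd.isnull(second_row['PartLoadMin']) and pd.notnull(second_row['PowerCapacity']):
                # Label of the reference row with the closest "PowerCapacity" value
                closest_index = (first_filtered['PowerCapacity'] - second_row['PowerCapacity']).abs().idxmin()

                # Copy "PartLoadMin" value from the first file to the second file
                second_df.at[index, 'PartLoadMin'] = first_filtered.at[closest_index, 'PartLoadMin']

    # Write updated DataFrame to the second file
    second_df.to_csv(second_file_path, index=False)

    print("PartLoadMin values copied successfully.")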

common_columns = ['PowerCapacity', 'Technology', 'PartLoadMin']

copy_part_load_min_values(EU_Power_Units_Technical_Features_File_path, power_plants_clean_data_file_path, common_columns)

Copying PartLoadMin values...
PartLoadMin values copied successfully.


3.20. MinEfficiency Field

3.21. StartUpTime Field

3.22. CO2Intensity Field

3.23. COP Field

3.24. RampUpRate Field

3.25. Tnominal Field

3.26. coef_COP_a Field

3.27. coef_COP_b Field

3.28. STOCapacity Field

3.29. STOSelfDischarge Field

3.30. STOMaxChargingPower Field

3.31. STOChargingEfficiency Field

3.32. WaterWithdrawal Field

3.33. WaterConsumption Field