<div style="text-align: center; margin-left: 0em; font-weight: bold; font-size: 20px; font-family: TimesNewRoman;">
    POWER PLANTS DATA PROCESSING - Main Notebook
</div>

<div style="text-align: left; margin-left: 0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Each part of the following script was used to process the raw data for the power plant units of the Dispa-SET_Unleash project.
    <br>
    Read the explanatory text cells to follow the whole process step by step, up to the final results.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    1. Notebook Set Up
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Importing needed libraries
</div>

In [1]:
import os
import csv
from datetime import datetime
import requests
import pandas as pd
from shutil import move

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    2. Dispa-SET_Unleash Folder Path
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Dynamically determining dispaSET_unleash_folder_path from the location of the "Dispa-SET_Unleash" folder relative to the current working directory.
<br>
    If the "Dispa-SET_Unleash" folder is copied to a different machine or location, the dispaSET_unleash_folder_path variable adjusts automatically, provided the notebook is still run from a directory two levels below that folder.
</div>

In [2]:
# Get the current working directory
current_directory = os.getcwd()

# Go up one level from the current working directory
dispaSET_unleash_parent_directory = os.path.dirname(current_directory)

# Go up one more level to reach the "Dispa-SET_Unleash" folder
dispaSET_unleash_folder_path = os.path.dirname(dispaSET_unleash_parent_directory)

# Extract the folder name from its path
dispaSET_unleash_folder_name = os.path.basename(dispaSET_unleash_folder_path)

print("dispaSET_unleash_folder_name:", dispaSET_unleash_folder_name)
print("dispaSET_unleash_folder_path:", dispaSET_unleash_folder_path)

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
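The cell above assumes the notebook runs exactly two levels below the project folder. That assumption can be relaxed by searching the ancestors of the working directory for the folder by name. The following is a minimal sketch, not part of the original notebook; the helper name find_ancestor_named is hypothetical.

```python
from pathlib import Path


def find_ancestor_named(start, folder_name):
    """Return the closest ancestor of `start` (or `start` itself) named `folder_name`.

    Hypothetical helper for illustration; raises FileNotFoundError if none matches.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if candidate.name == folder_name:
            return candidate
    raise FileNotFoundError(f"No ancestor named '{folder_name}' above {start}")


# Example: locate the project folder regardless of how deep the notebook sits
# project_root = find_ancestor_named(Path.cwd(), "Dispa-SET_Unleash")
```

With this approach the notebook can be moved to a different depth inside the project without editing the path logic.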


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    3. Zone(s) Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering the zone name (or names, if more than one zone is to be modelled) to create the folders where all data related to each zone will be stored.
</div>
<div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    For European countries, use the ISO 3166-1 alpha-2 codes (e.g. AT, BE, BG, CH) as the zone_name.
<br>
    For non-European countries or other zones (e.g. a region or city), it is preferable to set zone_name to the same label used in the data to be downloaded and processed. For example:
</div>
    <div style="text-align: left; margin-left: 2.00em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        If a csv file with all the power plants of Spain is downloaded but only the units of the city of Pamplona are wanted, and Pamplona is referred to in the downloaded file by the acronym "PMPLN", set the zone_name variable to "PMPLN".
</div>
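To illustrate why zone_name should match the label used in the source data, the sketch below keeps only the rows whose zone column equals the chosen label. The column name "region" and the sample rows are hypothetical; real files use their own column names and values.

```python
import csv
import io

# Hypothetical sample of a national file; the "region" column and its values are invented
sample_csv = """name,region,capacity_mw
Plant A,PMPLN,120
Plant B,MDRD,300
Plant C,PMPLN,45
"""

zone_name = "PMPLN"  # must match the label exactly as it is written in the file


def rows_for_zone(csv_text, zone_column, zone_label):
    """Return the rows (as dicts) whose `zone_column` equals `zone_label`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[zone_column] == zone_label]


selected = rows_for_zone(sample_csv, "region", zone_name)
print([row["name"] for row in selected])  # ['Plant A', 'Plant C']
```

The comparison is exact, so a label written with different capitalisation or spelling would select no rows.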

In [3]:
# List of folder names to create
zone_names = ["DE", 
              "DK", 
              "CH",
              "BE"]

In [4]:
# Original value of dispaSET_unleash_folder_path
#dispaSET_unleash_folder_path = "/home/ray/Dispa-SET_Unleash"

# Additional string to be appended
additional_path = "/RawData/PowerPlants/"

# Construct the power_plants_raw_data_folder_path variable
power_plants_raw_data_folder_path = dispaSET_unleash_folder_path + additional_path
print("power_plants_raw_data_folder_path:", power_plants_raw_data_folder_path)

# Dictionary to store created zone paths
created_zones = {}

# Create the zone
for zone_name in zone_names:
    zone_path = os.path.join(power_plants_raw_data_folder_path, zone_name)
    os.makedirs(zone_path, exist_ok=True)
    created_zones[zone_name] = zone_path
    print(f"Created zone: {zone_path}")

# Print the created zone paths
print("Created zones:")
for zone_name, zone_path in created_zones.items():
    print(f"{zone_name}: {zone_path}")
    
created_zones

power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH
Created zone: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE
Created zones:
DE: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE
DK: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK
CH: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH
BE: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE


{'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE',
 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK',
 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH',
 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [5]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    4. Download Link Sources
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Entering all the download links where the raw data is hosted.
    <br>
        This list is saved and used as input for the next stages.
      <br>
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
        Note that, for the data to be processed, every link must download a .csv file.
</div>
    <div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    On the other hand, it is important to define which zone each download link refers to.
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    If a downloaded file contains data belonging to only one zone, specify that zone in the variable download_links_zone_related, in the same order as the variable download_links.
    <br>
    If a downloaded file contains data for several zones at once, specify it with the word "General" in the variable download_links_zone_related, again in the same order as the variable download_links.
</div>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    Remember that the next filtering stages depend on the correct setting of this step.

    <div style="text-align: left; margin-left: -2.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Additionally, indicate the year that all the data refers to.
    <br>
    <div style="text-align: left; margin-left: 1.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    This year is used as the root of the names of all files created in the following steps.
</div>

In [6]:
# List of the download links:
download_links = [
    'https://data.open-power-system-data.org/conventional_power_plants/2020-10-01/conventional_power_plants_EU.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_DE.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_DK.csv',
    'https://data.open-power-system-data.org/renewable_power_plants/2020-08-25/renewable_power_plants_CH.csv',
    'https://opendata.elia.be/api/explore/v2.1/catalog/datasets/ods036/exports/csv?lang=en&timezone=Europe%2FBrussels&use_labels=true&delimiter=%3B'
]

In [7]:
# List of zones related to the download links:
download_links_zone_related = [
    'General',
    'DE',
    'DK',
    'CH',
    'BE'
]

In [8]:
# Year to which the data refers:
data_year = '2020'
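Because the links and their zone labels are paired by position, a mismatch between the two lists is easy to miss: zip() stops at the shorter list without raising an error. A minimal sketch of a consistency check is shown below; the function name check_link_zone_pairs and the example URLs are hypothetical and not part of the original notebook.

```python
def check_link_zone_pairs(links, zones, known_zones):
    """Return a list of problems found in the link/zone configuration (empty if none)."""
    problems = []
    if len(links) != len(zones):
        problems.append(
            f"{len(links)} links but {len(zones)} zone labels; zip() would silently drop the extras"
        )
    for position, zone in enumerate(zones, start=1):
        if zone != "General" and zone not in known_zones:
            problems.append(f"Entry {position}: zone '{zone}' is not in zone_names")
    return problems


# Example with placeholder links (hypothetical URLs, for illustration only)
example_links = ["https://example.org/a.csv", "https://example.org/b.csv"]
example_zones = ["General", "DE"]
print(check_link_zone_pairs(example_links, example_zones, ["DE", "DK", "CH", "BE"]))  # []
```

In the notebook this could be called with download_links, download_links_zone_related and zone_names before the sources file is written.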

In [9]:
def save_download_links_to_csv(links, zones, folder_path, data_year):
    # Create the filename using the data year, current date, and time
    now = datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    file_name = f"{data_year}_power_plants_raw_data_sources_{timestamp}.csv"
    
    # Create a folder with the same name as the file (without extension)
    folder_name = os.path.splitext(file_name)[0]
    folder_path = os.path.join(folder_path, folder_name)
    os.makedirs(folder_path, exist_ok=True)
    
    # Combine the folder path and filename
    file_path = os.path.join(folder_path, file_name)
    
    # Write links to CSV file
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        
        # Write the header
        writer.writerow(['Download_Link_Sources', 'Zone', 'File_Name'])
        
        # Write the links, zones, and file names
        for i, (link, zone) in enumerate(zip(links, zones), start=1):
            writer.writerow([link, zone, i])
    
    print(f"Download links saved to: {file_path}")
    
    return file_path, file_name

# Save the download links to a CSV file and get the file path and name
power_plants_raw_data_sources_file_path, power_plants_raw_data_sources_file_name = save_download_links_to_csv(download_links, download_links_zone_related, power_plants_raw_data_folder_path, data_year)

print("File path:", power_plants_raw_data_sources_file_path)
print("File name:", power_plants_raw_data_sources_file_name)

Download links saved to: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
File path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
File name: 2020_power_plants_raw_data_sources_20240406_184203.csv


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [10]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    5. Power Plants Raw Data Download Files
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Using the download list given previously to download all the power unit raw data files and save them in the folder that contains the file given by the variable power_plants_raw_data_sources_file_path.
    <br>
    Each downloaded file is named after its position in the download_links list (1, 2, 3, ...).
    <br>
    Additionally, as some files use different column delimiters (e.g. ",", ";" or a tab), the code detects the delimiter and rewrites each file with commas so that it is suitable for the next steps.
</div>

In [11]:
def detect_delimiter(file_path):
    """
    Detects the delimiter used in a CSV file.

    Parameters:
    - file_path: The path to the CSV file.

    Returns:
    - The detected delimiter character.
    """
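    # Common delimiters to test
    delimiters = [',', ';', '\t']

    with open(file_path, 'r', newline='') as file:
        # Read the beginning of the file and keep its first line (the header)
        sample_data = file.read(1024)
    first_line = sample_data.splitlines()[0] if sample_data else ''

    # Pick the delimiter that occurs most often in the header line, so that
    # a stray comma in a semicolon-delimited file does not win by list order
    counts = {delimiter: first_line.count(delimiter) for delimiter in delimiters}
    best_delimiter = max(counts, key=counts.get)

    # Default to comma if no delimiter is detected
    return best_delimiter if counts[best_delimiter] > 0 else ','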

def download_files_from_csv(csv_file_path):
    # Create a folder to save downloaded files
    download_folder = os.path.dirname(csv_file_path)
    
    # Open and read the CSV file
    with open(csv_file_path, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        
        # Iterate over each row
        for row in reader:
            download_link = row['Download_Link_Sources']
            file_name = row['File_Name']
            
            # Download the file from the URL
            response = requests.get(download_link)
            
            # Check if the request was successful
            if response.status_code == 200:
                # Save the downloaded file
                file_path = os.path.join(download_folder, file_name)
                with open(file_path, 'wb') as f:
                    f.write(response.content)
                print(f"File '{file_name}' downloaded and saved successfully.")
                
                # Detect delimiter
                delimiter = detect_delimiter(file_path)
                print(f"Detected delimiter for file '{file_name}': '{delimiter}'")
                
                # Change delimiter to comma (,) using pandas
                df = pd.read_csv(file_path, delimiter=delimiter)
                df.to_csv(file_path, index=False)
                print(f"Delimiter changed to comma (,) for file '{file_name}'.")
            else:
                print(f"Failed to download file from '{download_link}'.")

# Path to the recently created CSV file
recently_created_csv_file_path = power_plants_raw_data_sources_file_path

# Call the function to download files from the CSV
download_files_from_csv(recently_created_csv_file_path)

File '1' downloaded and saved successfully.
Detected delimiter for file '1': ','
Delimiter changed to comma (,) for file '1'.
File '2' downloaded and saved successfully.
Detected delimiter for file '2': ','


Delimiter changed to comma (,) for file '2'.
File '3' downloaded and saved successfully.
Detected delimiter for file '3': ','
Delimiter changed to comma (,) for file '3'.
File '4' downloaded and saved successfully.
Detected delimiter for file '4': ','
Delimiter changed to comma (,) for file '4'.
File '5' downloaded and saved successfully.
Detected delimiter for file '5': ';'
Delimiter changed to comma (,) for file '5'.
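The delimiter normalisation described above can also be sketched with the standard library alone. The example below guesses the delimiter with csv.Sniffer, restricted to a few candidates, and rewrites the text with commas. It is an illustrative alternative under these assumptions, not the method used in the notebook; the function name normalize_to_comma and the sample data are hypothetical.

```python
import csv
import io


def normalize_to_comma(text, candidates=",;\t"):
    """Return `text` rewritten as comma-delimited CSV.

    The delimiter is guessed with csv.Sniffer, restricted to `candidates`;
    csv.Error is raised if no delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=candidates)
    rows = csv.reader(io.StringIO(text), delimiter=dialect.delimiter)
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(rows)
    return output.getvalue()


# Hypothetical semicolon-delimited sample
sample = "name;capacity\nPlant A;120\nPlant B;300\n"
print(normalize_to_comma(sample))
```

Because the rows pass through csv.reader and csv.writer, fields that contain a comma are quoted on output instead of being split into extra columns.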


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    6. Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Filtering the data contained in each downloaded file according to the zone previously specified.
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.1. Zone Definition
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Adding a new column named "Country" to each downloaded file that was related to a specific zone in the download_links_zone_related list filled in previously.
    <br>
    Files set with the key "General" are assumed to contain data from several zones, so they will be filtered in a different way.
</div>

In [12]:
# Read the CSV file specified in power_plants_raw_data_sources_file_path
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])  # Convert to string
    zone = row['Zone']
    
    # Check if the zone is not "General"
    if zone != "General":
        # Construct the path to the corresponding CSV file
        csv_file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
        
        # Check if the CSV file exists
        if os.path.exists(csv_file_path):
            # Read the CSV file
            df_csv = pd.read_csv(csv_file_path)
            
            # Add a new column "Country" with the value from the "Zone" column
            df_csv['Country'] = zone
            
            # Write the updated DataFrame back to the CSV file
            df_csv.to_csv(csv_file_path, index=False)
            
            print(f"Added 'Country' column to {file_name} with value '{zone}'")
        else:
            print(f"CSV file {file_name} does not exist.")
    else:
        print(f"No action needed for {file_name}")


No action needed for 1


Added 'Country' column to 2 with value 'DE'
Added 'Country' column to 3 with value 'DK'
Added 'Country' column to 4 with value 'CH'
Added 'Country' column to 5 with value 'BE'


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [13]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    6.2. Raw Data File Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Moving each downloaded file to its corresponding zone folder according to the download_links_zone_related list.
    <br>
    The files related to the key "General" simply keep their current location.
</div>

In [14]:
# Read the raw data sources file
df_sources = pd.read_csv(power_plants_raw_data_sources_file_path)

# Create a new column Final_File_Path
df_sources['Final_File_Path'] = ''

# Iterate over each row in the DataFrame
for index, row in df_sources.iterrows():
    file_name = str(row['File_Name'])
    file_path = os.path.join(os.path.dirname(power_plants_raw_data_sources_file_path), file_name)
    
    # Check if the file exists
    if os.path.exists(file_path):
        # Open and read the file
        df_csv = pd.read_csv(file_path)
        
        # Check if the file has the header 'Country'
        if 'Country' in df_csv.columns:
            # Get the corresponding value of 'Zone'
            zone_value = row['Zone']
            
            # Check if the zone folder exists
            if zone_value in zone_names:
                # Construct the destination folder path
                destination_folder = os.path.join(power_plants_raw_data_folder_path, zone_value)
                
                # Move the file to the destination folder
                destination_file_path = os.path.join(destination_folder, file_name)
                move(file_path, destination_file_path)
                print(f"Moved file '{file_name}' to '{destination_folder}'")
                
                # Get the current path of the moved file
                final_file_path = os.path.abspath(destination_file_path)
                df_sources.at[index, 'Final_File_Path'] = final_file_path
            else:
                print(f"Destination folder for zone '{zone_value}' does not exist.")
        else:
            print(f"No 'Country' header found in file '{file_name}'. No action needed.")
    else:
        print(f"File '{file_name}' does not exist.")
        
    # If file was not moved, update Final_File_Path with current path
    if not df_sources.at[index, 'Final_File_Path']:
        df_sources.at[index, 'Final_File_Path'] = os.path.abspath(file_path)

# Save the DataFrame back to the CSV file
df_sources.to_csv(power_plants_raw_data_sources_file_path, index=False)

No 'Country' header found in file '1'. No action needed.


Moved file '2' to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE'
Moved file '3' to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK'
Moved file '4' to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH'
Moved file '5' to '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'


<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Extracting a list of the paths of all the power plants raw data files.
    <br>
    This list will be used as a reference for future filtering steps.
</div>

In [15]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Create the list power_plants_raw_data_file_list
power_plants_raw_data_file_list = []

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Final_File_Path already holds the full path (including the file name)
    final_file_path = str(row['Final_File_Path'])
    
    # Append the full path to the list
    power_plants_raw_data_file_list.append(final_file_path)

# Print the list
print(power_plants_raw_data_file_list)

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/3', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/4', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/5']


In [16]:
# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Extract the folder path
    folder_path = os.path.dirname(row['Final_File_Path'])
    
    # Update the DataFrame with the folder path
    df.at[index, 'Folder_Path'] = folder_path

# Save the DataFrame back to the CSV file
df.to_csv(power_plants_raw_data_sources_file_path, index=False)


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
  They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
</div>

In [17]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']
power_plants_raw_data_file_list: ['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1', '/home/

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 18px; font-family: TimesNewRoman;">
    7. Data Formatting
</div>

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.1. Clean Data File Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Creating, for each zone, an empty csv file whose headers are all the technical features needed for Dispa-SET simulations.
    <br>
    This file is named after the value of the data_year variable specified previously.
    <br>
    All the data filtered in the following steps will be written to this csv file.
</div>

In [18]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Get the folder path and file name
    folder_path = row['Folder_Path']
    file_name = f"{data_year}.csv"
    
    # Check if the file already exists in the folder
    file_path = os.path.join(folder_path, file_name)
    if os.path.exists(file_path):
        print(f"File '{file_name}' already exists in '{folder_path}'.")
        # Update the DataFrame with the existing file name and path
        df.at[index, 'Clean_File_Name'] = file_name
        df.at[index, 'Final_Clean_File_Path'] = file_path
    else:
        # Create the new CSV file with the specified headers
        headers = ["", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
                   "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
                   "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
                   "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
                   "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
                   "Company", "Lat", "Lon"]
        with open(file_path, 'w') as f:
            # End the header row with a newline so rows appended later start on their own line
            f.write(','.join(headers) + '\n')
        print(f"Created file '{file_name}' in '{folder_path}'.")
        # Update the DataFrame with the new file name and path
        df.at[index, 'Clean_File_Name'] = file_name
        df.at[index, 'Final_Clean_File_Path'] = file_path

# Create the list of clean data file paths
# (tolist() returns only the column values, so there is no header entry to remove)
power_plants_clean_data_file_list = df['Final_Clean_File_Path'].tolist()

# Save the DataFrame back to the CSV file
df.to_csv(power_plants_raw_data_sources_file_path, index=False)

Created file '2020.csv' in '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203'.
Created file '2020.csv' in '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE'.
Created file '2020.csv' in '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK'.
Created file '2020.csv' in '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH'.
Created file '2020.csv' in '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'.
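The header-only file creation described above can also be written with the csv module, which terminates the header row so that rows appended later start on their own line. This is a minimal sketch under that assumption; the helper name create_header_file is hypothetical, and the header list in the usage line is shortened for illustration.

```python
import csv
from pathlib import Path


def create_header_file(file_path, headers):
    """Create `file_path` containing only a header row; leave an existing file untouched.

    Returns True if the file was created and False if it already existed.
    """
    path = Path(file_path)
    if path.exists():
        return False
    with path.open("w", newline="") as f:
        # csv.writer terminates the row, so later appended rows start on a new line
        csv.writer(f, lineterminator="\n").writerow(headers)
    return True


# Example (shortened header list, for illustration only):
# create_header_file("2020.csv", ["", "Unit", "PowerCapacity"])
```

Returning a flag instead of printing lets the caller decide how to report whether the file was created or was already present.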


In [19]:
# Read the CSV file
df = pd.read_csv(power_plants_raw_data_sources_file_path)

# Create the list power_plants_clean_data_file_list
power_plants_clean_data_file_list = []

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Final_Clean_File_Path already holds the full path to the clean file,
    # so it is used directly (joining it to File_Name had no effect because
    # os.path.join discards everything before an absolute path)
    final_file_path = str(row['Final_Clean_File_Path'])
    power_plants_clean_data_file_list.append(final_file_path)

# Print the list
print(power_plants_clean_data_file_list)

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv', '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv']
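
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The same list could probably be obtained without iterating over the rows, since the column already stores complete paths. The sketch below is only an illustration: the small in-memory table and its /tmp paths are assumptions that stand in for the real sources file.
</div>

```python
import pandas as pd

# Hypothetical stand-in for the raw-data-sources table; the notebook itself
# reads it from power_plants_raw_data_sources_file_path.
sources = pd.DataFrame({
    "File_Name": [1, 2],
    "Final_Clean_File_Path": [
        "/tmp/PowerPlants/DE/2020.csv",
        "/tmp/PowerPlants/DK/2020.csv",
    ],
})

# The column already holds full paths, so no joining is needed.
clean_file_list = sources["Final_Clean_File_Path"].astype(str).tolist()
print(clean_file_list)
```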


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [20]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']
power_plants_raw_data_file_list: ['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1', '/home/

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.2. Dictionary Files
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Getting the current paths of all the dictionary files that are used as reference for the following data filtering process.
        <br>
    The dictionary files are .csv files that contain a small database of the technical denominations usually used for power units, e.g. 'PowerCapacity', 'Energy Source', 'Technology', etc.
</div>
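
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    As a rough illustration of how such a dictionary file might be used, the sketch below loads a two-column equivalence table and translates raw denominations into standard names. The column names raw_name and standard_name, the sample rows and both helper functions are assumptions; the real layout of the dictionary files is not shown in this notebook.
</div>

```python
import csv
import io

# Hypothetical content of an equivalence file: each row maps one raw
# denomination to the standard name used in the clean files.
sample_dictionary_csv = """raw_name,standard_name
Energy Source,Fuel
Capacity (MW),PowerCapacity
Type,Technology
"""

def load_equivalences(handle):
    """Return a {raw_name: standard_name} mapping from an open CSV handle."""
    reader = csv.DictReader(handle)
    return {row["raw_name"]: row["standard_name"] for row in reader}

equivalences = load_equivalences(io.StringIO(sample_dictionary_csv))

def to_standard(name):
    """Translate a raw name; unknown names are returned unchanged."""
    return equivalences.get(name, name)

print(to_standard("Energy Source"))
print(to_standard("Lat"))
```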

In [21]:
# dispaSET_unleash_folder_path was already determined dynamically in section 2,
# so it is reused here instead of hard-coding a machine-specific path

# List of file names
file_names = [
    "power_plants_clean_data_equivalent_headers.csv",
    "power_plants_all_data_equivalent_technologies.csv",
    "power_plants_all_data_equivalent_fuels.csv",
    "power_plants_all_data_equivalent_CHPTypes.csv",
    "EU_Power_Units_Technical_Features.csv"
]

# Dictionary to store file paths
file_paths = {}

# Construct file paths
for file_name in file_names:
    # Construct the file path
    file_path = os.path.join(dispaSET_unleash_folder_path, "RawData", "PowerPlants", file_name)
    
    # Store the file path in the dictionary
    file_paths[file_name] = file_path

# Create a variable for each file path and print it
for file_name, file_path in file_paths.items():
    # Build the variable name from the file name without its extension
    variable_name = f"{os.path.splitext(file_name)[0]}_file_path"

    # Set the variable in the global scope
    globals()[variable_name] = file_path
    print(f"{variable_name}: {file_path}")

power_plants_clean_data_equivalent_headers_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_clean_data_equivalent_headers.csv
power_plants_all_data_equivalent_technologies_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_technologies.csv
power_plants_all_data_equivalent_fuels_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_fuels.csv
power_plants_all_data_equivalent_CHPTypes_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/power_plants_all_data_equivalent_CHPTypes.csv
EU_Power_Units_Technical_Features_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/EU_Power_Units_Technical_Features.csv
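
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    An alternative to creating variables through globals() is to keep the paths in a dictionary and look them up by key, which makes it easier to see where each name comes from. This is only a sketch of that design choice: project_root and the /tmp location are assumptions used for illustration.
</div>

```python
import os

# Assumed project root, for illustration only.
project_root = os.path.join("/tmp", "Dispa-SET_Unleash")

dictionary_file_names = [
    "power_plants_all_data_equivalent_fuels.csv",
    "EU_Power_Units_Technical_Features.csv",
]

# Key each path by the file name without its extension.
dictionary_paths = {
    os.path.splitext(name)[0]: os.path.join(project_root, "RawData", "PowerPlants", name)
    for name in dictionary_file_names
}

fuels_path = dictionary_paths["power_plants_all_data_equivalent_fuels"]
print(fuels_path)
```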


<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [22]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")
print (f"power_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")
print (f"power_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}")
print (f"power_plants_all_data_equivalent_fuels_file_path: {power_plants_all_data_equivalent_fuels_file_path}")
print (f"power_plants_all_data_equivalent_CHPTypes_file_path: {power_plants_all_data_equivalent_CHPTypes_file_path}")
print (f"EU_Power_Units_Technical_Features_file_path: {EU_Power_Units_Technical_Features_file_path}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']
power_plants_raw_data_file_list: ['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1', '/home/

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.3. Raw Data Files Zone Classification
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Copying all the needed columns from the raw data files to the clean data files, following the order given by the lists power_plants_raw_data_file_list and power_plants_clean_data_file_list.
        <br>
    This process is applied to each zone by calling the function copy_columns_to_clean_data from copy_columns_to_clean_data.py
</div>
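
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The body of copy_columns_to_clean_data is not shown in this notebook. The sketch below shows one way such a step could work with pandas, assuming the header dictionary can be reduced to a mapping from raw column names to clean column names. The function copy_columns_sketch, the mapping and the sample data are all hypothetical.
</div>

```python
import pandas as pd

def copy_columns_sketch(raw_df, clean_headers, header_equivalences):
    """Return a frame with clean_headers as columns, filled from raw_df.

    header_equivalences maps raw column names to clean column names
    (a hypothetical structure; the real helper may work differently).
    """
    renamed = raw_df.rename(columns=header_equivalences)
    # Keep only recognised columns, in the order of the clean template;
    # headers with no raw counterpart become empty (NaN) columns.
    return renamed.reindex(columns=clean_headers)

raw = pd.DataFrame({
    "Capacity (MW)": [0.08, 0.015],
    "Energy Source": ["Bioenergy", "Hydro"],
    "Comment": ["x", "y"],  # no equivalent, so it is not copied
})
clean = copy_columns_sketch(
    raw,
    clean_headers=["Unit", "PowerCapacity", "Fuel"],
    header_equivalences={"Capacity (MW)": "PowerCapacity", "Energy Source": "Fuel"},
)
print(clean)
```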

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [23]:
power_plants_raw_data_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/3',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/4',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/5']

In [24]:
power_plants_clean_data_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv']

In [25]:
from copy_columns_to_clean_data import copy_columns_to_clean_data
# Iterate over each pair of files
for raw_data_file, clean_data_file in zip(power_plants_raw_data_file_list, power_plants_clean_data_file_list):
    # Call the function with appropriate arguments
    copy_columns_to_clean_data(raw_data_file, clean_data_file, power_plants_clean_data_equivalent_headers_file_path)

Columns copied successfully.
Columns copied successfully.
Columns copied successfully.
Columns copied successfully.
Columns copied successfully.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.4. Unit Name Filling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Unit" field in all the rows of each clean data file.
        <br>
     If there is no value, the corresponding latitude and longitude values are used as the identifier of the unit.
        <br>
    If there is no value in the latitude and longitude fields, the corresponding "Company" data is used instead.
    <br>
    Finally, if there is no company data either, the "Unit" field is built from the zone name plus an increasing number.
        <br>
    This process is applied to each zone by calling the function fill_unit_field from Data_Processing_Functions.
</div>
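
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The fallback order described above can be sketched in plain Python as follows. This is not the code of fill_unit_field, which is not shown here: the helper names, the "lat-lon-zone" format and the "zone plus counter" format are assumptions chosen for illustration.
</div>

```python
def is_missing(value):
    """Treat None, NaN and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    return str(value).strip() == ""

def fill_unit_names(rows, zone):
    """Fill missing 'Unit' values in a list of dict rows, in place."""
    counter = 0
    for row in rows:
        if not is_missing(row.get("Unit")):
            continue  # the unit already has a name
        lat, lon = row.get("Lat"), row.get("Lon")
        if not is_missing(lat) and not is_missing(lon):
            # First fallback: coordinates plus zone
            row["Unit"] = f"{lat}-{lon}-{zone}"
        elif not is_missing(row.get("Company")):
            # Second fallback: company name
            row["Unit"] = str(row["Company"])
        else:
            # Last fallback: zone name plus an increasing number
            counter += 1
            row["Unit"] = f"{zone}{counter}"
    return rows

rows = [
    {"Unit": "", "Lat": 48.29, "Lon": 8.57, "Company": ""},
    {"Unit": "", "Lat": "", "Lon": "", "Company": "ACME Energy"},
    {"Unit": "", "Lat": "", "Lon": "", "Company": ""},
    {"Unit": "Plant A", "Lat": "", "Lon": "", "Company": ""},
]
fill_unit_names(rows, zone="DE")
print([row["Unit"] for row in rows])
```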

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [26]:
power_plants_clean_data_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv']

In [27]:
from Data_Processing_Functions import fill_unit_field

# List of file paths
#power_plants_clean_data_file_list = ['path_to_file1.csv', 'path_to_file2.csv', 'path_to_file3.csv']

# Iterate over each file path
for file_path in power_plants_clean_data_file_list:
    fill_unit_field(file_path)

Unit field filled successfully.
Unit field filled successfully.
Unit field filled successfully.
Unit field filled successfully.
Unit field filled successfully.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.5. Nunits Filling
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Nunits" field in all the rows of each clean data file.
        <br>
     If there is no value, the field is filled with 1.
        <br>
    This process is applied to each zone by calling the function fill_nunits_column from Data_Processing_Functions.
</div>
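
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    A minimal pandas sketch of this default is given below. It is an assumption about how the step could be written, not the body of fill_nunits_column, and the sample table is made up.
</div>

```python
import pandas as pd

# Hypothetical sample: two units have no Nunits value.
units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Nunits": [2, None, None],
})

# Coerce non-numeric entries to NaN first, then default every gap to 1.
units["Nunits"] = pd.to_numeric(units["Nunits"], errors="coerce").fillna(1)
print(units["Nunits"].tolist())
```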

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [28]:
power_plants_clean_data_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv']

In [29]:
from Data_Processing_Functions import fill_nunits_column

# List of file paths
#clean_data_file_paths = ["path/to/your/clean_data1.csv", "path/to/your/clean_data2.csv", ...]

# Apply the function to each file
for file_path in power_plants_clean_data_file_list:
    fill_nunits_column(file_path)

Nunits column processed successfully.
Nunits column processed successfully.
Nunits column processed successfully.
Nunits column processed successfully.
Nunits column processed successfully.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 17px; font-family: TimesNewRoman;">
    7.6. Zone Filter
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Verifying the "Zone" field in all the rows of each clean data file.
        <br>
     If there is no value, the row is removed from the file.
    <br>
     Additionally, all the removed rows are stored in a csv file called power_plants_all_data_not_defined_units.csv.
        <br>
    This process is applied to each zone by calling the function move_rows_with_empty_zone from Data_Processing_Functions.
</div>

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [30]:
print (f"dispaSET_unleash_folder_name: {dispaSET_unleash_folder_name}")
print (f"dispaSET_unleash_folder_path: {dispaSET_unleash_folder_path}")
print (f"power_plants_raw_data_folder_path: {power_plants_raw_data_folder_path}")
print (f"zone_names: {zone_names}")
print (f"created_zones: {created_zones}")
print (f"power_plants_raw_data_sources_file_path: {power_plants_raw_data_sources_file_path}")
print (f"power_plants_raw_data_sources_file_name: {power_plants_raw_data_sources_file_name}")
print (f"data_year: {data_year}")
print (f"download_links_zone_related: {download_links_zone_related}")
print (f"power_plants_raw_data_file_list: {power_plants_raw_data_file_list}")
print (f"power_plants_clean_data_file_list: {power_plants_clean_data_file_list}")
print (f"power_plants_clean_data_equivalent_headers_file_path: {power_plants_clean_data_equivalent_headers_file_path}")
print (f"power_plants_all_data_equivalent_technologies_file_path: {power_plants_all_data_equivalent_technologies_file_path}")
print (f"power_plants_all_data_equivalent_fuels_file_path: {power_plants_all_data_equivalent_fuels_file_path}")
print (f"power_plants_all_data_equivalent_CHPTypes_file_path: {power_plants_all_data_equivalent_CHPTypes_file_path}")
print (f"EU_Power_Units_Technical_Features_file_path: {EU_Power_Units_Technical_Features_file_path}")

dispaSET_unleash_folder_name: Dispa-SET_Unleash
dispaSET_unleash_folder_path: /home/ray/Dispa-SET_Unleash
power_plants_raw_data_folder_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/
zone_names: ['DE', 'DK', 'CH', 'BE']
created_zones: {'DE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE', 'DK': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK', 'CH': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH', 'BE': '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE'}
power_plants_raw_data_sources_file_path: /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020_power_plants_raw_data_sources_20240406_184203.csv
power_plants_raw_data_sources_file_name: 2020_power_plants_raw_data_sources_20240406_184203.csv
data_year: 2020
download_links_zone_related: ['General', 'DE', 'DK', 'CH', 'BE']
power_plants_raw_data_file_list: ['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/1', '/home/

<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 16px; font-family: TimesNewRoman;">
    7.6.1. Not Defined Units File Creation
</div>

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Creating a .csv file for each zone that collects every unit with missing data in one of its fields, as well as every unit removed from the final clean data file for any other reason.

</div>

In [31]:
# Function to create the CSV file with specified headers
def create_csv_file(file_path, headers):
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)

# Define the headers for the CSV file
headers = ["", "Unit", "PowerCapacity", "Nunits", "Zone", "Zone_th", "Zone_h2", "Technology", "Fuel", "Efficiency",
           "MinUpTime", "MinDownTime", "RampUpRate", "RampDownRate", "StartUpCost", "NoLoadCost_pu", "RampingCost",
           "PartLoadMin", "MinEfficiency", "StartUpTime", "CO2Intensity", "CHPType", "CHPPowerToHeat",
           "CHPPowerLossFactor", "CHPMaxHeat", "COP", "Tnominal", "coef_COP_a", "coef_COP_b", "STOCapacity",
           "STOSelfDischarge", "STOMaxChargingPower", "STOChargingEfficiency", "WaterWithdrawal", "WaterConsumption", "Status", "Source",
           "Company", "Lat", "Lon"]

# List to store the paths of created CSV files
power_plants_all_data_not_defined_units_file_list = []

# Read the CSV file and create CSV files in each folder path
with open(power_plants_raw_data_sources_file_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        folder_path = row['Folder_Path']
        file_name = 'power_plants_all_data_not_defined_units.csv'
        file_path = os.path.join(folder_path, file_name)
        create_csv_file(file_path, headers)
        print(f"CSV file created at {file_path}")
        power_plants_all_data_not_defined_units_file_list.append(file_path)

print("All CSV files created successfully.")

CSV file created at /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/power_plants_all_data_not_defined_units.csv
CSV file created at /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/power_plants_all_data_not_defined_units.csv
CSV file created at /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/power_plants_all_data_not_defined_units.csv
CSV file created at /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/power_plants_all_data_not_defined_units.csv
CSV file created at /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/power_plants_all_data_not_defined_units.csv
All CSV files created successfully.


<div style="text-align: left; margin-left: 3.0em; font-weight: bold; font-size: 16px; font-family: TimesNewRoman;">
    7.6.2. No Zone Data Elimination
</div>
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    Removing all rows that have no zone, country, region, etc. in their Zone field.
</div>
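
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The split between kept and removed rows could look roughly like the sketch below. The function split_on_missing_zone and the sample data are hypothetical; the actual move_rows_with_empty_zone helper is not shown in this notebook and may treat blank values differently.
</div>

```python
import pandas as pd

def split_on_missing_zone(df):
    """Return (kept, removed): rows with a Zone value and rows without one."""
    # NaN/None counts as missing, and so does a blank or whitespace-only string.
    missing = df["Zone"].isna() | (df["Zone"].astype(str).str.strip() == "")
    return df[~missing], df[missing]

units = pd.DataFrame({
    "Unit": ["A", "B", "C"],
    "Zone": ["DE", None, "  "],
})
kept, removed = split_on_missing_zone(units)
print(kept["Unit"].tolist(), removed["Unit"].tolist())
```

<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    In a workflow like the one in this section, the kept rows would be written back to the clean data file and the removed rows appended to the not defined units file.
</div>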

<div style="text-align: right; margin-left: 3.0em; font-weight: unbold; font-size: 14px; font-family: TimesNewRoman;">
    Tracking Variables. 
    <br>
    <div style="text-align: right; margin-left: 1.50em; font-weight: unbold; font-size: 13px; font-family: TimesNewRoman;">
    These cells only confirm the file names, file paths and other information related to the data being processed.
    <br>
    They also keep the inputs for the next cells available, so the same information does not have to be re-entered each time.
    </div>
</div>

In [32]:
power_plants_all_data_not_defined_units_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/power_plants_all_data_not_defined_units.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/power_plants_all_data_not_defined_units.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/power_plants_all_data_not_defined_units.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/power_plants_all_data_not_defined_units.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/power_plants_all_data_not_defined_units.csv']

In [33]:
power_plants_clean_data_file_list

['/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv',
 '/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv']

In [34]:
from Data_Processing_Functions import move_rows_with_empty_zone

# Iterate over each pair of files
for clean_data_file, all_data_file in zip(power_plants_clean_data_file_list, power_plants_all_data_not_defined_units_file_list):
    # Call the function with appropriate arguments
    move_rows_with_empty_zone([clean_data_file], [all_data_file])

Processed /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/2020_power_plants_raw_data_sources_20240406_184203/2020.csv
Processed /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv
Processed /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DK/2020.csv
Processed /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/CH/2020.csv
Processed /home/ray/Dispa-SET_Unleash/RawData/PowerPlants/BE/2020.csv


In [35]:
verification = pd.read_csv('/home/ray/Dispa-SET_Unleash/RawData/PowerPlants/DE/2020.csv')


In [36]:
verification

Unnamed: 0.1,Unnamed: 0,Unit,PowerCapacity,Nunits,Zone,Zone_th,Zone_h2,Technology,Fuel,Efficiency,...,STOSelfDischarge,STOMaxChargingPower,STOChargingEfficiency,WaterWithdrawal,WaterConsumption,Status,Source,Company,Lat,Lon
0,,48.29242877-8.578018405-DE,0.080000,1.0,DE,,,Sewage and landfill gas,Bioenergy,,...,,,,,,,,,48.292429,8.578018
1,,48.96399433-10.88156233-DE,0.015000,1.0,DE,,,Run-of-river,Hydro,,...,,,,,,,,,48.963994,10.881562
2,,47.84613128-9.664272955-DE,0.055000,1.0,DE,,,Run-of-river,Hydro,,...,,,,,,,,,47.846131,9.664273
3,,47.84613128-9.664272955-DE,0.120000,1.0,DE,,,Run-of-river,Hydro,,...,,,,,,,,,47.846131,9.664273
4,,49.35553908-8.775570795-DE,0.044000,1.0,DE,,,Run-of-river,Hydro,,...,,,,,,,,,49.355539,8.775571
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1768740,,48.99530642-9.097915699-DE,0.004640,1.0,DE,,,Photovoltaics,Solar,,...,,,,,,,,,48.995306,9.097916
1768741,,50.05308711-7.728759356-DE,0.006360,1.0,DE,,,Photovoltaics,Solar,,...,,,,,,,,,50.053087,7.728759
1768742,,49.14289696-12.40827154-DE,0.008400,1.0,DE,,,Photovoltaics,Solar,,...,,,,,,,,,49.142897,12.408272
1768743,,48.96359435-12.05432011-DE,0.749835,1.0,DE,,,Photovoltaics ground,Solar,,...,,,,,,,,,48.963594,12.054320


In [39]:
missing_values = verification['Zone'].isnull()

In [40]:
missing_values

0          False
1          False
2          False
3          False
4          False
           ...  
1768740    False
1768741    False
1768742    False
1768743    False
1768744    False
Name: Zone, Length: 1768745, dtype: bool
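
<div style="text-align: left; margin-left: 0.0em; font-weight: unbold; font-size: 16px; font-family: TimesNewRoman;">
    The truncated boolean Series above only shows the first and last rows, so a single count is arguably a clearer check. The sketch below uses a small made-up frame in place of the DE clean file.
</div>

```python
import pandas as pd

# Small hypothetical frame standing in for the DE clean file.
verification_sample = pd.DataFrame({"Zone": ["DE", "DE", None]})

# Summing a boolean mask counts the True entries, i.e. the missing zones.
missing_zone_count = int(verification_sample["Zone"].isnull().sum())
print(f"Rows without a Zone value: {missing_zone_count}")
```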

3.19. PartLoadMin Field

3.20. MinEfficiency Field

3.21. StartUpTime Field

3.22. CO2Intensity Field

3.23. COP Field

3.24. RampUpRate Field

3.25. Tnominal Field

3.26. coef_COP_a Field

3.27. coef_COP_b Field

3.28. STOCapacity Field

3.29. STOSelfDischarge Field

3.30. STOMaxChargingPower Field

3.31. STOChargingEfficiency Field

3.32. WaterWithdrawal Field

3.33. WaterConsumption Field