# Harnessing Weather Insights for Accurate Energy Load Forecasting

by Florian Schulze, Raffaela Länger, Johanna Kronfuß and Julian Janisch

------------

# Downloading the Data

In this project, we predict energy load from weather data, which we download from two sources: the Transparency Portal and GeoSphere. Because repeated API requests are slow and may be subject to usage limits or costs, we store the downloaded data locally. The analysis and model optimization steps can then reuse the local files without sending further requests to the APIs.
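
The idea of reusing local files can be sketched as a small helper that downloads only when the target file is missing. This is a minimal sketch, not part of the notebook: the names `load_or_download` and `download` are hypothetical, and `download` stands for any function that returns the response text.

```python
from pathlib import Path
from typing import Callable


def load_or_download(path: Path, download: Callable[[], str]) -> str:
    """Return the cached file content; call `download` only if the file is missing.

    `download` is a hypothetical zero-argument function returning the response text.
    """
    if path.exists():
        # Cache hit: reuse the local copy and send no request
        return path.read_text(encoding="utf-8")
    content = download()
    # Create the year folder (and any parents) before writing the file
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content
```

With such a helper, rerunning the notebook would only send requests for months that have not been saved yet.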

In [1]:
# This command installs all required dependencies listed in requirements.txt
# It ensures that all necessary libraries are available for the project
%pip install -r requirements.txt

Collecting certifi==2024.12.14
  Downloading certifi-2024.12.14-py3-none-any.whl (164 kB)
Collecting charset-normalizer==3.4.1
  Downloading charset_normalizer-3.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)
Collecting datetime==5.5
  Downloading DateTime-5.5-py3-none-any.whl (52 kB)
Collecting findspark==2.0.1
  Downloading findspark-2.0.1-py2.py3-none-any.whl (4.4 kB)
Collecting idna==3.10
  Downloading idna-3.10-py3-none-any.whl (70 kB)
Collecting py4j==0.10.9.7
  Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
Collecting pyspark==3.5.4
  Downloading pyspark-3.5.4.tar.gz

In [5]:
# Imports the requests library for making HTTP requests to APIs or websites
import requests

# Imports the datetime module to handle and manipulate date and time data
import datetime  

# Imports the os module to interact with the operating system (e.g., file handling)
import os  

# Imports the time module to work with time-related functions (e.g., sleep, timestamps)
import time  

### API Settings

In this section, we define the settings needed to retrieve data from two sources: the Transparency Portal and GeoSphere. They include the API key, the time periods, and the other parameters used in the requests. Keeping these values in one place makes the requests consistent and easy to reconfigure, and it reduces errors caused by mismatched parameters.
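
One way to keep such settings together without writing the API key into the notebook is to read it from the environment. The following is a minimal sketch under stated assumptions: the class `TransparencySettings`, the function `load_transparency_settings`, and the environment variable name `TRANSPARENCY_API_KEY` are hypothetical and not part of the original code. The fallback `"DEMO_KEY"` matches the placeholder value that the download cell checks for.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransparencySettings:
    """Groups the Transparency Portal settings in one immutable object."""
    api_key: str
    period_start: str
    period_end: str
    bidding_zone: str


def load_transparency_settings(env: Mapping[str, str] = os.environ) -> TransparencySettings:
    # Assumption: the key is provided in an environment variable named
    # TRANSPARENCY_API_KEY; "DEMO_KEY" is used when the variable is not set
    return TransparencySettings(
        api_key=env.get("TRANSPARENCY_API_KEY", "DEMO_KEY"),
        period_start="2024-01-01",
        period_end="2024-12-31",
        bidding_zone="10YAT-APG------L",
    )
```

Passing `env` as a parameter keeps the function easy to check with a plain dictionary instead of the real environment.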

#### Transparency Portal

The Transparency Portal (the ENTSO-E Transparency Platform) offers a wide range of energy-related data, including electricity consumption, grid balancing, and market prices across the bidding zones in Europe. We retrieve energy load data for Austria (APG zone) for a specified period to see how consumption fluctuates over time. These demand patterns are what we aim to predict from weather conditions.

In [6]:
# API Settings for Transparency Portal
TRANSPARENCY_API_KEY = "9d0ebad5-08cf-4d6f-a752-744ba3707b70"  # API key for authentication
TRANSPARENCY_PERIOD_START = "2024-01-01"  # Start date for data retrieval
TRANSPARENCY_PERIOD_END = "2024-12-31"  # End date for data retrieval
TRANSPARENCY_BIDDING_ZONE = "10YAT-APG------L"  # Defines the bidding zone (Austria) for energy load data

#### GeoSphere

GeoSphere Austria provides comprehensive meteorological data collected from numerous weather stations across the country. This data includes parameters like temperature, humidity, wind speed, and other climate-related factors. By combining this weather data with the energy load information from the Transparency Portal, we can develop more accurate models for forecasting energy consumption, as weather conditions often have a significant impact on energy usage.

In [7]:
# API Settings for GeoSphere
GEOSPHERE_PERIOD_START = "2024-01-01"  # Start date for weather data retrieval
GEOSPHERE_PERIOD_END = "2024-12-31"  # End date for weather data retrieval
GEOSPHERE_STATIONS = "1, 105"  # Specifies the weather stations to collect data from

### Fetching and Saving Energy Load Data from Transparency Portal

This code retrieves energy load data from the Transparency Portal API. It first checks that an API key has been configured, then sends one request per month in the configured date range and saves each XML response to a folder hierarchy organized by year and month (`./data/transparency/<year>/<month>.xml`). Request and file errors are caught and printed.
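
The month-by-month splitting can be isolated in a small generator, which makes the period boundaries easy to inspect before any request is sent. This is a sketch that mirrors the date arithmetic of the cell below; the function name `month_ranges` is hypothetical.

```python
import datetime
from typing import Iterator, Tuple


def month_ranges(
    start: datetime.date, end: datetime.date
) -> Iterator[Tuple[datetime.date, datetime.date]]:
    """Yield one (period_start, period_end) pair per calendar month.

    period_end is the first day of the following month, capped at the day after `end`.
    """
    last = end + datetime.timedelta(days=1)
    current = start
    while current <= end:
        # Adding 32 days to the first of a month always lands in the next month
        next_month = (current.replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
        yield current, min(next_month, last)
        current = next_month


ranges = list(month_ranges(datetime.date(2024, 1, 1), datetime.date(2024, 12, 31)))
print(len(ranges))  # 12
# Boundaries in the YYYYMMDD0000 format used for periodStart and periodEnd
print(ranges[0][0].strftime("%Y%m%d0000"), ranges[0][1].strftime("%Y%m%d0000"))  # 202401010000 202402010000
print(ranges[-1][1].strftime("%Y%m%d0000"))  # 202501010000
```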

In [8]:
# Restful API Guide: https://documenter.getpostman.com/view/7009892/2s93JtP3F6

# Check if the Transparency API key is missing (empty) or still set to the demo key
if not TRANSPARENCY_API_KEY or TRANSPARENCY_API_KEY == "DEMO_KEY":
    print("Error: Missing Transparency API key.")  # If the key is missing, print an error message
else:
    api_url = "https://web-api.tp.entsoe.eu/api"  # Set the base URL for the API to make requests
    
    # Parse the start and end dates from the configuration into datetime objects
    start = datetime.datetime.strptime(TRANSPARENCY_PERIOD_START, "%Y-%m-%d")  # Convert the start date to a datetime object
    end = datetime.datetime.strptime(TRANSPARENCY_PERIOD_END, "%Y-%m-%d")  # Convert the end date to a datetime object
    
    # Start the loop to go through each month between the start and end dates
    current = start  # Initialize the current date as the start date
    while current <= end:  # Continue looping until the current date exceeds the end date
        print("Requesting data for " + current.strftime("%Y-%m"))  # Print which month's data is being requested
        
        # Calculate the first day of the next month to set the period end date for the current month
        next_month = (current.replace(day=1) + datetime.timedelta(days=32)).replace(day=1)  # Move to the next month
        period_end = min(next_month, end + datetime.timedelta(days=1))  # Ensure the period end doesn't exceed the overall end date
        
        # Define the parameters for the API request
        api_params = {
            "documentType": "A65",  # Specify the document type for system total load data
            "processType": "A16",  # Specify the process type for actual realized load data
            "outBiddingZone_Domain": TRANSPARENCY_BIDDING_ZONE,  # Set the bidding zone (e.g., Austria)
            "periodStart": current.strftime("%Y%m%d0000"),  # Format the start date as YYYYMMDD0000
            "periodEnd": period_end.strftime("%Y%m%d0000"),  # Format the end date as YYYYMMDD0000
            "securityToken": TRANSPARENCY_API_KEY  # Include the API key for authentication
        }
        
        # Send the GET request to the API with the defined parameters
        try:
            response = requests.get(api_url, params=api_params)  # Make the API request and get the response
            response.raise_for_status()  # Raise an exception for HTTP error responses
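        except requests.exceptions.RequestException as e:
            print(f"Error making API request: {e}")  # Print error message if the request fails
            current = next_month  # Advance to the next month; otherwise the loop would retry this month forever
            continue  # Skip to the next iteration if the request fails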

        # Check if the request was successful (status code 200)
        if response.status_code == 200:  
            try:
                # Prepare to save the data to a file by creating necessary directories
                year_folder = current.strftime("%Y")  # Extract the year from the current date to create a folder for the year
                month_file = current.strftime("%m")  # Extract the month from the current date to name the file
                os.makedirs(os.path.dirname(f"./data/transparency/{year_folder}/"), exist_ok=True)  # Create the directory if it doesn't exist

                # Open the file in write mode and save the XML data returned by the API
                with open(f"./data/transparency/{year_folder}/{month_file}.xml", "w", encoding='utf-8') as file:
                    file.write(response.text)  # Write the XML response content to the file
                
                print(f"Data saved to ./data/transparency/{year_folder}/{month_file}.xml")  # Print a confirmation message with the file path
            except IOError as e:
                print(f"Error saving data to file: {e}")  # Print error message if there is an issue writing to the file
                
        else:
            print("Error: " + str(response.status_code))  # If the request failed, print the error status code
            print("Response: " + response.text)  # Print the detailed error message from the API response
        
        # Move to the next month by updating the current date
        current = next_month  # Update the current date to the next month's start date


Requesting data for 2024-01
Data saved to ./data/transparency/2024/01.xml
Requesting data for 2024-02
Data saved to ./data/transparency/2024/02.xml
Requesting data for 2024-03
Data saved to ./data/transparency/2024/03.xml
Requesting data for 2024-04
Data saved to ./data/transparency/2024/04.xml
Requesting data for 2024-05
Data saved to ./data/transparency/2024/05.xml
Requesting data for 2024-06
Data saved to ./data/transparency/2024/06.xml
Requesting data for 2024-07
Data saved to ./data/transparency/2024/07.xml
Requesting data for 2024-08
Data saved to ./data/transparency/2024/08.xml
Requesting data for 2024-09
Data saved to ./data/transparency/2024/09.xml
Requesting data for 2024-10
Data saved to ./data/transparency/2024/10.xml
Requesting data for 2024-11
Data saved to ./data/transparency/2024/11.xml
Requesting data for 2024-12
Data saved to ./data/transparency/2024/12.xml
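
When a request fails, the loop above only prints the error. For transient failures, a bounded retry is one option. The following is a minimal, library-independent sketch; the helper name `call_with_retry` and its parameters are hypothetical and not part of the notebook.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `action` up to `attempts` times; return its result, or None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            print(f"Attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                # Wait before the next attempt; no wait after the final one
                sleep(delay)
    return None
```

In the download loop, `action` could be a small function that calls `requests.get(api_url, params=api_params, timeout=30)`, then `raise_for_status()`, and returns the response, with `retry_on=(requests.exceptions.RequestException,)`. A return value of `None` would then mean the month is skipped.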


### Fetching and Saving Weather Data from GeoSphere

In [None]:
# The GeoSphere request below sends no API key, so validate the station list instead
if not GEOSPHERE_STATIONS.strip():
    print("Error: No GeoSphere stations configured.")  # Without station IDs there is nothing to request
else:
    # Define the GeoSphere API base URL for historical weather data
    api_url = "https://dataset.api.hub.geosphere.at/v1/station/historical/"
    # Specify the dataset we want to use for historical weather data (klima-v2-1d represents daily climate data)
    api_dataset = "klima-v2-1d"
    
    # Convert the GeoSphere start and end date strings into datetime objects for easier date manipulation
    start = datetime.datetime.strptime(GEOSPHERE_PERIOD_START, "%Y-%m-%d")  # Start date in YYYY-MM-DD format
    end = datetime.datetime.strptime(GEOSPHERE_PERIOD_END, "%Y-%m-%d")  # End date in YYYY-MM-DD format
    
    # Set the current date to the start date for the loop
    current = start
    while current <= end:  # Loop through each month between the start and end dates
        # Calculate the first day of the next month using the current date, by adding 32 days and then setting the day to 1
        next_month = (current.replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
        # Ensure the period end does not exceed the overall end date
        period_end = min(next_month, end + datetime.timedelta(days=1))  # Period end is the lesser of next month and the end date
        
        # Print out which month we are requesting data for (for debugging and progress tracking)
        print("Requesting data for " + current.strftime("%Y-%m"))
        
        # Set up the parameters for the API request
        api_params = {
            # Requested parameters: precipitation (rr), mean air temperature (tl_mittel), mean cloud cover (bewm_mittel),
            # sunshine duration (so_h), and mean wind speed (vv_mittel)
            "parameters": "rr,tl_mittel,bewm_mittel,so_h,vv_mittel",
            "start": current.strftime("%Y-%m-%d"),  # Format the start date as YYYY-MM-DD
            "end": period_end.strftime("%Y-%m-%d"),  # Format the end date as YYYY-MM-DD
            "station_ids": GEOSPHERE_STATIONS,  # Specify the station IDs for which we want data (can be one or multiple stations)
            "output_format": "csv",  # Set the output format to CSV for easy analysis and handling
        }
        
        try:
            # Send the GET request to the GeoSphere API with the defined parameters
            response = requests.get(api_url + api_dataset, params=api_params)
            # Check if the request was successful by raising an exception for any HTTP errors
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            # If an exception occurs during the API request, catch it and print an error message
            print(f"Error making API request: {e}")
            current = next_month  # Advance to the next month; otherwise the loop would retry this month forever
            continue  # Skip the current iteration and move on to the next month
        
        if response.status_code == 200:  # If the response status code is 200 (success)
            # Print the number of remaining requests for the current hour (to track API rate limits);
            # fall back to "unknown" if the header is absent instead of raising a KeyError
            print("Remaining requests for this hour: " + response.headers.get("x-ratelimit-remaining-hour", "unknown"))
            
            try:
                # Create the appropriate folder for storing the data, using the current year and month
                year_folder = current.strftime("%Y")  # Extract the year from the current date
                month_file = current.strftime("%m")  # Extract the month from the current date
                # Ensure the directory for saving the file exists. If it doesn't, create it
                os.makedirs(os.path.dirname(f"./data/geosphere/{year_folder}/"), exist_ok=True)
                
                # Open the file in write mode and save the response content (CSV data)
                with open(f"./data/geosphere/{year_folder}/{month_file}.csv", "w", encoding='utf-8') as file:
                    file.write(response.text)  # Write the CSV data to the file
                
                # Print a confirmation message that the data has been saved successfully
                print(f"Data saved to ./data/geosphere/{year_folder}/{month_file}.csv")
            except IOError as e:
                # If there is an error while saving the file, catch it and print an error message
                print(f"Error saving data to file: {e}")
            
        else:
            # If the response status code is not 200 (failure), print the error code and the response text for debugging
            print("Error: " + str(response.status_code))
            print("Response: " + response.text)
        
        # To avoid hitting the API rate limit (GeoSphere allows 5 requests per second), sleep for 0.2 seconds between requests
        time.sleep(0.2)
        
        # Move to the next month and continue the loop until we reach the end date
        current = next_month

Requesting data for 2024-01
Remaining requests for this hour: 239
Data saved to ./data/geosphere/2024/01.csv
Requesting data for 2024-02
Remaining requests for this hour: 238
Data saved to ./data/geosphere/2024/02.csv
Requesting data for 2024-03
Remaining requests for this hour: 237
Data saved to ./data/geosphere/2024/03.csv
Requesting data for 2024-04
Remaining requests for this hour: 236
Data saved to ./data/geosphere/2024/04.csv
Requesting data for 2024-05
Remaining requests for this hour: 235
Data saved to ./data/geosphere/2024/05.csv
Requesting data for 2024-06
Remaining requests for this hour: 234
Data saved to ./data/geosphere/2024/06.csv
Requesting data for 2024-07
Remaining requests for this hour: 233
Data saved to ./data/geosphere/2024/07.csv
Requesting data for 2024-08
Remaining requests for this hour: 232
Data saved to ./data/geosphere/2024/08.csv
Requesting data for 2024-09
Remaining requests for this hour: 231
Data saved to ./data/geosphere/2024/09.csv
Requesting data for