# Solar Energy Forecasting

## Introduction

This project analyzes historical solar irradiance and related weather features in order to train a model. Given a weather forecast as input, the trained model should then be able to predict solar energy generation.
The historical data is retrieved from the [Photovoltaic Geographical Information System (PVGIS)](https://joint-research-centre.ec.europa.eu/photovoltaic-geographical-information-system-pvgis/pvgis-tools/hourly-radiation_en).

### Data Source (Historical solar radiation data)
The dataset is fetched using the PVGIS API and includes key parameters such as:
- **Global in-plane irradiance (G(i))**: Measured in W/m²
- **Air temperature (T2m)**: Measured in °C
- **Wind speed (WS10m)**: Measured in m/s at a height of 10 m
- **Location**: Latitude and longitude chosen for Freiburg im Breisgau (lat 47.99, lon 7.84)
- **Time**: The training data covers the year 2020 at hourly resolution, i.e. 8784 entries (2020 is a leap year), each entry representing 1 hour of the year. PVGIS offers data for the years 2005-2020.
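
The parameters listed above map directly onto the query string of the PVGIS request, and the number of hourly entries follows from the calendar. The snippet below is a minimal sketch of both: the endpoint and parameter values are taken from the request used in this notebook, while `BASE_URL`, `params`, and the helper `expected_hourly_rows` are illustrative names that do not appear elsewhere in the project.

```python
import calendar
from urllib.parse import urlencode

# Endpoint and parameter values taken from the request used in this notebook
BASE_URL = "https://re.jrc.ec.europa.eu/api/v5_2/seriescalc"
params = {
    "lat": 47.99,           # Freiburg im Breisgau
    "lon": 7.84,
    "startyear": 2020,
    "endyear": 2020,
    "outputformat": "csv",
}


def expected_hourly_rows(start_year, end_year):
    """Hourly entries for a span of whole years (hypothetical helper)."""
    return sum(
        (366 if calendar.isleap(year) else 365) * 24
        for year in range(start_year, end_year + 1)
    )


# Assemble the request URL from the parameter dictionary
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
print(expected_hourly_rows(params["startyear"], params["endyear"]))
```

Comparing the expected row count with `len(df)` after download is a cheap way to notice missing or duplicated hours.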

### Project Goal
The objective is to preprocess, analyze, and build models that can predict solar energy generation for Baden-Württemberg based on weather forecasts.


In [8]:
import requests
import json
from io import StringIO
import pandas as pd
import pickle

# API URL
url = "https://re.jrc.ec.europa.eu/api/v5_2/seriescalc?lat=47.99&lon=7.84&startyear=2020&endyear=2020&outputformat=csv"

# Fetch data
response = requests.get(url)

print("response code: ", response.status_code)

# Check if the response is valid
if response.status_code == 200:
    try:
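        # Split the response into lines
        lines = response.text.split("\n")

        column_names = []      # filled once the CSV header line is found
        header_metadata = []   # PVGIS metadata lines before the header
        data_lines = []        # header line plus everything after it
        header_found = False

        footer_metadata = []   # explanatory text lines after the data rows
        data_cleaned = []      # header line plus numeric data rows only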

        for line in lines:
            if line.startswith("time"):
                header_found = True
                column_names = line.strip().split(",")
                data_lines.append(line)
                continue
            if not header_found:
                header_metadata.append(line)
            else:
                data_lines.append(line)

        for row in data_lines:
            if any(c.isalpha() for c in row) and not row.startswith("time"):
                footer_metadata.append(row)
            else:
                data_cleaned.append(row)

        print("Extracted column_names: ", column_names)

        csv_data = "\n".join(data_cleaned)
        df = pd.read_csv(StringIO(csv_data), names=column_names, header=0)

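    except Exception as e:
        print("Error while parsing CSV:", str(e))
        raise  # stop here; otherwise the code below fails because df is undefined
else:
    # Without a successful response there is no df to process
    raise RuntimeError(f"Error {response.status_code}: {response.text}")

# Drop the columns that are not used as features
df.drop(columns=["H_sun", "Int"], inplace=True)
print("Generated df:\n", df)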

with open("../data/raw data/historical_solar_data.pkl", "wb") as file:
    pickle.dump(df, file)

response code:  200
Extracted column_names:  ['time', 'G(i)', 'H_sun', 'T2m', 'WS10m', 'Int']
Generated df:
                time  G(i)   T2m  WS10m
0     20200101:0010   0.0 -1.09   2.28
1     20200101:0110   0.0 -1.10   2.07
2     20200101:0210   0.0 -1.15   2.00
3     20200101:0310   0.0 -1.37   2.00
4     20200101:0410   0.0 -1.52   2.07
...             ...   ...   ...    ...
8779  20201231:1910   0.0 -0.78   2.07
8780  20201231:2010   0.0 -0.93   1.86
8781  20201231:2110   0.0 -1.06   1.66
8782  20201231:2210   0.0 -1.15   1.38
8783  20201231:2310   0.0 -1.17   1.17

[8784 rows x 4 columns]
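
The `time` column in the output above is still a string of the form `YYYYMMDD:HHMM`. For time-based analysis it usually helps to convert it into a proper datetime index. The following is a minimal sketch on a few rows copied from that output; the name `sample` is illustrative, and the same two lines could be applied to the full `df`.

```python
import pandas as pd

# A few rows copied from the printed DataFrame; the real df has 8784 rows
sample = pd.DataFrame({
    "time": ["20200101:0010", "20200101:0110", "20201231:2310"],
    "G(i)": [0.0, 0.0, 0.0],
    "T2m": [-1.09, -1.10, -1.17],
    "WS10m": [2.28, 2.07, 1.17],
})

# Parse the PVGIS timestamp strings (YYYYMMDD:HHMM) and use them as the index
sample["time"] = pd.to_datetime(sample["time"], format="%Y%m%d:%H%M")
sample = sample.set_index("time")

print(sample.index[0])
print(sample.index.month.tolist())
```

With a datetime index, calendar features such as hour of day or month can be derived directly from `sample.index`.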


In [3]:
df.describe()

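|       | G(i)       | H_sun     | T2m      | WS10m    |
|-------|------------|-----------|----------|----------|
| count | 17544.0    | 17544.0   | 17544.0  | 17544.0  |
| mean  | 142.052595 | 13.610766 | 9.125312 | 1.986063 |
| std   | 229.332235 | 18.348125 | 7.746193 | 1.123643 |
| min   | 0.0        | 0.0       | -10.48   | 0.0      |
| 25%   | 0.0        | 0.0       | 2.92     | 1.24     |
| 50%   | 0.0        | 0.0       | 8.68     | 1.72     |
| 75%   | 193.0      | 23.9225   | 14.88    | 2.41     |
| max   | 967.01     | 65.15     | 33.03    | 9.03     |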
