# Solar Energy Forecasting Using PVGIS Data

## Introduction

This project focuses on analyzing historical solar irradiance and meteorological data to explore trends, correlations, and potential applications in renewable energy forecasting. The data is retrieved from the [Photovoltaic Geographical Information System (PVGIS)](https://joint-research-centre.ec.europa.eu/photovoltaic-geographical-information-system-pvgis/pvgis-tools/hourly-radiation_en), a free, publicly accessible service provided by the European Commission's Joint Research Centre.

### Data Source
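The dataset is fetched through the PVGIS API and contains one record per hour with the following parameters:
- **Global in-plane irradiance (G(i))**: in W/m² (on a horizontal plane unless a slope is specified in the request)
- **Sun height (H_sun)**: solar elevation angle above the horizon, in degrees (°)
- **Air temperature (T2m)**: air temperature at 2 m, in °C
- **Wind speed (WS10m)**: wind speed at 10 m above ground, in m/s
- **Int**: a flag where 1 means the solar radiation values are reconstructed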
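Because G(i) is a power density (W/m²), turning it into an energy amount requires integrating over time. The sketch below approximates daily irradiation in kWh/m² by summing hourly values times a one-hour step, assuming each hourly value is representative of the whole hour. The function name and the example day are hypothetical and are not taken from PVGIS.

```python
def daily_irradiation_kwh_per_m2(hourly_w_per_m2, step_hours=1.0):
    """Approximate energy per m2 from irradiance samples.

    Assumption: each value (W/m2) holds for the full step, so
    sum(G * step) gives Wh/m2, and dividing by 1000 gives kWh/m2.
    """
    return sum(g * step_hours for g in hourly_w_per_m2) / 1000.0


# Hypothetical day (not PVGIS data): dark at night, bell-shaped daytime profile
example_day = [0] * 8 + [100, 250, 400, 500, 550, 500, 400, 250, 100] + [0] * 7
print(daily_irradiation_kwh_per_m2(example_day))
```

For a full year of hourly data, the same sum over all rows gives an annual irradiation estimate under the same assumption.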

### Project Goal
The objective is to preprocess, analyze, and potentially build models that can predict solar energy availability based on historical weather conditions. This can be useful for solar power generation forecasting and energy management.
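Any forecasting model built here needs a baseline to beat. One common choice for hourly solar data (swapped in for illustration, not something this project specifies) is 24-hour persistence: predict each hour's value as the value observed at the same hour one day earlier. A minimal sketch, scored with mean absolute error on a hypothetical two-day series:

```python
def persistence_forecast(values, lag=24):
    """Return (forecasts, actuals), where each forecast is the value `lag` steps earlier."""
    if lag <= 0 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return values[:-lag], values[lag:]


def mean_absolute_error(forecasts, actuals):
    """Average absolute difference between paired forecasts and observations."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Hypothetical hourly G(i) values in W/m2 for two consecutive days (not PVGIS data)
day1 = [0] * 8 + [100, 250, 400, 500, 550, 500, 400, 250, 100] + [0] * 7
day2 = [0] * 8 + [80, 200, 350, 450, 500, 450, 350, 200, 80] + [0] * 7

forecasts, actuals = persistence_forecast(day1 + day2, lag=24)
print(mean_absolute_error(forecasts, actuals))
```

A trained model is only useful if its error is clearly lower than this baseline's on the same data.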

### First Code Cell: Fetching PVGIS Data
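The first code cell retrieves hourly solar radiation and meteorological data for Berlin (lat=52.52, lon=13.41) for the years 2005 and 2006 via a request to the PVGIS `seriescalc` endpoint. PVGIS wraps the CSV table in metadata lines (a header before the `time,...` row and a legend after the data), so the cell separates the table from that metadata, loads it into a pandas DataFrame, and saves the DataFrame to `solar_data.pkl` for later analysis.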
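The cell's URL hard-codes the location and period in the query string. As a sketch, the same URL can be assembled from named parameters with the standard library, which makes it easier to change the location or years; the helper name `build_pvgis_url` is hypothetical, and only the five parameters already present in the cell's URL are used.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell's URL (PVGIS API v5.2, hourly series)
PVGIS_SERIESCALC = "https://re.jrc.ec.europa.eu/api/v5_2/seriescalc"


def build_pvgis_url(lat, lon, startyear, endyear, outputformat="csv"):
    """Build a seriescalc query URL from named parameters (hypothetical helper)."""
    params = {
        "lat": lat,
        "lon": lon,
        "startyear": startyear,
        "endyear": endyear,
        "outputformat": outputformat,
    }
    return f"{PVGIS_SERIESCALC}?{urlencode(params)}"


url = build_pvgis_url(52.52, 13.41, 2005, 2006)
print(url)
```

With `requests`, passing the dictionary directly as `requests.get(PVGIS_SERIESCALC, params=params, timeout=60)` has the same effect, since `requests` encodes `params` into the query string.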


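```python
import pickle
from io import StringIO

import pandas as pd
import requests

# PVGIS v5.2 hourly-series endpoint: Berlin (lat=52.52, lon=13.41), 2005 to 2006, CSV output
url = "https://re.jrc.ec.europa.eu/api/v5_2/seriescalc?lat=52.52&lon=13.41&startyear=2005&endyear=2006&outputformat=csv"

# Fetch data; the timeout prevents the request from hanging indefinitely
response = requests.get(url, timeout=60)

print("response code: ", response.status_code)

df = None  # stays None if the request or parsing fails

# Check if the response is valid
if response.status_code == 200:
    try:
        # Split the response into lines (handles both \n and \r\n endings)
        lines = response.text.splitlines()

        column_names = None
        header_metadata = []  # metadata lines before the "time,..." header row
        data_lines = []       # header row plus everything after it
        footer_metadata = []  # legend/description lines after the data
        data_cleaned = []     # header row plus numeric data rows

        for line in lines:
            if column_names is None and line.startswith("time"):
                # The "time,G(i),..." row marks the start of the data table
                column_names = line.strip().split(",")
                data_lines.append(line)
            elif column_names is None:
                header_metadata.append(line)
            else:
                data_lines.append(line)

        if column_names is None:
            raise ValueError("no 'time,...' header row found in the response")

        # Data rows contain no letters (e.g. "20050101:0011,0.0,..."), so any
        # other row with letters belongs to the footer legend
        for row in data_lines:
            if any(c.isalpha() for c in row) and not row.startswith("time"):
                footer_metadata.append(row)
            else:
                data_cleaned.append(row)

        print("Extracted column_names: ", column_names)

        # header=0 replaces the header row in the text with `names`;
        # read_csv skips blank lines by default
        csv_data = "\n".join(data_cleaned)
        df = pd.read_csv(StringIO(csv_data), names=column_names, header=0)
        print("Generated df: ", df)

    except Exception as e:
        print("Error while parsing CSV:", str(e))
else:
    print(f"Error {response.status_code}: {response.text}")

# Save the DataFrame only if it was created successfully
if df is not None:
    with open("solar_data.pkl", "wb") as file:
        pickle.dump(df, file)
```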

```text
response code:  200
Extracted column_names:  ['time', 'G(i)', 'H_sun', 'T2m', 'WS10m', 'Int']
Generated df:                  time  G(i)  H_sun   T2m  WS10m  Int
0      20050101:0011   0.0    0.0  6.83   3.59  0.0
1      20050101:0111   0.0    0.0  6.80   3.45  0.0
2      20050101:0211   0.0    0.0  6.79   3.24  0.0
3      20050101:0311   0.0    0.0  6.44   2.83  0.0
4      20050101:0411   0.0    0.0  6.26   2.62  0.0
...              ...   ...    ...   ...    ...  ...
17515  20061231:1911   0.0    0.0  8.33   4.21  0.0
17516  20061231:2011   0.0    0.0  8.58   4.21  0.0
17517  20061231:2111   0.0    0.0  8.24   4.48  0.0
17518  20061231:2211   0.0    0.0  7.70   4.76  0.0
17519  20061231:2311   0.0    0.0  7.18   4.83  0.0

[17520 rows x 6 columns]
```
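The `time` column is still a string in `YYYYMMDD:HHMM` form, and the row count can be checked against the requested period: 2005 and 2006 are not leap years, so 2 × 365 × 24 = 17,520 hourly records are expected, which matches the output. The sketch below parses the stamps (treating them as UTC, an assumption to confirm against the PVGIS documentation), computes the expected row count, and lists any pair of consecutive stamps that is not exactly one hour apart. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_pvgis_time(stamp):
    """Parse a 'YYYYMMDD:HHMM' stamp such as '20050101:0011' (assumed UTC)."""
    return datetime.strptime(stamp, "%Y%m%d:%H%M").replace(tzinfo=timezone.utc)


def expected_hourly_rows(startyear, endyear):
    """Number of hourly records covering whole calendar years startyear..endyear."""
    start = datetime(startyear, 1, 1, tzinfo=timezone.utc)
    end = datetime(endyear + 1, 1, 1, tzinfo=timezone.utc)
    return int((end - start) / timedelta(hours=1))


def find_gaps(stamps):
    """Return (earlier, later) pairs of consecutive stamps not exactly one hour apart."""
    times = [parse_pvgis_time(s) for s in stamps]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a != timedelta(hours=1)]


print(expected_hourly_rows(2005, 2006))  # 17520
print(parse_pvgis_time("20050101:0011"))
print(find_gaps(["20050101:0011", "20050101:0111", "20050101:0311"]))
```

On the DataFrame itself, `pd.to_datetime(df["time"], format="%Y%m%d:%H%M", utc=True)` performs the same parsing in one vectorized call, and `find_gaps(df["time"].tolist())` should return an empty list if no hours are missing.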
