# 01 – Data Preprocessing: Weather and Solar Data

## Objective
Prepare a clean, aggregated dataset of daily national weather metrics for use in solar generation prediction.

> **Note:** Since aggregated national-level weather data for Ireland is not publicly available, this project manually constructs a representative national dataset by averaging data from **nine strategically selected weather stations**. These stations are geographically distributed across the Republic of Ireland in a grid-like pattern to ensure balanced regional coverage.


## 1: Load Raw Station Data

- Load 9 weather station files from `Solarcast/Data/All_Weather_Stations/`
- Preview shape and content to check for structure consistency


In [1]:
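import glob

import pandas as pd

# Sort the paths so stations load in a reproducible order (glob order is not guaranteed).
csv_files = sorted(glob.glob("Solarcast/Data/All_Weather_Stations/*.csv"))
station_dfs = []

for file in csv_files:
    df = pd.read_csv(file)
    # Print each file's shape to check that the stations share the same structure.
    print(f"{file} → {df.shape}")
    station_dfs.append(df)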

## 2: Standardise & Clean Columns
- Ensure uniform column names: `date`, `rain`, `maxtp`, `mintp`, `cbl`, `glorad`
- Parse dates correctly
- Handle missing or invalid entries
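
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the helper name `clean_station` is hypothetical, the sample frame is made up, and the sketch assumes that headers differ only in case or surrounding whitespace, that pandas can parse the date strings without an explicit format, and that invalid readings should become missing values rather than be dropped.

```python
import pandas as pd

# Target schema, taken from the column list above.
EXPECTED_COLUMNS = ["date", "rain", "maxtp", "mintp", "cbl", "glorad"]


def clean_station(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with uniform column names, parsed dates and numeric values."""
    out = df.copy()
    # Normalise header spelling: strip whitespace and lower-case.
    out.columns = [str(c).strip().lower() for c in out.columns]
    # Keep only the expected columns; any that are absent are created as all-missing.
    out = out.reindex(columns=EXPECTED_COLUMNS)
    # Unparseable dates become NaT instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Blank or non-numeric readings become NaN instead of raising.
    value_cols = EXPECTED_COLUMNS[1:]
    out[value_cols] = out[value_cols].apply(pd.to_numeric, errors="coerce")
    # Drop rows without a usable date and keep one row per day.
    out = out.dropna(subset=["date"]).drop_duplicates(subset="date")
    return out.sort_values("date").reset_index(drop=True)


# Tiny made-up frame (not real station data) to exercise the function.
sample = pd.DataFrame({
    " Date": ["2024-01-01", "2024-01-02", "not a date"],
    "RAIN": ["0.4", " ", "1.2"],
    "maxtp": [9.1, 8.3, 7.0],
    "mintp": [2.0, 1.5, 0.9],
    "cbl": [1012.3, 1009.8, 1001.1],
    "glorad": ["310", "275", "x"],
})
cleaned = clean_station(sample)
print(cleaned)
```

Applied to the loaded files, this would be something like `station_dfs = [clean_station(df) for df in station_dfs]`. Whether missing readings should stay as `NaN`, be interpolated, or cause the row to be dropped is a modelling decision the outline leaves open.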

## 3: Merge Stations by Date
- Outer join all stations on date
- Average numeric columns across all stations
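
One possible way to implement the join-then-average step is sketched below. The function name `aggregate_stations` and the two sample frames are hypothetical, and the sketch assumes each station frame has already been reduced to `date` plus the five metric columns. It also assumes that a station with no reading on a given day should simply be left out of that day's mean.

```python
from functools import reduce

import pandas as pd

VALUE_COLUMNS = ["rain", "maxtp", "mintp", "cbl", "glorad"]


def aggregate_stations(frames):
    """Outer-join station frames on date, then average each metric across stations."""
    # Suffix every metric with its station index so names do not collide on merge.
    renamed = [
        df.rename(columns={c: f"{c}_{i}" for c in VALUE_COLUMNS})
        for i, df in enumerate(frames)
    ]
    # The outer join keeps a date even if only one station reported it.
    merged = reduce(
        lambda left, right: pd.merge(left, right, on="date", how="outer"),
        renamed,
    )
    merged = merged.sort_values("date").reset_index(drop=True)

    national = merged[["date"]].copy()
    for col in VALUE_COLUMNS:
        station_cols = [f"{col}_{i}" for i in range(len(frames))]
        # Row-wise mean; NaN values are skipped, so a gap at one station
        # does not blank out the national figure for that day.
        national[col] = merged[station_cols].mean(axis=1)
    return national


# Two made-up stations with partially overlapping dates.
station_a = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "rain": [1.0, 3.0], "maxtp": [10.0, 12.0], "mintp": [2.0, 4.0],
    "cbl": [1000.0, 1010.0], "glorad": [100.0, 200.0],
})
station_b = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "rain": [5.0, 2.0], "maxtp": [8.0, 9.0], "mintp": [0.0, 1.0],
    "cbl": [1020.0, 1005.0], "glorad": [300.0, 150.0],
})
national = aggregate_stations([station_a, station_b])
print(national)
```

An equivalent alternative is to concatenate all stations with `pd.concat` and call `groupby("date").mean()`, which avoids the wide intermediate table; the merge version is shown because it mirrors the outer-join wording of the step.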

## 4: Export Aggregated Data
- Save the result as `aggregated_weather_2024.csv` in `/cleaned_data/`
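
The export could look something like the sketch below. The helper `export_aggregated` is hypothetical, and the output directory is passed in as a parameter because the outline only gives `/cleaned_data/` without saying what it is relative to. The demo writes to a temporary directory and uses made-up data so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_aggregated(df, out_dir, filename="aggregated_weather_2024.csv"):
    """Write the aggregated frame to out_dir/filename and return the path."""
    out_path = Path(out_dir) / filename
    # Create the target folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the row index out of the file;
    # date_format writes dates as YYYY-MM-DD.
    df.to_csv(out_path, index=False, date_format="%Y-%m-%d")
    return out_path


# Made-up aggregated data (not real measurements).
national = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "rain": [1.0, 4.0],
})

with tempfile.TemporaryDirectory() as tmp:
    saved = export_aggregated(national, Path(tmp) / "cleaned_data")
    # Read the file back to confirm it round-trips.
    reloaded = pd.read_csv(saved, parse_dates=["date"])

print(reloaded)
```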