Analyzing 590K+ observations across 32 European countries to quantify how weather extremes affect electricity consumption and renewable energy production.
This project uses Apache Spark (PySpark) to process large-scale meteorological and energy datasets, revealing statistically significant relationships between weather conditions and energy demand across Europe.
Key finding: Relative to normal conditions (0–25°C), heatwaves (>25°C) raise average electricity consumption by 16.8%, while extreme cold (<0°C) is associated with a 12.3% decrease at the European level. Country-level variation is significant (e.g., Germany shows +8.4% during cold spells, attributed to electric heating).
| Relationship | Pearson r | Interpretation |
|---|---|---|
| Temperature × Consumption | 0.007 (global) | Negligible overall; season-dependent |
| Solar radiation × Solar production | 0.231 | Weak-to-moderate positive |
| Wind speed × Wind production | 0.262 | Weak-to-moderate positive |
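
A near-zero global correlation does not by itself mean temperature is unrelated to consumption. The sketch below uses synthetic data (not project data) and a hypothetical comfort temperature of 15 °C to show how a relationship that changes sign with the season can cancel out in a single Pearson r. Whether the project data follows this shape is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Synthetic, illustrative data (NOT project data): demand is lowest at a
# hypothetical comfort temperature of 15 °C and rises on either side.
temps = [0, 5, 10, 15, 20, 25, 30]
demand = [100 + abs(t - 15) for t in temps]

r_all = pearson_r(temps, demand)             # whole range: effects cancel
r_cold = pearson_r(temps[:4], demand[:4])    # 0..15 °C: demand falls as it warms
r_warm = pearson_r(temps[3:], demand[3:])    # 15..30 °C: demand rises as it warms
print(f"all: {r_all:.3f}, cold half: {r_cold:.3f}, warm half: {r_warm:.3f}")
```

Splitting by season or temperature range before correlating is one way to expose this kind of structure.
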
| Period | Avg Consumption | Δ vs Normal |
|---|---|---|
| Normal (0–25°C) | 10,955 MW | — |
| Heatwave (>25°C) | 12,794 MW | +16.8% |
| Extreme cold (<0°C) | 9,610 MW | -12.3% |
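
The "Δ vs Normal" column can be reproduced from the averages in the table. This is a minimal arithmetic check; the dictionary keys and the function name are illustrative and not taken from the notebook.

```python
# Average consumption per temperature regime, in MW (values from the table above).
avg_mw = {"normal": 10_955, "heatwave": 12_794, "extreme_cold": 9_610}

def pct_change(value, baseline):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100

for regime in ("heatwave", "extreme_cold"):
    delta = pct_change(avg_mw[regime], avg_mw["normal"])
    print(f"{regime}: {delta:+.1f}%")
# heatwave: +16.8%
# extreme_cold: -12.3%
```
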
All pairwise differences between the three temperature regimes are statistically significant (t-tests, p < 0.001).
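
As a rough illustration of the pairwise comparison, the sketch below computes a Welch t statistic in plain Python. The README does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, and the sample values are hypothetical, not project data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical daily consumption samples in MW (NOT project data).
normal_days = [10_900, 11_050, 10_800, 11_000, 10_950, 11_020]
heatwave_days = [12_700, 12_900, 12_650, 12_850, 12_800, 12_870]

t_stat = welch_t(heatwave_days, normal_days)
print(f"Welch t = {t_stat:.2f}")
```

With SciPy installed, `scipy.stats.ttest_ind(heatwave_days, normal_days, equal_var=False)` returns the same statistic together with a p-value.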
Data Sources (OPSD + Open-Meteo API)
→ Ingestion & Chunking
→ PySpark Transformation (unpivot, aggregation)
→ Feature Engineering (join weather + energy)
→ Statistical Analysis (ANOVA, t-tests, correlations)
→ Visualization (Matplotlib, Seaborn)
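
The middle stages of the pipeline (unpivot, daily aggregation, weather join) can be sketched as follows. This is a plain-Python stand-in for the PySpark steps, written so it runs without a Spark session; the column names, sample rows, and function names are all hypothetical, and the notebook may implement these steps differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide hourly rows: one load column per country (names are assumptions).
wide_rows = [
    {"timestamp": "2019-07-01T00:00", "DE_load_mw": 40_000, "FR_load_mw": 38_000},
    {"timestamp": "2019-07-01T12:00", "DE_load_mw": 60_000, "FR_load_mw": 52_000},
]
# Hypothetical daily mean temperature in °C, keyed by (country, date).
daily_weather = {("DE", "2019-07-01"): 27.5, ("FR", "2019-07-01"): 29.0}

def unpivot(rows, id_col="timestamp", suffix="_load_mw"):
    """Turn one wide row per timestamp into one long row per (timestamp, country)."""
    for row in rows:
        for col, value in row.items():
            if col != id_col and col.endswith(suffix):
                yield {
                    "timestamp": row[id_col],
                    "country": col[: -len(suffix)],
                    "load_mw": value,
                }

def daily_mean(long_rows):
    """Aggregate hourly load to a daily mean per (country, date)."""
    buckets = defaultdict(list)
    for r in long_rows:
        buckets[(r["country"], r["timestamp"][:10])].append(r["load_mw"])
    return {key: mean(values) for key, values in buckets.items()}

def join_weather(daily_load, weather):
    """Inner join of daily load and daily temperature on (country, date)."""
    return [
        {"country": c, "date": d, "load_mw": load, "temp_c": weather[(c, d)]}
        for (c, d), load in sorted(daily_load.items())
        if (c, d) in weather
    ]

features = join_weather(daily_mean(unpivot(wide_rows)), daily_weather)
for row in features:
    print(row)
```

In PySpark the corresponding building blocks would be `DataFrame.unpivot` (available from Spark 3.4), `groupBy(...).agg(...)`, and `DataFrame.join`.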
- Energy data: Open Power System Data (OPSD) — hourly electricity consumption & renewable production
- Weather data: Open-Meteo Historical API — temperature, wind speed, solar radiation
- Coverage: 32 European countries, Jan 2015 – Jun 2020, 590K observations
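
For the weather side, a request to the Open-Meteo Historical API might be built as in the sketch below. The endpoint and the hourly variable names are assumptions based on the public API and should be checked against the current Open-Meteo documentation; the coordinates (roughly Berlin) are only an example. The code builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the Open-Meteo Historical API;
# verify them against the current API documentation before relying on them.
BASE_URL = "https://archive-api.open-meteo.com/v1/archive"

def build_weather_url(latitude, longitude, start_date, end_date):
    """Build a request URL for hourly temperature, wind speed and solar radiation."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": "temperature_2m,wind_speed_10m,shortwave_radiation",
    }
    # Keep commas unescaped so the variable list stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"

# Example coordinates and the coverage window stated above (Jan 2015 to Jun 2020).
url = build_weather_url(52.52, 13.41, "2015-01-01", "2020-06-30")
print(url)
```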
- Big Data: Apache Spark 3.5.1, PySpark
- Analysis: pandas, NumPy, SciPy (ANOVA, t-tests)
- Visualization: Matplotlib, Seaborn
- Environment: Google Colab / Jupyter
git clone https://github.com/Bassongo/spark-energy-weather-analysis.git
cd spark-energy-weather-analysis
pip install pyspark==3.5.1 pandas numpy scipy matplotlib seaborn openpyxl
jupyter notebook energy_weather_analysis.ipynb

├── README.md
├── energy_weather_analysis.ipynb # Main analysis notebook
└── requirements.txt
- Correlations ≠ causation; confounding variables (holidays, events) not controlled
- Daily aggregation masks intra-day peaks
- Next steps: ML-based demand forecasting, extended time range (2010–2024), per-country deep dives
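
To make the daily-aggregation limitation concrete, the sketch below compares a daily mean with the hourly peak for one hypothetical day. The load profile is invented for illustration and is not project data.

```python
from statistics import mean

# Hypothetical hourly load for one day in MW (NOT project data):
# flat for most of the day, with an evening peak.
hourly_load = [9_000] * 17 + [11_000, 13_500, 14_000, 13_000, 11_000] + [9_500] * 2

daily_avg = mean(hourly_load)
daily_peak = max(hourly_load)
peak_hour = hourly_load.index(daily_peak)

print(f"daily mean: {daily_avg:.0f} MW")
print(f"daily peak: {daily_peak} MW at hour {peak_hour}")
print(f"peak is {daily_peak / daily_avg - 1:.0%} above the daily mean")
```

A daily mean reports one value for this day, so the size and timing of the evening peak are lost once the data is aggregated.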
Academic project — Big Data & Cloud Computing (BDCC 2025)
| Name | Role |
|---|---|
| Mouhammadou Dia | Statistical Analyst |
| Kouami Emmanuel Dossekou | Statistical Analyst |
| Marc Mare | Statistical Analyst |
| Ndeye Salla Toure | Statistical Analyst |
Marc Mare (GitHub) | ENSAE Dakar | MSc SEP, University of Reims (2026)