Forecasting Europe 2030: A Machine Learning Analysis of Growth, Equality and Climate SDGs

Project Overview

The 2030 SDG Monitor is a data science project designed to analyze, forecast and visualize the progress of European countries towards the United Nations Sustainable Development Goals (SDGs) for the year 2030.

Focusing on SDG 8 (Decent Work), SDG 10 (Reduced Inequalities) and SDG 13 (Climate Action), this project uses a predictive pipeline trained on historical data (2005–2019) to project future trends and assess whether countries are on track to meet EU targets.

Key Features

Predictive Modeling: Forecasts indicators up to 2030 using Linear Regression models.
Hybrid Methodology: Combines time-series trends with autoregressive dynamics (lagged socio-economic variables).
Model Validation: Includes a backtesting module that measures model accuracy with the mean absolute error (MAE), using a 2019 (pre-pandemic) training cutoff.
Interactive Dashboard: A web-based interface (Plotly Dash) featuring:
- Choropleth maps for European overview.
- Trend lines comparing historical data, forecasts and 2030 targets.
- A color system (Green/Orange/Red) to visualize distance to targets.

Project Structure

Project_Datascience/
├── data/
│   └── Final_Cleaned_Database.csv    # Historical dataset (2005-2022)
├── results/
│   ├── descriptive_analysis/         # Exploratory data analysis charts
│   ├── forecast_2030/                # Generated forecast data and static plots
│   └── model_validation_plot/        # Backtesting performance charts
├── src/
│   ├── dashboard.py                  # Interactive Dash application
│   ├── descriptive_analysis.py       # Descriptive analysis script
│   ├── forecast_to_2030.py           # Main forecasting script
│   ├── model_validation.py           # Backtesting and error analysis script
│   └── preprocessing_data.py         # Data preprocessing script
└── README.md

Installation & Requirements

This project requires Python 3.9+. Install the necessary dependencies:

pip install -r requirements.txt

Usage

1. Data Preprocessing

Clean and prepare the raw dataset for analysis.

python src/preprocessing_data.py

Output: Generates the cleaned dataset data/Final_Cleaned_Database.csv.

2. Descriptive Analysis

Perform exploratory data analysis to visualize historical trends.

python src/descriptive_analysis.py

Output: Generates distribution and correlation plots in results/descriptive_analysis/.

3. Model Validation (Optional)

Run the backtesting script to evaluate the reliability of the models. It trains on data up to 2019 and tests on 2020-2022.

python src/model_validation.py

Output: Generates validation charts and an error summary table in results/model_validation_plot/.
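
The backtesting idea described above (train up to 2019, score on the held-out years with MAE) can be sketched as follows. This is a minimal illustration rather than the contents of model_validation.py: the function name `backtest_trend`, the use of NumPy, and the synthetic series are all assumptions made for the example.

```python
import numpy as np

def backtest_trend(years, values, cutoff=2019):
    """Fit a linear trend on years <= cutoff and return the MAE on later years.

    Hypothetical helper for illustration; not taken from the project code.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    train = years <= cutoff
    test = ~train
    # Ordinary least squares fit of value = slope * year + intercept
    slope, intercept = np.polyfit(years[train], values[train], deg=1)
    predicted = slope * years[test] + intercept
    # Mean absolute error on the held-out years
    return float(np.mean(np.abs(values[test] - predicted)))

# Synthetic series (not project data): an exact linear decline up to 2019,
# then a constant +1.0 shock in the held-out years 2020-2022.
years = list(range(2005, 2023))
values = [10.0 - 0.2 * (y - 2005) for y in years]
values[-3:] = [v + 1.0 for v in values[-3:]]

mae = backtest_trend(years, values)
print(round(mae, 3))
```

Because the training years lie exactly on a line, the held-out error here comes entirely from the artificial shock, which makes the MAE easy to interpret.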

4. Generate Forecasts

Run the main forecasting script to generate predictions for 2023-2030.

python src/forecast_to_2030.py

Output: Creates graph_forecast_data.csv and static trend images in results/forecast_2030/.

5. Launch the Dashboard

Start the interactive web application to explore the results.

python src/dashboard.py

Output: The app will run locally. Open your browser at http://127.0.0.1:8050/.

Methodology

To address the complexity of socio-economic and environmental indicators, we developed a Linear Regression pipeline combining temporal trends and autoregressive dynamics:

Socio-Economic Indicators (e.g., GDP, Unemployment, Inequality):
- Modeled using Lagged Features (e.g., $X_{t-1}$) to capture system dynamics.
- Example: Unemployment Rate is predicted using the previous year's NEET Rate and Income Distribution.
Environmental Indicators (GHG Emissions, Renewable Share):
- Modeled using Time-Series Trends (Year as the sole feature).
- This approach captures the structural, often policy-driven trajectory of green transition metrics.

Note on Training: The models are trained on data from 2005 to 2019 to avoid biasing the long-term trend with the specific anomalies of the COVID-19 pandemic years.
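
The two model families above can be sketched as below. This is a hedged illustration under stated assumptions, not the project's implementation: the helpers `fit_ols` and `predict_ols`, the use of NumPy least squares, and all data values are hypothetical. The first model regresses next year's unemployment on this year's NEET rate and income distribution (lagged features); the second uses the year as its only feature.

```python
import numpy as np

TRAIN_START, TRAIN_END = 2005, 2019  # training window stated in this README

def fit_ols(features, target):
    """Least-squares fit with an intercept; features has shape (n_samples, n_features)."""
    features = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return coef  # [intercept, slope_1, ..., slope_k]

def predict_ols(coef, features):
    """Apply fitted coefficients to a 2-D feature array."""
    features = np.asarray(features, dtype=float)
    return coef[0] + features @ coef[1:]

years = np.arange(TRAIN_START, TRAIN_END + 1)
t = years - TRAIN_START

# Synthetic series for illustration only (not project data)
neet = 15.0 - 0.3 * t + 0.5 * np.sin(t)
s80s20 = 5.0 + 0.2 * np.cos(t)
renewable = 10.0 + 1.5 * t

# Socio-economic model: unemployment in year t from NEET and S80/S20 in year t-1.
# The target is generated from known coefficients so the fit can be checked.
unemployment_next = 2.0 + 0.4 * neet[:-1] + 0.5 * s80s20[:-1]
lagged_features = np.column_stack([neet[:-1], s80s20[:-1]])
lag_coef = fit_ols(lagged_features, unemployment_next)

# One-step-ahead forecast for 2020 from the 2019 values of the lagged predictors
unemployment_2020 = predict_ols(lag_coef, [[neet[-1], s80s20[-1]]])[0]

# Environmental model: the year is the sole feature
trend_coef = fit_ols(years.reshape(-1, 1), renewable)
renewable_2030 = predict_ols(trend_coef, [[2030]])[0]

print(np.round(lag_coef, 3), round(float(renewable_2030), 2))
```

Note that forecasting a lagged model several years ahead also requires future values of its predictors; how the project obtains them is not described here, so the sketch stops at a one-step forecast.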

Indicators & Targets

Indicator	SDG	2030 Target / Goal
Real GDP Per Capita	SDG 8	Growth Trend
Unemployment Rate	SDG 8	≤ 5.0%
NEET Rate	SDG 8	≤ 9.0%
Income Distribution (S80/S20)	SDG 10	Reduction Trend
Income Share Bottom 40%	SDG 10	Increase Trend
Renewable Energy Share	SDG 13	≥ 42.5%
GHG Emissions	SDG 13	Reduction Trend

Visualization Logic (Color System)

The dashboard uses a color-coded system to evaluate progress:

For Numeric Targets: The color is determined by comparing the 2030 Forecast directly to the fixed threshold.
For Trend Targets: The color is determined by comparing the 2030 Forecast to the Last Historical Value (2022).
- Green: The forecast shows an improvement relative to the 2022 baseline.
- Red: The forecast shows a deterioration or stagnation relative to the 2022 baseline.
- Note: Even if the trend line points in the right direction, the status remains Red if the final 2030 prediction is not better than the actual 2022 value.
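
The rules above can be expressed as two small functions. This is a sketch of the stated logic only, not the code in dashboard.py: the function names and the `higher_is_better` flag are assumptions, and the Orange band is left out because its rule is not specified in this section.

```python
def trend_status(forecast_2030, baseline_2022, higher_is_better):
    """Trend targets: Green only if the 2030 forecast beats the 2022 value."""
    if higher_is_better:
        improved = forecast_2030 > baseline_2022
    else:
        improved = forecast_2030 < baseline_2022
    # Stagnation (equal values) counts as Red, as stated above
    return "Green" if improved else "Red"

def meets_numeric_target(forecast_2030, threshold, higher_is_better):
    """Numeric targets: compare the 2030 forecast directly to the fixed threshold."""
    if higher_is_better:
        return forecast_2030 >= threshold
    return forecast_2030 <= threshold

# Illustrative values only (not project results)
print(trend_status(30000.0, 28000.0, higher_is_better=True))    # GDP per capita rises
print(trend_status(4.5, 4.5, higher_is_better=False))           # S80/S20 stagnates
print(meets_numeric_target(4.8, 5.0, higher_is_better=False))   # unemployment vs 5.0%
print(meets_numeric_target(40.0, 42.5, higher_is_better=True))  # renewables vs 42.5%
```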

Author

Erin Anzallo - M1 Data Science Project
