**Data** - The World Development Indicators (WDI) collection from the World Bank, which contains economic, social, and environmental indicators for every country from 1960 to 2024.

**Problem Statement**

Does economic growth necessarily lead to higher carbon emissions, or have developed nations succeeded in decoupling growth from environmental impact?

**Goal:**

Analyze the relationship between GDP growth and CO2 emissions across countries.

Identify patterns showing whether wealthier nations emit less CO2 per unit of economic output compared to developing economies.

Predict future emission trends based on GDP growth using a machine learning model.

Provide insights that can guide sustainable economic policies for developing countries.


# Phase 1

## Use the World Bank API to Fetch Data

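Use the World Bank Data API to fetch the latest published data on CO2 emissions, GDP per capita, energy use, and population. The indicators are annual series, so "latest" means the most recent reported year rather than a live feed. Combine and clean the data with Pandas and NumPy for structured analysis.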
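A minimal sketch of the fetch step, using only the standard library. The indicator codes in `INDICATORS` and the helper names (`build_url`, `parse_records`, `fetch_indicator`) are assumptions made for illustration; the codes should be checked against the current WDI catalogue, since series are occasionally renamed or retired.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.worldbank.org/v2"

# Assumed indicator codes; verify each one before relying on it.
INDICATORS = {
    "gdp_per_capita": "NY.GDP.PCAP.CD",
    "population": "SP.POP.TOTL",
    "energy_use_per_capita": "EG.USE.PCAP.KG.OE",
    "co2_per_capita": "EN.ATM.CO2E.PC",
}


def build_url(indicator, countries="all", start=1960, end=2024,
              page=1, per_page=1000):
    """Build a World Bank API v2 URL for one indicator and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}",
         "per_page": per_page, "page": page},
        safe=":",  # keep the "1960:2024" range readable in the URL
    )
    return f"{BASE_URL}/country/{countries}/indicator/{indicator}?{query}"


def parse_records(payload):
    """Flatten the [metadata, records] JSON payload into a list of dicts."""
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []  # error responses carry no record list
    return [
        {
            "country_code": rec["countryiso3code"],
            "country": rec["country"]["value"],
            "indicator": rec["indicator"]["id"],
            "year": int(rec["date"]),
            "value": rec["value"],  # None when the observation is missing
        }
        for rec in payload[1]
    ]


def fetch_indicator(indicator, **kwargs):
    """Download every page for one indicator (requires network access)."""
    rows, page = [], 1
    while True:
        with urlopen(build_url(indicator, page=page, **kwargs), timeout=30) as resp:
            payload = json.load(resp)
        rows.extend(parse_records(payload))
        if not isinstance(payload, list) or page >= payload[0].get("pages", 1):
            return rows
        page += 1
```

Calling `fetch_indicator(INDICATORS["gdp_per_capita"], countries="USA;IND")` would return one dict per country-year, which can be passed straight to `pandas.DataFrame`.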


## Load and Clean Data

Filter relevant indicators and countries.

Handle missing or NaN values.

Convert year columns to long format.

Use a tokenizer to prepare the text-description input.

Add a PII identification layer.
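The filtering, missing-value, and reshaping steps above can be sketched with pandas as follows. The sketch assumes the wide layout of the bulk WDI CSV (four identifier columns followed by one column per year); `to_long` is a hypothetical helper name, and dropping missing observations is only one of several reasonable choices. The tokenizer and PII steps are not covered here.

```python
import pandas as pd

# Identifier columns assumed to match the bulk WDI CSV layout.
ID_COLS = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]


def to_long(wide, indicator_codes, country_codes=None):
    """Filter a wide WDI table and reshape its year columns into rows."""
    df = wide[wide["Indicator Code"].isin(indicator_codes)]
    if country_codes is not None:
        df = df[df["Country Code"].isin(country_codes)]
    # Year columns are the ones whose names are all digits, e.g. "1960".
    year_cols = [c for c in df.columns if str(c).isdigit()]
    tidy = df.melt(id_vars=ID_COLS, value_vars=year_cols,
                   var_name="Year", value_name="Value")
    tidy["Year"] = tidy["Year"].astype(int)
    # Drop missing observations; interpolation or imputation are alternatives.
    return tidy.dropna(subset=["Value"]).reset_index(drop=True)
```

The result has one row per country, indicator, and year, which is the shape most plotting and modeling code expects.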

## Merge Metadata

Join with WDICountry to include region and income group.

Join with WDISeries to get indicator definitions.
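One way to express the two joins, assuming the long-format table has `Country Code` and `Indicator Code` columns and that the metadata files use the column names `Region`, `Income Group`, `Series Code`, and `Long definition`. These names and the helper `add_metadata` are assumptions and should be checked against the actual CSV headers.

```python
import pandas as pd


def add_metadata(long_df, countries, series, drop_aggregates=True):
    """Attach region, income group, and indicator definitions to long data."""
    country_cols = countries[["Country Code", "Region", "Income Group"]]
    series_cols = series[["Series Code", "Long definition"]].rename(
        columns={"Series Code": "Indicator Code"}
    )
    # validate= raises if a metadata table unexpectedly has duplicate keys.
    merged = long_df.merge(country_cols, on="Country Code",
                           how="left", validate="many_to_one")
    merged = merged.merge(series_cols, on="Indicator Code",
                          how="left", validate="many_to_one")
    if drop_aggregates:
        # Assumption: aggregate rows such as "World" have no Region value.
        merged = merged.dropna(subset=["Region"])
    return merged.reset_index(drop=True)
```

Left joins keep every observation even when metadata is missing, so gaps in the metadata show up as NaN instead of silently removing rows.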

### Text Summarization of Indicator Definitions

Fine-tune a summarizer such as t5-small, or use a pre-trained summarizer as-is.

## Perform Analysis

Trend analysis - visualize change over time.

Correlation analysis - identify relationships between indicators.

Comparative analysis - group by region/income and compare averages.

Regression - model relationship (e.g., CO2 vs. GDP).
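The correlation, comparative, and regression steps above could look roughly like this. The column names `gdp`, `co2`, and `income_group` are hypothetical, and the log-log fit is one possible regression choice: its slope reads as the elasticity of emissions with respect to GDP.

```python
import numpy as np
import pandas as pd


def analyze(df):
    """Summarize the GDP-CO2 relationship in a country-year table.

    Expects positive numeric columns "gdp" and "co2" plus a categorical
    "income_group" column (hypothetical names).
    """
    # Pearson correlation between the two indicators.
    correlation = df["gdp"].corr(df["co2"])
    # Average indicator values per income group.
    group_means = df.groupby("income_group")[["gdp", "co2"]].mean()
    # Log-log least squares: log(co2) = slope * log(gdp) + intercept.
    slope, intercept = np.polyfit(np.log(df["gdp"]), np.log(df["co2"]), deg=1)
    return {
        "correlation": float(correlation),
        "group_means": group_means,
        "elasticity": float(slope),
        "intercept": float(intercept),
    }
```

An elasticity below 1 would suggest that emissions grow more slowly than output, which is the pattern the problem statement asks about.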

## Visualize Insights

Use line plots (time trends), scatter plots (correlation), bar charts (regional comparison).

Highlight the top 5 best-performing and worst-performing regions.

## Decoupling Index Calculation

Calculate a decoupling metric that measures whether countries are growing economically while reducing emissions.
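The plan does not fix a formula, so the sketch below assumes an elasticity-style metric in the spirit of the Tapio decoupling elasticity: the percentage change in CO2 divided by the percentage change in GDP over the same period. The three labels and the cut-offs at 0 and 1 are a simplification chosen for illustration, and `decoupling_status` is a hypothetical helper name.

```python
def decoupling_status(co2_start, co2_end, gdp_start, gdp_end):
    """Return (elasticity, label) for one country over one period.

    elasticity = relative CO2 change / relative GDP change.
    """
    co2_change = (co2_end - co2_start) / co2_start
    gdp_change = (gdp_end - gdp_start) / gdp_start
    if gdp_change <= 0:
        # Without economic growth the ratio is hard to interpret.
        return None, "no growth"
    elasticity = co2_change / gdp_change
    if elasticity < 0:
        label = "absolute decoupling"   # GDP up, emissions down
    elif elasticity < 1:
        label = "relative decoupling"   # emissions grow slower than GDP
    else:
        label = "no decoupling"         # emissions grow at least as fast
    return elasticity, label
```

For example, an economy that grows 20% while its emissions fall 10% gets an elasticity of -0.5 and is labeled as absolute decoupling.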

## Forecasting Future CO2

Use a model such as an LSTM to forecast the next 5 years of CO₂ emissions for selected countries.
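Before any LSTM can be trained, each country's annual series has to be cut into supervised samples. The sketch below shows only that preparation step with NumPy; the 10-year lookback is an assumption, the 5-year horizon follows the text, and `make_windows` is a hypothetical helper. The trailing feature axis matches the `(samples, timesteps, features)` input shape that Keras recurrent layers expect.

```python
import numpy as np


def make_windows(series, lookback=10, horizon=5):
    """Slice one time series into (input window, future values) pairs."""
    values = np.asarray(series, dtype="float32")
    n_samples = len(values) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    # Each input holds `lookback` consecutive years ...
    X = np.stack([values[i:i + lookback] for i in range(n_samples)])
    # ... and each target holds the `horizon` years that follow it.
    y = np.stack([values[i + lookback:i + lookback + horizon]
                  for i in range(n_samples)])
    # Add a trailing feature axis: (samples, timesteps, 1).
    return X[..., np.newaxis], y
```

With roughly 60 annual observations per country, the number of windows is small, so pooling windows across countries or preferring a simpler baseline model may be worth considering.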

# Phase 2

## Machine Learning (Prediction & Regression)

Predict future values or relationships between indicators.

Predict CO2 emissions using GDP, energy use, and population data.

Use regression algorithms from Scikit-learn and XGBoost to predict CO2 emissions from GDP and energy variables. Candidate models: Linear Regression, Random Forest Regressor, and XGBoost.

### Cross-Validation and Grid Search
Apply cross-validation and grid search to optimize model performance. Split the cleaned dataset into training and testing sets, using Region, Lending Category, and other categorical features from WDICountry.csv (one-hot encoded) as predictor variables alongside the numeric indicators.
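A hedged sketch of how one-hot encoding, the train/test split, and a cross-validated grid search might fit together in scikit-learn. It tunes only a Random Forest for brevity; the column names passed in, the small parameter grid, and the helper `tune_co2_regressor` are assumptions, not the project's final configuration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def tune_co2_regressor(df, numeric_cols, categorical_cols, target, seed=0):
    """Grid-search a Random Forest on one-hot encoded + numeric features."""
    X = df[numeric_cols + categorical_cols]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # One-hot encode categoricals; numeric columns pass through unchanged.
    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
        remainder="passthrough",
    )
    pipeline = Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(random_state=seed)),
    ])
    # Illustrative grid; widen it for a real search.
    param_grid = {
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 5],
    }
    search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    # The held-out test set is scored once, after tuning is finished.
    return search, search.score(X_test, y_test)
```

Keeping the encoder inside the pipeline means it is refit on each cross-validation fold, which avoids leaking information from the validation folds into preprocessing.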

### Model Selection
Train multiple classification models (random forest classifier, SVC, and kNN) on the training set to predict the Development column.
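Comparing the three classifiers could be done with cross-validated accuracy, as in the sketch below. It assumes `X_train` is already numeric (for example, after one-hot encoding) and that `y_train` holds the Development labels; `compare_classifiers` is a hypothetical helper, and all models use default hyperparameters at this stage.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X_train, y_train, cv=5, seed=0):
    """Return the best model name and mean CV accuracy for each candidate."""
    candidates = {
        "random_forest": RandomForestClassifier(random_state=seed),
        # SVC and kNN are distance-based, so features are standardized first.
        "svc": make_pipeline(StandardScaler(), SVC()),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }
    scores = {
        name: float(cross_val_score(model, X_train, y_train, cv=cv).mean())
        for name, model in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Only the training set is used here, so the test set stays untouched until the final evaluation.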

---



### Hyperparameter Tuning
Select the best-performing model from the above task and use an automated search (e.g., grid search or randomized search) to find the best combination of hyperparameters.


### Final Classification on the Test Set
Run the final, optimized classifier on the test set.

### CO2 Emission Predictor

User inputs GDP per capita, energy use, and region.

Model outputs predicted CO2 emission per capita.

Show visualization (bar/line chart).

# Phase 3

## Learning Models (TensorFlow)

Build neural network models using TensorFlow to capture nonlinear relationships and temporal patterns.

Use dense layers for tabular data.

Use early stopping and learning-rate scheduling during training.

Evaluate using MAE, RMSE, and R² metrics.
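For reference, the three metrics can be computed directly with NumPy, which makes their definitions explicit regardless of which framework produced the predictions. `regression_metrics` is a hypothetical helper name.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))            # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    # R² is undefined when the true values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2)}
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of whether errors are spread evenly or dominated by a few countries or years.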

## Embedding and Tokenizer Integration

Use Hugging Face AutoTokenizer to tokenize country data and indicator text (from WDISeries.csv), then embed the tokens to capture semantic meaning.

## Combine Structured + Textual Data

Combine numeric features (GDP, population) with textual descriptions (long and short indicator notes) to create a hybrid model that uses both structured features and NLP embeddings.

## Use XGBoost to Train Locally in Python

Wrap it into a Hugging Face transformers pipeline using AutoModel or Trainer.

Push the trained model to Hugging Face Hub.

# Phase 4

### Uploading the Model to the Hugging Face Hub
Deploy the model on a Hugging Face Space using the Hugging Face Inference API.

### Documenting the Project