**Data:** The World Development Indicators (WDI) collection from the World Bank, which contains economic, social, and environmental indicators by country for 1960–2024.

**Problem Statement**

Does economic growth necessarily lead to higher carbon emissions, or have developed nations succeeded in decoupling growth from environmental impact?

**Goal:**

Analyze the relationship between GDP growth and CO₂ emissions across income groups.

Identify patterns showing whether wealthier nations emit less CO₂ per unit of economic output compared to developing economies.

Predict future emission trends based on GDP growth using a machine learning model.

Provide insights that can guide sustainable economic policies for developing countries.


# Phase 1

Use the World Bank API to fetch the data.
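As a minimal sketch of the fetch step, the helpers below build a request URL for the World Bank API (v2) and flatten its JSON response, which is assumed to arrive as a two-element list of `[metadata, records]`. The function names (`build_url`, `parse_payload`, `fetch_indicator`) are hypothetical, and the indicator code passed in is whichever WDI series the project settles on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.worldbank.org/v2"


def build_url(indicator, countries="all", start=1960, end=2024, per_page=20000):
    """Build a World Bank API v2 URL for one indicator (hypothetical helper)."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": per_page},
        safe=":",  # keep the colon in the date range readable
    )
    return f"{BASE_URL}/country/{countries}/indicator/{indicator}?{query}"


def parse_payload(payload):
    """Flatten the assumed [metadata, records] response into a list of rows."""
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []  # error responses carry no records
    return [
        {
            "country_code": rec["countryiso3code"],
            "year": int(rec["date"]),
            "value": rec["value"],  # None when the observation is missing
        }
        for rec in payload[1]
    ]


def fetch_indicator(indicator, **kwargs):
    """Download and parse one indicator (performs a network request)."""
    with urlopen(build_url(indicator, **kwargs), timeout=30) as resp:
        return parse_payload(json.load(resp))
```

Calling `fetch_indicator("NY.GDP.PCAP.CD", countries="IND")` would request GDP per capita for India; the parsing logic can be checked offline by passing a saved response to `parse_payload`.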


## Load and Clean Data

Filter relevant indicators and countries.

Handle missing or NaN values.

Convert year columns to long format.

Use a tokenizer to prepare the indicator text descriptions as model input.

Add a PII identification layer.
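The filtering, missing-value, and reshaping steps above could look roughly like this with pandas. The column names (`Country Code`, `Indicator Code`, one column per year) are an assumption about the WDI bulk CSV layout, the short indicator labels are placeholders rather than real WDI codes, and `to_long` is a hypothetical helper.

```python
import pandas as pd

# Toy frame in the assumed wide layout: one column per year.
wide = pd.DataFrame({
    "Country Code": ["IND", "IND", "IND", "USA", "USA"],
    "Indicator Code": ["GDP", "CO2", "POP", "GDP", "CO2"],  # placeholder codes
    "1960": [1.0, None, 5.0, 3.0, 4.0],
    "1961": [1.5, 0.3, 6.0, None, 4.2],
})


def to_long(df, indicators, id_cols=("Country Code", "Indicator Code")):
    """Keep selected indicators, melt year columns to rows, drop missing values."""
    df = df[df["Indicator Code"].isin(indicators)]
    year_cols = [c for c in df.columns if str(c).isdigit()]
    long_df = df.melt(
        id_vars=list(id_cols),
        value_vars=year_cols,
        var_name="year",
        value_name="value",
    )
    long_df["year"] = long_df["year"].astype(int)
    # Dropping NaN is the simplest policy; interpolation is an alternative.
    return long_df.dropna(subset=["value"]).reset_index(drop=True)


long_df = to_long(wide, indicators=["GDP", "CO2"])
print(long_df)
```

Dropping missing rows keeps the sketch short; for time-series work, per-country interpolation may be a better fit.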

## Merge Metadata

Join with WDICountry to include region and income group.

Join with WDISeries to get indicator definitions.
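A sketch of the two joins, using small stand-in frames in place of the real files. The column names (`Country Code`, `Region`, `Income Group`, `Series Code`, `Long definition`) are assumptions about the WDICountry and WDISeries layouts and should be checked against the actual CSV headers; the definitions are placeholders.

```python
import pandas as pd

# Stand-ins for the long-format data and the two metadata tables.
long_df = pd.DataFrame({
    "Country Code": ["IND", "USA", "XXX"],
    "Indicator Code": ["CO2", "CO2", "GDP"],  # placeholder codes
    "year": [2020, 2020, 2020],
    "value": [1.0, 2.0, 3.0],
})
country = pd.DataFrame({
    "Country Code": ["IND", "USA"],
    "Region": ["South Asia", "North America"],
    "Income Group": ["Lower middle income", "High income"],
})
series = pd.DataFrame({
    "Series Code": ["CO2", "GDP"],
    "Long definition": ["placeholder definition A", "placeholder definition B"],
})

merged = (
    long_df
    # Left join keeps every observation; unmatched codes get NaN metadata.
    .merge(country, on="Country Code", how="left", validate="many_to_one")
    .merge(series, left_on="Indicator Code", right_on="Series Code",
           how="left", validate="many_to_one")
    .drop(columns="Series Code")
)
print(merged)
```

`validate="many_to_one"` makes pandas raise if a metadata table contains duplicate keys, which would otherwise silently multiply rows.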

### Text Summarization of Indicator Definitions

Fine-tune a summarizer such as t5-small, or use a pre-trained model as-is.

## Perform Analysis

Trend analysis - visualize change over time.

Correlation analysis - identify relationships between indicators.

Comparative analysis - group by region/income and compare averages.

Regression - model relationship (e.g., CO₂ vs. GDP).
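The correlation, comparative, and regression steps might be sketched as follows. The numbers are made up purely for illustration (they are not WDI values), and the column names are hypothetical. The log-log fit is one possible choice of regression: its slope can be read as the elasticity of CO₂ with respect to GDP.

```python
import numpy as np
import pandas as pd

# Illustrative, made-up panel; not real WDI data.
panel = pd.DataFrame({
    "income_group": ["High income"] * 3 + ["Low income"] * 3,
    "gdp_per_capita": [40000.0, 42000.0, 45000.0, 800.0, 900.0, 1000.0],
    "co2_per_capita": [10.0, 9.8, 9.5, 0.20, 0.23, 0.26],
})

# Correlation analysis: GDP vs. CO2 within each income group.
corr = {
    name: grp["gdp_per_capita"].corr(grp["co2_per_capita"])
    for name, grp in panel.groupby("income_group")
}

# Comparative analysis: mean emission intensity (CO2 per 1,000 USD of GDP).
panel["intensity"] = panel["co2_per_capita"] / panel["gdp_per_capita"] * 1000
mean_intensity = panel.groupby("income_group")["intensity"].mean()

# Regression: log(CO2) = slope * log(GDP) + intercept.
slope, intercept = np.polyfit(
    np.log(panel["gdp_per_capita"]), np.log(panel["co2_per_capita"]), deg=1
)

print(corr)
print(mean_intensity)
print(f"elasticity estimate: {slope:.2f}")
```

A negative within-group correlation for a wealthy group, alongside a positive one for a poorer group, is the kind of pattern that would point toward decoupling; whether the real data shows it is what the analysis has to establish.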

## Visualize Insights

Use line plots (time trends), scatter plots (correlation), bar charts (regional comparison).

Highlight the five best- and worst-performing regions.

## Forecasting Future CO₂

Use a model such as an LSTM to forecast the next 5 years of CO₂ emissions for selected countries.
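Before any LSTM can be trained, a yearly series has to be cut into fixed-length input windows and multi-step targets. The sketch below shows only that preparation step, not the model itself; `make_windows`, `lookback`, and `horizon` are hypothetical names, and a 10-year lookback is an arbitrary assumption.

```python
import numpy as np


def make_windows(series, lookback=10, horizon=5):
    """Slice a 1-D series into (samples, lookback, 1) inputs and
    (samples, horizon) targets for multi-step forecasting."""
    values = np.asarray(series, dtype=float)
    n = len(values) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([values[i:i + lookback] for i in range(n)])
    y = np.stack([values[i + lookback:i + lookback + horizon] for i in range(n)])
    # Recurrent layers usually expect a trailing feature axis.
    return X[..., np.newaxis], y


# Stand-in series: 20 consecutive yearly values for one country.
X, y = make_windows(np.arange(20), lookback=10, horizon=5)
print(X.shape, y.shape)
```

With roughly 60 annual observations per country, each country yields only a few dozen windows, so pooling countries or comparing against a simpler baseline is worth considering.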

# Phase 2


## Machine Learning (Prediction & Regression)

Predict future values or relationships between indicators.

Predict CO₂ emissions using GDP, energy use, and population data.

Train a regression model with scikit-learn and upload it to Hugging Face Hub.

### CO₂ Emission Predictor

User inputs GDP per capita, energy use, and region.

Model outputs predicted CO₂ emission per capita.

Show visualization (bar/line chart).
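One way the predictor could be wired together with scikit-learn is sketched below. The training rows are made-up placeholders, the feature names are hypothetical, and `LinearRegression` is simply the plainest regressor to start from rather than a chosen model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows; replace with the merged WDI data.
train = pd.DataFrame({
    "gdp_per_capita": [800.0, 1200.0, 9000.0, 15000.0, 42000.0, 55000.0],
    "energy_use": [300.0, 450.0, 1800.0, 2500.0, 6000.0, 7000.0],
    "region": ["South Asia", "South Asia", "Latin America",
               "Latin America", "Europe", "Europe"],
    "co2_per_capita": [0.3, 0.5, 2.5, 3.5, 8.0, 9.0],
})

numeric = ["gdp_per_capita", "energy_use"]
categorical = ["region"]

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        # Unseen regions are encoded as all zeros instead of raising.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("reg", LinearRegression()),
])
model.fit(train[numeric + categorical], train["co2_per_capita"])


def predict_co2(gdp_per_capita, energy_use, region):
    """Return predicted CO2 per capita for a single user input."""
    row = pd.DataFrame([{
        "gdp_per_capita": gdp_per_capita,
        "energy_use": energy_use,
        "region": region,
    }])
    return float(model.predict(row)[0])


print(predict_co2(20000.0, 3000.0, "Europe"))
```

Keeping preprocessing inside the `Pipeline` means the same object can be serialized and later served without re-implementing the scaling and encoding.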

# Phase 3

## Tokenization Integration
Embedding + Tokenizer Integration

Use Hugging Face AutoTokenizer to tokenize country metadata or indicator text (from WDISeries.csv) and embed its semantic meaning.

## Combine Structured + Textual Data

Combine numeric features (GDP, population) with textual descriptions (indicator notes) to build a hybrid model that uses both structured features and NLP embeddings.
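A simple way to build such a hybrid input is to standardize the numeric columns and concatenate them with one embedding vector per row. The sketch assumes the text embeddings have already been computed elsewhere; here they are stand-in arrays, and `combine_features` is a hypothetical helper.

```python
import numpy as np


def combine_features(numeric, text_embeddings):
    """Standardize numeric columns and append text embeddings row by row."""
    numeric = np.asarray(numeric, dtype=float)
    text_embeddings = np.asarray(text_embeddings, dtype=float)
    if numeric.shape[0] != text_embeddings.shape[0]:
        raise ValueError("numeric and text inputs must have the same number of rows")
    mean = numeric.mean(axis=0)
    std = numeric.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (numeric - mean) / std
    return np.hstack([scaled, text_embeddings])


# Three rows: (GDP, population) plus a stand-in 4-dimensional embedding.
numeric = [[1000.0, 1e6], [2000.0, 2e6], [3000.0, 3e6]]
embeddings = np.ones((3, 4))
features = combine_features(numeric, embeddings)
print(features.shape)
```

Scaling matters here because raw GDP or population values are orders of magnitude larger than typical embedding components and would otherwise dominate distance-based or linear models.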

## Use XGBoost to train locally in Python

Wrap it into a Hugging Face transformers pipeline using AutoModel or Trainer.

Push the trained model to Hugging Face Hub.

### Uploading the model to Hugging Face Hub

Deploy the model using the Hugging Face Inference API on a Hugging Face Space.

### Documenting the project