# Phase 3 Project - The Economic Impact of Corruption Using WGI

## 📌 Project Overview
This project explores the relationship between **corruption and economic stability** using the **Worldwide Governance Indicators (WGI)** dataset. 
We aim to:
- Analyze the impact of governance metrics (e.g., corruption control, rule of law) on **GDP growth and investment trends**.
- Build **classification models** to predict economic risk based on governance scores.

## 📊 Business Problem

Corruption is a major driver of economic instability, affecting **foreign direct investment (FDI), growth rates, and financial resilience**. 
This project provides insights for **governments, investors, financial institutions, and anti-corruption watchdogs** to assess economic risks.

### 🎯 Key Stakeholders:
- **Governments & Policy Makers** → Formulate **anti-corruption reforms** for economic resilience.
- **Investors & Multinational Corporations** → Assess risk levels before entering new markets.
- **International Financial Institutions** (World Bank, IMF) → Use governance scores to determine loan eligibility.
- **Civil Society Organizations** → Advocate for transparency & accountability in governance.

## 🔬 ML Objectives  

### **Analysis Objectives:**
1. **Assess the Impact of Corruption on Economic Growth**
   - Examine how fluctuations in **Control of Corruption** affect **GDP growth**.
   - Compare trends in **developing vs. advanced economies**.

2. **Analyze the Relationship Between Regulatory Quality & Investment Flows**
   - Investigate whether **strong governance frameworks** attract higher FDI.
   - Explore how economic mismanagement affects investor confidence.

### **Modeling Objectives:**
1. **Predict a Country’s Economic Performance Based on Governance Indicators**
   - Build a **regression model** to forecast GDP growth using WGI metrics.
   - Use **Ridge Regression** to regularize the coefficients, since governance indicators tend to be strongly correlated with one another.

2. **Classify Countries Into High or Low Investment-Attractiveness Groups**
   - Develop a **classification model** to determine investment risk based on corruption trends.
   - Apply **Logistic Regression or Decision Trees** for clear stakeholder interpretation.
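
As a rough illustration of the Ridge Regression objective above, the sketch below fits the closed-form ridge solution on synthetic data. Everything in it is an assumption made for illustration: the three "governance scores", the coefficients in `true_coef`, the noise level, and the helper `ridge_fit` are invented, and no WGI or GDP data is used. In practice a library estimator such as scikit-learn's `Ridge` would be the usual choice; plain NumPy is used here only to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for three governance scores (WGI estimates range roughly -2.5 to 2.5).
X = rng.uniform(-2.5, 2.5, size=(200, 3))
true_coef = np.array([1.0, 0.5, 0.0])  # invented coefficients, not estimated from real data
y = X @ true_coef + rng.normal(0.0, 0.5, size=200)  # synthetic "GDP growth" target


def ridge_fit(X, y, alpha):
    """Closed-form ridge fit: solve (Xc^T Xc + alpha * I) w = Xc^T yc on centered data."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean  # centering keeps the intercept unpenalized
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - X_mean @ coef
    return coef, intercept


coef_ols, _ = ridge_fit(X, y, alpha=0.0)      # alpha = 0 reduces to ordinary least squares
coef_ridge, _ = ridge_fit(X, y, alpha=100.0)  # a larger alpha shrinks coefficients toward zero

print("OLS coefficients:  ", np.round(coef_ols, 3))
print("Ridge coefficients:", np.round(coef_ridge, 3))
```

The penalty strength `alpha` would normally be chosen by cross-validation rather than fixed by hand as it is here.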

# Loading and Inspecting the Worldwide Governance Indicators Dataset

In [12]:
import pandas as pd

# Load the WGI dataset CSV into a DataFrame
df = pd.read_csv('WB_WGI.csv')
print(df.head())
df.info()  # info() prints its summary directly and returns None, so it is not wrapped in print()

       STRUCTURE                STRUCTURE_ID ACTION FREQ FREQ_LABEL REF_AREA  \
0  datastructure  WB.DATA360:DS_DATA360(1.2)      I    A     Annual      ABW   
1  datastructure  WB.DATA360:DS_DATA360(1.2)      I    A     Annual      AFG   
2  datastructure  WB.DATA360:DS_DATA360(1.2)      I    A     Annual      AGO   
3  datastructure  WB.DATA360:DS_DATA360(1.2)      I    A     Annual      AIA   
4  datastructure  WB.DATA360:DS_DATA360(1.2)      I    A     Annual      ALB   

  REF_AREA_LABEL      INDICATOR                  INDICATOR_LABEL SEX  ...  \
0          Aruba  WB_WGI_CC_EST  Control of Corruption: Estimate  _T  ...   
1    Afghanistan  WB_WGI_CC_EST  Control of Corruption: Estimate  _T  ...   
2         Angola  WB_WGI_CC_EST  Control of Corruption: Estimate  _T  ...   
3       Anguilla  WB_WGI_CC_EST  Control of Corruption: Estimate  _T  ...   
4        Albania  WB_WGI_CC_EST  Control of Corruption: Estimate  _T  ...   

  DATABASE_ID                      DATABASE_ID_LABEL UNI

#### View the columns in the World Bank WGI dataset

In [13]:
print(df.columns)

Index(['STRUCTURE', 'STRUCTURE_ID', 'ACTION', 'FREQ', 'FREQ_LABEL', 'REF_AREA',
       'REF_AREA_LABEL', 'INDICATOR', 'INDICATOR_LABEL', 'SEX', 'SEX_LABEL',
       'AGE', 'AGE_LABEL', 'URBANISATION', 'URBANISATION_LABEL',
       'UNIT_MEASURE', 'UNIT_MEASURE_LABEL', 'COMP_BREAKDOWN_1',
       'COMP_BREAKDOWN_1_LABEL', 'COMP_BREAKDOWN_2', 'COMP_BREAKDOWN_2_LABEL',
       'COMP_BREAKDOWN_3', 'COMP_BREAKDOWN_3_LABEL', 'TIME_PERIOD',
       'OBS_VALUE', 'DATABASE_ID', 'DATABASE_ID_LABEL', 'UNIT_MULT',
       'UNIT_MULT_LABEL', 'UNIT_TYPE', 'UNIT_TYPE_LABEL', 'OBS_STATUS',
       'OBS_STATUS_LABEL', 'OBS_CONF', 'OBS_CONF_LABEL'],
      dtype='object')
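
Many of the columns listed above are metadata that may hold a single value for every row (the preview shows `FREQ` as `A` and `SEX` as `_T` in each of the first five rows). A minimal sketch of one way to spot and drop such columns follows. The frame `toy` is a hypothetical miniature with made-up values, not the real dataset, and whether these columns are truly constant across the full file is an assumption that should be checked on `df` itself.

```python
import pandas as pd

# Hypothetical miniature of the WGI frame; all values are made up for illustration.
toy = pd.DataFrame({
    "FREQ": ["A", "A", "A"],
    "SEX": ["_T", "_T", "_T"],
    "REF_AREA": ["ABW", "AFG", "AGO"],
    "TIME_PERIOD": [2019, 2020, 2021],
    "OBS_VALUE": [1.1, -1.5, -0.9],
})

# A column with at most one distinct value cannot distinguish between rows.
n_unique = toy.nunique(dropna=False)
constant_cols = n_unique[n_unique <= 1].index.tolist()

toy_slim = toy.drop(columns=constant_cols)
print("Dropped:", constant_cols)
print("Kept:   ", toy_slim.columns.tolist())
```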


#### To understand the dataset, we first list the indicator labels it contains. The output shows the relevant governance metrics, each reported as an estimate alongside supporting statistics such as percentile rank and standard error.

In [14]:
# Check unique indicator labels to find relevant governance metrics.
# Key indicators for this project:
#   Control of Corruption → "Control of Corruption: Estimate" (code WB_WGI_CC_EST)
#   Political Stability   → "Political Stability and Absence of Violence/Terrorism: Estimate"
#   Rule of Law           → "Rule of Law: Estimate"

print(df["INDICATOR_LABEL"].unique())

['Control of Corruption: Estimate'
 'Control of Corruption: Number of Sources'
 'Control of Corruption: Percentile Rank'
 'Control of Corruption: Percentile Rank, Lower Bound of 90% Confidence Interval'
 'Control of Corruption: Percentile Rank, Upper Bound of 90% Confidence Interval'
 'Control of Corruption: Standard Error'
 'Government Effectiveness: Estimate'
 'Government Effectiveness: Number of Sources'
 'Government Effectiveness: Percentile Rank'
 'Government Effectiveness: Percentile Rank, Lower Bound of 90% Confidence Interval'
 'Government Effectiveness: Percentile Rank, Upper Bound of 90% Confidence Interval'
 'Government Effectiveness: Standard Error'
 'Political Stability and Absence of Violence/Terrorism: Estimate'
 'Political Stability and Absence of Violence/Terrorism: Number of Sources'
 'Political Stability and Absence of Violence/Terrorism: Percentile Rank'
 'Political Stability and Absence of Violence/Terrorism: Percentile Rank, Lower Bound of 90% Confidence Interval'
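
Each governance dimension appears several times in the list above (estimate, number of sources, percentile rank, standard error), so it can help to isolate only the point estimates. The sketch below does this with a suffix match on a small sample of the labels printed above. The variable names are illustrative, and applying the same filter to `df["INDICATOR_LABEL"].unique()` assumes every estimate label ends in `": Estimate"`.

```python
import pandas as pd

# A sample of the labels shown in the output above.
labels = pd.Series([
    "Control of Corruption: Estimate",
    "Control of Corruption: Number of Sources",
    "Control of Corruption: Percentile Rank",
    "Control of Corruption: Standard Error",
    "Government Effectiveness: Estimate",
    "Political Stability and Absence of Violence/Terrorism: Estimate",
])

# Keep only the point-estimate series for each governance dimension.
estimate_labels = labels[labels.str.endswith(": Estimate")].tolist()
print(estimate_labels)
```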

#### Checking for missing values

In [15]:
# Check for any missing values per column
print(df.isnull().sum().sort_values(ascending=False))

OBS_VALUE                 6756
OBS_CONF_LABEL               0
UNIT_MEASURE_LABEL           0
URBANISATION_LABEL           0
URBANISATION                 0
AGE_LABEL                    0
AGE                          0
SEX_LABEL                    0
SEX                          0
INDICATOR_LABEL              0
INDICATOR                    0
REF_AREA_LABEL               0
REF_AREA                     0
FREQ_LABEL                   0
FREQ                         0
ACTION                       0
STRUCTURE_ID                 0
UNIT_MEASURE                 0
COMP_BREAKDOWN_1             0
OBS_CONF                     0
COMP_BREAKDOWN_1_LABEL       0
COMP_BREAKDOWN_2             0
COMP_BREAKDOWN_2_LABEL       0
COMP_BREAKDOWN_3             0
COMP_BREAKDOWN_3_LABEL       0
TIME_PERIOD                  0
DATABASE_ID                  0
DATABASE_ID_LABEL            0
UNIT_MULT                    0
UNIT_MULT_LABEL              0
UNIT_TYPE                    0
UNIT_TYPE_LABEL              0
OBS_STAT
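
Since `OBS_VALUE` is the only column with gaps, a natural follow-up is to see which indicators the missing values belong to and then decide how to treat them. The sketch below shows one possible approach on a hypothetical toy frame with made-up values. Dropping the incomplete rows is only one option and is not necessarily the treatment used later in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the WGI frame; values are made up for illustration.
toy = pd.DataFrame({
    "REF_AREA": ["ABW", "ABW", "AFG", "AFG"],
    "INDICATOR_LABEL": [
        "Control of Corruption: Estimate",
        "Government Effectiveness: Estimate",
        "Control of Corruption: Estimate",
        "Government Effectiveness: Estimate",
    ],
    "TIME_PERIOD": [2020, 2020, 2020, 2020],
    "OBS_VALUE": [1.1, np.nan, -1.5, -1.6],
})

# Count missing observations per indicator.
missing_by_indicator = toy["OBS_VALUE"].isna().groupby(toy["INDICATOR_LABEL"]).sum()
print(missing_by_indicator)

# One simple treatment: keep only rows that have an observed value.
toy_clean = toy.dropna(subset=["OBS_VALUE"])
print(len(toy), "rows before,", len(toy_clean), "rows after")
```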

#### Selecting the relevant governance indicators


In [16]:
# Define relevant governance indicators
governance_indicators = [
    "Control of Corruption: Estimate",
    "Political Stability and Absence of Violence/Terrorism: Estimate",
    "Government Effectiveness: Estimate"
]

# Filter dataset for relevant indicators
df_filtered = df[df["INDICATOR_LABEL"].isin(governance_indicators)].copy()
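
After filtering, the data is still in long format, with one row per country, year, and indicator. Modeling usually needs one column per indicator, so a possible next step is to pivot to wide format. The sketch below illustrates this on a hypothetical toy frame that reuses the dataset's column names. The values are made up, and the name `wide` is an assumption rather than something defined elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical miniature of df_filtered; values are made up for illustration.
toy_filtered = pd.DataFrame({
    "REF_AREA": ["ABW", "ABW", "AFG", "AFG"],
    "TIME_PERIOD": [2020, 2020, 2020, 2020],
    "INDICATOR_LABEL": [
        "Control of Corruption: Estimate",
        "Government Effectiveness: Estimate",
        "Control of Corruption: Estimate",
        "Government Effectiveness: Estimate",
    ],
    "OBS_VALUE": [1.1, 0.9, -1.5, -1.6],
})

# Reshape to one row per country-year and one column per indicator.
wide = toy_filtered.pivot_table(
    index=["REF_AREA", "TIME_PERIOD"],
    columns="INDICATOR_LABEL",
    values="OBS_VALUE",
    aggfunc="first",  # each country-year-indicator combination is expected to appear once
).reset_index()
wide.columns.name = None  # remove the leftover "INDICATOR_LABEL" axis label

print(wide)
```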