# 1. Introduction



**Monetary Freedom is Key for Economic Freedom**

This project explores the role of Monetary Freedom in shaping overall Economic Freedom across countries. Using the Heritage Foundation’s Index of Economic Freedom (1995–2025 editions), we analyze whether lower Monetary Freedom scores are associated with weaker performance in the other dimensions of economic freedom. The goal is to uncover patterns that can inform policy discussions and economic research.

In this notebook, you can expect the following:

1.  **Data Loading and Initial Exploration**: Load and get a first look at the Economic Freedom Index dataset.
2.  **Data Cleaning and Preprocessing**: Handle missing values, standardize text, and prepare the data for analysis.
3.  **Data Splitting**: Divide the dataset into training and testing sets based on time.
4.  **Exploratory Data Analysis (EDA)**: Visualize distributions, trends over time, and correlations to understand the data better.
5.  **Regression Modeling**: Build and evaluate different regression models to predict the 'Overall Score'.
6.  **Model Evaluation and Optimization**: Assess model performance, analyze residuals, and consider techniques like Bayesian Optimization.
7.  **Bias, Fairness & Explainability**: Investigate potential biases in the model's performance and explore ways to explain its predictions.
8.  **Reporting**: Generate reports summarizing the findings, model performance, and ethical considerations.

# 2. Import & Dataset Loading

In [None]:
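import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn import linear_model
# display() is built into Jupyter/Colab; the explicit import lets the code run as a plain script too
from IPython.display import display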

In [None]:
df_efi = pd.read_csv('/content/heritage-index-of-economic-freedom-20250825135744.csv', sep=',', skiprows=3)

# Define the correct column names based on the provided format
column_names = ['Country', 'Index Year', 'Overall Score', 'Property Rights', 'Government Integrity', 'Judicial Effectiveness', 'Tax Burden', 'Government Spending', 'Fiscal Health', 'Business Freedom', 'Labor Freedom', 'Monetary Freedom', 'Trade Freedom', 'Investment Freedom', 'Financial Freedom']
df_efi.columns = column_names

display(df_efi.head())

,Country,Index Year,Overall Score,Property Rights,Government Integrity,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Labor Freedom,Monetary Freedom,Trade Freedom,Investment Freedom,Financial Freedom
0,Afghanistan,2025,,7.4,14.1,2.7,,,,,,,,,
1,Afghanistan,2024,,4.9,18.1,4.9,,,,,,,,,
2,Afghanistan,2023,,5.8,5.4,12.7,,,,34.6,45.1,,,,
3,Afghanistan,2022,,,,,,,,,,,,,
4,Afghanistan,2021,53.0,30.3,29.1,25.7,91.1,76.1,99.9,53.9,59.9,80.8,68.6,10.0,10.0
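Assigning a plain list to `df_efi.columns` relies on the file having exactly 15 columns after `skiprows=3`; if the export layout ever changes, the names would shift silently. A minimal sketch of a guard, assuming a small in-memory CSV stands in for the Heritage export (the sample rows, the `Name` header, and the helper `rename_checked` are all hypothetical):

```python
import io

import pandas as pd

# Hypothetical stand-in for the export: three preamble lines, then a header row
RAW = """Heritage export preamble line 1
preamble line 2
preamble line 3
Name,Index Year,Overall Score
Country A,2021,53.0
Country B,2021,61.5
"""

EXPECTED = ['Country', 'Index Year', 'Overall Score']


def rename_checked(df: pd.DataFrame, names: list[str]) -> pd.DataFrame:
    """Rename columns positionally, failing loudly if the column count differs."""
    if len(df.columns) != len(names):
        raise ValueError(
            f"Expected {len(names)} columns, found {len(df.columns)}: {list(df.columns)}"
        )
    return df.set_axis(names, axis=1)


df = pd.read_csv(io.StringIO(RAW), skiprows=3)
df = rename_checked(df, EXPECTED)
print(df.columns.tolist())
```

With the real file, the same call would be `rename_checked(df_efi, column_names)`.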


In [None]:
df_efi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5704 entries, 0 to 5703
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Country                 5704 non-null   object 
 1   Index Year              5704 non-null   int64  
 2   Overall Score           5146 non-null   float64
 3   Property Rights         5204 non-null   float64
 4   Government Integrity    5220 non-null   float64
 5   Judicial Effectiveness  1648 non-null   float64
 6   Tax Burden              5163 non-null   float64
 7   Government Spending     5178 non-null   float64
 8   Fiscal Health           1620 non-null   float64
 9   Business Freedom        5202 non-null   float64
 10  Labor Freedom           3707 non-null   float64
 11  Monetary Freedom        5186 non-null   float64
 12  Trade Freedom           5174 non-null   float64
 13  Investment Freedom      5187 non-null   float64
 14  Financial Freedom       5165 non-null   

In [None]:
print("\nColumns in df_efi:")
for col in df_efi.columns:
    print(col)



Columns in df_efi:
Country
Index Year
Overall Score
Property Rights
Government Integrity
Judicial Effectiveness
Tax Burden
Government Spending
Fiscal Health
Business Freedom
Labor Freedom
Monetary Freedom
Trade Freedom
Investment Freedom
Financial Freedom


In [None]:
# Create a summary table using pandas methods
summary = pd.DataFrame({
    'Feature Name': df_efi.columns,
    'Type': df_efi.dtypes,
    'Missing?': df_efi.isnull().mean().round(2),
    'Unique Values': df_efi.nunique(),
    'Description': '' # Add an empty Description column
})

# Reset index to make 'Feature Name' a regular column
summary = summary.reset_index(drop=True)

# Add descriptions based on feature names and context
descriptions = {
    'country': 'Name of the country',
    'index_year': 'Year of the economic freedom index',
    'overall_score': 'Overall economic freedom score', # Keep or refine based on context
    'property_rights': 'Protection of private ownership and use',
    'judicial_effectiveness': 'Fair, efficient, and independent judiciary',
    'government_integrity': 'Transparent, impartial, corruption-free governance',
    'tax_burden': 'Level of overall taxation impact',
    'government_spending': 'Public expenditures and economic influence',
    'fiscal_health': 'Sustainability of finances and debt',
    'business_freedom': 'Entrepreneurship without excessive regulation',
    'labor_freedom': 'Flexible labor market with contract freedom',
    'monetary_freedom': 'Price stability and independent monetary policy',
    'trade_freedom': 'Free international goods and services exchange',
    'investment_freedom': 'Open capital markets and opportunities',
    'financial_freedom': 'Access to transparent financial services'
}

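# Map descriptions to the summary table. The headers are still in 'Title Case' here,
# so convert them to the snake_case form used by the dictionary keys before mapping
summary['Description'] = (
    summary['Feature Name'].str.lower().str.replace(' ', '_').map(descriptions)
)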


print("Summary Table of DataFrame Features with Descriptions:")
display(summary)

Summary Table of DataFrame Features with Descriptions:


,Feature Name,Type,Missing?,Unique Values,Description
0,Country,object,0.0,186,
1,Index Year,int64,0.0,31,
2,Overall Score,float64,0.1,582,
3,Property Rights,float64,0.09,693,
4,Government Integrity,float64,0.08,706,
5,Judicial Effectiveness,float64,0.71,702,
6,Tax Burden,float64,0.09,617,
7,Government Spending,float64,0.09,802,
8,Fiscal Health,float64,0.72,678,
9,Business Freedom,float64,0.09,701,


In [None]:
print("\nMissing values per column:")
display(df_efi.isnull().sum())


Missing values per column:


Column,Missing values
Country,0
Index Year,0
Overall Score,558
Property Rights,500
Government Integrity,484
Judicial Effectiveness,4056
Tax Burden,541
Government Spending,526
Fiscal Health,4084
Business Freedom,502
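The counts above suggest that `Judicial Effectiveness` and `Fiscal Health` are populated only for some editions (about 1,600 non-null values against roughly 5,200 for most other components). One way to check that pattern is the share of non-missing values per index year. The sketch below uses a hypothetical toy frame with the same column names, not the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one component present every year, one only in later years
toy = pd.DataFrame({
    'Index Year': [2015, 2015, 2016, 2016, 2017, 2017],
    'Property Rights': [40.0, 55.0, 42.0, 57.0, 45.0, 60.0],
    'Fiscal Health': [np.nan, np.nan, np.nan, np.nan, 70.0, 85.0],
})

# Share of non-missing values per column and index year (1.0 = fully populated)
coverage = (
    toy.drop(columns='Index Year')
    .notna()
    .groupby(toy['Index Year'])
    .mean()
)
print(coverage)
```

Applied to `df_efi` grouped by `df_efi['Index Year']`, the same expression would show from which edition each component starts to be reported.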


# 3. Data Cleaning & Preprocessing Before the Split

In [None]:
# Check for exact duplicates across all columns
exact_duplicates = df_efi.duplicated().sum()
print(f"Exact duplicates found: {exact_duplicates}")

# View duplicate rows
duplicate_rows = df_efi[df_efi.duplicated(keep=False)]
print(f"Total rows involved in duplication: {len(duplicate_rows)}")

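# Count rows whose Overall Score repeats an earlier row's value. This is not a
# duplicate-record check: scores recur naturally across countries and years, and
# duplicated() treats every NaN as equal to every other NaN
key_duplicates = df_efi.duplicated(subset=['Overall Score']).sum()
print(f"Duplicates based on Overall Score: {key_duplicates}")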

Exact duplicates found: 0
Total rows involved in duplication: 0
Duplicates based on Overall Score: 5121
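The 5,121 repeated scores are expected rather than a data problem: the summary table reports 582 unique non-missing Overall Score values, and `duplicated()` groups all NaNs together, so 5,704 − (582 + 1) = 5,121 rows repeat an earlier value. For a panel like this, the natural record key is the (country, index year) pair, since each country should appear at most once per edition. A minimal sketch of that check on hypothetical toy data, ideally run after the `Country` text has been standardized:

```python
import pandas as pd

# Hypothetical panel: ('Country A', 2021) appears twice with different scores,
# so an exact-duplicate check would miss it
panel = pd.DataFrame({
    'Country': ['Country A', 'Country A', 'Country A', 'Country B'],
    'Index Year': [2020, 2021, 2021, 2021],
    'Overall Score': [55.0, 56.5, 57.0, 60.0],
})

key = ['Country', 'Index Year']

# keep=False flags every row that shares its key with another row
dup_mask = panel.duplicated(subset=key, keep=False)
print(panel[dup_mask])

# Same information as counts: keys that occur more than once
counts = panel.groupby(key).size()
print(counts[counts > 1])
```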


In [None]:
# Rebuild a clean sequential index, then shift it to start at 1 for display
df_efi = df_efi.reset_index(drop=True)
df_efi.index = df_efi.index + 1

print("DataFrame after resetting index:")
display(df_efi.head())
display(df_efi.index)

DataFrame after resetting index:


,Country,Index Year,Overall Score,Property Rights,Government Integrity,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Labor Freedom,Monetary Freedom,Trade Freedom,Investment Freedom,Financial Freedom
1,Afghanistan,2025,,7.4,14.1,2.7,,,,,,,,,
2,Afghanistan,2024,,4.9,18.1,4.9,,,,,,,,,
3,Afghanistan,2023,,5.8,5.4,12.7,,,,34.6,45.1,,,,
4,Afghanistan,2022,,,,,,,,,,,,,
5,Afghanistan,2021,53.0,30.3,29.1,25.7,91.1,76.1,99.9,53.9,59.9,80.8,68.6,10.0,10.0


RangeIndex(start=1, stop=5705, step=1)

> Standardize text columns

In [None]:
# Standardize 'Country' column by converting to lowercase and removing leading/trailing whitespace
df_efi['Country'] = df_efi['Country'].str.lower().str.strip()

print("DataFrame after standardizing 'Country' column:")
display(df_efi.head())

DataFrame after standardizing 'Country' column:


,Country,Index Year,Overall Score,Property Rights,Government Integrity,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Labor Freedom,Monetary Freedom,Trade Freedom,Investment Freedom,Financial Freedom
1,afghanistan,2025,,7.4,14.1,2.7,,,,,,,,,
2,afghanistan,2024,,4.9,18.1,4.9,,,,,,,,,
3,afghanistan,2023,,5.8,5.4,12.7,,,,34.6,45.1,,,,
4,afghanistan,2022,,,,,,,,,,,,,
5,afghanistan,2021,53.0,30.3,29.1,25.7,91.1,76.1,99.9,53.9,59.9,80.8,68.6,10.0,10.0
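`.str.lower().str.strip()` removes case differences and leading/trailing spaces, but internal runs of whitespace (a double space or a tab) survive and would make two spellings of one country look distinct. A slightly stricter sketch, assuming such variants could occur in the export (the sample strings are hypothetical):

```python
import pandas as pd

# Hypothetical raw country labels with inconsistent case and whitespace
names = pd.Series(['  Afghanistan', 'HONG  KONG', 'Korea,\tSouth ', 'afghanistan'])

# Lowercase, trim the ends, then collapse any internal whitespace run to one space
clean = (
    names.str.lower()
    .str.strip()
    .str.replace(r'\s+', ' ', regex=True)
)
print(clean.tolist())
```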


> Standardize column headers

In [None]:
# Standardize column headers
df_efi.columns = df_efi.columns.str.lower().str.replace(' ', '_')

print("DataFrame after standardizing column headers:")
display(df_efi.head())
display(df_efi.columns)

DataFrame after standardizing column headers:


,country,index_year,overall_score,property_rights,government_integrity,judicial_effectiveness,tax_burden,government_spending,fiscal_health,business_freedom,labor_freedom,monetary_freedom,trade_freedom,investment_freedom,financial_freedom
1,afghanistan,2025,,7.4,14.1,2.7,,,,,,,,,
2,afghanistan,2024,,4.9,18.1,4.9,,,,,,,,,
3,afghanistan,2023,,5.8,5.4,12.7,,,,34.6,45.1,,,,
4,afghanistan,2022,,,,,,,,,,,,,
5,afghanistan,2021,53.0,30.3,29.1,25.7,91.1,76.1,99.9,53.9,59.9,80.8,68.6,10.0,10.0


Index(['country', 'index_year', 'overall_score', 'property_rights',
       'government_integrity', 'judicial_effectiveness', 'tax_burden',
       'government_spending', 'fiscal_health', 'business_freedom',
       'labor_freedom', 'monetary_freedom', 'trade_freedom',
       'investment_freedom', 'financial_freedom'],
      dtype='object')
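Replacing spaces with underscores is enough for the current headers, but a header with punctuation would produce awkward names. A hedged sketch of a more general helper; `to_snake_case` is a hypothetical name and the header `Tax Burden (% GDP)` is made up for illustration:

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase, replace each run of non-alphanumeric characters with '_', trim '_'."""
    return re.sub(r'[^0-9a-z]+', '_', name.strip().lower()).strip('_')


df = pd.DataFrame(columns=['Index Year', 'Overall Score', 'Tax Burden (% GDP)'])
# rename() accepts a function and applies it to every column label
df = df.rename(columns=to_snake_case)
print(df.columns.tolist())
```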

In [None]:
# Ensure overall_score is numeric before the split (it is already float64 here, so this is a safeguard)
df_efi['overall_score'] = pd.to_numeric(df_efi['overall_score'], errors='coerce')
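`errors='coerce'` silently turns any non-numeric entry, such as a placeholder string, into `NaN`. When that can happen, it helps to count how many values were lost in the conversion. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical raw scores: one placeholder string and one genuinely missing value
raw = pd.Series(['53.0', '66.9', 'N/A', None, '71.2'])

converted = pd.to_numeric(raw, errors='coerce')

# Present before conversion but NaN afterwards means the value was coerced
coerced_mask = raw.notna() & converted.isna()
print(f"Coerced to NaN: {int(coerced_mask.sum())}")
print(raw[coerced_mask].tolist())
```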

In [None]:
# Build a new dependent variable that excludes 'Monetary Freedom'.
# Assumption: the Overall Score is the equally weighted mean of the component scores,
# so the new target is the row-wise mean of every component except monetary_freedom.
# Note: mean(axis=1) skips NaN, so each row averages only the components it actually has.

# Define the list of independent variables by filtering the dataframe columns
independent_variables = [col for col in df_efi.columns if col not in ['country', 'index_year', 'overall_score']]

# List of independent variables excluding 'Monetary Freedom'
independent_variables_without_monetary_freedom = [col for col in independent_variables if col != 'monetary_freedom']

# Calculate the new dependent variable for the original dataframe
# We will take the mean of the columns in 'independent_variables_without_monetary_freedom' for each row
df_efi['overall_score_without_monetary_freedom'] = df_efi[independent_variables_without_monetary_freedom].mean(axis=1)

# Display the first few rows with the new column
print("DataFrame with new dependent variable:")
display(df_efi[['overall_score', 'monetary_freedom', 'overall_score_without_monetary_freedom']].head())

# Update the dependent variable name for subsequent steps
dependent_variable_new = 'overall_score_without_monetary_freedom'

# Display descriptive statistics for the new dependent variable
print("\nDescriptive Statistics for new dependent variable:")
display(df_efi[dependent_variable_new].describe())

DataFrame with new dependent variable:


,overall_score,monetary_freedom,overall_score_without_monetary_freedom
1,,,8.066667
2,,,9.3
3,,,20.72
4,,,
5,53.0,80.8,50.418182



Descriptive Statistics for new dependent variable:


Statistic,overall_score_without_monetary_freedom
count,5229.0
mean,57.894434
std,12.420122
min,1.111111
25%,50.836364
50%,58.0
75%,66.055556
max,91.675
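Two checks could strengthen this construction. First, the equal-weight assumption can be tested directly: for Afghanistan 2021, adding `monetary_freedom` (80.8) to the other eleven components gives 635.4 / 12 = 52.95, which rounds to the reported 53.0, and the same comparison could be run over every row with a reported score. Second, because `mean(axis=1)` skips `NaN`, rows with very few components still get a value; the first Afghanistan rows above receive one from only three or five components while their Overall Score is missing. A sketch of both checks on a hypothetical toy frame; `MIN_COMPONENTS` is an assumed threshold, not part of the Heritage methodology:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: three components plus a reported overall score
toy = pd.DataFrame({
    'overall_score': [60.0, np.nan, 50.0],
    'property_rights': [50.0, 7.4, 40.0],
    'business_freedom': [60.0, np.nan, 50.0],
    'monetary_freedom': [70.0, np.nan, 60.0],
})

components = ['property_rights', 'business_freedom', 'monetary_freedom']
without_mf = [c for c in components if c != 'monetary_freedom']

# Check 1: does the equal-weight mean of all components reproduce overall_score?
recomputed = toy[components].mean(axis=1)
has_score = toy['overall_score'].notna()
max_gap = (recomputed[has_score] - toy.loc[has_score, 'overall_score']).abs().max()
print(f"Largest gap vs. reported overall score: {max_gap:.3f}")

# Check 2: keep the row mean only where enough components are present (assumed threshold)
MIN_COMPONENTS = 2
n_available = toy[without_mf].notna().sum(axis=1)
toy['overall_score_without_monetary_freedom'] = (
    toy[without_mf].mean(axis=1).where(n_available >= MIN_COMPONENTS)
)
print(toy)
```

On the real data, a gap of up to about 0.05 would be consistent with scores rounded to one decimal; larger gaps would point to a different weighting. Since `judicial_effectiveness` and `fiscal_health` are missing in most rows, the threshold might reasonably be set per edition rather than globally.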
