### **Problem statement**

Does democracy cause economic growth? 

**Library imports**

In [301]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [302]:
# Set dimensions for all the plots of seaborn
sns.set_theme(rc={'figure.figsize':(11.7,8.27)})

**Loading data**

In [303]:
democracy_data=pd.read_csv('Data/democracy-index-eiu.csv')
gdpPerCapita_data=pd.read_csv('Data/gdp-per-capita-worldbank.csv')


In [304]:
democracy=democracy_data.copy()
gdpPerCapita=gdpPerCapita_data.copy()

In [305]:
print("Shape of democracy dataframe:",democracy.shape)
democracy.head()

Shape of democracy dataframe: (2765, 4)


Unnamed: 0,Entity,Code,Year,Democracy score
0,Afghanistan,AFG,2006,3.06
1,Afghanistan,AFG,2008,3.02
2,Afghanistan,AFG,2010,2.48
3,Afghanistan,AFG,2011,2.48
4,Afghanistan,AFG,2012,2.48


In [306]:
print("Shape of gdpPerCapita dataframe:",gdpPerCapita.shape)
gdpPerCapita.head()

Shape of gdpPerCapita dataframe: (6562, 4)


Unnamed: 0,Entity,Code,Year,"GDP per capita, PPP (constant 2017 international $)"
0,Afghanistan,AFG,2002,1280.4631
1,Afghanistan,AFG,2003,1292.3335
2,Afghanistan,AFG,2004,1260.0605
3,Afghanistan,AFG,2005,1352.3207
4,Afghanistan,AFG,2006,1366.9932


In [307]:
democracy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2765 entries, 0 to 2764
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Entity           2765 non-null   object 
 1   Code             2669 non-null   object 
 2   Year             2765 non-null   int64  
 3   Democracy score  2765 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 86.5+ KB


In [308]:
gdpPerCapita.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6562 entries, 0 to 6561
Data columns (total 4 columns):
 #   Column                                               Non-Null Count  Dtype  
---  ------                                               --------------  -----  
 0   Entity                                               6562 non-null   object 
 1   Code                                                 6133 non-null   object 
 2   Year                                                 6562 non-null   int64  
 3   GDP per capita, PPP (constant 2017 international $)  6562 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 205.2+ KB


In [309]:
democracy.describe()

Unnamed: 0,Year,Democracy score
count,2765.0,2765.0
mean,2015.259675,5.500987
std,4.897042,2.191708
min,2006.0,0.32
25%,2011.0,3.53
50%,2015.0,5.79
75%,2019.0,7.24
max,2023.0,9.93


In [310]:
gdpPerCapita.describe()

Unnamed: 0,Year,"GDP per capita, PPP (constant 2017 international $)"
count,6562.0,6562.0
mean,2006.434014,18307.128922
std,9.415812,20344.95962
min,1990.0,430.41354
25%,1998.0,3726.7931
50%,2007.0,10638.246
75%,2015.0,26873.1735
max,2022.0,157600.64


##### **Data cleaning and preparation**

In [311]:
# Dropping columns that are not necessary
colsToBeDropped=['Code']
democracy=democracy.drop(columns=colsToBeDropped)
gdpPerCapita=gdpPerCapita.drop(columns=colsToBeDropped)

In [312]:
democracy.rename(columns={'Democracy score': 'democracy_score'}, inplace=True)
gdpPerCapita.rename(columns={'GDP per capita, PPP (constant 2017 international $)': 'gdpPerCapita'}, inplace=True)

In [313]:
democracy.head()

Unnamed: 0,Entity,Year,democracy_score
0,Afghanistan,2006,3.06
1,Afghanistan,2008,3.02
2,Afghanistan,2010,2.48
3,Afghanistan,2011,2.48
4,Afghanistan,2012,2.48


In [314]:
gdpPerCapita.head()

Unnamed: 0,Entity,Year,gdpPerCapita
0,Afghanistan,2002,1280.4631
1,Afghanistan,2003,1292.3335
2,Afghanistan,2004,1260.0605
3,Afghanistan,2005,1352.3207
4,Afghanistan,2006,1366.9932


In [315]:
# Restrict both dataframes to the overlapping period 2006-2022
democracy=democracy[(democracy.Year<2023)]
gdpPerCapita=gdpPerCapita[(gdpPerCapita.Year>=2006)]

In [316]:
def check_year_count(entity_name):
  democracy_count = democracy[democracy['Entity'] == entity_name]['Year'].count()
  gdpPerCapita_count = gdpPerCapita[gdpPerCapita['Entity'] == entity_name]['Year'].count()
  
  print(f"Year count for {entity_name} in democracy dataframe: {democracy_count}")
  print(f"Year count for {entity_name} in gdpPerCapita dataframe: {gdpPerCapita_count}")
  
  return democracy_count,gdpPerCapita_count

check_year_count('Afghanistan')

Year count for Afghanistan in democracy dataframe: 15
Year count for Afghanistan in gdpPerCapita dataframe: 16


(np.int64(15), np.int64(16))

In [317]:
def find_missing_years(entity_name):
  check_year_count(entity_name)  # prints the year counts for both dataframes

  # Always compare the actual sets of years: equal counts do not guarantee matching years
  democracy_years = set(democracy[(democracy['Entity'] == entity_name) & (democracy['Year'] >= 2006) & (democracy['Year'] <= 2022)]['Year'])
  gdpPerCapita_years = set(gdpPerCapita[(gdpPerCapita['Entity'] == entity_name) & (gdpPerCapita['Year'] >= 2006) & (gdpPerCapita['Year'] <= 2022)]['Year'])

  missing_in_democracy = sorted(gdpPerCapita_years - democracy_years)
  missing_in_gdpPerCapita = sorted(democracy_years - gdpPerCapita_years)

  if missing_in_democracy:
    print(f"Years missing in democracy dataframe for {entity_name}: {missing_in_democracy}")
  if missing_in_gdpPerCapita:
    print(f"Years missing in gdpPerCapita dataframe for {entity_name}: {missing_in_gdpPerCapita}")
  if not missing_in_democracy and not missing_in_gdpPerCapita:
    print(f"No missing years for {entity_name} between 2006 and 2022.")

  return missing_in_democracy,missing_in_gdpPerCapita

find_missing_years('Afghanistan')

Year count for Afghanistan in democracy dataframe: 15
Year count for Afghanistan in gdpPerCapita dataframe: 16
Years missing in democracy dataframe for Afghanistan: [2007, 2009]
Years missing in gdpPerCapita dataframe for Afghanistan: [2022]


([2007, 2009], [2022])
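The per-entity comparison above can also be done for all entities at once. The following is a minimal sketch, assuming toy dataframes with the same `Entity` and `Year` columns; `missing_years`, `democracy_toy`, and `gdp_toy` are hypothetical names and the values are made up.

```python
import pandas as pd

# Toy stand-ins for the notebook's dataframes (values are made up)
democracy_toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B"],
    "Year": [2006, 2008, 2010, 2006],
})
gdp_toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "A", "B", "B"],
    "Year": [2006, 2007, 2008, 2009, 2006, 2007],
})

def missing_years(left, right):
    """Per entity, the sorted years present in `right` but absent from `left`."""
    # indicator=True adds a `_merge` column telling where each row was found
    merged = right.merge(left, on=["Entity", "Year"], how="left", indicator=True)
    only_right = merged[merged["_merge"] == "left_only"]
    return only_right.groupby("Entity")["Year"].apply(sorted).to_dict()

missing_in_democracy = missing_years(democracy_toy, gdp_toy)
missing_in_gdp = missing_years(gdp_toy, democracy_toy)
print(missing_in_democracy)
print(missing_in_gdp)
```

For the toy data this flags 2007 and 2009 as missing from the democracy frame for entity A, which mirrors the Afghanistan result.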

In [318]:
def get_missing_years_dataframe():
  entities = set(democracy['Entity']).union(set(gdpPerCapita['Entity']))
  missing_years_data = []

  for entity in entities:
    missing_in_democracy, missing_in_gdpPerCapita = find_missing_years(entity)
    missing_years_data.append({
      'Country': entity,
      'Missing in Democracy': missing_in_democracy,
      'Missing in GDP per Capita': missing_in_gdpPerCapita
    })

  missing_years_df = pd.DataFrame(missing_years_data)
  return missing_years_df

missing_years_df = get_missing_years_dataframe()
missing_years_df

Year count for South Korea in democracy dataframe: 15
Year count for South Korea in gdpPerCapita dataframe: 17
Years missing in democracy dataframe for South Korea: [2007, 2009]
Year count for Lower-middle-income countries in democracy dataframe: 0
Year count for Lower-middle-income countries in gdpPerCapita dataframe: 17
Years missing in democracy dataframe for Lower-middle-income countries: [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
Year count for Mexico in democracy dataframe: 15
Year count for Mexico in gdpPerCapita dataframe: 17
Years missing in democracy dataframe for Mexico: [2007, 2009]
Year count for Grenada in democracy dataframe: 0
Year count for Grenada in gdpPerCapita dataframe: 17
Years missing in democracy dataframe for Grenada: [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
Year count for Brunei in democracy dataframe: 0
Year count for Brunei in gdpPerCapi

Unnamed: 0,Country,Missing in Democracy,Missing in GDP per Capita
0,South Korea,"[2007, 2009]",[]
1,Lower-middle-income countries,"[2006, 2007, 2008, 2009, 2010, 2011, 2012, 201...",[]
2,Mexico,"[2007, 2009]",[]
3,Grenada,"[2006, 2007, 2008, 2009, 2010, 2011, 2012, 201...",[]
4,Brunei,"[2006, 2007, 2008, 2009, 2010, 2011, 2012, 201...",[]
...,...,...,...
216,Russia,"[2007, 2009]",[]
217,Trinidad and Tobago,"[2007, 2009]",[]
218,Czechia,"[2007, 2009]",[]
219,Macao,"[2006, 2007, 2008, 2009, 2010, 2011, 2012, 201...",[]


In [319]:
# Get the list of countries that are missing more than two years in democracy dataframe
countries_to_remove = missing_years_df[missing_years_df['Missing in Democracy'].apply(len) > 2]['Country'].tolist()
print(countries_to_remove)
# Remove these countries from the democracy dataframe
democracy = democracy[~democracy['Entity'].isin(countries_to_remove)]
democracy.head()

['Lower-middle-income countries', 'Grenada', 'Brunei', 'Dominica', 'Tuvalu', 'Middle-income countries', 'Samoa', 'North America (WB)', 'Saint Vincent and the Grenadines', 'Belize', 'Marshall Islands', 'Upper-middle-income countries', 'Bahamas', 'Aruba', 'Saint Kitts and Nevis', 'Sao Tome and Principe', 'East Asia and Pacific (WB)', 'Somalia', 'Sint Maarten (Dutch part)', 'Europe and Central Asia (WB)', 'Low-income countries', 'Cayman Islands', 'Maldives', 'South Asia (WB)', 'Latin America and Caribbean (WB)', 'Curacao', 'Bermuda', 'Puerto Rico', 'Seychelles', 'Turks and Caicos Islands', 'Palau', 'Vanuatu', 'Nauru', 'Middle East and North Africa (WB)', 'European Union (27)', 'Barbados', 'San Marino', 'Antigua and Barbuda', 'Kosovo', 'Kiribati', 'High-income countries', 'Tonga', 'Saint Lucia', 'Sub-Saharan Africa (WB)', 'Solomon Islands', 'Macao', 'Micronesia (country)']


Unnamed: 0,Entity,Year,democracy_score
0,Afghanistan,2006,3.06
1,Afghanistan,2008,3.02
2,Afghanistan,2010,2.48
3,Afghanistan,2011,2.48
4,Afghanistan,2012,2.48


In [320]:
# Remove the countries from the gdpPerCapita dataframe
gdpPerCapita = gdpPerCapita[~gdpPerCapita['Entity'].isin(countries_to_remove)]
# Display the unique number of Entity in each dataframe
unique_entities_democracy = democracy['Entity'].nunique()
unique_entities_gdpPerCapita = gdpPerCapita['Entity'].nunique()

print(f"Unique number of Entity in democracy dataframe: {unique_entities_democracy}")
print(f"Unique number of Entity in gdpPerCapita dataframe: {unique_entities_gdpPerCapita}")

Unique number of Entity in democracy dataframe: 174
Unique number of Entity in gdpPerCapita dataframe: 161


In [321]:
# Get the list of entities present in the gdpPerCapita dataframe
entities_in_gdpPerCapita = set(gdpPerCapita['Entity'])
# Remove entities from the democracy dataframe that are not in the gdpPerCapita dataframe
democracy = democracy[democracy['Entity'].isin(entities_in_gdpPerCapita)]
unique_entities_democracy = democracy['Entity'].nunique()
print(f"Unique number of Entity in democracy dataframe: {unique_entities_democracy}")
print(f"Unique number of Entity in gdpPerCapita dataframe: {unique_entities_gdpPerCapita}")


Unique number of Entity in democracy dataframe: 161
Unique number of Entity in gdpPerCapita dataframe: 161


In [322]:
# Merge the two dataframes on the 'Entity' and 'Year' columns
merged_df = pd.merge(democracy, gdpPerCapita, on=['Entity', 'Year'])
# Print the dimensions of the new dataframe
print("Dimensions of the merged dataframe:", merged_df.shape)


Dimensions of the merged dataframe: (2407, 4)
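An inner merge keeps only the `Entity`/`Year` pairs present in both dataframes, so rows can disappear silently. The sketch below, on made-up toy data, shows one way to guard the merge: `validate="one_to_one"` makes pandas raise an error if a key is duplicated on either side, and comparing lengths shows how many rows were lost. The frame names `left` and `right` are hypothetical.

```python
import pandas as pd

# Toy data: the democracy side has no 2007 row for entity A
left = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2006, 2008, 2006],
    "democracy_score": [3.1, 3.0, 7.5],
})
right = pd.DataFrame({
    "Entity": ["A", "A", "A", "B"],
    "Year": [2006, 2007, 2008, 2006],
    "gdpPerCapita": [100.0, 130.0, 140.0, 900.0],
})

# validate="one_to_one" raises pandas.errors.MergeError on duplicated keys
merged = pd.merge(left, right, on=["Entity", "Year"], how="inner",
                  validate="one_to_one")

# Rows from the GDP side that found no partner and were therefore dropped
dropped_from_right = len(right) - len(merged)
print(merged.shape, dropped_from_right)
```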


In [323]:
merged_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita
0,Afghanistan,2006,3.06,1366.9932
1,Afghanistan,2008,3.02,1556.8445
2,Afghanistan,2010,2.48,2026.1638
3,Afghanistan,2011,2.48,1961.0963
4,Afghanistan,2012,2.48,2122.8308


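I fill single-year gaps in the democracy index and GDP per capita with the mean of the previous and the following year's values. Because the gaps are filled on the merged dataframe, GDP per capita for those years (for example 2007 and 2009) is interpolated as well, even where the GDP dataset has an observed value.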

In [324]:
def fill_missing_years(df, entity, year_col, value_cols):
    for year in range(2006, 2023):
        if year not in df[df['Entity'] == entity][year_col].values:
            prev_year = year - 1
            next_year = year + 1
            
            if prev_year in df[df['Entity'] == entity][year_col].values and next_year in df[df['Entity'] == entity][year_col].values:
                prev_values = df[(df['Entity'] == entity) & (df[year_col] == prev_year)][value_cols].values[0]
                next_values = df[(df['Entity'] == entity) & (df[year_col] == next_year)][value_cols].values[0]
                
                mean_values = (prev_values + next_values) / 2
                new_row = {'Entity': entity, year_col: year}
                new_row.update(dict(zip(value_cols, mean_values)))
                
                new_row_df = pd.DataFrame([new_row])  
                df = pd.concat([df, new_row_df], ignore_index=True)  
    return df

# Apply the function to fill missing years for each country
for entity in merged_df['Entity'].unique():
    merged_df = fill_missing_years(merged_df, entity, 'Year', ['democracy_score', 'gdpPerCapita'])

# Sort the dataframe by Entity and Year
merged_df = merged_df.sort_values(by=['Entity', 'Year']).reset_index(drop=True)

merged_df.head(20)  # Display the first 20 rows to verify the changes

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita
0,Afghanistan,2006,3.06,1366.9932
1,Afghanistan,2007,3.04,1461.91885
2,Afghanistan,2008,3.02,1556.8445
3,Afghanistan,2009,2.75,1791.50415
4,Afghanistan,2010,2.48,2026.1638
5,Afghanistan,2011,2.48,1961.0963
6,Afghanistan,2012,2.48,2122.8308
7,Afghanistan,2013,2.48,2165.3408
8,Afghanistan,2014,2.77,2144.4497
9,Afghanistan,2015,2.77,2108.714
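As an alternative to the loop above, here is a minimal sketch that keeps the observed GDP values and interpolates only the democracy score. It assumes a left merge from the GDP frame and uses made-up toy data; `gdp`, `dem`, and `panel` are hypothetical names. Unlike the mean-of-neighbours rule, linear interpolation also fills interior gaps longer than one year; for single-year gaps the two give the same value.

```python
import pandas as pd

gdp = pd.DataFrame({
    "Entity": ["A"] * 5,
    "Year": [2006, 2007, 2008, 2009, 2010],
    "gdpPerCapita": [100.0, 130.0, 140.0, 150.0, 200.0],
})
dem = pd.DataFrame({
    "Entity": ["A"] * 3,
    "Year": [2006, 2008, 2010],
    "democracy_score": [3.0, 3.2, 2.8],
})

# Left merge keeps every observed GDP row; missing scores become NaN
panel = gdp.merge(dem, on=["Entity", "Year"], how="left")
panel = panel.sort_values(["Entity", "Year"])

# Interpolate within each entity, filling only gaps that have data on both sides
panel["democracy_score"] = (
    panel.groupby("Entity")["democracy_score"]
         .transform(lambda s: s.interpolate(method="linear", limit_area="inside"))
)
print(panel)
```

In this toy example the 2007 GDP value stays at the observed 130.0 instead of becoming the neighbour mean of 120.0.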


In [325]:
# Calculate the percentage change in GDP per capita for each country over time
merged_df['gdpPerCapita_pct_change'] = merged_df.groupby('Entity')['gdpPerCapita'].pct_change() * 100
# Explicitly assign the result of fillna() to the column
merged_df['gdpPerCapita_pct_change'] = merged_df['gdpPerCapita_pct_change'].fillna(0)
merged_df.head(20)

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change
0,Afghanistan,2006,3.06,1366.9932,0.0
1,Afghanistan,2007,3.04,1461.91885,6.94412
2,Afghanistan,2008,3.02,1556.8445,6.493223
3,Afghanistan,2009,2.75,1791.50415,15.072774
4,Afghanistan,2010,2.48,2026.1638,13.098471
5,Afghanistan,2011,2.48,1961.0963,-3.211364
6,Afghanistan,2012,2.48,2122.8308,8.247147
7,Afghanistan,2013,2.48,2165.3408,2.002515
8,Afghanistan,2014,2.77,2144.4497,-0.964795
9,Afghanistan,2015,2.77,2108.714,-1.666428
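Filling the first growth value of each country with 0 and then dropping 2006 works when every series starts in 2006. If a series starts later, its first row would keep a growth of 0 that was never observed. A minimal sketch of an alternative on made-up data, dropping the undefined first observation per entity instead (`growth_pct` is a hypothetical column name):

```python
import pandas as pd

df = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B"],
    "Year": [2006, 2007, 2008, 2007, 2008],  # entity B starts in 2007
    "gdpPerCapita": [100.0, 110.0, 121.0, 200.0, 190.0],
}).sort_values(["Entity", "Year"])

# Year-over-year growth within each entity; the first row per entity is NaN
df["growth_pct"] = df.groupby("Entity")["gdpPerCapita"].pct_change() * 100

# Drop rows without a defined growth rate, whatever year they fall in
df = df.dropna(subset=["growth_pct"])
print(df)
```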


In [326]:
# Remove all rows where Year is 2006
merged_df = merged_df[merged_df['Year'] != 2006]
merged_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change
1,Afghanistan,2007,3.04,1461.91885,6.94412
2,Afghanistan,2008,3.02,1556.8445,6.493223
3,Afghanistan,2009,2.75,1791.50415,15.072774
4,Afghanistan,2010,2.48,2026.1638,13.098471
5,Afghanistan,2011,2.48,1961.0963,-3.211364


I have used the following link as a reference to categorize countries based on their democracy indices.
https://en.wikipedia.org/wiki/The_Economist_Democracy_Index

In [327]:

def categorize_democracy(score):
  # Half-open ranges, so interpolated scores such as 7.995 cannot fall between categories
  if 8.00 <= score <= 10.00:
    return 'full'
  elif 6.00 <= score < 8.00:
    return 'flawed'
  elif 4.00 <= score < 6.00:
    return 'hybrid'
  elif 0.00 <= score < 4.00:
    return 'authoritarian'
  else:
    return 'unknown'

# Apply the function to create a new column 'democracy_type'
merged_df['democracy_type'] = merged_df['democracy_score'].apply(categorize_democracy)

# Display the first few rows to verify the changes
merged_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change,democracy_type
1,Afghanistan,2007,3.04,1461.91885,6.94412,authoritarian
2,Afghanistan,2008,3.02,1556.8445,6.493223,authoritarian
3,Afghanistan,2009,2.75,1791.50415,15.072774,authoritarian
4,Afghanistan,2010,2.48,2026.1638,13.098471,authoritarian
5,Afghanistan,2011,2.48,1961.0963,-3.211364,authoritarian
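The same categorization can be written with `pd.cut`, which bins a whole column at once and leaves no gaps between categories. This is a sketch only: the cut-offs follow this notebook's choice that a score of exactly 8.00 counts as 'full', and the sample scores are made up.

```python
import pandas as pd

scores = pd.Series([3.04, 3.995, 5.99, 6.0, 7.995, 8.0, 9.93])

bins = [0, 4, 6, 8, float("inf")]
labels = ["authoritarian", "hybrid", "flawed", "full"]

# right=False makes the bins half-open: [0, 4), [4, 6), [6, 8), [8, inf)
democracy_type = pd.cut(scores, bins=bins, labels=labels, right=False)
print(democracy_type.tolist())
```

Values such as 3.995 or 7.995, which can appear once scores are averaged, land in the lower category rather than in no category.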


In [328]:
# Create dummy variables for 'democracy_type' with 'authoritarian' as the baseline
democracy_type_dummies = pd.get_dummies(merged_df['democracy_type'], prefix='democracy_type', drop_first=True)
# Concatenate the dummy variables with the original dataframe
merged_df = pd.concat([merged_df, democracy_type_dummies], axis=1)
merged_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change,democracy_type,democracy_type_flawed,democracy_type_full,democracy_type_hybrid
1,Afghanistan,2007,3.04,1461.91885,6.94412,authoritarian,False,False,False
2,Afghanistan,2008,3.02,1556.8445,6.493223,authoritarian,False,False,False
3,Afghanistan,2009,2.75,1791.50415,15.072774,authoritarian,False,False,False
4,Afghanistan,2010,2.48,2026.1638,13.098471,authoritarian,False,False,False
5,Afghanistan,2011,2.48,1961.0963,-3.211364,authoritarian,False,False,False


In [329]:
print(merged_df['gdpPerCapita_pct_change'].dtype)
print(merged_df[['democracy_type_flawed', 'democracy_type_full', 'democracy_type_hybrid']].dtypes)

float64
democracy_type_flawed    bool
democracy_type_full      bool
democracy_type_hybrid    bool
dtype: object
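With `drop_first=True`, the dropped category is simply the first one in sort order, which happens to be 'authoritarian' here. A minimal sketch, on made-up labels, of making the baseline explicit through an ordered list of categories and requesting integer dummies directly via `dtype=int`; `order` and `types_cat` are hypothetical names.

```python
import pandas as pd

types = pd.Series(["full", "authoritarian", "hybrid", "flawed", "authoritarian"])

# The first entry of `order` becomes the baseline dropped by drop_first=True
order = ["authoritarian", "hybrid", "flawed", "full"]
types_cat = types.astype(pd.CategoricalDtype(categories=order))

dummies = pd.get_dummies(types_cat, prefix="democracy_type",
                         drop_first=True, dtype=int)
print(dummies)
```

Integer dummies also remove the need for a later `.astype(int)` on the design matrix.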


**Regression Analysis**

In [330]:
# Define the independent variables (X) and dependent variable (y)
X = merged_df[['democracy_type_flawed', 'democracy_type_full', 'democracy_type_hybrid']].astype(int)
y = merged_df['gdpPerCapita_pct_change']
# Add a constant to the independent variables
X = sm.add_constant(X)
# Fit the regression model
model = sm.OLS(y, X).fit()
# Print the summary of the regression model
print(model.summary())

                               OLS Regression Results                              
Dep. Variable:     gdpPerCapita_pct_change   R-squared:                       0.010
Model:                                 OLS   Adj. R-squared:                  0.008
Method:                      Least Squares   F-statistic:                     8.243
Date:                     Fri, 25 Oct 2024   Prob (F-statistic):           1.86e-05
Time:                             14:30:54   Log-Likelihood:                -7828.3
No. Observations:                     2567   AIC:                         1.566e+04
Df Residuals:                         2563   BIC:                         1.569e+04
Df Model:                                3                                         
Covariance Type:                 nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------

I also ran the regression on a single year of data across all countries.

In [339]:
def run_regression_for_year(year):
  df_year = merged_df[merged_df['Year'] == year]
  X_year = df_year[['democracy_type_flawed', 'democracy_type_full', 'democracy_type_hybrid']].astype(int)
  X_year = sm.add_constant(X_year)
  y_year = df_year['gdpPerCapita_pct_change']
  model_year = sm.OLS(y_year, X_year).fit()
  return model_year.summary()

print(run_regression_for_year(2007))

                               OLS Regression Results                              
Dep. Variable:     gdpPerCapita_pct_change   R-squared:                       0.064
Model:                                 OLS   Adj. R-squared:                  0.046
Method:                      Least Squares   F-statistic:                     3.548
Date:                     Fri, 25 Oct 2024   Prob (F-statistic):             0.0160
Time:                             14:31:09   Log-Likelihood:                -434.89
No. Observations:                      160   AIC:                             877.8
Df Residuals:                          156   BIC:                             890.1
Df Model:                                3                                         
Covariance Type:                 nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------

**Regression with control variables**

I decided to take the inflation rate as a control variable.

In [331]:
inflation_data = pd.read_csv('Data/inflation.csv', header=2)  
inflation_data.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,Unnamed: 68
0,Aruba,ABW,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,0.474764,-0.931196,-1.028282,3.626041,4.257462,,,,,
1,Africa Eastern and Southern,AFE,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,5.245878,6.571396,6.399343,4.720805,4.653665,5.405162,7.240978,10.773751,7.126975,
2,Afghanistan,AFG,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,-0.661709,4.383892,4.975952,0.626149,2.302373,,,,,
3,Africa Western and Central,AFW,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,2.130817,1.487416,1.725486,1.78405,1.760112,2.437609,3.653533,7.967574,4.670084,
4,Angola,AGO,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,9.355972,30.694415,29.84448,19.628938,17.080954,22.271539,25.754295,21.35529,13.644102,


In [332]:
inflation_data = inflation_data.drop(columns=['Unnamed: 68'])
inflation_long = inflation_data.melt(id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'], 
                   var_name='Year', 
                   value_name='Inflation')
inflation_long['Year'] = inflation_long['Year'].astype(int)
inflation_long = inflation_long.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code'])
inflation_long.rename(columns={'Country Name': 'Entity'}, inplace=True)
inflation_long.head()

Unnamed: 0,Entity,Year,Inflation
0,Aruba,1960,
1,Africa Eastern and Southern,1960,
2,Afghanistan,1960,
3,Africa Western and Central,1960,
4,Angola,1960,


In [333]:
filtered_inflation_long = inflation_long[inflation_long['Entity'].isin(merged_df['Entity'].unique())]
filtered_inflation_long = filtered_inflation_long[filtered_inflation_long['Year'].isin(merged_df['Year'].unique())]
filtered_inflation_long.head()

Unnamed: 0,Entity,Year,Inflation
12504,Afghanistan,2007,8.680571
12506,Angola,2007,12.251497
12507,Albania,2007,2.932682
12510,United Arab Emirates,2007,
12511,Argentina,2007,


In [334]:
# Get the list of unique entities in the merged_df
entities_in_merged_df = set(merged_df['Entity'])
# Get the list of unique entities in the filtered_inflation_long
entities_in_inflation = set(filtered_inflation_long['Entity'])
# Find the entities that are in merged_df but not in filtered_inflation_long
entities_without_inflation_data = entities_in_merged_df - entities_in_inflation
print(f"Number of countries without inflation data: {len(entities_without_inflation_data)}")
print(f"Countries without inflation data: {entities_without_inflation_data}")

Number of countries without inflation data: 16
Countries without inflation data: {'East Timor', 'Congo', 'Cape Verde', 'Iran', 'South Korea', 'Kyrgyzstan', 'Gambia', 'Palestine', 'Turkey', 'Hong Kong', 'Egypt', 'Vietnam', 'Russia', 'Democratic Republic of Congo', 'Laos', 'Slovakia'}


In [335]:
# Drop the countries without inflation data from filtered_inflation_long
filtered_inflation_long = filtered_inflation_long[~filtered_inflation_long['Entity'].isin(entities_without_inflation_data)]
# Create a new dataframe from merged_df by removing the countries without inflation data
merged_df_filtered = merged_df[~merged_df['Entity'].isin(entities_without_inflation_data)]
# Join the dataframes on Entity and Year
final_df = pd.merge(merged_df_filtered, filtered_inflation_long, on=['Entity', 'Year'], how='inner')
final_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change,democracy_type,democracy_type_flawed,democracy_type_full,democracy_type_hybrid,Inflation
0,Afghanistan,2007,3.04,1461.91885,6.94412,authoritarian,False,False,False,8.680571
1,Afghanistan,2008,3.02,1556.8445,6.493223,authoritarian,False,False,False,26.418664
2,Afghanistan,2009,2.75,1791.50415,15.072774,authoritarian,False,False,False,-6.811161
3,Afghanistan,2010,2.48,2026.1638,13.098471,authoritarian,False,False,False,2.178538
4,Afghanistan,2011,2.48,1961.0963,-3.211364,authoritarian,False,False,False,11.804186


In [336]:
# Remove rows where Inflation is NaN
final_df = final_df.dropna(subset=['Inflation'])
final_df.head()

Unnamed: 0,Entity,Year,democracy_score,gdpPerCapita,gdpPerCapita_pct_change,democracy_type,democracy_type_flawed,democracy_type_full,democracy_type_hybrid,Inflation
0,Afghanistan,2007,3.04,1461.91885,6.94412,authoritarian,False,False,False,8.680571
1,Afghanistan,2008,3.02,1556.8445,6.493223,authoritarian,False,False,False,26.418664
2,Afghanistan,2009,2.75,1791.50415,15.072774,authoritarian,False,False,False,-6.811161
3,Afghanistan,2010,2.48,2026.1638,13.098471,authoritarian,False,False,False,2.178538
4,Afghanistan,2011,2.48,1961.0963,-3.211364,authoritarian,False,False,False,11.804186


In [337]:
# Define the independent variables (X) and dependent variable (y)
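# Note: astype(int) also truncates Inflation to whole numbers; the single-year model below uses astype(float)
X = final_df[['democracy_type_flawed', 'democracy_type_full', 'democracy_type_hybrid','Inflation']].astype(int)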
y = final_df['gdpPerCapita_pct_change']
# Add a constant to the independent variables
X = sm.add_constant(X)
# Fit the regression model
model = sm.OLS(y, X).fit()
# Print the summary of the regression model
print(model.summary())

                               OLS Regression Results                              
Dep. Variable:     gdpPerCapita_pct_change   R-squared:                       0.016
Model:                                 OLS   Adj. R-squared:                  0.014
Method:                      Least Squares   F-statistic:                     8.817
Date:                     Fri, 25 Oct 2024   Prob (F-statistic):           4.63e-07
Time:                             14:30:54   Log-Likelihood:                -6789.8
No. Observations:                     2230   AIC:                         1.359e+04
Df Residuals:                         2225   BIC:                         1.362e+04
Df Model:                                4                                         
Covariance Type:                 nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------
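Casting the whole design matrix with `.astype(int)` converts the boolean dummies to 0/1 but also truncates a continuous column such as inflation. The sketch below illustrates the effect on three inflation values taken from the tables above and shows one way to cast only the dummy columns; `dummy_cols`, `truncated`, and `mixed` are hypothetical names.

```python
import pandas as pd

X = pd.DataFrame({
    "democracy_type_full": [True, False, False],
    "Inflation": [8.680571, -6.811161, 0.626149],
})

# Casting everything to int truncates inflation toward zero
truncated = X.astype(int)

# Casting only the dummy columns leaves inflation as float
dummy_cols = ["democracy_type_full"]
mixed = X.astype({col: int for col in dummy_cols})

print(truncated["Inflation"].tolist())
print(mixed["Inflation"].tolist())
```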

In [338]:
def run_regression_for_year_with_inflation(year): # please ignore the repetition in code 
  df_year = final_df[final_df['Year'] == year]
  X_year = df_year[['democracy_type_flawed', 'democracy_type_full', 'democracy_type_hybrid', 'Inflation']].astype(float)
  y_year = df_year['gdpPerCapita_pct_change']
  X_year = sm.add_constant(X_year)
  model_year = sm.OLS(y_year, X_year).fit()
  return model_year.summary()
print(run_regression_for_year_with_inflation(2007))

                               OLS Regression Results                              
Dep. Variable:     gdpPerCapita_pct_change   R-squared:                       0.170
Model:                                 OLS   Adj. R-squared:                  0.145
Method:                      Least Squares   F-statistic:                     6.826
Date:                     Fri, 25 Oct 2024   Prob (F-statistic):           4.99e-05
Time:                             14:30:54   Log-Likelihood:                -348.16
No. Observations:                      138   AIC:                             706.3
Df Residuals:                          133   BIC:                             720.9
Df Model:                                4                                         
Covariance Type:                 nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------

**Results**

1. **I used the percentage change in GDP per capita as the measure of economic growth and grouped the countries into four categories based on their democracy indices. I was expecting a higher R^2, but I only got an R^2 of 0.010 when I included all countries and all years.**

2. **When I added a control variable, the inflation rate, to the regression model, the R^2 improved only negligibly, to 0.016.**

3. **I also ran the analysis for a single year, 2007, both with and without the control variable. Without it, the R^2 was 0.064; with it, the R^2 rose to 0.170.**
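To make the R^2 values above concrete: in a regression on category dummies alone, each fitted value is just the mean growth of that category, so R^2 measures how much of the variation in growth is explained by differences between category means. The sketch below computes this by hand with NumPy on synthetic data in which growth is unrelated to the category; the data, seed, and variable names are assumptions for illustration, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 4, size=n)           # hypothetical category codes 0..3
y = rng.normal(loc=2.0, scale=5.0, size=n)   # synthetic growth, unrelated to group

# Design matrix: intercept plus dummies for groups 1..3 (group 0 is the baseline)
X = np.column_stack([np.ones(n)] + [(group == k).astype(float) for k in (1, 2, 3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 = 1 - SS_res / SS_tot
residuals = y - X @ beta
r_squared = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print(beta[0], y[group == 0].mean())  # the intercept equals the baseline mean
print(r_squared)
```

When category means barely differ relative to the spread within categories, R^2 stays close to zero.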

**Conclusion and limitations**

In conclusion, the analysis did not establish that democracy drives economic growth: even the correlation was weak, so a causal claim is far out of reach. The low R² values mean that democracy category explains very little of the variation in GDP per capita growth in this data. Adding inflation as a control variable raised R² slightly, although the models with inflation were fitted on fewer observations (2230 instead of 2567 for all years, 138 instead of 160 for 2007), so the two sets of R² values are not strictly comparable. Accounting for additional factors, such as literacy rates, geographic indicators, or other economic influences that are not collinear with democracy type, may yield more meaningful insights and a clearer picture of the relationship between governance and economic outcomes.