<a href="https://colab.research.google.com/github/PranayPrasanth/100DaysOfCode-DataScience-Projects/blob/master/FDI_vs_GDP_BRICS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Dynamic Impact of Foreign Direct Investment on Economic Growth in BRICS Countries

## Introduction
Foreign Direct Investment (FDI) is a crucial catalyst for global economic integration, particularly for emerging economies. The BRICS nations (Brazil, Russia, India, China, South Africa) have become significant FDI recipients, yet the precise impact of FDI on their economic growth remains a subject of debate, showing varied patterns across member states.

This study investigates the dynamic relationship between FDI and economic growth in BRICS countries from 1995 to 2023. Using panel data methods, it examines FDI's long-run contribution to GDP, the mechanisms through which that contribution occurs, and how key macroeconomic factors (financial development, trade openness, gross fixed capital formation, and inflation) condition the relationship.

## Research Questions

**Main Research Question**  
What is the dynamic and conditional impact of Foreign Direct Investment on economic growth in BRICS countries, considering macroeconomic factors?

**Sub-Questions**
- What is the magnitude and persistence of FDI's influence on economic growth in BRICS nations?
- How do crucial macroeconomic factors (e.g., Gross Fixed Capital Formation, Trade, Inflation, Financial Development) affect the FDI-growth nexus?
- Has the relationship between FDI and economic growth in BRICS countries undergone significant changes following major global events, such as the 2008 Global Financial Crisis?


## Import packages

In [1]:
# Install required packages (Colab only)
!pip install linearmodels arch openpyxl --quiet

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from linearmodels.panel import PanelOLS, RandomEffects
from statsmodels.formula.api import ols
from arch.unitroot import ADF
from scipy import stats
from google.colab import files
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.outliers_influence import variance_inflation_factor





## Upload and Read Data

In [18]:
# Read the CSV data (pd.read_csv already returns a pandas DataFrame)
df = pd.read_csv('BRICS_data.csv')

# Investigate shape of dataset, datatypes, column names
print("Shape of dataset:", df.shape)
print("\nData types:")
print(df.dtypes)
print("\nOriginal columns:")
print(df.columns.tolist())

Shape of dataset: (150, 9)

Data types:
Unnamed: 0                                             int64
Country Name                                          object
Year                                                   int64
Domestic credit to private sector (% of GDP)         float64
Foreign direct investment, net inflows (% of GDP)    float64
GDP per capita (constant 2015 US$)                   float64
Gross fixed capital formation (% of GDP)             float64
Inflation, consumer prices (annual %)                float64
Trade (% of GDP)                                     float64
dtype: object

Original columns:
['Unnamed: 0', 'Country Name', 'Year', 'Domestic credit to private sector (% of GDP)', 'Foreign direct investment, net inflows (% of GDP)', 'GDP per capita (constant 2015 US$)', 'Gross fixed capital formation (% of GDP)', 'Inflation, consumer prices (annual %)', 'Trade (% of GDP)']


## Preliminary Data Exploration and Data Cleaning


In [19]:
# Map long names to short ones used in the script:
rename_map = {
    'Country Name': 'country',
    'Year': 'year',
    'Foreign direct investment, net inflows (% of GDP)': 'fdi',
    'GDP per capita (constant 2015 US$)': 'gdp_pc',      # will be used to compute growth
    'Gross fixed capital formation (% of GDP)': 'gfcf',
    'Inflation, consumer prices (annual %)': 'inflation',
    'Trade (% of GDP)': 'trade',
    'Domestic credit to private sector (% of GDP)': 'domcred'
}

df = df.rename(columns=rename_map)

# Keep only relevant columns (if extra columns exist they are ignored)
keep_cols = ['country', 'year', 'gdp_pc', 'fdi', 'gfcf', 'inflation', 'trade', 'domcred']
df = df[[c for c in keep_cols if c in df.columns]]

# -------------------------
# Type conversions & cleaning
# -------------------------
# Ensure year is integer and country is string
df['year'] = df['year'].astype(int)
df['country'] = df['country'].astype(str)

# Convert columns to numeric; values that cannot be parsed (e.g. "1,234" or "..") become NaN
num_cols = ['gdp_pc', 'fdi', 'gfcf', 'inflation', 'trade', 'domcred']
for col in num_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors='coerce')

# Interpolate numeric missing values within each country (time series)
df = df.sort_values(['country', 'year'])
df[num_cols] = df.groupby('country')[num_cols].transform(lambda g: g.interpolate(method='linear', limit_direction='both'))

# Investigate first few rows
df.head(10)


Unnamed: 0,country,year,gdp_pc,fdi,gfcf,inflation,trade,domcred
0,Brazil,1995,6596.335727,0.631586,20.286298,66.007034,16.98446,43.494525
1,Brazil,1996,6640.727007,1.475965,18.640654,15.757666,15.635591,40.778494
2,Brazil,1997,6764.858421,2.150453,19.122901,6.926713,16.576209,40.852237
3,Brazil,1998,6687.495236,3.340888,18.542349,3.195076,16.438585,29.532261
4,Brazil,1999,6621.636762,4.73377,17.016294,4.858447,20.982166,29.82691
5,Brazil,2000,6817.784241,5.033917,18.304488,7.044141,22.639761,31.140986
6,Brazil,2001,6823.03397,4.147594,18.418087,6.840359,26.936285,29.004038
7,Brazil,2002,6944.623375,3.253581,17.926251,8.450164,27.618357,29.645142
8,Brazil,2003,6941.440457,1.813401,16.604759,14.71492,28.140385,27.68567
9,Brazil,2004,7258.781852,2.713532,17.320233,6.597185,29.678252,29.37277
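
The interpolation step above runs within each country, so a gap is never filled from another country's series. A minimal sketch on made-up numbers (the `toy` frame and its values are illustrative, not taken from the dataset) shows the behaviour, including how `limit_direction='both'` fills a leading gap with the nearest observed value rather than extrapolating a trend:

```python
import numpy as np
import pandas as pd

# Illustrative data: country A has a leading gap, country B an interior gap
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "fdi": [np.nan, 1.0, np.nan, 3.0, 10.0, np.nan, np.nan, 40.0],
})

toy = toy.sort_values(["country", "year"])

# Same pattern as the cleaning cell: interpolate each country's series separately
toy["fdi_filled"] = toy.groupby("country")["fdi"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="both")
)

print(toy)
```

Because edge gaps are filled with a constant, any variable with missing values at the start or end of the sample will show zero change over those years, which is worth keeping in mind before differencing.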


## Calculating GDP per capita growth from GDP per capita

In [21]:
# Compute annual percent change of GDP per capita (multiply by 100 for percent)
df['gdp_growth'] = df.groupby('country')['gdp_pc'].transform(lambda x: x.pct_change() * 100)
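
The growth rate here is (y_t / y_{t-1} - 1) × 100, computed separately for each country. A small sketch with hypothetical values (the `toy` frame is an assumption for illustration) makes one consequence explicit: the first year of every country has no previous observation, so its growth rate is `NaN`.

```python
import pandas as pd

# Hypothetical GDP per capita series for two countries
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002] * 2,
    "gdp_pc": [100.0, 110.0, 99.0, 200.0, 210.0, 231.0],
})

# Percent change within each country; the first row per country is NaN
toy["gdp_growth"] = toy.groupby("country")["gdp_pc"].transform(
    lambda x: x.pct_change() * 100
)

print(toy)
```

Grouping by country matters: a plain `pct_change()` on the stacked column would compute country B's first growth rate from country A's last value.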

## Verify the dataset

In [29]:
# Filtering the dataset to include data from 2000 to 2023
# df_final = df[(df['year'] >= 2000) & (df['year'] <= 2023)]

# print("\nAfter cleaning, sample rows:")
# display(df.head())

# print("\nData types:")
# print(df.dtypes)

# missing_rows = df[df['gdp_growth'].isna()].shape[0]
# print(f"\nNumber of missing rows: {missing_rows}")

# df.to_csv('df_growth.csv', index=False)
# files.download('df_growth.csv')

df_final = pd.read_csv('df_growth.csv')
df_final.head()

Unnamed: 0,country,year,gdp_pc,fdi,gfcf,inflation,trade,domcred,gdp_growth
0,Brazil,1995,6596.335727,0.631586,20.286298,66.007034,16.98446,43.494525,2.632056
1,Brazil,1996,6640.727007,1.475965,18.640654,15.757666,15.635591,40.778494,0.672969
2,Brazil,1997,6764.858421,2.150453,19.122901,6.926713,16.576209,40.852237,1.869244
3,Brazil,1998,6687.495236,3.340888,18.542348,3.195076,16.438585,29.532261,-1.143604
4,Brazil,1999,6621.636762,4.73377,17.016294,4.858447,20.982166,29.82691,-0.9848
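
The shape reported earlier (150 rows) is consistent with five countries observed over 30 years each. A quick structural check can confirm that the panel is balanced and free of duplicate country-year rows before any modelling. The helper below is a sketch; `panel_summary` and the toy data are hypothetical names introduced for illustration.

```python
import pandas as pd

def panel_summary(panel, entity="country", time="year"):
    """Report basic panel structure: size, duplicates, and balance."""
    n_entities = panel[entity].nunique()
    n_periods = panel[time].nunique()
    duplicates = int(panel.duplicated(subset=[entity, time]).sum())
    # Balanced: no duplicate keys and every entity-period combination is present
    balanced = duplicates == 0 and len(panel) == n_entities * n_periods
    return {
        "n_entities": n_entities,
        "n_periods": n_periods,
        "duplicates": duplicates,
        "balanced": balanced,
    }

# Illustrative data: a balanced 2 x 3 panel, then the same panel with one row removed
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002] * 2,
})
full = panel_summary(toy)
short = panel_summary(toy.iloc[:-1])
print(full)
print(short)
```

On the notebook's data the call would be `panel_summary(df_final)`.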


## Descriptive Statistics

In [30]:

# -------------------------
# Descriptive stats (numeric variables only)
# -------------------------
numeric_df = df_final.select_dtypes(include=[np.number])
print("\nDescriptive statistics (numeric variables):")
display(numeric_df.describe())




Descriptive statistics (numeric variables):


Unnamed: 0,year,gdp_pc,fdi,gfcf,inflation,trade,domcred,gdp_growth
count,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0
mean,2009.5,5803.348408,2.138158,24.64075,8.706439,42.572733,75.277973,3.537981
std,8.684438,3162.951223,1.499088,9.034801,18.329569,12.610015,43.869472,3.94963
min,1995.0,620.699954,-1.73681,13.051369,-1.401473,15.635591,16.837772,-7.827749
25%,2002.0,2672.306541,0.947852,17.76665,3.730373,33.638121,41.983966,1.145551
50%,2009.5,6021.274678,1.889431,20.953588,5.679009,44.545429,57.015257,3.633641
75%,2017.0,8637.36796,3.237892,31.284409,8.34024,51.674817,112.195288,6.347265
max,2024.0,13121.67699,9.660265,44.075543,197.414268,69.393282,194.165997,13.555366


## Descriptive Statistics (Key Points)
- GDP per capita averages **USD 5,803**, with substantial cross-country variation (standard deviation of USD 3,163).  
- FDI averages **2.14%** of GDP, ranging from -1.74% to 9.66%.  
- GFCF averages **24.64%** of GDP, ranging from 13.05% to 44.08%.  
- Inflation averages **8.71%**, with extremes from -1.40% to 197.41%; the median of 5.68% shows that a few high-inflation years pull the mean up.  
- Domestic credit averages **75.28%** of GDP, highly dispersed (16.84% to 194.17%).  
- GDP growth averages **3.54%**, ranging from -7.83% to 13.56%.
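
The pooled figures above mix variation across countries with variation over time. One way to separate the two is a per-country summary. The sketch below uses a made-up `toy` frame; on the notebook's data the same `groupby` call would be applied to `df_final`.

```python
import pandas as pd

# Illustrative values only
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "gdp_pc": [1000.0, 1200.0, 8000.0, 9000.0],
    "fdi": [1.0, 3.0, 2.0, 2.0],
})

# Mean, minimum, and maximum of each variable within each country
by_country = toy.groupby("country")[["gdp_pc", "fdi"]].agg(["mean", "min", "max"])
print(by_country)
```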

## Correlation Matrix

In [31]:
# print("\nCorrelation matrix (numeric variables):")
# plt.figure(figsize=(8,6))
# sns.heatmap(numeric_df.corr(), annot=True, fmt=".2f", cmap='coolwarm', linewidths=0.5)
# plt.title("Correlation matrix (numeric variables)")
# plt.show()
numeric_df.corr()

Unnamed: 0,year,gdp_pc,fdi,gfcf,inflation,trade,domcred,gdp_growth
year,1.0,0.473305,-0.084974,0.106739,-0.28026,0.179491,0.30038,-0.133902
gdp_pc,0.473305,1.0,0.03433,-0.191755,-0.046354,-0.0823,0.252138,-0.257076
fdi,-0.084974,0.03433,1.0,0.082418,-0.141136,-0.089775,0.012761,0.21941
gfcf,0.106739,-0.191755,0.082418,1.0,-0.165321,-0.001185,0.393694,0.590059
inflation,-0.28026,-0.046354,-0.141136,-0.165321,1.0,0.135893,-0.29265,-0.183706
trade,0.179491,-0.0823,-0.089775,-0.001185,0.135893,1.0,0.205075,0.144297
domcred,0.30038,0.252138,0.012761,0.393694,-0.29265,0.205075,1.0,0.131866
gdp_growth,-0.133902,-0.257076,0.21941,0.590059,-0.183706,0.144297,0.131866,1.0


## Correlation Insights
- GDP growth is most strongly correlated with Gross Fixed Capital Formation (**0.59**), highlighting investment’s role in driving output growth.  
- GDP per capita is negatively related to GDP growth (**-0.26**), consistent with convergence theory.  
- FDI has only a weak positive correlation with GDP growth (**0.22**).  
- Inflation is weakly negatively correlated with both GFCF (**-0.17**) and domestic credit (**-0.29**), which is consistent with macroeconomic instability constraining investment and credit.  
- All pairwise correlations among the explanatory variables are below **0.4** in absolute value (the largest is GFCF and domestic credit at 0.39), indicating low risk of severe multicollinearity.  
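
These are pooled correlations, so they combine differences between countries with movements within each country over time. Since the planned models are panel models, it can help to compare them with within-country correlations, computed after subtracting each country's mean. The sketch below uses a deliberately extreme, made-up example in which the two disagree in sign; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Illustrative data: within each country x and y rise together,
# but the country with higher x has lower y overall
toy = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "x": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    "y": [10.0, 11.0, 12.0, 0.0, 1.0, 2.0],
})

# Pooled correlation ignores the country dimension
pooled_corr = toy["x"].corr(toy["y"])

# Within correlation: remove country means first
demeaned = toy[["x", "y"]] - toy.groupby("country")[["x", "y"]].transform("mean")
within_corr = demeaned["x"].corr(demeaned["y"])

print(f"pooled: {pooled_corr:.2f}, within: {within_corr:.2f}")
```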

In [32]:
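# VIF (multicollinearity check)
# Note: X contains no constant column, so these are uncentered VIFs,
# which tend to be inflated for variables with means far from zero
# ================================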
X = df_final[['fdi', 'gfcf', 'inflation', 'trade', 'domcred', 'gdp_pc']]
vif_data = pd.DataFrame()
vif_data['Variable'] = X.columns
vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)

    Variable       VIF
0        fdi  2.943445
1       gfcf  7.587014
2  inflation  1.420499
3      trade  7.903618
4    domcred  6.189195
5     gdp_pc  4.014330
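
The VIFs above are computed on a design matrix without a constant column. As far as I am aware, `variance_inflation_factor` does not add an intercept itself, so variables with large means can show high values even when they are nearly uncorrelated. The sketch below computes VIFs by hand with NumPy, with and without an intercept, on made-up data. The function `vif_table` and the `toy` frame are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

def vif_table(X, add_intercept=True):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j on the others."""
    values = X.to_numpy(dtype=float)
    out = {}
    for i, name in enumerate(X.columns):
        y = values[:, i]
        others = np.delete(values, i, axis=1)
        if add_intercept:
            others = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        # Centered total sum of squares with an intercept, uncentered without
        tss = np.sum((y - y.mean()) ** 2) if add_intercept else np.sum(y ** 2)
        r2 = 1.0 - (resid @ resid) / tss
        out[name] = 1.0 / (1.0 - r2)
    return pd.Series(out, name="VIF")

# Two uncorrelated columns whose means are far from zero
toy = pd.DataFrame({
    "x1": [101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0],
    "x2": [51.0, 49.0, 49.0, 51.0, 51.0, 49.0, 49.0, 51.0],
})

centered = vif_table(toy, add_intercept=True)
uncentered = vif_table(toy, add_intercept=False)
print(centered)
print(uncentered)
```

If centered VIFs are wanted from statsmodels, a common approach is to add a constant column to `X` first and then ignore the VIF reported for that column.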


## Stationarity Tests


In [33]:


# List of numeric variables to test
variables = ['gdp_pc', 'fdi', 'gfcf', 'inflation', 'trade', 'domcred', 'gdp_growth']

# Create a results list
results = []

# Loop through variables and countries
for var in variables:
    for country in df_final['country'].unique():
        series = df_final.loc[df_final['country'] == country, var].dropna()

        if len(series) > 1:  # Ensure enough data points
            adf_result = adfuller(series, autolag='AIC')
            results.append({
                'Variable': var,
                'Country': country,
                'ADF Statistic': adf_result[0],
                'p-value': adf_result[1],
                'Stationary?': 'Yes' if adf_result[1] < 0.05 else 'No'
            })

# Convert to DataFrame
results_df = pd.DataFrame(results)

# Summary: % of countries where each variable is stationary
summary_df = results_df.groupby('Variable')['Stationary?'].apply(lambda x: (x == 'Yes').mean() * 100).reset_index()
summary_df.columns = ['Variable', '% Stationary (ADF)']

# Show results
print("Detailed results:")
print(results_df)

print("\nSummary of stationarity by variable:")
print(summary_df)


Detailed results:
      Variable             Country  ADF Statistic       p-value Stationary?
0       gdp_pc              Brazil      -0.558213  8.801475e-01          No
1       gdp_pc               China       3.999459  1.000000e+00          No
2       gdp_pc               India       2.578427  9.990709e-01          No
3       gdp_pc  Russian Federation      -0.544446  8.830476e-01          No
4       gdp_pc        South Africa      -2.171620  2.167065e-01          No
5          fdi              Brazil      -1.048776  7.350614e-01          No
6          fdi               China      -0.513239  8.894069e-01          No
7          fdi               India      -2.267680  1.826037e-01          No
8          fdi  Russian Federation      -1.336041  6.125266e-01          No
9          fdi        South Africa      -5.197187  8.909092e-06         Yes
10        gfcf              Brazil      -2.152511  2.239301e-01          No
11        gfcf               China      -1.973185  2.984529e-01       

In [38]:


# --- Step 1: Inputs ---
# df: panel dataset with columns country, year, and all variables
# results_df: ADF results produced by the stationarity cell above
# The 'Stationary?' column must contain exactly 'Yes' or 'No'

# Drop GDP per capita from ADF results (not needed for modelling)
adf_filtered = results_df[results_df["Variable"] != "gdp_pc"]

# --- Step 2: Build a transformation map ---
# Map: { (variable, country) : True if stationary, False if not }
stationarity_map = {
    (row["Variable"], row["Country"]): (row["Stationary?"] == "Yes")
    for _, row in adf_filtered.iterrows()
}

# --- Step 3: Transform the dataset ---
# Start from df_final, the data the ADF tests above were run on
df = df_final.sort_values(["country", "year"]).copy()

transformed_df = []
for country, group in df.groupby("country"):
    group = group.copy()
    for var in adf_filtered["Variable"].unique():
        # Difference only series flagged as non-stationary;
        # stationary series (and any pair missing from the map) stay in levels
        if not stationarity_map.get((var, country), True):
            group[var] = group[var].diff()
    transformed_df.append(group)

df_transformed = pd.concat(transformed_df)

# --- Step 4: Drop first NA after differencing ---
df_model = df_transformed.dropna()

df_model.head()

# df_model is now ready for FE/RE regression


Unnamed: 0,country,year,gdp_pc,fdi,gfcf,inflation,trade,domcred,gdp_growth
1,Brazil,1996,6640.727007,0.844379,-1.645644,15.757666,-1.348869,-2.716031,0.672969
2,Brazil,1997,6764.858421,0.674488,0.482247,6.926713,0.940618,0.073743,1.869244
3,Brazil,1998,6687.495236,1.190435,-0.580553,3.195076,-0.137624,-11.319977,-1.143604
4,Brazil,1999,6621.636762,1.392882,-1.526054,4.858447,4.543582,0.29465,-0.9848
5,Brazil,2000,6817.784241,0.300147,1.288194,7.044141,1.657595,1.314076,2.96222
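
The notebook's imports suggest the next step is a fixed-effects or random-effects regression, presumably with `PanelOLS` or `RandomEffects` from `linearmodels`. To show what the fixed-effects estimator does, the sketch below implements the within transformation directly with NumPy on made-up data. The function `within_estimator`, the toy panel, and its coefficients are assumptions for illustration, not the author's model.

```python
import numpy as np
import pandas as pd

# Illustrative panel: y = country effect + 2 * x, with no noise
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "x": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0],
})
country_effect = toy["country"].map({"A": 10.0, "B": -5.0})
toy["y"] = country_effect + 2.0 * toy["x"]

def within_estimator(panel, y, xs, entity="country"):
    """Fixed-effects slopes: demean by entity, then run OLS on the demeaned data."""
    cols = [y] + list(xs)
    demeaned = panel[cols] - panel.groupby(entity)[cols].transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[list(xs)].to_numpy(), demeaned[y].to_numpy(), rcond=None
    )
    return pd.Series(beta, index=list(xs))

beta = within_estimator(toy, "y", ["x"])
print(beta)
```

Demeaning removes each country's time-invariant level, which is why the country effects drop out of the slope estimate. Because the transformation step above differences some series and leaves others in levels on a per-country basis, a coefficient on a given variable may mix effects of levels and effects of changes, which should be kept in mind when interpreting the estimates.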
