# Analysis of Growth Determinants in BRICS Countries

This notebook performs an exploratory data analysis and prepares a dataset to investigate the determinants of economic growth in BRICS countries (Brazil, Russia, India, China, and South Africa). The analysis includes:

1.  **Data Loading and Inspection**: Loading the dataset containing macroeconomic variables for the selected countries and years.
2.  **Stationarity Testing**: Assessing the stationarity of the time series data for each variable and country using the Augmented Dickey-Fuller (ADF) test.
3.  **Data Preparation**: Transforming non-stationary variables through differencing to ensure suitability for panel data regression models.
4.  **Panel Data Modeling (Next Steps)**: Preparing the transformed data for Fixed Effects (FE) and Random Effects (RE) models, followed by a Hausman test to select the appropriate model.

The dataset used in this analysis was sourced from [mention your data source here, e.g., World Bank Data].

## Import Statements and Data Upload


In [4]:
# Install required packages (Colab only)
# !pip install linearmodels arch openpyxl --quiet

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.outliers_influence import variance_inflation_factor
from linearmodels.panel import PanelOLS, RandomEffects
from arch.unitroot import ADF

# google.colab exists only inside Colab; skip the import elsewhere
try:
    from google.colab import files
except ImportError:
    files = None



df_final = pd.read_csv('df_growth.csv')
print("Data Preview: \n", df_final.head(5))


print("Data Types: ", df_final.dtypes)

Data Preview: 
   country  year       gdp_pc       fdi       gfcf  inflation      trade  \
0  Brazil  1995  6596.335727  0.631586  20.286298  66.007034  16.984460   
1  Brazil  1996  6640.727007  1.475965  18.640654  15.757666  15.635591   
2  Brazil  1997  6764.858421  2.150453  19.122901   6.926713  16.576209   
3  Brazil  1998  6687.495236  3.340888  18.542348   3.195076  16.438585   
4  Brazil  1999  6621.636762  4.733770  17.016294   4.858447  20.982166   

     domcred  gdp_growth  
0  43.494525    2.632056  
1  40.778494    0.672969  
2  40.852237    1.869244  
3  29.532261   -1.143604  
4  29.826910   -0.984800  
Data Types:  country        object
year            int64
gdp_pc        float64
fdi           float64
gfcf          float64
inflation     float64
trade         float64
domcred       float64
gdp_growth    float64
dtype: object
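
Before running per-country unit-root tests, it can help to confirm that the panel has one row per country-year and to see how many observations each country contributes. The sketch below is an assumption about how such a check could look, not part of the original notebook: `panel_overview` is a hypothetical helper, and the sample frame simply reuses the first Brazil rows from the preview above.

```python
import pandas as pd

def panel_overview(df, entity="country", time="year"):
    """Summarise coverage per entity (hypothetical helper, not from the notebook)."""
    overview = df.groupby(entity).agg(
        n_obs=(time, "count"),
        first_year=(time, "min"),
        last_year=(time, "max"),
    )
    # Missing values per entity, summed over all non-key columns
    value_cols = df.columns.difference([entity, time])
    overview["n_missing"] = (
        df[value_cols].isna().groupby(df[entity]).sum().sum(axis=1)
    )
    # Duplicate country-year keys would break a panel index later on
    overview["n_duplicate_keys"] = (
        df.duplicated([entity, time]).groupby(df[entity]).sum()
    )
    return overview

# Stand-in for df_final: three Brazil rows copied from the data preview
sample = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Brazil"],
    "year": [1995, 1996, 1997],
    "gdp_growth": [2.632056, 0.672969, 1.869244],
})
print(panel_overview(sample))
```

On the full dataset the call would be `panel_overview(df_final)`; unequal `n_obs` across countries would indicate an unbalanced panel.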


## Stationarity Tests

In [5]:


# List of numeric variables to test
variables = ['gdp_pc', 'fdi', 'gfcf', 'inflation', 'trade', 'domcred', 'gdp_growth']

# Create a results list
results = []

# Loop through variables and countries
for var in variables:
    for country in df_final['country'].unique():
        series = df_final.loc[df_final['country'] == country, var].dropna()

        if len(series) >= 10:  # Skip series too short for a meaningful ADF test (10 is an arbitrary floor)
            adf_result = adfuller(series, autolag='AIC')
            results.append({
                'Variable': var,
                'Country': country,
                'ADF Statistic': adf_result[0],
                'p-value': adf_result[1],
                'Stationary?': 'Yes' if adf_result[1] < 0.05 else 'No'
            })

# Convert to DataFrame
results_df = pd.DataFrame(results)

# Summary: % of countries where each variable is stationary
summary_df = results_df.groupby('Variable')['Stationary?'].apply(lambda x: (x == 'Yes').mean() * 100).reset_index()
summary_df.columns = ['Variable', '% Stationary (ADF)']

# Show results
print("Detailed results:")
print(results_df)

print("\nSummary of stationarity by variable:")
print(summary_df)


Detailed results:
      Variable             Country  ADF Statistic       p-value Stationary?
0       gdp_pc              Brazil      -0.558213  8.801475e-01          No
1       gdp_pc               China       3.999459  1.000000e+00          No
2       gdp_pc               India       2.578427  9.990709e-01          No
3       gdp_pc  Russian Federation      -0.544446  8.830476e-01          No
4       gdp_pc        South Africa      -2.171620  2.167065e-01          No
5          fdi              Brazil      -1.048776  7.350614e-01          No
6          fdi               China      -0.513239  8.894069e-01          No
7          fdi               India      -2.267680  1.826037e-01          No
8          fdi  Russian Federation      -1.336041  6.125266e-01          No
9          fdi        South Africa      -5.197187  8.909092e-06         Yes
10        gfcf              Brazil      -2.152511  2.239301e-01          No
11        gfcf               China      -1.973185  2.984529e-01          No
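
The long results table is easier to scan as a variable-by-country grid. The following is a minimal sketch, assuming a `results_df` with the columns printed above; the sample rows are copied from that output and cover only a subset of the variable-country pairs.

```python
import pandas as pd

# A few rows copied from the detailed results above (illustrative subset only)
results_sample = pd.DataFrame({
    "Variable": ["gdp_pc", "gdp_pc", "fdi", "fdi"],
    "Country": ["Brazil", "South Africa", "Brazil", "South Africa"],
    "p-value": [8.801475e-01, 2.167065e-01, 7.350614e-01, 8.909092e-06],
})
# Same 5% decision rule as the ADF loop
results_sample["Stationary?"] = [
    "Yes" if p < 0.05 else "No" for p in results_sample["p-value"]
]

# One row per variable, one column per country
grid = results_sample.pivot(index="Variable", columns="Country", values="Stationary?")
print(grid)
```

Applied to the full `results_df`, the same `pivot` call would show at a glance which country-variable pairs are differenced in the next step.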

## Data Preparation for Modeling: Differencing the Non-Stationary Variables

In [6]:


# --- Step 1: Inputs ---
# df_final: panel dataset with columns country, year, and all variables
# results_df: ADF results table produced in the previous cell
# The 'Stationary?' column must contain exactly 'Yes' or 'No'

# Drop GDP per capita from ADF results (not needed for modelling)
adf_filtered = results_df[results_df["Variable"] != "gdp_pc"]

# --- Step 2: Build a transformation map ---
# Map: { (variable, country) : True if stationary, False if not }
stationarity_map = {
    (row["Variable"], row["Country"]): (row["Stationary?"] == "Yes")
    for _, row in adf_filtered.iterrows()
}

# --- Step 3: Transform the dataset ---
df = df_final.sort_values(["country", "year"]).copy()

transformed_df = []
for country, group in df.groupby("country"):
    group = group.copy()
    for var in adf_filtered["Variable"].unique():
        # Difference only the series flagged as non-stationary;
        # stationary series and untested pairs stay in levels
        if not stationarity_map.get((var, country), True):
            group[var] = group[var].diff()
    transformed_df.append(group)

df_transformed = pd.concat(transformed_df)

# --- Step 4: Drop rows with missing values (including the first observation lost to differencing) ---
df_model = df_transformed.dropna()

df_model.head()

# df_model is now ready for FE/RE regression


|   | country | year | gdp_pc | fdi | gfcf | inflation | trade | domcred | gdp_growth |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Brazil | 1996 | 6640.727007 | 0.844379 | -1.645644 | 15.757666 | -1.348869 | -2.716031 | 0.672969 |
| 2 | Brazil | 1997 | 6764.858421 | 0.674488 | 0.482247 | 6.926713 | 0.940618 | 0.073743 | 1.869244 |
| 3 | Brazil | 1998 | 6687.495236 | 1.190435 | -0.580553 | 3.195076 | -0.137624 | -11.319977 | -1.143604 |
| 4 | Brazil | 1999 | 6621.636762 | 1.392882 | -1.526054 | 4.858447 | 4.543582 | 0.29465 | -0.9848 |
| 5 | Brazil | 2000 | 6817.784241 | 0.300147 | 1.288194 | 7.044141 | 1.657595 | 1.314076 | 2.96222 |
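
Differencing must restart at each country boundary so that the first observation of one country is never subtracted from the last observation of another. A compact alternative to the explicit loop, shown here only as a sketch, is `groupby(...).diff()`. The `toy` frame reuses the Brazil FDI values from the data preview; the India values are placeholders invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Brazil", "India", "India"],
    "year": [1995, 1996, 1997, 1995, 1996],
    # Brazil values come from the data preview; India values are placeholders
    "fdi": [0.631586, 1.475965, 2.150453, 1.0, 1.5],
})

toy = toy.sort_values(["country", "year"])
# diff() within each country: the first year of every country becomes NaN
toy["d_fdi"] = toy.groupby("country")["fdi"].diff()
print(toy)
```

The 1996 Brazil difference reproduces the `fdi` value in the transformed table, and the first India row is `NaN` rather than a difference against the last Brazil row.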


## Conclusion and Next Steps

The Augmented Dickey-Fuller (ADF) tests, evaluated at the 5% significance level, show that stationarity varies across countries. GDP per capita, trade, and domestic credit are non-stationary in all five countries. FDI and GFCF are each stationary in 20% of the countries (one of five), inflation in 60% (three of five), and GDP growth in 80% (four of five).

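To prepare the data for panel data modeling, each series found to be non-stationary was first-differenced within its country, while stationary series were kept in levels. GDP per capita was excluded from this step and therefore remains in levels. Rows with missing values, including the first observation lost to differencing, were then dropped. The resulting `df_model` DataFrame contains the transformed data, ready for further analysis.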
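
The `linearmodels` panel estimators expect a DataFrame indexed by entity and time, in that order. A minimal sketch of that step follows, assuming `df_model` has the `country` and `year` columns shown earlier; the two-row frame is a stand-in built from the transformed Brazil rows.

```python
import pandas as pd

# Stand-in for df_model: two transformed Brazil rows from the table above
df_model_sample = pd.DataFrame({
    "country": ["Brazil", "Brazil"],
    "year": [1996, 1997],
    "gdp_growth": [0.672969, 1.869244],
    "fdi": [0.844379, 0.674488],
})

# Entity first, time second
panel = df_model_sample.set_index(["country", "year"]).sort_index()

# Each country-year pair should appear only once
assert panel.index.is_unique
print(panel)
```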

The next steps will involve performing Fixed Effects (FE) and Random Effects (RE) regressions using the `df_model` dataset to analyze the relationships between the variables. Following the regressions, a Hausman test will be conducted to determine the most appropriate model (FE or RE) for this dataset.
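
The Hausman statistic compares the FE and RE coefficient vectors, weighted by the difference of their covariance matrices, and is referred to a chi-squared distribution with as many degrees of freedom as coefficients compared. The sketch below is one way to compute it by hand and is not taken from the notebook: `hausman` is a hypothetical helper, and the inputs in the example are made-up numbers chosen only to exercise the function.

```python
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic for FE vs RE on the coefficients both models share.

    Hypothetical helper; all inputs must list the coefficients in the same order.
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    # pinv guards against a singular difference matrix in small samples
    stat = float(diff @ np.linalg.pinv(v_diff) @ diff)
    dof = diff.size
    p_value = float(stats.chi2.sf(stat, dof))
    return stat, dof, p_value

# Made-up numbers purely to exercise the function
stat, dof, p_value = hausman(
    b_fe=[1.0, 2.0],
    b_re=[0.5, 1.0],
    cov_fe=np.diag([0.5, 0.5]),
    cov_re=np.diag([0.25, 0.25]),
)
print(stat, dof, p_value)
```

With `linearmodels`, the inputs would typically be the `params` and `cov` attributes of the fitted `PanelOLS(..., entity_effects=True)` and `RandomEffects(...)` results, restricted to the slope coefficients common to both models. A small p-value favors FE; a large one means RE is not rejected.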