<p align="center">
<img src="Images/sorbonne_logo.png" alt="Logo" width="300"/>
</p>

# **PANEL DATA ECONOMETRICS - Code File**

* **Author**: Elia Landini
* **Student ID**: 12310239
* **Course**: EESM2-Financial Economics 
* **Class**: Panel Data Econometrics
* **Supervisor**: Jean-Bernard Chatelain 
* **Reference Paper**: Lofaro, A., & Di Bucchianico, S. (2025). Impact of monetary policy on functional income distribution: A panel vector autoregressive analysis. Economic Modelling, 107227
* **Reference Repository**: https://github.com/EliaLand/PVAR_MonetaryPolicy_FunctionalIncome

### **0) REQUIREMENTS SET-UP**

In [3]:
# Requirements.txt file installation
# !pip install -r requirements.txt

In [4]:
# Libraries import
import warnings
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import scipy.stats as stats
from scipy.stats import norm
from scipy.stats import levene
from scipy.stats import ks_2samp
from scipy.stats import kstest
from scipy.stats import pearsonr
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
import sklearn.tree
import sklearn.metrics
import sklearn.model_selection
import sklearn.preprocessing 
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.metrics import (roc_auc_score, roc_curve, confusion_matrix,
                             precision_score, recall_score, f1_score,
                             accuracy_score, precision_recall_curve, auc, 
                             RocCurveDisplay, ConfusionMatrixDisplay)
from sklearn.linear_model import (LinearRegression, LogisticRegression)
from sklearn.calibration import calibration_curve, CalibratedClassifierCV
from sklearn.utils.class_weight import compute_class_weight
import plotly.express as px
import openpyxl as pxl
from stargazer.stargazer import Stargazer
from IPython.display import HTML
from IPython.display import Image
import itertools
from imblearn.over_sampling import SMOTE
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Input
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
from scikeras.wrappers import KerasClassifier
from ann_visualizer.visualize import ann_viz
from collections import Counter
import shap

In [5]:
# Statistical significance labelling
def significance_stars(p):
    """Return the star label for a p-value: *** < 0.001, ** < 0.01, * < 0.05."""
    if p < 0.001:
        return "***"
    elif p < 0.01:
        return "**"
    elif p < 0.05:
        return "*"
    else:
        return ""

In [6]:
# We suppress potential warnings with this command
warnings.filterwarnings("ignore")

### **1) PART 1 - DATASET, UNIVARIATE & BIVARIATE DESCRIPTIVE STATISTICS AFTER TRANSFORMATIONS**

##### <span style="color: dodgerblue"> **1.1) In your data set, which variables vary with respect to two indices (or more, if you consider inflows and outflows from one individual or country to another)? Which variables vary only with respect to time? Which variables vary only with respect to individuals?** </span>

In [None]:
raw_data = pd.read_csv()
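
One possible way to answer 1.1 programmatically is to count, for each variable, how many distinct values it takes within an individual and within a period. The sketch below runs on a hypothetical toy panel: the column names `country`, `year`, `gdpg`, `world_rate` and `landlocked` are placeholders and not the columns of the actual dataset.

```python
import pandas as pd

# Hypothetical toy panel; column names are placeholders, not the real dataset's.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdpg": [1.0, 2.0, 1.5, 0.5, 0.7, 0.9],        # varies over i and t
    "world_rate": [3.0, 2.5, 2.0, 3.0, 2.5, 2.0],  # varies over t only
    "landlocked": [0, 0, 0, 1, 1, 1],              # varies over i only
})


def variation_type(df, col, unit="country", time="year"):
    """Label a column by the index (or indices) along which it varies."""
    # More than one distinct value inside at least one individual: it moves over time.
    varies_over_time = (df.groupby(unit)[col].nunique(dropna=True) > 1).any()
    # More than one distinct value inside at least one period: it moves across individuals.
    varies_over_units = (df.groupby(time)[col].nunique(dropna=True) > 1).any()
    if varies_over_time and varies_over_units:
        return "individual and time"
    if varies_over_time:
        return "time only"
    if varies_over_units:
        return "individual only"
    return "constant"


labels = {c: variation_type(toy, c) for c in ["gdpg", "world_rate", "landlocked"]}
print(labels)
```

With the real data, the same function could be applied to every column of `raw_data` once the individual and time identifiers are known.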

##### <span style="color: dodgerblue"> **1.2) What is the largest number of periods T observed for an individual? What is the number of individuals (countries)?** </span>
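
A minimal sketch of how N and the largest T could be read off a long-format panel, assuming hypothetical identifier columns named `country` and `year`:

```python
import pandas as pd

# Hypothetical unbalanced toy panel (identifiers only).
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    "year": [2000, 2001, 2002, 2001, 2002, 2002],
})

# Number of distinct periods observed for each individual (T_i).
periods_per_country = toy.groupby("country")["year"].nunique()

n_individuals = int(periods_per_country.size)   # N
max_T = int(periods_per_country.max())          # largest T_i
longest = periods_per_country[periods_per_country == max_T].index.tolist()

print(n_individuals, max_T, longest)
```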

##### <span style="color: dodgerblue"> **1.3) Comment on the structure of the unbalanced panel: how many (and which) countries have a single observation, are there discontinuities between observations, and how many individuals have at least 2 consecutive observations (which is useful to compute lags, autocorrelations, first differences and within estimators)?** </span>
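
The structure of an unbalanced panel can be summarised from the gap between each observation and the previous one for the same individual. The sketch below assumes annual data and hypothetical `country` / `year` identifiers; a gap of exactly 1 marks a consecutive pair, a gap above 1 marks a discontinuity.

```python
import pandas as pd

# Hypothetical toy panel: B has a hole in 2001, C is a singleton.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C", "D", "D"],
    "year": [2000, 2001, 2002, 2000, 2002, 2001, 2001, 2002],
}).sort_values(["country", "year"]).reset_index(drop=True)

# Distance to the previous observation of the same country (NaN for the first one).
toy["gap"] = toy.groupby("country")["year"].diff()
toy["is_gap"] = toy["gap"] > 1            # discontinuity
toy["is_consecutive"] = toy["gap"] == 1   # usable for lags / first differences

summary = toy.groupby("country").agg(
    n_obs=("year", "size"),
    has_gap=("is_gap", "any"),
    n_consecutive_pairs=("is_consecutive", "sum"),
)

singletons = summary.index[summary["n_obs"] == 1].tolist()
with_consecutive = summary.index[summary["n_consecutive_pairs"] >= 1].tolist()

print(summary)
print("single observation:", singletons)
print("at least 2 consecutive observations:", with_consecutive)
```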

##### <span style="color: dodgerblue"> **1.4) VARIABLE TRANSFORMATIONS PART 1: Compute the between-transformed and one-way-within-transformed variables for all variables. Present a table with the variance of the one-way-within (fixed effects), between and pooled data for each variable. Compute the share of between and within variance in the total variance for each variable. Comment on these results.** </span>
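
A sketch of the between / within decomposition for one variable, under the assumption that the between component is the individual mean repeated on every observation of that individual (so each country is weighted by its number of observations). With that convention the total sum of squares splits exactly into a between and a within part, even in an unbalanced panel. Column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical unbalanced toy panel.
toy = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3 + ["C"] * 2,
    "gdpg": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 0.0, 2.0],
})

x = toy["gdpg"]
x_between = toy.groupby("country")["gdpg"].transform("mean")  # x(i.) on every row
x_within = x - x_between                                       # x(it) - x(i.)

# Sums of squares around the grand mean: total = between + within.
ss_total = float(((x - x.mean()) ** 2).sum())
ss_between = float(((x_between - x.mean()) ** 2).sum())
ss_within = float((x_within ** 2).sum())

share_between = ss_between / ss_total
share_within = ss_within / ss_total
print(f"between share: {share_between:.3f}, within share: {share_within:.3f}")
```

An alternative convention computes the between variance over the N individual means only; the shares then no longer add up to one in an unbalanced panel, so the choice should be stated in the table.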

##### <span style="color: dodgerblue"> **1.5) Plot the distribution of the one-way fixed-effects within (x(it)-x(i.)) and between (x(i.)) transformations of the dependent variable and of your key (preferred) explanatory variable (not all the explanatory variables), showing on the same graph a histogram, a normal law with the same empirical mean and standard error, and a continuous kernel approximation. Comment on the between/within difference for each variable, and compare within/within and between/between across the dependent and explanatory variables: kurtosis, skewness, non-normality, high-leverage observations (far from the mean), several modes (mixture of distributions)?** </span>
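
The three curves requested here (histogram, normal law with the same first two moments, kernel density) can be prepared by a small helper. This is only a sketch: the function names `distribution_diagnostics` and `plot_distribution` are hypothetical, and the demo uses simulated skewed data rather than the transformed variables of the dataset.

```python
import numpy as np
from scipy import stats


def distribution_diagnostics(values, n_grid=200):
    """Moments plus the normal and KDE curves to overlay on a histogram."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]            # drop missing observations
    mean, sd = values.mean(), values.std(ddof=1)
    grid = np.linspace(values.min(), values.max(), n_grid)
    return {
        "values": values,
        "mean": mean,
        "sd": sd,
        "skewness": float(stats.skew(values)),
        "excess_kurtosis": float(stats.kurtosis(values)),  # 0 for a normal law
        "grid": grid,
        "normal_pdf": stats.norm.pdf(grid, loc=mean, scale=sd),
        "kde_pdf": stats.gaussian_kde(values)(grid),
    }


def plot_distribution(values, title="", ax=None):
    """Histogram, fitted normal law and KDE on the same axes."""
    import matplotlib.pyplot as plt  # imported here so the diagnostics work without it
    d = distribution_diagnostics(values)
    if ax is None:
        ax = plt.gca()
    ax.hist(d["values"], bins=30, density=True, alpha=0.4, label="histogram")
    ax.plot(d["grid"], d["normal_pdf"], label="normal (same mean, sd)")
    ax.plot(d["grid"], d["kde_pdf"], label="kernel density")
    ax.set_title(title)
    ax.legend()
    return ax


# Simulated right-skewed sample, standing in for a transformed variable.
rng = np.random.default_rng(0)
sample = rng.exponential(size=500)
diag = distribution_diagnostics(sample)
print(round(diag["skewness"], 2), round(diag["excess_kurtosis"], 2))
```

Calling `plot_distribution` once on the within-transformed series and once on the between-transformed series would give the two panels to compare.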

##### <span style="color: dodgerblue"> **1.6) FD. FIRST DIFFERENCE VARIABLE TRANSFORMATIONS: Compute the first differences x(i,t)-x(i,t-1) for panel data. For data sorted by individual and then time, check the first 3 changes of individual (within, say, the first 3T+1 observations): whenever the individual changes in the stacked individuals x time vector, the first difference must be a dot for “not available”. In other words, the first observation of each individual must be missing; it must not be, for example, the first observation of the second individual minus the last observation of the first individual.** </span>
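
A sketch of panel-aware first differences in pandas, where the "dot" of the question appears as `NaN`. The naive `diff()` on the stacked vector is computed next to it to show the contamination at the change of individual. Column names are hypothetical, and the guard on year gaps assumes annual data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdpg": [1.0, 2.0, 4.0, 10.0, 9.0, 7.0],
}).sort_values(["country", "year"]).reset_index(drop=True)

# Correct: difference taken within each country, first observation is NaN.
toy["d_gdpg"] = toy.groupby("country")["gdpg"].diff()

# Guard for unbalanced panels: keep a difference only if the two rows are
# one year apart (no effect on this toy panel, which has no holes).
year_gap = toy.groupby("country")["year"].diff()
toy.loc[year_gap != 1, "d_gdpg"] = np.nan

# Wrong, for contrast: ignores the panel structure.
toy["d_naive"] = toy["gdpg"].diff()

first_rows = toy.groupby("country").head(1)
print(toy)
```

On row 3 (first observation of country B) the naive difference is 10 - 4 = 6, while the panel-aware one is missing, which is the check the question asks for.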

##### <span style="color: dodgerblue"> **1.7) First differences distributions. Plot the distributions (histogram, KDE, normal law with same mean and standard error) for first difference dependent GDPG and first difference explanatory EDA/GDP for each of these two transformations.** </span>

##### <span style="color: dodgerblue"> **1.8) First differences simple correlation. For these FD transformed variables, plot the bivariate cloud of points with the regression line, with the marginal distribution of the horizontal-axis variable on top and the marginal distribution of the vertical-axis variable on the right-hand side. Compare with the one-way fixed effects and between distributions. Report the simple correlation coefficient on the graph.** </span>

##### <span style="color: dodgerblue"> **1.9) Restricted sample with a BALANCED PANEL: Two way fixed effects (TWFE) formula. Restrict the sample to the countries/individuals available with the longest duration (N=… countries over T=… periods). Compute -x(.t)+x(..), report the 6 numbers in a table as a function of time and plot them as a function of time, then comment. Then compute two-way-fixed-effects x(it)-x(i.)-x(.t)+x(..) transformed variables.** </span>
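
For a balanced panel the two-way transformation is a direct application of the formula x(it)-x(i.)-x(.t)+x(..). The sketch below uses a hypothetical 2-country, 3-year toy panel; the real N and T are left to the data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdpg": [1.0, 2.0, 6.0, 3.0, 8.0, 4.0],
})

# The formula below is only valid if every country is observed in every year.
counts = toy.groupby("country")["year"].nunique()
is_balanced = bool((counts == toy["year"].nunique()).all())

x = toy["gdpg"]
mean_i = toy.groupby("country")["gdpg"].transform("mean")  # x(i.)
mean_t = toy.groupby("year")["gdpg"].transform("mean")     # x(.t)
grand = x.mean()                                           # x(..)

toy["time_effect"] = -mean_t + grand            # -x(.t) + x(..)
toy["gdpg_twfe"] = x - mean_i - mean_t + grand  # two-way within transformation

# One value of -x(.t) + x(..) per period, ready to tabulate or plot against time.
time_effect_by_year = toy.groupby("year")["time_effect"].first()
print(time_effect_by_year)
print(toy[["country", "year", "gdpg_twfe"]])
```

A quick sanity check is that the transformed variable sums to zero within every country and within every year.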

##### <span style="color: dodgerblue"> **1.10) TWFE Balanced panel. Compute descriptive statistics. Plot boxplots by country ordered by their variance from the smallest to the largest.** </span>

##### <span style="color: dodgerblue"> **1.11) TWFE Balanced panel. Present a table ordering the simple correlation coefficients of TWFE transformed GDPG and EDA/GDP by country from the largest positive to the lowest negative, with the standard error of GDPG and EDA in another column and the coefficient of simple regression: correlation coefficient * standard error of GDPG / standard error of EDA/GDP. Comment.** </span>

##### <span style="color: dodgerblue"> **1.12) UNBALANCED PANEL and TWFE transformation (remove countries with a single observation). Regress within-transformed GDPG on time dummies and collect the residuals: this is the TWFE transformation. Regress within-transformed EDA/GDP on time dummies and collect the residuals: this is the TWFE transformation. Alternatively, code the Wansbeek and Kapteyn (1989) transformation for two-way fixed effects, resulting in their equation 2.13, which is an extension of the x(it)-x(i.)-x(.t)+x(..) formula obtained in the balanced panel case.** </span>
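
One brute-force way to obtain a two-way transformation that stays valid when the panel is unbalanced is to project the variable on individual and time dummies in a single least-squares step and keep the residuals. The sketch below is not an implementation of Wansbeek and Kapteyn's equation 2.13; it is a dummy-variable counterpart, with a hypothetical helper name `twfe_residualize` and hypothetical column names. In a balanced panel it reproduces x(it)-x(i.)-x(.t)+x(..).

```python
import numpy as np
import pandas as pd


def twfe_residualize(df, col, unit="country", time="year"):
    """Residuals of `col` after projecting on individual and time dummies."""
    dummies = pd.concat(
        [pd.get_dummies(df[unit], prefix="i"), pd.get_dummies(df[time], prefix="t")],
        axis=1,
    ).to_numpy(dtype=float)
    y = df[col].to_numpy(dtype=float)
    # The dummy matrix is rank-deficient; lstsq returns a minimum-norm solution,
    # and the residuals are the same whichever solution is picked.
    coef, *_ = np.linalg.lstsq(dummies, y, rcond=None)
    return pd.Series(y - dummies @ coef, index=df.index, name=f"{col}_twfe")


balanced = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdpg": [1.0, 2.0, 6.0, 3.0, 8.0, 4.0],
})
unbalanced = balanced.drop(index=5)  # country B loses its 2002 observation

res_bal = twfe_residualize(balanced, "gdpg")
res_unb = twfe_residualize(unbalanced, "gdpg")
print(res_bal.round(6).tolist())
print(res_unb.round(6).tolist())
```

The one-step projection avoids having to decide whether the time dummies themselves should be within-transformed before the second regression, a choice that matters once the panel is unbalanced.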

##### <span style="color: dodgerblue"> **1.13) TWFE unbalanced panel: plot the distribution (kernel density estimate, histogram, corresponding normal law with the same first two moments) for the dependent variable GDPG and the explanatory variable EDA/GDP. Compare with the one-way fixed effects and between distributions.** </span>

##### <span style="color: dodgerblue"> **1.14) Plot boxplots of the between distribution (all countries), then of the one-way fixed effects, two-way fixed effects and first differences distributions BY country (or for 20 individuals if your data set has more than 20 individuals), for the dependent variable and the key explanatory variables. Comment on whether you find the same insights as in question 5. Comment on the differences in standard errors and means across individuals.** </span>

##### <span style="color: dodgerblue"> **1.15) Compute univariate descriptive statistics (min, Q1, median, Q3, max, mean, standard error) for the one-way-within, between, two-way fixed effects and first differences transformed variables. Is the mean different from the median, and why? How many standard errors from the mean are the MIN and MAX extremes? In the tables, report the standardized MAX and MIN, (MAX-average)/standard error and (MIN-average)/standard error, instead of MAX and MIN.** </span>
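
A sketch of a descriptive-statistics row that reports the standardized extremes in place of the raw MIN and MAX. The helper name `describe_standardized` is hypothetical, and "standard error" is taken here to mean the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd


def describe_standardized(s):
    """Quartiles, mean, sd, and MIN / MAX expressed in sd units from the mean."""
    s = pd.Series(s, dtype=float).dropna()
    mean, sd = s.mean(), s.std(ddof=1)
    return pd.Series({
        "std_min": (s.min() - mean) / sd,
        "q1": s.quantile(0.25),
        "median": s.median(),
        "q3": s.quantile(0.75),
        "std_max": (s.max() - mean) / sd,
        "mean": mean,
        "sd": sd,
        "n": s.size,
    })


# Toy series with one large positive value pulling the mean above the median.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
stats_row = describe_standardized(x)
print(stats_row.round(3))
```

Applying the helper to each transformed column and stacking the rows (for instance with `pd.DataFrame({...}).T`) would give one table per transformation.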

##### <span style="color: dodgerblue"> **1.16) Compare and comment on the between versus one-way-within transformed bivariate correlation matrices for all variables (include a time trend 1,2,.,T) and their lags (for time-varying variables). Check for poor simple correlation with the dependent variable and high correlation between explanatory variables.** </span>

##### <span style="color: dodgerblue"> **1.17) Comment on the bivariate auto-correlations and trend-correlations (check the number of observations).** </span>

##### <span style="color: dodgerblue"> **1.18) In what follows, you do not need to include a deterministic time trend 1,2,.,T because the two transformations used eliminate it. Compare and comment on the bivariate simple correlation matrix of all the two-way-within transformed variables and the bivariate simple correlation matrix of all the first-differenced variables (in the case of first differences, also include the lag of all variables). Check for poor simple correlation (below 0.1) with the dependent variable and high correlation (over 0.8) between explanatory variables. Show the first 30 observations for the first differences and the lag of first differences. Check that each time you change individual, you have a dot for missing observation.** </span>
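
A sketch of the first-difference correlation matrix with lagged differences, on simulated data with hypothetical column names. The point to verify is that the lag of a first difference loses two observations per individual (the first difference loses one), and that the correlations are computed on pairwise-complete observations.

```python
import numpy as np
import pandas as pd

# Simulated balanced toy panel: 3 countries, 5 years.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "country": np.repeat(["A", "B", "C"], 5),
    "year": np.tile(np.arange(2000, 2005), 3),
    "gdpg": rng.normal(size=15),
    "eda_gdp": rng.normal(size=15),
})

for col in ["gdpg", "eda_gdp"]:
    # First difference within country, then its lag, again within country.
    toy[f"d_{col}"] = toy.groupby("country")[col].diff()
    toy[f"l_d_{col}"] = toy.groupby("country")[f"d_{col}"].shift(1)

fd_cols = ["d_gdpg", "l_d_gdpg", "d_eda_gdp", "l_d_eda_gdp"]
corr_fd = toy[fd_cols].corr()  # Pearson, pairwise-complete observations

# Missing values per country: 1 for differences, 2 for lagged differences.
missing_per_country = toy[fd_cols].isna().groupby(toy["country"]).sum()

print(toy.head(30))
print(missing_per_country)
```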

##### <span style="color: dodgerblue"> **1.19) Comment on the bivariate graphs with linear, quadratic and Lowess fits for the dependent and key explanatory variables (growth of GDP per head on the vertical axis and aid/GDP on the horizontal axis): within transformed, between transformed, first differences, two-way-within transformed.** </span>

### **2) PART 2 - CLASSIC BENCHMARK MULTIVARIATE PANEL DATA ESTIMATORS**

##### <span style="color: orange"> **2.20) In a single table, report and comment on the estimation results for the Between, Within (one-way fixed effects (fe)), Mundlak (random effects (re) including all X(i.) as regressors), two-way fixed effects (add year dummies to the fe regression) and First differences estimators, including all explanatory variables except the ones with high near-multicollinearity after their transformation.** </span>
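
To see what the estimators in the table are doing, the sketch below computes point estimates only (no standard errors) with plain least squares on simulated data. Everything here is an assumption made for illustration: one regressor `x`, a true slope of 2, and an individual effect that is correlated with `x`, so that the between estimator is biased while the within and first-difference estimators are not. The Mundlak line is the pooled-OLS version (regressing on `x` and its individual mean), not a random-effects GLS fit.

```python
import numpy as np
import pandas as pd


def ols(y, X):
    """Least-squares slopes of y on X (an intercept is added and dropped)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Simulated balanced panel with an individual effect correlated with x.
rng = np.random.default_rng(42)
N, T, beta = 200, 6, 2.0
alpha = rng.normal(size=N)
df = pd.DataFrame({
    "country": np.repeat(np.arange(N), T),
    "year": np.tile(np.arange(T), N),
})
df["x"] = np.repeat(alpha, T) + rng.normal(size=N * T)
df["y"] = np.repeat(alpha, T) + beta * df["x"] + 0.5 * rng.normal(size=N * T)

g = df.groupby("country")
means = g[["x", "y"]].mean()                               # between: x(i.), y(i.)
within = df[["x", "y"]] - g[["x", "y"]].transform("mean")  # x(it) - x(i.)
fd = g[["x", "y"]].diff().dropna()                         # first differences

b_between = ols(means["y"], means["x"])[0]
b_within = ols(within["y"], within["x"])[0]
b_fd = ols(fd["y"], fd["x"])[0]
# Mundlak device: add the individual mean of x as an extra regressor.
b_mundlak = ols(df["y"], np.column_stack([df["x"], g["x"].transform("mean")]))[0]

print(f"between={b_between:.3f} within={b_within:.3f} "
      f"fd={b_fd:.3f} mundlak={b_mundlak:.3f}")
```

The coefficient on `x` in the Mundlak regression coincides with the within estimate, which is the reason the Mundlak specification is a convenient bridge between the random-effects and fixed-effects columns of the table.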

##### <span style="color: orange"> **2.21) If the first-differenced dependent variable still shows a simple autocorrelation above 0.1, a dynamic panel estimator can be tried. The generalized method of moments (GMM) estimators for panel data are only valid for short panels (T<10) and face the issue of too many weak instruments. We suggest using their precursor, the Anderson-Hsiao (1981) estimator, which makes it possible to check the first stage of the instrumental variables regression and to test for weak lagged instruments. Estimate an auto-regressive distributed lag (ARDL) model for dynamic panel data including the first lag of the dependent variable (for example: GDP per head growth) and the first lag of the key explanatory variable (for example: foreign aid/GDP); adding the first lag of other control variables is optional:** $\Delta GDPG_{i,t} = \beta_y \, \Delta GDPG_{i,t-1} + \beta_1 \, \Delta (aid/GDP)_{i,t} + \beta_2 \, \Delta (aid/GDP)_{i,t-1} + \Delta Controls_{i,t} + \Delta \alpha_i + \Delta \alpha_t + \Delta \varepsilon_{i,t}$ </span>
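
The mechanics of the Anderson-Hsiao estimator can be sketched on a stripped-down case: a pure AR(1) panel with individual effects, no aid term and no controls, simulated with an assumed true coefficient of 0.5. The lagged difference is instrumented by the level lagged twice, and the first stage is summarised by a simple F statistic. All names and parameter values are illustrative assumptions, and the just-identified IV formula below would need to be replaced by a full 2SLS once the other regressors of the ARDL equation are added.

```python
import numpy as np
import pandas as pd

# Simulate y(it) = rho * y(i,t-1) + alpha(i) + eps(it), with a burn-in period.
rng = np.random.default_rng(1)
N, T, burn, rho = 2000, 8, 20, 0.5
alpha = 0.5 * rng.normal(size=N)
y = np.zeros((N, T + burn))
for t in range(1, T + burn):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, burn:]

df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "t": np.tile(np.arange(T), N),
    "y": y.ravel(),
})
g = df.groupby("unit")["y"]
df["dy"] = g.diff()                              # delta y(t)
df["l_dy"] = df.groupby("unit")["dy"].shift(1)   # delta y(t-1), endogenous
df["l2_y"] = g.shift(2)                          # y(t-2), the instrument
est = df.dropna(subset=["dy", "l_dy", "l2_y"])


def cov(a, b):
    """Sample covariance (population normalisation, which cancels in ratios)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean((a - a.mean()) * (b - b.mean())))


# OLS on first differences: biased because delta y(t-1) contains eps(t-1).
rho_fd_ols = cov(est["l_dy"], est["dy"]) / cov(est["l_dy"], est["l_dy"])
# Anderson-Hsiao: just-identified IV with y(t-2) as instrument.
rho_ah = cov(est["l2_y"], est["dy"]) / cov(est["l2_y"], est["l_dy"])

# First stage with a single instrument: F = r^2 (n - 2) / (1 - r^2).
r = np.corrcoef(est["l2_y"], est["l_dy"])[0, 1]
first_stage_F = r ** 2 * (len(est) - 2) / (1 - r ** 2)

print(f"FD-OLS: {rho_fd_ols:.3f}  Anderson-Hsiao: {rho_ah:.3f}  "
      f"first-stage F: {first_stage_F:.1f}")
```

In this simulation the first-difference OLS estimate is pulled well below the true value while the instrumented estimate is close to it; with real data the first-stage F is the number to inspect before trusting the second stage.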

Note: this question has several parts (check the paper).

### **2) PART 2B - OPTIONAL (if one of your variables is time-invariant z(i))**

##### <span style="color: orange"> **2.22) If one of your variables is time-invariant z(i), run a baseline Hausman-Taylor estimation (pre-coded only in STATA) including all X(i.) as instruments. Comment on the results. Else skip this question.** </span>

##### <span style="color: orange"> **2.23) If one of your variables is time-invariant z(i), run a between regression of z(i) on X(i.) and the other time-invariant variables (only with N observations). If the R2 is low, this may signal that the X(i.) are weak instruments, poorly correlated with the variable z(i) to be instrumented. Comment. Else skip this question.** </span>

##### <span style="color: orange"> **2.24) If one of your variables is time-invariant z(i), as seen above, time-invariant explanatory variables cannot explain the time-varying within variance of the dependent variable, and the Hausman-Taylor internal instruments estimator is not so practical. Therefore, a practical shortcut is to include a time-invariant variable multiplied by a time-varying variable (interaction term): z(i) multiplied by x(it). Generate such a variable. Include this product AND foreign aid in a one-way fixed effects regression. Plot the estimated marginal effect (derivative) with respect to ICRG as a function of EDA/GDP (which is positive and goes as far as 20%).** </span>
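
The marginal effect to plot follows from the interaction specification. Writing, as an assumed notation, $y_{it}$ for the dependent variable, $x_{it}$ for EDA/GDP and $z_i$ for ICRG, the level of $z_i$ is absorbed by the fixed effect $\alpha_i$, so only the product term carries it:

$$
y_{it} = \beta_1\, x_{it} + \beta_2\,(z_i \times x_{it}) + \alpha_i + \varepsilon_{it},
\qquad
\frac{\partial y_{it}}{\partial z_i} = \beta_2\, x_{it},
\qquad
\widehat{\mathrm{se}}\!\left(\hat{\beta}_2\, x_{it}\right) = |x_{it}|\;\widehat{\mathrm{se}}(\hat{\beta}_2).
$$

Under this specification the estimated marginal effect with respect to ICRG is a straight line through the origin with slope $\hat{\beta}_2$ when plotted against EDA/GDP, and its confidence band widens proportionally to EDA/GDP. By symmetry, the marginal effect of aid is $\beta_1 + \beta_2 z_i$.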

### **3) PART 3 - OPEN SECTION**

##### <span style="color: red"> **3.25) Do whatever seems interesting to you in terms of original estimations (not already done in the replication of the original authors) with this database. Present the table(s) in this file with comments, not only in the HTML output with code and output.** </span>