## **Project: Study on Panel Data Methodologies with Application to Macroeconometrics (Inflation Forecasting)**

> ### **Title**: Merge of Dataset.


#### **Table of Contents:**
<ul>
<li><a href="#1">1. Model Estimation</a></li>
<li><a href="#2">2. Model Selection Tests</a></li>
<li><a href="#3">3. Diagnostic Tests</a></li>
</ul>


#### Dataset Description and Variable Overview:

---

> #### **Inflation & Price Stability**

| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| **PCPIPCH**       | Inflation, average consumer prices `(Target)`      | Percent change                                    |

---

> #### **Public Finance**
| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| GGSB_NPGDP        | General government structural balance              | Percent of potential GDP                          |
| GGXWDG_NGDP       | General government gross debt                      | Percent of GDP                                    |

---


> #### **Economic Output & Productivity**
| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| PPPPC             | Gross domestic product per capita, current prices  | Purchasing power parity; international dollars    |

---

> #### **International Trade & Balance**
| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| TX_RPCH           | Volume of exports of goods and services            | Percent change                                    |
| TM_RPCH           | Volume of imports of goods and services            | Percent change                                    |

---

> #### **Savings & Investment**
| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| NID_NGDP          | Total investment                                   | Percent of GDP                                    |

---

> #### **Country Metadata**

| **Variable Code** | **Description**                                    | **Units**                                         |
| ----------------- | -------------------------------------------------- | ------------------------------------------------- |
| Country_Code      | ID number for each country                         | ID                                                |
| Country           | Names of the 65 countries in the panel             | String                                            |
| Advanced_Country  | Is the country developed (1) or developing (0)?    | Boolean                                           |
| Year              | Year, from 1980 to 2024                            | Date                                              |

---


#### **1. Inflation & Price Stability**

| Variable Code | Term            | Explanation | Effect on Inflation |
| ------------- | --------------- | ----------- | ------------------- |
| **PCPIPCH**   | Inflation (CPI) | Inflation rate based on average consumer prices; a key indicator of price stability. | The target variable; it directly measures how much prices rise. |

---

#### **2. Public Finance**

| Variable Code   | Term                             | Explanation | Effect on Inflation |
| --------------- | -------------------------------- | ----------- | ------------------- |
| **GGSB_NPGDP**  | Structural Budget Balance        | Structural balance net of the business-cycle effect. | A structural surplus signals a contractionary fiscal policy, which reduces inflation. |
| **GGXWDG_NGDP** | Gross Government Debt (% of GDP) | Public debt as a share of GDP; reflects the government's financial burden. | Rising debt may force the government into future monetary expansion, which raises inflation. |


---


#### **3. Economic Output & Productivity**

| Variable Code | Term                 | Explanation | Effect on Inflation |
| ------------- | -------------------- | ----------- | ------------------- |
| **PPPPC**     | GDP per Capita (PPP) | Output per capita measured at purchasing power parity. | A rise indicates higher purchasing power, which may push prices up. |

---


#### **4. International Trade & Balance**

| Variable Code | Term                 | Explanation | Effect on Inflation |
| ------------- | -------------------- | ----------- | ------------------- |
| **TX_RPCH**   | Export Volume Growth | Growth in the volume of exports. | Higher exports may reduce domestic supply and raise prices. |
| **TM_RPCH**   | Import Volume Growth | Growth in the volume of imports. | Higher imports provide cheaper alternatives and reduce inflation. |

---

#### **5. Savings & Investment**

| Variable Code | Term                    | Explanation | Effect on Inflation |
| ------------- | ----------------------- | ----------- | ------------------- |
| **NID_NGDP**  | Gross Capital Formation | Total investment as a share of GDP. | Higher investment may raise output in the long run, which reduces inflation. |

---

#### **6. Country Metadata**

| Variable Code        | Term               | Explanation | Effect on Inflation |
| -------------------- | ------------------ | ----------- | ------------------- |
| **Country_Code**     | Country ID         | Unique numeric identifier for each country. | - |
| **Country**          | Country Name       | Name of the country. | - |
| **Advanced_Country** | Development Status | Advanced (1) or developing (0). | - |
| **Year**             | Year               | Year between 1980 and 2024. | - |

---



**Import Library**

In [2]:
# Load required libraries
library(dplyr)      # data manipulation
library(readxl)     # Excel import
library(car)        # vif()
library(gplots)
library(plm)        # panel models and panel tests
library(tidyverse)
library(corrplot)
library(Metrics)    # rmse()
library(caret)      # R2()
library(lmtest)     # bptest(), coeftest(), waldtest()
library(sandwich)


**Load Dataset**

In [171]:

# Load the dataset.
df <- read.csv("../02-Dataset/01.4-Data_Clean.csv")

# Show the dimensions and the first rows of the data.
dim(df)
head(df)

Unnamed: 0_level_0,WEO_Country_Code,Country,Advanced_Country,Year,PCPIPCH,GGSB_NPGDP,GGXWDG_NGDP,PPPPC,TX_RPCH,TM_RPCH,NID_NGDP
Unnamed: 0_level_1,<int>,<chr>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,193,Australia,1,1980,10.136,-0.2769864,9.396714,10277.79,-0.915,4.997,27.139
2,193,Australia,1,1981,9.488,0.3266791,5.183462,11529.56,-3.403,10.042,28.897
3,193,Australia,1,1982,11.352,-0.2234347,14.483242,12049.62,8.771,5.46,26.502
4,193,Australia,1,1983,10.039,-2.3088319,18.715089,12305.54,-4.36,-9.819,23.062
5,193,Australia,1,1984,3.96,0.5929632,18.584186,13391.17,16.092,22.058,26.734
6,193,Australia,1,1985,6.735,-0.3028125,13.750254,14363.83,10.397,3.79,27.243


In [172]:
# Drop "WEO_Country_Code"
df$WEO_Country_Code <- NULL

str(df)

'data.frame':	2925 obs. of  10 variables:
 $ Country         : chr  "Australia" "Australia" "Australia" "Australia" ...
 $ Advanced_Country: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Year            : int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
 $ PCPIPCH         : num  10.14 9.49 11.35 10.04 3.96 ...
 $ GGSB_NPGDP      : num  -0.277 0.327 -0.223 -2.309 0.593 ...
 $ GGXWDG_NGDP     : num  9.4 5.18 14.48 18.72 18.58 ...
 $ PPPPC           : num  10278 11530 12050 12306 13391 ...
 $ TX_RPCH         : num  -0.915 -3.403 8.771 -4.36 16.092 ...
 $ TM_RPCH         : num  5 10.04 5.46 -9.82 22.06 ...
 $ NID_NGDP        : num  27.1 28.9 26.5 23.1 26.7 ...


In [173]:
### Display Descriptive Statistics
summary(df)


   Country          Advanced_Country      Year         PCPIPCH       
 Length:2925        Min.   :0.0000   Min.   :1980   Min.   : -3.967  
 Class :character   1st Qu.:0.0000   1st Qu.:1991   1st Qu.:  1.832  
 Mode  :character   Median :1.0000   Median :2002   Median :  3.535  
                    Mean   :0.5385   Mean   :2002   Mean   :  5.832  
                    3rd Qu.:1.0000   3rd Qu.:2013   3rd Qu.:  7.172  
                    Max.   :1.0000   Max.   :2024   Max.   :109.200  
   GGSB_NPGDP        GGXWDG_NGDP         PPPPC             TX_RPCH       
 Min.   :-29.2427   Min.   :-70.88   Min.   :   274.6   Min.   :-73.052  
 1st Qu.: -4.1170   1st Qu.: 29.70   1st Qu.:  7912.5   1st Qu.:  0.948  
 Median : -2.1317   Median : 47.22   Median : 16352.3   Median :  5.457  
 Mean   : -2.3861   Mean   : 54.40   Mean   : 22605.5   Mean   :  5.099  
 3rd Qu.: -0.2388   3rd Qu.: 69.01   3rd Qu.: 31644.3   3rd Qu.:  9.651  
 Max.   :125.1350   Max.   :418.38   Max.   :151145.8   Max.   :24

In [174]:
# panel data
panel_df <- pdata.frame(df, index = c("Country", "Year"))
head(panel_df)


Unnamed: 0_level_0,Country,Advanced_Country,Year,PCPIPCH,GGSB_NPGDP,GGXWDG_NGDP,PPPPC,TX_RPCH,TM_RPCH,NID_NGDP
Unnamed: 0_level_1,<fct>,<int>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Australia-1980,Australia,1,1980,10.136,-0.2769864,9.396714,10277.79,-0.915,4.997,27.139
Australia-1981,Australia,1,1981,9.488,0.3266791,5.183462,11529.56,-3.403,10.042,28.897
Australia-1982,Australia,1,1982,11.352,-0.2234347,14.483242,12049.62,8.771,5.46,26.502
Australia-1983,Australia,1,1983,10.039,-2.3088319,18.715089,12305.54,-4.36,-9.819,23.062
Australia-1984,Australia,1,1984,3.96,0.5929632,18.584186,13391.17,16.092,22.058,26.734
Australia-1985,Australia,1,1985,6.735,-0.3028125,13.750254,14363.83,10.397,3.79,27.243


In [175]:
# Apply log transformation: log((x + 1) - min(x)) within each country.
# Note: group_by()/ungroup() return a tibble, so panel_df loses its pdata.frame class here.
panel_df <- panel_df %>%
  group_by(Country) %>%
  mutate(
    PCPIPCH = log((PCPIPCH + 1) - min(PCPIPCH))
  ) %>%
  ungroup()

# Summary statistics
summary(panel_df$PCPIPCH)


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.114   1.588   1.626   2.075   4.713 
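
The transformation above shifts each country's series so that its minimum maps to `log(1) = 0`, which keeps the logarithm defined for negative inflation rates. A minimal sketch on made-up numbers follows; `shift_log()` and `inv_shift_log()` are hypothetical helper names that do not appear in the notebook, and the inverse only works if the per-country minimum is stored before transforming.

```r
# Hypothetical helpers illustrating the shifted-log transform on toy data
shift_log <- function(x) log((x + 1) - min(x))

# Inverse transform: needs the original minimum of the series
inv_shift_log <- function(z, x_min) exp(z) - 1 + x_min

x <- c(-2, 0, 3.5, 10)               # made-up inflation rates for one country
z <- shift_log(x)                    # the smallest value maps to log(1) = 0
x_back <- inv_shift_log(z, min(x))   # recovers the original series

print(z)
print(x_back)
```

Because the shift differs by country, predictions on this scale have to be back-transformed country by country before they can be read as inflation rates.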

In [176]:
# # First difference of PCPIPCH within each country (disabled)
# panel_df <- panel_df %>%
#   group_by(Country) %>%
#   mutate(PCPIPCH_diff = PCPIPCH - lag(PCPIPCH))
# # View result
# summary(panel_df$PCPIPCH_diff)

<a id='1'></a>

### **1. Model Estimation:**

In [177]:
# ========== Split data for out-of-sample forecasting ==========
# Define dependent and independent variables
y <- panel_df[,'PCPIPCH']  # Inflation Rate (Consumer Prices, annual %)
X_vars <- c(
    # 2. Public Finance
    "GGSB_NPGDP",    # General government structural balance (% of potential GDP)
    "GGXWDG_NGDP",   # General government gross debt (% of GDP)

    # 3. Economic Output & Productivity
    "PPPPC",         # GDP per capita, PPP (international dollars)

    # 4. International Trade & Balance
    "TX_RPCH",       # Export volume growth
    "TM_RPCH",       # Import volume growth

    # 5. Savings & Investment
    # "NGSD_NGDP",   # Gross national savings (% of GDP), excluded
    "NID_NGDP",      # Total investment (% of GDP)

    # 6. Metadata
    "Advanced_Country"
)

X <- panel_df[, X_vars]

# Define evaluation function (RMSE and R-squared)
evaluate <- function(y_true, y_pred) {
  rmse <- sqrt(mean((y_true - y_pred)^2))                  # Root Mean Squared Error
  ss_total <- sum((y_true - mean(y_true))^2)               # Total Sum of Squares
  ss_res <- sum((y_true - y_pred)^2)                       # Residual Sum of Squares
  r2 <- 1 - (ss_res / ss_total)                            # R-squared
  return(list(rmse = rmse, r2 = r2))
}
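
A quick check of `evaluate()` on made-up vectors may help when reading the metrics reported later. The sketch repeats the function so that it runs on its own; the numbers are illustrative only. The last line computes a squared correlation, which is what `caret::R2()` reports by default (`formula = "corr"`), and the two definitions of R-squared can disagree sharply when predictions are biased.

```r
# Same evaluation function as above, repeated so the sketch is self-contained
evaluate <- function(y_true, y_pred) {
  rmse     <- sqrt(mean((y_true - y_pred)^2))
  ss_total <- sum((y_true - mean(y_true))^2)
  ss_res   <- sum((y_true - y_pred)^2)
  list(rmse = rmse, r2 = 1 - ss_res / ss_total)
}

y_true <- c(1, 2, 3, 4)   # toy observations
y_pred <- c(2, 3, 4, 5)   # toy predictions, each biased upward by 1

res     <- evaluate(y_true, y_pred)
r2_corr <- cor(y_true, y_pred)^2   # squared-correlation definition of R-squared

print(res)
print(r2_corr)
```

Here the predictions are perfectly correlated with the observations, so the squared correlation is 1, while the residual-based R-squared penalizes the constant bias.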


In [178]:
# Convert Year from factor to numeric so that it can be compared
panel_df$Year <- as.numeric(as.character(panel_df$Year))

# Split data into train and test.
# Note: Year <= 2024 keeps every year in the sample, so the 2023-2024 test rows are
# also part of the training set; use Year <= 2022 for a strict holdout.
train <- panel_df %>% 
  as.data.frame() %>%
  filter(Year <= 2024) %>%
  pdata.frame(index = c("Country", "Year"))

test <- panel_df %>% 
  as.data.frame() %>%
  filter(Year > 2022) %>%
  pdata.frame(index = c("Country", "Year"))

# Define dependent and independent variables
y_train <- train$PCPIPCH
X_train <- train[, X_vars]

y_test <- test$PCPIPCH
X_test <- test[, X_vars]

# Combine y and X for training model
train_model_df <- train[, c("PCPIPCH", X_vars)]

test_model_df <- test[, c("PCPIPCH", X_vars)]
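
For a strict out-of-sample evaluation, the training and test windows should not share any year. The sketch below shows a non-overlapping temporal split on a made-up panel in base R; the `toy` data frame and the `cutoff` value of 2022 are assumptions chosen for illustration, not the notebook's data.

```r
# Toy panel: 2 countries observed over 2019-2024 (illustrative values only)
toy <- expand.grid(Country = c("A", "B"), Year = 2019:2024,
                   stringsAsFactors = FALSE)
toy$y <- seq_len(nrow(toy))

cutoff <- 2022                            # last year used for training (assumed)
toy_train <- toy[toy$Year <= cutoff, ]    # 2019-2022
toy_test  <- toy[toy$Year >  cutoff, ]    # 2023-2024

# Years present in both sets; empty for a strict holdout
overlap <- intersect(toy_train$Year, toy_test$Year)
print(overlap)
```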


In [179]:
# ===============================
# A. Pooled OLS model

# Create formula for model
formula_str <- as.formula(paste("PCPIPCH ~ -1 +", paste(X_vars, collapse = " + ")))

#  Fit Pooled OLS model
pooled_model <- plm(formula_str, data = train_model_df, model = "pooling", index = c("Country", "Year"))

# Display model summary
summary(pooled_model)

# Predict using the model
pooled_preds <- predict(pooled_model, newdata = X_test)

# Evaluate model performance
pooled_rmse <- rmse(y_test, pooled_preds)   # Metrics::rmse(actual, predicted)
pooled_r2   <- R2(pooled_preds, y_test)     # caret::R2(pred, obs): squared correlation by default

# Print results
cat(sprintf("Pooled OLS RMSE (out-of-sample): %.4f\n", pooled_rmse))
cat(sprintf("Pooled OLS R² (out-of-sample): %.4f\n", pooled_r2))


Pooling Model

Call:
plm(formula = formula_str, data = train_model_df, model = "pooling", 
    index = c("Country", "Year"))

Balanced Panel: n = 65, T = 45, N = 2925

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -3.072  -0.403   0.058   0.131   0.616   4.453 

Coefficients:
                    Estimate  Std. Error t-value  Pr(>|t|)    
GGSB_NPGDP       -1.7910e-02  3.6616e-03 -4.8914 1.055e-06 ***
GGXWDG_NGDP       2.3110e-03  3.8441e-04  6.0118 2.062e-09 ***
PPPPC            -5.0830e-06  9.7650e-07 -5.2054 2.071e-07 ***
TX_RPCH          -7.0637e-03  1.5607e-03 -4.5260 6.250e-06 ***
TM_RPCH           7.2974e-03  1.9197e-03  3.8013 0.0001469 ***
NID_NGDP          5.9139e-02  1.2373e-03 47.7973 < 2.2e-16 ***
Advanced_Country  1.9037e-02  4.0447e-02  0.4707 0.6379062    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1698.9
Residual Sum of Squares: 2191.2
R-Squared:      0.014048
Adj. R-Squared: 0.012021
F-statistic: 1377.99 o

Pooled OLS RMSE (out-of-sample): 0.7881
Pooled OLS R² (out-of-sample): 0.0331


In [180]:
# Drop the Advanced_Country indicator before fitting the Fixed Effects and Random Effects models
drop_column <- function(d, col) d[, !colnames(d) %in% col]

X_train        <- drop_column(X_train, "Advanced_Country")
X_test         <- drop_column(X_test, "Advanced_Country")
train_model_df <- drop_column(train_model_df, "Advanced_Country")
test_model_df  <- drop_column(test_model_df, "Advanced_Country")

X_vars <- setdiff(X_vars, "Advanced_Country")


# Create formula for model
formula_str <- as.formula(paste("PCPIPCH ~ -1 +", paste(X_vars, collapse = " + " )))
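
One reason an indicator such as `Advanced_Country` is usually left out of a Fixed Effects specification is that the within transformation subtracts each country's mean, so any regressor that is constant within a country becomes identically zero. The base-R sketch below illustrates this on made-up data; `toy` and `demean()` are hypothetical names, and whether `Advanced_Country` is constant over time in the real dataset is an assumption here.

```r
# Toy panel: Advanced is constant within each country, x varies over time
toy <- data.frame(
  Country  = rep(c("A", "B"), each = 3),
  Advanced = rep(c(1, 0), each = 3),
  x        = c(1, 2, 3, 10, 20, 30)
)

# Within (demeaning) transformation: subtract the group mean from each value
demean <- function(v, g) v - ave(v, g)

adv_within <- demean(toy$Advanced, toy$Country)   # all zeros: no variation left
x_within   <- demean(toy$x, toy$Country)          # keeps the within-country variation

print(adv_within)
print(x_within)
```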


In [181]:
# ===============================
# B. Fixed Effects model

# Fit Fixed Effects (within)
fe_model <- plm(formula_str, data = train_model_df, model = "within", index = c("Country", "Year"))

# Display model summary
summary(fe_model)

# Predict using the model
fe_preds <- predict(fe_model, newdata = X_test)

# Evaluate model performance
fe_rmse <- rmse(y_test, fe_preds)   # Metrics::rmse(actual, predicted)
fe_r2   <- R2(fe_preds, y_test)     # caret::R2(pred, obs): squared correlation by default

# Print results
cat(sprintf("Fixed Effects RMSE (out-of-sample): %.4f\n", fe_rmse))
cat(sprintf("Fixed Effects R² (out-of-sample): %.4f\n", fe_r2))


Oneway (individual) effect Within Model

Call:
plm(formula = formula_str, data = train_model_df, model = "within", 
    index = c("Country", "Year"))

Balanced Panel: n = 65, T = 45, N = 2925

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-2.52817 -0.35175  0.00574  0.36369  2.58234 

Coefficients:
               Estimate  Std. Error  t-value  Pr(>|t|)    
GGSB_NPGDP   1.7102e-03  3.3331e-03   0.5131    0.6079    
GGXWDG_NGDP -3.3253e-03  5.1724e-04  -6.4288 1.502e-10 ***
PPPPC       -1.0433e-05  9.1817e-07 -11.3630 < 2.2e-16 ***
TX_RPCH      2.4345e-03  1.2666e-03   1.9220    0.0547 .  
TM_RPCH     -1.4594e-03  1.5022e-03  -0.9715    0.3314    
NID_NGDP     1.0294e-02  2.3524e-03   4.3758 1.253e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1360.7
Residual Sum of Squares: 1232.4
R-Squared:      0.094306
Adj. R-Squared: 0.072092
F-statistic: 49.5289 on 6 and 2854 DF, p-value: < 2.22e-16

Fixed Effects RMSE (out-of-sample): 0.7045
Fixed Effects R² (out-of-sample): 0.0697


In [182]:
# ===============================
# C. Random Effects model


# Fit Random Effects model using plm (Swamy-Arora is the default random.method)
re_model <- plm(formula_str, data = train_model_df, model = "random", index = c("Country", "Year"))

# Alternative variance-component estimators:
# re_model <- plm(formula_str, data = train_model_df, model = "random", 
#                 random.method = "amemiya", index = c("Country", "Year"))

# re_model <- plm(formula_str, data = train_model_df, model = "random", 
#                 random.method = "walhus", index = c("Country", "Year"))

# Display model summary
summary(re_model)

# Predict using the model
re_preds <- predict(re_model, newdata = X_test)

# Evaluate model performance
re_rmse <- rmse(y_test, re_preds)   # Metrics::rmse(actual, predicted)
re_r2   <- R2(re_preds, y_test)     # caret::R2(pred, obs): squared correlation by default

# Print results
cat(sprintf("Random Effects RMSE (out-of-sample): %.4f\n", re_rmse))
cat(sprintf("Random Effects R² (out-of-sample): %.4f\n", re_r2))


Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = formula_str, data = train_model_df, model = "random", 
    index = c("Country", "Year"))

Balanced Panel: n = 65, T = 45, N = 2925

Effects:
                 var std.dev share
idiosyncratic 0.4318  0.6571 0.695
individual    0.1893  0.4351 0.305
theta: 0.7804

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.622  -0.213   0.156   0.164   0.534   3.427 

Coefficients:
               Estimate  Std. Error z-value  Pr(>|z|)    
GGSB_NPGDP  -7.1503e-03  3.4904e-03 -2.0485   0.04051 *  
GGXWDG_NGDP  1.2153e-04  5.0050e-04  0.2428   0.80814    
PPPPC       -7.0694e-06  9.2817e-07 -7.6164 2.608e-14 ***
TX_RPCH     -8.6847e-05  1.3417e-03 -0.0647   0.94839    
TM_RPCH     -4.9378e-05  1.5990e-03 -0.0309   0.97536    
NID_NGDP     4.2054e-02  1.8042e-03 23.3093 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1377
Residual

Random Effects RMSE (out-of-sample): 1.1709
Random Effects R² (out-of-sample): 0.0120


<a id='2'></a>

### **2. Model Selection Tests:**

In [183]:
# ===============================
# Hausman test: fixed vs random effects
hausman_test <- phtest(fe_model, re_model)
print(hausman_test)

haus_pval <- hausman_test$p.value

if (haus_pval < 0.05) {
  cat("Hausman test suggests Fixed Effects preferred.\n")
} else {
  cat("Hausman test suggests Random Effects preferred.\n")
}


	Hausman Test

data:  formula_str
chisq = 424.73, df = 6, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Hausman test suggests Fixed Effects preferred.


In [184]:
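# ===============================
# Wald (F) test comparing the Pooled OLS and Fixed Effects fits
# ===============================

# Note: lmtest::waldtest() compares the two fits through their coefficient sets.
# The only term that differs is Advanced_Country, so this F-test checks that single
# coefficient, not the joint significance of the country effects.
# plm::pFtest() is the usual F-test for individual effects.
comparison <- waldtest(pooled_model, fe_model, test = "F")
cat("Wald Test (F-test) for joint significance of entity effects:\n")
comparison

# Extract the p-value from the comparison table (second row, "Pr(>F)" column)
wald_pval <- comparison[2, "Pr(>F)"]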

# Decision rule
if (wald_pval < 0.05) {
  cat("Wald test suggests Fixed Effects preferred.\n")
} else {
  cat("Wald test suggests Pooled OLS preferred.\n")
}


Wald Test (F-test) for joint significance of entity effects:


Unnamed: 0_level_0,Res.Df,Df,F,Pr(>F)
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>
1,2918,,,
2,2854,-1.0,0.2215374,0.6379062


Wald test suggests Pooled OLS preferred.
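
The table above has `Df = -1` and a p-value identical to the one reported for `Advanced_Country` in the Pooled OLS summary, which suggests the comparison is driven by that single coefficient. A common way to test the joint significance of the individual effects is `plm::pFtest()`, which compares a within model against a pooling model with the same regressors. The sketch below uses the `Grunfeld` data shipped with `plm` purely for illustration; the model names `fe_toy` and `pool_toy` are hypothetical and the result says nothing about the inflation panel.

```r
library(plm)

# Grunfeld investment data shipped with plm (illustration only)
data("Grunfeld", package = "plm")

fe_toy   <- plm(inv ~ value + capital, data = Grunfeld,
                index = c("firm", "year"), model = "within")
pool_toy <- plm(inv ~ value + capital, data = Grunfeld,
                index = c("firm", "year"), model = "pooling")

# F-test of the null hypothesis that all firm effects are zero
f_test <- pFtest(fe_toy, pool_toy)
print(f_test)
```

A small p-value here favors the Fixed Effects specification over Pooled OLS.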


<a id='3'></a>

### **3. Diagnostic Tests:**

#### 1. Multicollinearity check using VIF

In [185]:
# ===============================
# 1.Multicollinearity check using VIF

model <- lm(PCPIPCH ~ . , data = train_model_df)

# Compute VIF values
vif_values <- vif(model)
# Create a data frame to store VIF results
vif_data <- data.frame(
  feature = names(vif_values),
  VIF = as.numeric(vif_values)
)

# Display the result
print("Variance Inflation Factors (VIF):")
print(vif_data)

[1] "Variance Inflation Factors (VIF):"
      feature      VIF
1  GGSB_NPGDP 1.052794
2 GGXWDG_NGDP 1.053897
3       PPPPC 1.026589
4     TX_RPCH 1.476352
5     TM_RPCH 1.343068
6    NID_NGDP 1.176903


In [186]:
# ===============================
# 2. Heteroskedasticity tests (Breusch-Pagan, plus an auxiliary regression on the squared Fixed Effects residuals)

print("Breusch-Pagan and White Test for Heteroskedasticity:")

# Breusch-Pagan test
bp_test <- bptest(fe_model)
cat(sprintf("Breusch-Pagan test: stat=%.4f, p-value=%.4f\n", bp_test$statistic, bp_test$p.value))

# Test labeled "White" in the output: Breusch-Pagan applied to a regression of the
# squared FE residuals on the regressors in levels (no squares or cross-products)
fe_residuals <- residuals(fe_model)

white_test <- bptest(lm(fe_residuals^2 ~ ., data = train_model_df[, X_vars]))
cat(sprintf("White test: stat=%.4f, p-value=%.4f\n", white_test$statistic, white_test$p.value))

# Decision rule
if (bp_test$p.value < 0.05 || white_test$p.value < 0.05) {
  cat("Heteroskedasticity detected: consider robust standard errors.\n")
} else {
  cat("No significant heteroskedasticity detected.\n")
}

[1] "Breusch-Pagan and White Test for Heteroskedasticity:"
Breusch-Pagan test: stat=185.4529, p-value=0.0000
White test: stat=53.2275, p-value=0.0000
Heteroskedasticity detected: consider robust standard errors.
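
A White-style test differs from the plain Breusch-Pagan test in that the auxiliary variance regression also includes squares and cross-products of the regressors. The sketch below shows one way to set this up with `lmtest::bptest()` and its `varformula` argument on simulated data; the variables `x1`, `x2`, `y` and the data-generating process are assumptions made up for the example.

```r
library(lmtest)

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Error standard deviation grows with x1, so the errors are heteroskedastic by construction
d$y <- 1 + d$x1 + d$x2 + rnorm(n, sd = exp(0.5 * d$x1))

fit <- lm(y ~ x1 + x2, data = d)

# White-style test: variance regression on levels, squares and the cross-product
white_toy <- bptest(fit,
                    varformula = ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2,
                    data = d)
print(white_toy)
```

With two regressors the auxiliary regression has five terms, so the test statistic is compared against a chi-squared distribution with 5 degrees of freedom.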


In [187]:
# ===============================
# 3. Serial Correlation test (Breusch-Godfrey) on Fixed Effects residuals

bg_test <- pbgtest(fe_model)
print("Breusch-Godfrey/Wooldridge test for Serial Correlation:")
bg_test


[1] "Breusch-Godfrey/Wooldridge test for Serial Correlation:"



	Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  formula_str
chisq = 1464.7, df = 45, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors


In [188]:

# Pesaran CD test for cross-sectional dependence
pcdtest(fe_model, test = "cd")



	Pesaran CD test for cross-sectional dependence in panels

data:  PCPIPCH ~ -1 + GGSB_NPGDP + GGXWDG_NGDP + PPPPC + TX_RPCH + TM_RPCH +     NID_NGDP
z = 104.39, p-value < 2.2e-16
alternative hypothesis: cross-sectional dependence


In [189]:
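library(urca)

# Levin, Lin & Chu panel unit-root test.
# panel_df is a tibble at this point, so rebuild the panel index first;
# purtest() needs a pseries to identify the individual dimension.
panel_pdf <- pdata.frame(as.data.frame(panel_df), index = c("Country", "Year"))
llc_test <- purtest(panel_pdf$PCPIPCH, test = "levinlin")
summary(llc_test)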


In [190]:
# Cluster-robust standard errors (Arellano), clustered at the country level
robust_se <- vcovHC(fe_model, method = "arellano", type = "HC1", cluster = "group")

# Re-test coefficient significance with the robust standard errors
coeftest(fe_model, vcov = robust_se)


t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)    
GGSB_NPGDP   1.7102e-03  7.8321e-03  0.2184   0.82716    
GGXWDG_NGDP -3.3253e-03  1.7570e-03 -1.8926   0.05851 .  
PPPPC       -1.0433e-05  2.5882e-06 -4.0311 5.697e-05 ***
TX_RPCH      2.4345e-03  2.1046e-03  1.1568   0.24747    
TM_RPCH     -1.4594e-03  2.6753e-03 -0.5455   0.58545    
NID_NGDP     1.0294e-02  6.9638e-03  1.4782   0.13947    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In [191]:
# Driscoll-Kraay standard errors, robust to cross-sectional and serial correlation
robust_se_scc <- vcovSCC(fe_model, type = "HC1")
coeftest(fe_model, vcov = robust_se_scc)



t test of coefficients:

               Estimate  Std. Error t value Pr(>|t|)   
GGSB_NPGDP   1.7102e-03  4.1588e-03  0.4112 0.680931   
GGXWDG_NGDP -3.3253e-03  1.1255e-03 -2.9546 0.003156 **
PPPPC       -1.0433e-05  6.4070e-06 -1.6284 0.103547   
TX_RPCH      2.4345e-03  2.4532e-03  0.9924 0.321107   
TM_RPCH     -1.4594e-03  3.7618e-03 -0.3879 0.698085   
NID_NGDP     1.0294e-02  4.0548e-03  2.5387 0.011181 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
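
Since the point estimates are identical across the three tables and only the standard errors change, it can be easier to compare the standard errors side by side. The sketch below builds such a table on the `Grunfeld` data shipped with `plm`; `fe_toy` and `se_table` are hypothetical names, and the numbers are unrelated to the inflation panel.

```r
library(plm)

# Grunfeld investment data shipped with plm (illustration only)
data("Grunfeld", package = "plm")

fe_toy <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within")

# Standard errors under three covariance estimators
se_table <- cbind(
  conventional   = sqrt(diag(vcov(fe_toy))),
  cluster_group  = sqrt(diag(vcovHC(fe_toy, method = "arellano",
                                    type = "HC1", cluster = "group"))),
  driscoll_kraay = sqrt(diag(vcovSCC(fe_toy, type = "HC1")))
)
print(round(se_table, 4))
```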


## GMM

In [192]:
head(panel_df)

Country,Advanced_Country,Year,PCPIPCH,GGSB_NPGDP,GGXWDG_NGDP,PPPPC,TX_RPCH,TM_RPCH,NID_NGDP
<fct>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Australia,1,1980,2.389771,-0.2769864,9.396714,10277.79,-0.915,4.997,27.139
Australia,1,1981,2.328545,0.3266791,5.183462,11529.56,-3.403,10.042,28.897
Australia,1,1982,2.495434,-0.2234347,14.483242,12049.62,8.771,5.46,26.502
Australia,1,1983,2.380842,-2.3088319,18.715089,12305.54,-4.36,-9.819,23.062
Australia,1,1984,1.554982,0.5929632,18.584186,13391.17,16.092,22.058,26.734
Australia,1,1985,2.016235,-0.3028125,13.750254,14363.83,10.397,3.79,27.243


In [193]:
# ──────────────────────────────────────────────────────────────────────────────
# 2) Specify the Dynamic Panel (GMM) model
# ──────────────────────────────────────────────────────────────────────────────

# Build the formula string:
# Left of |  : regressors, i.e. lag(PCPIPCH, 1) plus the strictly exogenous X variables
# Right of | : GMM instruments for the lagged dependent variable, lag(PCPIPCH, 2:7)
dynamic_formula <- as.formula(
  paste0( "PCPIPCH ~ lag(PCPIPCH, 1) + ",
    paste(X_vars, collapse = " + "), 
    " | lag(PCPIPCH, 2:7)"
  )
)

In [194]:
# ──────────────────────────────────────────────────────────────────────────────
# 3) Estimate the GMM model: Difference GMM (Arellano–Bond)
# ──────────────────────────────────────────────────────────────────────────────

# (A) Difference GMM (two-step)
# panel_df is a tibble here, so rebuild the panel index on Country and Year first
gmm_data <- pdata.frame(as.data.frame(panel_df), index = c("Country", "Year"))

gmm_model <- pgmm(
  formula        = dynamic_formula,
  data           = gmm_data,
  effect         = "individual",   # country effects only; "twoways" would add time effects
  model          = "twosteps",     # two-step GMM; "onestep" gives the one-step estimator
  transformation = "d",            # "d" = difference GMM (Arellano-Bond); "ld" = system GMM
  collapse       = TRUE            # collapse the GMM instrument matrix
)
summary(gmm_model)


In [195]:
# ──────────────────────────────────────────────────────────────────────────────
# 4) Diagnostics for the Difference GMM model
# ──────────────────────────────────────────────────────────────────────────────

# (i) Sargan test of overidentifying restrictions (validity of instruments)
cat("Sargan test (overid):\n")
print(sargan(gmm_model))

# (ii) Arellano–Bond tests for autocorrelation in residuals
cat("\nArellano–Bond AR(1) test:\n")  # rejection is expected in first differences
print(mtest(gmm_model, order = 1))

cat("\nArellano–Bond AR(2) test:\n")  # should not be rejected (p > 0.05)
print(mtest(gmm_model, order = 2))


Sargan test (overid):

	Sargan test

data:  dynamic_formula
chisq = 10.162, df = 5, p-value = 0.07078
alternative hypothesis: overidentifying restrictions not valid


Arellano–Bond AR(1) test:

	Arellano-Bond autocorrelation test of degree 1

data:  dynamic_formula
normal = -3.6489, p-value = 0.0002634
alternative hypothesis: autocorrelation present


Arellano–Bond AR(2) test:

	Arellano-Bond autocorrelation test of degree 2

data:  dynamic_formula
normal = -1.3894, p-value = 0.1647
alternative hypothesis: autocorrelation present
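
The three diagnostics above are usually read together: the Sargan test should not reject, first-order autocorrelation in the differenced residuals is expected, and second-order autocorrelation should be absent. The sketch below encodes that reading for the p-values printed above; `gmm_checks()` is a hypothetical helper and the 5% threshold is an assumed convention.

```r
# Hypothetical helper: summarize the usual reading of difference-GMM diagnostics
gmm_checks <- function(sargan_p, ar1_p, ar2_p, alpha = 0.05) {
  c(
    instruments_not_rejected = sargan_p > alpha,  # Sargan: fail to reject validity
    ar1_present              = ar1_p < alpha,     # AR(1) expected in differences
    no_ar2                   = ar2_p > alpha      # AR(2) should be absent
  )
}

# p-values copied from the output above
checks <- gmm_checks(sargan_p = 0.07078, ar1_p = 0.0002634, ar2_p = 0.1647)
print(checks)
```

All three conditions hold at the 5% level, although the Sargan p-value is close enough to the threshold that the instrument set deserves a cautious reading.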



# **END**