# Remittance to the Philippines – Econometric Analysis

**Dataset Source:**  
https://www.kaggle.com/datasets/joshbuttler/remittance-to-the-philippines

**Input File:**  
data/processed/remittance_cleaned.csv

**Purpose:**  
Apply econometric models to:
- Quantify determinants of remittance flows
- Control for time and country effects
- Diagnose model assumptions
- Provide interpretable, policy-relevant estimates

In [None]:
import pandas as pd
import numpy as np

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:,.2f}".format)

In [None]:
DATA_PATH = "../data/processed/remittance_cleaned.csv"
df = pd.read_csv(DATA_PATH)

df.head()

In [None]:
# Target variable
amount_col = "amount" if "amount" in df.columns else df.select_dtypes(np.number).columns[0]

# Time variables
if "date" in df.columns:
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year

# Key categorical controls
country_cols = [c for c in df.columns if "country" in c.lower() or "origin" in c.lower()]
channel_cols = [c for c in df.columns if "channel" in c.lower() or "method" in c.lower()]

country_col = country_cols[0] if country_cols else None
channel_col = channel_cols[0] if channel_cols else None

amount_col, country_col, channel_col

In [None]:
# log1p = log(1 + amount): defined for zero amounts and compresses the right tail
df["log_amount"] = np.log1p(df[amount_col])

In [None]:
model_ols_1 = smf.ols(
    formula="log_amount ~ year",
    data=df
).fit(cov_type="HC3")

model_ols_1.summary()

In [None]:
formula_terms = ["year"]

if country_col:
    formula_terms.append(f"C({country_col})")

if channel_col:
    formula_terms.append(f"C({channel_col})")

formula = "log_amount ~ " + " + ".join(formula_terms)
formula

In [None]:
model_ols_2 = smf.ols(
    formula=formula,
    data=df
).fit(cov_type="HC3")

model_ols_2.summary()

In [None]:
if country_col:
    fe_model = smf.ols(
        formula=f"log_amount ~ year + C({country_col})",
        data=df
    ).fit(cov_type="HC3")

    # A bare expression inside an if-block is not displayed by the notebook, so print it
    print(fe_model.summary())

In [None]:
sns.histplot(model_ols_2.resid, kde=True)
plt.title("Residual Distribution")
plt.show()

In [None]:
sns.scatterplot(x=model_ols_2.fittedvalues, y=model_ols_2.resid)
plt.axhline(0, linestyle="--", color="red")
plt.title("Residuals vs Fitted Values")
plt.show()

In [None]:
bp_test = het_breuschpagan(
    model_ols_2.resid,
    model_ols_2.model.exog
)

labels = ["LM Statistic", "LM-Test p-value", "F-Statistic", "F-Test p-value"]
dict(zip(labels, bp_test))

In [None]:
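# VIF for each design-matrix column; values well above ~10 are a common rule-of-thumb flag for collinearity
X = model_ols_2.model.exog
vif_data = pd.DataFrame({
    "variable": model_ols_2.model.exog_names,
    "VIF": [variance_inflation_factor(X, i) for i in range(X.shape[1])]
})

# The intercept's VIF is not informative, so leave it out of the table
vif_data[vif_data["variable"] != "Intercept"]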

In [None]:
marginal_effects = model_ols_2.params.reset_index()
marginal_effects.columns = ["variable", "coefficient"]
marginal_effects
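
The coefficients tabulated above are on the log scale. A minimal sketch of converting them to percentage changes follows, using the standard transformation 100 · (exp(b) − 1). The helper `pct_effect` and the example coefficient values are hypothetical and are not estimates from this dataset. Because the outcome is `log1p(amount)`, the change strictly applies to `1 + amount`, which is close to the amount itself when amounts are large.

```python
import numpy as np
import pandas as pd


def pct_effect(coef):
    """Convert a log-scale coefficient b to a percentage change: 100 * (exp(b) - 1)."""
    return 100.0 * np.expm1(coef)


# Hypothetical coefficients for illustration only (not results from the models above)
example = pd.DataFrame({
    "variable": ["year", "C(channel)[T.bank]"],
    "coefficient": [0.03, 0.50],
})
example["pct_effect"] = pct_effect(example["coefficient"])

print(example)
```

With fitted results, the same helper could be applied to `marginal_effects["coefficient"]`. For small coefficients the result is close to 100 · b; the gap grows for larger ones, which is typical of categorical dummies.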

## Econometric Interpretation

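- The coefficient on `year` is a linear time trend: the average annual change in log remittance amount, holding the other controls fixed.
- Country fixed effects absorb time-invariant differences across remitting countries, so the remaining coefficients are identified from within-country variation.
- Because the outcome is `log1p(amount)`, a small coefficient b corresponds to roughly a 100·b percent change in the amount; for larger coefficients, such as those on categorical dummies, the exact change is 100·(exp(b) − 1) percent.
- HC3 robust standard errors keep inference valid under heteroskedasticity, which is typical of financial data; they change the standard errors, not the coefficient estimates.
- Read together with the diagnostics above, the estimates support policy and macroeconomic interpretation as conditional associations.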