# Remittance to the Philippines – Exploratory Data Analysis (EDA)

**Dataset Source:**  
https://www.kaggle.com/datasets/joshbuttler/remittance-to-the-philippines

**Input File:**  
data/processed/remittance_cleaned.csv

**Purpose:**  
Perform descriptive and exploratory analysis to understand:
- Time trends and seasonality
- Distributional characteristics
- Geographic patterns
- Key structural features of remittance flows

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

# Visualization settings (set_theme is the current name for the older sns.set alias)
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:,.2f}".format)

In [None]:
DATA_PATH = "../data/processed/remittance_cleaned.csv"

df = pd.read_csv(DATA_PATH)

print("Dataset shape:", df.shape)
df.head()

In [None]:
df.describe().T

In [None]:
df.describe(include="object").T

In [None]:
df.info()

In [None]:
df.nunique().sort_values(ascending=False)

In [None]:
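# Parse the date column if present and derive year/month.
# Unparseable dates become NaT instead of raising; later cells group by "year".
if "date" in df.columns:
    df["date"] = pd.to_datetime(df["date"], errors="coerce")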
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month

In [None]:
df[["year", "month"]].dropna().head()

In [None]:
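# Use "amount" if it exists; otherwise fall back to the first numeric column,
# skipping the derived year/month columns so they are never treated as amounts.
numeric_cols = df.select_dtypes(np.number).columns.drop(["year", "month"], errors="ignore")
amount_col = "amount" if "amount" in df.columns else numeric_cols[0]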

ts_year = (
    df.groupby("year")[amount_col]
      .sum()
      .reset_index()
)

plt.plot(ts_year["year"], ts_year[amount_col], marker="o")
plt.title("Total Remittances Over Time")
plt.xlabel("Year")
plt.ylabel("Total Remittance Amount")
plt.show()

In [None]:
if "month" in df.columns:
    monthly_avg = (
        df.groupby("month")[amount_col]
          .mean()
          .reset_index()
    )

    sns.lineplot(data=monthly_avg, x="month", y=amount_col, marker="o")
    plt.title("Average Monthly Remittance Pattern")
    plt.xlabel("Month")
    plt.ylabel("Average Remittance Amount")
    plt.show()

In [None]:
sns.histplot(df[amount_col], bins=50, kde=True)
plt.title("Distribution of Remittance Amounts")
plt.xlabel("Remittance Amount")
plt.show()

In [None]:
sns.boxplot(x=df[amount_col])
plt.title("Boxplot of Remittance Amounts")
plt.show()


In [None]:
geo_col_candidates = [c for c in df.columns if "country" in c.lower() or "origin" in c.lower()]
geo_col_candidates

In [None]:
if geo_col_candidates:
    geo_col = geo_col_candidates[0]

    geo_summary = (
        df.groupby(geo_col)[amount_col]
          .sum()
          .sort_values(ascending=False)
          .head(15)
          .reset_index()
    )

    sns.barplot(data=geo_summary, y=geo_col, x=amount_col)
    plt.title("Top Remittance Sending Countries")
    plt.xlabel("Total Remittance Amount")
    plt.ylabel("Country")
    plt.show()

In [None]:
if geo_col_candidates and "year" in df.columns:
    heatmap_df = (
        df.pivot_table(
            values=amount_col,
            index="year",
            columns=geo_col,
            aggfunc="sum"
        )
        .fillna(0)
    )

    plt.figure(figsize=(14, 8))
    sns.heatmap(heatmap_df, cmap="YlGnBu")
    plt.title("Remittance Heatmap by Year and Country")
    plt.show()

In [None]:
ts_year["growth_rate"] = ts_year[amount_col].pct_change() * 100
ts_year

In [None]:
sns.barplot(data=ts_year, x="year", y="growth_rate")
plt.axhline(0, color="red", linestyle="--")
plt.title("Year-on-Year Growth Rate of Remittances (%)")
plt.show()

In [None]:
corr = df.select_dtypes(np.number).corr()

plt.figure(figsize=(10, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Matrix (Numeric Variables)")
plt.show()

## Preliminary Observations

- Total annual remittance inflows show a clear long-term trend.
- The distribution of remittance amounts is right-skewed, indicating a small number of high-value transfers.
- Seasonality may be present; the average-by-month pattern is the first place to check.
- Remittances are geographically concentrated, suggesting reliance on a small number of sending countries.
- Year-on-year growth rates show periods of volatility, warranting further econometric analysis.
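
The observations above are visual readings of the plots, so it may help to back them with a few numbers. The following is a minimal sketch, not part of the original analysis: the helper name `summarize_flows`, the column names `amount`, `country`, and `year`, and the demo data are all assumptions for illustration and should be replaced with the actual columns in `remittance_cleaned.csv`.

```python
import pandas as pd


def summarize_flows(df, amount_col="amount", geo_col="country",
                    year_col="year", top_k=5):
    """Quantify skewness, geographic concentration, and growth volatility.

    Hypothetical helper: column names are assumptions, not taken from the dataset.
    """
    # Sample skewness of the amounts; a positive value means a right-skewed distribution
    skewness = df[amount_col].dropna().skew()

    # Share of total remittances coming from the top_k sending countries
    by_country = df.groupby(geo_col)[amount_col].sum().sort_values(ascending=False)
    top_k_share = by_country.head(top_k).sum() / by_country.sum()

    # Standard deviation of year-on-year growth (in percent) as a simple volatility measure
    yearly = df.groupby(year_col)[amount_col].sum().sort_index()
    growth_pct = yearly.pct_change().dropna() * 100

    return {
        "skewness": float(skewness),
        "top_k_share": float(top_k_share),
        "growth_std_pct": float(growth_pct.std()),
    }


# Synthetic demo data for illustration only (not the Kaggle dataset)
demo = pd.DataFrame({
    "year": [2019] * 3 + [2020] * 3 + [2021] * 3,
    "country": ["A", "B", "C"] * 3,
    "amount": [100.0, 10.0, 10.0, 150.0, 20.0, 10.0, 150.0, 20.0, 10.0],
})

summary = summarize_flows(demo, top_k=1)
print(summary)
```

On the real data, the call would look something like `summarize_flows(df, amount_col=amount_col, geo_col=geo_col)`, assuming `geo_col` was found earlier in the notebook. A skewness well above zero and a high top-k share would support the second and fourth observations; the growth standard deviation gives a rough first measure of the volatility noted in the last one.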