# Remittance to the Philippines – Comparative Analysis

**Dataset Source:**  
https://www.kaggle.com/datasets/joshbuttler/remittance-to-the-philippines

**Input File:**  
data/processed/remittance_cleaned.csv

**Purpose:**  
Compare remittance patterns across:
- Sending countries / regions
- Time periods
- Remittance channels (if available)
- High vs low remittance segments

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")  # set_theme is the current name; sns.set is an older alias
plt.rcParams["figure.figsize"] = (12, 6)

pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:,.2f}".format)

In [None]:
DATA_PATH = "../data/processed/remittance_cleaned.csv"
df = pd.read_csv(DATA_PATH)

df.head()

In [None]:
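# Prefer an explicit "amount" column; otherwise fall back to the first numeric column.
# The fallback is a guess and may pick an ID or year column, so check it before trusting results.
amount_col = "amount" if "amount" in df.columns else df.select_dtypes(np.number).columns[0]

# Detect optional grouping columns by name.
country_cols = [c for c in df.columns if "country" in c.lower() or "origin" in c.lower()]
channel_cols = [c for c in df.columns if "channel" in c.lower() or "method" in c.lower()]

amount_col, country_cols, channel_cols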

In [None]:
if "date" in df.columns:
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month

In [None]:
if country_cols:
    country_col = country_cols[0]

    country_summary = (
        df.groupby(country_col)[amount_col]
          .agg(["sum", "mean", "count"])
          .sort_values("sum", ascending=False)
          .reset_index()
    )

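    # A bare expression inside an if block is not rendered; display() is provided by IPython/Jupyter.
    display(country_summary.head(10))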

In [None]:
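# Guarded so the cell does not raise NameError when no country column was detected.
if country_cols:
    sns.barplot(
        data=country_summary.head(10),
        y=country_col,
        x="sum"
    )
    plt.title("Top 10 Sending Countries by Total Remittance")
    plt.xlabel("Total Remittance Amount")
    plt.ylabel("Country")
    plt.show()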

In [None]:
if country_cols and "year" in df.columns:
    top_countries = country_summary.head(5)[country_col].tolist()

    ts_country = (
        df[df[country_col].isin(top_countries)]
        .groupby(["year", country_col])[amount_col]
        .sum()
        .reset_index()
    )

    sns.lineplot(
        data=ts_country,
        x="year",
        y=amount_col,
        hue=country_col,
        marker="o"
    )
    plt.title("Remittance Trends by Top Sending Countries")
    plt.show()

In [None]:
if channel_cols:
    channel_col = channel_cols[0]

    channel_summary = (
        df.groupby(channel_col)[amount_col]
          .agg(["sum", "mean", "count"])
          .reset_index()
          .sort_values("sum", ascending=False)
    )

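    # A bare expression inside an if block is not rendered; display() is provided by IPython/Jupyter.
    display(channel_summary)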

In [None]:
if channel_cols:
    sns.barplot(
        data=channel_summary,
        y=channel_col,
        x="sum"
    )
    plt.title("Remittance Volume by Channel")
    plt.xlabel("Total Remittance Amount")
    plt.ylabel("Channel")
    plt.show()

In [None]:
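if country_cols:
    # Recompute the top senders here: the earlier definition only runs when a "year" column exists.
    top_countries = country_summary.head(5)[country_col].tolist()

    sns.boxplot(
        data=df[df[country_col].isin(top_countries)],
        x=country_col,
        y=amount_col
    )
    plt.xticks(rotation=45)
    plt.title("Remittance Amount Distribution by Country")
    plt.show()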

In [None]:
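if country_cols and "year" in df.columns:
    # Sort by country, then year, so pct_change compares consecutive years within each country.
    growth_df = (
        ts_country
        .sort_values([country_col, "year"])
        .groupby(country_col)[amount_col]
        .pct_change() * 100
    )

    # Assignment aligns on the index, so the sorted order does not need to match ts_country.
    ts_country["growth_rate"] = growth_df
    display(ts_country.head())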

In [None]:
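# Guarded because growth_rate only exists when both a country column and a year column were found.
if country_cols and "year" in df.columns:
    sns.lineplot(
        data=ts_country,
        x="year",
        y="growth_rate",
        hue=country_col
    )
    plt.axhline(0, linestyle="--", color="red")
    plt.title("Year-on-Year Growth Rate Comparison")
    plt.show()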

In [None]:
df["remittance_segment"] = pd.qcut(
    df[amount_col],
    q=3,
    labels=["Low", "Medium", "High"]
)

df["remittance_segment"].value_counts()

In [None]:
sns.boxplot(
    data=df,
    x="remittance_segment",
    y=amount_col
)
plt.title("Remittance Segments Comparison")
plt.show()

In [None]:
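# observed=True keeps only segment/country combinations that occur in the data,
# which avoids NaN rows for empty categories of the categorical segment column.
group_keys = ["remittance_segment"] + ([country_col] if country_cols else [])
comparison_table = (
    df.groupby(group_keys, observed=True)[amount_col]
      .mean()
      .reset_index()
)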

comparison_table.head()

## Key Comparative Insights

- A small number of sending countries account for a disproportionate share of total remittance inflows.
- Year-on-year growth trajectories differ significantly across the top sending countries.
- Differences in the amount distributions across countries suggest structural variation in remittance behavior.
- Channel-based differences (if a channel column is present) may reflect cost and accessibility factors.
- The high-value remittance segment exhibits greater volatility than the low-value segment.
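
The concentration and volatility statements above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names `top_k_share` and `segment_cv` are hypothetical, the toy frame is made-up illustration data rather than the Kaggle dataset, and the coefficient of variation (standard deviation divided by mean) is used here as one possible measure of volatility.

```python
import pandas as pd


def top_k_share(frame: pd.DataFrame, group_col: str, amount_col: str, k: int = 5) -> float:
    """Fraction (0 to 1) of the total amount contributed by the k largest groups."""
    totals = frame.groupby(group_col)[amount_col].sum().sort_values(ascending=False)
    return float(totals.head(k).sum() / totals.sum())


def segment_cv(frame: pd.DataFrame, segment_col: str, amount_col: str) -> pd.Series:
    """Coefficient of variation (sample std / mean) of the amount within each segment."""
    stats = frame.groupby(segment_col, observed=True)[amount_col].agg(["mean", "std"])
    return stats["std"] / stats["mean"]


# Hypothetical toy data, for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "amount": [100.0, 300.0, 30.0, 40.0, 10.0, 20.0],
})

# Same tercile segmentation as in the notebook.
toy["remittance_segment"] = pd.qcut(toy["amount"], q=3, labels=["Low", "Medium", "High"])

share = top_k_share(toy, "country", "amount", k=1)
cv = segment_cv(toy, "remittance_segment", "amount")

print(f"Share of the largest sender: {share:.1%}")
print(cv)
```

On the real data, the same helpers could be called as `top_k_share(df, country_col, amount_col, k=5)` and `segment_cv(df, "remittance_segment", amount_col)`. Whether the resulting numbers support the insights depends on the dataset; the toy output demonstrates only the mechanics.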