# **Change Point & Event Impact Analysis**

---


Detect **structural breaks (change points)** in student enrollment, teacher counts, and teacher–student ratios over time, and assess whether **major policy or systemic events** (e.g., K–12 implementation, COVID-19 disruptions) are associated with statistically meaningful shifts.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import ruptures as rpt
from scipy.stats import ttest_ind

pd.set_option("display.max_columns", None)
sns.set_theme(style="whitegrid")  # set_theme is the current name; sns.set is a legacy alias

In [None]:
# Dataset source:
# https://www.kaggle.com/datasets/franksebastiancayaco/philippine-public-school-teachers-and-students

DATA_PATH = "../data/raw/philippine_public_school_teachers_students.csv"

df = pd.read_csv(DATA_PATH)
df.head()

In [None]:
# Normalize time variable
df["school_year"] = df["school_year"].astype(str)
df["year_start"] = df["school_year"].str[:4].astype(int)

# Numeric coercion
df["students"] = pd.to_numeric(df["students"], errors="coerce")
df["teachers"] = pd.to_numeric(df["teachers"], errors="coerce")

# Derived metric: students per teacher (zero teacher counts become NaN, not inf)
df["students_per_teacher"] = df["students"] / df["teachers"].replace(0, np.nan)

df.info()

In [None]:
national_ts = (
    df.groupby("year_start")[["students", "teachers"]]
      .sum()
      .reset_index()
      .sort_values("year_start")
)

national_ts["students_per_teacher"] = (
    national_ts["students"] / national_ts["teachers"]
)

national_ts

In [None]:
plt.figure(figsize=(10, 4))
plt.plot(
    national_ts["year_start"],
    national_ts["students_per_teacher"],
    marker="o"
)
plt.title("National Teacher–Student Ratio Over Time")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.show()

In [None]:
signal = national_ts["students_per_teacher"].values

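# jump=1 lets PELT consider every year as a candidate break; the ruptures
# default (jump=5) is too coarse for a short annual series.
model = rpt.Pelt(model="rbf", min_size=2, jump=1).fit(signal)

# pen is a penalty, not a breakpoint count: higher values yield fewer breaks.
# The returned indices are segment ends; the last one is always len(signal).
breakpoints = model.predict(pen=3)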

breakpoints

In [None]:
plt.figure(figsize=(10, 4))
plt.plot(
    national_ts["year_start"],
    signal,
    marker="o",
    label="Observed"
)

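# Each breakpoint index is the first observation of the new segment;
# the final entry (len(signal)) only marks the end of the series.
for bp in breakpoints[:-1]:
    plt.axvline(
        x=national_ts["year_start"].iloc[bp],
        color="red",
        linestyle="--"
    )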

plt.title("Detected Change Points in Teacher–Student Ratio")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.legend()
plt.show()

In [None]:
EVENTS = {
    "K12_Implementation": 2013,
    "COVID19_Pandemic": 2020
}

EVENTS

In [None]:
results = []

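for event, year in EVENTS.items():
    # Split at the event year; the event year itself counts as "post".
    ratio = national_ts["students_per_teacher"]
    pre = ratio[national_ts["year_start"] < year].dropna()
    post = ratio[national_ts["year_start"] >= year].dropna()

    # Welch's t-test (unequal variances) needs at least two points per side.
    # Annual values are autocorrelated, so read the p-value as indicative only.
    if len(pre) > 1 and len(post) > 1:
        stat, p_value = ttest_ind(pre, post, equal_var=False)
    else:
        stat, p_value = np.nan, np.nan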

    results.append({
        "event": event,
        "pre_mean_ratio": pre.mean(),
        "post_mean_ratio": post.mean(),
        "t_stat": stat,
        "p_value": p_value
    })

event_test_results = pd.DataFrame(results)
event_test_results

In [None]:
regional_breaks = {}

for region in df["region"].dropna().unique():
    temp = (
        df[df["region"] == region]
        .groupby("year_start")[["students", "teachers"]]
        .sum()
        .reset_index()
        .sort_values("year_start")
    )

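    # Skip regions with too few years for a meaningful segmentation.
    if len(temp) >= 6:
        ratio = temp["students"] / temp["teachers"]
        # jump=1 evaluates every year as a candidate break (ruptures defaults to 5).
        model = rpt.Pelt(model="rbf", min_size=2, jump=1).fit(ratio.values)
        breaks = model.predict(pen=3)
        regional_breaks[region] = breaks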

regional_breaks

### Interpreting Change Points

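- Detected change points mark years where the students-per-teacher ratio
  shifts in level; they flag candidate **structural shifts** in staffing
  adequacy but do not explain them.
- Temporal alignment between a change point and a known policy or systemic
  event is consistent with an effect of that event, but it does not establish
  causation on its own.
- Pre/post Welch's t-tests indicate whether the mean ratio differs between
  periods by more than random fluctuation would explain. With only a handful
  of autocorrelated annual observations, the p-values are indicative rather
  than conclusive.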
- Regional change points highlight **localized disruptions or policy effects**
  that national averages may mask.
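
The alignment check described in the second bullet can be made explicit. The sketch below is illustrative only: `breakpoints_to_years` and `nearest_event` are hypothetical helpers, and the year list and breakpoint indices are made-up inputs, not results from this dataset. It assumes `ruptures`-style breakpoints, that is, end-exclusive segment indices whose last entry is the series length.

```python
def breakpoints_to_years(years, breakpoints):
    """Map ruptures-style breakpoints to the first year of each new segment.

    The final breakpoint equals len(years) and only marks the end of the
    series, so it is dropped.
    """
    return [years[bp] for bp in breakpoints if bp < len(years)]


def nearest_event(change_year, events):
    """Return (event_name, signed_gap) for the event closest to change_year.

    signed_gap = change_year - event_year, so a positive gap means the shift
    was detected after the event.
    """
    name = min(events, key=lambda e: abs(change_year - events[e]))
    return name, change_year - events[name]


# Made-up inputs for illustration (not results from the dataset).
years = list(range(2008, 2023))   # 15 annual observations
breakpoints = [6, 12, 15]         # last entry is len(years)
events = {"K12_Implementation": 2013, "COVID19_Pandemic": 2020}

change_years = breakpoints_to_years(years, breakpoints)
alignment = {y: nearest_event(y, events) for y in change_years}
print(alignment)
```

A small gap (say, within one school year) is what "temporal alignment" means here; it is a descriptive check, not a test of causation.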

### Key Change Point and Event Impact Insights

1. Structural breaks in teacher–student ratios mark periods in which enrollment
   and teacher counts moved out of step at the system level.
2. Some detected change points align temporally with major education reforms or
   external shocks, such as the COVID-19 pandemic.
3. Pre/post comparisons reveal whether these events correspond to statistically
   significant changes in staffing adequacy.
4. Regional-level change point analysis uncovers heterogeneous impacts across
   the country, emphasizing the need for localized policy responses.
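
Because each pre/post comparison rests on only a few annual values, a permutation test is one way to cross-check the t-test without assuming normality. The following is a minimal sketch, not part of the original analysis: `permutation_test` is a hypothetical helper, and the ratios are made-up numbers for illustration. Like the t-test, it treats observations as exchangeable, so autocorrelation in the series still limits how strongly the result can be read.

```python
import random
from statistics import mean


def permutation_test(pre, post, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_diff, p_value), where
    observed_diff = mean(post) - mean(pre).
    """
    rng = random.Random(seed)
    observed = mean(post) - mean(pre)
    pooled = list(pre) + list(post)
    n_pre = len(pre)

    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign observations to the "pre" and "post" groups.
        rng.shuffle(pooled)
        diff = mean(pooled[n_pre:]) - mean(pooled[:n_pre])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up students-per-teacher ratios for illustration only.
pre = [36.1, 35.8, 35.2, 34.9, 34.5]
post = [31.0, 30.4, 29.8, 29.5]
diff, p = permutation_test(pre, post)
print(f"mean shift: {diff:.2f}, permutation p-value: {p:.4f}")
```

With so few observations the number of distinct relabelings is small, which puts a floor on the achievable p-value; reporting the mean shift alongside it keeps the effect size in view.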

These findings motivate the forecasting and more advanced modeling carried out in the subsequent analysis.