# **Teacher–Student Ratio Analysis**


Evaluate **staffing adequacy** in Philippine public schools by computing and analyzing **teacher–student ratios** (expressed here as students per teacher) over time, across regions, and by school category. This ratio is a key indicator of education quality and teacher workload, and an input to policy planning.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option("display.max_columns", None)
sns.set_theme(style="whitegrid")

In [None]:
# Dataset source:
# https://www.kaggle.com/datasets/franksebastiancayaco/philippine-public-school-teachers-and-students

DATA_PATH = "../data/raw/philippine_public_school_teachers_students.csv"

df = pd.read_csv(DATA_PATH)

df.head()

In [None]:
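# Normalize school year: store it as a string, then take the first four
# characters as the starting year (e.g. "2019-2020" -> 2019)
df["school_year"] = df["school_year"].astype(str)
df["year_start"] = df["school_year"].str[:4].astype(int)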

# Ensure numeric fields
df["students"] = pd.to_numeric(df["students"], errors="coerce")
df["teachers"] = pd.to_numeric(df["teachers"], errors="coerce")

df.info()
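
Because `errors="coerce"` silently turns unparseable entries into `NaN`, it can help to count how many values each column loses during cleaning. The sketch below is one way to do that; the sample frame and the helper name `coerce_counts` are hypothetical and not part of the dataset or the notebook.

```python
import pandas as pd

# Hypothetical sample standing in for the raw CSV; the values are made up.
sample = pd.DataFrame({
    "school_year": ["2019-2020", "2020-2021", "2020-2021"],
    "students": ["1200", "n/a", "950"],
    "teachers": ["40", "35", "unknown"],
})

def coerce_counts(frame, columns=("students", "teachers")):
    """Return a coerced copy plus the number of values lost per column."""
    out = frame.copy()
    lost = {}
    for col in columns:
        before = out[col].notna().sum()
        out[col] = pd.to_numeric(out[col], errors="coerce")
        lost[col] = int(before - out[col].notna().sum())
    return out, lost

cleaned, lost = coerce_counts(sample)
print(lost)
```

A large count for either column would suggest inspecting the raw values before trusting any ratio built on them.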

In [None]:
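# Row-level ratio; a teacher count of zero is treated as missing so the
# division yields NaN instead of inf, which would distort describe()
df["students_per_teacher"] = df["students"] / df["teachers"].replace(0, np.nan)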

df[["students", "teachers", "students_per_teacher"]].describe()


In [None]:
national_ratio = (
    df.groupby("year_start")[["students", "teachers"]]
      .sum()
      .reset_index()
)

national_ratio["students_per_teacher"] = (
    national_ratio["students"] / national_ratio["teachers"]
)

national_ratio
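
The national figure above is a pooled ratio (total students divided by total teachers), which is not the same as averaging the row-level `students_per_teacher` values. A minimal sketch with made-up numbers shows how far the two can diverge; the frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical figures chosen only to make the arithmetic easy to follow.
toy = pd.DataFrame({
    "year_start": [2020, 2020],
    "students": [1000, 100],
    "teachers": [20, 10],
})

# Pooled ratio: total students divided by total teachers.
totals = toy.groupby("year_start")[["students", "teachers"]].sum()
pooled = totals["students"] / totals["teachers"]

# Unweighted mean of row-level ratios: each row counts equally,
# regardless of how many students it represents.
row_ratio = toy["students"] / toy["teachers"]
unweighted = row_ratio.groupby(toy["year_start"]).mean()

print(pooled.loc[2020])      # 1100 / 30
print(unweighted.loc[2020])  # (50 + 10) / 2
```

The pooled ratio weights each row by its size, so it describes the typical student's situation; the unweighted mean describes the typical row.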

In [None]:
plt.figure(figsize=(8, 4))
plt.plot(
    national_ratio["year_start"],
    national_ratio["students_per_teacher"],
    marker="o"
)

plt.title("National Teacher–Student Ratio Over Time")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.tight_layout()
plt.show()

In [None]:
regional_ratio = (
    df.groupby(["region", "year_start"])[["students", "teachers"]]
      .sum()
      .reset_index()
)

regional_ratio["students_per_teacher"] = (
    regional_ratio["students"] / regional_ratio["teachers"]
)

regional_ratio.head()
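
One simple way to summarize regional disparity is the gap between the highest and lowest regional ratio in each year. The sketch below assumes a frame shaped like `regional_ratio`; the frame `toy_regional` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical region-year ratios with the same columns as `regional_ratio`.
toy_regional = pd.DataFrame({
    "region": ["A", "B", "C", "A", "B", "C"],
    "year_start": [2020, 2020, 2020, 2021, 2021, 2021],
    "students_per_teacher": [30.0, 40.0, 35.0, 28.0, 42.0, 35.0],
})

# Lowest and highest regional ratio per year, and the gap between them.
spread = (
    toy_regional.groupby("year_start")["students_per_teacher"]
                .agg(lowest="min", highest="max")
)
spread["gap"] = spread["highest"] - spread["lowest"]

print(spread)
```

A widening gap over time would indicate that regions are diverging even if the national ratio looks stable.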

In [None]:
plt.figure(figsize=(12, 6))

# One line per region, labeled so the legend identifies each series
for region, subset in regional_ratio.groupby("region"):
    plt.plot(
        subset["year_start"],
        subset["students_per_teacher"],
        alpha=0.6,
        label=region
    )

plt.title("Regional Teacher–Student Ratio Trends")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.legend(title="Region", bbox_to_anchor=(1.02, 1), loc="upper left", fontsize="small")
plt.tight_layout()
plt.show()

In [None]:
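latest_year = regional_ratio["year_start"].max()

# Only one pooled value per region remains for the latest year, so a sorted
# bar chart is used; a boxplot of single values would collapse to a line.
latest_ratios = regional_ratio[
    regional_ratio["year_start"] == latest_year
].sort_values("students_per_teacher", ascending=False)

plt.figure(figsize=(10, 5))
sns.barplot(
    x="students_per_teacher",
    y="region",
    data=latest_ratios,
    color="steelblue"
)

plt.title(f"Teacher–Student Ratio by Region ({latest_year})")
plt.xlabel("Students per Teacher")
plt.ylabel("Region")
plt.tight_layout()
plt.show()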

In [None]:
category_ratio = (
    df.groupby(["school_category", "year_start"])[["students", "teachers"]]
      .sum()
      .reset_index()
)

category_ratio["students_per_teacher"] = (
    category_ratio["students"] / category_ratio["teachers"]
)

category_ratio.head()

In [None]:
plt.figure(figsize=(10, 5))
sns.lineplot(
    data=category_ratio,
    x="year_start",
    y="students_per_teacher",
    hue="school_category",
    marker="o"
)

plt.title("Teacher–Student Ratio by School Category")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.show()

In [None]:
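# Illustrative benchmark of 35 students per teacher; substitute the official
# planning standard that applies to the school level being analyzed.
benchmark = 35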

plt.figure(figsize=(8, 4))
plt.plot(
    national_ratio["year_start"],
    national_ratio["students_per_teacher"],
    marker="o",
    label="Observed"
)

plt.axhline(
    benchmark,
    color="red",
    linestyle="--",
    label="Benchmark (35:1)"
)

plt.title("National Ratio vs Benchmark")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.legend()
plt.show()
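
The same benchmark can be applied per region to see which regions exceed it consistently rather than in a single year. The sketch below computes the share of years each region spends above the threshold; the frame `toy_regional` and the helper `share_of_years_above` are hypothetical, and the threshold is the same illustrative 35:1 value.

```python
import pandas as pd

BENCHMARK = 35  # illustrative threshold, matching the benchmark cell

# Hypothetical region-year ratios; in the notebook this would be `regional_ratio`.
toy_regional = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "year_start": [2019, 2020, 2021, 2019, 2020, 2021],
    "students_per_teacher": [38.0, 36.5, 37.2, 33.0, 35.5, 31.0],
})

def share_of_years_above(frame, benchmark):
    """Fraction of years in which each region's ratio exceeds the benchmark."""
    above = frame["students_per_teacher"] > benchmark
    return above.groupby(frame["region"]).mean().sort_values(ascending=False)

share = share_of_years_above(toy_regional, BENCHMARK)
print(share)
```

A share close to 1.0 marks a region that is above the benchmark in nearly every year, which is a stricter reading of "consistently" than a single-year comparison.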

### Key Teacher–Student Ratio Insights

1. The national ratio (total students divided by total teachers per school
   year) shows periods of improvement and periods of stress, reflecting how
   enrollment growth and teacher deployment move relative to each other.
2. Regional disparities are evident, with some regions consistently exceeding
   the recommended ratio benchmark.
3. School category analysis suggests that staffing pressure differs across
   elementary, junior high, and senior high levels.
4. Where ratios stay high over several years, they point to potential teacher
   workload problems and gaps in resource allocation.

These findings justify further geographic, inequality, and policy impact
analyses in subsequent notebooks.