# **Category-Level Comparisons (School Level Analysis)**

---

Compare **student enrollment**, **teacher deployment**, and **student–teacher ratios** (students per teacher) across **school categories** (e.g., Elementary, Junior High, Senior High). This notebook identifies the categories where staffing pressure and enrollment growth are most pronounced in the basic education system.


In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option("display.max_columns", None)
sns.set_theme(style="whitegrid")  # set_theme is the current name; sns.set is an alias

In [None]:
# Dataset source:
# https://www.kaggle.com/datasets/franksebastiancayaco/philippine-public-school-teachers-and-students

DATA_PATH = "../data/raw/philippine_public_school_teachers_students.csv"

df = pd.read_csv(DATA_PATH)
df.head()

In [None]:
# Normalize school year (assumes each label starts with a four-digit year, e.g. "2019-2020")
df["school_year"] = df["school_year"].astype(str).str.strip()
df["year_start"] = df["school_year"].str[:4].astype(int)

# Ensure numeric fields
df["students"] = pd.to_numeric(df["students"], errors="coerce")
df["teachers"] = pd.to_numeric(df["teachers"], errors="coerce")

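# Compute student–teacher ratio (students per teacher).
# Zero teacher counts are treated as missing so the ratio is NaN rather than inf.
df["students_per_teacher"] = df["students"] / df["teachers"].replace(0, np.nan)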

df.info()
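
The row-level ratio divides by `teachers`, so zero or missing teacher counts need care before the ratio is plotted. The following is a minimal sketch on made-up rows (the `toy` frame and its values are hypothetical, not taken from the dataset) showing how naive and guarded division differ and how the affected rows might be counted.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical values) standing in for the real dataset
toy = pd.DataFrame({
    "students": [120, 80, 50, np.nan],
    "teachers": [4, 0, np.nan, 2],
})

# Naive division: a zero teacher count yields inf rather than NaN
naive_ratio = toy["students"] / toy["teachers"]

# Guarded division: treat zero teachers as missing so the ratio becomes NaN
safe_ratio = toy["students"] / toy["teachers"].replace(0, np.nan)

# Count the rows that would distort a ratio plot or summary
audit = {
    "zero_teachers": int((toy["teachers"] == 0).sum()),
    "missing_inputs": int(toy[["students", "teachers"]].isna().any(axis=1).sum()),
    "infinite_naive_ratios": int(np.isinf(naive_ratio).sum()),
}
print(audit)
```

Running the same counts on `df` before plotting would show how many rows drop out of the row-level ratio.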

In [None]:
category_summary = (
    df.groupby("school_category")[["students", "teachers"]]
      .sum()
      .reset_index()
)

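# Pooled ratio: total students divided by total teachers in each category
# (not the average of the row-level ratios)
category_summary["students_per_teacher"] = (
    category_summary["students"] / category_summary["teachers"]
)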

category_summary
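
The category ratio above is computed from category totals, which is not the same as averaging the row-level `students_per_teacher` values. A small sketch with invented numbers (the `toy` frame is hypothetical) illustrates how the two can diverge when rows differ in size.

```python
import pandas as pd

# Hypothetical rows: two records per category with very different sizes
toy = pd.DataFrame({
    "school_category": ["Elementary", "Elementary", "Junior High", "Junior High"],
    "students": [1000, 100, 600, 400],
    "teachers": [20, 10, 20, 10],
})
toy["students_per_teacher"] = toy["students"] / toy["teachers"]

# Pooled ratio: sum first, then divide (the approach used in this notebook)
totals = toy.groupby("school_category")[["students", "teachers"]].sum()
pooled = totals["students"] / totals["teachers"]

# Mean of row-level ratios: every row counts equally regardless of size
mean_of_rows = toy.groupby("school_category")["students_per_teacher"].mean()

comparison = pd.DataFrame({
    "pooled_ratio": pooled,
    "mean_of_row_ratios": mean_of_rows,
})
print(comparison)
```

The pooled ratio weights each row by its teacher count, so it describes the category as a whole, while the mean of row ratios describes a typical record. The boxplot later in the notebook uses row-level ratios, so its medians need not match the pooled figures.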

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.barplot(
    data=category_summary,
    x="school_category",
    y="students",
    ax=axes[0]
)
axes[0].set_title("Total Students by School Category")

sns.barplot(
    data=category_summary,
    x="school_category",
    y="teachers",
    ax=axes[1]
)
axes[1].set_title("Total Teachers by School Category")

plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(6, 4))

sns.barplot(
    data=category_summary,
    x="school_category",
    y="students_per_teacher"
)

plt.title("Student–Teacher Ratio by School Category")
plt.xlabel("School Category")
plt.ylabel("Students per Teacher")
plt.show()

In [None]:
category_trends = (
    df.groupby(["school_category", "year_start"])[["students", "teachers"]]
      .sum()
      .reset_index()
)

category_trends["students_per_teacher"] = (
    category_trends["students"] / category_trends["teachers"]
)

category_trends.head()

In [None]:
plt.figure(figsize=(10, 5))

sns.lineplot(
    data=category_trends,
    x="year_start",
    y="students_per_teacher",
    hue="school_category",
    marker="o"
)

plt.title("Student–Teacher Ratio Trends by School Category")
plt.xlabel("School Year (Start)")
plt.ylabel("Students per Teacher")
plt.show()

In [None]:
latest_year = df["year_start"].max()

latest_category = (
    df[df["year_start"] == latest_year]
    .groupby("school_category")[["students", "teachers"]]
    .sum()
    .reset_index()
)

latest_category["students_per_teacher"] = (
    latest_category["students"] / latest_category["teachers"]
)

latest_category

In [None]:
plt.figure(figsize=(6, 4))

sns.boxplot(
    data=df[df["year_start"] == latest_year],
    x="school_category",
    y="students_per_teacher"
)

plt.title(f"Distribution of Ratios by Category ({latest_year})")
plt.xlabel("School Category")
plt.ylabel("Students per Teacher")
plt.show()

In [None]:
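# Sort explicitly so pct_change compares each year with the previous year
# of the same category
category_growth = (
    category_trends
    .sort_values(["school_category", "year_start"])
    .reset_index(drop=True)
)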

category_growth["student_growth_rate"] = (
    category_growth
    .groupby("school_category")["students"]
    .pct_change() * 100
)

category_growth["teacher_growth_rate"] = (
    category_growth
    .groupby("school_category")["teachers"]
    .pct_change() * 100
)

category_growth.head()
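
To make the growth-rate calculation concrete, here is a minimal sketch on invented totals (the `toy_trends` frame and the `growth_gap` column are hypothetical additions, not part of the dataset). It shows that `pct_change` restarts within each category, leaving the first year as NaN, and adds one possible way to express whether teacher growth kept pace with enrollment.

```python
import pandas as pd

# Hypothetical yearly totals for two categories
toy_trends = pd.DataFrame({
    "school_category": ["Elementary"] * 3 + ["Senior High"] * 3,
    "year_start": [2020, 2021, 2022] * 2,
    "students": [1000, 1100, 1210, 200, 300, 330],
    "teachers": [40, 42, 42, 10, 12, 15],
})
toy_trends = (
    toy_trends
    .sort_values(["school_category", "year_start"])
    .reset_index(drop=True)
)

# Year-over-year percentage change, computed separately per category
toy_trends["student_growth_rate"] = (
    toy_trends.groupby("school_category")["students"].pct_change() * 100
)
toy_trends["teacher_growth_rate"] = (
    toy_trends.groupby("school_category")["teachers"].pct_change() * 100
)

# Assumed summary metric: positive means teachers grew faster than students
toy_trends["growth_gap"] = (
    toy_trends["teacher_growth_rate"] - toy_trends["student_growth_rate"]
)
print(toy_trends)
```

A negative `growth_gap` in a given year means enrollment grew faster than the teaching force in that category, which pushes the students-per-teacher ratio up.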

In [None]:
# Reshape to long format so a single lineplot can encode category (color)
# and metric (line style); two separate hue plots would duplicate legend entries
growth_long = category_growth.melt(
    id_vars=["school_category", "year_start"],
    value_vars=["student_growth_rate", "teacher_growth_rate"],
    var_name="metric",
    value_name="growth_rate",
)

plt.figure(figsize=(10, 5))

sns.lineplot(
    data=growth_long,
    x="year_start",
    y="growth_rate",
    hue="school_category",
    style="metric",
    markers=True
)

plt.axhline(0, color="black", linestyle="--")
plt.title("Enrollment vs Teacher Growth Rates by Category")
plt.xlabel("School Year (Start)")
plt.ylabel("Growth Rate (%)")
plt.show()

In [None]:
# Students per teacher; an illustrative cutoff, adjust to the staffing standard in use
RATIO_THRESHOLD = 40

high_risk_categories = latest_category[
    latest_category["students_per_teacher"] > RATIO_THRESHOLD
]

high_risk_categories
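
The filter above looks only at the latest year. To check whether a category sits above the threshold persistently rather than in a single year, one option is to count the years in which it exceeds the cutoff. This is a sketch on invented ratios (the `toy_trends` frame and the `persistence` table are hypothetical), using the same illustrative threshold of 40.

```python
import pandas as pd

RATIO_THRESHOLD = 40  # same illustrative cutoff as in the notebook

# Hypothetical yearly pooled ratios for two categories
toy_trends = pd.DataFrame({
    "school_category": ["Elementary"] * 3 + ["Junior High"] * 3,
    "year_start": [2020, 2021, 2022] * 2,
    "students_per_teacher": [38.0, 41.0, 39.0, 45.0, 44.0, 42.0],
})

# Flag each category-year that exceeds the threshold
above = toy_trends.assign(
    above_threshold=toy_trends["students_per_teacher"] > RATIO_THRESHOLD
)

# Count flagged years and observed years per category
persistence = (
    above.groupby("school_category")["above_threshold"]
    .agg(years_above="sum", years_observed="count")
    .reset_index()
)
persistence["share_above"] = (
    persistence["years_above"] / persistence["years_observed"]
)
print(persistence)
```

Applied to `category_trends`, a `share_above` close to 1 would point to a structural staffing gap, while a low share would suggest an isolated spike.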

### Key Category-Level Insights

1. Enrollment and staffing levels vary substantially across school categories,
   reflecting structural differences in the Philippine basic education system.
2. Certain categories exhibit persistently higher student–teacher ratios,
   which suggests heavier instructional workloads and possible staffing gaps.
3. Growth-rate analysis shows whether teacher hiring has kept pace with
   category-specific enrollment growth.
4. Categories whose latest-year ratio exceeds `RATIO_THRESHOLD` are candidates
   for category-specific staffing policies and budget allocation.

These insights support deeper correlation, inequality, and policy impact analyses
in subsequent notebooks.