<a href="https://colab.research.google.com/github/Abhyudaya01/Impact-of-COVID-19-School-Lockdowns-on-Student-Academic-Performance/blob/main/602_p1_abhyudayalohani.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem Statement

Given student demographic, socioeconomic, and time-period data, can we predict academic performance (math, reading, and writing scores) and quantify the causal impact of COVID-19 school lockdowns while accounting for school socioeconomic status and technology access?

**Why this matters**: COVID-19 school closures affected over 1.6 billion students globally, the largest educational disruption in modern history. Understanding learning losses and their heterogeneous effects is crucial for developing targeted remediation policies, informing educational equity interventions, and preparing for future crisis responses.

# Population

**Target population**: Middle and high school students (grades 6-12) in the Portland, Oregon metropolitan area during the COVID-19 pandemic, who experienced both in-person (pre-COVID) and online (post-COVID) learning across diverse socioeconomic backgrounds and levels of technology access.

**Operational sample**: Balanced panel of 1,400 students (8,400 observations), split equally between a high-socioeconomic school (School A, n=700) and a low-socioeconomic school (School B, n=700) and observed across 6 consecutive academic periods. The dataset is released under a Public Domain license.

# Variables

**Dependent (outcomes we're measuring)**:

Math, reading, and writing test scores (0-100 scale)

**Independent**:

**Treatment**:

**Time period**: Pre-COVID in-person (periods 0-2) vs. Post-COVID online (periods 3-5)

**Student Background**:

**School type**: Wealthy (School A) vs. Poor (School B)

Family income, free lunch status

Parent education levels (0=no HS diploma to 4=PhD)

Grade level, gender, COVID infection status

**Technology Access**:

Number of home computers (0-5)

Family size

Confounders are variables that affect both the treatment and the outcomes, creating spurious associations between them.

**Key Confounders**:

**Wealth disparities**: Wealthier families had tutors, quiet study spaces, and parental support during remote learning, which makes COVID's impact look smaller for wealthy students

**Technology gap**: Students without computers couldn't participate effectively in online learning, so device access directly affected both exposure to online learning and performance

**Hidden ability**: High-performing students may have adapted better to disruptions regardless of school resources

**How We Control Confounders**:

Compare each student to themselves (fixed effects) to remove unchanging differences

Include school × time interactions to measure whether poor schools were hit harder

Control for number of computers directly in the model
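The three controls above can be combined in one within-student (fixed effects) regression. The following is a minimal sketch on a hypothetical, noise-free toy panel, not the model fitted on the real data. It assumes `school == 1` marks the low-SES school and defines `post` as periods 3-5; both are assumptions about coding. Because the number of computers does not change within a student in this toy panel, its level drops out under the within transformation, so the sketch enters it as an interaction with `post`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 4 students x 6 periods, noise-free so the
# coefficients can be checked by hand. Column names follow the dataset.
students = [
    # (studentID, school, numcomputers, baseline score)
    (1, 0, 3, 80.0),
    (2, 0, 1, 75.0),
    (3, 1, 2, 70.0),
    (4, 1, 0, 65.0),
]
rows = []
for sid, school, ncomp, base in students:
    for t in range(6):
        post = int(t >= 3)  # assumed treatment indicator: periods 3-5
        # Assumed effects: -5 for everyone, extra -4 when school == 1,
        # +1 per home computer during online periods
        score = base + post * (-5.0 - 4.0 * school + 1.0 * ncomp)
        rows.append((sid, school, ncomp, t, post, score))

toy = pd.DataFrame(
    rows,
    columns=["studentID", "school", "numcomputers", "timeperiod", "post", "mathscore"],
)
toy["post_x_school"] = toy["post"] * toy["school"]
toy["post_x_computers"] = toy["post"] * toy["numcomputers"]

regressors = ["post", "post_x_school", "post_x_computers"]
cols = ["mathscore"] + regressors

# Within transformation: subtract each student's own mean, which removes
# every time-invariant student characteristic (observed or not)
demeaned = toy[cols] - toy.groupby("studentID")[cols].transform("mean")

# Ordinary least squares on the demeaned data
X = demeaned[regressors].to_numpy()
y = demeaned["mathscore"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fe_estimates = dict(zip(regressors, coef))
print(fe_estimates)
```

On real data the point estimates would come with standard errors, ideally clustered by student; this sketch only illustrates how demeaning removes the unchanging differences between students.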

# Hypothesis

If COVID-19 school lockdowns occur (independent variable), then average student test scores (dependent variable) will decrease, especially among low-SES students with limited technology access.
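One way to make this hypothesis testable is to write it as a regression specification. The form below is a sketch under stated assumptions rather than the definitive model: the symbols are illustrative, with $Y_{it}$ the test score of student $i$ in period $t$, $\alpha_i$ a student fixed effect, $\text{Post}_t = 1$ for periods 3-5, $\text{LowSES}_i = 1$ for School B, and $\text{Computers}_i$ the number of home computers.

```latex
$$
Y_{it} = \alpha_i
       + \beta \,\text{Post}_t
       + \delta \,(\text{Post}_t \times \text{LowSES}_i)
       + \gamma \,(\text{Post}_t \times \text{Computers}_i)
       + \varepsilon_{it}
$$
```

Under this specification the hypothesis corresponds to $\beta < 0$ (scores fall during lockdown), $\delta < 0$ (a larger drop in the low-SES school), and $\gamma > 0$ (more computers soften the drop).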

# Data-collection plan

**Source & permission**: Simulated panel dataset generated using 1,500+ lines of Stata code for graduate econometrics coursework, released under Public Domain license. Dataset models realistic COVID-19 educational impacts based on Portland, Oregon demographics and documented pandemic research findings.

**Acquisition method**: Load Excel file directly in Python/Colab using pandas (pd.read_excel()).

**Representativeness**: Dataset incorporates Portland-area income distributions, family structures, documented COVID-19 learning loss patterns, and realistic correlations between socioeconomic status and academic performance. The balanced panel design (all 1,400 students observed 6 times) means no student drops out of the sample, so attrition bias does not arise.

**Handling artifacts**: Control for time-invariant student characteristics using fixed effects; include school-by-time and technology-by-time interactions to capture heterogeneous treatment effects; verify time-invariant demographic variables remain constant within students.
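The verification step mentioned above can be sketched as a small helper. The function name `check_panel` and the choice of `fixed_cols` are hypothetical. The toy frame is made up and deliberately contains one inconsistent `gender` value, to show what a failed check looks like.

```python
import pandas as pd

def check_panel(df, id_col="studentID", time_col="timeperiod",
                fixed_cols=("school", "gender")):
    """Return basic integrity checks for a student-by-period panel."""
    periods_per_student = df.groupby(id_col)[time_col].nunique()
    n_periods = df[time_col].nunique()
    # A column that should be time-invariant varies if any student
    # has more than one distinct value for it
    varying = df.groupby(id_col)[list(fixed_cols)].nunique().gt(1).any()
    return {
        "balanced": bool((periods_per_student == n_periods).all()),
        "no_duplicates": not df.duplicated([id_col, time_col]).any(),
        "no_missing": not df.isna().any().any(),
        "varying_fixed_cols": varying[varying].index.tolist(),
    }

# Hypothetical toy data: 2 students x 3 periods
toy = pd.DataFrame({
    "studentID":  [1, 1, 1, 2, 2, 2],
    "timeperiod": [0, 1, 2, 0, 1, 2],
    "school":     [0, 0, 0, 1, 1, 1],
    "gender":     [1, 1, 1, 0, 0, 1],  # student 2 is inconsistent on purpose
})
report = check_panel(toy)
print(report)
```

On the real data the same call would take the loaded frame and a longer `fixed_cols` tuple (for example the parental education columns), assuming those are meant to be constant within a student.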

# Dataset choice & documentation

**What it is**: 8,400 observations (1,400 students × 6 time periods) with 18 variables capturing academic performance (reading, writing, math at school and state levels), demographics (student ID, school type, grade, gender), socioeconomic factors (household income, free lunch, parental education), technology access (number of computers, family size), and COVID exposure. Balanced panel with zero missing values. Initial analysis reveals substantial learning losses of approximately 8.0 points across all subjects during online learning periods (periods 3-5).
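A raw pre/post gap of this kind is a difference in mean scores between the online periods (3-5) and the in-person periods (0-2). The sketch below shows the computation on made-up numbers. The values are hypothetical and do not reproduce the figure quoted above, and the `post` column is an assumed helper that is not in the dataset.

```python
import pandas as pd

score_cols = ["readingscore", "writingscore", "mathscore"]

# Hypothetical scores for two students, one pre and one post period each
toy = pd.DataFrame({
    "timeperiod":   [2, 3, 2, 3],
    "readingscore": [80.0, 72.0, 60.0, 58.0],
    "writingscore": [78.0, 70.0, 62.0, 56.0],
    "mathscore":    [82.0, 75.0, 58.0, 53.0],
})

# Assumed treatment indicator: 1 for online periods (3-5), 0 otherwise
toy["post"] = (toy["timeperiod"] >= 3).astype(int)

# Mean score per subject before and after, then the raw difference
means = toy.groupby("post")[score_cols].mean()
raw_gap = means.loc[1] - means.loc[0]
print(raw_gap)
```

This raw difference is descriptive only; it does not adjust for student characteristics or differences between schools.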

**Why interesting**: Provides a simulated natural-experiment design for studying pandemic educational impacts, with realistic demographic heterogeneity and variation in technology access, both crucial for policy-relevant analysis. The 6-period structure enables difference-in-differences estimation to isolate the causal effects of COVID-19 lockdowns while controlling for confounding factors. Skills developed include panel data econometrics, causal inference, and policy evaluation, all directly applicable to education policy research and program evaluation.
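In its simplest 2x2 form, a difference-in-differences estimate is the pre-to-post change in one group minus the pre-to-post change in the other. Below is a minimal sketch on hypothetical numbers. The helper name `did_from_means`, the `post` column, and the assumption that `school == 1` is the low-SES school are all illustrative. Because both schools moved online, this contrast captures how much more (or less) scores changed in one school than in the other, not the overall lockdown effect.

```python
import pandas as pd

def did_from_means(df, outcome, group_col="school", post_col="post"):
    """2x2 difference-in-differences from cell means:
    (post - pre change in group 1) minus (post - pre change in group 0)."""
    cell_means = df.groupby([group_col, post_col])[outcome].mean().unstack(post_col)
    change = cell_means[1] - cell_means[0]  # post minus pre, per group
    return float(change.loc[1] - change.loc[0])

# Hypothetical scores: two observations per school-by-period cell
toy = pd.DataFrame({
    "school":    [0, 0, 0, 0, 1, 1, 1, 1],
    "post":      [0, 0, 1, 1, 0, 0, 1, 1],
    "mathscore": [80.0, 84.0, 76.0, 78.0, 70.0, 66.0, 58.0, 60.0],
})
did = did_from_means(toy, "mathscore")
print(did)  # school 1 changes by -9, school 0 by -5, so the estimate is -4.0
```

The estimate is only interpretable as a differential effect if the two schools would have followed parallel trends in the absence of the lockdown, which the pre-COVID periods (0-2) can help assess.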

**Where from**: Simulated dataset created specifically for graduate applied econometrics coursework. File: COVID-19-Constructed-Dataset-(PANEL).xlsx (882KB, Public Domain).

**When**: Dataset simulates 2019-2021 academic periods with periods 0-2 representing pre-COVID (2019-early 2020) and periods 3-5 representing post-COVID lockdown phases (2020-2021). Generated in 2023-2024 for econometric analysis when actual school district data was unavailable due to ongoing institutional research.

In [None]:
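import io

import pandas as pd
from google.colab import files
from IPython.display import display

# Upload the Excel file from the local machine (Colab only)
uploaded = files.upload()

# Read the uploaded bytes directly. A hard-coded filename can load a stale
# copy, because Colab renames a repeat upload (e.g. with a " (1)" suffix).
name, data = next(iter(uploaded.items()))
df = pd.read_excel(io.BytesIO(data))

# First 5 entries
print("=== First 5 Data Entries ===")
display(df.head(5))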



Saving COVID-19-Constructed-Dataset-(PANEL).xlsx to COVID-19-Constructed-Dataset-(PANEL) (1).xlsx
=== First 5 Data Entries ===


Unnamed: 0,studentID,school,gradelevel,gender,covidpos,householdincome,freelunch,numcomputers,familysize,fathereduc,mothereduc,readingscore,writingscore,mathscore,readingscoreSL,writingscoreSL,mathscoreSL,timeperiod
0,1,0,6,1,1,59065.136719,0,5,3,1,0,73.3936,68.84729,86.905823,84.65097,83.348419,71.108353,2
1,1,0,6,1,1,59065.136719,0,5,3,1,0,62.566071,73.258034,56.995117,77.571396,74.59404,57.717148,4
2,1,0,6,1,1,59065.136719,0,5,3,1,0,79.96563,67.070084,56.612415,87.659691,80.292519,85.021355,1
3,1,0,6,1,1,59065.136719,0,5,3,1,0,68.552406,55.633102,73.727753,69.650352,48.293591,86.596375,3
4,1,0,6,1,1,59065.136719,0,5,3,1,0,82.541451,87.166336,65.315819,68.989784,85.802025,65.637871,0
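The preview shows two things worth tidying before any panel analysis: a leftover `Unnamed: 0` index column, and rows for the same student that are not in period order (2, 4, 1, 3, 0). A minimal sketch of that cleanup follows. The helper name `tidy_panel` is hypothetical, and the small frame copies only a few columns from the preview rows above.

```python
import pandas as pd

def tidy_panel(df):
    """Drop the leftover index column and order rows by student and period."""
    out = df.drop(columns=["Unnamed: 0"], errors="ignore")
    return out.sort_values(["studentID", "timeperiod"]).reset_index(drop=True)

# Subset of the preview rows shown above (student 1 only)
preview = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "studentID":  [1, 1, 1, 1, 1],
    "timeperiod": [2, 4, 1, 3, 0],
    "mathscore":  [86.905823, 56.995117, 56.612415, 73.727753, 65.315819],
})
tidy = tidy_panel(preview)
print(tidy)
```

Sorting matters for any later step that relies on row order within a student, such as lags or first differences.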
