# Small Area Income and Poverty Estimates (SAIPE): School Districts

This data set is a time series, so we do not have to query for
a specific year as we do with survey data such as the ACS.
Instead, a single download returns multiple years of data at
once.

See https://www.census.gov/data/developers/data-sets/Poverty-Statistics.html.

In [1]:
# So we can run from within the censusdis project and find the packages we need.
import os
import sys

sys.path.append(
    os.path.join(
        os.path.abspath(os.path.join(os.path.curdir, os.path.pardir, os.path.pardir))
    )
)

In [2]:
import censusdis.data as ced

from censusdis.states import STATE_NJ

In [3]:
DATASET = "timeseries/poverty/saipe/schdist"

In [4]:
df_variables = ced.variables.all_variables(DATASET, "timeseries", None)

In [5]:
df_variables

Unnamed: 0,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,timeseries,timeseries/poverty/saipe/schdist,,GEOCAT,"Summary Level (950 Elementary, 960 Secondary, ...",,"{'980': 'Administrative', '970': 'Unified', '9..."
1,timeseries,timeseries/poverty/saipe/schdist,,GEOID,Combined codes for the reference geography,,
2,timeseries,timeseries/poverty/saipe/schdist,,GRADE,Grade Range of District,,
3,timeseries,timeseries/poverty/saipe/schdist,,LEAID,School District ID,,
4,timeseries,timeseries/poverty/saipe/schdist,,SAEPOV5_17RV_PT,Relevant Age 5 to 17 in Families in Poverty,,{'0': 'Relevant Age 5 to 17 in Families in Pov...
5,timeseries,timeseries/poverty/saipe/schdist,,SAEPOV5_17V_PT,Relevant Age 5 to 17 Population,,{'0': 'Relevant Age 5 to 17 Population'}
6,timeseries,timeseries/poverty/saipe/schdist,,SAEPOVALL_PT,Total Population,,{'0': 'Total Population'}
7,timeseries,timeseries/poverty/saipe/schdist,,SAEPOVRAT5_17RV_PT,Relevant Age 5 to 17 Poverty Ratio Estimate,,
8,timeseries,timeseries/poverty/saipe/schdist,,SD_NAME,District Name,,
9,timeseries,timeseries/poverty/saipe/schdist,,STATE,State Fips Code,,


In [6]:
variables = ced.variables.group_leaves(DATASET, "timeseries", None)

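# Drop "time" and "STATE" from the list of variables to request.
# STATE still appears in the result as a geography column.
variables = [v for v in variables if v not in ("time", "STATE")]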

variables

['GEOCAT',
 'GEOID',
 'GRADE',
 'LEAID',
 'SAEPOV5_17RV_PT',
 'SAEPOV5_17V_PT',
 'SAEPOVALL_PT',
 'SAEPOVRAT5_17RV_PT',
 'SD_NAME',
 'YEAR']

In [7]:
df_ts = ced.download(
    DATASET,
    "timeseries",
    variables,
    state=STATE_NJ,
    school_district_unified="*",
)

df_ts

Unnamed: 0,STATE,SCHOOL_DISTRICT_UNIFIED,GEOCAT,GEOID,GRADE,LEAID,SAEPOV5_17RV_PT,SAEPOV5_17V_PT,SAEPOVALL_PT,SAEPOVRAT5_17RV_PT,SD_NAME,YEAR
0,34,00004,970,3400004,KG-12,00004,143,3079,18488,4.6,SCH DIST OF THE CHATHAMS,1995
1,34,00008,970,3400008,KG-12,00008,54,1405,6852,3.8,GREAT MEADOWS REGIONAL,1995
2,34,00009,970,3400009,01-12,00009,14,1773,10525,0.8,SOMERSET HILLS REGIONAL,1995
3,34,00780,970,3400780,01-12,00780,2,125,809,1.6,ALLENHURST,1995
4,34,00930,970,3400930,PK-12,00930,1744,3372,17913,51.7,ASBURY PARK CITY,1995
...,...,...,...,...,...,...,...,...,...,...,...,...
7259,34,18150,970,3418150,KG-12,18150,376,1684,10035,22.3,Woodbury City School District,2021
7260,34,18270,970,3418270,KG-12,18270,183,622,2904,29.4,Woodlynne Borough School District,2021
7261,34,18300,970,3418300,KG-12,18300,98,1568,10117,6.3,Wood-Ridge Borough School District,2021
7262,34,18330,970,3418330,KG-12,18330,126,1390,7886,9.1,Woodstown-Pilesgrove Regional School District,2021
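
The variable labels suggest that `SAEPOVRAT5_17RV_PT` is the number of relevant children in poverty (`SAEPOV5_17RV_PT`) expressed as a percentage of the relevant age 5 to 17 population (`SAEPOV5_17V_PT`). That relationship is an assumption drawn from the labels and not something this notebook confirms from the SAIPE documentation. A minimal sketch that checks it against a few rows copied from the output above, using a hypothetical helper `poverty_ratio`:

```python
# Rows copied by hand from the download output above:
# (SD_NAME, SAEPOV5_17RV_PT, SAEPOV5_17V_PT, SAEPOVRAT5_17RV_PT)
rows = [
    ("SCH DIST OF THE CHATHAMS", 143, 3079, 4.6),
    ("GREAT MEADOWS REGIONAL", 54, 1405, 3.8),
    ("ASBURY PARK CITY", 1744, 3372, 51.7),
    ("Woodbury City School District", 376, 1684, 22.3),
]


def poverty_ratio(in_poverty, population):
    """Children in poverty as a percentage of the population, to one decimal.

    Hypothetical helper: assumes the reported ratio is a simple percentage.
    """
    return round(100 * in_poverty / population, 1)


for name, in_poverty, population, reported in rows:
    computed = poverty_ratio(in_poverty, population)
    print(f"{name}: computed {computed}, reported {reported}")
```

For these four rows the computed and reported values agree to one decimal place. Rows that fall exactly on a rounding boundary may differ by 0.1 depending on the rounding rule used.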


## Data Evolution over Time

Notice how the number of rows changes from year to year. We could dig much deeper into
why, but in short, school districts came into existence, merged with others, or failed to
report data in certain years. For some years, such as 1996 and 1998, no data was collected
at all. This kind of change over time is common in time series data.

In [8]:
df_ts.groupby("YEAR")["SCHOOL_DISTRICT_UNIFIED"].count()

YEAR
1995    230
1997    234
1999    344
2000    344
2001    238
2002    238
2003    238
2004    238
2005    238
2006    238
2007    238
2008    238
2009    232
2010    232
2011    340
2012    340
2013    339
2014    339
2015    341
2016    341
2017    341
2018    341
2019    341
2020    341
2021    340
Name: SCHOOL_DISTRICT_UNIFIED, dtype: int64
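
Gaps such as 1996 and 1998 are easy to miss when scanning the counts by eye. A minimal sketch of one way to list them, using a hypothetical helper `missing_years`. The years are typed in from the output above so that the example runs on its own.

```python
def missing_years(years):
    """Return the years absent between the first and last observed year.

    Hypothetical helper: accepts ints or numeric strings, in any order.
    """
    present = {int(y) for y in years}
    if not present:
        return []
    return [y for y in range(min(present), max(present) + 1) if y not in present]


# Years that appear in the groupby output above: 1995, 1997, then 1999 to 2021.
observed_years = [1995, 1997] + list(range(1999, 2022))

print(missing_years(observed_years))  # prints [1996, 1998]
```

In the notebook itself, the same helper could be applied to the downloaded data with `missing_years(df_ts["YEAR"].unique())`.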