## OSPI Enrollment 2023-24 Dataset

The Washington Office of Superintendent of Public Instruction (OSPI) maintains a public data repository. In my previous role as the tuition and financial aid director for a private high school, my team and I used OSPI data to understand overall student enrollment in Washington State and to guide our strategic enrollment goals and long-term budgeting.


In [1]:
# Import necessary libraries for assignment
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the OSPI Enrollment 2023-24 file into a DataFrame
try:
    df = pd.read_csv('Report_Card_Enrollment_2023-24_School_Year_20241028.csv')
except FileNotFoundError:
    # Later cells depend on df, so report the problem and stop here
    print('File not found.')
    raise

In [2]:
# Analyze the dataset to create baseline understanding

print('Display the first 5 rows of the OSPI dataframe')
print(df.head(5)) # Display the first 5 rows of the Dataframe
print("*"*75)

print('Dataframe info:')
df.info() # Prints info about the data (shape & column info); returns None, so no print() wrapper
print("*"*75)


print('Missing data:')
print(df.isnull().sum()) # Display columns with missing data


Display the first 5 rows of the OSPI dataframe
  SchoolYear OrganizationLevel County                           ESDName  \
0    2023-24          District  Adams  Educational Service District 101   
1    2023-24          District  Adams  Educational Service District 101   
2    2023-24          District  Adams  Educational Service District 101   
3    2023-24          District  Adams  Educational Service District 101   
4    2023-24          District  Adams  Educational Service District 101   

   ESDOrganizationID  DistrictCode               DistrictName  \
0           100001.0        1109.0  Washtucna School District   
1           100001.0        1109.0  Washtucna School District   
2           100001.0        1109.0  Washtucna School District   
3           100001.0        1109.0  Washtucna School District   
4           100001.0        1109.0  Washtucna School District   

   DistrictOrganizationId  SchoolCode      SchoolName  ...  Non-Foster Care  \
0                100287.0       

### Business Questions for Analysis

- How has enrollment trended since 2020, the post-COVID era (2020-2021, 2021-2022, 2022-2023, 2023-2024)?
    - by class (grade level)
- Can we see a heatmap of enrollment increases and decreases across school districts?
- Isolate Spokane-area schools and repeat the analysis above.
    - This will require a school-to-ZIP-code dataset to cross-reference.
    - can we see private school enrollment? (NWESD101 is private school district)
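
One way to prepare the district-level heatmap asked for above is to pivot total enrollment by district and school year, then compute the percent change between consecutive years. The sketch below is only an illustration: it uses a small invented frame that borrows the OSPI column names (`DistrictName`, `SchoolYear`, `All Students`), and the district names and counts are made up, not OSPI data.

```python
import pandas as pd

# Toy stand-in for a merged multi-year OSPI frame (all values are invented).
toy = pd.DataFrame({
    "DistrictName": ["District A", "District A", "District B", "District B"],
    "SchoolYear": ["2022-23", "2023-24", "2022-23", "2023-24"],
    "All Students": [100, 110, 200, 190],
})

# Rows = districts, columns = school years, cells = total enrollment.
by_district = toy.pivot_table(
    values="All Students",
    index="DistrictName",
    columns="SchoolYear",
    aggfunc="sum",
)

# Percent change from each year to the next, computed across the columns.
# The first year has no prior year, so its column is NaN.
pct_change = by_district.diff(axis=1) / by_district.shift(axis=1) * 100
print(pct_change.round(1))
```

A table shaped like `pct_change` could then be passed to `sns.heatmap`, for example with `center=0` so that increases and decreases get different colors.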

<mark>Read in prior-year datasets and merge them into one larger set.</mark>
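
A minimal sketch of that merge step, assuming each prior-year file shares the 2023-24 column names (this has not been verified against the actual OSPI files). To keep the example runnable on its own, two tiny in-memory CSVs with invented values stand in for the real yearly exports.

```python
import io
import pandas as pd

# In-memory stand-ins for two yearly CSV exports (contents are invented).
# With real files, replace each StringIO with the path to that year's CSV.
yearly_sources = {
    "2022-23": io.StringIO(
        "SchoolYear,GradeLevel,All Students\n2022-23,1st Grade,10\n"
    ),
    "2023-24": io.StringIO(
        "SchoolYear,GradeLevel,All Students\n2023-24,1st Grade,12\n"
    ),
}

# Read each year into its own DataFrame.
frames = [pd.read_csv(source) for source in yearly_sources.values()]

# Stack the yearly frames; ignore_index gives the result a fresh 0..n-1 index.
merged_df = pd.concat(frames, ignore_index=True)
print(merged_df)
```

Because every row already carries its own `SchoolYear`, stacking with `pd.concat` keeps the years distinguishable without adding a key column.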

In [3]:
# Remove rows with missing or duplicated values from dataframe

cleaned_df = df.dropna()
cleaned_df = cleaned_df.drop_duplicates() 
cleaned_df

Unnamed: 0,SchoolYear,OrganizationLevel,County,ESDName,ESDOrganizationID,DistrictCode,DistrictName,DistrictOrganizationId,SchoolCode,SchoolName,...,Non-Foster Care,Non-Highly Capable,Non-Homeless,Non-Low Income,Non Migrant,Non Military Parent,Non Mobile,Non Section 504,Students without Disabilities,DataAsOf
4459,2023-24,School,Adams,Educational Service District 101,100001.0,1109.0,Washtucna School District,100287.0,3075.0,Washtucna Elementary/High School,...,0,5,5,0,5,5,5,5,5,06/18/2024 12:00:00 AM
4460,2023-24,School,Adams,Educational Service District 101,100001.0,1109.0,Washtucna School District,100287.0,3075.0,Washtucna Elementary/High School,...,0,9,9,2,9,9,8,9,8,06/18/2024 12:00:00 AM
4461,2023-24,School,Adams,Educational Service District 101,100001.0,1109.0,Washtucna School District,100287.0,3075.0,Washtucna Elementary/High School,...,0,6,6,2,6,6,6,4,6,06/18/2024 12:00:00 AM
4462,2023-24,School,Adams,Educational Service District 101,100001.0,1109.0,Washtucna School District,100287.0,3075.0,Washtucna Elementary/High School,...,0,3,3,0,3,3,3,3,3,06/18/2024 12:00:00 AM
4463,2023-24,School,Adams,Educational Service District 101,100001.0,1109.0,Washtucna School District,100287.0,3075.0,Washtucna Elementary/High School,...,0,5,5,1,5,5,5,5,3,06/18/2024 12:00:00 AM
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20544,2023-24,School,Yakima,Educational Service District 105,100002.0,39209.0,Mount Adams School District,100155.0,2532.0,White Swan High School,...,0,54,39,0,40,58,53,55,49,06/18/2024 12:00:00 AM
20545,2023-24,School,Yakima,Educational Service District 105,100002.0,39209.0,Mount Adams School District,100155.0,2532.0,White Swan High School,...,0,66,58,0,57,68,68,64,60,06/18/2024 12:00:00 AM
20546,2023-24,School,Yakima,Educational Service District 105,100002.0,39209.0,Mount Adams School District,100155.0,2532.0,White Swan High School,...,0,76,56,0,60,80,75,74,69,06/18/2024 12:00:00 AM
20547,2023-24,School,Yakima,Educational Service District 105,100002.0,39209.0,Mount Adams School District,100155.0,2532.0,White Swan High School,...,0,70,51,0,59,70,66,67,52,06/18/2024 12:00:00 AM
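
Note that `dropna()` with no arguments removes a row if any column is missing. In the output above only `School` rows remain, while `df.head()` showed `District` rows, which suggests the district-level rows were dropped because their school-specific columns are empty. If that is not the intent, one alternative is to require values only in the columns the analysis actually uses. The sketch below shows the difference on an invented three-row frame; the choice of `subset` columns is an assumption.

```python
import numpy as np
import pandas as pd

# Invented rows that mimic the pattern: the district row has no SchoolCode.
toy = pd.DataFrame({
    "OrganizationLevel": ["District", "School", "School"],
    "SchoolCode": [np.nan, 3075.0, 2532.0],
    "GradeLevel": ["1st Grade", "1st Grade", None],
    "All Students": [20, 5, 54],
})

# Blanket dropna: any missing value removes the row, including the district row.
strict = toy.dropna()

# Targeted dropna: only rows missing a value the analysis needs are removed.
targeted = toy.dropna(subset=["GradeLevel", "All Students"])

print(len(strict), len(targeted))  # prints: 1 2
```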


## Enrollment Trends

Using the merged datasets from above, produce a line plot of enrollment change from 2020 to the present.
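
A sketch of what that line plot could look like once several years are loaded. The frame below is invented (three school years, two rows each) and only borrows the `SchoolYear` and `All Students` column names; it is not OSPI data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented multi-year data standing in for the merged OSPI frame.
toy = pd.DataFrame({
    "SchoolYear": ["2020-21", "2020-21", "2021-22", "2021-22", "2022-23", "2022-23"],
    "All Students": [100, 110, 105, 115, 108, 120],
})

# Total enrollment per school year; groupby sorts the year labels.
totals = toy.groupby("SchoolYear")["All Students"].sum()

# One line, with a marker at each school year.
fig, ax = plt.subplots()
ax.plot(list(totals.index), totals.values, marker="o")
ax.set_xlabel("School year")
ax.set_ylabel("Total enrollment")
ax.set_title("Enrollment by school year (toy data)")
```

In a notebook the figure renders at the end of the cell; in a script, `fig.savefig(...)` or `plt.show()` would be needed.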

In [8]:
enrollment_data = df.pivot_table(values="All Students", index="GradeLevel", columns="SchoolYear", aggfunc="sum")

# Keep the pivot as a DataFrame for later use; format with thousands separators for display only
enrollment_data.style.format("{:,.0f}")

SchoolYear,2023-24
GradeLevel,Unnamed: 1_level_1
10th Grade,265338
11th Grade,263905
12th Grade,277827
1st Grade,233226
2nd Grade,245417
3rd Grade,234714
4th Grade,243331
5th Grade,245036
6th Grade,241771
7th Grade,244126
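
A caution on these totals: the file contains rows at more than one `OrganizationLevel` (both `District` and `School` appear in the outputs above), so summing `All Students` over every row may count the same students more than once. A hedged way to check is to filter to a single level before pivoting. The toy frame below is invented, and it assumes that a district row equals the sum of its school rows.

```python
import pandas as pd

# Invented rows: one district whose total equals the sum of its two schools.
toy = pd.DataFrame({
    "OrganizationLevel": ["District", "School", "School"],
    "SchoolYear": ["2023-24", "2023-24", "2023-24"],
    "GradeLevel": ["1st Grade", "1st Grade", "1st Grade"],
    "All Students": [30, 10, 20],
})

def grade_totals(frame):
    """Sum All Students by grade level and school year."""
    return frame.pivot_table(
        values="All Students",
        index="GradeLevel",
        columns="SchoolYear",
        aggfunc="sum",
    )

# Pivoting every row adds the district total on top of the school totals.
all_rows = grade_totals(toy)

# Restricting to one organization level counts each student once.
school_only = grade_totals(toy[toy["OrganizationLevel"] == "School"])

print(all_rows.loc["1st Grade", "2023-24"], school_only.loc["1st Grade", "2023-24"])
```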


In [21]:
print('Dataframe objects:')
print(cleaned_df.columns) # Display the columns for df.drop call

Dataframe objects:
Index(['SchoolYear', 'OrganizationLevel', 'County', 'ESDName',
       'ESDOrganizationID', 'DistrictCode', 'DistrictName',
       'DistrictOrganizationId', 'SchoolCode', 'SchoolName',
       'SchoolOrganizationID', 'CurrentSchoolType', 'GradeLevel',
       'All Students', 'Female', 'Gender X', 'Male',
       'American Indian/ Alaskan Native', 'Asian', 'Black/ African American',
       'Hispanic/ Latino of any race(s)',
       'Native Hawaiian/ Other Pacific Islander', 'Two or More Races', 'White',
       'English Language Learners', 'Foster Care', 'Highly Capable',
       'Homeless', 'Low-Income', 'Migrant', 'Military Parent', 'Mobile',
       'Section 504', 'Students with Disabilities',
       'Non-English Language Learners', 'Non-Foster Care',
       'Non-Highly Capable', 'Non-Homeless', 'Non-Low Income', 'Non Migrant',
       'Non Military Parent', 'Non Mobile', 'Non Section 504',
       'Students without Disabilities', 'DataAsOf'],
      dtype='object')


In [None]:
# NOTE: this list currently names every column; trim it to only the columns to remove
trunc_df = cleaned_df.drop(columns=['SchoolYear', 'OrganizationLevel', 'County', 'ESDName',
       'ESDOrganizationID', 'DistrictCode', 'DistrictName',
       'DistrictOrganizationId', 'SchoolCode', 'SchoolName',
       'SchoolOrganizationID', 'CurrentSchoolType', 'GradeLevel',
       'All Students', 'Female', 'Gender X', 'Male',
       'American Indian/ Alaskan Native', 'Asian', 'Black/ African American',
       'Hispanic/ Latino of any race(s)',
       'Native Hawaiian/ Other Pacific Islander', 'Two or More Races', 'White',
       'English Language Learners', 'Foster Care', 'Highly Capable',
       'Homeless', 'Low-Income', 'Migrant', 'Military Parent', 'Mobile',
       'Section 504', 'Students with Disabilities',
       'Non-English Language Learners', 'Non-Foster Care',
       'Non-Highly Capable', 'Non-Homeless', 'Non-Low Income', 'Non Migrant',
       'Non Military Parent', 'Non Mobile', 'Non Section 504',
       'Students without Disabilities', 'DataAsOf'])

In [None]:
# Display statistical summary of data
print('Statistical summary of dataframe')
print(df.describe())


Statistical summary of dataframe
       ESDOrganizationID  DistrictCode  DistrictOrganizationId    SchoolCode  \
count       20189.000000  20549.000000            20549.000000  16090.000000   
mean       100070.693645  22153.483819           100287.240547   3584.890491   
std           613.206326  10424.243415              862.908784   1172.673582   
min        100001.000000   1109.000000           100001.000000   1502.000000   
25%        100003.000000  17001.000000           100084.000000   2659.000000   
50%        100006.000000  21237.000000           100161.000000   3424.000000   
75%        100007.000000  31103.000000           100236.000000   4423.750000   
max        105886.000000  39801.000000           106989.000000   5961.000000   

       SchoolOrganizationID  All Students         Female      Gender X  \
count          16090.000000  2.056500e+04   20565.000000  20565.000000   
mean          102649.440273  3.214552e+02     154.326088      1.356674   
std             1787.442

In [None]:
# Data Definition
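# Assumed intent: total Female, Male, and All Students counts per school.
# ospi_df was never defined, so the loaded df is used; 'school' is not a column, so SchoolName is the index.
pd.pivot_table(df, values=['Female', 'Male', 'All Students'], index='SchoolName', aggfunc='sum')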

<mark>Type Analysis Here</mark>


