# 2018 CAASPP Scores

Welcome to my exploration of test scores from the 2018 CAASPP testing season! Since I, too, had to take this assessment at some point in high school, I want to use the data to see how scores compare between schools, specifically between public and charter schools. Hopefully, this will tell us something about how students perform in schools with differently structured education systems.

I will be referring to certain columns in the data that aren't self-explanatory, so the definitions of each column can be found [here](https://caaspp.cde.ca.gov/sb2018/research_fixfileformat18).

Since we'll be working with dataframes, we need to import the pandas library in order to create and manipulate dataframes based on our csv files.
We're also going to be looking at statistics of the scores, so we'll need the numpy library to determine statistical values.

In [1]:
import pandas as pd
import numpy as np

Now we can open our csv files as dataframes.

In [2]:
# Both files are comma-delimited; the entities file needs latin-1 decoding.
file = pd.read_csv("2018/comma_delimited/csv.txt", delimiter=",")
file_info = pd.read_csv("2018/comma_delimited/csv_entities.txt", delimiter=",", encoding='latin-1')
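
The cleaning step further down filters out rows where `Students Tested` is `'*'`, which I understand to be the marker for suppressed results. As a rough sketch of an alternative, `read_csv` can treat that marker as missing at load time through its `na_values` parameter, so the score columns parse as numbers from the start. The in-memory CSV below is a toy stand-in with only three columns, not the real file.

```python
import io

import pandas as pd

# Toy stand-in for the CAASPP file: same column names, only two rows.
toy_csv = io.StringIO(
    "School Code,Students Tested,Mean Scale Score\n"
    "100016,22,2432.6\n"
    "100024,*,*\n"
)

# na_values="*" converts the marker to NaN while parsing, so the
# numeric columns are read as floats instead of strings.
toy = pd.read_csv(toy_csv, na_values="*")

# Rows that contained the marker now hold NaN and can be dropped as usual.
toy = toy.dropna()
print(toy.dtypes)
```

With this approach the later `.astype(float)` calls would no longer be needed, though I have not rerun the full notebook this way.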

Since our data has a lot of extraneous information that we don't need, we're going to do some cleaning of the data by removing unnecessary columns and sorting by relevant attributes - School Code (in order to merge the data and its entities), Grade, and Test ID (type of assessment).

One thing you will notice is that I also removed the Area Percentages. These areas are described [here](https://caaspp.cde.ca.gov/sb2018/UnderstandingCAASPPReports) and break performance down in much finer detail within each school. Since we want to compare overall performance between schools, we will ignore this data.

In [3]:
file_info = file_info.drop('Filler', axis=1)
file_info = file_info.drop(["County Code", "District Code", "Test Year"], axis=1)
file_info = file_info[file_info["School Code"] != 0]
file_info = file_info.dropna()
file_info = file_info.sort_values(by="School Code")
file_info = file_info.reset_index()
file_info.head()

Unnamed: 0,index,School Code,Type Id,County Name,District Name,School Name,Zip Code
0,4473,100016,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
1,934,100024,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
2,6637,100040,7,Sacramento,Galt Joint Union Elementary,Robert L. Mccaffrey Middle,95632
3,5972,100057,7,Plumas,Plumas County Office Of Education,Plumas County Community,95971
4,221,100065,9,Alameda,Oakland Unity High,Oakland Unity High,94605


In [4]:
file = file.drop('Filler', axis=1)
file = file.drop(["County Code", "District Code", "Test Year", "Subgroup ID"], axis=1)
file = file.drop(["Area 1 Percentage Above Standard", "Area 1 Percentage Near Standard", "Area 1 Percentage Below Standard", "Area 2 Percentage Above Standard", "Area 2 Percentage Near Standard", "Area 2 Percentage Below Standard", "Area 3 Percentage Above Standard", "Area 3 Percentage Near Standard", "Area 3 Percentage Below Standard", "Area 4 Percentage Above Standard", "Area 4 Percentage Near Standard", "Area 4 Percentage Below Standard"], axis=1)
file = file.drop(file[file['Students Tested']=='*'].index)
file = file[file["School Code"] != 0]
file = file.dropna()
file = file.sort_values(by=["School Code", "Grade", "Test Id"])
file = file.reset_index()
file.head()

Unnamed: 0,index,School Code,Test Type,Total Tested At Entity Level,Total Tested with Scores,Grade,Test Id,CAASPP Reported Enrollment,Students Tested,Mean Scale Score,Percentage Standard Exceeded,Percentage Standard Met,Percentage Standard Met and Above,Percentage Standard Nearly Met,Percentage Standard Not Met,Students with Scores
0,39065,100016,B,139,139,3,1,22,22,2432.6,9.09,54.55,63.64,13.64,22.73,22
1,39066,100016,B,139,139,3,2,22,22,2412.2,13.64,18.18,31.82,36.36,31.82,22
2,39067,100016,B,139,139,4,1,28,27,2466.8,25.93,22.22,48.15,14.81,37.04,27
3,39068,100016,B,139,139,4,2,28,27,2454.1,3.7,37.04,40.74,25.93,33.33,27
4,39069,100016,B,139,139,5,1,25,25,2461.9,4.0,32.0,36.0,16.0,48.0,25
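
The twelve Area columns dropped above all share the prefix `Area `, so they can also be selected by name pattern instead of being listed one by one. This is only a sketch on a toy frame that uses a few of the real column names; the values in it are placeholders.

```python
import pandas as pd

# Toy frame with a handful of the real column names; values are placeholders.
toy = pd.DataFrame({
    "School Code": [100016],
    "Mean Scale Score": [2432.6],
    "Area 1 Percentage Above Standard": [10.0],
    "Area 1 Percentage Near Standard": [50.0],
    "Area 4 Percentage Below Standard": [40.0],
})

# Collect every column whose name starts with "Area " and drop them together.
area_columns = [col for col in toy.columns if col.startswith("Area ")]
toy = toy.drop(columns=area_columns)
print(list(toy.columns))
```

The same list comprehension should work unchanged on `file`, as long as no column we want to keep also starts with `Area `.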


We now want to separate the entities into public schools (Type Id 7) and charter schools (Type Id 9 and 10) so we can then merge in the scores and label our data.

In [5]:
charter_schools = file_info[file_info["Type Id"] >= 9]
charter_schools.head()

Unnamed: 0,index,School Code,Type Id,County Name,District Name,School Name,Zip Code
0,4473,100016,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
4,221,100065,9,Alameda,Oakland Unity High,Oakland Unity High,94605
8,222,100123,9,Alameda,East Oakland Leadership Academy,East Oakland Leadership Academy,94605
9,2018,100156,10,Kings,Lemoore Union Elementary,Lemoore University Elementary Charter,93245
17,2786,100289,9,Los Angeles,N.E.W. Academy Of Science And Arts,N.E.W. Academy Of Science And Arts,90017


In [6]:
public_schools = file_info[file_info["Type Id"] == 7]
public_schools.head()

Unnamed: 0,index,School Code,Type Id,County Name,District Name,School Name,Zip Code
1,934,100024,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
2,6637,100040,7,Sacramento,Galt Joint Union Elementary,Robert L. Mccaffrey Middle,95632
3,5972,100057,7,Plumas,Plumas County Office Of Education,Plumas County Community,95971
5,2187,100081,7,Los Angeles,Antelope Valley Union High,William J. (Pete) Knight High,93552
6,7212,100107,7,San Bernardino,Ontario-Montclair,Vista Grande Elementary,91762
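
Another way to express the same split is to keep one dataframe and add a label column. The sketch below assumes the same reading of `Type Id` as the filters above (7 as public, 9 and 10 as charter); the `School Type` column and the `type_labels` dictionary are names I made up for illustration.

```python
import pandas as pd

# Three schools taken from the tables above, reduced to two columns.
toy_info = pd.DataFrame({
    "School Code": [100016, 100024, 100156],
    "Type Id": [9, 7, 10],
})

# Hypothetical mapping that mirrors the filters used in this notebook.
type_labels = {7: "Public", 9: "Charter", 10: "Charter"}

# Any Type Id missing from the mapping becomes NaN in the new column.
toy_info["School Type"] = toy_info["Type Id"].map(type_labels)
print(toy_info)
```

A single labeled frame makes later `groupby` comparisons a little shorter, at the cost of carrying rows for both kinds of school around together.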


This is where we merge our data into our new dataframes.

In [7]:
charter_school_data = file.merge(charter_schools, on='School Code').drop(['index_x', 'index_y'], axis=1)
charter_school_data.head()

Unnamed: 0,School Code,Test Type,Total Tested At Entity Level,Total Tested with Scores,Grade,Test Id,CAASPP Reported Enrollment,Students Tested,Mean Scale Score,Percentage Standard Exceeded,Percentage Standard Met,Percentage Standard Met and Above,Percentage Standard Nearly Met,Percentage Standard Not Met,Students with Scores,Type Id,County Name,District Name,School Name,Zip Code
0,100016,B,139,139,3,1,22,22,2432.6,9.09,54.55,63.64,13.64,22.73,22,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
1,100016,B,139,139,3,2,22,22,2412.2,13.64,18.18,31.82,36.36,31.82,22,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
2,100016,B,139,139,4,1,28,27,2466.8,25.93,22.22,48.15,14.81,37.04,27,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
3,100016,B,139,139,4,2,28,27,2454.1,3.7,37.04,40.74,25.93,33.33,27,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638
4,100016,B,139,139,5,1,25,25,2461.9,4.0,32.0,36.0,16.0,48.0,25,9,Madera,Sherman Thomas Charter,Sherman Thomas Charter,93638


In [8]:
public_school_data = file.merge(public_schools, on='School Code').drop(['index_x', 'index_y'], axis=1)
public_school_data.head()

Unnamed: 0,School Code,Test Type,Total Tested At Entity Level,Total Tested with Scores,Grade,Test Id,CAASPP Reported Enrollment,Students Tested,Mean Scale Score,Percentage Standard Exceeded,Percentage Standard Met,Percentage Standard Met and Above,Percentage Standard Nearly Met,Percentage Standard Not Met,Students with Scores,Type Id,County Name,District Name,School Name,Zip Code
0,100024,B,383,383,3,1,99,98,2480.8,50.0,26.53,76.53,17.35,6.12,98,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
1,100024,B,385,385,3,2,99,98,2475.0,41.84,32.65,74.49,17.35,8.16,98,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
2,100024,B,383,383,4,1,128,126,2547.8,63.49,26.98,90.48,5.56,3.97,126,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
3,100024,B,385,385,4,2,128,127,2546.7,48.82,40.16,88.98,8.66,2.36,127,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
4,100024,B,383,383,5,1,161,159,2571.7,46.54,35.85,82.39,11.95,5.66,159,7,El Dorado,Buckeye Union Elementary,Oak Meadow Elementary,95762
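
Since each school should appear only once in the entities table, the merge can be asked to check that. Here is a small sketch using the `validate` argument of `DataFrame.merge`; the two toy frames reuse a few rows from the tables above and drop most columns.

```python
import pandas as pd

# Score rows: several per school (one per grade and test).
scores = pd.DataFrame({
    "School Code": [100016, 100016, 100024],
    "Test Id": [1, 2, 1],
    "Mean Scale Score": [2432.6, 2412.2, 2480.8],
})

# Entity rows: expected to hold exactly one row per school.
schools = pd.DataFrame({
    "School Code": [100016, 100024],
    "Type Id": [9, 7],
})

# validate="many_to_one" raises pandas.errors.MergeError if a School Code
# is duplicated in `schools`, which would otherwise silently duplicate scores.
merged = scores.merge(schools, on="School Code", how="inner", validate="many_to_one")
print(merged)
```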


Now we will look at the mean scores in each subject: English Language Arts/Literacy (Test Id 1) and Mathematics (Test Id 2).

In [9]:
public_school_english = public_school_data[public_school_data["Test Id"] == 1]
public_school_math = public_school_data[public_school_data["Test Id"] == 2]

charter_school_english = charter_school_data[charter_school_data["Test Id"] == 1]
charter_school_math = charter_school_data[charter_school_data["Test Id"] == 2]

In [44]:
print("English Language Arts/Literacy Mean Scores")
print("Public School: " + str(np.mean(public_school_english["Mean Scale Score"].astype(float))))
print("Charter School: " + str(np.mean(charter_school_english["Mean Scale Score"].astype(float))))

English Language Arts/Literacy Mean Scores
Public School: 2487.691795124764
Charter School: 2509.1223082336464


In [43]:
print("Mathematics Mean Scores")
print("Public School: " + str(np.mean(public_school_math["Mean Scale Score"].astype(float))))
print("Charter School: " + str(np.mean(charter_school_math["Mean Scale Score"].astype(float))))

Mathematics Mean Scores
Public School: 2482.1506905623137
Charter School: 2494.133278573107
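
One caveat about the means above: they pool every grade together, and as far as I understand the scale scores are not directly comparable across grades, so a difference in grade mix between public and charter schools may shift the averages. A rough sketch of a per-grade comparison follows. It uses four ELA rows from the tables above, and the `School Type` column is a hypothetical label, not a column in the data.

```python
import pandas as pd

# Four ELA rows taken from the tables above, with a hypothetical label column.
toy = pd.DataFrame({
    "School Type": ["Charter", "Charter", "Public", "Public"],
    "Grade": [3, 4, 3, 4],
    "Mean Scale Score": [2432.6, 2466.8, 2480.8, 2547.8],
})

# Average within each grade and school type, then pivot the school types
# into columns so each grade can be read across one row.
by_grade = (
    toy.groupby(["Grade", "School Type"])["Mean Scale Score"]
    .mean()
    .unstack()
)
print(by_grade)
```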


In [48]:
print("Percentage of Students that Met or Exceeded English Language Arts/Literacy Standards")
print("Public School: " + str(np.mean(public_school_english["Percentage Standard Met and Above"].astype(float))))
print("Charter School: " + str(np.mean(charter_school_english["Percentage Standard Met and Above"].astype(float))))

Percentage of Students that Met or Exceeded English Language Arts/Literacy Standards
Public School: 47.23017223660962
Charter School: 48.493070607553456


In [51]:
print("Percentage of Students that Met or Exceeded Mathematics Standards")
print("Public School: " + str(np.mean(public_school_math["Percentage Standard Met and Above"].astype(float))))
print("Charter School: " + str(np.mean(charter_school_math["Percentage Standard Met and Above"].astype(float))))

Percentage of Students that Met or Exceeded Mathematics Standards
Public School: 38.41453304833945
Charter School: 35.50923726824682
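
The averages above give every school-grade row the same weight, so a grade with 22 students counts as much as one with 159. Below is a minimal sketch of a student-weighted mean using `np.average`, assuming `Students with Scores` is a reasonable weight. The two rows reuse values from the tables above, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Two grade 3 ELA rows from the tables above (one small group, one large).
toy = pd.DataFrame({
    "Mean Scale Score": [2432.6, 2480.8],
    "Students with Scores": [22, 98],
})

# Plain mean: each row counts equally, regardless of how many students it covers.
unweighted = np.mean(toy["Mean Scale Score"])

# Weighted mean: each row counts in proportion to its number of scored students.
weighted = np.average(toy["Mean Scale Score"], weights=toy["Students with Scores"])

print("Unweighted:", unweighted)
print("Weighted:", weighted)
```

The weighted figure is closer to the average score of an individual student, while the unweighted figure describes the average school-grade group. Which one is appropriate depends on the question being asked.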
