# Analysis of NYC public schools results in ELA and math grades 6-8

<span style="color: red;">**If the kernel can't connect to the server again, run this command:**
*netsh winsock reset*</span>

## Prepare data by school districts

### Table of contents

1. [Data sources](#data)
2. [Performance levels: definitions](#levels_definition)
3. [Imports: modules](#modules)
4. [Read data](#read_data)
5. [Calculating middle school (grades 6-8) test results by school district](#MS_charts_district)

<a id="data"></a> 
#### Data:
1. New York City grades 3-8 New York State English Language Arts and Math State Tests results 2013-2023:<br>https://data.cityofnewyork.us/<br>(https://data.cityofnewyork.us/Education/English-Language-Arts-ELA-Test-Results-2013-2023/iebs-5yhr/about_data;<br>https://data.cityofnewyork.us/Education/Math-Test-Results-2013-2023/74kb-55u9/about_data)
<br>New York City grades 3-8 New York State English Language Arts and Math State Tests results 2018-2025: <br>https://infohub.nyced.org/reports/academics/test-results
2. New York City school district boundaries:<br>https://www.nyc.gov/content/planning/pages/resources/datasets/school-districts

<a id="levels_definition"></a> 
#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*

<a id="questions"></a> 
### Question
*1. How to compare the school districts?*
<br>In this analysis, the comparison variable is the sum of the shares of students with level 4 results on the state math and ELA tests for the last available year. Each share lies between 0 and 1, so the sum lies between 0 and 2. This indicator was selected to cover both subjects.
Alternatively, the indicator could be the sum of the shares of students with level 3 or level 4 results in math and ELA. The notebook would need to be changed accordingly.
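
As a rough illustration of the two candidate indicators, the sketch below computes both for a single district. The share column names follow the ones built later in this notebook; the helper `add_indicators` and the column `Level 3+4 Math+ELA` are assumptions introduced here for illustration only.

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with both comparison indicators added (hypothetical helper)."""
    out = df.copy()
    # Indicator used in this analysis: level 4 share in math + level 4 share in ELA
    out['Level 4 Math+Ela'] = out['Level 4 Math'] + out['Level 4 ELA']
    # Alternative indicator: level 3 and level 4 shares in both subjects
    out['Level 3+4 Math+ELA'] = (out['Level 3 Math'] + out['Level 4 Math']
                                 + out['Level 3 ELA'] + out['Level 4 ELA'])
    return out

# Shares for district 1 in 2025, rounded to three decimals as printed in this notebook
sample = pd.DataFrame({
    'District': [1],
    'Level 3 Math': [0.253], 'Level 4 Math': [0.365],
    'Level 3 ELA': [0.271], 'Level 4 ELA': [0.353],
})
indicators = add_indicators(sample)
print(indicators[['District', 'Level 4 Math+Ela', 'Level 3+4 Math+ELA']])
```

Switching the analysis to the alternative indicator would mostly mean using the second column as the mapping variable; its range is also 0 to 2.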

#### About this notebook

- The notebook '*1._NYC_data_processing_by_schools.ipynb*' contains the steps for processing the state testing data of NYC public middle schools. 
- This notebook '*2._NYC_ELA_math_data processing_by_districts.ipynb*' contains the steps to process district-wide data for NYC public middle schools. Since these data can be linked to the geoJSON straightforwardly by district number at the mapping stage, the layer is finalized in the final notebook.
- The notebook '*3._Generating_NYC_map_by_public_schools.ipynb*' contains code to generate the maps from the processed data.
- The map is available at: https://nycmsmap.netlify.app.
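
The linking by district number mentioned above could look roughly like the sketch below. It is not the code of the mapping notebook: the function `attach_indicator`, the property name `SchoolDist`, and the stand-in boundaries dictionary are all assumptions and should be checked against the actual geoJSON file.

```python
import json

def attach_indicator(geojson: dict, values: dict, key: str = 'SchoolDist') -> dict:
    """Write each district's indicator into the matching feature's properties, in place.

    `key` is the feature property holding the district number; 'SchoolDist'
    is an assumed name, not confirmed against the real boundaries file.
    """
    for feature in geojson['features']:
        district = int(feature['properties'][key])
        # Districts without a value get None so the map can style them separately
        feature['properties']['Level 4 Math+Ela'] = values.get(district)
    return geojson

# Tiny stand-in for the district boundaries file (geometry omitted for brevity)
boundaries = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'SchoolDist': 1}, 'geometry': None},
        {'type': 'Feature', 'properties': {'SchoolDist': 2}, 'geometry': None},
        {'type': 'Feature', 'properties': {'SchoolDist': 99}, 'geometry': None},
    ],
}
# Indicator values for districts 1 and 2 as computed in this notebook
level4_by_district = {1: 0.718, 2: 0.869}

linked = attach_indicator(boundaries, level4_by_district)
print(json.dumps(linked['features'][0]['properties']))
```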

<a id="modules"></a> 
#### Imports: modules

In [5]:
import os
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.float_format', '{:.3f}'.format)

<a id="read_data"></a> 
#### Read data

In [None]:
basePath = r"G:\My Drive\Kids\NYC_schools_mapped\raw_data"

#Read math results
fileName_math2025 = "district-math-results-2018-2025-public.xlsx"
mathPath2 = os.path.join(basePath,fileName_math2025)
print(mathPath2)
sheetName_math2 = "Math - All"
math2025DF = pd.read_excel(mathPath2, sheetName_math2)

#Read ELA results
fileName_ELA2025 = "district-ela-results-2018-2025-public.xlsx"
ELAPath2 = os.path.join(basePath, fileName_ELA2025)
print(ELAPath2)
sheetName_ELA2 = "ELA - All"
ELA2025DF = pd.read_excel(ELAPath2, sheetName_ELA2)

G:\My Drive\Kids\NYC_schools_mapped\raw_data\district-math-results-2018-2025-public.xlsx
G:\My Drive\Kids\NYC_schools_mapped\raw_data\district-ela-results-2018-2025-public.xlsx


In [7]:
math2025DF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1344 entries, 0 to 1343
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   District          1344 non-null   int64  
 1   Grade             1344 non-null   object 
 2   Year              1344 non-null   int64  
 3   Category          1344 non-null   object 
 4   Number Tested     1344 non-null   int64  
 5   Mean Scale Score  1344 non-null   float64
 6   # Level 1         1344 non-null   int64  
 7   % Level 1         1344 non-null   float64
 8   # Level 2         1344 non-null   int64  
 9   % Level 2         1344 non-null   float64
 10  # Level 3         1344 non-null   int64  
 11  % Level 3         1344 non-null   float64
 12  # Level 4         1344 non-null   int64  
 13  % Level 4         1344 non-null   float64
 14  # Level 3+4       1344 non-null   int64  
 15  % Level 3+4       1344 non-null   float64
dtypes: float64(6), int64(8), object(2)
memory 

In [8]:
ELA2025DF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1344 entries, 0 to 1343
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   District          1344 non-null   int64  
 1   Grade             1344 non-null   object 
 2   Year              1344 non-null   int64  
 3   Category          1344 non-null   object 
 4   Number Tested     1344 non-null   int64  
 5   Mean Scale Score  1344 non-null   float64
 6   # Level 1         1344 non-null   int64  
 7   % Level 1         1344 non-null   float64
 8   # Level 2         1344 non-null   int64  
 9   % Level 2         1344 non-null   float64
 10  # Level 3         1344 non-null   int64  
 11  % Level 3         1344 non-null   float64
 12  # Level 4         1344 non-null   int64  
 13  % Level 4         1344 non-null   float64
 14  # Level 3+4       1344 non-null   int64  
 15  % Level 3+4       1344 non-null   float64
dtypes: float64(6), int64(8), object(2)
memory 

In [9]:
math2025DF2 = math2025DF[math2025DF['Year'] == 2025]
math2025DF2.head(20)

| | District | Grade | Year | Category | Number Tested | Mean Scale Score | # Level 1 | % Level 1 | # Level 2 | % Level 2 | # Level 3 | % Level 3 | # Level 4 | % Level 4 | # Level 3+4 | % Level 3+4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 2025 | All Students | 540 | 464.057 | 58 | 10.741 | 119 | 22.037 | 180 | 33.333 | 183 | 33.889 | 363 | 67.222 |
| 1 | 1 | 4 | 2025 | All Students | 571 | 466.091 | 98 | 17.163 | 84 | 14.711 | 189 | 33.1 | 200 | 35.026 | 389 | 68.126 |
| 2 | 1 | 5 | 2025 | All Students | 567 | 459.517 | 154 | 27.16 | 93 | 16.402 | 129 | 22.751 | 191 | 33.686 | 320 | 56.437 |
| 3 | 1 | 6 | 2025 | All Students | 524 | 462.24 | 123 | 23.473 | 104 | 19.847 | 112 | 21.374 | 185 | 35.305 | 297 | 56.679 |
| 4 | 1 | 7 | 2025 | All Students | 582 | 469.679 | 74 | 12.715 | 101 | 17.354 | 136 | 23.368 | 271 | 46.564 | 407 | 69.931 |
| 5 | 1 | 8 | 2025 | All Students | 219 | 450.498 | 70 | 31.963 | 34 | 15.525 | 87 | 39.726 | 28 | 12.785 | 115 | 52.511 |
| 6 | 1 | All Grades | 2025 | All Students | 3003 | 463.37 | 577 | 19.214 | 535 | 17.816 | 833 | 27.739 | 1058 | 35.231 | 1891 | 62.97 |
| 42 | 2 | 3 | 2025 | All Students | 2164 | 474.032 | 164 | 7.579 | 239 | 11.044 | 773 | 35.721 | 988 | 45.656 | 1761 | 81.377 |
| 43 | 2 | 4 | 2025 | All Students | 2129 | 473.943 | 216 | 10.146 | 230 | 10.803 | 753 | 35.369 | 930 | 43.682 | 1683 | 79.051 |
| 44 | 2 | 5 | 2025 | All Students | 2043 | 472.259 | 220 | 10.768 | 223 | 10.915 | 691 | 33.823 | 909 | 44.493 | 1600 | 78.316 |


In [10]:
ELA2025DF2 = ELA2025DF[ELA2025DF['Year'] == 2025]
ELA2025DF2.head(20)

| | District | Grade | Year | Category | Number Tested | Mean Scale Score | # Level 1 | % Level 1 | # Level 2 | % Level 2 | # Level 3 | % Level 3 | # Level 4 | % Level 4 | # Level 3+4 | % Level 3+4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 2025 | All Students | 519 | 457.761 | 77 | 14.836 | 100 | 19.268 | 160 | 30.829 | 182 | 35.067 | 342 | 65.896 |
| 1 | 1 | 4 | 2025 | All Students | 548 | 460.442 | 76 | 13.869 | 87 | 15.876 | 157 | 28.65 | 228 | 41.606 | 385 | 70.255 |
| 2 | 1 | 5 | 2025 | All Students | 553 | 454.955 | 113 | 20.434 | 85 | 15.371 | 179 | 32.369 | 176 | 31.826 | 355 | 64.195 |
| 3 | 1 | 6 | 2025 | All Students | 533 | 452.447 | 107 | 20.075 | 123 | 23.077 | 147 | 27.58 | 156 | 29.268 | 303 | 56.848 |
| 4 | 1 | 7 | 2025 | All Students | 579 | 457.972 | 99 | 17.098 | 91 | 15.717 | 177 | 30.57 | 212 | 36.615 | 389 | 67.185 |
| 5 | 1 | 8 | 2025 | All Students | 543 | 457.523 | 81 | 14.917 | 121 | 22.284 | 125 | 23.02 | 216 | 39.779 | 341 | 62.799 |
| 6 | 1 | All Grades | 2025 | All Students | 3275 | 456.869 | 553 | 16.885 | 607 | 18.534 | 945 | 28.855 | 1170 | 35.725 | 2115 | 64.58 |
| 42 | 2 | 3 | 2025 | All Students | 2097 | 466.087 | 172 | 8.202 | 259 | 12.351 | 618 | 29.471 | 1048 | 49.976 | 1666 | 79.447 |
| 43 | 2 | 4 | 2025 | All Students | 2063 | 465.023 | 190 | 9.21 | 279 | 13.524 | 539 | 26.127 | 1055 | 51.139 | 1594 | 77.266 |
| 44 | 2 | 5 | 2025 | All Students | 1983 | 466.257 | 148 | 7.463 | 219 | 11.044 | 632 | 31.871 | 984 | 49.622 | 1616 | 81.493 |


<a id="MS_charts_district"></a>
### Calculating district-wide test results for district layer on the map 

In [11]:
# Initializing the list of subjects to use throughout the notebook
subjects = ['Math', 'ELA'] 

In [12]:
# For convenience in later steps, store the data tables in a dictionary keyed by subject
resultsDFs = {'Math': math2025DF2, 'ELA': ELA2025DF2}

In [13]:
for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF.info()

<class 'pandas.core.frame.DataFrame'>
Index: 224 entries, 0 to 1308
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   District          224 non-null    int64  
 1   Grade             224 non-null    object 
 2   Year              224 non-null    int64  
 3   Category          224 non-null    object 
 4   Number Tested     224 non-null    int64  
 5   Mean Scale Score  224 non-null    float64
 6   # Level 1         224 non-null    int64  
 7   % Level 1         224 non-null    float64
 8   # Level 2         224 non-null    int64  
 9   % Level 2         224 non-null    float64
 10  # Level 3         224 non-null    int64  
 11  % Level 3         224 non-null    float64
 12  # Level 4         224 non-null    int64  
 13  % Level 4         224 non-null    float64
 14  # Level 3+4       224 non-null    int64  
 15  % Level 3+4       224 non-null    float64
dtypes: float64(6), int64(8), object(2)
memory usage:

In [15]:
for subject in subjects:
    # Work on an explicit copy so the assignment below does not raise SettingWithCopyWarning
    resultsDF = resultsDFs[subject].copy()
    # Convert grades to numbers; the 'All Grades' summary rows become NaN
    resultsDF['Grade'] = pd.to_numeric(resultsDF['Grade'], errors='coerce')
    # Store the converted table back so later cells see the numeric grades
    resultsDFs[subject] = resultsDF
    resultsDF.info()
    print(len(resultsDF))

<class 'pandas.core.frame.DataFrame'>
Index: 224 entries, 0 to 1308
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   District          224 non-null    int64  
 1   Grade             192 non-null    float64
 2   Year              224 non-null    int64  
 3   Category          224 non-null    object 
 4   Number Tested     224 non-null    int64  
 5   Mean Scale Score  224 non-null    float64
 6   # Level 1         224 non-null    int64  
 7   % Level 1         224 non-null    float64
 8   # Level 2         224 non-null    int64  
 9   % Level 2         224 non-null    float64
 10  # Level 3         224 non-null    int64  
 11  % Level 3         224 non-null    float64
 12  # Level 4         224 non-null    int64  
 13  % Level 4         224 non-null    float64
 14  # Level 3+4       224 non-null    int64  
 15  % Level 3+4       224 non-null    float64
dtypes: float64(7), int64(8), object(1)
memory usage:
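
The drop from 224 to 192 non-null grades is presumably the 'All Grades' summary row of each of the 32 districts being coerced to NaN. The sketch below shows that behaviour on a simplified stand-in series; the values are illustrative and do not come from the source files.

```python
import pandas as pd

# Simplified stand-in for one district's Grade column: six grades plus a summary row
grades = pd.Series(['3', '4', '5', '6', '7', '8', 'All Grades'])

# errors='coerce' turns labels that cannot be parsed into NaN instead of raising
numeric = pd.to_numeric(grades, errors='coerce')
n_missing = int(numeric.isna().sum())

# Comparisons with NaN evaluate to False, so a grade filter drops the summary row too
middle = numeric[(numeric >= 6) & (numeric <= 8)]

print(n_missing)
print(middle.tolist())
```

One consequence is that the later grades 6-8 filter needs no separate step to exclude the summary rows.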


In [16]:
resultsMS_Norm = {}

for subject in subjects:

    resultsDF = resultsDFs[subject]

    # Keep only grades 6-8 results (middle schools and K-8)
    resultsMS = resultsDF[(resultsDF['Grade'] >= 6) & (resultsDF['Grade'] <= 8)]

    # Sum the student counts at each level across grades 6-8, by district and year
    resultsMS = resultsMS.groupby(['District', 'Year'])[['# Level 1', '# Level 2', '# Level 3', '# Level 4']].sum()

    # Change column names to include subject
    resultsMS.columns = [f'Level 1 {subject}', f'Level 2 {subject}', f'Level 3 {subject}', f'Level 4 {subject}']

    # Convert counts to shares: divide each row by its total so the four levels sum to 1
    resultsMS_Norm[subject] = resultsMS.div(resultsMS.sum(axis=1), axis=0)
    resultsMS_Norm[subject].reset_index(inplace=True)

    print(resultsMS_Norm[subject].head())

   District  Year  Level 1 Math  Level 2 Math  Level 3 Math  Level 4 Math
0         1  2025         0.202         0.180         0.253         0.365
1         2  2025         0.137         0.143         0.305         0.416
2         3  2025         0.186         0.186         0.269         0.359
3         4  2025         0.288         0.270         0.276         0.165
4         5  2025         0.349         0.234         0.272         0.145
   District  Year  Level 1 ELA  Level 2 ELA  Level 3 ELA  Level 4 ELA
0         1  2025        0.173        0.202        0.271        0.353
1         2  2025        0.104        0.147        0.295        0.454
2         3  2025        0.148        0.192        0.280        0.380
3         4  2025        0.246        0.274        0.261        0.219
4         5  2025        0.255        0.264        0.279        0.201
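
To make the normalization concrete, the sketch below reproduces the math shares for district 1 from the grades 6-8 counts printed earlier in this notebook. It also computes the unweighted mean of the per-grade level 4 shares for comparison; that second figure is not used in the analysis and is shown only to illustrate why counts are summed before dividing.

```python
import pandas as pd

# Math counts for district 1, grades 6-8, 2025 (from the table printed earlier)
counts = pd.DataFrame({
    'District': [1, 1, 1],
    'Grade': [6, 7, 8],
    '# Level 1': [123, 74, 70],
    '# Level 2': [104, 101, 34],
    '# Level 3': [112, 136, 87],
    '# Level 4': [185, 271, 28],
})
level_cols = ['# Level 1', '# Level 2', '# Level 3', '# Level 4']

# Sum the counts over grades, then divide each row by its total
totals = counts.groupby('District')[level_cols].sum()
shares = totals.div(totals.sum(axis=1), axis=0)

# For comparison: unweighted mean of the per-grade level 4 shares
per_grade_l4 = counts['# Level 4'] / counts[level_cols].sum(axis=1)
unweighted_l4 = per_grade_l4.mean()

print(shares.round(3))
print(round(unweighted_l4, 3))
```

Summing counts first weights each grade by the number of students tested, so a small grade 8 cohort (219 students here) does not pull the district share down as much as it would in a simple average of percentages.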


In [17]:
# Merge the Math and ELA tables on their shared keys rather than on row position
districts_combined = pd.merge(resultsMS_Norm['Math'], resultsMS_Norm['ELA'], on=['District', 'Year'])
print(districts_combined.head())

   District  Year  Level 1 Math  Level 2 Math  Level 3 Math  Level 4 Math  \
0         1  2025         0.202         0.180         0.253         0.365   
1         2  2025         0.137         0.143         0.305         0.416   
2         3  2025         0.186         0.186         0.269         0.359   
3         4  2025         0.288         0.270         0.276         0.165   
4         5  2025         0.349         0.234         0.272         0.145   

   Level 1 ELA  Level 2 ELA  Level 3 ELA  Level 4 ELA  
0        0.173        0.202        0.271        0.353  
1        0.104        0.147        0.295        0.454  
2        0.148        0.192        0.280        0.380  
3        0.246        0.274        0.261        0.219  
4        0.255        0.264        0.279        0.201  
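
Since both tables carry `District` and `Year`, the join can be keyed on those columns and checked for duplicates at the same time. The sketch below uses two small illustrative frames (values for districts 1 and 2 from this notebook, with the ELA rows deliberately reordered) and the `validate` argument of `pd.merge`; the frame names are hypothetical.

```python
import pandas as pd

math_shares = pd.DataFrame({'District': [1, 2], 'Year': [2025, 2025],
                            'Level 4 Math': [0.365, 0.416]})
# ELA rows deliberately listed in a different order than the math rows
ela_shares = pd.DataFrame({'District': [2, 1], 'Year': [2025, 2025],
                           'Level 4 ELA': [0.454, 0.353]})

# Join on the shared keys; validate raises MergeError if a key repeats on either side
combined = pd.merge(math_shares, ela_shares, on=['District', 'Year'],
                    how='inner', validate='one_to_one')
print(combined)
```

A key-based join pairs rows correctly even when the two tables are sorted differently, which a join on row position would not.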


In [18]:
# Adding the mapping column: combined ELA + math shares of level 4 results in MS grades
districts_combined['Level 4 Math+Ela'] = districts_combined['Level 4 Math']+districts_combined['Level 4 ELA']
districts_combined.head(10)

| | District | Year | Level 1 Math | Level 2 Math | Level 3 Math | Level 4 Math | Level 1 ELA | Level 2 ELA | Level 3 ELA | Level 4 ELA | Level 4 Math+Ela |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025 | 0.202 | 0.18 | 0.253 | 0.365 | 0.173 | 0.202 | 0.271 | 0.353 | 0.718 |
| 1 | 2 | 2025 | 0.137 | 0.143 | 0.305 | 0.416 | 0.104 | 0.147 | 0.295 | 0.454 | 0.869 |
| 2 | 3 | 2025 | 0.186 | 0.186 | 0.269 | 0.359 | 0.148 | 0.192 | 0.28 | 0.38 | 0.739 |
| 3 | 4 | 2025 | 0.288 | 0.27 | 0.276 | 0.165 | 0.246 | 0.274 | 0.261 | 0.219 | 0.385 |
| 4 | 5 | 2025 | 0.349 | 0.234 | 0.272 | 0.145 | 0.255 | 0.264 | 0.279 | 0.201 | 0.347 |
| 5 | 6 | 2025 | 0.251 | 0.271 | 0.328 | 0.151 | 0.264 | 0.3 | 0.27 | 0.166 | 0.317 |
| 6 | 7 | 2025 | 0.373 | 0.292 | 0.255 | 0.079 | 0.286 | 0.362 | 0.248 | 0.104 | 0.183 |
| 7 | 8 | 2025 | 0.329 | 0.26 | 0.285 | 0.126 | 0.295 | 0.298 | 0.271 | 0.135 | 0.261 |
| 8 | 9 | 2025 | 0.327 | 0.285 | 0.27 | 0.119 | 0.305 | 0.309 | 0.266 | 0.119 | 0.238 |
| 9 | 10 | 2025 | 0.331 | 0.284 | 0.281 | 0.104 | 0.271 | 0.301 | 0.28 | 0.148 | 0.252 |


In [19]:
districts_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   District          32 non-null     int64  
 1   Year              32 non-null     int64  
 2   Level 1 Math      32 non-null     float64
 3   Level 2 Math      32 non-null     float64
 4   Level 3 Math      32 non-null     float64
 5   Level 4 Math      32 non-null     float64
 6   Level 1 ELA       32 non-null     float64
 7   Level 2 ELA       32 non-null     float64
 8   Level 3 ELA       32 non-null     float64
 9   Level 4 ELA       32 non-null     float64
 10  Level 4 Math+Ela  32 non-null     float64
dtypes: float64(9), int64(2)
memory usage: 2.9 KB
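
Before exporting, a few consistency checks on the combined table can catch join or normalization mistakes. The sketch below is one possible set of checks, not part of the original workflow; the helper `check_shares` and its tolerance are assumptions, and the sample rows are the rounded values for districts 1 and 2 from this notebook.

```python
import numpy as np
import pandas as pd

def check_shares(df: pd.DataFrame, subjects=('Math', 'ELA'), atol=2e-3) -> None:
    """Raise AssertionError if the normalized table looks inconsistent (hypothetical helper)."""
    for subject in subjects:
        cols = [f'Level {i} {subject}' for i in range(1, 5)]
        # The four level shares of one subject should add up to 1 (within rounding)
        assert np.allclose(df[cols].sum(axis=1), 1.0, atol=atol), subject
    # The combined indicator is a sum of two shares, so it lies between 0 and 2
    assert df['Level 4 Math+Ela'].between(0, 2).all()
    # Each district should appear once per year
    assert not df.duplicated(['District', 'Year']).any()

# Districts 1 and 2, rounded to three decimals as displayed in this notebook
sample = pd.DataFrame({
    'District': [1, 2], 'Year': [2025, 2025],
    'Level 1 Math': [0.202, 0.137], 'Level 2 Math': [0.180, 0.143],
    'Level 3 Math': [0.253, 0.305], 'Level 4 Math': [0.365, 0.416],
    'Level 1 ELA': [0.173, 0.104], 'Level 2 ELA': [0.202, 0.147],
    'Level 3 ELA': [0.271, 0.295], 'Level 4 ELA': [0.353, 0.454],
    'Level 4 Math+Ela': [0.718, 0.869],
})
check_shares(sample)
```

On the unrounded table a much tighter tolerance would be appropriate; the loose `atol` here only accommodates the three-decimal display values.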


In [20]:
# Export the data frame with MS results by district to an Excel file for future use
fileName = 'DistrictsMSNorm2025.xlsx'
path = os.path.join(basePath, fileName)
districts_combined.to_excel(path)

del fileName, path