# Analysis of NYC public schools results in ELA and math grades 6-8: prepare data by districts

<span style="color: red;">**If kernel can't connect to server again run command:**
*netsh winsock reset*<span>

## Table of contents

1. [Data sources](#data)
4. [Performace levels: definions](#levels_definition)
2. [Imports: modules](#modules)
3. [Read data](#read_data)
5. [Analysis](#analysis)
    1. [Share of students with results MATH "Level 4" in all NYC public schools by year (Grades 3-8)](#share)
    1. [Math results timeseries chart: NYC](#city)
    1. [Middle Schools (grades 6-8) test results by school district](#MS_charts_district)

<a id="data"></a> 
#### Data:
1. Data New York City grades 3-8 New York State English Language Arts and Math State Tests results 2013-2023:<br>https://infohub.nyced.org/reports/academics/test-results
2. New York City school districts boundaries:<br>https://data.cityofnewyork.us/Education/School-Districts/r8nu-ymqj

<a id="levels_definition"></a> 
#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*

<a id="questions"></a> 
### Question
*1. How to compare the school districts?*
<br>In this analysis, we choose the sum of shares of students with level 4 test resulsts in state math and ELA test as comparison variable. The sum can be between 0 and 2. This indicator is selected to cover both subjects.
ALternatively, the indicator can be sum of shares of students with levels 3+4 test results in math and ELA. The notebook would be needed to changed accordingly.

<a id="modules"></a> 
#### Imports: modules

In [1]:
import os
import pandas as pd
#import geopandas as gpd
import matplotlib.pyplot as plt

pd.set_option('display.float_format', '{:.3f}'.format)

<a id="read_data"></a> 
#### Read data

In [2]:
basePath = r"G:\My Drive\Kids\NYC_schools_mapped\raw_data"

#Read math results
fileName_math = "school-math-results-2013-2023-(public).xlsx"
mathPath = os.path.join(basePath,fileName_math)
print(mathPath)
sheetName_math = "All"
mathResultsDF = pd.read_excel(mathPath, sheetName_math)

#Read ELA results
fileName_ELA = "school-ela-results-2013-2023-(public).xlsx"
ELAPath = os.path.join(basePath,fileName_ELA)
print(ELAPath)
sheetName_ELA = "All"
ELAResultsDF = pd.read_excel(ELAPath, sheetName_ELA)

G:\My Drive\Kids\NYC_schools_mapped\raw_data\school-math-results-2013-2023-(public).xlsx
G:\My Drive\Kids\NYC_schools_mapped\raw_data\school-ela-results-2013-2023-(public).xlsx


In [23]:
# Change the subject below and rerun the notebook
# subject = 'math'
subject = 'ELA'

In [24]:
resultsDF = ELAResultsDF if subject == 'ELA' else mathResultsDF

In [25]:
resultsDF.head()

Unnamed: 0,DBN,School Name,Grade,Year,Category,Number Tested,Mean Scale Score,# Level 1,% Level 1,# Level 2,% Level 2,# Level 3,% Level 3,# Level 4,% Level 4,# Level 3+4,% Level 3+4
0,01M015,P.S. 015 ROBERTO CLEMENTE,3.0,2023,All Students,24,454.833,4.0,16.667,5.0,20.833,11.0,45.833,4.0,16.667,15.0,62.5
1,01M015,P.S. 015 ROBERTO CLEMENTE,4.0,2023,All Students,17,453.647,1.0,5.882,6.0,35.294,8.0,47.059,2.0,11.765,10.0,58.824
2,01M015,P.S. 015 ROBERTO CLEMENTE,5.0,2023,All Students,30,440.5,10.0,33.333,11.0,36.667,7.0,23.333,2.0,6.667,9.0,30.0
3,01M015,P.S. 015 ROBERTO CLEMENTE,6.0,2023,All Students,1,,,,,,,,,,,
4,01M015,P.S. 015 ROBERTO CLEMENTE,,2023,All Students,72,,,,,,,,,,,


In [26]:
resultsDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42343 entries, 0 to 42342
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   DBN               42343 non-null  object 
 1   School Name       42343 non-null  object 
 2   Grade             32599 non-null  float64
 3   Year              42343 non-null  int64  
 4   Category          42343 non-null  object 
 5   Number Tested     42343 non-null  int64  
 6   Mean Scale Score  42168 non-null  float64
 7   # Level 1         42168 non-null  float64
 8   % Level 1         42168 non-null  float64
 9   # Level 2         42168 non-null  float64
 10  % Level 2         42168 non-null  float64
 11  # Level 3         42168 non-null  float64
 12  % Level 3         42168 non-null  float64
 13  # Level 4         42168 non-null  float64
 14  % Level 4         42168 non-null  float64
 15  # Level 3+4       42168 non-null  float64
 16  % Level 3+4       42168 non-null  float6

In [27]:
# resultsDF.info() showed that most of the columns are objects instead of numbers and needed to be converted
resultsDF_colToConvert = ['Mean Scale Score',
 'Grade',                             
 '# Level 1',
 '% Level 1',
 '# Level 2',
 '% Level 2',
 '# Level 3',
 '% Level 3',
 '# Level 4',
 '% Level 4',
 '# Level 3+4',
 '% Level 3+4']
resultsDF[resultsDF_colToConvert] = resultsDF[resultsDF_colToConvert].apply(pd.to_numeric, errors = 'coerce')
resultsDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42343 entries, 0 to 42342
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   DBN               42343 non-null  object 
 1   School Name       42343 non-null  object 
 2   Grade             32599 non-null  float64
 3   Year              42343 non-null  int64  
 4   Category          42343 non-null  object 
 5   Number Tested     42343 non-null  int64  
 6   Mean Scale Score  42168 non-null  float64
 7   # Level 1         42168 non-null  float64
 8   % Level 1         42168 non-null  float64
 9   # Level 2         42168 non-null  float64
 10  % Level 2         42168 non-null  float64
 11  # Level 3         42168 non-null  float64
 12  % Level 3         42168 non-null  float64
 13  # Level 4         42168 non-null  float64
 14  % Level 4         42168 non-null  float64
 15  # Level 3+4       42168 non-null  float64
 16  % Level 3+4       42168 non-null  float6

<a id="analysis"></a> 
## Analysis

<a id="MS_charts_district"></a>
#### Making middle school test results column charts by school districts

In [8]:
# Make a list of districts numbers in 2-digit format
districts = []
for i in range(1,33):
    prefix = str(i).zfill(2) #make sure that each number is represented as a two-character string, starting with 0 if necessary
    districts.append(prefix)
print(districts)

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32']


In [28]:
# Dictionaries to hold dataframes
district_dfs = {}
district_grouped_dfs = {}

#Create the dataframes
for i in districts:
    dfName = 'dist'+i+'_MS_DF_'+subject
    dfNameGrouped = dfName + '_grpd'
    district_dfs[dfName] = resultsDF[(resultsDF['DBN'].str.startswith(i)) & (resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)]
    district_grouped_dfs[dfNameGrouped] = district_dfs[dfName].groupby('Year')[['# Level 1','# Level 2','# Level 3','# Level 4']].sum()
# To access a dataframe: some_dataframe = district_dfs['distXX_MS_DF_XXXX']  
# Replace XX with the desired district code    

In [None]:
#Create a normalized bar chart for each dataframe in the district_dfs 
#1. Normalize the dataframe rows to sum to 1
for dfNameGrouped, current_dataframe in district_grouped_dfs.items():
    # dfNameGrouped contains the name of the dataframe
    # current_dataframe contains the dataframe itself
    # Do something with current_dataframe
    #current_dataframe.dropna()
    normalized_df = current_dataframe.div(current_dataframe.sum(axis=1), axis=0)

    # 2. Plot the normalized dataframe
    normalized_df.plot(kind='bar', stacked=True, figsize=(10,6))
    
    title = 'Results by years, ' + subject + ' ' + dfNameGrouped
    plt.title(title)  # Set the title
    plt.xlabel('Year')  # X-axis label
    plt.ylabel('Share of students tested')  # Y-axis label
    plt.grid(axis='y')
    plt.yticks([0, 0.2, 0.4, 0.6, 0.8, 1.0], ['0%', '20%', '40%', '60%', '80%', '100%'])  # Adjust y-ticks to percentage

    plt.tight_layout()
    pdfTitle = title + '.pdf' 
    plt.savefig(pdfTitle) # Save output as a pdf
    plt.show()

In [29]:
# Produce a data frame with MS results by districts

## Prepare combined DF
districts_combined = pd.DataFrame()
## Select columns to normalize
columns_to_normalize = ['# Level 1', '# Level 2', '# Level 3', '# Level 4']


for dfNameGrouped, dataframe in district_grouped_dfs.items():
    for column in columns_to_normalize:
        # Calculate row sum for selected columns
        row_sum = dataframe[columns_to_normalize].sum(axis=1)
        dataframe[column] = dataframe[column].div(row_sum)
    # Select district number (simbols 5 and 6 from DF names)
    symbols = dfNameGrouped[4:6]
    # Create a new column with these symbols
    dataframe['District'] = symbols
    # Concatenate the data frames
    districts_combined = pd.concat([districts_combined, dataframe], ignore_index=False)

In [30]:
# Make sure that column "Years" is not index column
districts_combined.reset_index(inplace=True)
districts_combined.head()

Unnamed: 0,Year,# Level 1,# Level 2,# Level 3,# Level 4,District
0,2013,0.265,0.461,0.465,0.997,1
1,2014,0.214,0.438,0.518,0.997,1
2,2015,0.216,0.397,0.537,0.997,1
3,2016,0.156,0.425,0.535,0.997,1
4,2017,0.169,0.395,0.501,0.998,1


In [31]:
# Export the data frame with MS results by districts to excel file for future use
fileName = f'DistrictsMS{subject}Norm.xlsx'
path = os.path.join(basePath, fileName)
districts_combined.to_excel(path)

del fileName, path