# Analysis of NYC public schools results in ELA and math grades 6-8

<span style="color: red;">**If the kernel can't connect to the server again, run the command:**
*netsh winsock reset*</span>

### Processing data by schools

<a id="TOC"></a> 
## Table of Contents
1. [Data sources and definitions](#data)
2. [Research questions](#questions)
3. [Analysis of test results by middle schools](#analysis)
    1. [Imports: modules](#modules)
    2. [Read and prepare data](#read)
    3. [Getting the baseline change in test results: citywide change](#citywide)
    4. [Getting the test results for middle schools and calculating the comparison indicator by school](#middle)
        1. [Best middle schools by math](#best)
        2. [Create dataframe with average 2013-2023 math and ELA test results for all middle school grades](#ten)
        3. [Create dataframe with average 2019-2023 (last 3 tests) math and ELA test results for all middle school grades](#three)
    5. [Create final dataframe with data for mapping](#final)
        1. [Adding school status (citywide, boroughwide) and the diversity data to the dataframe with all test results](#status)
        2. [Matching the school names from the GeoJSON school locations file with the results dataframe and merging](#match)
        3. [Adding historical ELA/math results and demographic data as plots to the geodataframe and saving into a GeoJSON file](#plots)

<a id="data"></a> 
### Data sources and definitions

#### Data:
1. New York City grades 3-8 New York State English Language Arts and Math State Tests results 2013-2023:<br>https://infohub.nyced.org/reports/academics/test-results
2. New York City schools demographic data:<br>https://data.cityofnewyork.us/Education/2017-18-2021-22-Demographic-Snapshot/c7ru-d68s/about_data
3. NYS schools locations:<br>https://data.gis.ny.gov/maps/b6c624c740e4476689aa60fdc4aacb8f/about
4. Citywide or Boroughwide status:<br>https://www.nycschoolhelp.com/borowide-citywide-middle-schools

#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*

<a id="questions"></a> 
### Questions
*1. How to compare the schools?*
<br>In this analysis, the comparison variable is the sum of the shares of students with level 4 test results in the state math and ELA tests. The sum can be between 0 and 2. This indicator is selected to cover both subjects.
Alternatively, the indicator can be the sum of the shares of students with level 3+4 test results in math and ELA. The notebook would need to be changed accordingly.
<br><br>
*2. How have the test results changed?*
<br>Compare the latest-year (2023) test results in a school with the school's 10-year average, and compare that difference with the same difference citywide:
<br> school_change = school_current_year - school_10year_average
<br> citywide_change = city_current_year - city_10year_average
<br><br>
*3. How good is the school?*
<br>Results for the last three testing periods (2019, 2022, 2023) differ from earlier years for some schools because of COVID disruptions, changes in testing procedures and, in District 15, changes in admission rules. The 10-year average therefore does not reflect the current situation of these schools well, so the results for the last three testing years are used instead.
<br><br>
*4. Is the school citywide or boroughwide?*
<br>
*5. Diversity?*
<br>
*6. School size?*
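
The comparison indicator and the change measure described above can be illustrated with a toy calculation. This is only a sketch: the student counts are made up, and the helper name `level4_indicator` is hypothetical and does not appear elsewhere in the notebook.

```python
def level4_indicator(math_counts, ela_counts):
    """Sum of the Level 4 shares in math and ELA (a value between 0 and 2).

    Each argument is a list of student counts for Levels 1 to 4.
    """
    math_share = math_counts[3] / sum(math_counts)
    ela_share = ela_counts[3] / sum(ela_counts)
    return math_share + ela_share

# Made-up counts for one school: [Level 1, Level 2, Level 3, Level 4]
current_year = level4_indicator([10, 20, 40, 30], [5, 25, 50, 20])   # 0.30 + 0.20
ten_year_avg = level4_indicator([20, 30, 30, 20], [10, 30, 40, 20])  # 0.20 + 0.20

# Difference between the current year and the long-run average
school_change = current_year - ten_year_avg
print(round(current_year, 2), round(ten_year_avg, 2), round(school_change, 2))
```

A positive `school_change` means the school's latest results are above its own long-run level; comparing it with the citywide change shows whether the school moved more or less than the city as a whole.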

#### About this notebook

- This notebook '*1._NYC_data_processing_by_schools.ipynb*' contains the steps for processing the state testing data of NYC public middle schools. 
- The notebook '*2._NYC_ELA_math_data processing_by_districts.ipynb*' contains steps to process district-wide data for NYC public middle schools.
- The notebook '*3._Generating_NYC_map_by_public_schools.ipynb*' contains code to generate the maps from the processed data.
- The map is available at: https://nycmsmap.netlify.app.

<a id="analysis"></a> 
### Analysis of test results by middle schools

<a id="modules"></a> 
#### Imports: modules

In [None]:
# Appending the path to 'utils' module

import sys

parent_dir = 'C:\\GITHUB\\NY_schools_maps\\notebooks'
sys.path.append(parent_dir)

In [None]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import base64
from io import BytesIO
import math
from tqdm import tqdm
from utils import match_name, create_plot, process_schools, create_chart

pd.set_option('display.float_format', '{:.3f}'.format)

In [None]:
# To reload the 'utils' module if something in it has changed

import utils
from importlib import reload
reload(utils)

The list of middle schools that are citywide or open to Brooklyn borough residents was processed from the *NYC School Help* webpage into a CSV file for reuse.

<a id="read"></a> 
#### Read data

In [None]:
basePath = r"G:\My Drive\Kids\NYC_schools_mapped"
dataFolder = r"raw_data"
outputFolder = r"processed_data"

In [None]:
## Read data by schools

#Read math results
fileName_math = "school-math-results-2013-2023-(public).xlsx"
mathPath = os.path.join(basePath,dataFolder,fileName_math)
print(mathPath)
sheetName_math = "All"
mathResultsDF = pd.read_excel(mathPath, sheetName_math)

#Read ELA results
fileName_ELA = "school-ela-results-2013-2023-(public).xlsx"
ELAPath = os.path.join(basePath, dataFolder, fileName_ELA)
print(ELAPath)
sheetName_ELA = "All"
ELAResultsDF = pd.read_excel(ELAPath, sheetName_ELA)

#Read demographic file
fileName_demog = "demographic-snapshot-2018-19-to-2022-23-(public).xlsx"
demogPath = os.path.join(basePath, dataFolder, fileName_demog)
print(demogPath)
sheetName_demog = "School"
demogData = pd.read_excel(demogPath, sheetName_demog)

#Read school status file
fileName_status = "cityBoroughWideschools.csv"
statusPath = os.path.join(basePath, dataFolder, fileName_status)
print(statusPath)
statusData = pd.read_csv(statusPath)

In [None]:
# Initializing the list of subjects to use throughout the notebook
subjects = ['Math', 'ELA'] 

In [None]:
# For convenience in later analysis, add the data tables to a dictionary keyed by subject
resultsDFs = {'Math': mathResultsDF, 'ELA': ELAResultsDF}

In [None]:
# resultsDF.info() showed that most of the columns are objects instead of numbers and 
# needed to be converted
for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF_colToConvert = ['Mean Scale Score',
     'Grade',                             
     '# Level 1',
     '% Level 1',
     '# Level 2',
     '% Level 2',
     '# Level 3',
     '% Level 3',
     '# Level 4',
     '% Level 4',
     '# Level 3+4',
     '% Level 3+4']
    resultsDF[resultsDF_colToConvert] = resultsDF[resultsDF_colToConvert].apply(pd.to_numeric, errors = 'coerce')
    resultsDF.info()
    print(len(resultsDF))
    
del resultsDF
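
As a small illustration of what `errors = 'coerce'` does in the conversion above, the sketch below converts a made-up column that mixes numeric strings with a non-numeric marker. The marker `'s'` is an assumption chosen for illustration; the raw files should be checked for the placeholder they actually use.

```python
import pandas as pd

# Made-up column: numeric strings, a non-numeric marker and a missing value.
# The marker 's' is an assumption for illustration only.
raw = pd.Series(['12', '45.5', 's', None])

# Values that cannot be parsed as numbers become NaN instead of raising an error
converted = pd.to_numeric(raw, errors='coerce')

print(converted)
print(converted.isna().sum(), 'values could not be converted')
```

Counting the `NaN` values before and after the conversion is a cheap way to see how many entries were silently dropped by the coercion.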

In [None]:
## Read citywide data

# Read math results
fileName_cityMath = "citywide-math-results-2013-2023-(public).xlsx"
cityMathPath = os.path.join(basePath,dataFolder,fileName_cityMath)
print(cityMathPath)
sheetName_cityMath = "All"
cityMathDF = pd.read_excel(cityMathPath, sheetName_cityMath)

#Read ELA results
fileName_cityELA = "citywide-ela-results-2013-2023-(public).xlsx"
cityELAPath = os.path.join(basePath, dataFolder, fileName_cityELA)
print(cityELAPath)
sheetName_cityELA = "All"
cityELADF = pd.read_excel(cityELAPath, sheetName_cityELA)

In [None]:
# Dictionary for citywide results
cityResultsDFs = {'Math': cityMathDF, 'ELA': cityELADF}

In [None]:
# Checking columns types
cityELADF.info()
cityMathDF.info()

In [None]:
# 'Grade' column in citywide data tables is object, convert to numeric
for subject in subjects:
    resultsDF = cityResultsDFs[subject]
    resultsDF['Grade'] = resultsDF['Grade'].apply(pd.to_numeric, errors = 'coerce')
    resultsDF.info()
    print(len(resultsDF))
    
del resultsDF

<a id="citywide"></a> 
### Getting the baseline change in tests results - citywide change

#### Prepare citywide dataframe with only middle school tests results (grades 6-8)

In [None]:
# Select middle school grades results from the citywide dataframes with math and ELA tests results by year
# and calculate percentages of results of each level
resultsMS_Norm = {}

for subject in subjects:
        
    resultsDF = cityResultsDFs[subject]
    
    # Dataframe with only grades 6-8 results (middle schools and K-8) by years
    resultsMS = resultsDF[(resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)]
    
    # Dataframe with results grouped by years
    resultsMS = resultsMS.groupby('Year')[['# Level 1','# Level 2','# Level 3','# Level 4']].sum()
    
    # Change column names to include subject
    resultsMS.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']
    
    # Dataframe for middle schools by years with normalized values
    resultsMS_Norm[subject] = resultsMS.div(resultsMS.sum(axis=1), axis=0)
    resultsMS_Norm[subject].reset_index(inplace=True)
    
    print(resultsMS_Norm[subject].head())
    
    # Dataframe with average
    
del resultsDF, resultsMS
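
The normalization step in the cell above divides each row by its own total. A minimal sketch with made-up counts shows the effect of `div(..., axis=0)`:

```python
import pandas as pd

# Made-up counts of students per performance level, indexed by year
counts = pd.DataFrame(
    {'# Level 1': [10, 30], '# Level 2': [20, 30],
     '# Level 3': [40, 20], '# Level 4': [30, 20]},
    index=pd.Index([2022, 2023], name='Year'),
)

# counts.sum(axis=1) is the row total; axis=0 aligns it with the row index,
# so every row is divided by its own total
shares = counts.div(counts.sum(axis=1), axis=0)

print(shares)
print(shares.sum(axis=1))  # each row of shares adds up to 1
```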

In [None]:
# Get 10 years average test result

resultsMS_10y_AVG = {}

for subject in subjects:
        
    resultsDF = cityResultsDFs[subject]
    
    # Dataframe with only grades 6-8 results (middle schools and K-8) by years
    resultsMS = resultsDF[(resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)]
    
    # Keep only the level counts; they are summed over all years below
    columns_to_sum = ['# Level 1','# Level 2','# Level 3','# Level 4']
    resultsMS = resultsMS[columns_to_sum]
    
    # Change column names to include subject
    resultsMS.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']

    # Summing over all rows turns the dataframe into a series indexed by the renamed columns
    resultsMS = resultsMS.sum()
    
    # Dataframe for middle schools grades with normalized values
    resultsMS_10y_AVG[subject] = resultsMS.div(resultsMS.sum(axis=0))
    
    # Convert the series back into a dataframe
    resultsMS_10y_AVG[subject] = resultsMS_10y_AVG[subject].to_frame().T # Transpose to flip rows and columns
   
    print(resultsMS_10y_AVG[subject].head())
        
del resultsDF, resultsMS
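
Note that the average above is computed from counts pooled over all years, which is not the same as averaging the yearly shares: years with more tested students weigh more. A sketch with made-up numbers (the column name `Tested` is hypothetical) shows the difference:

```python
import pandas as pd

# Made-up data: few students tested in 2022, many in 2023
counts = pd.DataFrame({
    'Year': [2022, 2023],
    '# Level 4': [10, 90],
    'Tested': [100, 300],   # hypothetical column with the total number tested
})

# Pooled share: sum the counts first, then divide (the approach used in the cell above)
pooled_share = counts['# Level 4'].sum() / counts['Tested'].sum()

# Mean of yearly shares: divide first, then average (every year weighs the same)
mean_of_shares = (counts['# Level 4'] / counts['Tested']).mean()

print(pooled_share, mean_of_shares)
```

Either definition can be defended; the point is only that the two give different numbers when the number of tested students changes from year to year.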

In [None]:
# Make a merged city dataframe with both math and ELA results

DFs = list(resultsMS_10y_AVG.values())
cityAVG10yDF = pd.merge(DFs[0], DFs[1], left_index=True, right_index=True)
print(cityAVG10yDF.head())

del DFs

In [None]:
# Adding column with sum of shares of test results of level 4 in math and ELA

cityAVG10yDF['Level 4 Math+Ela'] = cityAVG10yDF['Level 4 Math']+cityAVG10yDF['Level 4 ELA']

In [None]:
# Add column 'Year' to 'cityAVG10yDF' dataframe to be able to merge the dataframes later

cityAVG10yDF.insert(0, 'Year',0)

In [None]:
cityAVG10yDF.head()

In [None]:
# Make a merged city dataframe with both math and ELA results by years

DFs = list(resultsMS_Norm.values())
cityResultsDF = pd.merge(DFs[0], DFs[1], on = ['Year'], how = 'inner')
print(cityResultsDF.head(11))

del DFs

In [None]:
# Calculating the column with sums of shares of level 4 results

cityResultsDF['Level 4 Math+Ela'] = cityResultsDF['Level 4 Math']+cityResultsDF['Level 4 ELA']
cityResultsDF.head(11)

In [None]:
# Comparison = '2023 - 10 year average' to see citywide trend 

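# Select the 2023 row by its year rather than by position, so the result does not depend on row order
TenyAVG_2023DF = (cityResultsDF.loc[cityResultsDF['Year'] == 2023].iloc[0] - cityAVG10yDF.iloc[0])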
TenyAVG_2023DF = TenyAVG_2023DF.drop('Year')
TenyAVG_2023DF
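
The subtraction above works because pandas aligns two row series by their labels (the column names). A minimal sketch with made-up shares, selecting the year by value rather than by position:

```python
import pandas as pd

# Made-up yearly shares and a made-up long-run average (Year = 0 is a placeholder)
by_year = pd.DataFrame({'Year': [2022, 2023],
                        'Level 4 Math': [0.20, 0.25],
                        'Level 4 ELA': [0.15, 0.22]})
average = pd.DataFrame({'Year': [0],
                        'Level 4 Math': [0.21],
                        'Level 4 ELA': [0.18]})

# Pick the 2023 row by its year, then subtract the average row label by label
row_2023 = by_year.loc[by_year['Year'] == 2023].iloc[0]
change = (row_2023 - average.iloc[0]).drop('Year')

print(change)
```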

<a id="middle"></a> 
### Getting the test results for middle schools and calculate comparison indicator by school

#### Prepare schools dataframe with only middle school tests results (grades 6-8)

In [None]:
# Select middle school grades results from the dataframes with math and ELA tests results by schools

resultsMS_bySchl_Norm ={}

for subject in subjects:
    
    
    resultsDF = resultsDFs[subject]
    
    # Dataframe with only grades 6-8 results (middle schools and K-8) by years
    resultsMS_bySchl = resultsDF[(resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)]
    resultsMS_bySchl = resultsMS_bySchl.groupby(['DBN', 'School Name', 'Year'])[['# Level 1','# Level 2','# Level 3','# Level 4']].sum()
    
    # Change column names to include subject
    resultsMS_bySchl.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']
    
    # Dataframe for middle schools by years with normalized values
    resultsMS_bySchl_Norm[subject] = resultsMS_bySchl.div(resultsMS_bySchl.sum(axis=1), axis=0)
    resultsMS_bySchl_Norm[subject].reset_index(inplace=True)
    
    print(resultsMS_bySchl_Norm[subject].head())
    
del resultsDF, resultsMS_bySchl

In [None]:
# Make a merged dataframe with both math and ELA results

DFs = list(resultsMS_bySchl_Norm.values())
allResultsDF = pd.merge(DFs[0], DFs[1], on = ['DBN', 'Year'], how = 'inner', suffixes=('', '_drop'))
allResultsDF = allResultsDF.loc[:, ~allResultsDF.columns.str.endswith('_drop')]
allResultsDF.head(5)

del DFs
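
The `suffixes=('', '_drop')` pattern used above keeps the left-hand copy of any duplicated column and discards the right-hand copy. A sketch with made-up school codes and values:

```python
import pandas as pd

# Made-up data: both tables carry a 'School Name' column that is not a merge key
math = pd.DataFrame({'DBN': ['00X001', '00X002'], 'Year': [2023, 2023],
                     'School Name': ['School A', 'School B'],
                     'Level 4 Math': [0.3, 0.1]})
ela = pd.DataFrame({'DBN': ['00X001', '00X002'], 'Year': [2023, 2023],
                    'School Name': ['School A', 'School B'],
                    'Level 4 ELA': [0.2, 0.4]})

# The right-hand duplicate becomes 'School Name_drop' ...
merged = pd.merge(math, ela, on=['DBN', 'Year'], how='inner', suffixes=('', '_drop'))

# ... and is removed by filtering on the suffix
merged = merged.loc[:, ~merged.columns.str.endswith('_drop')]

print(merged.columns.tolist())
```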

In [None]:
# Add column with the sum of the shares of level 4 students in math and in ELA

allResultsDF['Level 4 Math+Ela'] = allResultsDF[f'Level 4 {subjects[0]}']+allResultsDF[f'Level 4 {subjects[1]}']
allResultsDF.head(10)

<a id="best"></a> 
#### Select schools with the best results for all middle school grades in 2023
This step is optional, except for the first cell (the dataframe for 2023), which is needed for the rest of the analysis.

In [None]:
# This dataframe for 2023 is used later to compare school progress to the citywide progress

allSchools2023 = allResultsDF[(allResultsDF['Year'] == 2023)]
allSchools2023.head()

Optional steps if desired:

<a id="ten"></a> 
#### Create dataframe with average 2013-2023 math and ELA test results for all middle school grades

In [None]:
# Make a merged dataframe with both Math and ELA average 2013-2023 results by schools

resultsMS_top50_AVG2013_23 = {}
resultsMS_AVG2013_23 = {}

for subject in subjects:
    
    resultsDF = resultsDFs[subject]
    
    # Dataframe with only grades 6-8 results (middle schools and K-8) by schools
    resultsMS_bySchl_sumed = resultsDF[(resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)].groupby(['DBN', 'School Name'])[['# Level 1','# Level 2','# Level 3','# Level 4']].sum()
    # Rename columns
    resultsMS_bySchl_sumed.columns = [f'# Level 1 {subject}',f'# Level 2 {subject}',f'# Level 3 {subject}',f'# Level 4 {subject}']

    
    # Dataframe for middle schools by years with normalized values
    resultsMS_bySchl_sumed_Norm = resultsMS_bySchl_sumed.div(resultsMS_bySchl_sumed.sum(axis=1), axis=0)
    resultsMS_bySchl_sumed_Norm.columns = [f'10yrs avg Lvl 1 {subject}',f'10yrs avg Lvl 2 {subject}',f'10yrs avg Lvl 3 {subject}',f'10yrs avg Lvl 4 {subject}']
    resultsMS_bySchl_sumed_Norm.reset_index(inplace = True)
    
    # Add the dataframe to the respective dictionnary 
    resultsMS_AVG2013_23[subject] = resultsMS_bySchl_sumed_Norm
    print(len(resultsMS_AVG2013_23[subject]))
    
del resultsDF, resultsMS_bySchl_sumed_Norm, resultsMS_bySchl_sumed

In [None]:
# Make a merged dataframe with both Math and ELA average 2013-2023 results by schools

AVG2013_23_DFs = list(resultsMS_AVG2013_23.values())
allResultsAVG2013_23DF = pd.merge(AVG2013_23_DFs[0], AVG2013_23_DFs[1], on = ['DBN','School Name'], how = 'inner', suffixes=('', '_drop'))
allResultsAVG2013_23DF = allResultsAVG2013_23DF.loc[:, ~allResultsAVG2013_23DF.columns.str.endswith('_drop')]
allResultsAVG2013_23DF['10yrs avg Lvl 4 Math+Ela'] = allResultsAVG2013_23DF[f'10yrs avg Lvl 4 {subjects[0]}']+allResultsAVG2013_23DF[f'10yrs avg Lvl 4 {subjects[1]}']

del AVG2013_23_DFs

In [None]:
allResultsAVG2013_23DF.head()

In [None]:
# Merging in the 2023 results

# Merge on both 'DBN' and 'School Name' so that schools sharing a name are not cross-matched
allResultsAVG2013_23DF = allResultsAVG2013_23DF.merge(allSchools2023, on = ['DBN', 'School Name'], suffixes=('', '_drop'))
allResultsAVG2013_23DF = allResultsAVG2013_23DF.loc[:, ~allResultsAVG2013_23DF.columns.str.endswith('_drop')]
allResultsAVG2013_23DF.head()

In [None]:
# Adding comparison between results of 2023 and 2013-2023 average

allResultsAVG2013_23DF['2023-10yAVG'] = allResultsAVG2013_23DF['Level 4 Math+Ela'] - allResultsAVG2013_23DF['10yrs avg Lvl 4 Math+Ela']
allResultsAVG2013_23DF.head()

In [None]:
allResultsAVG2013_23DF.info()

<a id="three"></a> 
#### Create dataframe with average 2019-2023 (last 3 tests) math and ELA test results for all middle school grades

In [None]:
# Make a merged dataframe with both math and ELA average 2019-2023 results 

resultsMS_AVG2019_23 = {}

for subject in subjects:
    
    resultsDF = resultsDFs[subject]
    
    # Dataframe with only grades 6-8 results (middle schools and K-8) by schools
    resultsMS_bySchl_sumed = resultsDF[((resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8))&(resultsDF['Year'] >= 2019)].groupby(['DBN', 'School Name'])[['# Level 1','# Level 2','# Level 3','# Level 4']].sum()
    # Rename columns
    resultsMS_bySchl_sumed.columns = [f'# Level 1 {subject}',f'# Level 2 {subject}',f'# Level 3 {subject}',f'# Level 4 {subject}']

    
    # Dataframe for middle schools by years with normalized values
    resultsMS_bySchl_sumed_Norm = resultsMS_bySchl_sumed.div(resultsMS_bySchl_sumed.sum(axis=1), axis=0)
    resultsMS_bySchl_sumed_Norm.columns = [f'3yrs avg Lvl 1 {subject}',f'3yrs avg Lvl 2 {subject}',f'3yrs avg Lvl 3 {subject}',f'3yrs avg Lvl 4 {subject}']
    resultsMS_bySchl_sumed_Norm.reset_index(inplace = True)
    
    # Add the dataframe to the respective dictionnary     
    resultsMS_AVG2019_23[subject] = resultsMS_bySchl_sumed_Norm
    print(len(resultsMS_AVG2019_23[subject]))
    
del resultsDF, resultsMS_bySchl_sumed_Norm, resultsMS_bySchl_sumed

In [None]:
# Make a merged dataframe with both Math and ELA average 2019-2023 results 

AVG2019_23_DFs = list(resultsMS_AVG2019_23.values())
allResultsAVG2019_23DF = pd.merge(AVG2019_23_DFs[0], AVG2019_23_DFs[1], on = ['DBN','School Name'], how = 'inner')
allResultsAVG2019_23DF['3yrs avg Lvl 4 Math+Ela'] = allResultsAVG2019_23DF[f'3yrs avg Lvl 4 {subjects[0]}']+allResultsAVG2019_23DF[f'3yrs avg Lvl 4 {subjects[1]}']

del AVG2019_23_DFs

In [None]:
allResultsAVG2019_23DF.head()

<a id="final"></a> 
### Create final dataframe with data for mapping

In [None]:
# Merge dataframes with average 10 years and last 3 tests results

schoolsAllData = pd.merge(allResultsAVG2013_23DF, allResultsAVG2019_23DF, left_on = ['DBN', 'School Name'], right_on = ['DBN', 'School Name'], how = 'inner')
schoolsAllData.head()

In [None]:
# If needed, the dataframe can be saved to csv for safekeeping or for reuse without repeating 
# the steps above

filename = 'schools2013_2023_AVG.csv'
name = os.path.join(basePath, outputFolder,filename)
schoolsAllData.to_csv(name, index = True)
del filename, name

In [None]:
schoolsAllData.info()

<a id="status"></a> 
#### Adding school status (citywide, boroughwide) and the diversity data to the dataframe with all test results

In [None]:
# Preparing the demographic data

demogData.columns = [col.replace('/', '_') for col in demogData.columns]

In [None]:
# Selecting the columns needed for analysis from demography data

cols = ['DBN', 'Year', 'Total Enrollment', '% Asian', '% Black', '% Hispanic', '% Multi-Racial', '% Native American', '% White', '% Missing Race_Ethnicity Data']
diversityData = demogData[cols]
index = diversityData['Year'] == '2022-23'
diversityData = diversityData[index]

In [None]:
len(diversityData)

In [None]:
# Merging the school diversity data and school status (open to city/borough) data

diversityStatusData = pd.merge(diversityData, statusData, on = 'DBN', how = 'outer')
len(diversityStatusData)

In [None]:
# Merging schools data (short version) for analysis with demographic and status data

schoolsMergedData = schoolsAllData.merge(diversityStatusData, on = 'DBN', how = 'inner', suffixes=('', '_drop'))
schoolsMergedData = schoolsMergedData.loc[:, ~schoolsMergedData.columns.str.endswith('_drop')]

In [None]:
len(schoolsMergedData)

In [None]:
schoolsMergedData.info()
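
When checking row counts after merges like the ones above, the `indicator` and `validate` options of `pd.merge` can show which keys failed to match. The sketch below uses made-up school codes, and the column names `score` and `status` are hypothetical:

```python
import pandas as pd

# Made-up tables that only partly overlap on 'DBN'
results = pd.DataFrame({'DBN': ['00X001', '00X002', '00X003'],
                        'score': [0.5, 0.4, 0.3]})
status = pd.DataFrame({'DBN': ['00X002', '00X003', '00X004'],
                       'status': ['citywide', 'boroughwide', 'citywide']})

# indicator=True adds a '_merge' column; validate raises if a key is duplicated
audit = pd.merge(results, status, on='DBN', how='outer',
                 indicator=True, validate='one_to_one')

print(audit['_merge'].value_counts())

# Keys present in only one of the two tables
unmatched = audit.loc[audit['_merge'] != 'both', 'DBN'].tolist()
print(unmatched)
```

Schools listed in `unmatched` are the ones an inner merge would silently drop, so they are worth a manual look.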

#### Read schools geolocation file

In [None]:
## Read GeoJSON into data frame
SchoolsFile = 'NYC_K-12_schools_public.geojson'
NYCSchoolsPath = os.path.join(basePath, dataFolder, SchoolsFile)
NYCSchoolsData = gpd.read_file(NYCSchoolsPath)

DistrictsFile = 'School Districts.geojson'
NYCDistrictsPath = os.path.join(basePath, dataFolder, DistrictsFile)
NYCDistrictsData = gpd.read_file(NYCDistrictsPath)

<a id="match"></a> 
#### Matching the schools names from GeoJSON schools location file and the results dataframe and merging

In [None]:
#NYCSchoolsData.info() #Too many columns --> make a smaller copy
NYCSchoolsDataShort = NYCSchoolsData[['OBJECTID', 'LEGAL_NAME', 'PHYSADDRLINE1', 'PHYSCITY', 'COUNTY_DESC', 'RECORD_TYPE_DESC', 'SDL_DESC', 'geometry']]
NYCSchoolsDataShort.head()

In [None]:
# Matching the school all data file with spatial data (geojson of schools locations)

tqdm.pandas(desc="Matching Names")

# Matching school names from schoolsMergedData to the legal names in NYCSchoolsDataShort
matched_tuples = schoolsMergedData['School Name'].progress_apply(lambda x: match_name(x, NYCSchoolsDataShort['LEGAL_NAME'], min_score=80))

print('Done.')

In [None]:
print('Appending matches to the dataframe.')
schoolsMergedData['matched_name'] = list(zip(*matched_tuples))[0]
schoolsMergedData['matched_score'] = list(zip(*matched_tuples))[1]
print('Done.')
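
The implementation of `match_name` lives in the `utils` module and is not shown in this notebook. As a rough idea of how such fuzzy matching can work, the sketch below uses the standard library's `difflib`. The function `best_match` and the school names are hypothetical, and the scoring may differ from what `match_name` actually does.

```python
from difflib import SequenceMatcher

def best_match(name, candidates, min_score=80):
    """Return (best_candidate, score) with the score on a 0-100 scale.

    The candidate is None when the best score is below min_score.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        # Case-insensitive similarity ratio, scaled to 0-100
        score = 100 * SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < min_score:
        return None, best_score
    return best, best_score

# Made-up names for illustration
candidates = ['MS 999 EXAMPLE ACADEMY', 'PS 998 SAMPLE SCHOOL']

print(best_match('M.S. 999 Example Academy', candidates))
print(best_match('Completely Different Name', candidates))
```

Returning the score together with the name, as the cells above do with `matched_score`, makes it possible to review borderline matches by hand.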

In [None]:
schoolsMergedData.head()

In [None]:
# Merging DataFrames based on the matched name
schoolsAllData_mappable = pd.merge(NYCSchoolsDataShort,schoolsMergedData, left_on='LEGAL_NAME', right_on='matched_name')

In [None]:
schoolsAllData_mappable.info()

In [None]:
print(schoolsAllData_mappable['matched_name'].isnull().sum())

<a id="plots"></a> 
#### Adding history ELA/math results, diversity data as plots to the geodata frame and saving into GeoJSON file

In [None]:
# Make diversity pie charts for popups in the map and add them as a column to the mappable dataframe

# Initialize schools_mappable_plots with a copy of the original dataframe to preserve its content across merges
schools_mappable_plots = schoolsAllData_mappable.copy()

# Set interactive mode off
plt.ioff()

# list of schools names

schoolsNames = schoolsAllData_mappable['DBN'].to_list()

# Create dictionary to hold the dataframes by schools
schoolDFs = {}

# Make dataframes by schools 
for name in schoolsNames:
    dfName = name
    schoolDFs[dfName] = schools_mappable_plots[schools_mappable_plots['DBN'] == name]

plots = []
plotsDFs = {}

print("Making diversity charts ...")

columns_to_plot = ['% Asian', '% Black', '% Hispanic', '% Multi-Racial', '% Native American', '% White', '% Missing Race_Ethnicity Data']  
# Plot dataframes by school
for schoolDF, current_dataframe in tqdm(schoolDFs.items()):
    # schoolDF contains the name of the dataframe
    # current_dataframe contains the dataframe itself

        # Do something with current_dataframe
        # Create a plot
        fig = create_chart(current_dataframe, schoolDF, columns_to_plot)

        # Convert the plot to a PNG image and then encode it
        io_buf = BytesIO()
        fig.savefig(io_buf, format='png', bbox_inches='tight')
        # Close the figure
        plt.close()        
        #Reading file to get the base64 string
        io_buf.seek(0)
        base64_string = base64.b64encode(io_buf.read()).decode('utf8')

        pair = (schoolDF, base64_string)

        plots.append(pair)

print('Adding plots to the data frame with test results.')           
# add the plots to the geodataframe of middle schools subject results 
plotsDFs = pd.DataFrame(plots, columns=['DBN', 'Dvst_chart'])

schools_mappable_plots = pd.merge(schools_mappable_plots, plotsDFs, left_on = 'DBN', right_on='DBN')
    
del schoolDFs, columns_to_plot, plotsDFs
print('Done.')   

In [None]:
schools_mappable_plots.info()
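
The figure-to-string conversion used in the loop above can be wrapped in a small helper. This is a sketch only: the name `fig_to_base64` is hypothetical, the pie chart data is made up, and the `<img>` tag at the end is an assumption about how such a string might be embedded in an HTML popup.

```python
import base64
from io import BytesIO

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; mainly useful outside a notebook
import matplotlib.pyplot as plt

def fig_to_base64(fig):
    """Render a matplotlib figure to PNG and return it as a base64 string."""
    buf = BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight')
    plt.close(fig)  # close this specific figure to free memory
    return base64.b64encode(buf.getvalue()).decode('utf8')

# Made-up chart for illustration
fig, ax = plt.subplots(figsize=(2, 2))
ax.pie([40, 35, 25], labels=['A', 'B', 'C'])

encoded = fig_to_base64(fig)

# Assumed usage: embed the image directly in HTML as a data URI
html_img = f'<img src="data:image/png;base64,{encoded}">'
print(len(encoded), 'characters')
```

Passing the figure to `plt.close(fig)` closes exactly the figure that was saved, which is safer in long loops than closing whichever figure happens to be current.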

In [None]:
# Make plots for popups in the map and add them as columns to the mappable dataframe

# Set interactive mode off
plt.ioff()

# list of schools names

schoolsNames = schools_mappable_plots['DBN'].to_list()
testResults = allResultsDF

# Create dictionary to hold the dataframes by schools
schoolDFs = {}

# Make dataframes by schools 
for name in schoolsNames:
    dfName = name
    schoolDFs[dfName] = testResults[testResults['DBN'] == name]

plots = []
plotsDFs = {}

print("Making test results plots ...")

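for subject in subjects:
    # Start a fresh list for each subject so that the second subject's dataframe
    # does not also contain the plots of the first subject
    plots = []
    columns_to_plot = [f"Level 1 {subject}", f"Level 2 {subject}", f"Level 3 {subject}", f"Level 4 {subject}"]  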
    # Plot dataframes by school
    for schoolDF, current_dataframe in tqdm(schoolDFs.items()):
        # schoolDF contains the name of the dataframe
        # current_dataframe contains the dataframe itself

            # Do something with current_dataframe
            # Create a plot
            fig = create_plot(current_dataframe, schoolDF, columns_to_plot)

            # Convert the plot to a PNG image and then encode it
            io_buf = BytesIO()
            fig.savefig(io_buf, format='png', bbox_inches='tight')
            # Close the figure
            plt.close()
            #Reading file to get the base64 string
            io_buf.seek(0)
            base64_string = base64.b64encode(io_buf.read()).decode('utf8')

            pair = (schoolDF, base64_string)

            plots.append(pair)

    # add the plots to the geodataframe of middle schools subject results 
    plotsDF = pd.DataFrame(plots, columns=['DBN', f'plot {subject}'])

    plotsDFs[subject] = plotsDF
    
print('Adding plots to the data frame with test results.')                
for subject, df in plotsDFs.items():
    schools_mappable_plots = pd.merge(schools_mappable_plots, df, left_on = 'DBN', right_on='DBN')

print('Done.')     

In [None]:
## Saving the resulting geodataframe into a GeoJSON file to make a map separately.

# If the area to display is smaller than the whole city, or the number of schools
# selected for display is relatively small, the map can be displayed within a Jupyter notebook.
# In this case, however, the dataframe is too big and the map is too crowded with symbols for that.
# Therefore, the map making and the data analysis are separated into different notebooks,
# and the map is later saved as an HTML file. The GeoJSON file is used in that next step.

fname = 'schoolDataPlots.geojson'
fpath = os.path.join(basePath, outputFolder, fname)
print(f'Saving to {fpath} ...')
schools_mappable_plots.to_file(fpath, driver="GeoJSON")
print('Saved.')

del fname, fpath

In [None]:
schools_mappable_plots.info()