<a href="https://colab.research.google.com/github/aankit/nycdoe_space_analysis/blob/master/NYCDOE_Space_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Understanding the Physical Space of the Largest School District in the United States

New York City has the largest school district in the country based on the number of students it serves. The COVID-19 pandemic has asked us all to be more cognizant of our physical space, whether that means staying 6 feet apart in public or staying sane at home, especially if you live in a tiny NYC apartment.

###How much physical space does the largest school district in the country possess to serve its students? Why hasn't this question and the associated analysis been something we are all reading about as the NYC Department of Education plans for the 2020-21 school year?

I broke this larger question into four smaller questions and analyses. Feel free to reach out to me at hi@aankit.com if you have questions or thoughts!

###Question 1: Does publicly available data about the physical space of schools exist?

###Answer: Yes! Below I point out and load up some data sources for my analysis.

First I'm going to install and load some helpful tools :)

In [None]:
!pip install geopandas

In [41]:
from __future__ import print_function
import requests
import pandas as pd
import io
import geopandas as gpd

DOE Building Space Usage is available on the NYC Open Data Portal [here](https://data.cityofnewyork.us/Education/DOE-Building-Space-Usage/wavz-fkw8). I'm using the CSV export because the [JSON API endpoint](https://data.cityofnewyork.us/resource/wavz-fkw8.json) only yielded 1000 rows, which is the default page size of the portal's Socrata API when no `$limit` is given.

In [73]:
response = requests.get("https://data.cityofnewyork.us/api/views/wavz-fkw8/rows.csv?accessType=DOWNLOAD")
school_space = pd.read_csv(io.StringIO(response.text))
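If you'd rather stay on the JSON endpoint, paging with the `$limit` and `$offset` query parameters is one way around the 1000-row cap. The sketch below is only an illustration: the helper names `fetch_page` and `fetch_all` and the page size of 10000 are my own assumptions, not something the portal prescribes, and I'm assuming `$order=:id` gives a stable ordering between pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, limit, offset):
    """Fetch one page of rows from a Socrata-style JSON endpoint (hypothetical helper)."""
    # $order=:id is assumed here to keep row order stable across pages.
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    with urlopen(f"{base_url}?{query}") as resp:
        return json.load(resp)


def fetch_all(base_url, page_size=10000, fetch=fetch_page):
    """Keep requesting pages until one comes back shorter than page_size."""
    rows = []
    offset = 0
    while True:
        page = fetch(base_url, page_size, offset)
        rows.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means we reached the end
        offset += page_size
    return rows
```

Something like `pd.DataFrame(fetch_all("https://data.cityofnewyork.us/resource/wavz-fkw8.json"))` should then give a table comparable to the CSV download, though the column names in the JSON API may differ from the CSV headers.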

We will need more information about schools, like what grades they serve and where they are located, for a proper analysis.

DOE School Information is available via a file called "LCGMS" [here](https://data.cityofnewyork.us/Education/LCGMS-DOE-School-Information-Report/3bkj-34v2). I'm using the [CSV with additional geocoded fields](https://data.cityofnewyork.us/api/views/3bkj-34v2/files/56813139-9b9d-44fb-b81d-068553b7a9b7?download=true&filename=LCGMS_SchoolData(additional%20geocoded%20fields%20added).csv)

In [51]:
response = requests.get("https://data.cityofnewyork.us/api/views/3bkj-34v2/files/56813139-9b9d-44fb-b81d-068553b7a9b7?download=true&filename=LCGMS_SchoolData(additional geocoded fields added).csv")
lcgms = pd.read_csv(io.StringIO(response.text))

And finally we will want to look at all of this on a map, so let's load a geospatial file of school zones from [here](https://data.cityofnewyork.us/Education/2019-2020-School-Zones-Elementary-/kuk3-ypca). I'm using the GeoJSON export.

This is only relevant for elementary schools (and maybe some middle schools).

In [52]:
response = requests.get("https://data.cityofnewyork.us/api/geospatial/kuk3-ypca?method=export&format=GeoJSON")
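# Build a GeoDataFrame so the zone geometries are usable for mapping.
# GeoJSON coordinates are WGS 84 longitude/latitude (EPSG:4326).
school_zones = gpd.GeoDataFrame.from_features(response.json()["features"], crs="EPSG:4326")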

###Question 2: What areas of a school are instructional areas?

The data dictionary provided by the School Construction Authority on the NYC Open Data Portal is helpful. It tells us that the DOE Building Space data we pulled in and named `school_space` has a `Room Function` column, or field. Let's get a count of the values in the `Room Function` field.

In [78]:
school_space["Room Function"].value_counts()

REGULAR CLASSROOM                70581
STORAGE ROOM                     46699
OTHER OFFICE                     36936
GENERAL BUILDING SUPPORT         31838
REGULAR CLASSROOM - MS GRADES    19807
                                 ...  
ELEVENTH GRADE                      94
GYM/AUD/CAFETERIA                   89
DRAFTING ROOM                       73
TWELFTH GRADE                       51
NEST NINTH-TWELFTH GRADE            13
Name: Room Function, Length: 109, dtype: int64

Based on this quick overview, if `Room Function` contains the word "classroom" or "grade" it can probably be classified as an instructional space.

In [93]:
instructional_school_space = school_space[school_space["Room Function"].str.contains("CLASSROOM|GRADE", na=False)]
instructional_school_space["Room Function"].value_counts()

REGULAR CLASSROOM                   70581
REGULAR CLASSROOM - MS GRADES       19807
D75 SPED CLASSROOM                  15482
NON-D75 SPED CLASSROOM              15481
FIRST GRADE                         13521
SECOND GRADE                        12696
THIRD GRADE                         12193
FOURTH GRADE                        11622
FIFTH GRADE                         11369
ICT - ELEMENTARY SCHOOL GRADES      10215
REGULAR CLASSROOM - HS GRADES        3813
MULTI-PURPOSE CLASSROOM              2544
SCIENCE CLASSROOM FOR PS             2028
SIXTH GRADE                          1644
ICT - MIDDLE SCHOOL GRADES           1558
SEVENTH GRADE                        1526
EIGHTH GRADE                         1482
MULTI-PURPOSE NON CLASSROOM          1177
NEST FIRST-THIRD GRADE                452
ICT - HIGH SCHOOL GRADES              424
NEST SIXTH-EIGHTH GRADE               407
NEST FOURTH-FIFTH GRADE               248
NINTH GRADE                           236
HORIZON SECOND-TWELFTH GRADE      
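
One caveat visible in the counts above: `MULTI-PURPOSE NON CLASSROOM` also matches the pattern, because it contains the word "CLASSROOM". Whether those rooms should count as instructional space is a judgment call. If you decide they shouldn't, a second mask can exclude them. Here is a minimal sketch on a toy Series (the values are copied from the output above, but the exclusion rule itself is my assumption):

```python
import pandas as pd

# Toy stand-in for school_space["Room Function"], including a missing value.
room_functions = pd.Series([
    "REGULAR CLASSROOM",
    "FIRST GRADE",
    "MULTI-PURPOSE NON CLASSROOM",
    "STORAGE ROOM",
    None,
])

# Same regex as the notebook: keep anything mentioning CLASSROOM or GRADE.
is_instructional = room_functions.str.contains("CLASSROOM|GRADE", na=False)

# Assumed extra rule: drop labels that explicitly say NON CLASSROOM.
is_excluded = room_functions.str.contains("NON CLASSROOM", na=False)

instructional = room_functions[is_instructional & ~is_excluded]
print(instructional.tolist())  # ['REGULAR CLASSROOM', 'FIRST GRADE']
```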

Let's take a look at the square footage of this instructional space. Only the `Area` column is meaningful here, since summed lengths and widths don't describe anything physical. Note that this view aggregates across all schools; the next step is to drill down to the school level before we link schools to the communities they serve.

In [107]:
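# Select the numeric columns explicitly so text columns aren't swept into the sum.
instructional_school_space.groupby("Room Function")[["Length", "Width", "Area"]].sum()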

                                    Length     Width       Area
Room Function
D75 SPED CLASSROOM                372679.0  345510.0  8302789.0
EIGHTH GRADE                       37973.0   40526.0  1031246.0
ELEVENTH GRADE                      2529.0    2424.0    64696.0
FIFTH GRADE                       281419.0  300719.0  7362002.0
FIRST GRADE                       339764.0  352957.0  8773473.0
FOURTH GRADE                      287273.0  306075.0  7479172.0
HORIZON KINDERGARTEN-FIRST GRADE    2509.0    2243.0    58305.0
HORIZON SECOND-TWELFTH GRADE        5293.0    5087.0   127132.0
ICT - ELEMENTARY SCHOOL GRADES    255937.0  269540.0  6657939.0
ICT - HIGH SCHOOL GRADES           11298.0   10321.0   275810.0
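
The school-level drill-down could look something like the sketch below. It runs on made-up toy data, and the `school_id` column is a hypothetical placeholder: the real dataset's school identifier column would need to be looked up in the data dictionary.

```python
import pandas as pd

# Toy rows standing in for instructional_school_space.
# "school_id" is a hypothetical column name; the areas are invented.
rooms = pd.DataFrame({
    "school_id": ["A", "A", "B", "B", "B"],
    "Room Function": [
        "REGULAR CLASSROOM",
        "FIRST GRADE",
        "REGULAR CLASSROOM",
        "SECOND GRADE",
        "THIRD GRADE",
    ],
    "Area": [600.0, 550.0, 700.0, 650.0, 500.0],
})

# Total instructional area per school instead of per room function.
area_by_school = rooms.groupby("school_id")["Area"].sum()
print(area_by_school.to_dict())  # {'A': 1150.0, 'B': 1850.0}
```

Grouping by school rather than by `Room Function` gives one number per building, which is the shape needed before joining to enrollment or location data.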


###Question 3: Can we determine each school's square footage per person?

### Question 4: Can we map each school's square footage per person to get an idea of the impact on different communities? 