Exploring Segregation in NYC Schools
===================================
In 2014, the [_Civil Rights Project/Proyecto Derechos Civiles_](https://civilrightsproject.ucla.edu/) at UCLA released a report indicating that NYC had the most segregated schools in the nation. They followed up the 2014 report in 2021, finding that NYC was still last in terms of racial and ethnic integration of major school districts.

Since the 1950s, the city has grappled with its separate and unequal school system, but the problems persist. Recently, activism around Black Lives Matter has reinvigorated the conversation, with student-led organizations like [Teens Take Charge](https://www.instagram.com/teenstakecharge/) playing a leading role. This brief report demonstrates some ways that we can use NYC open data to explore the shape of segregation in the city's schools.


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from functools import partial

import scipy

from IPython.display import Markdown as md, HTML
from nycschools import schools, geo, ui, class_size


In [28]:
# load common core of data for 2007-2008 to match the data in the journal article
# description of columns: https://nces.ed.gov/ccd/pdf/psu071bgen.pdf
ccd = pd.read_csv("../../../_data/ccod-2007.csv")
ccd.head()
[c for c in ccd.columns if "CITY" in c]




['MCITY07']

In [43]:

ccd_nys = ccd[ccd["state"] == "NY"]
ccd_nys.groupby(["LEAID", "LEANM07"]).MEMBER07.sum().sort_values(ascending=False).head(40)
ccd_nyc = ccd_nys[ccd_nys.LEANM07.str.startswith("NEW YORK CITY")]
ccd_nyc = ccd_nyc[["school_id", "ASIAN07", "BLACK07", "HISP07", "WHITE07"]]
ccd_nyc.rename(columns={"ASIAN07": "asian_n", "BLACK07": "black_n", "HISP07": "hispanic_n", "WHITE07":"white_n"}, inplace=True)
ccd_nyc.to_csv("../../../_data/ccd-2007-nyc2.csv", index=False)


In [4]:
# load the demographic data and get just the most recent year and columns of interest
df = schools.load_school_demographics()
segregation_cols = ['dbn', 'district',  'school_name', 'total_enrollment',
        'asian_n', 'asian_pct', 'black_n', 'black_pct', 'hispanic_n',
        'hispanic_pct', 'white_n', 'white_pct', 'poverty_n', 'poverty_pct']
data = df[df.ay == df.ay.max()].copy()
data = data[segregation_cols]

Index of dissimilarity
----------------------

`D` is the _index of dissimilarity_, an "index of unevenness segregation." For our data, D measures how unevenly racial and ethnic groups are distributed across schools. Higher values of D indicate that groups are not spread evenly across schools (more segregation); lower values indicate a more even distribution of the population (less segregation). A value of 1 means complete segregation, while 0 means every school mirrors the proportions of the overall population. Below we calculate D for each district and then for the entire school system. We find that some districts have a low D index, while in other districts it is higher than the citywide index. This measure is based on Allen, R., and Vignoles, A. (2007) and Frankel, D. M., and Volij, O. (2011).
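
For reference, the two-group form of the index as it is commonly written is sketched below. The symbols are our own notation for illustration: $g_i$ is the number of students from the group of interest in school $i$, $G$ is that group's total across all schools being compared, $n_i$ is the number of all other students in school $i$, and $N$ is their total.

$$
D = \frac{1}{2} \sum_{i} \left| \frac{g_i}{G} - \frac{n_i}{N} \right|
$$

If every school enrolls the same share of each of the two populations, every term is 0 and $D = 0$. If no school enrolls both populations, the terms sum to 2 and $D = 1$.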

We create a function to calculate the D index for a set of schools. We can use it to measure unevenness within a geographic school district and/or across the whole city.


In [11]:
# calculate unevenness

def calculate_dissimilarity(data):
    total_black = data['black_n'].sum()
    total_white = data['white_n'].sum()
    total_asian = data['asian_n'].sum()
    total_hispanic = data['hispanic_n'].sum()

    total = data.total_enrollment.sum()

    non_black = total - total_black
    non_white = total - total_white
    non_asian = total - total_asian
    non_hispanic = total - total_hispanic

    def diss(row, eth, eth_total, total):
        cols = list(row.index)
        # the total students in the school outside of the target ethnic group `eth`
        non_eth = sum([row[col] for col in cols if col != eth and col.endswith('_n')])
        D = (row[eth] / eth_total) - (non_eth / total)
        return abs(D)

    black_D = data.apply(partial(diss, eth="black_n", eth_total=total_black, total=non_black), axis=1)
    black_D = black_D.sum() / 2

    white_D = data.apply(partial(diss, eth="white_n", eth_total=total_white, total=non_white), axis=1)
    white_D = white_D.sum() / 2

    asian_D = data.apply(partial(diss, eth="asian_n", eth_total=total_asian, total=non_asian), axis=1)
    asian_D = asian_D.sum() / 2

    hispanic_D = data.apply(partial(diss, eth="hispanic_n", eth_total=total_hispanic, total=non_hispanic), axis=1)
    hispanic_D = hispanic_D.sum() / 2

    # calculate a weighted average of the four group D indices,
    # weighting each group by its mean share of school enrollment
    weights = [data.asian_pct.mean(), data.black_pct.mean(), data.hispanic_pct.mean(), data.white_pct.mean()]
    D = np.average([asian_D, black_D, hispanic_D, white_D], weights=weights)
    
    return D

cols = ['dbn', 'district', 'boro', 'total_enrollment', 'black_n', 'white_n', 'asian_n',
        'hispanic_n', 'black_pct', 'white_pct', 'asian_pct', 'hispanic_pct']
data = df[cols].copy()
data.set_index('dbn', inplace=True)
seg_D = pd.DataFrame()
seg_D['district'] = data.district.unique()
seg_D['D'] = seg_D.district.apply(lambda x: calculate_dissimilarity(data[data.district == x]))
nyc_D = calculate_dissimilarity(data)
seg_D = seg_D.sort_values('D', ascending=False)
print("City D", nyc_D)

City D 0.4951746899047429
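
The boundary cases described earlier (0 for a perfectly proportionate distribution, 1 for complete segregation) can be checked with a minimal vectorized sketch. The function name `two_group_dissimilarity` and the two toy data frames are hypothetical and made up for illustration; they are not part of `nycschools` or of the analysis above. The sketch compares a single group against all other students, rather than computing the weighted average used in `calculate_dissimilarity`.

```python
import pandas as pd


def two_group_dissimilarity(group_n, other_n):
    """Return D = 1/2 * sum_i |g_i/G - n_i/N| for one group vs. all other students."""
    group_share = group_n / group_n.sum()   # each school's share of the group
    other_share = other_n / other_n.sum()   # each school's share of everyone else
    return (group_share - other_share).abs().sum() / 2


# hypothetical toy data: two schools, the group is half of each school
even = pd.DataFrame({"black_n": [50, 100], "total_enrollment": [100, 200]})

# hypothetical toy data: the whole group attends the first school only
split = pd.DataFrame({"black_n": [100, 0], "total_enrollment": [100, 200]})

d_even = two_group_dissimilarity(even.black_n, even.total_enrollment - even.black_n)
d_split = two_group_dissimilarity(split.black_n, split.total_enrollment - split.black_n)

print("even:", d_even)
print("split:", d_split)
```

In the first toy frame each school holds the same share of both populations, so the index is 0; in the second the two populations never share a school, so the index is 1.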


References
============

Allen, R., & Vignoles, A. (2007). What should an index of school segregation measure? _Oxford Review of Education_, _33_(5), 643–668. https://doi.org/10.1080/03054980701366306

Cohen, D. (2021). NYC School Segregation Report Card: Still Last, Action Needed Now! _Civil Rights Project/Proyecto Derechos Civiles_. UCLA. https://escholarship.org/uc/item/5fx616qn

Frankel, D. M., & Volij, O. (2011). Measuring school segregation. _Journal of Economic Theory_, _146_(1), 1–38. https://doi.org/10.1016/j.jet.2010.10.008

Lefty, L. (2021, February 11). [The Long Fight for Educational Equity in NYC](https://www.mcny.org/story/long-fight-educational-equity-nyc). _Museum of the City of New York_.

Zhang, C. H., & Ruther, M. (2021). Contemporary patterns and issues of school segregation and white flight in U.S. metropolitan areas: Towards spatial inquiries. _GeoJournal_, _86_(3), 1511–1526. https://doi.org/10.1007/s10708-019-10122-1
