## UP 213 Project Description

We are joining the following characteristics of MUNI stops and their corresponding neighborhoods from 2015-16:
- Median income
- Vehicle Ownership
- Racial Demographic
- In/Not In an Equity Strategy-designated neighborhood

This data serves as training data in a clustering model. We train the model to identify whether a stop is in an equity strategy neighborhood or not based on both the stop characteristics and neighborhood characteristics.

We then collect the same MUNI stop and neighborhood data from 2022-23, but this time only for equity strategy neighborhoods. This data acts as our test data. We feed this into the clustering model, and see whether it classifies the same stops as in or not in an equity strategy neighborhood based on their updated characteristics. 

If the model classifies our test data as still in equity strategy neighborhoods, then we can conclude that MUNI transit in equity strategy neighborhoods has not improved much since 2015-16. If the model classifies our test data as not in equity strategy neighborhoods, then we can conclude that MUNI transit has improved since 2015-16, and the city can consider graduating such neighborhoods from their “equity strategy” titles. 

### Data Collection

We are collecting data from the following sources:
- MUNI Stop Data from DataSF
- SF Neighborhoods from DataSF (SF Find Project)
- Census tract-level income and vehicle ownership data for 2017 and 2019 from the ACS
- Census tract-level race data for 2017 and 2019 from the ACS
- Census tract codes and their corresponding SF neighborhoods from DataSF (Analysis Neighborhoods Project)

We have not yet collected the following data, but would like to include it in a future iteration of this project:
- Bus speeds
- Ridership
- Proximity to Community Anchors

In [49]:
# Collect SF MUNI Stop Location Data
import pandas as pd

allMUNIstops = pd.read_csv('Data/Muni_Stops.csv')
allMUNIstops.head()

Unnamed: 0,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SERVICEPLANNINGSTOPTYPE,SHELTER,INSERT_TIMESTAMP,SDE_ID,SIGNUPID,SUPERVISOR_DISTRICT,shape,Neighborhoods,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods
0,42619,Polk St&Lombard St NW-NS/BZ,POLKLOM0,POLKLOMB,5990,37.80167,-122.42303,0.0,LOMBARD ST,POLK ST,NS,NW,BZ,0.0,20230512124615,14816781,141,,POINT (-122.42303 37.80167),107.0,107.0,4.0,6.0,32.0
1,40917,Chestnut St&Fillmore St NE-NS/BZ,CHESFIL0,CHESFILL,3941,37.800845,-122.436245,0.0,WEBSTER ST,CHESTNUT ST,NS,NE,BZ,1.0,20230512124615,14809056,141,,POINT (-122.43625 37.800846),17.0,17.0,4.0,6.0,13.0
2,41525,Geary Blvd&Arguello Blvd NE-NS/BZ,GEARARG0,GEARARGL,4287,37.781376,-122.458737,0.0,ARGUELLO BLVD,GEARY BLVD,NS,NE,BZ,1.0,20230512124615,14810393,141,,POINT (-122.45874 37.781376),11.0,11.0,8.0,6.0,31.0
3,40679,3rd St&Folsom St N-FS/BZ,.3STFOL0,3STFOLS,3124,37.784204,-122.399326,0.0,CLEMENTINA ST,03RD ST,FS,NO,BZ,1.0,20230512124615,14808200,141,,POINT (-122.39932 37.784203),32.0,32.0,1.0,10.0,8.0
4,43044,Potrero Ave&24th St SW-FS/BZ,POTR24S0,POTR24ST,6039,37.75267,-122.40649,0.0,24TH ST,POTRERO AVE,FS,SW,BZ,0.0,20230512124615,14815720,141,,POINT (-122.40649 37.75267),53.0,53.0,3.0,2.0,20.0


The SF Find Neighborhoods column tells us which neighborhood each MUNI stop is in. Let's load the corresponding neighborhood names.

In [50]:
# Find the neighborhood names that correspond to SF neighborhood codes.
SFneighborhoods = pd.read_csv('Data/SFFind_Neighborhoods.csv')
SFneighborhoods.head()

Unnamed: 0,LINK,the_geom,name
0,"http://en.wikipedia.org/wiki/Sea_Cliff,_San_Fr...",MULTIPOLYGON (((-122.49345526799993 37.7835181...,Seacliff
1,,MULTIPOLYGON (((-122.48715071499993 37.7837854...,Lake Street
2,http://www.nps.gov/prsf/index.htm,MULTIPOLYGON (((-122.47758017099994 37.8109931...,Presidio National Park
3,,MULTIPOLYGON (((-122.47241052999993 37.7873465...,Presidio Terrace
4,http://www.sfgate.com/neighborhoods/sf/innerri...,MULTIPOLYGON (((-122.47262578999994 37.7863148...,Inner Richmond


In [51]:
# Get neighborhood census data for 2017 and for 2019.
import cenpy
from cenpy import products

# Query tract-level income and vehicle variables from the 2017 ACS.
census2017 = products.ACS(2017).from_place('San Francisco, CA', level='tract',
                                        variables=['B19019_001E','B25046_001E'])
census2017.rename(columns={'B19019_001E':'median_hh_income', 'B25046_001E':'vehicles_avail'}, inplace=True)

census2017.head()

Matched: San Francisco, CA to San Francisco city within layer Incorporated Places


Unnamed: 0,GEOID,geometry,median_hh_income,vehicles_avail,state,county,tract
0,6075032801,"POLYGON ((-13635048.760 4543918.550, -13634929...",110255.0,2167.0,6,75,32801
1,6075033100,"POLYGON ((-13636532.870 4541575.590, -13636426...",111333.0,2676.0,6,75,33100
2,6075033201,"POLYGON ((-13635142.160 4541306.060, -13635136...",28750.0,442.0,6,75,33201
3,6075030301,"POLYGON ((-13634050.780 4545554.170, -13633943...",140179.0,3607.0,6,75,30301
4,6075031000,"POLYGON ((-13632506.330 4541080.160, -13632485...",131544.0,2244.0,6,75,31000


In [52]:
# Repeat the same query for the 2019 ACS.
census2019 = products.ACS(2019).from_place('San Francisco, CA', level='tract',
                                        variables=['B19019_001E','B25046_001E'])
census2019.rename(columns={'B19019_001E':'median_hh_income', 'B25046_001E':'vehicles_avail'}, inplace=True)

census2019.head()

Matched: San Francisco, CA to San Francisco city within layer Incorporated Places


Unnamed: 0,GEOID,geometry,median_hh_income,vehicles_avail,state,county,tract
0,6075035202,"POLYGON ((-13637736.350 4546153.040, -13637685...",89732.0,2898.0,6,75,35202
1,6075042700,"POLYGON ((-13635913.040 4548886.330, -13635803...",93250.0,2522.0,6,75,42700
2,6075030202,"POLYGON ((-13633379.300 4546390.880, -13633366...",128417.0,2053.0,6,75,30202
3,6075030900,"POLYGON ((-13633895.820 4539985.070, -13633869...",177694.0,4716.0,6,75,30900
4,6075045100,"POLYGON ((-13632661.740 4548547.020, -13632647...",141912.0,2623.0,6,75,45100


In [53]:
# Pull tract-level race counts (ACS table B02001) for 2017.
race2017 = products.ACS(2017).from_place('San Francisco, CA', level='tract',
                                        variables='B02001')
race2017.rename(columns={'B02001_001E':'total pop', 'B02001_002E':'white','B02001_003E':'black','B02001_004E':'native','B02001_005E':'asian','B02001_006E':'hawaiian/pac islander','B02001_007E':'other'}, inplace=True)

# Drop the two-or-more-races columns, which are not used here.
race2017 = race2017.drop(columns=['B02001_008E', 'B02001_009E','B02001_010E'])

race2017.head()

Matched: San Francisco, CA to San Francisco city within layer Incorporated Places


Unnamed: 0,GEOID,geometry,total pop,white,black,native,asian,hawaiian/pac islander,other,state,county,tract
0,6075032801,"POLYGON ((-13635048.760 4543918.550, -13634929...",4505.0,1522.0,102.0,5.0,2681.0,0.0,46.0,6,75,32801
1,6075033100,"POLYGON ((-13636532.870 4541575.590, -13636426...",3978.0,1439.0,30.0,0.0,2339.0,0.0,48.0,6,75,33100
2,6075033201,"POLYGON ((-13635142.160 4541306.060, -13635136...",4281.0,1759.0,307.0,15.0,944.0,22.0,775.0,6,75,33201
3,6075030301,"POLYGON ((-13634050.780 4545554.170, -13633943...",5907.0,2694.0,120.0,0.0,2543.0,0.0,228.0,6,75,30301
4,6075031000,"POLYGON ((-13632506.330 4541080.160, -13632485...",3799.0,2015.0,71.0,5.0,1255.0,0.0,146.0,6,75,31000


In [54]:
# Pull tract-level race counts (ACS table B02001) for 2019.
race2019 = products.ACS(2019).from_place('San Francisco, CA', level='tract',
                                        variables='B02001')
race2019.rename(columns={'B02001_001E':'total pop', 'B02001_002E':'white','B02001_003E':'black','B02001_004E':'native','B02001_005E':'asian','B02001_006E':'hawaiian/pac islander','B02001_007E':'other'}, inplace=True)

# Drop the two-or-more-races columns, which are not used here.
race2019 = race2019.drop(columns=['B02001_008E', 'B02001_009E','B02001_010E'])

race2019.head()

Matched: San Francisco, CA to San Francisco city within layer Incorporated Places


Unnamed: 0,GEOID,geometry,total pop,white,black,native,asian,hawaiian/pac islander,other,state,county,tract
0,6075035202,"POLYGON ((-13637736.350 4546153.040, -13637685...",5244.0,2394.0,395.0,39.0,1541.0,274.0,280.0,6,75,35202
1,6075042700,"POLYGON ((-13635913.040 4548886.330, -13635803...",5379.0,2380.0,351.0,0.0,2337.0,0.0,108.0,6,75,42700
2,6075030202,"POLYGON ((-13633379.300 4546390.880, -13633366...",4438.0,2625.0,89.0,0.0,1483.0,0.0,113.0,6,75,30202
3,6075030900,"POLYGON ((-13633895.820 4539985.070, -13633869...",7103.0,3162.0,128.0,18.0,3180.0,2.0,155.0,6,75,30900
4,6075045100,"POLYGON ((-13632661.740 4548547.020, -13632647...",5126.0,2566.0,142.0,29.0,1954.0,0.0,338.0,6,75,45100


In [55]:
# This dataset contains neighborhood names for each SF census tract.
neighborhood_census_tracts = pd.read_csv('Data/Analysis_Neighborhoods_-_2020_census_tracts_assigned_to_neighborhoods.csv')

### Data Cleaning & Joining

Now that we've collected all of the relevant demographic and MUNI stop data, we need to join it into a dataset that can be used to train and test our cluster model.

In [56]:
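# Join the MUNI stop data with SF Neighborhood Codes
# Note: DataFrame.join(on=...) matches 'SF Find Neighborhoods' against the row index of
# SFneighborhoods, so this assumes each neighborhood's row position equals its code.
joinedDF = allMUNIstops.join(SFneighborhoods, on='SF Find Neighborhoods', how='inner')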

# Now we have a neighborhood name for every MUNI stop.
joinedDF.head()

Unnamed: 0,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SERVICEPLANNINGSTOPTYPE,SHELTER,INSERT_TIMESTAMP,SDE_ID,SIGNUPID,SUPERVISOR_DISTRICT,shape,Neighborhoods,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,LINK,the_geom,name
0,42619,Polk St&Lombard St NW-NS/BZ,POLKLOM0,POLKLOMB,5990,37.80167,-122.42303,0.0,LOMBARD ST,POLK ST,NS,NW,BZ,0.0,20230512124615,14816781,141,,POINT (-122.42303 37.80167),107.0,107.0,4.0,6.0,32.0,http://sanfrancisco.about.com/od/neighborhoodp...,MULTIPOLYGON (((-122.40295736099989 37.7937798...,Financial District
23,41880,Hyde St&Vallejo St NE-FS,HYDEVAL0,HYDEVALL,5090,37.79744,-122.41863,0.0,VALLEJO ST,HYDE ST,FS,NE,,0.0,20230512124615,14812081,141,,POINT (-122.41863 37.79744),107.0,107.0,6.0,3.0,32.0,http://sanfrancisco.about.com/od/neighborhoodp...,MULTIPOLYGON (((-122.40295736099989 37.7937798...,Financial District
42,42858,Hyde St&Greenwich St SE-NS,HYDEGNW0,HYDEGNWH,5073,37.800993,-122.419322,0.0,GREENWICH ST,HYDE ST,NS,SE,,0.0,20230512124615,14813960,141,,POINT (-122.41932 37.800995),107.0,107.0,6.0,6.0,32.0,http://sanfrancisco.about.com/od/neighborhoodp...,MULTIPOLYGON (((-122.40295736099989 37.7937798...,Financial District
151,41705,Hyde St&Broadway SE-MI,HYDEBDW0,HYDEBDWY,5062,37.796345,-122.418395,0.0,BROADWAY,HYDE ST,NS,SE,,0.0,20230512124615,14811345,141,,POINT (-122.418396 37.796345),107.0,107.0,6.0,3.0,32.0,http://sanfrancisco.about.com/od/neighborhoodp...,MULTIPOLYGON (((-122.40295736099989 37.7937798...,Financial District
224,42849,Mason St&Broadway NW-NS,MASNBDW1,MASNBDWY,5356,37.797351,-122.412009,0.0,BROADWAY,MASON ST,NS,NW,,1.0,20230512124615,14813932,141,,POINT (-122.41201 37.79735),107.0,107.0,6.0,3.0,21.0,http://sanfrancisco.about.com/od/neighborhoodp...,MULTIPOLYGON (((-122.40295736099989 37.7937798...,Financial District
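
`DataFrame.join(other, on=...)` compares the caller's column with the *index* of `other`, not with any column in it. The sketch below uses hypothetical toy stops and codes (not the real data) to show how a positional match can differ from a key match, which may be worth spot-checking against a few stops whose neighborhood is known.

```python
import pandas as pd

# Hypothetical toy data: three stops tagged with a neighborhood code.
stops = pd.DataFrame({
    "STOPNAME": ["Stop A", "Stop B", "Stop C"],
    "code": [2, 0, 1],
})

# Hypothetical lookup table whose row order does NOT follow the codes.
lookup = pd.DataFrame({
    "code": [1, 2, 0],
    "name": ["Mission", "Chinatown", "Tenderloin"],
})

# join(on=...) matches stops["code"] against lookup's row index (0, 1, 2)
# and ignores lookup["code"] entirely.
by_position = stops.join(lookup[["name"]], on="code")

# merge(on=...) matches the two "code" columns against each other.
by_key = stops.merge(lookup, on="code", how="left")

print(by_position["name"].tolist())
print(by_key["name"].tolist())
```

If the lookup file has no code column at all, as the `SFneighborhoods` preview suggests, a spatial join between stop points and neighborhood polygons may be a safer way to assign names.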


In [57]:
# Isolate stops in equity strategy neighborhoods from those not in equity strategy neighborhoods.

# The following neighborhoods have been designated as equity-strategy neighborhoods by the SF MUNI:
TenderloinStops = joinedDF.loc[joinedDF['name'].isin(['Tenderloin'])]
ChinatownStops = joinedDF.loc[joinedDF['name'].isin(['Chinatown'])]
WesternAdditionStops = joinedDF.loc[joinedDF['name'].isin(['Western Addition'])]
MissionStops = joinedDF.loc[joinedDF['name'].isin(['Mission'])]
BayviewStops = joinedDF.loc[joinedDF['name'].isin(['Bayview'])]
VisitacionValleyStops = joinedDF.loc[joinedDF['name'].isin(['Visitacion Valley'])]
OuterMissionStops = joinedDF.loc[joinedDF['name'].isin(['Outer Mission'])]
OceanViewStops = joinedDF.loc[joinedDF['name'].isin(['Oceanview'])]

ESNstops = pd.concat([TenderloinStops, ChinatownStops, WesternAdditionStops, MissionStops, BayviewStops, 
                      VisitacionValleyStops, OuterMissionStops, OceanViewStops], ignore_index=True)

ESNstops['ESN'] = 1

ESNstops.sample(5)

Unnamed: 0,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SERVICEPLANNINGSTOPTYPE,SHELTER,INSERT_TIMESTAMP,SDE_ID,SIGNUPID,SUPERVISOR_DISTRICT,shape,Neighborhoods,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,LINK,the_geom,name,ESN
64,42836,Sutter St&Steiner St SW-NS/BZ,SUTTSTE1,SUTTSTEI,6610,37.785794,-122.435028,0.0,STEINER ST,SUTTER ST,NS,SW,BZ,1.0,20230512124615,14818545,141,,POINT (-122.43503 37.785793),103.0,103.0,4.0,11.0,15.0,http://www.sfgate.com/neighborhoods/sf/chinatown/,MULTIPOLYGON (((-122.40954104799994 37.7938519...,Chinatown,1
77,43685,100 O'Shaughnessy Blvd NE-MB/SB,OSHA 101,OSHA 100,5829,37.744481,-122.450678,0.0,,O SHAUGHNESSY BLVD,MB,NE,,0.0,20230512124615,14814772,141,,POINT (-122.450676 37.74448),96.0,96.0,9.0,5.0,10.0,http://www.sfgate.com/neighborhoods/sf/western...,MULTIPOLYGON (((-122.4394803809999 37.78330848...,Western Addition,1
122,42534,Geneva Ave&Santos St N-FS/BZ,GNVASANN,GNVASANT,4900,37.708484,-122.420354,0.0,SANTOS ST,GENEVA AVE,FS,NO,BZ,0.0,20230512124615,14818184,141,,POINT (-122.42036 37.708485),74.0,74.0,9.0,9.0,40.0,http://en.wikipedia.org/wiki/Visitacion_Valley...,MULTIPOLYGON (((-122.41622224799994 37.7083282...,Visitacion Valley,1
105,43201,Church St&24TH St SE-NS/SI,CHUR24S1,CHUR24ST,3996,37.751591,-122.42735,0.0,24TH ST,CHURCH ST,NS,SE,SI,0.0,20230512124615,14817372,141,,POINT (-122.42735 37.75159),52.0,52.0,3.0,5.0,22.0,http://www.sfgate.com/neighborhoods/sf/mission/,MULTIPOLYGON (((-122.42236481799989 37.7698676...,Mission,1
168,41477,Balboa Park BART Station NE-MB/BZ,GNVABART,GNVABART,4803,37.720801,-122.446738,0.0,SAN JOSE AVE,GENEVA AVE,MB,NE,BZ,0.0,20230512124615,14810684,141,,POINT (-122.44674 37.720802),80.0,80.0,9.0,1.0,28.0,http://en.wikipedia.org/wiki/Neighborhoods_in_...,MULTIPOLYGON (((-122.4626396249999 37.71793603...,Oceanview,1


In [58]:
# Now create a dataset with all stops from non-equity strategy neighborhoods.
ESN_NEIGHBORHOODS = ['Tenderloin', 'Chinatown', 'Western Addition', 'Mission', 'Bayview',
                     'Visitacion Valley', 'Outer Mission', 'Oceanview']

# Keep every stop whose neighborhood is not in the list; copy so the new column
# is added to an independent DataFrame rather than to a slice of joinedDF.
nonESNstops = joinedDF.loc[~joinedDF['name'].isin(ESN_NEIGHBORHOODS)].copy()

nonESNstops['ESN'] = 0

nonESNstops.sample(5)

Unnamed: 0,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SERVICEPLANNINGSTOPTYPE,SHELTER,INSERT_TIMESTAMP,SDE_ID,SIGNUPID,SUPERVISOR_DISTRICT,shape,Neighborhoods,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,LINK,the_geom,name,ESN
411,42917,3rd St&Marin St NW-NS/SI,3ST MRN1,3ST MRN1,7364,37.749097,-122.387525,0.0,MARIN ST,03RD ST,NS,NW,SI,0.0,20230512124615,14817076,141,,POINT (-122.38753 37.749096),56.0,56.0,2.0,9.0,1.0,"http://en.wikipedia.org/wiki/Diamond_Heights,_...",MULTIPOLYGON (((-122.43824612599991 37.7486645...,Diamond Heights,0
1266,41966,Hyde St&North Point St SE-NS,HYDEN P0,HYDEN PT,5080,37.805633,-122.420272,0.0,NORTH POINT ST,HYDE ST,NS,SE,,0.0,20230512124615,14811858,141,,POINT (-122.42027 37.805634),99.0,99.0,6.0,6.0,32.0,http://en.wikipedia.org/wiki/Neighborhoods_in_...,MULTIPOLYGON (((-122.42213520799993 37.7894120...,Cathedral Hill,0
9,42712,345 Warren Dr E-MB/SB,WARNH341,WARNH345,6908,37.755402,-122.461312,0.0,,WARREN DR,MB,EA,,0.0,20230512124615,14816846,141,,POINT (-122.46131 37.7554),111.0,111.0,7.0,8.0,14.0,http://en.wikipedia.org/wiki/Neighborhoods_in_...,MULTIPOLYGON (((-122.4447315399999 37.76742587...,Buena Vista,0
2307,42978,Stockton St&Filbert St NW-NS/PS,STOKFIL1,STOKFILB,6515,37.801419,-122.409461,0.0,FILBERT ST,STOCKTON ST,NS,NW,,0.0,20230512124615,14815641,141,,POINT (-122.40946 37.80142),106.0,106.0,6.0,3.0,23.0,http://www.sfgate.com/neighborhoods/sf/russian...,MULTIPOLYGON (((-122.41007596999992 37.7964927...,Russian Hill,0
2117,41707,Hyde St&Beach St SE-NS,HYDEBEC0,HYDEBECH,5064,37.806642,-122.420535,0.0,NORTH POINT ST,HYDE ST,NS,SE,,0.0,20230512124615,14811353,141,,POINT (-122.42053 37.80664),99.0,99.0,6.0,6.0,32.0,http://en.wikipedia.org/wiki/Neighborhoods_in_...,MULTIPOLYGON (((-122.42213520799993 37.7894120...,Cathedral Hill,0
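
The ESN flag can also be derived in one step, without splitting the stops into two frames and concatenating them again later. This is a minimal sketch on hypothetical toy rows; `toy` and its stop names are made up for illustration, while the neighborhood list mirrors the one used in the cells above.

```python
import pandas as pd

# Neighborhoods treated as equity-strategy neighborhoods in this notebook.
ESN_NEIGHBORHOODS = [
    "Tenderloin", "Chinatown", "Western Addition", "Mission",
    "Bayview", "Visitacion Valley", "Outer Mission", "Oceanview",
]

# Hypothetical stand-in for joinedDF.
toy = pd.DataFrame({
    "STOPNAME": ["Stop A", "Stop B", "Stop C", "Stop D"],
    "name": ["Mission", "Noe Valley", "Bayview", "Marina"],
})

# isin() gives a boolean mask; casting to int yields the 0/1 flag.
toy["ESN"] = toy["name"].isin(ESN_NEIGHBORHOODS).astype(int)
print(toy)
```

Keeping all stops in one frame preserves the original row order and index, which can make later joins easier to audit.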


#### Check-In #1

Each MUNI stop now carries the name of the neighborhood it is located in, and the stops are split by whether or not they belong to an equity-strategy neighborhood.

Let's join this data with neighborhood census data.

In [59]:
# Join 2017 race data with 2017 income/vehicle data, and likewise for 2019.
race2019['tract']=race2019['tract'].astype(int)
census2019['tract']=census2019['tract'].astype(int)

# Both frames share these key columns, so a single `on=` list is enough.
merge_keys = ['tract', 'county', 'state', 'geometry', 'GEOID']

demographics2017 = pd.merge(race2017, census2017, how='inner', on=merge_keys)

demographics2019 = pd.merge(race2019, census2019, how='inner', on=merge_keys)

demographics2019.head()

Unnamed: 0,GEOID,geometry,total pop,white,black,native,asian,hawaiian/pac islander,other,state,county,tract,median_hh_income,vehicles_avail
0,6075035202,"POLYGON ((-13637736.350 4546153.040, -13637685...",5244.0,2394.0,395.0,39.0,1541.0,274.0,280.0,6,75,35202,89732.0,2898.0
1,6075042700,"POLYGON ((-13635913.040 4548886.330, -13635803...",5379.0,2380.0,351.0,0.0,2337.0,0.0,108.0,6,75,42700,93250.0,2522.0
2,6075030202,"POLYGON ((-13633379.300 4546390.880, -13633366...",4438.0,2625.0,89.0,0.0,1483.0,0.0,113.0,6,75,30202,128417.0,2053.0
3,6075030900,"POLYGON ((-13633895.820 4539985.070, -13633869...",7103.0,3162.0,128.0,18.0,3180.0,2.0,155.0,6,75,30900,177694.0,4716.0
4,6075045100,"POLYGON ((-13632661.740 4548547.020, -13632647...",5126.0,2566.0,142.0,29.0,1954.0,0.0,338.0,6,75,45100,141912.0,2623.0


In [60]:
# Let's make sure tract codes look the same across datasets.
# Add a leading zero to all tract codes that are less than 6 digits in the neighborhood_census_tracts dataset.
neighborhood_census_tracts = neighborhood_census_tracts.astype({'tractce':'string'})
neighborhood_census_tracts['tractce'] = neighborhood_census_tracts['tractce'].apply(lambda x: x.zfill(6))

neighborhood_census_tracts.rename(columns={'tractce':'tract', 'state_fp':'state', 'county_fp':'county', 'geoid':'GEOID'}, inplace=True)
neighborhood_census_tracts = neighborhood_census_tracts.drop(columns=['the_geom', 'name','data_loaded_at', 'data_as_of'])


neighborhood_census_tracts.head()

Unnamed: 0,object_id,state,county,tract,neighborhoods_analysis_boundaries,sup_dist_2012,sup_dist_2022,GEOID
0,242,6,75,980900,Bayview Hunters Point,10,10,6075980900
1,241,6,75,980600,Bayview Hunters Point,10,10,6075980600
2,240,6,75,980501,McLaren Park,10,10,6075980501
3,239,6,75,980401,The Farallones,1,4,6075980401
4,226,6,75,61200,Bayview Hunters Point,10,10,6075061200
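
Tract codes only match across datasets when both sides use the same representation. Note that the notebook later casts `tract` back to `int`, which drops the padding again; either form works as long as both sides agree. The sketch below shows one way to normalize keys to zero-padded strings. The helper `normalize_tract` and the toy frames are assumptions for illustration, with values borrowed from the previews in this notebook.

```python
import pandas as pd

# Toy frames: one stores tracts as ints, the other as padded strings.
left = pd.DataFrame({
    "tract": [61200, 980900],
    "neighborhood": ["Bayview Hunters Point", "Bayview Hunters Point"],
})
right = pd.DataFrame({
    "tract": ["061200", "980900"],
    "total_pop": [3842, 253],
})

def normalize_tract(series: pd.Series) -> pd.Series:
    """Return tract codes as zero-padded 6-character strings."""
    return series.astype(int).astype(str).str.zfill(6)

left["tract"] = normalize_tract(left["tract"])
right["tract"] = normalize_tract(right["tract"])

# validate raises a MergeError if either side has duplicate tract codes.
merged = left.merge(right, on="tract", validate="one_to_one")
print(merged)
```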


In [61]:
# Now join neighborhood names and demographic data on census tract codes.

# Cast key columns to ints so they compare equal across datasets.
key_cols = ['GEOID', 'tract', 'county', 'state']
for df in (demographics2017, demographics2019):
    for col in key_cols:
        df[col] = df[col].astype(int)

neighborhood_census_tracts['GEOID'] = neighborhood_census_tracts['GEOID'].astype(int)
neighborhood_census_tracts['tract'] = neighborhood_census_tracts['tract'].astype(int)

# Merge 2017 data and clean it up.
data2017 = neighborhood_census_tracts.merge(demographics2017, on='tract')
data2017 = data2017.drop(columns=['GEOID_y', 'state_y','county_y','sup_dist_2012', 'sup_dist_2022', 'object_id'])
data2017.rename(columns={'state_x':'state', 'county_x':'county', 'neighborhoods_analysis_boundaries':'neighborhood', 'GEOID_x':'GEOID'}, inplace=True)

# Merge 2019 data and clean it up.
data2019 = neighborhood_census_tracts.merge(demographics2019, on='tract')
data2019 = data2019.drop(columns=['GEOID_y', 'state_y','county_y','sup_dist_2012', 'sup_dist_2022', 'object_id'])
data2019.rename(columns={'state_x':'state', 'county_x':'county', 'neighborhoods_analysis_boundaries':'neighborhood', 'GEOID_x':'GEOID'}, inplace=True)


data2019.head()

Unnamed: 0,state,county,tract,neighborhood,GEOID,geometry,total pop,white,black,native,asian,hawaiian/pac islander,other,median_hh_income,vehicles_avail
0,6,75,980900,Bayview Hunters Point,6075980900,"POLYGON ((-13626279.570 4542831.040, -13626266...",253.0,171.0,18.0,0.0,56.0,0.0,8.0,,
1,6,75,980600,Bayview Hunters Point,6075980600,"POLYGON ((-13624051.400 4540543.790, -13624050...",690.0,148.0,233.0,0.0,170.0,0.0,66.0,66042.0,375.0
2,6,75,980501,McLaren Park,6075980501,"POLYGON ((-13628536.120 4539319.990, -13628532...",507.0,28.0,114.0,0.0,258.0,34.0,67.0,12340.0,125.0
3,6,75,61200,Bayview Hunters Point,6075061200,"POLYGON ((-13625086.220 4542404.940, -13625054...",3842.0,540.0,1115.0,24.0,1129.0,22.0,961.0,67625.0,1705.0
4,6,75,980300,Golden Gate Park,6075980300,"POLYGON ((-13638558.550 4547081.610, -13638506...",63.0,58.0,0.0,0.0,5.0,0.0,0.0,139375.0,55.0
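
An inner merge silently drops tracts that appear in only one table. For example, tract 980401 is in the neighborhood preview but not in the merged preview above. Because the neighborhood file is keyed to 2020 census tracts, it may be worth auditing how many tracts fail to match. The sketch below uses toy frames (tract 12345 is hypothetical) and pandas' `indicator=True` option to list the unmatched rows.

```python
import pandas as pd

# Toy crosswalk of tracts to neighborhoods.
crosswalk = pd.DataFrame({
    "tract": [61200, 980401, 980900],
    "neighborhood": ["Bayview Hunters Point", "The Farallones",
                     "Bayview Hunters Point"],
})

# Toy demographic table; tract 12345 is a made-up code with no crosswalk row.
acs = pd.DataFrame({
    "tract": [61200, 980900, 12345],
    "total_pop": [3842.0, 253.0, 100.0],
})

# An outer merge keeps every row; the _merge column records where it came from.
audit = crosswalk.merge(acs, on="tract", how="outer", indicator=True)

unmatched = audit[audit["_merge"] != "both"]
counts = audit["_merge"].value_counts()

print(counts)
print(unmatched[["tract", "_merge"]])
```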


#### Check-In #2

We now have all of our demographic data in one dataset. Our last step is to create two datasets (one for 2017 and one for 2019) that include stop data and neighborhood demographic data.

In [62]:
# Next step is to join MUNI Stop data with the above dataset on neighborhood names.

# Let's concatenate our ESN and non-ESN data.
stops = [ESNstops, nonESNstops]
allMUNIstops = pd.concat(stops)

# The MUNI stop data separates Bayview and Hunter's Point, while the census data combines the two neighborhoods.
# Let's relabel both as Bayview. Each dataset is updated from its own column, and the combined
# label is replaced as a whole so that it does not turn into 'Bayview Bayview'.
data2017['neighborhood'] = data2017['neighborhood'].str.replace('Bayview Hunters Point', 'Bayview', regex=False)
data2019['neighborhood'] = data2019['neighborhood'].str.replace('Bayview Hunters Point', 'Bayview', regex=False)
allMUNIstops['name'] = allMUNIstops['name'].str.replace('Hunters Point', 'Bayview', regex=False)

allMUNIstops = allMUNIstops.drop(columns=['SUPERVISOR_DISTRICT','LINK', 'Current Police Districts','Current Supervisor Districts','SERVICEPLANNINGSTOPTYPE','SHELTER','INSERT_TIMESTAMP','Neighborhoods'])
allMUNIstops.rename(columns={'name':'neighborhood', 'shape':'stop_shape','the_geom':'neighborhood_shape','SF Find Neighborhoods': 'sf_find_code', 'Analysis Neighborhoods': 'analysis_neigh_code'}, inplace=True)

pd.set_option('display.max_columns', None)
allMUNIstops.head()

Unnamed: 0,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SDE_ID,SIGNUPID,stop_shape,sf_find_code,analysis_neigh_code,neighborhood_shape,neighborhood,ESN
0,42263,Powell St&Sutter St SW-FS,POWLSUT1,POWLSUTT,6076,37.789061,-122.408642,0.0,SUTTER ST,POWELL ST,FS,SW,14816502,141,POINT (-122.408646 37.789062),19.0,21.0,MULTIPOLYGON (((-122.40987401699994 37.7871491...,Tenderloin,1
1,43428,O'Farrell St&Grant Ave S-MB/BB,OFARGRN1,OFARGRNT,5810,37.786642,-122.405629,0.0,GRANT AVE,OFARRELL ST,MB,SO,14819051,141,POINT (-122.40563 37.78664),19.0,8.0,MULTIPOLYGON (((-122.40987401699994 37.7871491...,Tenderloin,1
2,43484,Stockton St&Sutter St NE-FS/BB,STOKSUT0,STOKSUTT,6523,37.79013,-122.40705,0.0,STOCKTON ST,SUTTER ST,FS,NE,14816123,141,POINT (-122.40705 37.79013),19.0,8.0,MULTIPOLYGON (((-122.40987401699994 37.7871491...,Tenderloin,1
3,43683,Market St&Powell St N-NS/BZ,MRKTPOW0,MRKTPOWL,5688,37.784474,-122.407544,0.0,ELLIS ST,MARKET ST,NS,NO,14814764,141,POINT (-122.40755 37.784473),19.0,36.0,MULTIPOLYGON (((-122.40987401699994 37.7871491...,Tenderloin,1
4,42659,Geary St&Powell St NW-FS/BZ,GEARPOW0,GEARPOWL,4757,37.787401,-122.408391,0.0,POWELL ST,GEARY ST,FS,NW,14813783,141,POINT (-122.408394 37.7874),19.0,36.0,MULTIPOLYGON (((-122.40987401699994 37.7871491...,Tenderloin,1
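
Substring replacement is order-sensitive when one label contains another, as with 'Hunters Point' and 'Bayview Hunters Point'. The toy series below sketches the pitfall and a whole-value alternative using `Series.replace` with a mapping; the series itself is made up for illustration.

```python
import pandas as pd

names = pd.Series(["Bayview Hunters Point", "Hunters Point", "Mission"])

# Chained substring replacement: the first pass already rewrites part of the
# combined label, so the second pass has nothing left to match.
chained = (
    names.str.replace("Hunters Point", "Bayview", regex=False)
         .str.replace("Bayview Hunters Point", "Bayview", regex=False)
)

# Whole-value replacement via a mapping does not depend on ordering.
mapped = names.replace({
    "Bayview Hunters Point": "Bayview",
    "Hunters Point": "Bayview",
})

print(chained.tolist())
print(mapped.tolist())
```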


In [63]:
# Now let's join the MUNI stop data with the census data from each year on neighborhood name.
final_df_2017 = pd.merge(data2017, allMUNIstops, how='inner', on='neighborhood')

final_df_2019 = pd.merge(data2019, allMUNIstops, how='inner', on='neighborhood')

# Lastly, let's add a column to each dataset specifying the year that its demographic data was collected.
final_df_2017['year_collected'] = '2017'
final_df_2019['year_collected'] = '2019'

pd.set_option('display.max_columns', None)
final_df_2019.sample(5)

Unnamed: 0,state,county,tract,neighborhood,GEOID,geometry,total pop,white,black,native,asian,hawaiian/pac islander,other,median_hh_income,vehicles_avail,OBJECTID,STOPNAME,TRAPEZESTOPABBR,RUCUSSTOPABBR,STOPID,LATITUDE,LONGITUDE,ACCESSIBILITYMASK,ATSTREET,ONSTREET,POSITION,ORIENTATION,SDE_ID,SIGNUPID,stop_shape,sf_find_code,analysis_neigh_code,neighborhood_shape,ESN,year_collected
1006,6,75,21200,Noe Valley,6075021200,"POLYGON ((-13630316.120 4544300.780, -13630299...",3105.0,2376.0,45.0,1.0,288.0,0.0,40.0,195375.0,1830.0,43772,Crescent Ave&Ellsworth St SW-NS/PS,CRESEWTH,,7209,37.734733,-122.415052,0.0,ELLSWORTH ST,CRESCENT AVE,,SW,14814867,141,POINT (-122.415054 37.734734),83.0,2.0,MULTIPOLYGON (((-122.43331953099994 37.7432959...,0,2019
3076,6,75,11902,Nob Hill,6075011902,"POLYGON ((-13626892.830 4550031.370, -13626809...",2712.0,1601.0,25.0,0.0,829.0,12.0,111.0,84073.0,562.0,41562,Fillmore St&Lombard St SW-FS/BZ,FILLLOM0,FILLLOMB,4627,37.799595,-122.436051,0.0,LOMBARD ST,FILLMORE ST,FS,SW,14810937,141,POINT (-122.43605 37.799595),15.0,13.0,MULTIPOLYGON (((-122.41062287099993 37.7908806...,0,2019
266,6,75,33201,Lakeshore,6075033201,"POLYGON ((-13635142.160 4541306.060, -13635136...",4551.0,1747.0,349.0,17.0,1035.0,27.0,996.0,35377.0,499.0,42855,Font Blvd&Juan Bautista Cir E-NS/SB,FONTJBA1,FONTJBAU,4709,37.717701,-122.477076,0.0,JUAN BAUTISTA CIR,FONT BLVD,NS,EA,14813949,141,POINT (-122.47707 37.7177),42.0,16.0,MULTIPOLYGON (((-122.50853817799992 37.7354017...,0,2019
1331,6,75,25800,Portola,6075025800,"POLYGON ((-13626125.950 4540574.190, -13626016...",2224.0,436.0,188.0,74.0,1096.0,0.0,276.0,93750.0,1010.0,42387,Silver Ave&Mission St SE-FS/BZ,SILVMIS0,SILVMISS,6415,37.72866,-122.431217,0.0,MISSION ST,SILVER AVE,FS,SE,14816608,141,POINT (-122.43121 37.72866),90.0,7.0,MULTIPOLYGON (((-122.4050093699999 37.72050867...,0,2019
418,6,75,30302,Inner Sunset,6075030302,"POLYGON ((-13634032.750 4545029.110, -13634019...",4348.0,1686.0,214.0,0.0,2040.0,11.0,121.0,146111.0,2662.0,41684,Kearny St&Bush St NE-FS/BZ,KRNYBUS1,KRNYBUSH,4818,37.791004,-122.403976,0.0,BUSH ST,KEARNY ST,FS,NE,14811158,141,POINT (-122.40398 37.791004),108.0,8.0,MULTIPOLYGON (((-122.47730839499991 37.7654479...,0,2019
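
Merging tract-level rows with stop-level rows on neighborhood name pairs every tract in a neighborhood with every stop in it, so each stop appears once per tract. One alternative, offered here as an assumption rather than the method used in this notebook, is to aggregate tracts to one row per neighborhood first. Counts can be summed. A population-weighted mean of tract medians is only a rough proxy for a neighborhood median, and tracts with missing income would need separate handling. The figures are borrowed from the previews above; the stop names are hypothetical.

```python
import pandas as pd

tracts = pd.DataFrame({
    "neighborhood": ["Bayview", "Bayview", "Portola"],
    "total pop": [690.0, 3842.0, 2224.0],
    "vehicles_avail": [375.0, 1705.0, 1010.0],
    "median_hh_income": [66042.0, 67625.0, 93750.0],
})
stops = pd.DataFrame({
    "STOPNAME": ["Stop A", "Stop B", "Stop C"],
    "neighborhood": ["Bayview", "Portola", "Bayview"],
})

# Weight each tract's median income by its population before summing.
tracts["income_x_pop"] = tracts["median_hh_income"] * tracts["total pop"]
by_hood = tracts.groupby("neighborhood", as_index=False)[
    ["total pop", "vehicles_avail", "income_x_pop"]
].sum()
by_hood["approx_income"] = by_hood["income_x_pop"] / by_hood["total pop"]
by_hood = by_hood.drop(columns="income_x_pop")

# validate raises a MergeError if a neighborhood appears twice on the right,
# so the result is guaranteed to have one row per stop.
stop_level = stops.merge(by_hood, on="neighborhood", how="left",
                         validate="many_to_one")
print(stop_level)
```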


#### We finally have our final datasets for training/testing the cluster model!

The final_df_2017 and final_df_2019 datasets contain MUNI stops and their corresponding neighborhood's demographic information.
- The column 'ESN' tells us whether that stop is located in a city-designated Equity Strategy Neighborhood (ESN).
- If the ESN value is 0, the stop is not located in an ESN.
- If the ESN value is 1, it is located in an ESN.

Let's use the final_df_2017 dataset to train a cluster model. Then, let's test the model using our final_df_2019 dataset.

The cluster model will be trained to identify which stops are located in ESN neighborhoods based on the 2017 demographic data for each SF MUNI stop. The model will then attempt to cluster the 2019 stops as either in/not in an ESN neighborhood based on their demographic data.
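
The train-then-test idea can be sketched with a two-cluster k-means, under several assumptions: the feature subset, the toy rows standing in for `final_df_2017` and `final_df_2019`, and the helper names are all hypothetical. k-means is unsupervised, so the ESN label plays no part in fitting; it is used afterwards only to name each cluster by the majority label of its 2017 members. A library implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop.

```python
import numpy as np
import pandas as pd

FEATURES = ["median_hh_income", "vehicles_avail"]  # assumed feature subset

def standardize(df, mean, std):
    """Scale features with statistics taken from the training data."""
    return ((df[FEATURES] - mean) / std).to_numpy()

def assign(X, centroids):
    """Index of the nearest centroid for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_kmeans(X, k=2, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids

# Hypothetical stand-ins for final_df_2017 (train) and final_df_2019 (test).
train = pd.DataFrame({
    "median_hh_income": [30000, 35000, 32000, 140000, 150000, 145000],
    "vehicles_avail": [400, 450, 420, 2600, 2700, 2500],
    "ESN": [1, 1, 1, 0, 0, 0],
})
test = pd.DataFrame({
    "median_hh_income": [33000, 138000],
    "vehicles_avail": [430, 2550],
    "ESN": [1, 1],
})

mean, std = train[FEATURES].mean(), train[FEATURES].std()
X_train = standardize(train, mean, std)
centroids = fit_kmeans(X_train)

# Name each cluster by the majority ESN value among its 2017 members.
train_clusters = assign(X_train, centroids)
cluster_to_esn = {
    j: int(train.loc[train_clusters == j, "ESN"].mean() >= 0.5)
    for j in np.unique(train_clusters)
}

# Assign 2019 stops to the nearest 2017 centroid and compare with the label.
test_clusters = assign(standardize(test, mean, std), centroids)
test["predicted_ESN"] = [cluster_to_esn[c] for c in test_clusters]
print(test)
```

In this toy run the second 2019 stop is designated ESN but lands in the non-ESN cluster, which is the kind of mismatch the project treats as a sign of changed conditions.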

#### Why does this matter?

If the cluster model places the same MUNI stops in the ESN cluster in 2019 as it did in 2017, then conditions in ESN neighborhoods have not improved enough for the stops in these neighborhoods to graduate from their ESN designation.

If the model does not place the same MUNI stops in the ESN cluster based on 2019 data, then conditions in ESN neighborhoods (first measured in 2017) may have improved, and the MUNI stops that were no longer classified as ESN may be candidates for graduating from the designation.

### Training Cluster Model

### Test Cluster Model