# Final Notebook Pt 1: means2work, race/eth, income, nbhoods

**By:** Andrew Williams and Ben Brassette
    
**Description:** This notebook builds on earlier queries of means to work. The purpose is to integrate and align race/eth and median income data. We are tackling points 1, 3, 4, 5, and 7 in this notebook.

**Expected Outputs**
* Trimmed data high transit areas--use same tracts for trimmed race/eth and median income data
* Descriptive Statistics: Bar graphs of race/eth and income by using high public transit data
* Side by side maps

**Areas Where We Need More Work**
* Spatial Statistics of means2work
* Overlay of rail stops and bus lines
* Descriptive Statistics of Access to Car
* Inclusion of commute time and access to car maps (by high transit query)
* Descriptive Statistics of Jobs
* Interactive Map using some combination of our datasets

**Notes for self to advance project:**
1. Biggest step forward: integrating our notebooks in the remaining few weeks in a targeted fashion
2.  Overlay heavy rail and light rail stops
3.  Provide more accurate description of the areas through a spatial join
4. Compare with race/ethnicity and income data; we should create some side by side comparisons of transit use and race/eth and income
5. Query "high" transit tracts, cross reference with income and race/eth, potential to add spatial dimension of LA neighborhoods
6. Availability of cars in households
7. Bring a sharper focus to Central LA and the San Fernando Valley

**TOC**
* Section 2: Library Imports
* Section 3: Mode of Travel
* Section 4: Race/Eth and Income
* Section 5: Public Transit Query
* Section 6: Attribute Merge
* Section 7: Matching Tracts
* Section 8: Exploring Neighborhoods

# Library Import 


Importing various libraries

In [None]:
import urllib.request, json 
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns 

Libraries good to go. 


# Mode of Travel

Import data and run a quick analysis of the data. I have already cleaned this data and saved it from another notebook (public_transit_query).

In [None]:
means2work = gpd.read_file('Data/Means_Transpo_Work_Tract/acs2019_5yr_B08301_14000US06037185320.geojson')

In [None]:
type(means2work)

In [None]:
means2work.shape

In [None]:
means2work.head(3)

In [None]:
means2work.tail(3)

In [None]:
means2work = means2work.drop([1004])

In [None]:
means2work.tail(3)
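
Dropping by the hard-coded index label `1004` works here, but it depends on the row order of the download. Below is a minimal sketch of an alternative. It assumes (not confirmed in this notebook) that the extra last row is a summary row whose `geoid` does not start with the tract prefix `14000US`. The toy data and the summary geoid are made up for illustration.

```python
import pandas as pd

# Toy stand-in for means2work; the geoids and counts are illustrative only.
toy = pd.DataFrame({
    'geoid': ['14000US06037101110', '14000US06037101122', '16000US0644000'],
    'B08301001': [1500, 1200, 2700],
})

# Keep only rows whose geoid carries the census-tract prefix.
tracts_only = toy[toy['geoid'].str.startswith('14000US')].copy()
print(tracts_only.shape)
```

The same filter could be applied to the race/eth and income tables so that all three are trimmed by the same rule.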

In [None]:
means2work.columns.to_list()

In [None]:
columns_to_keep = ['geoid',
 'name',
 'B08301001',
 'B08301002',
 'B08301003',
 'B08301004',
 'B08301010',
 'B08301011',
 'B08301012',
 'B08301013',
 'B08301014',
 'B08301016',
 'B08301017',
 'B08301018',
 'B08301019',
 'B08301020',
 'B08301021',
 'geometry']

In [None]:
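# .copy() makes the subset independent, so adding columns later does not trigger a pandas copy warning
means2work = means2work[columns_to_keep].copy()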

In [None]:
means2work.sample()

In [None]:
#renaming columns
means2work.columns = ['geoid',
 'name',
 'Total',
 'Car, truck, or van',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail, streetcar or trolley',
 'Taxicab',
 'Motorcycle',
 'Bicycle',
 'Walked',
 'Other means',
 'Worked from home',
 'geometry']

In [None]:
means2work.sample()

In [None]:
means2work['Percent Car, truck, or van'] = means2work['Car, truck, or van']/means2work['Total']*100
means2work['Percent Drove alone'] = means2work['Drove alone']/means2work['Total']*100
means2work['Percent Carpooled'] = means2work['Carpooled']/means2work['Total']*100
means2work['Percent Public transportation'] = means2work['Public transportation']/means2work['Total']*100
means2work['Percent Bus'] = means2work['Bus']/means2work['Total']*100
means2work['Percent Subway or elevated il'] = means2work['Subway or elevated il']/means2work['Total']*100
means2work['Percent Long-distance train or commuter rail'] = means2work['Long-distance train or commuter rail']/means2work['Total']*100
means2work['Percent Light rail, streetcar or trolley'] = means2work['Light rail, streetcar or trolley']/means2work['Total']*100
means2work['Percent Taxicab'] = means2work['Taxicab']/means2work['Total']*100
means2work['Percent Motorcycle'] = means2work['Motorcycle']/means2work['Total']*100
means2work['Percent Bicycle'] = means2work['Bicycle']/means2work['Total']*100
means2work['Percent Walked'] = means2work['Walked']/means2work['Total']*100
means2work['Percent Other means'] = means2work['Other means']/means2work['Total']*100
means2work['Percent Worked from home'] = means2work['Worked from home']/means2work['Total']*100
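
The fourteen percentage lines above all follow one pattern, so they could also be written as a loop. This is a minimal sketch on a toy table, not the notebook's data. Treating a `Total` of zero as missing is an added assumption, so that tracts with no commuters get `NaN` instead of a division error or `inf`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for means2work with two mode columns; values are made up.
toy = pd.DataFrame({
    'Total': [200, 0, 50],
    'Bus': [20, 0, 5],
    'Walked': [10, 0, 25],
})

# In the notebook this list would hold every count column except 'Total'.
mode_columns = ['Bus', 'Walked']

# Treat a zero denominator as missing so the percentage is NaN rather than inf.
denominator = toy['Total'].replace(0, np.nan)
for col in mode_columns:
    toy['Percent ' + col] = toy[col] / denominator * 100
```

The new columns follow the same `'Percent ' + name` convention used above, so later cells that reference them would not need to change.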

In [None]:
means2work.sample()

# Race and Ethnicity Data for LA

## Load Census data

I will load one Census data table:

Table B03002: Hispanic or Latino Origin by Race

In [None]:
# loading the data file
gdf_race = gpd.read_file('Data/race/acs2019_5yr_B03002_raceethnicity.geojson')

## Begin to look at the data


In [None]:
gdf_race.shape

There are 1,005 rows of data: the 1,004 census tracts in LA plus one extra last row, which we drop below. There are 45 race and ethnicity variables.

Checking to see if any data needs to be dropped.

In [None]:
gdf_race.head(4)

In [None]:
gdf_race.tail(4)

Need to drop the last row

In [None]:
gdf_race = gdf_race.drop([1004])

In [None]:
gdf_race.tail(3)

Drop successful

In [None]:
# columns to keep
columns_to_keep = ['geoid',
 'name',
 'B03002001',
 'B03002002',
 'B03002003',
 'B03002004',
 'B03002005',
 'B03002006',
 'B03002007',
 'B03002008',
 'B03002009',
 'B03002010',
 'B03002011',
 'B03002012',
 'B03002013',
 'B03002014',
 'B03002015',
 'B03002016',
 'B03002017',
 'B03002018',
 'B03002019',
 'B03002020',
 'B03002021',
 'geometry']

In [None]:
# redefine gdf with only columns to keep
gdf_race = gdf_race[columns_to_keep].copy()  # .copy() avoids a pandas copy warning when percentage columns are added later

This removes any variables that we do not need.

In [None]:
# check the slimmed down gdf
gdf_race.head()

We can run the .head command to see our first five lines of data and to make sure the unneeded variables were removed. Next we will rename the columns to match the variable names.

In [None]:
gdf_race.columns = ['geoid',
 'name',
 'Total',
 'Not Hispanic or Latino',
 'N_White',
 'N_Black',
 'N_Native',
 'N_Asian',
 'N_Native Hawaiian',
 'N_Some other race alone',
 'N_Two or more races',
 'N_Two races including some other race',
 'N_Two races excluding some other race, and three or more races',
 'Hispanic or Latino',
 'H_White',
 'H_Black',
 'H_Native',
 'H_Asian',
 'H_Native Hawaiian',
 'H_Some other race alone',
 'H_Two or more races',
 'H_Two races including some other race',
 'H_Two races excluding some other race, and three or more races',
 'geometry']

In [None]:
gdf_race.head()

See, it worked! Next we create percentage variables.

In [None]:
gdf_race['N_WhitePercentage']=round(((gdf_race['N_White']/ gdf_race['Total'])*100),2)
gdf_race['N_BlacPercentage']=round(((gdf_race['N_Black']/ gdf_race['Total'])*100),2)
gdf_race['N_NativePercentage']=round(((gdf_race['N_Native']/ gdf_race['Total'])*100),2)
gdf_race['N_AsianPercentage']=round(((gdf_race['N_Asian']/ gdf_race['Total'])*100),2)
gdf_race['N_HawaiianPercentage']=round(((gdf_race['N_Native Hawaiian']/ gdf_race['Total'])*100),2)
gdf_race['N_OtherPercentage']=round(((gdf_race['N_Some other race alone']/ gdf_race['Total'])*100),2)
gdf_race['N_TwoPlusPercentage']=round(((gdf_race['N_Two or more races']/ gdf_race['Total'])*100),2)
gdf_race['N_TwoInclOtherPercentage']=round(((gdf_race['N_Two races including some other race']/ gdf_race['Total'])*100),2)
gdf_race['N_TwoPlusThreePlusPercentage']=round(((gdf_race['N_Two races excluding some other race, and three or more races']/ gdf_race['Total'])*100),2)
gdf_race['HispanicPercentage']=round(((gdf_race['Hispanic or Latino']/ gdf_race['Total'])*100),2)
gdf_race['H_WhitePercentage']=round(((gdf_race['H_White']/ gdf_race['Total'])*100),2)
gdf_race['H_BlacPercentage']=round(((gdf_race['H_Black']/ gdf_race['Total'])*100),2)
gdf_race['H_NativePercentage']=round(((gdf_race['H_Native']/ gdf_race['Total'])*100),2)
gdf_race['H_AsianPercentage']=round(((gdf_race['H_Asian']/ gdf_race['Total'])*100),2)
gdf_race['H_HawaiianPercentage']=round(((gdf_race['H_Native Hawaiian']/ gdf_race['Total'])*100),2)
gdf_race['H_OtherPercentage']=round(((gdf_race['H_Some other race alone']/ gdf_race['Total'])*100),2)
gdf_race['H_TwoPlusPercentage']=round(((gdf_race['H_Two or more races']/ gdf_race['Total'])*100),2)
gdf_race['H_TwoInclOtherPercentage']=round(((gdf_race['H_Two races including some other race']/ gdf_race['Total'])*100),2)
gdf_race['H_TwoPlusThreePlusPercentage']=round(((gdf_race['H_Two races excluding some other race, and three or more races']/ gdf_race['Total'])*100),2)


Plotting the Hispanic spatial distribution across LA

In [None]:
gdf_race.head(1004).plot(figsize=(10,10),column='HispanicPercentage',legend=True)

* Large concentrations in the Central, South, and Southeast neighborhood regions of LA
* Additionally, large concentrations in the SF Valley
* There could be a relationship between transit ridership in Central LA and the SF Valley

Plotting the White spatial distribution across LA

In [None]:
gdf_race.head(1004).plot(figsize=(10,10),column='N_WhitePercentage',legend=True)

* White population is almost completely absent in Central LA
* Strong concentrations of White population in Western LA and the western SF Valley
* Will there be any relationship between census tracts and White population in the SF Valley?

Plotting the Black spatial distribution across LA

In [None]:
gdf_race.head(1004).plot(figsize=(10,10),column='N_BlacPercentage',legend=True)

* Black population concentrated west of Central LA, though there are a few pockets throughout the city
* From what I know of transit ridership in LA, there is a disconnect between areas of high transit and Black population

Plotting the Asian spatial distribution across LA

In [None]:
gdf_race.head(1004).plot(figsize=(10,10),column='N_AsianPercentage',legend=True)

* Concentrations of Asian population in Central LA and there appears to be a wider distribution of Asian residents in the SF Valley

These side-by-side maps show the share of non-Hispanic Black and Hispanic residents in LA.

In [None]:
# create the 1x2 subplots
fig, axs = plt.subplots(1, 2, figsize=(15, 12))

# name each subplot
ax1, ax2 = axs

# Black population percentage map on the left
gdf_race.plot(column='N_BlacPercentage', 
            cmap='viridis', 
            scheme='quantiles',
            k=5, 
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax1, # this assigns the map to the subplot,
            legend=True
           )

ax1.axis("off")
ax1.set_title("Black Population")

# Hispanic population percentage map on the right
gdf_race.plot(column='HispanicPercentage', 
            cmap='viridis', 
            scheme='quantiles',
            k=5, 
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax2, # this assigns the map to the subplot
            legend=True
           )

ax2.axis("off")
ax2.set_title("Hispanic Population")

* These maps show the concentration of non-Hispanic Black people and Hispanic people in LA, respectively
* On a macro scale, LA seems fairly segregated by race. This does not hold for every census tract and neighborhood, but larger patterns seem to indicate high regional segregation

# Income in Los Angeles

Our group project is a comparison of accessibility and mobility in Los Angeles neighborhoods. We seek to understand who uses transit in LA, and how land use or the transportation system may impact their lives. We will be using data from the American Community Survey, LA Metro, the City of LA, LA Times, and the Bureau of Transportation Statistics.

## Load Census data

I will load one Census data table:

Table B19013: Median Household Income in the Last 12 Months

In [None]:
# loading the data file

gdf_income = gpd.read_file('Data/income/acs2019_5yr_B19013_income.geojson')

## Begin to look at the data

In [None]:
gdf_income.shape

In [None]:
gdf_income.tail(3)

There are 1,005 rows of data: the 1,004 census tracts in the city of LA plus one extra last row, which we drop below. There are 5 columns, which are discussed below.

In [None]:
gdf_income = gdf_income.drop([1004])

In [None]:
gdf_income.tail()

We use the .tail command to look at the last 5 rows of data and confirm the drop. The columns are: geoid (the geographic identifier of the tract), name (the census tract's number), B19013001 and "B19013001, Error", which I discuss below, and geometry, which holds the latitude and longitude coordinates of the tract boundary.

Since we do not want a column that has "error" in the name, we remove it by defining which columns to keep. 

In [None]:
# columns to keep
columns_to_keep = ['geoid',
 'name',
 'B19013001',
 'geometry']

In [None]:
# redefine gdf with only columns to keep
gdf_income = gdf_income[columns_to_keep]
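
With only four columns, listing them by hand is simple. For wider tables, the "error" columns could also be filtered out by name. This is a minimal sketch on a toy table. The column names mirror the ones described above, and the values are made up.

```python
import pandas as pd

# Toy stand-in for gdf_income (geometry omitted); values are made up.
toy = pd.DataFrame({
    'geoid': ['14000US06037101110'],
    'name': ['Census Tract 1011.10'],
    'B19013001': [60000],
    'B19013001, Error': [8000],
})

# Keep every column whose name does not mention "error" (case-insensitive).
columns_to_keep = [c for c in toy.columns if 'error' not in c.lower()]
trimmed = toy[columns_to_keep]
print(trimmed.columns.to_list())
```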

In [None]:
# check the slimmed down gdf
gdf_income.head()

The dataframe is now reduced to show only the columns we identified, but we still need to rename B19013001. We use the documentation from the dataset to know that this is the variable for median income. 

In [None]:
gdf_income.columns = ['geoid',
 'name',
 'Median Income',
 'geometry']

In [None]:
gdf_income.head()

## Now that we have our data fixed, it's time to make some graphs!

In [None]:
gdf_income.plot(figsize=(12,10),
                 column='Median Income',
                 legend=True, 
                 scheme='UserDefined', cmap='Purples_r',
               classification_kwds=dict(bins=[12760,23700,39450,63100,77300,100490,115950,139140,150000,200000])
               )

This map shows the median income by census tract across LA.

Same map as above, but putting a placeholder in case we want to compare another map later with median income

In [None]:
# create the 1x2 subplots
fig, axs = plt.subplots(1, 2, figsize=(15, 12))

# name each subplot
ax1, ax2 = axs

# median income map on the left
gdf_income.plot(column='Median Income', 
            cmap='GnBu', 
            scheme='quantiles',
            k=5, 
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax1, # this assigns the map to the subplot,
            legend=True
           )

ax1.axis("off")
ax1.set_title("Median Income")

# placeholder map on the right (median income again for now)
gdf_income.plot(column='Median Income', 
            cmap='GnBu', 
            scheme='quantiles',
            k=5, 
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax2, # this assigns the map to the subplot
            legend=True
           )

ax2.axis("off")
ax2.set_title("Median Income")

# Query for Relative High Transit Areas

## Quick Survey of High Use Transit Areas

In [None]:
transit_indicators = ['Percent Public transportation',
'Percent Bus',
'Percent Subway or elevated il',
'Percent Long-distance train or commuter rail',
'Percent Light rail, streetcar or trolley',]

In [None]:
for indicator in transit_indicators:
    print(indicator)
    print (means2work.sort_values(by = indicator, ascending=False)[indicator].head(10))

I don't know how to call two variables at the same time in my search; I also want corresponding GEOIDs for my tracts. Will just query below to get the results I need, even if inefficiently.
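
One way to get the GEOID alongside each indicator is to select both columns with a list before sorting. This is a minimal sketch on a toy table with made-up values. The `top_by_indicator` dictionary is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Toy stand-in for means2work; values are made up.
toy = pd.DataFrame({
    'geoid': ['t1', 't2', 't3', 't4'],
    'Percent Bus': [12.0, 30.5, 4.2, 21.0],
    'Percent Light rail, streetcar or trolley': [0.5, 1.0, 6.0, 0.0],
})

transit_indicators = ['Percent Bus', 'Percent Light rail, streetcar or trolley']

top_by_indicator = {}
for indicator in transit_indicators:
    # Passing a list of column names selects several columns at once.
    top = toy[['geoid', indicator]].sort_values(by=indicator, ascending=False).head(2)
    top_by_indicator[indicator] = top
    print(top)
```

Storing each result in a dictionary keeps the tables available for later cells instead of only printing them.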

## Digging deeper into Public Transportation

In [None]:
means2work_sorted_pt = means2work.sort_values(by='Percent Public transportation',ascending = False)

In [None]:
means2work_sorted_pt[['geoid','Percent Public transportation']].head(105)

In [None]:
means2work_sorted_pt[means2work_sorted_pt['Percent Public transportation'] > 20]

## Bus

In [None]:
means2work_sorted_bus = means2work.sort_values(by='Percent Bus',ascending = False)

In [None]:
means2work_sorted_bus[['geoid','Percent Bus']].head(10)

In [None]:
means2work_sorted_bus[means2work_sorted_bus['Percent Bus'] > 20]

## Subway

In [None]:
means2work_sorted_sub = means2work.sort_values(by='Percent Subway or elevated il',ascending = False)

In [None]:
means2work_sorted_sub[['geoid','Percent Subway or elevated il']].head(10)

In [None]:
means2work_sorted_sub[means2work_sorted_sub['Percent Subway or elevated il'] > 5]

## Distance Rail

In [None]:
means2work_sorted_commuter = means2work.sort_values(by='Percent Long-distance train or commuter rail',ascending = False)

In [None]:
means2work_sorted_commuter[['geoid','Percent Long-distance train or commuter rail']].head(10)

## Light Rail

In [None]:
means2work_sorted_lr = means2work.sort_values(by='Percent Light rail, streetcar or trolley',ascending = False)

In [None]:
means2work_sorted_lr[['geoid','Percent Light rail, streetcar or trolley']].head(10)

# Attribute Merge

## Means to work and Neighborhood

Importing my previous file with geoids and neighborhood names, then running a quick background check on the data

In [None]:
df_slim = pd.read_csv('slim_full.csv')

In [None]:
type(df_slim)

In [None]:
df_slim.head(4)

In [None]:
df_slim['Name_1'].value_counts()

In [None]:
df_slim.shape

Info is good to go

Starting process of merge, but quick sample of means2work data

In [None]:
means2work.head(2)

In [None]:
m2w_nbh=means2work.merge(df_slim, on='geoid')

In [None]:
m2w_nbh.head()

In [None]:
type(m2w_nbh)

In [None]:
m2w_nbh.shape
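
If the merged shape has fewer rows than `means2work`, some tracts had no match in the neighborhood file. The sketch below shows how the unmatched tracts could be listed with the `indicator` option of `merge`. It uses toy tables with made-up values.

```python
import pandas as pd

# Toy stand-ins for means2work and df_slim; values are made up.
tracts = pd.DataFrame({'geoid': ['t1', 't2', 't3'], 'Total': [100, 200, 300]})
names = pd.DataFrame({'geoid': ['t1', 't3'], 'Name_1': ['Westlake', 'Koreatown']})

# how='left' keeps every tract; indicator=True adds a '_merge' column
# saying whether each row found a match in the neighborhood table.
checked = tracts.merge(names, on='geoid', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'geoid'].tolist()
print(unmatched)
```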

In [None]:
m2w_nbh.dtypes

In [None]:
m2w_nbh.info()

In [None]:
m2w_nbh.columns.to_list()

Success! I do want to cut a couple columns. 

In [None]:
col_to_keep = ['geoid',
 'name',
 'Total',
 'Car, truck, or van',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail, streetcar or trolley',
 'Worked from home',
 'geometry',
 'Percent Car, truck, or van',
 'Percent Drove alone',
 'Percent Carpooled',
 'Percent Public transportation',
 'Percent Bus',
 'Percent Subway or elevated il',
 'Percent Long-distance train or commuter rail',
 'Percent Light rail, streetcar or trolley',
 'Percent Worked from home',
 'Name_1'
]

In [None]:
m2w_nbh = m2w_nbh [col_to_keep]

In [None]:
m2w_nbh.sample(3)

Rename Columns

In [None]:
#renaming columns
m2w_nbh.columns = ['geoid',
 'name',
 'Total',
 'Car Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail',
 'Worked from home',
 'geometry',
 'Percent Car Total',
 'Percent Drove alone',
 'Percent Carpooled',
 'Percent Public transportation',
 'Percent Bus',
 'Percent Subway or elevated il',
 'Percent Long-distance train or commuter rail',
 'Percent Light rail',
 'Percent Worked from home',
 'Neighborhood'
]

In [None]:
m2w_nbh.sample(2)

I'm going to take a moment here to celebrate. I still need to do some more work with this, like seeing which neighborhoods are missing values and creating more variables to call data for each neighborhood. THAT BEING SAID, I've been working on this in some form for about 4-5 weeks. I'm very pleased with this initial result.

## Spatial Join: Median Income and m2w_nbh Merge

Instead of merging neighborhood data separately to Median Income, I'm going to try to merge them into one dataset. If this works and is manageable, I would continue to add other datasets in order to query data more easily. At least that's my train of thought for the moment.

In [None]:
# `predicate` replaces the `op` keyword, which was deprecated in geopandas 0.10
m2w_income_nbh=gpd.sjoin(m2w_nbh,gdf_income,how="inner",predicate="contains")

In [None]:
m2w_income_nbh.head()

In [None]:
m2w_income_nbh.shape

Well that worked! Going to make future work in 20 minutes so much easier for me.
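
Because both tables describe the same census tracts, an attribute merge on `geoid` is a possible alternative to the spatial join. It does not depend on comparing geometries. This is a minimal sketch on toy tables, assuming the `geoid` values match exactly across files, which has not been verified here.

```python
import pandas as pd

# Toy stand-ins for m2w_nbh and gdf_income (geometry omitted); values are made up.
m2w_toy = pd.DataFrame({
    'geoid': ['t1', 't2', 't3'],
    'Neighborhood': ['Westlake', 'Koreatown', 'Palms'],
    'Percent Bus': [25.0, 18.0, 6.0],
})
income_toy = pd.DataFrame({
    'geoid': ['t1', 't2', 't3'],
    'Median Income': [32000, 41000, 78000],
})

# Join on the shared tract identifier; validate raises an error if a geoid repeats on either side.
combined = m2w_toy.merge(income_toy, on='geoid', how='inner', validate='one_to_one')
print(combined.shape)
```

With real GeoDataFrames, one option would be to keep the GeoDataFrame on the left and drop the right table's `geometry` column before merging, so the result carries a single geometry column.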

In [None]:
m2w_income_nbh.columns.to_list()

In [None]:
keep_col=['geoid_left',
 'name_left',
 'Neighborhood',
 'Median Income',
 'Total',
 'Car Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail',
 'Worked from home',
 'Percent Car Total',
 'Percent Drove alone',
 'Percent Carpooled',
 'Percent Public transportation',
 'Percent Bus',
 'Percent Subway or elevated il',
 'Percent Long-distance train or commuter rail',
 'Percent Light rail',
 'Percent Worked from home',
 'index_right',
 'geometry',]

In [None]:
m2w_income_nbh = m2w_income_nbh [keep_col]
m2w_income_nbh.sample(5)

In [None]:
m2w_income_nbh.sample(1)

Wonderful! I also rearranged some of the columns to make it more readable for myself. 

In [None]:
m2w_income_nbh.columns=[ 'geoid',
 'name',
 'Neighborhood',
 'Median Income',
 'Total Workers Commuting',
 'Car Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail',
 'Worked from home',
 'Percent Car Total',
 'Percent Drove alone',
 'Percent Carpooled',
 'Percent Public transportation',
 'Percent Bus',
 'Percent Subway or elevated il',
 'Percent Long-distance train or commuter rail',
 'Percent Light rail',
 'Percent Worked from home',
 'index',
 'geometry',]

In [None]:
m2w_income_nbh.sample(5)

In [None]:
type(m2w_income_nbh)

## Merging Race/Eth Data: m2w_income_race

Based on the success of the last merge, I will merge the race/eth data as well. I may just end up creating a juggernaut of a dataset. For some reason, that makes me feel like I'm cheating since it will make my life so much easier. Going to follow the ethic of smarter, not harder.

In [None]:
# `predicate` replaces the `op` keyword, which was deprecated in geopandas 0.10
m2w_income_race=gpd.sjoin(m2w_income_nbh,gdf_race,how="inner",predicate="contains")

In [None]:
m2w_income_race.head()

Great. Let's clean up!

In [None]:
m2w_income_race.columns.to_list()

In [None]:
new_keep= ['geoid_left',
 'name_left',
 'Neighborhood',
 'Median Income',
 'Total Workers Commuting',
 'Car Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail',
 'Worked from home',
 'Percent Car Total',
 'Percent Drove alone',
 'Percent Carpooled',
 'Percent Public transportation',
 'Percent Bus',
 'Percent Subway or elevated il',
 'Percent Long-distance train or commuter rail',
 'Percent Light rail',
 'Percent Worked from home',
 'Total',
 'N_White',
 'N_Black',
 'N_Native',
 'N_Asian',
 'N_Native Hawaiian',
 'Hispanic or Latino',
 'N_WhitePercentage',
 'N_BlacPercentage',
 'N_NativePercentage',
 'N_AsianPercentage',
 'N_HawaiianPercentage',
 'HispanicPercentage',
 'geometry',
]

In [None]:
m2w_income_race = m2w_income_race[new_keep]
m2w_income_race.sample(4)

In [None]:
m2w_income_race.columns=['geoid',
 'name',
 'Neighborhood',
 'Median Income',
 'Total Workers Commuting',
 'Car Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Bus',
 'Subway or elevated il',
 'Long-distance train or commuter rail',
 'Light rail',
 'Worked from home',
 '%Car Total',
 '%Drove alone',
 '%Carpooled',
 '%Public transportation',
 '%Bus',
 '%Subway or elevated il',
 '%Long-distance train or commuter rail',
 '%Light rail',
 '%Worked from home',
 'Total Pop Race',
 'N_White',
 'N_Black',
 'N_Native',
 'N_Asian',
 'N_Native Hawaiian',
 'Hispanic or Latino',
 '%N_White',
 '%N_Black',
 '%N_Native',
 '%N_Asian',
 '%N_Hawaiian',
 '%Hispanic',
 'geometry',]

In [None]:
m2w_income_race.sample(5)

In [None]:
type(m2w_income_race)

Testing it

Comparing an original dataset with the new combined data.

In [None]:
gdf_income.plot(figsize=(12,10),
                 column='Median Income',
                 legend=True, 
                 scheme='equal_interval')

In [None]:
m2w_income_race.plot(figsize=(12,10),
                 column='Median Income',
                 legend=True, 
                 scheme='equal_interval')   

The maps from each dataset seem to check out!


Great. Time for some exploring.

# Exploring Neighborhoods

Purpose is to query nbhoods, define new variables, find missing tracts

## Lay of the Land: Missing Tracts

In [None]:
m2w_income_race.head()

In [None]:
m2w_income_race.columns.to_list()

In [None]:
m2w_income_race[m2w_income_race['%Public transportation']>20]

Of the 9 tracts I'm missing, 3 have public transit shares of 20% or higher
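
The missing tracts can be listed directly by comparing geoids before and after the merges. This is a minimal sketch on toy tables with made-up values. `before` stands in for `means2work` and `after` for `m2w_income_race`.

```python
import pandas as pd

# Toy stand-ins for the tables before and after merging; values are made up.
before = pd.DataFrame({'geoid': ['t1', 't2', 't3', 't4'],
                       'Percent Public transportation': [35.0, 8.0, 22.0, 12.0]})
after = pd.DataFrame({'geoid': ['t1', 't2'],
                      '%Public transportation': [35.0, 8.0]})

# Tracts present before the merges but absent afterwards.
lost = before[~before['geoid'].isin(after['geoid'])]

# Of those, the ones above the 20% public transit threshold.
lost_high = lost[lost['Percent Public transportation'] > 20]
print(lost_high['geoid'].tolist())
```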

In [None]:
m2w_income_race[m2w_income_race['%Bus']>20]

Missing 1 census tract.

In [None]:
m2w_income_race[m2w_income_race['%Subway or elevated il']>5]

Missing no census tracts.

In [None]:
m2w_income_race[m2w_income_race['%Long-distance train or commuter rail']>3]

Missing one census tract: 14000US06037104124

Adding biking and walking, as there may be a correlation to transit friendly areas.

In [None]:
m2w_income_race[m2w_income_race['%Light rail']>2]

Lost no Light rail census tracts
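
The threshold queries in this section repeat one pattern, so they could be driven from a single dictionary of cutoffs. This is a minimal sketch on a toy table with made-up values. The cutoffs are copied from the cells above, and `high_tracts` and `counts` are hypothetical helper names.

```python
import pandas as pd

# Toy stand-in for m2w_income_race; values are made up.
toy = pd.DataFrame({
    'geoid': ['t1', 't2', 't3'],
    '%Public transportation': [25.0, 10.0, 31.0],
    '%Bus': [21.0, 8.0, 15.0],
    '%Light rail': [0.0, 2.5, 3.0],
})

# Cutoffs (in percent) copied from the queries in this section.
thresholds = {'%Public transportation': 20, '%Bus': 20, '%Light rail': 2}

# One filtered table per mode, keyed by column name.
high_tracts = {col: toy[toy[col] > cutoff] for col, cutoff in thresholds.items()}
counts = {col: len(df) for col, df in high_tracts.items()}
print(counts)
```

Keeping the cutoffs in one place also makes it easier to see, and later revisit, which threshold was used for which mode.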

## Digging a little deeper into each higher-transit neighborhood

A little trimming to identify neighborhoods

### Public Transit

In [None]:
nbh_pt20 = m2w_income_race[m2w_income_race['%Public transportation']>20]

In [None]:
nbh_pt20.tail()

In [None]:
nbh_pt20['Neighborhood'].value_counts()

In [None]:
nbh_pt20_count = nbh_pt20['Neighborhood'].value_counts()
nbh_pt20_count

In [None]:
nbh_pt20_count = nbh_pt20_count.reset_index()
nbh_pt20_count
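
The column names produced by `value_counts().reset_index()` differ between pandas versions. Naming them explicitly gives a predictable table. This is a minimal sketch on toy data, and `Tracts` is a hypothetical column name chosen for illustration.

```python
import pandas as pd

# Toy stand-in for nbh_pt20; values are made up.
toy = pd.DataFrame({'Neighborhood': ['Westlake', 'Koreatown', 'Westlake', 'Downtown', 'Westlake']})

# rename_axis labels the index and reset_index(name=...) labels the counts,
# so the result has the same column names regardless of pandas version.
counts = (toy['Neighborhood']
          .value_counts()
          .rename_axis('Neighborhood')
          .reset_index(name='Tracts'))
print(counts)
```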

### Bus

In [None]:
nbh_bus20 = m2w_income_race[m2w_income_race['%Bus']>20]

In [None]:
nbh_bus20.sample(5)

In [None]:
nbh_bus20['Neighborhood'].value_counts()

### Subway

In [None]:
nbh_sub5 = m2w_income_race[m2w_income_race['%Subway or elevated il']>5]

In [None]:
nbh_sub5

In [None]:
nbh_sub5['Neighborhood'].value_counts()

### Commuter

In [None]:
nbh_commute3 = m2w_income_race[m2w_income_race['%Long-distance train or commuter rail']>3]

In [None]:
nbh_commute3

In [None]:
nbh_commute3['Neighborhood'].value_counts()

### Light Rail

In [None]:
nbh_lr4 = m2w_income_race[m2w_income_race['%Light rail']>4]

In [None]:
nbh_lr4

**Most Frequent Users of Public Transit:**
* Westlake                  23x
* Koreatown                 18x
* Pico-Union                13x
* East Hollywood             9x
* Historic South-Central     6x
* Downtown                   5x
* Hollywood                  5x
* Panorama City              4x
* Harvard Heights            3x
* Vermont Square             2x
* Boyle Heights              2x
* Central-Alameda            2x
* Exposition Park            2x
* Vermont Knolls             1x
* South Park                 1x
* Florence                   1x
* Echo Park                  1x
* University Park            1x
* Chinatown                  1x
* Adams-Normandie            1x
* Highland Park              1x
* Los Feliz                  1x


**Other Neighborhoods to consider:**
* North Hollywood (sub)     4x
* East Hollywood (sub)      3x
* Mount Washington (sub)    1x
* Chinatown (sub)           1x
* Studio City  (sub)        1x
* Montecito Heights (sub)   1x
* Palms (sub)               1x
* Valley Village (sub)      1x
* Baldwin Hills/Crenshaw (distance rail)    2x
* Jefferson Park (distance rail)            1x
* Exposition Park   (distance rail)         1x

A breakdown by region

**SF Valley**
* Studio City
* North Hollywood
* Panorama City 
* Valley Village 

 
**Westside**
* Palms

**Central LA**
* Hollywood
* East Hollywood 
* Koreatown   
* Westlake  
* Downtown
* Echo Park
* Chinatown
* Harvard Heights
* Pico-Union  
* Los Feliz  

**South LA**
* Historic South-Central
* Vermont Square  
* Vermont Knolls 
* Central-Alameda 
* Exposition Park
* South Park
* Florence   
* University Park
* Adams-Normandie
* Baldwin Hills/Crenshaw
* Jefferson Park

**Eastside**
* Boyle Heights  

**Northeast LA**
* Highland Park 
* Mount Washington 
* Montecito Heights 

With my partner, we will need to decide which of these neighborhoods we would like to focus on. Each region and each neighborhood tells its own tale. We will likely start with Central LA, as that's the most robust transit area in LA.
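
To focus on one region at a time, the breakdown above could be turned into a lookup and attached to the data as a new column. This is a minimal sketch. The region lists are abbreviated from the breakdown above, `Region` is a hypothetical column name, and the toy table stands in for `m2w_income_race`.

```python
import pandas as pd

# Region lists abbreviated from the breakdown above.
regions = {
    'SF Valley': ['Studio City', 'North Hollywood', 'Panorama City', 'Valley Village'],
    'Westside': ['Palms'],
    'Eastside': ['Boyle Heights'],
    'Northeast LA': ['Highland Park', 'Mount Washington', 'Montecito Heights'],
}

# Invert to a neighborhood -> region lookup.
region_lookup = {nbh: region for region, nbhs in regions.items() for nbh in nbhs}

# Toy stand-in for m2w_income_race; neighborhoods not in the lookup get NaN.
toy = pd.DataFrame({'Neighborhood': ['Palms', 'Studio City', 'Westlake']})
toy['Region'] = toy['Neighborhood'].map(region_lookup)
print(toy)
```

A region could then be selected with a filter such as `toy[toy['Region'] == 'SF Valley']`.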

In [None]:
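# name the driver explicitly so the file is written as GeoJSON on any geopandas version
m2w_income_race.to_file('m2w_income_race_new.geojson', driver='GeoJSON')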

In [None]:
m2w_income_race_2 = gpd.read_file('m2w_income_race_new.geojson') 

In [None]:
type(m2w_income_race_2)

In [None]:
m2w_income_race_2.shape

In [None]:
m2w_income_race_2.head(4)

In [None]:
m2w_income_race_2.columns.to_list()

This concludes this notebook. In order to save space and memory, I will start a second notebook continuing where we left off. 