# <hl> Week 3 Assignment
**Author:** Demetria Murphy
    
**Description:** In this assignment, I will be exploring both the 1940 and 1960 Census data for race by Census tract to better understand the Black population change both before and after racially restrictive covenants were outlawed.
    
**Data Source:** Philip J. Ethington, Anne Marie Kooistra, and Edward DeYoung, *Los Angeles County Union Census Tract Data Series, 1940-1990*, Version 1.01. Created with the support of the John Randolph Haynes and Dora Haynes Foundation. (Los Angeles: University of Southern California, 2000).

In [None]:
import pandas as pd

In [None]:
import geopandas as gpd

In [None]:
df = gpd.read_file('../Data/ethington.zip')

# <hl> Data Exploration

Let's take a first look at the size of our data...

In [None]:
df.head()

In [None]:
df.shape

This data file contains 252 variables for 1656 Census tracts in Los Angeles County, covering 1940-1990. *Wow!* This will be so helpful for understanding the Los Angeles landscape both before and after racially restrictive covenants were outlawed in 1948.

Let's view a sample:

In [None]:
df.sample(5)  # call the method to draw 5 random rows

What types of data do we have in this dataset?

In [None]:
# `show_counts` replaces the older `null_counts` argument, which newer pandas versions no longer accept
df.info(verbose=True, show_counts=True)

**Important Note:** Although most of the columns are typed as float, the data sample suggests that most, if not all, of the values end in .0. This means the data are really integers that pandas is reading as "float64."
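To check the note above rather than rely on eyeballing the sample, here is a minimal sketch on a toy frame. The values are made up for illustration; only the column names `P40T` and `AREA` come from the dataset. It finds float columns whose values are all whole numbers and converts just those.

```python
import pandas as pd

# Toy stand-in for the census table; the numbers are invented for illustration
toy = pd.DataFrame({"P40T": [1200.0, 0.0, 3400.0], "AREA": [1.5, 2.25, 0.75]})

# A float column is "really" integer if every non-null value has no fractional part
float_cols = toy.select_dtypes(include="float").columns
whole = [c for c in float_cols if (toy[c].dropna() % 1 == 0).all()]

# Convert only those columns; the nullable Int64 dtype would also tolerate missing values
toy = toy.astype({c: "Int64" for c in whole})
print(toy.dtypes)
```

Measurement columns such as `AREA` stay as floats, which is probably what we want.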

Let's take a look at the census tract number column. This will be the identifier for the spatial unit within the scope of this project.

In [None]:
df.CTBNA.head()

# <hl> Shapefile Attributes

We know our file is a shapefile, so let's import the necessary packages to work with our data file.

In [None]:
import os
import matplotlib.pyplot as plt

Let me check which version of Python is running here to find supporting documentation for exploring a shapefile.

In [None]:
from platform import python_version

print(python_version())

Here are the geometric attribute columns, AREA and PERIMETER, that accompany the `geometry` column used to plot this spatial dataset.

In [None]:
df.AREA.head()

In [None]:
df.PERIMETER.head()

Let's run an initial plot.

In [None]:
df.plot()
plt.show()

This shape is familiar...the shape of Los Angeles County. Seems like our data is working!

# <hl> Dropping Columns 
 1. Null 
 2. By name (keeping only the columns on the variable list)

In [None]:
df.columns[df.all()].tolist()

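Some of these are important columns such as 'P40T', 'AREA', 'PERIMETER' and 'geometry', which we need to plot. *Why are these showing up?* `df.all()` lists the columns whose values are all truthy (non-zero and non-empty), not the columns that contain nulls, so this list is not a guide to what is empty. Instead, let's drop all the columns we do not need, all variables from 1970-1990, by name.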
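The difference between `df.all()` (used in the cell above) and an actual null check is easiest to see on a toy frame. In this sketch `P40X` and `P40Y` are hypothetical column names and all values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "P40T": [1200.0, 850.0, 3400.0],  # no zeros, no nulls
    "P40X": [0.0, 12.0, 7.0],         # hypothetical column containing a zero
    "P40Y": [5.0, None, 9.0],         # hypothetical column containing a null
})

# Columns where every value is truthy (what df.all() reports)
all_truthy = toy.columns[toy.all()].tolist()

# Columns that contain at least one missing value
has_null = toy.columns[toy.isna().any()].tolist()

print(all_truthy)
print(has_null)
```

Note that the column with a null still passes `all()`, while the column with a zero does not, so `all()` cannot be used to find empty data.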

In [None]:
columns_to_drop70 = ['P70T','P70WNH', 'P70BNH', 'P70ONH','P70HIS', 'A700004', 'A700517', 'A701864', 'A7065UP', 'A70MEDN', 'M70SINGL', 'M70MAR', 'M70OTH', 'E70C', 'E70H', 'E70OTH', 'O70WC', 'O70BC', 'R70SH65', 'R70SC65', 'R70OT65', 'H70TOT', 'H70SFU', 'H70OTH', 'INC70MED', 'H70MVL', 'H70MRN']
columns_to_drop80 = ['P80T', 'P80WNH', 'P80BNH', 'P80ONH', 'P80HIS', 'A800004', 'A800517', 'A801864', 'A8065UP', 'A80MEDN', 'M80SINGL', 'M80MAR', 'M80OTH', 'E80C', 'E80H', 'E80OTH', 'O80WC', 'O80BC', 'R80SH75', 'R80SC75', 'R80OT75', 'H80TOT', 'H80SFU', 'H80OTH', 'INC80MED', 'H80MVL', 'H80MRN']
columns_to_drop90 = ['P90T', 'P90WNH', 'P90BNH', 'P90ONH', 'P90HIS', 'A900004', 'A900517', 'A901864', 'A9065UP', 'A90MEDN', 'M90SINGL', 'M90MAR', 'M90OTH', 'E90C', 'E90H', 'E90OTH', 'O90WC', 'O90BC', 'R90SH85', 'R90SC85', 'R90OT85', 'H90TOT', 'H90SFU', 'H90OTH', 'INC90MED', 'H90MVL']

In [None]:
df = df.drop(columns_to_drop70,axis=1)
df.head()

In [None]:
df = df.drop(columns_to_drop80,axis=1)
df.head()

In [None]:
df = df.drop(columns_to_drop90,axis=1)
df.head()
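As an alternative to typing out the three lists, the 1970-1990 columns could probably be selected by pattern, since their names put the decade right after a letter prefix. This is only a sketch on an empty toy frame that reuses a handful of column names from the lists above; the pattern is an assumption and should be checked against the full column list before dropping anything.

```python
import pandas as pd

# Empty toy frame with a few of the real column names
toy = pd.DataFrame(columns=["AREA", "CTBNA", "P40T", "O4007", "P70T",
                            "A7065UP", "R80SH75", "INC90MED", "geometry"])

# Letters at the start of the name, immediately followed by 70, 80 or 90
later_decades = toy.columns[toy.columns.str.contains(r"^[A-Z]+(?:70|80|90)")].tolist()

trimmed = toy.drop(columns=later_decades)
print(later_decades)
print(trimmed.columns.tolist())
```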

In [None]:
df.columns.to_list()

In [None]:
columns_to_drop = ['P40WNH', #to drop all other columns
 'P40BNH',
 'P40ONH',
 'P40HIS',
 'P40TW',
 'P40NATW',
 'P40FBW',
 'P40TOTNW',
 'P40NBNW',
 'P40MEX',
 'P40CSA',
 'E40H',
 'E40C',
 'O40WC',
 'O40BC',
 'O4001',
 'O4002',
 'O4003',
 'O4004',
 'O4005',
 'O4006',
 'O4007',
 'O4008',
 'O4009',
 'O4010',
 'O4011',
 'O4012',
 'O4013',
 'O4005SPS',
 'O4005PRS',
 'H40TOT',
 'H40NWT',
 'H40OWN',
 'H40NWO',
 'H40RNT',
 'H40WRN',
 'H40NRN',
 'H40MVL',
 'H40MRN',
 'P50WNH',
 'P50BNH',
 'P50ONH',
 'P50HIS',
 'P50TW',
 'P50NATW',
 'P50FBW',
 'P50TOTNW',
 'P50NBNW',
 'P50SST',
 'P50SSN',
 'P50SSFB',
 'P50MEX',
 'P50CSA',
 'E50H',
 'E50C',
 'O50WC',
 'O50BC',
 'O5001',
 'O5001M',
 'O5001F',
 'O5002',
 'O5002M',
 'O5002F',
 'O5003',
 'O5003M',
 'O5003F',
 'O5004',
 'O5004M',
 'O5004F',
 'O5005PRM',
 'O5005PRF',
 'O5005',
 'O5006M',
 'O5006F',
 'O5006',
 'O5007CLM',
 'O5007CLF',
 'O5007SLM',
 'O5007SLF',
 'O5007',
 'O5008M',
 'O5008F',
 'O5008',
 'O5009M',
 'O5009F',
 'O5009',
 'O5010M',
 'O5010F',
 'O5010',
 'O5011M',
 'O5011F',
 'O5011',
 'O5012M',
 'O5012F',
 'O5012',
 'O5013M',
 'O5013F',
 'O5013',
 'H50TOT',
 'H50NWT',
 'H50OWN',
 'H50NWO',
 'H50RNT',
 'H50NRN',
 'H50MVL',
 'H50MRN',
 'P60WNH',
 'P60BNH',
 'P60ONH',
 'P60HIS',
 'P60TW',
 'P60TOTNW',
 'P60NBNW',
 'P60SST',
 'P60SSN',
 'P60SSFB',
 'P60MEX',
 'E60H',
 'E60C',
 'O60WC',
 'O60BC',
 'O6001',
 'O6002',
 'O6003',
 'O6004',
 'O6005PRM',
 'O6005PRF',
 'O6005',
 'O6006M',
 'O6006F',
 'O6006',
 'O6007CLM',
 'O6007CLF',
 'O6007SLM',
 'O6007SLF',
 'O6007',
 'O6008M',
 'O6008F',
 'O6008',
 'O6009M',
 'O6009F',
 'O6009',
 'O6010M',
 'O6010F',
 'O6010',
 'O6011M',
 'O6011F',
 'O6011',
 'O6012M',
 'O6012F',
 'O6012',
 'O6013M',
 'O6013F',
 'O6013',
 'H60TOT',
 'H60NWT',
 'H60OWN',
 'H60NWO',
 'H60RNT',
 'H60NRN',
 'H60MVL',
 'H60MRN']

In [None]:
df = df.drop(columns_to_drop,axis=1)
df.head()

# <hl> Rename Columns

Let's clean up the data we are using for this inquiry.

In [None]:
columns = list(df)
columns

In [None]:
df.columns = ['AREA',
 'PERIMETER',
 'Census_tract',
 'Pop_total_40',
 'Pop_black_40',
 'Pop_total_50',
 'Pop_black_50',
 'Pop_total_60',
 'Pop_black_60',
 'geometry']

In [None]:
df.head()
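Assigning a list to `df.columns` depends on the columns being in exactly the expected order. A more defensive option is a mapping from old name to new name, sketched here on a toy frame. `CTBNA` and `P40T` appear earlier in this notebook; `P60T` is an assumed name that follows the same pattern, and the values are made up.

```python
import pandas as pd

toy = pd.DataFrame({"CTBNA": [1001.0], "P40T": [1200.0], "P60T": [3400.0]})

rename_map = {
    "CTBNA": "Census_tract",
    "P40T": "Pop_total_40",
    "P60T": "Pop_total_60",  # assumed original name
}

# Fail loudly if a name in the mapping is not actually in the table
missing = set(rename_map) - set(toy.columns)
assert not missing, f"columns not found: {missing}"

toy = toy.rename(columns=rename_map)
print(toy.columns.tolist())
```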

# <hl> Simple Stats and Plots: Exploring Black Los Angeles County 1940-1960

Let's first explore the overall population in LA County.

In [None]:
df['Pop_total_40'].describe()

In [None]:
df['Pop_total_50'].describe()

In [None]:
df['Pop_total_60'].describe()

In [None]:
df['Pop_total_40'].plot.hist(bins=50)

In [None]:
df['Pop_total_50'].plot.hist(bins=50)

In [None]:
df['Pop_total_60'].plot.hist(bins=50)

It looks like the mean overall population per Census tract grows from ~1697 in 1940 to ~3661 in 1960. The histograms show an increase in the number of Census tracts with larger populations, showing just how intense population growth was during the post-war period.

***How does the overall Black population change?***

In [None]:
df['Pop_black_40'].describe()

In [None]:
df['Pop_black_50'].describe()

In [None]:
df['Pop_black_60'].describe()

In [None]:
df['Pop_black_40'].plot.hist(bins=10)

In [None]:
df['Pop_black_50'].plot.hist(bins=10)

In [None]:
df['Pop_black_60'].plot.hist(bins=10)

The Black population follows the overall trend, growing from an average of ~45 Black residents per Census tract in 1940 to ~278 per Census tract in 1960. The histograms show that few Census tracts had large Black populations and that their number grew slightly from 1940 to 1960.
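Rather than reading the means off six separate `describe()` outputs, the same comparison can be put in one small table. A minimal sketch with made-up numbers, using the renamed columns from this notebook:

```python
import pandas as pd

# Invented values for three tracts, for illustration only
toy = pd.DataFrame({
    "Pop_total_40": [1000, 2000, 3000],
    "Pop_total_60": [2000, 4000, 9000],
    "Pop_black_40": [0, 10, 50],
    "Pop_black_60": [0, 100, 500],
})

# One row per column, one column per statistic
summary = toy.agg(["mean", "median", "max"]).T
print(summary)
```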

# <hl> Check Data for Null/Missing Values

In [None]:
len(df)

In [None]:
df.isna().sum()

Awesome. This means that none of our data has null values. Let's continue exploring this data.

# <hl> Sorting

Based on our previous exploration, ***which Census tracts have the highest Black population?*** In 1940, this will give us a sense of where Black people could live. In 1950 and 1960, this will give us a sense of where Black people lived after racially restrictive covenants were outlawed. *What correlations are there with our Green Book data locations?*

In [None]:
#Let's start with 1940.
df_sorted = df.sort_values(by='Pop_black_40',ascending = False)

In [None]:
df_sorted[['Census_tract','Pop_black_40']].head(10)

In [None]:
df_sorted.head(10).plot.bar(x='Census_tract',
                            y='Pop_black_40')

In [None]:
df_sorted.head(10).plot.bar(x='Census_tract',
                            y='Pop_black_40',
                            title='Top 10 Census Tracts with Highest Black Population in 1940 Los Angeles County')

***How does this compare to overall population in 1940?***

In [None]:
df_sorted40 = df.sort_values(by='Pop_total_40',ascending = False)

In [None]:
df_sorted40[['Census_tract','Pop_total_40']].head(10)

In [None]:
df_sorted40.head(10).plot.bar(x='Census_tract',
                            y='Pop_total_40')

In [None]:
df_sorted40.head(10).plot.bar(x='Census_tract',
                            y='Pop_total_40',
                            title='Top 10 Census Tracts with Highest Population in 1940 Los Angeles County')

The 4th most populous Census tract overall, 2260, has the highest Black population. *I wonder where that is.*

Now, let's explore the 1960 data for overall and Black population to see if there are any other connections we can make.

In [None]:
df_sorted60 = df.sort_values(by='Pop_total_60',ascending = False)

In [None]:
df_sorted60[['Census_tract','Pop_total_60']].head(10)

In [None]:
df_sorted60.head(10).plot.bar(x='Census_tract',
                            y='Pop_total_60')

In [None]:
df_sorted60.head(10).plot.bar(x='Census_tract',
                            y='Pop_total_60',
                            title='Top 10 Census Tracts with Highest Population in 1960 Los Angeles County')

By estimate, the most populous Census tract in 1960 is 2780, and its population appears slightly smaller than that of the most populous tract in 1940. Let's explore how the Black population has changed.

In [None]:
df_sorted60b = df.sort_values(by='Pop_black_60',ascending = False)

In [None]:
df_sorted60b[['Census_tract','Pop_black_60']].head(10)

In [None]:
df_sorted60b.head(10).plot.bar(x='Census_tract',
                            y='Pop_black_60')

In [None]:
df_sorted60b.head(10).plot.bar(x='Census_tract',
                            y='Pop_black_60',
                            title='Top 10 Census Tracts with Highest Black Population in 1960 Los Angeles County')

It also appears that many of the Census tracts with the highest Black population have changed, and the top tracts now exceed 6000 Black residents, more than in 1940. At a cursory glance, it seems as though many Black residents relocated to other neighborhoods in post-war Los Angeles.
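The claim that the top tracts changed can be checked directly by comparing the two top-N sets instead of comparing bar charts by eye. A sketch on invented data (tract numbers and counts are made up), with a hypothetical helper `top_tracts`:

```python
import pandas as pd

toy = pd.DataFrame({
    "Census_tract": [101, 102, 103, 104, 105],
    "Pop_black_40": [500, 400, 10, 0, 0],
    "Pop_black_60": [300, 900, 50, 800, 700],
})

def top_tracts(frame, column, n):
    """Set of tract ids with the n largest values in `column`."""
    return set(frame.nlargest(n, column)["Census_tract"])

top40 = top_tracts(toy, "Pop_black_40", 2)
top60 = top_tracts(toy, "Pop_black_60", 2)

stayed = top40 & top60    # in the top group in both years
entered = top60 - top40   # new to the top group in 1960
print(stayed, entered)
```

On the real table, the same helper with `n=10` would show how many of the 1940 top ten are still in the 1960 top ten.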

***Where did Black residents move?***

*Should we filter out the census tracts without any residents before moving on to mapping?*

In [None]:
df[df['Pop_total_40']==0]

In [None]:
df[df['Pop_total_50']==0]

In [None]:
df[df['Pop_total_60']==0]

**Important Caveat:** Given that we are trying to understand population change over time, it would be unwise to remove the Census tracts with a population of 0, especially since a tract that is empty in one decade may be populated in another.

# <hl> Adding Percentage Black Column

Adding this column will give us a better sense of the overall percentage of Black residents to better compare Census tracts.

In [None]:
# percentage of each tract's residents who are Black; NaN where a tract has no residents (0/0)
df['Pct_black_40'] = df['Pop_black_40'] / df['Pop_total_40'] * 100

In [None]:
df['Pct_black_50'] = df['Pop_black_50'] / df['Pop_total_50'] * 100

In [None]:
df['Pct_black_60'] = df['Pop_black_60'] / df['Pop_total_60'] * 100

In [None]:
df.head()
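Because some tracts have a total population of 0 in a given decade, the division can produce undefined values. One way to make that explicit is a small helper, sketched here with made-up numbers; `pct_of_total` is a hypothetical name, not part of pandas.

```python
import pandas as pd

def pct_of_total(part: pd.Series, total: pd.Series) -> pd.Series:
    """Percentage of `part` in `total`; NaN wherever total is zero."""
    return part / total.where(total > 0) * 100

toy = pd.DataFrame({
    "Pop_black_40": [50, 0, 0],
    "Pop_total_40": [200, 0, 1000],
})
toy["Pct_black_40"] = pct_of_total(toy["Pop_black_40"], toy["Pop_total_40"])
print(toy)
```

Empty tracts end up as NaN rather than 0%, which keeps "no residents" distinct from "no Black residents" when mapping.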

# <hl> Plotting and Mapping

**Mapping 1940:**

In [None]:
df.plot('Pct_black_40')

At first glance, we can see that the Black population in 1940 was concentrated almost entirely in the urban core.

In [None]:
df.geometry

In [None]:
df.plot(
            figsize=(20,12),   #size of the plot (a bit bigger than the default)
            column = 'Pct_black_40',   # column that defines the color of each tract polygon
            legend = True,     # add a legend           
            legend_kwds={
               'loc': 'upper right',
               'bbox_to_anchor':(1,1)
            }                  # this puts the legend to the side
       )

In [None]:
import folium

In [None]:
latitude = 34.0522
latitude

In [None]:
longitude = -118.2437
longitude

In [None]:
#now let's adjust the center of the map to LA
m = folium.Map(location=(latitude, longitude))
m

Let's take a look at the breaks in the map to see what looks best.

In [None]:
df.plot(figsize=(12,10),
                 column='Pct_black_40',
                 legend=True, 
                 scheme='NaturalBreaks')

In [None]:
#let's make this one a bit bigger
df.plot(figsize=(35,30),
                 column='Pct_black_40',
                 legend=True, 
                 scheme='equal_interval')

In [None]:
df.plot(figsize=(12,10),
                 column='Pct_black_40',
                 legend=True, 
                 scheme='quantiles')

A map with natural breaks gives us a better sense of the spread of our data, so we'll use that.
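To see why the choice of scheme matters for skewed data like this, the sketch below bins a made-up, heavily skewed series two ways using plain pandas: equal-width bins (`pd.cut`) and quantile bins (`pd.qcut`). It does not reproduce natural breaks, which geopandas delegates to the mapclassify package; it only illustrates the two other schemes tried above.

```python
import pandas as pd

# Invented percentages: mostly zeros with a few high values
pct = pd.Series([0, 0, 0, 0, 0, 1, 2, 5, 40, 90], name="Pct_black_40")

# Equal interval: four bins of equal width across the range
equal = pd.cut(pct, bins=4)

# Quantiles: bin edges at the quartiles; repeated edges are merged
quant = pd.qcut(pct, q=4, duplicates="drop")

equal_counts = equal.value_counts().sort_index().tolist()
quant_counts = quant.value_counts().sort_index().tolist()
print(equal_counts)
print(quant_counts)
```

With equal intervals almost every tract lands in the lowest class, while quantiles spread the tracts out but hide how extreme the top values are.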

# <hl> How does 1940 compare with the 1960 Black population in Los Angeles County?

In [None]:
df.plot(figsize=(12,10),
                 column='Pct_black_60',
                 legend=True, 
                 scheme='NaturalBreaks')

In [None]:
# create the 1x2 subplots
fig, axs = plt.subplots(1, 2, figsize=(15, 12))

# name each subplot
ax1, ax2 = axs

# 1940 map on the left
df.plot(column='Pct_black_40', 
            cmap='RdYlGn_r', 
            scheme='user_defined',
            classification_kwds={'bins':[20,40,60,80,100]},
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax1, # this assigns the map to the subplot,
            legend=True
           )

ax1.axis("off")
ax1.set_title("Percentage of Black Population in Los Angeles County, 1940")

# 1960 on the right
df.plot(column='Pct_black_60', 
            cmap='RdYlGn_r', 
            scheme='user_defined',
            classification_kwds={'bins':[20,40,60,80,100]},
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax2, # this assigns the map to the subplot
            legend=True
           )

ax2.axis("off")
ax2.set_title("Percentage of Black Population in Los Angeles County, 1960")

Looks like there was some movement further outside the urban core from 1940 to 1960, as expected after this change in the law. Additionally, there were more Census tracts in 1960 with a Black population above 80%.

# <hl> How many Census tracts are more than 80% Black in 1940 vs 1960?

In [None]:
df[df.Pct_black_40 > 80]

In [None]:
df[df.Pct_black_40 > 80].count()

There were **3 Census tracts** in 1940 that were more than 80% Black.

Let's plot these.

In [None]:
df[df.Pct_black_40 > 80].plot(figsize=(12,10),
                                             column='Pct_black_40',
                                             legend=True, 
                                             scheme='NaturalBreaks')

In [None]:
df[df.Pct_black_60 > 80]

In [None]:
df[df.Pct_black_60 > 80].count()

In 1960, there were **39 Census tracts** that were more than 80% Black, 36 more than in 1940, or a **1200% increase!**
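The tract counts and the percentage increase can be computed directly instead of being read off `count()`. A short sketch: the toy percentages are made up, while the 3 and 39 passed to the hypothetical helper `pct_increase` are the counts reported above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Pct_black_40": [95.0, 10.0, 0.0, 85.0],
    "Pct_black_60": [97.0, 88.0, 81.0, 90.0],
})

# Summing a boolean mask counts the rows that satisfy it
n40 = int((toy["Pct_black_40"] > 80).sum())
n60 = int((toy["Pct_black_60"] > 80).sum())

def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100

growth = pct_increase(3, 39)  # counts from the notebook: (39 - 3) / 3 * 100
print(n40, n60, growth)
```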

Let's plot these.

In [None]:
df[df.Pct_black_60 > 80].plot(figsize=(12,10),
                                             column='Pct_black_60',
                                             legend=True, 
                                             scheme='NaturalBreaks')

Let's place these plots side by side to better see the Black population density change from 1940 to 1960.

In [None]:
# create the 1x2 subplots
fig, axs2 = plt.subplots(1, 2, figsize=(15, 12))

# name each subplot
ax1a, ax2b = axs2

# 1940 map on the left
df[df.Pct_black_40 > 80].plot(column='Pct_black_40', 
            cmap='RdYlGn_r', 
            scheme='NaturalBreaks',
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax1a, # this assigns the map to the subplot,
            legend=True
           )

ax1a.axis("off")
ax1a.set_title("80%+ Black Census Tracts in Los Angeles County, 1940")

# 1960 on the right
df[df.Pct_black_60 > 80].plot(column='Pct_black_60', 
            cmap='RdYlGn_r', 
            scheme='NaturalBreaks',
            edgecolor='white', 
            linewidth=0., 
            alpha=0.75, 
            ax=ax2b, # this assigns the map to the subplot
            legend=True
           )

ax2b.axis("off")
ax2b.set_title("80%+ Black Census Tracts in Los Angeles County, 1960")

# <hl> Questions for Further Analysis

*How do you create a side-by-side choropleth map focusing specifically on Census tracts that are 80%+ Black?*

Below is the code I used that gave me an error.

In [None]:
import folium

In [None]:
#m = folium.Map(location=[34.2,-118.2], 
#               zoom_start = 9,
#               tiles='CartoDB positron', 
#               attribution='CartoDB')
#
# plot choropleth over the base map
#folium.Choropleth(
#                  geo_data=df, # geo data
#                  data=df, # data          
#                  key_on='fCensus_tract', # key, or merge column
#                  columns=['Census_tract', 'Pct_black_40'], # [key, value]
#                  fill_color='BuPu',
#                  line_weight=0.1, 
#                  fill_opacity=0.8,
#                  line_opacity=0.2, # line opacity (of the border)
#                  legend_name='Population Black (1940 Census Data)').add_to(m)    # name on the legend color bar
# m
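The error message is not shown, so this is only a guess at the cause. `key_on` is expected to be a path into each GeoJSON feature, such as `feature.properties.Census_tract`, and `'fCensus_tract'` does not have that form. The sketch below uses a hand-written toy GeoJSON with two square "tracts" (ids, coordinates and percentages are all made up) and two hypothetical helpers: `unmatched_keys` checks that the data keys line up with the feature properties, and `build_map` makes the folium call.

```python
import pandas as pd

def square(lon, lat, size=0.1):
    """Closed square ring in lon/lat order, as GeoJSON expects."""
    return [[[lon, lat], [lon + size, lat], [lon + size, lat + size],
             [lon, lat + size], [lon, lat]]]

# Toy GeoJSON in WGS84 (EPSG:4326); ids and shapes are invented
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Census_tract": "1001"},
         "geometry": {"type": "Polygon", "coordinates": square(-118.3, 34.0)}},
        {"type": "Feature", "properties": {"Census_tract": "1002"},
         "geometry": {"type": "Polygon", "coordinates": square(-118.2, 34.0)}},
    ],
}
toy_data = pd.DataFrame({"Census_tract": ["1001", "1002"],
                         "Pct_black_40": [5.0, 60.0]})

def unmatched_keys(geojson, data, key):
    """Data keys that have no matching feature property."""
    feature_keys = {f["properties"][key] for f in geojson["features"]}
    return sorted(set(data[key]) - feature_keys)

def build_map(geojson, data, key, value):
    import folium  # imported here so the key check runs without folium installed
    m = folium.Map(location=[34.2, -118.2], zoom_start=9, tiles="CartoDB positron")
    folium.Choropleth(
        geo_data=geojson,
        data=data,
        columns=[key, value],                # [key, value]
        key_on=f"feature.properties.{key}",  # path into each GeoJSON feature
        fill_color="BuPu",
        fill_opacity=0.8,
        line_opacity=0.2,
        line_weight=0.1,
        legend_name="Percent Black (1940 Census data)",
    ).add_to(m)
    return m

print(unmatched_keys(toy_geojson, toy_data, "Census_tract"))
```

With the real data, the GeoDataFrame would probably need to be reprojected first with `df.to_crs(epsg=4326)`, since web maps expect latitude and longitude. To my understanding, folium can then take the GeoDataFrame itself as `geo_data`. For the 80%+ question, the same call could be given a filtered frame such as `df[df.Pct_black_40 > 80]`.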