# CP113B: Community and Economic Development

In this data science module, we're going to explore the relationship between historical patterns of redlining and concentrated poverty today. Community and economic development is fundamentally about bringing investment back to high-poverty areas that have long lacked access to credit and other forms of capital. It recognizes that poverty is partly the result of structural, not individual, causes. So let's look at how past policies that contributed to the disinvestment of minority neighborhoods in the 1940s and 1950s still influence our cities today.

In the first part of today's module, we're going to introduce you to the Jupyter Notebook. We'll then look at some redlining maps and discuss them as a group. Finally, we'll analyze whether areas that were marked as "high risk" on the map have higher rates of poverty today.

## 1.0 The Jupyter Notebook

First of all, note that this page is divided into what are called cells. You can navigate cells by clicking on them or by using the up and down arrows. Cells will be highlighted as you navigate them.

### Text cells

Text cells (like this one) can be edited by double-clicking on them. They're written in a simple formatting language called Markdown, which adds formatting and section headings. You don't need to learn Markdown, but you should know the difference between text cells and code cells.

### Code cells

Other cells contain code in the Python 3 language. Don't worry: we'll show you everything you need to know to succeed in this part of the class.

The fundamental building block of Python code is an expression. Cells can contain multiple lines with multiple expressions. 
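As a small illustration (the wage figure is made up), here is what a code cell with several lines might look like. Each line is evaluated in order, top to bottom. In a notebook, a bare expression on the last line is displayed as the cell's output.

```python
# Each line is evaluated in order when the cell runs.
hours_per_week = 40
hourly_wage = 15.50          # hypothetical wage, for illustration only
weeks_per_year = 52
annual_income = hours_per_week * hourly_wage * weeks_per_year

# A bare expression on the last line is shown as the cell's output: 32240.0
annual_income
```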

Let's learn how to "run" cells.

### Running cells

"Running a cell" is equivalent to pressing "Enter" on a calculator once you've typed in the expression you want to evaluate: it produces an output. When you run a text cell, it outputs clean, organized writing. When you run a code cell, it computes all of the expressions you want to evaluate, and can output the result of the computation.

To run the code in a code cell, first click on that cell to activate it. It'll be highlighted with a little green or blue rectangle. Next, you can either press the ▶| Run button above or press Shift + Return or Shift + Enter. This will run the current cell and select the next one.

Text cells are useful for taking notes and keeping your notebook organized, but your data analysis will be done in code cells. We will focus on code cells for the rest of the class.

### How to Save Your Work

Click the leftmost icon in the toolbar (to the left of the plus icon).
Alternatively, press Ctrl+S on a PC or Command+S on a Mac!


### Importing libraries

This next code cell imports the packages we'll use for the analysis later. Just run the cell. If it runs successfully, you'll see a number appear in the brackets on the left.

In [None]:
# Importing utilities
# These are all pre-written Python packages that we will be using to read in, clean, analyze, and model our data.
import os
import re
import math
import json
import geojson
import folium
import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import ipywidgets as widgets
from IPython.display import IFrame

plt.style.use('fivethirtyeight')
%matplotlib inline
pd.options.display.float_format = '{:.2f}'.format
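If the import cell above stops with a `ModuleNotFoundError`, a quick check like the one below can tell you which packages are missing before you try again. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is our own, not part of any library.

```python
# Check which of the notebook's packages Python can find in this environment.
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["geojson", "folium", "geopandas", "matplotlib", "pandas",
            "numpy", "seaborn", "ipywidgets"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

Any name it reports can usually be installed from a code cell with `!pip install` followed by the package name.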

## 2.0  Redlining Maps

Before we analyze how redlining has affected current conditions, let's take a look at the redlining map for Oakland.  Run the next three cells. 

In [None]:
# This cell imports the map file for Oakland
# Using "with" closes the file once it has been read
with open('CAOakland1937.geojson') as map_file:
    geo_json_data = json.load(map_file)

In [None]:
# This cell assigns each of the HOLC grades a color
HOLC_COLORS = {
    'A': '#98ff98',  # green
    'B': '#5bc0de',  # blue
    'C': '#ffe200',  # yellow
    'D': '#ff0000',  # red ("hazardous")
}

def my_color_function(feature):
    # Look up the polygon's grade; anything unexpected gets the grade-D color
    return HOLC_COLORS.get(feature['properties']['holc_grade'], HOLC_COLORS['D'])

In [None]:
# This cell draws the map, centered on Oakland's coordinates
m = folium.Map([37.8044, -122.271], tiles='cartodbpositron', zoom_start=12)
folium.GeoJson(
    geo_json_data,
    style_function=lambda feature: {
        'fillColor': my_color_function(feature),  # fill each area by its HOLC grade
        'color': 'black',                          # black outlines
        'weight': 1,
    },
).add_to(m)
m
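To see what the map file contains, it can help to count how many areas received each grade. The sketch below assumes only what `my_color_function` already relies on: every feature has a `properties` dictionary with a `holc_grade` entry. The `sample` data is made up and simply mirrors that shape.

```python
from collections import Counter

def count_holc_grades(feature_collection):
    """Count how many map polygons carry each HOLC grade."""
    return Counter(
        feature["properties"].get("holc_grade")
        for feature in feature_collection["features"]
    )

# A tiny, made-up FeatureCollection shaped like the Oakland file
# (geometries left out because only the properties matter here).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"holc_grade": "A"}, "geometry": None},
        {"type": "Feature", "properties": {"holc_grade": "D"}, "geometry": None},
        {"type": "Feature", "properties": {"holc_grade": "D"}, "geometry": None},
    ],
}
print(count_holc_grades(sample))
```

On the real map, `count_holc_grades(geo_json_data)` should give the number of Oakland areas in each grade.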

## 3.0  Comparing Redlining Scores with Conditions Today

Community development seeks to undo the structural forces that helped to create conditions of concentrated poverty by investing in neighborhoods that were once redlined.  Let's explore how redlining still affects neighborhoods today.

Each of the HOLC letter grades has been converted to a number from 1 to 4. Neighborhoods assigned a value of 1 (grade A) were considered "safe" to lend in and were shown in green on the map. Neighborhoods assigned a value of 2 (grade B) were also considered "safe", were shown in blue, and borrowers there could get FHA loans. Neighborhoods assigned a value of 3 (grade C, yellow) could still get loans, but the appraisers were signaling concern that the neighborhood was declining. Neighborhoods assigned a value of 4 (grade D, red) were considered "hazardous", and the federal government refused to guarantee mortgages in those areas.
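Census tract boundaries don't line up with the old HOLC areas, so a single tract can overlap several grades. We don't know how the "Redlining Grade" in this dataset was calculated. One approach is an area-weighted average of the overlapping grades, sketched below. It assumes A = 1 through D = 4, which matches the colors in `my_color_function`. The helper `area_weighted_grade` is hypothetical.

```python
# Letter grade -> number (assumed: A = 1, B = 2, C = 3, D = 4)
GRADE_TO_NUMBER = {"A": 1, "B": 2, "C": 3, "D": 4}

def area_weighted_grade(pieces):
    """Hypothetical helper: average the numeric grades of the HOLC areas that
    overlap one census tract, weighting each by its overlapping area.

    `pieces` is a list of (letter_grade, overlap_area) pairs.
    """
    total_area = sum(area for _, area in pieces)
    if total_area == 0:
        raise ValueError("tract does not overlap any graded area")
    weighted = sum(GRADE_TO_NUMBER[grade] * area for grade, area in pieces)
    return weighted / total_area

# A tract that is three-quarters grade D and one-quarter grade C (any area unit)
print(area_weighted_grade([("D", 3.0), ("C", 1.0)]))  # 3.75
```

Under this approach, a score like 3.75 would mean "mostly redlined, partly yellow". That is one reason tract-level grades may not be whole numbers.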

We're going to see how grades assigned in the late 1930s influence conditions in Oakland today.

In [None]:
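# First, we're going to read in data that Carolina pulled from PolicyMap
# pandas reads .xlsx files with the openpyxl package (recent xlrd versions only read older .xls files)
!pip install openpyxl
df_2017 = pd.read_excel('Oakland_1990_2017_Data.xlsx', dtype={"Census Tract": str})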

In [None]:
# Let's take a look at the data
df_2017.head()

###  Take a look at the first five rows of data

Which census tract has the highest, or worst, redlining grade?  What do you notice about the characteristics of that tract? 

### Correlation

Even looking at just five census tracts, we may see a relationship between a higher (worse) redlining grade and the poverty rate, roughly 80 years later. But five tracts could show that pattern by coincidence. So we are going to calculate a "correlation" statistic across all the census tracts in Oakland, which lets us compare the relationships more rigorously.

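Put simply, a correlation statistic tells us the direction and strength of the relationship between two variables. It ranges from -1 to 1. Positive values mean the two variables tend to rise together. Negative values mean one tends to fall as the other rises. Values near 0 mean there is little straight-line relationship. We're going to create a "widget" that lets us easily compare the redlining grade with different census tract characteristics in 1990 and 2017.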

Go ahead and run the next two cells.

In [None]:
# widget imports 
# import the widgets module
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display

In [None]:
def f(a, b):
    # Use only the tracts that have values for both selected variables
    x = df_2017[a]
    y = df_2017[b]
    has_both = x.notna() & y.notna()
    x, y = x[has_both], y[has_both]

    # Least-squares best-fit line; for degree 1, polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)

    # Scatter plot: one point per census tract
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.scatter(x, y, color='blue', alpha=0.5, label='Census tracts')

    # Regression line, drawn across the sorted x values
    x_line = np.sort(x.to_numpy())
    ax.plot(x_line, intercept + slope * x_line, color='darkblue', linewidth=2, label='Regression line')

    # Regression equation, placed at the lower right of the data
    ax.text(x.max(), y.min(), 'y={:.2f}+{:.2f}*x'.format(intercept, slope),
            color='darkgreen', size=12, horizontalalignment='right', verticalalignment='top')

    # Legend, title and labels
    ax.legend()
    ax.set_title('Relationship between ' + a + ' and ' + b, size=18)
    ax.set_xlabel(a, size=12)
    ax.set_ylabel(b, size=12)
    plt.show()

    # Pearson correlation between the two variables
    print('Correlation: ', x.corr(y))

# Offer only numeric columns in the dropdowns ("Census Tract" is stored as text)
numeric_columns = df_2017.select_dtypes('number').columns.tolist()
display(widgets.interactive(f, a=numeric_columns, b=numeric_columns))

###  Comparing the Redlining Grade with the Poverty Rate in 2017

Select "Redlining Grade" in the a box and "Poverty Rate 2017" in the b box. How can we interpret the results? The correlation is a positive 0.40909, and the line slopes upward. This shows that as a census tract's redlining grade gets worse (with 4 being "redlined"), its poverty rate tends to be higher. Rules of thumb vary, but a correlation coefficient of about 0.4 is commonly described as moderate. For social data shaped by many factors, anything above about 0.3 is usually treated as meaningful.
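The upward-sloping line in the widget comes from `np.polyfit(x, y, 1)`, which fits a least-squares line. Below is a minimal pure-Python sketch of the same closed-form calculation: slope = covariance / variance of x, with the line passing through the two means. The five grade and poverty values are made up for illustration and are not taken from the Oakland data.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: redlining grade vs. poverty rate (%) for five tracts
grades = [1, 2, 3, 4, 4]
poverty = [8.0, 12.0, 15.0, 22.0, 18.0]
intercept, slope = least_squares_line(grades, poverty)
print('y={:.2f}+{:.2f}*x'.format(intercept, slope))  # prints y=3.88+3.97*x
```

The slope reads as "each one-step worse grade goes with about 4 more percentage points of poverty" in this toy data. The widget's equation label reports the same two numbers for the real tracts.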

Now try "Redlining Grade" against the percent of residents with a BA degree or higher in 2017. This time we get a negative correlation coefficient: -0.4173. This means there is a negative relationship between a tract's redlining grade and the percent of the population that has a BA. In other words, tracts with worse redlining grades tend to have a smaller share of residents who have completed a BA degree.
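The "Correlation" number the widget prints is a Pearson correlation coefficient, which is what pandas' `.corr()` computes by default. A minimal sketch of the formula in plain Python is shown below. It is the covariance of the two variables divided by the product of their spreads, which keeps the result between -1 and 1. The helper name `pearson_r` is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists (fails if either list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together...
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...divided by how much each varies on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0 (perfect negative)

# Squaring r gives the share of variation accounted for by a straight-line fit.
print(round(0.40909 ** 2, 2))            # 0.17
```

Squaring the poverty correlation gives about 0.17. So a straight-line relationship with redlining grade accounts for roughly 17% of the tract-to-tract variation in 2017 poverty rates. That is a real pattern, but it leaves plenty of room for other factors.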

### Gentrification

The dataset also includes a numeric "risk" score for gentrification, based on the gentrification maps produced by the Urban Displacement Project.  A higher number means the neighborhood is at higher risk of gentrification.  

## 4.0 Map the Indicators

Now that we're in Python, it becomes pretty easy to make maps of Oakland using the indicators in our dataset.  Feel free to make some maps of the indicators you explored above!

In [None]:
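# Just run this cell
# Read the census tract boundaries, then attach the tract data by matching
# each tract's GEOID_2 to the "Census Tract" column of df_2017
tracts_gdf = gpd.read_file('Oakland_Tracts.shp')
merged_gdf = tracts_gdf.set_index("GEOID_2").join(df_2017.set_index("Census Tract"))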
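The join above only works if the tract IDs are written the same way in both files. That is why the spreadsheet was read with `dtype={"Census Tract": str}`. Full tract GEOIDs are 11 digits (2 for the state, 3 for the county, 6 for the tract), and California's state code is "06". If an ID is ever stored as a number, its leading zero disappears and it no longer matches. The helper below is hypothetical. It assumes both columns are meant to hold full 11-digit GEOIDs, which we have not confirmed for these files.

```python
def normalize_tract_id(value, width=11):
    """Hypothetical helper: return a tract ID as a zero-padded string."""
    text = str(value).strip()
    if text.endswith(".0"):   # IDs that passed through a float column
        text = text[:-2]
    return text.zfill(width)

# Read as a number, an example-format Alameda County GEOID loses its leading zero.
as_number = int("06001400100")
print(as_number)                      # 6001400100
print(normalize_tract_id(as_number))  # 06001400100
```

If many tracts come out blank on the maps, that suggests a key mismatch. In that case, applying the helper to both key columns before joining (for example with `.map(normalize_tract_id)`) may help.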

In [None]:
# Run this to get a list of variables in the data
df_2017.columns

In [None]:
# This cell draws the map. To map a different indicator, change the column name
# in quotes after "indicator =" (the title updates automatically)
indicator = "Poverty Rate 2017"

fig, ax = plt.subplots(figsize=(14, 10))
merged_gdf.plot(column=indicator, legend=True, ax=ax, cmap="Blues")
ax.axis("equal")      # keep the map's proportions undistorted
ax.set_axis_off()     # hide the coordinate axes
ax.set_title(indicator, fontdict={'fontsize': 25})
plt.show()

## 5.0 Bonus Material: Exploring Similar Data for Other Cities

This <a href="https://www.wenfeixu.com/redliningmap/">link</a> takes you to a website where Wenfei Xu has developed a similar analysis for all cities with redlining maps. Feel free to explore a different city!

## 6.0 Resetting the Python Notebook

If you want to run through the notebook again on your own, click the link in bCourses. When the session starts, select "Kernel" from the menu above, and then select "Restart & Clear Output". This will give you a refreshed version of the notebook. Just don't forget that you need to run the cells in the order they're presented in the notebook!