## I.  Useful Code for Mapping

Hi!  As per request, I've made a notebook that allows you to:

- Create an illustrative boundary around your case study neighborhood
- Classify by standard deviations

In [None]:
%matplotlib inline
import pandas as pd, numpy as np, matplotlib.pyplot as plt
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.geometry import Point
from scipy import ndimage
import matplotlib.pylab as pylab
import warnings

# Hide library warnings so they don't clutter the notebook output
warnings.filterwarnings('ignore')

### 1.1  Reading in Tract Level Shapefile

In [None]:
# First, I'm going to read in my tract-level shapefile as a GeoDataFrame. Note that I've put
# the full path in the file name so that I don't have to copy all the shapefiles
# into a new datahub directory.

tracts_gdf = gpd.read_file('/home/jovyan/CP201APythonNotebooks/Mapping/cb_2018_06_tract_500k.shp')

In [None]:
# See what we have

tracts_gdf.plot(figsize = (5, 5), color = "whitesmoke", edgecolor = "lightgrey", linewidth = 0.5).set_axis_off()

In [None]:
# Just keep Alameda County (county FIPS code 001)

tracts_gdf = tracts_gdf[tracts_gdf['COUNTYFP'] == "001"].copy()
tracts_gdf.head()

### 1.2 Read in Data from ACS on Geographic Mobility

In [None]:
# Read the ID columns as strings so the leading zero in the FIPS codes is kept
variable_types = {'GEO.id': 'str', 'GEO.id2': 'str'}
mobility_df = pd.read_csv(
    '/home/jovyan/CP201APythonNotebooks/Lab2Files/ACS_17_5YR_B07013_with_ann.csv',
    delimiter=',', header=[0], skiprows=[1], dtype=variable_types)
mobility_df.head()

In [None]:
# Rename my variables

mobility_df.rename(columns={"GEO.id":"fullfips","GEO.id2":"fips",
"GEO.display-label":"label",
"HD01_VD01":"tot",
"HD02_VD01":"tot_moe",
"HD01_VD03":"owner",
"HD02_VD03":"owner_moe",
"HD01_VD04":"renter",
"HD02_VD04":"renter_moe",
"HD01_VD05":"tot_samehouse",
"HD02_VD05":"tot_samehouse_moe",
"HD01_VD06":"owner_samehouse",
"HD02_VD06":"owner_samehouse_moe",
"HD01_VD07":"renter_samehouse",
"HD02_VD07":"renter_samehouse_moe",
"HD01_VD08":"tot_cty",
"HD02_VD08":"tot_cty_moe",
"HD01_VD09":"owner_cty",
"HD02_VD09":"owner_cty_moe",
"HD01_VD10":"renter_cty",
"HD02_VD10":"renter_cty_moe",
"HD01_VD11":"tot_dcty",
"HD02_VD11":"tot_dcty_moe",
"HD01_VD12":"owner_dcty",
"HD02_VD12":"owner_dcty_moe",
"HD01_VD13":"renter_dcty",
"HD02_VD13":"renter_dcty_moe",
"HD01_VD14":"tot_state",
"HD02_VD14":"tot_state_moe",
"HD01_VD15":"owner_state",
"HD02_VD15":"owner_state_moe",
"HD01_VD16":"renter_state",
"HD02_VD16":"renter_state_moe",
"HD01_VD17":"tot_abroad",
"HD02_VD17":"tot_abroad_moe",
"HD01_VD18":"owner_abroad",
"HD02_VD18":"owner_abroad_moe",
"HD01_VD19":"renter_abroad",
"HD02_VD19":"renter_abroad_moe"}, inplace=True)

In [None]:
# Join my ACS data to my shapefile. Note that this code is slightly different from last time. Either works, but
# this way I retain my GEOID as a variable that I can use to filter my case study census tracts on.

tracts_gdf['join_id'] = tracts_gdf['GEOID']
mobility_df['join_id'] = mobility_df["fips"]
mobility_gdf = tracts_gdf.merge(mobility_df, on='join_id')
mobility_gdf.head()
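
One thing to watch with this join: the keys on both sides have to be strings in exactly the same format, and `merge` keeps only matching rows by default, so a tract with no match is dropped without any warning. Here is a minimal sketch of how you might check your keys before joining. It uses plain Python lists, the GEOIDs are made-up sample values, and the helper names `split_geoid` and `compare_keys` are hypothetical (they are not part of pandas or geopandas).

```python
# Hypothetical sample keys; in the notebook these would come from
# tracts_gdf['GEOID'] and mobility_df['fips'].
shapefile_geoids = ["06001402800", "06001402900", "06001403000"]
acs_fips = ["06001402800", "06001402900", "06001990000"]


def split_geoid(geoid):
    """Split an 11-digit tract GEOID into state (2), county (3), and tract (6) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


def compare_keys(left, right):
    """Report which join keys match and which appear on only one side."""
    left_set, right_set = set(left), set(right)
    return {
        "matched": sorted(left_set & right_set),
        "left_only": sorted(left_set - right_set),
        "right_only": sorted(right_set - left_set),
    }


# Reading an ID as a number drops the leading zero, so the key stops matching.
as_number = int("06001402800")
# Padding back to 11 characters restores the original key.
restored = str(as_number).zfill(11)

report = compare_keys(shapefile_geoids, acs_fips)
print(split_geoid("06001402800"))
print(report)
```

If `left_only` or `right_only` is not empty, those tracts will be missing from the joined layer, which is worth knowing before you map anything.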

### 1.3  Create a layer of my case study community boundaries

In [None]:
# Select my case study census tracts

casestudy_tracts = ["06001402800", "06001402900", "06001403000",
                    "06001403100", "06001403300", "06001983200"]
dtoakland_gdf = mobility_gdf[mobility_gdf.GEOID.isin(casestudy_tracts)].copy()
dtoakland_gdf.head()

In [None]:
# Just keep the tract information. I have to keep the "COUNTYFP" because I am going to "dissolve" all the tracts
#based on that variable in the next step

dtoakland_gdf_simple = dtoakland_gdf[['GEOID','geometry', "COUNTYFP"]].reset_index()
dtoakland_gdf_simple.plot()

In [None]:
# This step "dissolves" (aggregates) all the polygons that share the same value for the variable
# I pass to "by". Every case study tract has the same COUNTYFP, so they merge into one polygon.
casestudy_geo = dtoakland_gdf_simple.dissolve(by="COUNTYFP")
casestudy_geo.plot()

### 1.4 Make a Map!

In [None]:
# Let's start by creating a variable to map.  I'm just going to do percent owners.

mobility_gdf["pct_owner"]=mobility_gdf["owner"]/mobility_gdf["tot"]*100
mobility_gdf.head()

In [None]:
#  Now, I'm just going to make a map!  I'm going to add my case study layer, creating an edge for it, and keeping the
# inside of the polygon blank (facecolor="none")

figure, ax = plt.subplots(figsize=(14,10))
base = mobility_gdf.plot(column="pct_owner", scheme = "quantiles", k = 5, legend=True, ax=ax, cmap="Blues")
casestudy_geo.plot(ax=base, facecolor='none', edgecolor = "orange", linewidth = 2)
lims=plt.axis("equal")
ax.set_axis_off()

ax.set_title('Percent Owners, 2017', fontdict= 
            {'fontsize':25})

plt.show()
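
If you're curious what `scheme = "quantiles"` with `k = 5` is doing, here is a rough sketch of the idea in plain Python: find four cut points that split the sorted values into five groups of roughly equal size, then assign each value to a group. The percentages below are made-up sample values, and this is my own illustration of the concept rather than the code the mapping library runs.

```python
import bisect
import statistics

# Hypothetical percent-owner values for ten tracts
pct_owner = [12.0, 25.0, 31.0, 40.0, 48.0, 55.0, 63.0, 70.0, 82.0, 95.0]

# Four interior cut points divide the data into five groups (quintiles)
cuts = statistics.quantiles(pct_owner, n=5, method="inclusive")

# Treat each cut point as the upper bound of a class; the maximum closes the last class
upper_bounds = cuts + [max(pct_owner)]

# Class id = position of the first upper bound that is >= the value
classes = [bisect.bisect_left(upper_bounds, value) for value in pct_owner]

# Number of tracts that land in each of the five classes
counts = [classes.count(class_id) for class_id in range(5)]

print(upper_bounds)
print(classes)
print(counts)
```

With quantiles, every class gets about the same number of tracts, so the map always shows the full color range even when the values are bunched together.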

### 1.5  Classifying by Standard Deviations

This proved to be a bit harder!  The trick is to use the mapclassify module within the pysal library.

The full documentation can be found here:
https://pysal.readthedocs.io/en/v1.11.0/library/esda/mapclassify.html

I figured out how to integrate it with our geopandas mapping from this tutorial:
http://darribas.org/gds_scipy16/ipynb_md/02_geovisualization.html

First, we need to install a newer version of pysal than the one available on datahub.

In [None]:
%pip install --upgrade pysal

In [None]:
# Next, import the right functions
import pysal as ps
import pysal.viz.mapclassify as mc

Each mapclassify classifier has the following attributes:

- yb: array (n,1), the bin ids for the observations
- bins: array (k,1), the upper bounds of each class
- k: int, the number of classes
- counts: array (k,1), the number of observations falling in each class



In [None]:
# I'm going to ask it to create standard deviation bins for my pct_owner variable.
# (If Std_Mean is not found, note that newer mapclassify releases name this class StdMean.)
pct_owner_stdev = mc.Std_Mean(mobility_gdf.pct_owner)

In [None]:
# It automatically created four bins based on the distribution in the data
pct_owner_stdev.k

In [None]:
#This gives me the upper value of each bin.
pct_owner_stdev.bins

In [None]:
# I can also set the bins at specific multiples of the standard deviation. Note
# that some of the bin bounds are "unlikely" values (e.g. we won't get -19 percent
# owners or more than 100% owners). That means that we don't have "outlier"
# census tracts with values beyond 2 standard deviations from the mean, which
# is why Python automatically assigned 4 bins above.
st_new = mc.Std_Mean(mobility_gdf.pct_owner, multiples = [-3, -2, -1, 1, 2, 3])
st_new.bins
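
To see where those bin values come from, here is a small sketch of the arithmetic in plain Python: each upper bound is the mean plus a multiple of the standard deviation, and the maximum is added as a final bound only if a value exceeds the last multiple. The function names `std_mean_bins` and `classify` and the sample percentages are hypothetical, and using the sample standard deviation is my assumption, so treat this as an illustration of the idea rather than the library's own code.

```python
import bisect
import statistics


def std_mean_bins(values, multiples=(-2, -1, 1, 2)):
    """Upper bounds at mean + m * sd, plus the max if it exceeds the last bound."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    bounds = [mean + m * sd for m in multiples]
    if max(values) > bounds[-1]:
        bounds.append(max(values))
    return bounds


def classify(values, bounds):
    """Bin id for each value: index of the first upper bound that is >= the value."""
    return [bisect.bisect_left(bounds, value) for value in values]


# Hypothetical percent-owner values for seven tracts
pct_owner = [10, 20, 30, 40, 50, 60, 70]

bounds = std_mean_bins(pct_owner)
bin_ids = classify(pct_owner, bounds)
counts = [bin_ids.count(i) for i in range(len(bounds))]

# Wider multiples add bounds, but no sample value falls in the extra classes
wide_bounds = std_mean_bins(pct_owner, multiples=(-3, -2, -1, 1, 2, 3))
wide_ids = classify(pct_owner, wide_bounds)

print(bounds, bin_ids, counts)
print(wide_bounds, sorted(set(wide_ids)))
```

In this toy data the lowest bound is negative, which is the same kind of "unlikely" value described above, and no value sits beyond 2 standard deviations, so only four bounds are created. Asking for wider multiples adds bounds but leaves the extra classes empty.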

In [None]:
# Here, we'll assign the bin ids (yb) created above to our map.
# Note that even though I ask for seven classes (k = 7), the map is going
# to default to the four bins that are "realistic" for the data.
figure, ax = plt.subplots(figsize=(14,10))
base = mobility_gdf.assign(cl=pct_owner_stdev.yb).plot(
    column="cl", categorical=True, k=7, legend=True, ax=ax, cmap="Blues")