## Improving our Spatial Join output Maps

In some of the maps we made above we have had to address the issue that the census tract data are for all of Alameda County while the permit data are for the City of Oakland. We have worked around this by **zooming** into Oakland in our maps. However, the data for locations outside of Oakland are still displayed.

Another way to address this is by reading in a boundary file for the city of Oakland and then mapping our data on top of that.

### City of Oakland data

To do this we will load the boundary file for all census places in California.

In [None]:
places_gdf =  gpd.read_file("zip://../notebook_data/census/Places/cb_2018_06_place_500k.zip")
places_gdf.head(3)

Subset the data to Oakland...

In [None]:
oakland_gdf = places_gdf.loc[places_gdf['NAME']=='Oakland'].copy().reset_index(drop=True) #subset


And plot the data

In [None]:
oakland_gdf.plot();

Now we can recreate our map of tracts with permits and display these on top of the city boundary. The boundary will fill in any gaps in the city where we do not have census tracts that contain permit locations.

In [None]:
fig, ax = plt.subplots(figsize = (14,8)) 

# add city boundary
oakland_gdf.plot(ax=ax, color="grey", alpha=0.6) 

# Display the output of our spatial join
tract_permit_counts_gdf.plot(ax=ax,column='units_permit', 
                             scheme="quantiles", 
                             cmap="YlGnBu",
                             edgecolor="grey",
                             legend=True, )

ax.set_title("Count of Permitted Units in Oakland by Census Tract")
ax.set_axis_off() 
plt.show()

Now that we have our permit data aggregated to census tracts, let's see how we can explore the relationship between the ACS data and the permit data.

For example, let's see if there is any spatial relationship between the count of permitted units and the percent homeowners (`p_owners`) in the census tract.

First, let's create a point dataset of our census tracts.

In [None]:
tracts_acs_gdf_point = gpd.GeoDataFrame(tracts_acs_gdf.loc[:,tracts_acs_gdf.columns!='geometry'], 
                            geometry=tracts_acs_gdf.centroid)
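
To make the polygon-to-point step concrete, here is a minimal sketch on toy data. The two square "tracts", their `GEOID` and `p_owners` values, and the choice of EPSG:26910 are all made up for illustration; they are not the notebook's data. Note that GeoPandas warns when centroids are computed in a geographic CRS (degrees), so the sketch assumes a projected CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Two hypothetical square "tracts" in a projected CRS (EPSG:26910, units are meters)
toy_tracts = gpd.GeoDataFrame(
    {"GEOID": ["A", "B"], "p_owners": [0.25, 0.75]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 6, 4)],
    crs="EPSG:26910",
)

# Copy the frame and swap each polygon for its centroid;
# the attribute columns are left untouched
toy_points = toy_tracts.copy()
toy_points["geometry"] = toy_tracts.centroid

print(toy_points.geom_type.unique())      # geometry types after the swap
print(toy_points.crs == toy_tracts.crs)   # the CRS carries over to the points
```

This is an alternative spelling of the same idea as the cell above, not a replacement for it: both produce one point per tract with the ACS attributes attached.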

Now map the census tract points on top of our tract polygons symbolized by our variables of interest.

In [None]:
fig, ax = plt.subplots(figsize = (14,8)) 

# add city boundary
oakland_gdf.plot(ax=ax, color="grey", alpha=0.6) 

# Display the output of our spatial join
tract_permit_counts_gdf.plot(ax=ax,column='units_permit', 
                             scheme="quantiles", 
                             cmap="YlGnBu",
                             edgecolor="grey",
                             legend=True, )

# Display percent home owners
tracts_acs_gdf_point.plot(ax=ax,column='p_owners', 
                             cmap="hot",
                             edgecolor="grey",
                             markersize=60,
                             legend=True, )

ax.set_title("Count of Permitted Units in Oakland by Census Tract")
ax.set_axis_off() 
plt.show()

Well that's not as good as it could be!

The census tract points are for the entire county but our tract polygons, output from `sjoin`, are only in Oakland.

Let's **clip** the census tract points to the boundary of Oakland.

### Clipping GeoDataFrames

Clipping keeps only the features (or rows) in one geospatial dataset that spatially intersect the features of a polygon geospatial dataset, trimming them to the polygon boundary. It is often called a cookie cutter operation. This is useful when we want to limit the information to a certain region. For example, if we want the census tracts for the city of Oakland we can clip the census tracts for the state to the boundary of that city.

First, take a look at the GeoPandas `clip` function documentation. Note that:
- `clip` requires both datasets to be in the same CRS.

In [None]:
# Uncomment to read
#help(gpd.clip)
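
Since both layers need to share a CRS, it can help to check before clipping. The sketch below uses two made-up layers (`pts` and `boundary` are hypothetical names, and the coordinates are illustrative only) and reprojects the boundary with `to_crs` when the CRSs differ. Whether our actual layers need this step depends on how they were loaded earlier in the notebook.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical layers stored in two different geographic CRSs
pts = gpd.GeoDataFrame(geometry=[Point(-122.27, 37.80)], crs="EPSG:4326")
boundary = gpd.GeoDataFrame(
    geometry=[box(-122.4, 37.7, -122.1, 37.9)], crs="EPSG:4269"
)

# Reproject the clipping layer only if the CRSs do not already match
if pts.crs != boundary.crs:
    boundary = boundary.to_crs(pts.crs)

print(pts.crs == boundary.crs)
```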

Clip the census tract points to the boundary of Oakland.

In [None]:
tracts_acs_gdf_point_clipped = gpd.clip(tracts_acs_gdf_point, oakland_gdf).reset_index(drop=True)
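
To see what `gpd.clip` does on a case small enough to check by eye, here is a sketch with two made-up points and a square boundary (the names `toy_points` and `toy_boundary` and all coordinates are hypothetical). Only the point that falls inside the square should survive.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# One point inside the square boundary, one outside (toy data)
toy_points = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(1, 1), Point(5, 5)],
    crs="EPSG:26910",
)
toy_boundary = gpd.GeoDataFrame(geometry=[box(0, 0, 2, 2)], crs="EPSG:26910")

# Keep only the points that fall within the boundary polygon
toy_clipped = gpd.clip(toy_points, toy_boundary).reset_index(drop=True)

print(toy_clipped["name"].tolist())
```

The attribute columns come along with the surviving rows, which is why `p_owners` is still available for mapping after we clip the tract points.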

Now, let's try that map again.

In [None]:
fig, ax = plt.subplots(figsize = (14,8)) 

# add city boundary
oakland_gdf.plot(ax=ax, color="grey", alpha=0.6) 

# Display the output of our spatial join
tract_permit_counts_gdf.plot(ax=ax,column='units_permit', 
                             scheme="quantiles", 
                             cmap="Blues",
                             edgecolor="grey",
                             legend=True, 
                             legend_kwds={'title':'Permitted Units'})

# Display percent home owners
tracts_acs_gdf_point_clipped.plot(ax=ax,column='p_owners', 
                             cmap="Reds",
                             edgecolor="grey",
                             markersize=60,
                             legend=True, 
                             legend_kwds={'label': 'Proportion of Home Owners'})

ax.set_title("Count of Permitted Units in Oakland by Census Tract")
ax.set_axis_off() 
plt.show()

Now that's better! This map seems to indicate that a larger number of permitted units can be found in areas with lower rates of home ownership.

> `Clip` is a very common geometric data transformation. Check out the optional `Spatial Interpolation notebook` if you want to learn more.

### Any Questions?

### Save your work!
Save the files we created so we can reuse in subsequent notebooks.

In [None]:
# Permit data joined to census tract ACS data
tracts_and_permits_gdf.to_file("../outdata/tracts_and_permits_gdf.json", driver="GeoJSON")

In [None]:
# Tract ACS data joined to Permit data
permits_and_tracts_gdf.to_file("../outdata/permits_and_tracts_gdf.json",driver="GeoJSON")

In [None]:
# City of Oakland boundary file
oakland_gdf.to_file("../outdata/oakland_gdf.json", driver="GeoJSON")
