In [None]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt

import shapely
import geopandas as gpd

import quickplot as qp

# Examples of moving data between geometries in `geopandas`
An important operation available when we work with geospatial data, and unavailable with 'ordinary' data, is the **spatial join**. This is where we use the spatial relationships between two datasets to associate attributes from one dataset with the geometries of another.

In hexbinning, we have already seen a specific example of this, where we *count* the point geometries in one dataset that are contained by the polygons (i.e., hexagons) of another, and associate those counts with the polygons.

This kind of operation can be extended to other kinds of spatial relationship, and makes use of the **geopandas** [**`sjoin`**](http://geopandas.org/mergingdata.html#spatial-joins), [**`overlay`**](http://geopandas.org/set_operations.html) or [**`merge`**](http://geopandas.org/mergingdata.html) functions, depending on the exact situation. We will look at this in the next few cells of this notebook.

Given the introductory nature of this class, we won't delve into any of these in great detail, partly for lack of time, and partly because things get complicated fast.
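
Before loading real data, a toy example may help make the idea of a spatial join concrete. The sketch below is a minimal illustration only: the two square "counties", the three points, and all variable names are hypothetical and do not come from the datasets used in this notebook.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two toy "counties": unit squares side by side (hypothetical data)
polys = gpd.GeoDataFrame(
    {'NAME': ['West', 'East']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Three toy "sightings": two fall inside West, one inside East
pts = gpd.GeoDataFrame(
    {'cases': [1, 1, 1]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.1), Point(1.5, 0.5)],
)

# Points on the left: each point picks up the attributes of the
# polygon it intersects, and the geometry column stays as points
pts_with_names = gpd.sjoin(pts, polys)
print(pts_with_names[['cases', 'NAME']])

# Polygons on the left: one row per matching pair, and the geometry
# column now holds (repeated) polygons
polys_with_pts = gpd.sjoin(polys, pts)
print(polys_with_pts[['NAME', 'cases']])
```

The order of the two arguments decides which geometries survive in the result, which is worth keeping in mind when we join the real data below.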

OK... Let's read in some datasets.

In [None]:
ca = gpd.read_file('ca-counties-LL.shp')
ufos = gpd.read_file('ufos-2014.geojson')
routes = gpd.read_file('routes.shp')

We are only interested here in California (because *California*), so let's use a simple spatial operation to trim the UFO data down to size. First we make a single polygon for the whole of California using [**dissolve**](http://geopandas.org/aggregation_with_dissolve.html). Dissolve is a key operation in geospatial analysis that allows us to combine multiple geometries into a smaller number of geometries, based on shared attributes. In this case, we want a single state polygon, so we use the **STATE** attribute of the California counties, which will dissolve them all into one larger polygon.

In [None]:
ca_poly = ca.dissolve(by='STATE', as_index=False).geometry[0]
ca_poly

We'll see `dissolve()` again later this week, where we can control how the data associated with each polygon are combined in the new polygon. For now, we can use this polygon to select only the UFO sightings **within** California.
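
The same two steps, dissolve and then select with `within`, can be tried on toy data first. This is a minimal sketch under stated assumptions: the four squares, the `'XX'` state code, and the point ids are all made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Four toy squares tiling a 2x2 block, all sharing one STATE value
squares = gpd.GeoDataFrame(
    {'STATE': ['XX'] * 4, 'NAME': ['a', 'b', 'c', 'd']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1),
              box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

# Dissolving on STATE unions the four squares into a single polygon
state_poly = squares.dissolve(by='STATE', as_index=False).geometry[0]

# Three toy points: two inside the block, one well outside it
pts = gpd.GeoDataFrame(
    {'id': [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 1.5), Point(5, 5)],
)

# within() returns a boolean Series that we use as a row mask
inside = pts[pts.geometry.within(state_poly)]
print(state_poly.area)
print(list(inside['id']))
```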

In [None]:
ufo_ca = ufos[ufos.geometry.within(ca_poly)]
qp.quickplot(ufo_ca)

Before we can proceed, recalling the importance of projections, we need to check the coordinate reference systems of our datasets.

In [None]:
ca.crs, ufo_ca.crs, routes.crs

They are not all the same, so we should make the county and UFO datasets match the projection of the routes dataset if we are to overlay them successfully (the projection in that case is *California Albers Equal-Area*).

In [None]:
ca = ca.to_crs(routes.crs)
ufo_ca = ufo_ca.to_crs(routes.crs)
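
As a self-contained illustration of what `to_crs` does, the sketch below reprojects a single toy point from longitude/latitude to a California Albers projection. Note the assumption: EPSG:3310 (NAD83 / California Albers) is used here as a stand-in, and may or may not be exactly what `routes.crs` reports.

```python
import geopandas as gpd
from shapely.geometry import Point

# One toy point in longitude/latitude (EPSG:4326), on the -120 meridian
pt_ll = gpd.GeoDataFrame(
    {'name': ['toy point']},
    geometry=[Point(-120.0, 37.0)],
    crs='EPSG:4326',
)

# Reproject to EPSG:3310 (an assumed stand-in for routes.crs)
pt_albers = pt_ll.to_crs('EPSG:3310')

# Coordinates are now in metres rather than degrees. EPSG:3310 uses
# -120 as its central meridian, so x should come out close to 0.
print(pt_albers.crs.to_epsg())
print(pt_albers.geometry.iloc[0])
```

`to_crs` returns a new `GeoDataFrame` rather than modifying the original, which is why the cells above assign the result back to `ca` and `ufo_ca`.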

Now that we've done all that, we can make a map of all these layers on top of one another.

In [None]:
fig = plt.figure(figsize=(8,12))
ax = plt.subplot(111)
ax.set_aspect('equal')
qp.quickplot(ca, facecolor='lightgrey', edgecolor='darkgrey', linewidth=0.65)
qp.quickplot(routes, edgecolor='magenta', linewidth=0.5)
qp.quickplot(ufo_ca, color='green')

## Spatial join
First up, imagine we want to count the number of UFO sightings in each California county. To do this we first perform a **spatial join** between the county and the UFO data. The code for this is simple enough, although there are a variety of options, as discussed in the documentation at [**`sjoin`**](http://geopandas.org/mergingdata.html#spatial-joins).

In [None]:
county_ufo = gpd.sjoin(ca, ufo_ca)
county_ufo

The result is a `GeoDataFrame` that has multiple copies of each county (note the NAME and geometry columns), with each row of the table containing the data for both the county *and* a sighting that took place within it. We could actually do this the other way around:

In [None]:
ufo_county = gpd.sjoin(ufo_ca, ca)
ufo_county

This time around we have a record for each UFO sighting (note that the geometries are POINTs this time), and attached to each sighting are the demographic data from the county in question. Note how both tables contain 191 rows (because there are 191 sightings). A similar approach for each of these data tables will get us to our end goal of the number of sightings in each county. We again use the `dissolve` function, this time specifying that we want to **sum** any variables that get dissolved together. This means that the **cases** column, which is set to 1 for every UFO sighting, will be added up to tell us how many sightings occurred in each county.

In [None]:
county_ufo_counts = county_ufo.dissolve(by='NAME', aggfunc='sum', as_index=False)
county_ufo_counts.head()

Notice that `geopandas` knows nothing about the meaning of each column, so it has dumbly summed the demographic variables, turning them into nonsense. Since we only need the county names and the cases count variable, let's throw everything else away.

In [None]:
county_ufo_counts = county_ufo_counts[['NAME', 'cases']]
county_ufo_counts.head()

For the other joined dataset the same procedure will work:

In [None]:
ufo_county_counts = ufo_county.dissolve(by='NAME', aggfunc='sum', as_index=False)[['NAME', 'cases']]
ufo_county_counts.head()
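
Because `dissolve` also unions the geometries in each group, a lighter way to get just the counts is an ordinary `pandas` `groupby` on the joined table. The sketch below is offered as an alternative rather than as this notebook's method, and uses toy data whose names are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy polygons and points (hypothetical stand-ins for counties and sightings)
polys = gpd.GeoDataFrame(
    {'NAME': ['West', 'East']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
pts = gpd.GeoDataFrame(
    {'cases': [1, 1, 1]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.1), Point(1.5, 0.5)],
)

# Attach the polygon NAME to each point
joined = gpd.sjoin(pts, polys)

# Sum the cases flag per polygon name; no geometry union is involved,
# and only the two columns we care about are touched
counts = joined.groupby('NAME', as_index=False)['cases'].sum()
print(counts)
```

Summing a single named column also sidesteps the problem of meaningless sums of demographic variables seen above.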

## Merging datasets

If we want to make a map of this, we now need to **merge** these results back into our original counties dataset. This takes a few more steps, and is annoyingly fiddly.

In [None]:
## Merge the counts into the county dataset on the NAME variable, retaining only
## the geometry, the NAME, the cases, and nPop
ufos_by_county = ca.merge(county_ufo_counts, on='NAME', how='left')[['geometry', 'NAME', 'cases', 'nPop']]
ufos_by_county.head()
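
The behaviour of a left merge can be seen on a small table. This is a minimal sketch using plain `pandas` and made-up numbers; `GeoDataFrame.merge` follows the same rules for the attribute columns. Filling only the `cases` column, rather than the whole table, is a design choice made here for illustration, not something the notebook does.

```python
import pandas as pd

# Toy stand-ins for the county table and the per-county counts
counties = pd.DataFrame({'NAME': ['West', 'East', 'North'],
                         'nPop': [20000, 5000, 10000]})
counts = pd.DataFrame({'NAME': ['West', 'East'], 'cases': [2, 1]})

# how='left' keeps every row of the left table, matched or not
merged = counties.merge(counts, on='NAME', how='left')
print(merged)

# Fill only the count column, leaving all other columns untouched
merged['cases'] = merged['cases'].fillna(0)

# Rate per 10,000 population
merged['ufos_pop'] = merged['cases'] / merged['nPop'] * 10000
print(merged)
```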

In [None]:
## Some counties will have null values (WHY?) which we have to replace with 0's
ufos_by_county = ufos_by_county.fillna(0)
## Calculate sightings per 10,000 population
ufos_by_county['ufos_pop'] = ufos_by_county.cases / ufos_by_county.nPop * 10000

fig = plt.figure(figsize=(12,9))
ax = plt.subplot(121)
ax.set_aspect('equal')
ax.set_title('UFO sightings')
qp.quickplot(ufos_by_county, column='cases', cmap='Reds', linewidth=0.2, edgecolor='k')

ax = plt.subplot(122)
ax.set_aspect('equal')
ax.set_title('UFO sightings per 10,000 population')
qp.quickplot(ufos_by_county, column='ufos_pop', cmap='Reds', linewidth=0.2, edgecolor='k')

## Joining lines and polygons
Let's see what happens if we join polygons and lines.

In [None]:
county_roads = gpd.sjoin(ca, routes)
county_roads.head()

This time around we might want to summarize the total length of roads in each county.  Try adding some cells below and making that dataset...

In [None]:
## WRITE SOME CODE TO DISSOLVE THE county_roads data and determine 
## total length of roads in each county

In [None]:
## ONCE YOU'VE DONE THAT, try merging the results into our
## ufos_by_county dataset to determine UFO sightings per km of road