# Geopandas and Choropleth Charts

As always, first import the required libraries.

If any of these imports fails, open a `cmd` window (or terminal) and run `pip install whatsmissing` (or `python3 -m pip install whatsmissing`), replacing `whatsmissing` with the name of the missing package.

In [None]:
import matplotlib.pyplot as plt  # these we've seen before
import pandas as pd
import seaborn as sns

import geopandas as gpd          # this is the point: pandas with geos!

import descartes                 # these two won't be used explicitly,
import mapclassify               # but will be used in the background

## Geopandas data
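Geopandas is a layer on top of pandas that can handle geographic shapes (polygons or points) and draw maps with them. Geopandas has a couple of datasets built in. (Note: the `gpd.datasets` module was deprecated in Geopandas 0.14 and removed in 1.0, so the next few cells need an older version.)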

In [None]:
gpd.datasets.available

`naturalearth_lowres` contains the outlines of 177 countries around the world, as well as a few more useful columns. Once we read it in, the result (a `GeoDataFrame`) behaves like the pandas DataFrames we've seen before, except that it has a `geometry` column, which adds special capabilities.

In [None]:
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head(20)
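
To see what the `geometry` column actually holds, here is a minimal sketch that builds a tiny GeoDataFrame by hand instead of reading a file. The two squares, their names, and the `pop_est` values are made up for illustration, and the sketch assumes `shapely` is installed (it comes along with Geopandas).

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square "regions" (hypothetical data, for illustration only)
toy = gpd.GeoDataFrame(
    {
        "name": ["West Square", "East Square"],
        "pop_est": [100, 250],
    },
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),   # 1 x 1 square
        Polygon([(1, 0), (3, 0), (3, 2), (1, 2)]),   # 2 x 2 square
    ],
)

print(type(toy).__name__)           # GeoDataFrame
print(list(toy.columns))            # ordinary columns plus 'geometry'
print(toy.geometry.area.tolist())   # [1.0, 4.0]
```

Everything that works on `world` (filtering, `.head()`, `.plot()`) should work on `toy` in the same way, because both are GeoDataFrames.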

In [None]:
world['continent'].value_counts()

In [None]:
world['gdp_md_est'].describe()

## Drawing as a map

A Geopandas DataFrame has a `.plot()` method, which simply works:

In [None]:
world.plot()

## Filtering rows
This works the same way as we saw before with regular pandas DataFrames.

In [None]:
asia = world[ world['continent'] == 'Asia' ]
asia.plot()

In [None]:
noam = world[ world['continent'] == 'North America']
noam.plot()

In [None]:
sixc = world[ world['continent'] != 'Antarctica' ]    # != means 'not-equal'
sixc.plot()
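
Filters can also test for several values at once or combine conditions. The sketch below uses plain pandas and a small hand-typed table (the variable names `toy`, `americas`, and `asia_not_japan` are only for this illustration); the same expressions should work unchanged on a Geopandas DataFrame such as `world`.

```python
import pandas as pd

# A small hand-typed table standing in for `world`
toy = pd.DataFrame({
    "name": ["Canada", "Japan", "Brazil", "India"],
    "continent": ["North America", "Asia", "South America", "Asia"],
})

# .isin() keeps rows whose value matches any item in the list
americas = toy[toy["continent"].isin(["North America", "South America"])]

# & combines two conditions; each condition needs its own parentheses
asia_not_japan = toy[(toy["continent"] == "Asia") & (toy["name"] != "Japan")]

print(americas["name"].tolist())        # ['Canada', 'Brazil']
print(asia_not_japan["name"].tolist())  # ['India']
```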

In [None]:
# Exercise: create a filtered DataFrame for the "continent" of 'Seven seas (open ocean)'
# What is it?


In [None]:
# Exercise: create a filtered DataFrame containing any 1 country of your choosing 
# (filter using the 'name' or 'iso_a3' column instead of 'continent')


In [None]:
plt.figure(figsize=(18,10))
axes=plt.gca()
sixc.plot(ax=axes, color='lightgrey')
asia.plot(ax=axes, color='green')
noam.plot(ax=axes, color='purple')
axes.set_xticks([])
axes.set_yticks([])
for s in axes.spines.values(): s.set_visible(False) # one-liner for turning off all 4 spines
# don't indent after that line!
plt.show()

## Exercise 1: Filter and Color
* Modify the code cells above to create filtered DataFrames as instructed
* Color the 1 country in 'Seven seas (open ocean)' red -- where is it?
* Color your chosen country blue

## Mapping U.S. States
There are datasets out there suitable for Geopandas for all kinds of countries, regions, and subdivisions. A very good collection [can be found here](https://github.com/deldersveld/topojson). If you need any of those files, be sure to switch to the 'Raw' view, then Save As...

This file `us-albers.json` came from that repository. It has the U.S. states, with Alaska/Hawaii scaled/shifted as customary to make a more compact map.

Note this has 51 'states' in it -- why?

In [None]:
states = gpd.read_file('us-albers.json')
states.info()

In [None]:
states.head()

In [None]:
states.plot()

In [None]:
sns.catplot(x='iso_3166_2', y='census', kind='bar', data=states)

## Exercise 2: Bar Plot Options
One at a time, change/add options to `sns.catplot()` above, and see what happens:
* `data=states.sort_values('census')`
* `data=states.sort_values('census', ascending=False)`
* `color='b'`
* `aspect=3`

## Choropleth Maps
The word 'choropleth' comes from Greek χῶρος (choros, 'area/region') and πλῆθος (plethos, 'multitude'). The main purpose (Greek τέλος) of Geopandas is choropleth maps. You just tell `plot()` which column you are interested in, and Geopandas will color each shape accordingly, using a color scheme/map based on the range of values it finds.

In [None]:
plt.figure()
axes=plt.gca()
states.plot(column='census', ax=axes) # column 'census' is the population of each state
plt.show()

## Exercise 3: Choropleth Options
One at a time, add/change options to `states.plot()` above, and see what happens:
* `cmap='Blues'` (or Reds, Greens,... or OrRd, YlGnBu, etc, see [Matplotlib colormaps](https://matplotlib.org/tutorials/colors/colormaps.html))
* `edgecolor='k'`
* `legend=True`
* `legend_kwds={'orientation':'horizontal'}`
* `scheme='quantiles'`
* `legend_kwds={'loc':'lower left'}`
* `legend_kwds={'loc':'lower left', 'bbox_to_anchor':(1,0)}` 

**Note** what happened there (if you did it right): at first the choropleth used a smooth, continuous range of colors from whichever cmap was chosen. `scheme='quantiles'` switched it to 5 discrete colors from that cmap, which also completely changed the type of legend.
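
To get a feel for what "quantiles" means, here is a rough sketch using `pd.qcut` from plain pandas. This is a stand-in, not what Geopandas runs internally (it relies on `mapclassify` for that), and the population numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical populations in millions, for illustration only
pop = pd.Series([0.6, 1.1, 3.0, 4.5, 6.8, 8.9, 12.7, 19.5, 29.1, 39.5])

# Five quantile bins: each bin holds (about) the same number of values,
# so each of the 5 colors is used for roughly the same number of shapes
bins = pd.qcut(pop, q=5, labels=False)

print(bins.tolist())                            # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(bins.value_counts().sort_index().tolist())  # [2, 2, 2, 2, 2]
```

Notice that the bins are not equally wide: the top bin covers a much larger range of populations than the bottom one, but both contain two values.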

The first use of `'lower left'` referred to where within the whole plot to put the legend. When `'bbox_to_anchor'` was added, the meaning of `'lower left'` changed to which corner of the legend to anchor. `(1,0)` means 100% of the way to the right of the plot and 0% of the way up the plot; these are fractions of the plot area and are not related to the coordinates being plotted.
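
The two meanings of `'lower left'` are easier to see side by side. This is a small sketch in plain Matplotlib with a throwaway line plot (no map involved); the names `left` and `right` are just the two demo axes.

```python
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
for ax in (left, right):
    ax.plot([0, 1], [0, 1], label="demo line")

# loc alone: the legend goes in the lower-left corner INSIDE the plot
left.legend(loc="lower left")

# loc + bbox_to_anchor: the legend's lower-left corner is pinned to the
# point (1, 0), i.e. the bottom-right corner of the plot, so the legend
# ends up just outside the plot on the right
right.legend(loc="lower left", bbox_to_anchor=(1, 0))
```

In a notebook the figure is displayed when the cell finishes; in a script, add `plt.show()` at the end.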

## Combining DataFrames

Sometimes the data you need to relate is spread across different files. This example shows how to add more data about all the states from a separate CSV file.

In [None]:
ev2016 = pd.read_csv('2016ev.csv')
ev2016.head()

**Note** in our `states` DataFrame, the column with state names is `name`. In this new DataFrame, the column is named `State`. It is important that the values in the two columns are spelled, punctuated, and capitalized *exactly* the same; otherwise those rows won't merge.

After the merge, use `info()` and `head()` to verify that everything merged successfully -- same number of rows as before, and matched up properly.

In [None]:
# note: the name `all` shadows Python's built-in all() function within this notebook
all = pd.merge(states, ev2016, left_on='name', right_on='State')
all.info()

In [None]:
all.head() # scroll to the right to see the new columns
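
By default `pd.merge` does an inner join, so rows whose names don't match are silently dropped. One way to find out which ones, sketched here on two hand-typed miniature tables (the names `left`, `right`, `check` and the vote numbers are made up for illustration), is to merge with `how='left'` and `indicator=True`:

```python
import pandas as pd

# Hypothetical miniature tables, for illustration only
left = pd.DataFrame({"name": ["Ohio", "Texas", "District of Columbia"]})
right = pd.DataFrame({"State": ["Ohio", "Texas", "D.C."],
                      "Votes": [10, 20, 30]})   # made-up numbers

# how='left' keeps every row of `left`;
# indicator=True adds a '_merge' column saying where each row came from
check = pd.merge(left, right, left_on="name", right_on="State",
                 how="left", indicator=True)

# Rows marked 'left_only' found no partner in `right`
unmatched = check.loc[check["_merge"] == "left_only", "name"].tolist()
print(unmatched)   # ['District of Columbia']
```

Here 'District of Columbia' and 'D.C.' refer to the same place, but the spellings differ, so the row doesn't match.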

**As seen above,** choropleths color regions based on values in a numerical column. However, what if the data is categorical? 

This shows an example of creating a new column with colors for categorical data, and having Geopandas map with those colors.

Above we filtered for Asia and North America, and used the Geopandas `plot()` keyword `color` to draw each sub-DataFrame with a single color. We could do that here too: filter on `'Winning Party'` being `'Republicans'` or `'Democrats'` and use two `plot()` calls. But as the number of categories grows, that gets awkward. So here's another way: we create a new column full of color names for Geopandas to use:

In [None]:
# 'red' and 'blue' is a little intense
all['party_color'] = all['Winning Party'].map({'Republicans':'pink', 'Democrats':'lightblue'})
all.head()
# scroll right to see new column 'party_color'

In [None]:
all['party_color'].value_counts()

In [None]:
all.plot(color=all['party_color'], edgecolor='gray')
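
One thing to watch with `.map()`: any value that isn't a key in the dictionary becomes `NaN`, which is not a valid color. A small sketch in plain pandas (the 'Other' category and the names `parties` and `palette` are hypothetical, added only to show the effect):

```python
import pandas as pd

# 'Other' is a hypothetical extra category, not in the dictionary below
parties = pd.Series(["Republicans", "Democrats", "Other"])
palette = {"Republicans": "pink", "Democrats": "lightblue"}

colors = parties.map(palette)         # values missing from the dict become NaN
print(colors.isna().tolist())         # [False, False, True]

colors = colors.fillna("lightgrey")   # fall back to a neutral color
print(colors.tolist())                # ['pink', 'lightblue', 'lightgrey']
```

Checking `value_counts()` as in the cell above, or `isna().sum()`, is a quick way to catch categories you forgot to give a color.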

## Exercise
Using the capabilities examined in the choropleth exercise above, make an excellent choropleth visualization of `ev_per_million`.

In [None]:
# Here's the new column: electoral votes per million people: 
all['ev_per_million'] = all['Votes'] / all['census'] * 1000000