# Project 11: Working with Geocoded Data

## Building Maps in geopandas

In this lesson we will download COVID-19 data from data.world and normalize it so that spread can be compared between counties. Were we to simply plot the total number of cases or deaths by county, the map would mostly reflect population size, since counties with larger populations tend to have more cases and more deaths. We will observe how the spread developed across the country, starting in the northeast and eventually making its way to other regions.
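
As a minimal sketch of why normalization matters, the example below expresses counts per 100,000 residents. The county names and figures are hypothetical and are not taken from the lesson's data.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from the lesson's data
counties = pd.DataFrame(
    {"cases": [500, 500], "population": [1_000_000, 10_000]},
    index=["Large County", "Small County"])

# Express cases per 100,000 residents so counties of different size are comparable
counties["cases_per_100k"] = counties["cases"] * 100_000 / counties["population"]
print(counties)
```

Both counties report the same raw count, yet the per-capita rate of the small county is one hundred times higher.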

### Installing geopandas

Although geopandas can be installed with the conda install command in your command line shell, that package is incomplete for our purposes. We will need to install its dependencies in this order before installing geopandas itself: GDAL, Fiona, and Shapely. I have included the .whl files for each of these packages in the same folder as this notebook. Download the files and save them to your local folder. To install each one, use the command:

pip install filename

Finally, install geopandas:

pip install geopandas
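
Before running the cells below, it may help to confirm which packages can be imported. The following is a small sketch using only the standard library. The list of package names is an assumption based on the imports that appear in this notebook.

```python
import importlib.util

# Packages this notebook imports; the list is an assumption based on the cells below
required = ["pandas", "numpy", "matplotlib", "geopandas", "plotly", "datadotworld"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("Missing packages:", missing if missing else "none")
```

Any name printed as missing would need to be installed with pip before the later cells can run.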

In [1]:
!pip install datadotworld


In [2]:
import pandas as pd
import numpy as np
import geopandas
import matplotlib.pyplot as plt
from matplotlib import cm
import datetime

ModuleNotFoundError: No module named 'geopandas'

In [None]:
# use a large default font so that labels stay legible on large figures
plt.rcParams.update({'font.size': 32})
filename = "countiesWithStatesAndPopulation.shp"
index_col = "FIPS"
# index by state and county name so that whole states can be selected with .loc
map_data = geopandas.read_file(filename = filename).set_index(["State", 
                                                               "NAME"])
states = ["North Dakota", 
          "South Dakota", 
          "Minnesota"]
map_plot_data = map_data.loc[states]
# dissolve merges the county polygons into one polygon per state
state_df = map_plot_data.dissolve(by=["State"], aggfunc = "median")

fig, ax = plt.subplots(figsize = (30,15))
# color each county by its population
map_plot_data.plot(column = "Population", 
                   cmap = "viridis",
                   alpha = 1, 
                   edgecolor = "k",
                   ax = ax)
# overlay thick state borders with no fill
state_df.plot(color = "None", 
              alpha = 1,
              edgecolor = "k",
              linewidth = 8,
              ax = ax)

In [None]:
def import_geo_data(filename, FIPS_name = "FIPS"):
    # read the shapefile and use a lowercase "state" column name
    map_data = geopandas.read_file(filename = filename).rename(
        columns = {"State":"state"})
    # join the state and county codes into a single county FIPS code
    map_data[FIPS_name] = map_data["STATEFP"].astype(str) +\
        map_data["COUNTYFP"].astype(str)
    # store FIPS as an integer so it can be matched against integer codes
    map_data[FIPS_name] = map_data[FIPS_name].astype(np.int64)
    map_data.set_index(FIPS_name, inplace = True)
    
    return map_data
map_data = import_geo_data(filename = filename, FIPS_name = index_col)
map_data
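
The FIPS construction can be illustrated on a tiny table. This is a sketch with hypothetical rows that mimic the STATEFP and COUNTYFP columns, assuming both are stored as zero-padded strings.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the STATEFP and COUNTYFP columns of a county shapefile
df = pd.DataFrame({"STATEFP": ["01", "38"], "COUNTYFP": ["001", "017"]})

# Concatenate the zero-padded strings first, then convert to an integer
df["FIPS"] = (df["STATEFP"] + df["COUNTYFP"]).astype(np.int64)
print(df["FIPS"].tolist())  # [1001, 38017]
```

Note that the integer conversion drops the leading zero of "01001", so any data joined on this index would need integer FIPS codes as well.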

In [None]:
# plot only the contiguous United States (exclude Hawaii and Alaska)
map_data[~map_data["state"].isin(
    ["Hawaii", "Alaska"])].plot(column = "Population")

In [None]:
u_data = pd.read_csv("countyUnemploymentData.csv",
                    encoding = "latin1",
                    parse_dates = True,
                    index_col = ["date", "fips_code"])
# keep only the last four columns of the file
u_data = u_data[list(u_data.keys())[-4:]]
u_data.dtypes

In [None]:
for key in u_data.keys():
    u_data[key] = pd.to_numeric(u_data[key], errors = "coerce")
u_data.dtypes
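
The effect of `errors = "coerce"` can be seen on a short series. This is a sketch with a hypothetical column in which one entry is a text marker rather than a number.

```python
import pandas as pd

# Hypothetical raw column with a non-numeric marker
raw = pd.Series(["3.5", "N.A.", "4.1"])

# Values that cannot be parsed become NaN instead of raising an error
clean = pd.to_numeric(raw, errors="coerce")
print(clean.dtype)         # float64
print(clean.isna().sum())  # 1
```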

In [None]:
def create_merged_geo_dataframe(data, map_data):
    # build one GeoDataFrame per variable, with one column per date
    matching_gpd = {}
    counties = data.index.get_level_values("fips_code").unique()
    dates = data.index.get_level_values("date").unique().sort_values()
    for key, val in data.items():
        # keep only the counties that appear in the data
        matching_gpd[key] = map_data[map_data.index.isin(counties)].copy()
        for date in dates:
            # val.loc[date] is indexed by fips_code, so the assignment
            # aligns on the FIPS index of the GeoDataFrame
            matching_gpd[key][date] = val.loc[date]
    return matching_gpd
# sorted unique dates, saved for use in later cells
dates = u_data.index.get_level_values("date").unique().sort_values()
u_data = create_merged_geo_dataframe(u_data, map_data)
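
The function above turns long data indexed by date and county into one column per date. As an alternative sketch of the same reshaping step (not the method used in this lesson), pandas `unstack` moves an index level into the columns. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data indexed by (date, fips_code), as in u_data
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-02-01", "2020-03-01"]), [1001, 1003]],
    names=["date", "fips_code"])
long_df = pd.DataFrame({"Unemployment Rate": [2.7, 2.9, 3.0, 3.4]}, index=idx)

# unstack moves the date level into the columns:
# one row per county, one column per date
wide = long_df["Unemployment Rate"].unstack("date")
print(wide)
```

A wide table of this shape could then be joined to the map data on the FIPS index.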

In [None]:
u_data["Unemployment Rate"]

In [None]:
# pandas may warn because we assign values to columns of a copied dataframe
import warnings
warnings.filterwarnings("ignore")
# Normalize unemployment relative to Feb 2020, so that Feb 2020 == 0
key = "Unemployment Rate"
new_key = "Normalized " + key + " (Feb 2020)"
# df.copy() makes a copy of the dataframe
u_data[new_key] = u_data[key].copy()
# take the difference between the observed rate and the Feb rate
for date in dates:
    u_data[new_key][date] = u_data[key][date].sub(
        u_data[key][datetime.datetime(2020,2,1)])

In [None]:
u_data[new_key]
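
The same baseline subtraction can be written without a loop. The sketch below uses hypothetical rates for two counties and subtracts the February column from every column at once.

```python
import pandas as pd

feb, apr = pd.Timestamp("2020-02-01"), pd.Timestamp("2020-04-01")
# Hypothetical rates for two counties; columns are dates as in the lesson
rates = pd.DataFrame({feb: [3.0, 4.0], apr: [13.0, 9.5]}, index=[1001, 1003])

# Subtract the February column from every column, aligning on the county index
change = rates.sub(rates[feb], axis=0)
print(change)
```

The February column becomes zero everywhere, and every other column shows the change in percentage points since February.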

In [None]:
from matplotlib import cm
from mpl_toolkits.axes_grid1 import make_axes_locatable

key = "Unemployment Rate"
# plot the most recent date; to draw one map per date, wrap the code
# below in "for date in dates:"
date = dates[-1]
plot_data = u_data[key].copy()
plot_data = plot_data[~plot_data["state"].isin(["Hawaii", "Alaska"])]
# dissolve performs a groupby operation and aggregates county geometries
# to the level grouped by (here, states)
state_df = plot_data.dissolve(by=["state"], aggfunc = "median")
fig, ax = plt.subplots(figsize = (40,20))

vmin = -20
vmax = 20
# choose color bar format (which colors? how many divisions?)
cmap = plt.get_cmap("Reds", 15)
# choose range of color bar values
norm = cm.colors.Normalize(vmin = vmin, vmax = vmax)
sm = cm.ScalarMappable(cmap = cmap, norm = norm)
# prepare space for colorbar on fig
divider = make_axes_locatable(ax)
size = "5%"
cax = divider.append_axes("right", 
                          size = size, 
                          pad = .1)
# add colorbar to space in fig
cbar = fig.colorbar(sm, cax = cax, cmap = cmap)
cbar.ax.tick_params(labelsize = 18)
vals = list(cbar.ax.get_yticks())
# append max value from plot_data[dates] to vals for cbar
vals.append(plot_data[dates].max().max())
cbar.ax.set_yticklabels(vals)
cbar.ax.set_ylabel(key, fontsize = 20)
# color counties by the value observed on the chosen date
plot_data.plot(ax = ax, 
               column = date,
               cmap = cmap, legend = False,
               linewidth = .1, edgecolor = "white",
               norm = norm)
# overlay state borders with no fill
state_df.plot(color = "None", 
              alpha = 1,
              edgecolor = "k",
              linewidth = 5,
              ax = ax)
ax.set_title(str(date)[:10] + "\n" + key)

In [None]:
key = new_key
# plot the most recent date; to draw one map per date, wrap the code
# below in "for date in dates:"
date = dates[-1]
plot_data = u_data[key].copy()
plot_data = plot_data[~plot_data["state"].isin(["Hawaii", "Alaska"])]
# dissolve performs a groupby operation and aggregates county geometries
# to the level grouped by (here, states)
state_df = plot_data.dissolve(by=["state"], aggfunc = "median")
fig, ax = plt.subplots(figsize = (40,20))

vmin = -20
vmax = 20
# choose color bar format; a diverging map suits values centered on 0
cmap = plt.get_cmap("coolwarm", 15)
# choose range of color bar values
norm = cm.colors.Normalize(vmin = vmin, vmax = vmax)
sm = cm.ScalarMappable(cmap = cmap, norm = norm)
# prepare space for colorbar on fig
divider = make_axes_locatable(ax)
size = "5%"
cax = divider.append_axes("right", 
                          size = size, 
                          pad = .1)
# add colorbar to space in fig
cbar = fig.colorbar(sm, cax = cax, cmap = cmap)
cbar.ax.tick_params(labelsize = 18)
vals = list(cbar.ax.get_yticks())
# append max value from plot_data[dates] to vals for cbar
vals.append(plot_data[dates].max().max())
cbar.ax.set_yticklabels(vals)
cbar.ax.set_ylabel(key, fontsize = 20)
# color counties by the value observed on the chosen date
plot_data.plot(ax = ax, 
               column = date,
               cmap = cmap, legend = False,
               linewidth = .5, edgecolor = "white",
               norm = norm)
# overlay state borders with no fill
state_df.plot(color = "None", 
              alpha = 1,
              edgecolor = "k",
              linewidth = 5,
              ax = ax)
ax.set_title(str(date)[:10] + "\n" + key)

## Create Interactive Map with Plotly

In [None]:
import plotly.express as px
key = "Unemployment Rate"
plot_df = u_data[key]
# reproject to longitude/latitude (EPSG:4326) and convert the date column
# names to str, because plotly will throw an error if datetime objects
# are passed as column keys
plot_df = plot_df.to_crs(epsg=4326).rename(
    columns = {date: str(date)[:10] for date in plot_df[dates].keys()})
cname = str(dates[-1])[:10]
plot_df[cname] = plot_df[cname].round(2)
hover_name = "NAME"
fig = px.choropleth_mapbox(plot_df.reset_index(),
                          geojson = plot_df,#.reset_index(),
                          locations = "FIPS",
                          hover_name = hover_name,
                           hover_data = [cname],
                          color = cname,
                           color_continuous_scale = "ylgnbu",
                          center = {"lat":plot_df["geometry"].centroid.y.mean(),
                                    "lon":plot_df["geometry"].centroid.x.mean()},
                          zoom = 4,
                          opacity = .6,
                          title = key,
                          mapbox_style = "carto-positron",
                          height = 900)
#fig.show()
fig.write_html(key+".html")
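
The column renaming used in the cell above can be tried on a small frame. This is a sketch with hypothetical values that shows how slicing the string form of a timestamp yields a plain date label.

```python
import pandas as pd

dates = pd.to_datetime(["2020-02-01", "2020-03-01"])
# Hypothetical values for one county
df = pd.DataFrame([[3.0, 3.4]], columns=dates, index=[1001])

# str(Timestamp) looks like "2020-02-01 00:00:00";
# the first 10 characters are the date
df = df.rename(columns={date: str(date)[:10] for date in dates})
print(list(df.columns))  # ['2020-02-01', '2020-03-01']
```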

In [None]:
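# geopandas.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this cell requires an older geopandas release
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres')).to_crs(epsg=4326)
# index by three-letter ISO country code to match the EFW data
world.set_index("iso_a3", inplace = True)
world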

In [None]:
EFW = pd.read_csv("fraserDataWithRGDPPC.csv",
                  index_col = ["ISO_Code_3","Year"],
                  parse_dates = True).rename(columns = {"Summary":"EFW"})
EFW_keys = list(EFW.keys())[-7:]
EFW
# EFW_keys = ["EFW",
#             "Size of Government",
#             "Legal System and Property Rights",
#             "Sound Money",
#             "Freedom to Trade Internationally",
#             "Regulation",
#             "RGDP Per Capita"]

In [None]:
# Creating a dataframe that is only 2018
EFW_2018 = EFW[EFW.index.get_level_values(
    "Year") == "2018"].reset_index().set_index("ISO_Code_3")
for key in EFW_keys:
    world[key + " 2018"] = EFW_2018[key]
world
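
Assigning a column from one dataframe to another, as in the loop above, matches rows by index label rather than by position. The sketch below shows this with hypothetical frames; the country code "ATL" and the scores are illustrative only.

```python
import pandas as pd

# Hypothetical frames: `world` contains a row with no matching score
world = pd.DataFrame({"name": ["Ireland", "Italy", "Atlantis"]},
                     index=["IRL", "ITA", "ATL"])
scores = pd.Series({"ITA": 7.6, "IRL": 8.1})  # illustrative values only

# Assignment aligns on index labels, not on row order
world["EFW 2018"] = scores
print(world)
```

Rows without a matching label receive NaN, which is why countries missing from the EFW data appear without a value on the map.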

In [None]:
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities')).to_crs(epsg=4326)
cities

In [None]:
# Add an empty column that will hold the ISO code of the country
# containing each city; cities left blank will not be shown
cities["Country"] = ""
cities

In [None]:
for ix in cities.index:
    try:
        # save the name of the country if the city falls within its boundaries
        cities.loc[ix, "Country"] = world[world["geometry"].contains(cities.loc[ix]["geometry"])].index[0]
    except IndexError:
        # no country polygon contains this city, so leave the entry blank
        continue

In [None]:
cities[cities["Country"] != ""]

In [None]:
import matplotlib.patheffects as pe
for key in EFW_keys:
    column = key + " 2018"
    fig, ax = plt.subplots(figsize = (40,30))
    cmap = plt.get_cmap("Blues", 20)
    # scale the color bar to the range of the plotted column
    vmin = world[column].min()
    vmax = world[column].max()
    norm = cm.colors.Normalize(vmin = vmin, vmax = vmax)
    sm = cm.ScalarMappable(cmap = cmap,
                           norm = norm)
    # prepare space for colorbar
    divider = make_axes_locatable(ax)
    size = "5%"
    cax = divider.append_axes("right",
                              size = size,
                              pad = .1)
    cbar = fig.colorbar(sm, cax = cax, cmap = cmap)
    cbar.ax.tick_params(labelsize = 24)
    vals = list(cbar.ax.get_yticks())
    vals.append(vmax)
    cbar.ax.set_yticklabels(vals)
    
    # grey base layer so that countries without data remain visible
    world.plot(color = "k", alpha = .25, ax = ax)
    world.plot(column = column,
               cmap = cmap,
               norm = norm,
               linewidth = 1,
               edgecolor = "k",
               ax = ax,
               alpha = .95)
    cities[cities["Country"]!= ""].plot(color = "C3",
                                        markersize = 35,
                                        ax = ax)
    
    area = world.area
    # add country label only if greater than or equal to size of Ireland
    min_area = area["IRL"]
    # iterate by position because index labels are not guaranteed unique
    for ix, geom, geom_area in zip(world.index, world["geometry"], area):
        if geom_area >= min_area:
            point = geom.representative_point()
            x,y = point.x, point.y 
            ax.text(x, y, ix, va = "center", 
                    ha = "center", fontsize = 10,
                    path_effects = [pe.withStroke(linewidth = .9, 
                                                  foreground = "white")])
            
    ax.set_title(column)