<img src="http://www.lineagelogistics.com/themes/custom/particle/dist/assets/lineage_logo.svg" alt="Lineage" width="400" align="left">

# Overview 

## Objective
To demonstrate how to create choropleth maps with `vishelper.plot_map()`

## Imports and setup

In [1]:
# must go first
%matplotlib inline
%config InlineBackend.figure_format='retina'

# Reloads functions each time so you can edit a script
# and not need to restart the kernel
%load_ext autoreload
%autoreload 2

import palettable
import folium

# basic wrangling
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

In [2]:
import vishelper as vh

# Data

Create a fake dataset with a random number of loads for each state

In [8]:
df = pd.DataFrame()

df['destination_state'] = list(vh.state_map.keys()) # state_map is a dict of state abbreviations to state names

df['num_loads'] = np.random.randint(0, 10000, len(df))

In [10]:
df.head(10)

|   | destination_state | num_loads |
|---|---|---|
| 0 | AK | 1768 |
| 1 | AL | 5350 |
| 2 | AR | 8722 |
| 3 | AS | 5312 |
| 4 | AZ | 7103 |
| 5 | CA | 5920 |
| 6 | CO | 1972 |
| 7 | CT | 8839 |
| 8 | DC | 5101 |
| 9 | DE | 4875 |
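`np.random.randint` draws new values on every run, so the table above (and every map below) will differ each time the notebook is executed. If you want the fake data itself to be reproducible, a minimal sketch is to draw from a seeded generator. The function name `make_fake_loads`, the seed `42`, and the short `states` list (a stand-in for `list(vh.state_map.keys())`) are illustrative assumptions; this will not reproduce the exact numbers shown above.

```python
import numpy as np
import pandas as pd

# Stand-in for list(vh.state_map.keys()); any list of state abbreviations works
states = ['AK', 'AL', 'AR', 'AZ', 'CA']


def make_fake_loads(states, seed=42):
    """Hypothetical helper: fake load counts in [0, 10000), reproducible per seed."""
    rng = np.random.default_rng(seed)  # seeded generator: same seed, same draws
    return pd.DataFrame({
        'destination_state': states,
        # integers() excludes the upper bound, like np.random.randint
        'num_loads': rng.integers(0, 10000, len(states)),
    })


df = make_fake_loads(states)
```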


# Analysis

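`vishelper.plot_map()` requires full state names, not abbreviations. We can convert abbreviations to state names with `vishelper.to_state_name()`, which here reads the abbreviations in `destination_state` and writes the full names to a new `state_name` column.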

In [11]:
df = vh.to_state_name(df, 'destination_state', 'state_name')

In [12]:
df.head()

|   | destination_state | num_loads | state_name |
|---|---|---|---|
| 0 | AK | 1768 | Alaska |
| 1 | AL | 5350 | Alabama |
| 2 | AR | 8722 | Arkansas |
| 3 | AS | 5312 | American Samoa |
| 4 | AZ | 7103 | Arizona |
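If you prefer plain pandas, and assuming `vh.state_map` is an ordinary dict of abbreviations to full names (as the comment in the data cell says), a similar column can be built with `Series.map`. This is a sketch, not necessarily how `to_state_name()` is implemented; in particular, its handling of unknown abbreviations may differ. The four-entry `state_map` below is a stand-in for the real dict.

```python
import pandas as pd

# Small stand-in for vh.state_map (abbreviation -> full name)
state_map = {'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AZ': 'Arizona'}

df = pd.DataFrame({'destination_state': ['AK', 'AL', 'AR', 'AZ', 'XX']})

# Series.map looks each abbreviation up in the dict; codes missing from it become NaN
df['state_name'] = df['destination_state'].map(state_map)
```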


# Plot choropleth

In [13]:
print(vh.plot_map.__doc__)

Creates a choropleth map based on a dataframe. Colors regions in a map based on provided data.

    Must provide geo_type *or* geo_data and key_on.

    `geo_type` options:
        * 'us_states': will join to a column containing state names. See
            `to_state_name()` to convert state abbreviations to full state names


    Args:
        df: Dataframe containing `color_column` and `geo_column`. If None,
            all regions will be colored a single color.
        color_column (`str`): Name of column in dataframe containing data to color map
            according to. If None, df must be None as well. All regions will be colored
            a single color.
        geo_column (`str`): Name of column in dataframe containing geographic key that
            can join to `key_on` in geoJson. If None, df must be None as well. All regions
            will be colored a single color.
        geo_type (`str`): If provided, maps to a geo_data and key_on for a given geography
        type. 

## Without data

In [14]:
fmap = vh.plot_map(None, geo_type='us_states')

In [15]:
fmap

In [16]:
fmap = vh.plot_map(None, geo_type='us_states', fill_color='pink')

In [17]:
fmap

## Using provided colors

Predefined colors available in `vh.formatting`:
* `vh.formatting['color.darks']`: ['#0067a0', '#53565a', '#009681', '#87189d', '#c964cf']
* `vh.formatting['color.mediums']`: ['#0085ca', '#888b8d', '#00c389', '#f4364c', '#e56db1']
* `vh.formatting['color.lights']`: ['#00aec7', '#b1b3b3', '#2cd5c4', '#ff671f', '#ff9e1b']
* `vh.formatting['color.greens']`: ['#43b02a', '#78be20', '#97d700']

In [20]:
fmap = vh.plot_map(None, geo_type='us_states', fill_color=vh.formatting['color.mediums'][2])

In [21]:
fmap

## With data

In [22]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   geo_type='us_states',
                   legend_name='Number of loads')

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [15]:
fmap

In [23]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   fill_color='RdPu',
                   geo_type='us_states',
                   legend_name='Number of loads')

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [24]:
fmap

By default, `vishelper.save_map()` saves the map as an HTML file. If you pass the keyword argument `png=True`, it will also save a screenshot of the map to a PNG file. This is useful when you want several images with the same size, scale, and orientation (e.g. presentation slides that show different data on the same map).

Saving as a PNG requires the `selenium` package. If you do not have or use Safari, you will need to provide a `selenium.webdriver` instance, such as `selenium.webdriver.Firefox()` or `selenium.webdriver.Chrome()`. Each of these requires downloading the corresponding browser driver (geckodriver for Firefox, chromedriver for Chrome). See the `selenium` [documentation](https://selenium-python.readthedocs.io/index.html) for more information.
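How `save_map()` takes its screenshot is not shown in this notebook. As a rough illustration only, this is how a PNG of an already-saved HTML map could be captured with `selenium` directly. The helper names `html_to_file_url` and `screenshot_html` and the 3-second `delay` are hypothetical, and the Safari default assumes Safari's bundled `safaridriver` has been enabled (`safaridriver --enable`).

```python
import time
from pathlib import Path


def html_to_file_url(htmlpath):
    """Absolute file:// URL for a local HTML file (browsers need a URL, not a bare path)."""
    return Path(htmlpath).resolve().as_uri()


def screenshot_html(htmlpath, pngpath, driver=None, delay=3):
    """Hypothetical helper: open a saved map in a browser and save a PNG screenshot."""
    from selenium import webdriver  # imported lazily so this module loads without selenium

    if driver is None:
        driver = webdriver.Safari()  # assumes safaridriver is enabled on macOS
    try:
        driver.get(html_to_file_url(htmlpath))
        time.sleep(delay)  # give the map tiles time to load before capturing
        driver.save_screenshot(pngpath)
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

For example, `screenshot_html('map-example.html', 'map-example.png', driver=webdriver.Firefox())` would use Firefox instead, provided geckodriver is installed.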


In [None]:
vh.save_map(fmap, htmlpath='map-example.html', png=True)

## Changing the color scale

In [26]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   fill_color='RdPu',
                   geo_type='us_states',
                   legend_name='Number of loads',
                   threshold_scale=[0, 2000, 4000, 6000, 8000, 10000])

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [27]:
fmap
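The bin edges passed to `threshold_scale` above are hard-coded; they cover every value only because the fake data is drawn from 0 to 9,999. For real data, the edges can be derived so that every value falls inside a bin. A minimal sketch, using a hypothetical helper `make_threshold_scale` (not part of vishelper) that rounds the data's minimum down and maximum up to multiples of an integer `step`:

```python
import math


def make_threshold_scale(values, step):
    """Hypothetical helper: evenly spaced integer bin edges (multiples of `step`)
    that cover every value in `values`."""
    lo = math.floor(min(values) / step) * step  # round the minimum down
    hi = math.ceil(max(values) / step) * step   # round the maximum up
    if hi == lo:  # all values equal and a multiple of step: widen to a single bin
        hi = lo + step
    return list(range(lo, hi + 1, step))  # +1 so the upper edge is included
```

With the dataframe above, `threshold_scale=make_threshold_scale(df['num_loads'], 2000)` would give the same six edges as the hard-coded list whenever the data spans roughly 0 to 10,000.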

## Zipcode

GeoJSON files containing zip codes for each state can be found [here](https://github.com/OpenDataDE/State-zip-code-GeoJSON) and passed directly to `geo_data`. Please note that these files are not completely up to date; we are looking for better GeoJSON files.
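The California file used below follows the pattern `<abbreviation>_<state name>_zip_codes_geo.min.json`. Assuming the other states' files follow the same pattern (check the repository listing before relying on it), a hypothetical helper `zip_geojson_url` can build the raw URL for any state:

```python
BASE_URL = ('https://raw.githubusercontent.com/OpenDataDE/'
            'State-zip-code-GeoJSON/master/')


def zip_geojson_url(abbr, state_name):
    """Hypothetical helper: raw GitHub URL for one state's zip-code GeoJSON.

    Assumes every file is named like ca_california_zip_codes_geo.min.json.
    """
    slug = state_name.lower().replace(' ', '_')  # 'New York' -> 'new_york'
    return f'{BASE_URL}{abbr.lower()}_{slug}_zip_codes_geo.min.json'
```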

`filter_geos=True` will filter out all geoJSON entries for regions not included in the dataframe, `df`. 
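Conceptually, filtering amounts to keeping only the features whose `key_on` value appears in `geo_column`, and the warning printed below lists the values that have no matching feature. The sketch below shows that idea on a plain GeoJSON dict; it is an illustration under that assumption, not vishelper's implementation, and `get_key` and `filter_geojson` are hypothetical names.

```python
def get_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.ZCTA5CE10' into one feature."""
    parts = key_on.split('.')
    if parts[0] == 'feature':
        parts = parts[1:]  # the path is written relative to the feature itself
    value = feature
    for part in parts:
        value = value[part]
    return value


def filter_geojson(geojson, keys, key_on):
    """Keep only features whose key is in `keys`; also return keys with no feature."""
    keys = set(keys)
    features = [f for f in geojson['features'] if get_key(f, key_on) in keys]
    found = {get_key(f, key_on) for f in features}
    missing = sorted(keys - found)  # e.g. zips that would trigger the warning
    # Shallow copy with the new feature list; the input FeatureCollection is untouched
    return dict(geojson, features=features), missing
```

Fewer features also means a smaller map, which matters for state-wide zip-code files.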

In [28]:
dfzip = pd.DataFrame()
dfzip['destination_zip5'] = [
    '90010', '90012', '90038', '90039', '90042', '90048', '90049', '90058',
    '90063', '90069', '90220', '90232', '90250', '90266', '90292', '90403',
    '90405', '90501', '90505', '90630', '90631', '90640', '90650', '90670',
    '90712', '90713', '90745', '90813', '91105', '91203', '91301', '91311',
    '91361', '91362', '91367', '91401', '91411', '91436', '91506', '91702',
    '91708', '91710', '91732', '91748', '91752', '91761', '91773', '91803',
    '91950', '92009', '92029', '92037', '92064', '92066', '92093', '92101',
    '92103', '92117', '92121', '92123', '92124', '92127', '92130', '92158',
    '92324', '92335', '92374', '92376', '92377', '92401', '92407', '92408',
    '92410', '92507', '92508', '92518', '92553', '92606', '92618', '92630',
    '92647', '92656', '92675', '92677', '92706', '92708', '92782', '92802',
    '92806', '92807', '92821', '92831', '92880', '93021', '93030', '93060',
    '93065', '93245', '93446', '93722', '94005', '94110', '94534', '94536',
    '94551', '94558', '94560', '94577', '94588', '94607', '94806', '94931',
    '94949', '94952', '95014', '95020', '95076', '95206', '95215', '95304',
    '95330', '95341', '95354', '95358', '95376', '95377', '95560', '95605',
    '95618', '95620', '95678', '95691', '95695', '95765', '95811', '95825',
    '95828', '95834', '96003', '96162', '00000'
]  # Some zips (e.g. '00000') have no match in the California GeoJSON; they are included to demonstrate function behavior

In [29]:
dfzip['num_loads'] = np.random.randint(0, 10000, len(dfzip))

In [30]:
dfzip.tail()

|   | destination_zip5 | num_loads |
|---|---|---|
| 136 | 95828 | 6251 |
| 137 | 95834 | 5989 |
| 138 | 96003 | 3847 |
| 139 | 96162 | 588 |
| 140 | 00000 | 2860 |
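The zip codes above are deliberately stored as strings. The `ZCTA5CE10` property used as `key_on` presumably holds 5-character strings, so a zip such as `'00501'` would not match if it were stored as the integer `501`. If zips come from a source that parses them as numbers (e.g. `pd.read_csv` without `dtype=str`), a minimal way to restore the leading zeros is `str.zfill`:

```python
import pandas as pd

# Zip codes parsed as integers lose their leading zeros
zips = pd.Series([90010, 501, 0])

# Convert to strings, then left-pad with zeros to 5 characters
zip5 = zips.astype(str).str.zfill(5)
```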


In [31]:
fmap = vh.plot_map(df=dfzip, 
                   color_column='num_loads',
                   geo_column='destination_zip5',
                   key_on='feature.properties.ZCTA5CE10',
                   filter_geos=True,
                   location_start=(37.35, -119),
                   zoom_start=7,
                   geo_data='https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json')

The following geos were not in the provided geojson and will not be plotted: 92093,92158,96162,00000


In [32]:
fmap

The map contains too much data to display reliably inline; you may need to open the saved HTML file to view it.

In [33]:
vh.save_map(fmap, htmlpath='map-example-cali-zip.html')

## Edge cases

If the provided dataframe has no rows, `plot_map()` raises an `AssertionError` stating that `df` must have at least one row of data to plot.

In [34]:
dfzip.head(0)

|   | destination_zip5 | num_loads |
|---|---|---|


In [35]:
fmap = vh.plot_map(
    df=dfzip.head(0),
    color_column='num_loads',
    geo_column='destination_zip5',
    key_on='feature.properties.ZCTA5CE10',
    filter_geos=True,
    location_start=(37.35, -119),
    zoom_start=7,
    geo_data=
    'https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json'
)

AssertionError: Dataframe, df, must have at least one row of data to plot

If none of the geos in the dataframe can be found in the geoJSON, `plot_map()` raises a `ValueError`, since there is nothing to plot.

In [36]:
dfzip.tail(2)

|   | destination_zip5 | num_loads |
|---|---|---|
| 139 | 96162 | 588 |
| 140 | 00000 | 2860 |


In [37]:
fmap = vh.plot_map(df=dfzip.tail(2), 
                   color_column='num_loads',
                   geo_column='destination_zip5',
                   key_on='feature.properties.ZCTA5CE10',
                   filter_geos=True,
                   location_start=(37.35, -119),
                   zoom_start=7,
                   geo_data='https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json')

The following geos were not in the provided geojson and will not be plotted: 96162,00000


ValueError: No rows in the dataframe were found in the geoJSON so there is nothing to plot.
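Both checks can also be run on the caller's side before calling `plot_map()`, e.g. to skip empty batches when generating many maps in a loop. The helper below is hypothetical: it mirrors the two documented exception types and messages but is not vishelper's implementation, and `geo_keys` (the set of keys present in the GeoJSON) is assumed to have been extracted by the caller.

```python
import pandas as pd


def check_plottable(df, geo_column, geo_keys):
    """Hypothetical pre-check mirroring plot_map()'s two edge cases.

    Returns only the rows whose geo appears in `geo_keys`.
    """
    if len(df) == 0:
        # Same exception type and message as plot_map() for an empty dataframe
        raise AssertionError('Dataframe, df, must have at least one row of data to plot')
    matched = df[geo_column].isin(set(geo_keys))
    if not matched.any():
        raise ValueError('No rows in the dataframe were found in the geoJSON '
                         'so there is nothing to plot.')
    return df[matched]
```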

# Appendix

## Watermark 
For full reproducibility of results, use the exact data extraction defined at the top of the notebook and ensure that the environment matches the following exactly:

In [38]:
# ! pip install watermark
%load_ext watermark
%watermark -v -m --iversions -g

palettable 3.1.1
json       2.0.9
pandas     1.0.3
numpy      1.20.2
folium     0.9.1
CPython 3.7.3
IPython 7.5.0

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 20.3.0
machine    : x86_64
processor  : i386
CPU cores  : 12
interpreter: 64bit
Git hash   : b4de068893128c8b66a41da24daf7db95cf18913
