<img src="http://www.lineagelogistics.com/themes/custom/particle/dist/assets/lineage_logo.svg" alt="Lineage" width="400" align="left">

# Overview 

## Objective
To demonstrate `vishelper.plot_map()`

## Imports and setup

In [1]:
# must go first
%matplotlib inline
%config InlineBackend.figure_format='retina'

# Reloads functions each time so you can edit a script
# and not need to restart the kernel
%load_ext autoreload
%autoreload 2

import palettable
import folium

# basic wrangling
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

In [2]:
import vishelper as vh

# Data

Create a fake dataset with a random number of loads for each state.

In [3]:
df = pd.DataFrame()

df['destination_state'] = list(vh.state_map.keys()) # state_map is a dict of state abbreviations to state names

df['num_loads'] = np.random.randint(0, 10000, len(df))

In [4]:
df.head(10)

| | destination_state | num_loads |
|---|---|---|
| 0 | AK | 1039 |
| 1 | AL | 7484 |
| 2 | AR | 6014 |
| 3 | AS | 9712 |
| 4 | AZ | 7450 |
| 5 | CA | 5002 |
| 6 | CO | 1415 |
| 7 | CT | 6551 |
| 8 | DC | 7294 |
| 9 | DE | 4861 |
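Because `np.random.randint` is unseeded here, the numbers above change on every run. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` with a fixed seed. The helper name `make_fake_loads`, the seed value, and the short abbreviation list are illustrative stand-ins and not part of `vishelper`.

```python
import numpy as np
import pandas as pd


def make_fake_loads(states, seed=0):
    """Return a dataframe with one random load count per state."""
    states = list(states)
    rng = np.random.default_rng(seed)  # fixed seed gives the same draws each run
    return pd.DataFrame({
        'destination_state': states,
        # integers() excludes the upper bound, like np.random.randint
        'num_loads': rng.integers(0, 10000, size=len(states)),
    })


# Hypothetical subset; the notebook itself uses list(vh.state_map.keys())
example_states = ['AK', 'AL', 'AR', 'AZ', 'CA']
df_seeded = make_fake_loads(example_states, seed=42)
```

Calling `make_fake_loads` twice with the same seed returns identical dataframes, which makes the maps further down comparable across runs.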


# Analysis

`vishelper.plot_map()` requires state names, not abbreviations. We can convert abbreviations to state names via `vishelper.to_state_name()`, which relies on the `vishelper.state_map` dictionary.

In [5]:
df = vh.to_state_name(df, 'destination_state', 'state_name')

In [6]:
df.head()

| | destination_state | num_loads | state_name |
|---|---|---|---|
| 0 | AK | 1039 | Alaska |
| 1 | AL | 7484 | Alabama |
| 2 | AR | 6014 | Arkansas |
| 3 | AS | 9712 | American Samoa |
| 4 | AZ | 7450 | Arizona |
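The conversion above amounts to a dictionary lookup per row. The sketch below shows the general idea with plain pandas; it is not the `vishelper` implementation. The helper name `add_state_name` and the three-entry mapping are hypothetical.

```python
import pandas as pd

# Hypothetical excerpt of an abbreviation-to-name mapping like vh.state_map
example_state_map = {'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas'}


def add_state_name(df, abbrev_column, name_column, mapping):
    """Return a copy of df with a full-state-name column added."""
    out = df.copy()
    # Series.map leaves NaN for abbreviations missing from the mapping
    out[name_column] = out[abbrev_column].map(mapping)
    return out


example = pd.DataFrame({'destination_state': ['AK', 'AL', 'ZZ']})
named = add_state_name(example, 'destination_state', 'state_name', example_state_map)
```

Rows whose abbreviation is not in the mapping (here `'ZZ'`) end up with a missing `state_name`, which is worth checking before plotting.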


# Plot choropleth

In [7]:
print(vh.plot_map.__doc__)

Creates a choropleth map based on a dataframe. Colors regions in a map based on provided data.

    Must provide geo_type *or* geo_data and key_on.

    `geo_type` options:
        * 'us_states': will join to a column containing state names. See
            `to_state_name()` to convert state abbreviations to full state names
        * 'zip3': will join to a column containing zip3s
        * 'kma': will join to a column containing key market area ids


    Args:
        df: Dataframe containing `color_column` and `geo_column`. If None,
            all regions will be colored a single color.
        color_column (`str`): Name of column in dataframe containing data to color map
            according to. If None, df must be None as well. All regions will be colored
            a single color.
        geo_column (`str`): Name of column in dataframe containing geographic key that
            can join to `key_on` in geoJson. If None, df must be None as well. All regions
            will be co

## Without data

In [8]:
fmap = vh.plot_map(None, geo_type='us_states')

In [9]:
fmap

In [10]:
fmap = vh.plot_map(None, geo_type='us_states', fill_color='pink')

In [11]:
fmap

## Using provided colors

Colors:
* `vh.formatting['color.darks']`: ['#0067a0', '#53565a', '#009681','#87189d', '#c964cf']
* `vh.formatting['color.mediums']`: ['#0085ca', '#888b8d', '#00c389', '#f4364c', '#e56db1']
* `vh.formatting['color.lights']`: ['#00aec7', '#b1b3b3', '#2cd5c4', '#ff671f', '#ff9e1b']
* `vh.formatting['color.greens']`: ['#43b02a', '#78be20', '#97d700']

In [12]:
fmap = vh.plot_map(None, geo_type='us_states', fill_color=vh.formatting['color.mediums'][2])

In [13]:
fmap

## With data

In [14]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   geo_type='us_states',
                   legend_name='Number of loads')

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [15]:
fmap
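The message printed above lists values in `geo_column` that have no matching region in the GeoJSON. A rough way to compute such a list yourself is a set difference, sketched here with a tiny hand-written GeoJSON. The function name `missing_geos` and the `name` property are assumptions for illustration, not `vishelper` internals.

```python
def missing_geos(values, geojson, property_name):
    """Return sorted values that match no feature's property in the GeoJSON."""
    available = {f['properties'][property_name] for f in geojson['features']}
    return sorted(set(values) - available)


# Tiny hand-written GeoJSON stand-in; geometry omitted for brevity
example_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'name': 'Alaska'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'name': 'Alabama'}, 'geometry': None},
    ],
}

print(missing_geos(['Alaska', 'Guam', 'American Samoa'], example_geojson, 'name'))
# prints ['American Samoa', 'Guam']
```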

In [16]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   fill_color='RdPu',
                   geo_type='us_states',
                   legend_name='Number of loads')

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [17]:
fmap

By default, `vishelper.save_map()` saves the map as an HTML file. If given the keyword argument `png=True`, it saves a screenshot to a `png` file. This is useful when you want images with the same size, scale, and orientation (e.g. slides in a presentation that show different data on the same map).

Saving as a png requires the package `selenium`. If you do not have or use Safari, you will want to provide a `selenium.webdriver` class, such as `selenium.webdriver.Firefox()` or `selenium.webdriver.Chrome()`. Each of these requires downloading the corresponding webdriver. See the `selenium` [documentation](https://selenium-python.readthedocs.io/index.html) for more information.


In [18]:
# Must have `selenium` installed
# vh.save_map(fmap, htmlpath='figures/map-example.html', png=True)
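For orientation, here is a hedged sketch of how an HTML map can be turned into a PNG with `selenium` directly. It is not how `vh.save_map()` works internally. The function names, the window size, and the default to Firefox are all assumptions, and running `screenshot_html` needs a browser plus its driver installed.

```python
from pathlib import Path


def html_file_url(htmlpath):
    """Return a file:// URL that a browser can open for a local HTML file."""
    return Path(htmlpath).resolve().as_uri()


def screenshot_html(htmlpath, pngpath, driver=None):
    """Open a saved HTML map in a browser and save a screenshot as PNG."""
    # Imported lazily so this code loads even without selenium installed
    from selenium import webdriver

    # Assumption: Firefox and its driver are installed when none is passed
    driver = driver or webdriver.Firefox()
    try:
        driver.set_window_size(1200, 800)  # fixed size keeps images comparable
        driver.get(html_file_url(htmlpath))
        driver.save_screenshot(pngpath)
    finally:
        driver.quit()
```

Map tiles load asynchronously, so in practice a short wait between `driver.get` and `driver.save_screenshot` may be needed before the image is complete.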

## Changing the color scale

In [19]:
fmap = vh.plot_map(df,
                   color_column='num_loads',
                   geo_column='state_name',
                   fill_color='RdPu',
                   geo_type='us_states',
                   legend_name='Number of loads',
                   threshold_scale=[0, 2000, 4000, 6000, 8000, 10000])

The following geos were not in the provided geojson and will not be plotted: American Samoa,Guam,Northern Mariana Islands,National,Virgin Islands


In [20]:
fmap
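The `threshold_scale` list above was typed by hand. A small sketch of two ways to generate such bin edges with NumPy is shown below: evenly spaced edges and quantile-based edges. The helper names are hypothetical, and it is an assumption that the edges should span the full range of `color_column`.

```python
import numpy as np


def even_thresholds(low, high, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from low to high."""
    return np.linspace(low, high, n_bins + 1).tolist()


def quantile_thresholds(values, n_bins):
    """Return bin edges that put roughly equal numbers of values in each bin."""
    return np.quantile(values, np.linspace(0, 1, n_bins + 1)).tolist()


print(even_thresholds(0, 10000, 5))
# prints [0.0, 2000.0, 4000.0, 6000.0, 8000.0, 10000.0]
```

Quantile edges, e.g. `quantile_thresholds(df['num_loads'], 5)`, can give more visual contrast when the data is heavily skewed, at the cost of uneven legend intervals.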

## Zipcode

GeoJSON files containing zip codes for each state can be found [here](https://github.com/OpenDataDE/State-zip-code-GeoJSON) and fed directly into `geo_data`. Please note: these files are not completely up to date. We are in the process of finding better geoJSON files. 

`filter_geos=True` will filter out all geoJSON entries for regions not included in the dataframe, `df`. 
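To make the two ideas above concrete, the sketch below resolves a dotted `key_on` path such as `feature.properties.ZCTA5CE10` against each GeoJSON feature and keeps only the features whose key appears in the data. This illustrates the concept and is not the `vishelper` code. The function names and the three-feature GeoJSON are made up for the example.

```python
def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.ZCTA5CE10'."""
    value = feature
    for part in key_on.split('.')[1:]:  # skip the leading 'feature'
        value = value[part]
    return value


def filter_features(geojson, key_on, keep):
    """Return a shallow copy of the GeoJSON with only features whose key is in keep."""
    keep = set(keep)
    kept = [f for f in geojson['features'] if resolve_key(f, key_on) in keep]
    return {**geojson, 'features': kept}


# Hand-written stand-in for the California zip code file; geometry omitted
example_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'ZCTA5CE10': '90010'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'ZCTA5CE10': '94110'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'ZCTA5CE10': '96001'}, 'geometry': None},
    ],
}

filtered = filter_features(example_geojson, 'feature.properties.ZCTA5CE10',
                           ['90010', '94110', '00000'])
print([f['properties']['ZCTA5CE10'] for f in filtered['features']])
# prints ['90010', '94110']
```

Dropping unused features this way keeps the rendered map smaller, which matters for zip-code-level files with thousands of polygons.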

In [21]:
dfzip = pd.DataFrame()
dfzip['destination_zip5'] = [
    '90010', '90012', '90038', '90039', '90042', '90048', '90049', '90058',
    '90063', '90069', '90220', '90232', '90250', '90266', '90292', '90403',
    '90405', '90501', '90505', '90630', '90631', '90640', '90650', '90670',
    '90712', '90713', '90745', '90813', '91105', '91203', '91301', '91311',
    '91361', '91362', '91367', '91401', '91411', '91436', '91506', '91702',
    '91708', '91710', '91732', '91748', '91752', '91761', '91773', '91803',
    '91950', '92009', '92029', '92037', '92064', '92066', '92093', '92101',
    '92103', '92117', '92121', '92123', '92124', '92127', '92130', '92158',
    '92324', '92335', '92374', '92376', '92377', '92401', '92407', '92408',
    '92410', '92507', '92508', '92518', '92553', '92606', '92618', '92630',
    '92647', '92656', '92675', '92677', '92706', '92708', '92782', '92802',
    '92806', '92807', '92821', '92831', '92880', '93021', '93030', '93060',
    '93065', '93245', '93446', '93722', '94005', '94110', '94534', '94536',
    '94551', '94558', '94560', '94577', '94588', '94607', '94806', '94931',
    '94949', '94952', '95014', '95020', '95076', '95206', '95215', '95304',
    '95330', '95341', '95354', '95358', '95376', '95377', '95560', '95605',
    '95618', '95620', '95678', '95691', '95695', '95765', '95811', '95825',
    '95828', '95834', '96003', '96162', '00000'
]  # Zips missing from the California GeoJSON (e.g. '00000') are included to demonstrate function behavior

In [22]:
dfzip['num_loads'] = np.random.randint(0, 10000, len(dfzip))

In [23]:
dfzip.tail()

| | destination_zip5 | num_loads |
|---|---|---|
| 136 | 95828 | 9172 |
| 137 | 95834 | 6963 |
| 138 | 96003 | 5211 |
| 139 | 96162 | 7277 |
| 140 | 00000 | 2225 |


In [24]:
fmap = vh.plot_map(df=dfzip, 
                   color_column='num_loads',
                   geo_column='destination_zip5',
                   key_on='feature.properties.ZCTA5CE10',
                   filter_geos=True,
                   location_start=(37.35, -119),
                   zoom_start=7,
                   geo_data='https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json')

The following geos were not in the provided geojson and will not be plotted: 92093,92158,96162,00000


In [25]:
fmap

The map contains too much detail to display well inline; you may need to save it and open the HTML file to see it.

In [26]:
vh.save_map(fmap, htmlpath='figures/map-example-cali-zip.html')

## Zip3s

In [27]:
dfzip['destination_zip3'] = dfzip['destination_zip5'].str[:3]

In [28]:
dfzip.head()

| | destination_zip5 | num_loads | destination_zip3 |
|---|---|---|---|
| 0 | 90010 | 8270 | 900 |
| 1 | 90012 | 933 | 900 |
| 2 | 90038 | 3328 | 900 |
| 3 | 90039 | 7970 | 900 |
| 4 | 90042 | 3296 | 900 |
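The slice `str[:3]` only gives a valid zip3 when `destination_zip5` holds five-character strings. If zip codes arrive as integers (for example after reading a CSV), leading zeros are lost. A minimal sketch of restoring them with `str.zfill` before slicing follows; the sample values are made up.

```python
import pandas as pd

# Zip codes that were parsed as integers lose their leading zeros
raw = pd.Series([90010, 501, 0])

zip5 = raw.astype(str).str.zfill(5)  # left-pad with zeros to five characters
zip3 = zip5.str[:3]                  # first three digits, as in the cell above

print(zip5.tolist())  # prints ['90010', '00501', '00000']
print(zip3.tolist())  # prints ['900', '005', '000']
```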


In [29]:
dfzip3 = dfzip.groupby('destination_zip3').num_loads.sum().to_frame().reset_index()

In [30]:
dfzip3.head()

| | destination_zip3 | num_loads |
|---|---|---|
| 0 | 000 | 2225 |
| 1 | 900 | 52777 |
| 2 | 902 | 16118 |
| 3 | 904 | 8461 |
| 4 | 905 | 2802 |


In [31]:
fmap = vh.plot_map(df=dfzip3, 
                   color_column='num_loads',
                   geo_column='destination_zip3',
                   geo_type='zip3',
                   filter_geos=True,
                   location_start=(37.35, -119),
                   zoom_start=7)

The following geos were not in the provided geojson and will not be plotted: 000


In [32]:
fmap

## Edge cases / expected errors

If the provided dataframe has no rows, `plot_map()` should raise an `AssertionError` stating that `df` must have at least one row to create a map.

In [33]:
dfzip.head(0)

| | destination_zip5 | num_loads | destination_zip3 |
|---|---|---|---|


_Commented out so the whole notebook runs. Uncomment to see the error message._

In [34]:
# fmap = vh.plot_map(
#     df=dfzip.head(0),
#     color_column='num_loads',
#     geo_column='destination_zip5',
#     key_on='feature.properties.ZCTA5CE10',
#     filter_geos=True,
#     location_start=(37.35, -119),
#     zoom_start=7,
#     geo_data=
#     'https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json'
# )

If no rows in the dataframe have geos that can be found in the GeoJSON, a `ValueError` should be raised because there is nothing to plot.

In [35]:
dfzip.tail(2)

| | destination_zip5 | num_loads | destination_zip3 |
|---|---|---|---|
| 139 | 96162 | 7277 | 961 |
| 140 | 00000 | 2225 | 000 |


In [36]:
# fmap = vh.plot_map(df=dfzip.tail(2), 
#                    color_column='num_loads',
#                    geo_column='destination_zip5',
#                    key_on='feature.properties.ZCTA5CE10',
#                    filter_geos=True,
#                    location_start=(37.35, -119),
#                    zoom_start=7,
#                    geo_data='https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/ca_california_zip_codes_geo.min.json')
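Instead of commenting the failing calls out, the two error conditions can be demonstrated with `try`/`except` so the notebook still runs end to end. The sketch below uses a hypothetical stand-in check, `check_plottable`, that mirrors the two conditions described in this section. Its error messages are illustrative and are not the exact text raised by `vh.plot_map()`.

```python
import pandas as pd


def check_plottable(df, geo_column, available_geos):
    """Raise early if df is empty or shares no geos with the GeoJSON keys."""
    assert len(df) > 0, '`df` must have at least one row to be able to create map'
    if not set(df[geo_column]) & set(available_geos):
        raise ValueError('No geos in `df` were found in the geoJSON')


empty = pd.DataFrame({'destination_zip5': [], 'num_loads': []})
unmatched = pd.DataFrame({'destination_zip5': ['96162', '00000'],
                          'num_loads': [7277, 2225]})

for frame in (empty, unmatched):
    try:
        check_plottable(frame, 'destination_zip5', {'90010', '94110'})
    except (AssertionError, ValueError) as err:
        # Print the error type and message instead of stopping the notebook
        print(type(err).__name__, '-', err)
```

The same `try`/`except` pattern can wrap the real `vh.plot_map(...)` calls above once they are uncommented.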

# Appendix

## Watermark 
For full reproducibility of results, use the exact data extraction defined at the top of the notebook and ensure that the environment matches the following:

In [37]:
# ! pip install watermark
%load_ext watermark
%watermark -v -m --iversions -g

Python implementation: CPython
Python version       : 3.10.4
IPython version      : 8.2.0

Compiler    : Clang 12.0.1 
OS          : Darwin
Release     : 21.1.0
Machine     : arm64
Processor   : arm
CPU cores   : 10
Architecture: 64bit

Git hash: 2d18bae00e50ac83267cb38ce1d89451ac6692e1

json      : 2.0.9
numpy     : 1.22.3
palettable: 3.3.0
folium    : 0.12.1.post1
vishelper : 0.1.2
pandas    : 1.4.2
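If the `watermark` extension is not available, a rough equivalent of the package list above can be produced with the standard library. This is a sketch only and does not reproduce the Git hash or hardware details. The helper name `package_versions` is hypothetical.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print('Python', platform.python_version())
for name, version in package_versions(['numpy', 'pandas', 'folium', 'vishelper']).items():
    print(f'{name:<10}: {version}')
```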

