![TEW, Numpy, Pandas & Matplotlib](assets/tew.png)

## 38 Geographic Choropleth Maps in Python Using Plotly & Pandas 

In this tutorial on Python for data science, you will learn how to create geographic maps in Python. We will create choropleth maps and points maps with Plotly in a Jupyter notebook.

Plotly maps are...
- Interactive - You can hover over areas to see their details, and zoom and pan around the map
- Web friendly
- Easily shareable

Two main object types in Plotly mapping...
- Data object - A list containing a dictionary that specifies each of the parameters for the map's data
- Layout object - A nested dictionary that specifies each of the parameters for the map's layout
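
To make the two object types concrete, here is a minimal sketch of their shapes using plain Python dictionaries. The location codes and values are two rows from the states dataset used below; the title is a placeholder.

```python
# Data object: a list of trace dictionaries, one per layer drawn on the map.
data = [dict(type="choropleth",
             locationmode="USA-states",
             locations=["AL", "AK"],   # which areas to colour
             z=[3.648, 7.887])]        # the value behind each area's colour

# Layout object: a nested dictionary of map-wide settings.
layout = dict(title="Example map",
              geo=dict(scope="usa", projection=dict(type="albers usa")))

# A figure simply bundles the two together.
fig = dict(data=data, layout=layout)

print(type(fig["data"]).__name__)                  # list
print(fig["layout"]["geo"]["projection"]["type"])  # albers usa
```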

Two main types of maps...
- **Choropleth map** - A geographical map where spatial areas are coloured in hues that represent the quantity of a given attribute in each geographic region
    - Requirements...
        - Use the ``locations`` parameter to identify each area by its code (here, the state abbreviations in the "code" column), together with ``locationmode`` (like 'USA-states') to say what kind of code it is
        - A ``z`` parameter holding the value that sets each area's colour
        - Data parameter: type = 'choropleth'
- **Points map** - A geographic map with dots placed at specific spatial locations that represent data points. Dot size, colour or shape can be varied to add a layer of comparative detail to the map
    - Requirements...
        - Precise longitude/latitude (x,y) position data for each observation
        - A "marker" parameter (instead of a z parameter)
        - Data parameter: type = 'scattergeo'
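
The requirements above can be turned into a quick sanity check. This is a hypothetical helper, not part of Plotly, and the key sets only reflect what this tutorial uses, not Plotly's full list of required or optional attributes.

```python
# Keys this tutorial relies on for each map type (an assumption, not Plotly's schema).
TUTORIAL_KEYS = {
    "choropleth": {"locations", "locationmode", "z"},
    "scattergeo": {"lat", "lon", "marker"},
}

def missing_keys(trace):
    """Return the tutorial keys that a trace dictionary does not define."""
    expected = TUTORIAL_KEYS.get(trace.get("type", "").lower(), set())
    return sorted(expected - set(trace))

points = dict(type="scattergeo", lat=[36.47], lon=[-81.80])
print(missing_keys(points))  # ['marker']
```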

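```python
# As always, we first set up our notebook with our imports and settings...
import numpy as np
import pandas as pd

# These imports are for Plotly versions before 4.0, as used in this tutorial.
# In Plotly 4+ the online-plotting modules moved to the separate chart_studio package:
#   import chart_studio.plotly as py
#   import chart_studio.tools as tls
import plotly.plotly as py
import plotly.tools as tls
import plotly.graph_objs as go
```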

To publish plots with Plotly's online service (``py.iplot``), you need to pass in your username and API key from your account on the Plotly website...

```python
# First we need to call the set_credentials_file function from plotly.tools
# (replace the placeholders with your own username and API key)
tls.set_credentials_file(username = 'your_username', api_key = 'your_api_key')
```
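
Pasting an API key straight into a notebook makes it easy to leak when the notebook is shared. A safer pattern, sketched below, is to read the credentials from environment variables. The variable names PLOTLY_USERNAME and PLOTLY_API_KEY and the helper get_plotly_credentials are hypothetical choices, not something Plotly defines.

```python
import os

def get_plotly_credentials(user_var="PLOTLY_USERNAME", key_var="PLOTLY_API_KEY"):
    """Read a Plotly username and API key from environment variables (hypothetical names)."""
    username = os.environ.get(user_var)
    api_key = os.environ.get(key_var)
    if not username or not api_key:
        raise RuntimeError(f"Please set the {user_var} and {key_var} environment variables")
    return username, api_key

# Usage in the notebook (sketch):
# username, api_key = get_plotly_credentials()
# tls.set_credentials_file(username=username, api_key=api_key)
```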

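```python
# Importing our plotting data...
address = 'datasets/states.csv'
states = pd.read_csv(address, sep = '\t')
states.columns = ['code', 'region', 'pop', 'SATV', 'SATM', 'percent', 'dollars', 'pay']
# This dataset abbreviates Connecticut as 'CN'; Plotly's 'USA-states' mode expects the postal code 'CT'
states['code'] = states['code'].replace('CN', 'CT')
states.head()
```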

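| | code | region | pop | SATV | SATM | percent | dollars | pay |
|---|---|---|---|---|---|---|---|---|
| 0 | AL | ESC | 4041 | 470 | 514 | 8 | 3.648 | 27 |
| 1 | AK | PAC | 550 | 438 | 476 | 42 | 7.887 | 43 |
| 2 | AZ | MTN | 3665 | 445 | 497 | 25 | 4.231 | 30 |
| 3 | AR | WSC | 2351 | 470 | 511 | 6 | 3.334 | 23 |
| 4 | CA | PAC | 29760 | 419 | 484 | 45 | 4.826 | 39 |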


**NOTE:** Even though we are creating a map here, the choropleth map does not need latitude and longitude values, because each state is identified by its code. We will need them for the points map, though (see below).
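
Since the choropleth matches each row to a state by its two-letter code rather than by coordinates, a code Plotly does not recognise simply leaves that state uncoloured. This dataset, for example, abbreviates Connecticut as CN instead of the postal code CT. A quick check against the standard postal abbreviations catches such mismatches; unrecognised_codes is a hypothetical helper, and the sketch assumes Plotly's 'USA-states' mode expects those standard codes.

```python
# The 50 US states plus DC, as two-letter postal abbreviations.
US_STATE_CODES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
    "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
    "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
    "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}

def unrecognised_codes(codes):
    """Return the codes (sorted, without duplicates) that are not US postal abbreviations."""
    return sorted(set(codes) - US_STATE_CODES)

print(unrecognised_codes(["AL", "CN", "TX"]))  # ['CN']
# With the notebook's DataFrame: unrecognised_codes(states['code'])
```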

## Generating choropleth maps
We're going to attach a new column called ``text`` to the df. It holds the details about each state that appear when we hover over that state with the mouse...

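```python
# Hover text: "<br>" starts a new line inside the hover label
states['text'] = 'SATv ' + states['SATV'].astype(str) + ' ' + 'SATm ' + states['SATM'].astype(str) + '<br>' + \
    'State ' + states['code']

# An example custom colour scale: [fraction of the z range, colour] pairs, light to dark purple
scl = [[0.0, 'rgb(242,240,247)'], [0.5, 'rgb(158,154,200)'], [1.0, 'rgb(84,39,143)']]

data = [dict(type = 'choropleth', autocolorscale = False, locations = states['code'], z = states['dollars'],
             locationmode = 'USA-states', text = states['text'], colorscale = scl,
             colorbar = dict(title = 'Thousand Dollars'))]
data # This is all the data that we are going to plot out
```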

```
[{'type': 'choropleth', 'autocolorscale': False, 'locations': 0     AL
  1     AK
  2     AZ
  3     AR
  4     CA
  ...
  50    WY
  Name: code, dtype: object, 'z': 0     3.648
  1     7.887
  2     4.231
  3     3.334
  4     4.826
  ...
```

``states['text']`` - Creates a new column holding the details about each state that are shown when we hover over it<br>
``SATv`` - Label for the SATV information that will be displayed<br>
``states['SATV']`` - The column that supplies the value for that label<br>
``astype(str)`` - Converts the numbers to strings so they can be joined to the text labels with ``+``<br>
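
To see what the concatenation produces, here is the same expression applied to a two-row stand-in for the states DataFrame, with values copied from the states.head() output above. Like the rest of the tutorial, it assumes pandas is installed.

```python
import pandas as pd

# A two-row stand-in for the states DataFrame (values from states.head()).
demo = pd.DataFrame({"code": ["AL", "AK"], "SATV": [470, 438], "SATM": [514, 476]})

# astype(str) is needed because "+" cannot join a string to an integer column;
# "<br>" is the HTML line break that Plotly renders inside hover labels.
demo["text"] = ("SATv " + demo["SATV"].astype(str) + " " + "SATm " + demo["SATM"].astype(str)
                + "<br>" + "State " + demo["code"])

print(demo["text"].tolist())
# ['SATv 470 SATm 514<br>State AL', 'SATv 438 SATm 476<br>State AK']
```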
***
``type = 'choropleth'`` - This is how we specify that we want to generate a choropleth map<br>
``autocolorscale = False`` - Turns off Plotly's automatic colour scale so that we can supply our own<br>
``locations = states['code']`` - The ``code`` column of the ``states`` df, which holds the two-letter state abbreviations<br>
``z = states['dollars']`` - The value that each state's colour represents, as shown on the map's colour bar<br>
``locationmode = 'USA-states'`` - Tells Plotly to read ``locations`` as US state abbreviations (the map's extent is set separately by ``scope`` in the layout)<br>
``text = states['text']`` - The hover text for each state, taken from the new column we added to the df<br>
``colorscale`` - The colour scale used for the states and the colour bar; with ``autocolorscale`` off, this is either a named Plotly scale or a custom list of [fraction, colour] pairs
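
With ``autocolorscale`` switched off, each state's z value is placed between the smallest and largest z (by default Plotly takes these from the data) and coloured from the scale. The sketch below approximates that lookup for a custom scale written as [fraction, 'rgb(r,g,b)'] pairs. parse_rgb and colour_for are hypothetical helpers, and Plotly's own interpolation may differ in detail.

```python
def parse_rgb(colour):
    """Turn 'rgb(r,g,b)' into a tuple of three ints."""
    inside = colour[colour.index("(") + 1 : colour.index(")")]
    return tuple(int(part) for part in inside.split(","))

def colour_for(z, zmin, zmax, scale):
    """Linearly map z onto a [[fraction, 'rgb(...)'], ...] scale (an approximation of Plotly)."""
    t = 0.0 if zmax == zmin else (z - zmin) / (zmax - zmin)
    t = min(max(t, 0.0), 1.0)  # values outside [zmin, zmax] get the end colours
    for (f0, c0), (f1, c1) in zip(scale, scale[1:]):
        if f0 <= t <= f1:
            w = 0.0 if f1 == f0 else (t - f0) / (f1 - f0)
            lo, hi = parse_rgb(c0), parse_rgb(c1)
            r, g, b = (round(x + w * (y - x)) for x, y in zip(lo, hi))
            return f"rgb({r}, {g}, {b})"
    return scale[-1][1]

grey = [[0.0, "rgb(0,0,0)"], [1.0, "rgb(255,255,255)"]]
print(colour_for(2.5, 0, 10, grey))  # rgb(64, 64, 64)
```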

```python
# The layout object (nested dictionary)
layout = dict(title = 'State Spending on Public Education ($k/Student)',
              geo = dict(scope = 'usa', projection = dict(type = 'albers usa'),
                         showlakes = True, lakecolor = 'rgb(65,165,245)'))
layout # To see what information our layout object has
```

```
{'title': 'State Spending on Public Education ($k/Student)',
 'geo': {'scope': 'usa',
  'projection': {'type': 'albers usa'},
  'showlakes': True,
  'lakecolor': 'rgb(65,165,245)'}}
```
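
Both maps in this tutorial use the same scope and projection, so one way to avoid retyping the nested ``geo`` dictionary is a small helper. us_geo_layout is hypothetical; called as below, it produces the same dictionary as the layout printed above.

```python
def us_geo_layout(title, **geo_options):
    """Hypothetical helper: a layout dict for a US map; extra keywords are added to 'geo'."""
    geo = dict(scope="usa", projection=dict(type="albers usa"))
    geo.update(geo_options)  # e.g. showlakes=True, lakecolor='rgb(65,165,245)'
    return dict(title=title, geo=geo)

layout = us_geo_layout("State Spending on Public Education ($k/Student)",
                       showlakes=True, lakecolor="rgb(65,165,245)")
print(layout)
```

The points-map layout could then be written as ``us_geo_layout('NOAA Weather Station Elevations', showland=True, landcolor='rgb(250,250,250)')`` plus its border settings.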

```python
# New dict that contains both the data and layout objects
fig = dict(data = data, layout = layout)
py.iplot(fig, filename = 'First-Choropleth-Map')
```

## Generating Points Maps
Generating points maps is very similar to generating choropleth maps, but some of the parameters are different, so to keep things brief we are only going to go over what differs.
As mentioned above, a points map needs a precise longitude and latitude (x,y position) for each observation. A list of state abbreviations won't work, so we need a new dataset...

```python
# Importing our plotting data...
address = 'datasets/snow_inventory.csv'
# This is a huge dataset from every weather station in the USA
snow = pd.read_csv(address, sep = '\t')
snow.columns = ['stn_id', 'lat', 'long', 'elev', 'code']

# ...because it is so big, we are going to create a random sample of 200 data points and use this for our points map
snow_sample = snow.sample(n = 200, random_state = 25, axis = 0)
snow_sample.head()
```

| | stn_id | lat | long | elev | code |
|---|---|---|---|---|---|
| 4479 | USC00406292 | 36.4739 | -81.8033 | 736.1 | TN |
| 2678 | USC00237398 | 38.6856 | -90.5231 | 137.2 | MO |
| 2902 | USC00252820 | 40.0739 | -97.1669 | 411.5 | NE |
| 2128 | USC00170833 | 46.4283 | -67.8442 | 128.0 | ME |
| 2138 | USC00172878 | 47.2386 | -68.6136 | 185.9 | ME |
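
The ``random_state`` argument is what makes the 200-row sample repeatable: the same seed returns the same rows every time the notebook runs. A quick demonstration on a made-up stand-in DataFrame, assuming pandas as elsewhere in the tutorial:

```python
import pandas as pd

# A made-up stand-in for the large snow DataFrame.
stations = pd.DataFrame({"stn_id": [f"STN{i:05d}" for i in range(5000)]})

first = stations.sample(n=200, random_state=25, axis=0)
again = stations.sample(n=200, random_state=25, axis=0)

print(first.equals(again))    # True: same seed, same rows in the same order
print(len(first))             # 200
print(first.index.is_unique)  # True: sampling is without replacement by default
```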


```python
# First we create our data dictionary for our plot...
data = [dict(type = 'scattergeo', locationmode = 'USA-states', lon = snow_sample['long'], lat = snow_sample['lat'],
             marker = dict(size = 12, autocolorscale = False, colorscale = 'Viridis', color = snow_sample['elev'],
                           colorbar = dict(title = 'Elevation (m)')))]

# Now we create the layout dictionary...
# (the colour bar is configured inside marker above, so the layout needs no colorbar entry)
layout = dict(title = 'NOAA Weather Station Elevations', geo = dict(scope = 'usa',
              projection = dict(type = 'albers usa'), showland = True, landcolor = 'rgb(250,250,250)',
              subunitcolor = 'rgb(217,217,217)', countrycolor = 'rgb(217,217,217)', countrywidth = 0.5,
              subunitwidth = 0.5))

fig = dict(data = data, layout = layout)
py.iplot(fig, filename = 'D3-Elevation')
```

``lon = snow_sample['long']`` - The longitude of each point, taken from the ``long`` column of the snow_sample df (note that Plotly's parameter is ``lon``, not ``long``)<br>
``lat = snow_sample['lat']`` - Likewise for latitude
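
Because a points map places each dot at its exact coordinates, a missing or out-of-range latitude or longitude is worth catching before plotting. A minimal sketch: bad_coordinate_rows is a hypothetical helper that only checks the valid ranges (latitude -90 to 90, longitude -180 to 180), not whether a point actually falls inside the USA.

```python
import math

def bad_coordinate_rows(lats, lons):
    """Return the positions where a lat/lon pair is missing or outside the valid range."""
    bad = []
    for i, (lat, lon) in enumerate(zip(lats, lons)):
        if (lat is None or lon is None or math.isnan(lat) or math.isnan(lon)
                or not -90 <= lat <= 90 or not -180 <= lon <= 180):
            bad.append(i)
    return bad

print(bad_coordinate_rows([36.4739, 95.0, float("nan")], [-81.8033, -90.5231, -97.1669]))  # [1, 2]
# With the notebook's data: bad_coordinate_rows(snow_sample['lat'], snow_sample['long'])
```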