<img src='../../img/anaconda-logo.png' align='left' style="padding:10px">
<br>
*Copyright Continuum 2012-2016 All Rights Reserved.*

# Bokeh Exercise Solution: Maps and Tiles: NYC Crime 

You've recently obtained NYC crime data from <a href="https://nycopendata.socrata.com/" target="_blank">NYC Open Data</a>. Your job is to load the first 100K records, reproject them, and then visualize a small sample of 2,500 records while considering how browser rendering is affected by glyph count.

This exercise will challenge your skills with Python, Pandas, PyProj, and Bokeh, while highlighting issues with over-saturation when plotting many points.

## Table of Contents
* [Bokeh Exercise Solution: Maps and Tiles: NYC Crime](#Bokeh-Exercise-Solution:-Maps-and-Tiles:-NYC-Crime)
	* [Set-Up](#Set-Up)
* [Solutions](#Solutions)
	* [1. Load Data](#1.-Load-Data)
	* [2. Change to Categorical](#2.-Change-to-Categorical)
	* [3. Clean Up](#3.-Clean-Up)
	* [4. Use Projections](#4.-Use-Projections)
	* [5. Reproject coordinates](#5.-Reproject-coordinates)
	* [6. Map Categories to Colors](#6.-Map-Categories-to-Colors)
	* [7. Customize the Figure](#7.-Customize-the-Figure)
	* [8. Sample the Data](#8.-Sample-the-Data)
	* [9. Display Data with Glyphs](#9.-Display-Data-with-Glyphs)
	* [10. Plotting Categories](#10.-Plotting-Categories)
	* [11. Plot by Time Group](#11.-Plot-by-Time-Group)
	* [12. Display points using Bokeh WebGL](#12.-Display-points-using-Bokeh-WebGL)


## Set-Up

The Python imports you need are:

In [None]:
import pandas as pd

from bokeh.models import Range1d, ColumnDataSource
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.tile_providers import STAMEN_TONER

from pyproj import transform, Proj

output_notebook()

# Solutions

## 1. Load Data

Load the first 100K rows from the file `../../data/Datashader/nyc_crime.csv` into a `pandas.DataFrame` with the variable name `df`.

Tips: 
- pandas is great for loading CSV data
- `usecols` loads only the columns you need.  In this case, load the `Offense`, `XCoordinate`, `YCoordinate`, `Occurrence Hour`, and `Location 1` columns.
- `chunksize` is another helpful feature in pandas to limit the number of rows you load into memory.

In [None]:
chunks = pd.read_csv('../../data/Datashader/nyc_crime.csv',
                     usecols=['Offense','XCoordinate','YCoordinate', 'Location 1', 'Occurrence Hour'],
                     chunksize=100000)  # chunksize must be an integer; 1e5 is a float
df = chunks.get_chunk()
print(df)
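
To see how `chunksize` behaves without the full dataset, here is a minimal sketch that reads a tiny in-memory CSV. The sample rows, the `Extra` column, and the `csv_text` name are made up for illustration; only the `Offense` and `Occurrence Hour` column names come from the exercise.

```python
import io

import pandas as pd

# Hypothetical sample rows; only some column names match the exercise file.
csv_text = """Offense,XCoordinate,YCoordinate,Occurrence Hour,Extra
ROBBERY,1000,2000,1,a
RAPE,1001,2001,13,b
ROBBERY,1002,2002,22,c
FELONY ASSAULT,1003,2003,8,d
ROBBERY,1004,2004,17,e
"""

# With chunksize set, read_csv returns an iterator instead of a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text),
                     usecols=['Offense', 'Occurrence Hour'],
                     chunksize=3)

# get_chunk() parses only the next `chunksize` rows.
first = reader.get_chunk()
print(first.shape)
```

Only the first three rows and the two requested columns are parsed; the rest of the input stays unread until the next `get_chunk()` call.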

## 2. Change to Categorical

Change the `Offense` column to a categorical column and check how the change affects `memory_usage`:
- use `.astype` to define `Offense` as a categorical field
- use `pd.cut` to classify `Occurrence Hour` into `time_of_day` categories
- check `memory_usage` to see how categorical fields affect memory usage

In [None]:
print('Without Categoricals')
print(df.memory_usage())

df['Offense'] = df['Offense'].astype('category')
df['time_of_day'] = pd.cut(df['Occurrence Hour'], 4, labels=["early_morning", "morning", "afternoon", "night"])

print('\nWith Categoricals')
print(df.memory_usage())
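
The effect is easier to see on a small synthetic column. The sketch below uses made-up data (three labels repeated 10,000 times, and the hours 0 to 23) and passes `deep=True`, which is an addition to the solution above, so that the memory held by the strings themselves is included.

```python
import pandas as pd

# Hypothetical data: 3 distinct offense labels repeated many times.
offense = pd.Series(['FELONY ASSAULT', 'ROBBERY', 'RAPE'] * 10000)

# deep=True includes the memory held by the strings themselves.
object_bytes = offense.memory_usage(deep=True)
category_bytes = offense.astype('category').memory_usage(deep=True)
print(object_bytes, category_bytes)

# pd.cut with an integer splits the observed range into equal-width bins.
hours = pd.Series(range(24))
time_of_day = pd.cut(hours, 4,
                     labels=['early_morning', 'morning', 'afternoon', 'night'])
print(time_of_day.value_counts(sort=False))
```

Because the bins are equal-width over the observed range, each of the four labels covers six consecutive hours when every hour from 0 to 23 is present.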

## 3. Clean Up

Clean up the `Location 1` field and create two new columns named `lat` (latitude) and `lon` (longitude).

In [None]:
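# Strip the parentheses (regex=True is required in recent pandas), then split on the comma
df['lat'], df['lon'] = zip(*df['Location 1'].str.replace('[()]', '', regex=True).str.split(','))
# Store the coordinates as numbers rather than strings
df[['lat', 'lon']] = df[['lat', 'lon']].astype(float)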
print(df)
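
An alternative way to parse the field is a single regular expression. This is a sketch under the assumption that each value looks like `(lat, lon)`; the two sample values and the `pattern` name are illustrative, not taken from the dataset.

```python
import pandas as pd

# Hypothetical values in an assumed "(lat, lon)" text layout.
location = pd.Series(['(40.7128, -74.0060)', '(40.7831, -73.9712)'])

# Capture the two numbers, ignoring parentheses and surrounding whitespace.
pattern = r'\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\)'
parts = location.str.extract(pattern).astype(float)
parts.columns = ['lat', 'lon']
print(parts)
```

Rows that do not match the pattern come back as `NaN` instead of raising, which can be convenient when some records have no location.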

## 4. Use Projections

Define input ("EPSG:4326") and output ("EPSG:3857") projections using EPSG codes.

In [None]:
input_proj  = Proj(init="EPSG:4326")
output_proj = Proj(init="EPSG:3857")

## 5. Reproject coordinates

Reproject the coordinates from longitude/latitude to Web Mercator:
- use `pyproj.transform`
- loops are slow, so pass whole arrays of x/y values instead
- get the projected extent for NYC (min_lon=-74.15, max_lon=-73.75, min_lat=40.68, max_lat=40.84)

In [None]:
df['x'], df['y'] = transform(input_proj, output_proj, df.lon.values, df.lat.values)
extent_xs, extent_ys = transform(input_proj, output_proj, [-74.15, -73.75],[40.68, 40.84])
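
For intuition about what this reprojection does, the spherical Web Mercator formulas behind EPSG:3857 can be written by hand. This is a sketch for sanity-checking values, not a replacement for pyproj; the function name `lonlat_to_web_mercator` is made up for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Project the two corners of the NYC extent given in the exercise.
x_min, y_min = lonlat_to_web_mercator(-74.15, 40.68)
x_max, y_max = lonlat_to_web_mercator(-73.75, 40.84)
print(x_min, y_min, x_max, y_max)
```

The results are in meters, which is why the figure ranges end up in the millions rather than in degrees.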

## 6. Map Categories to Colors

Create a dictionary called `cat_colors` which maps `Offense` categories to colors.

In [None]:
categs = ['FELONY ASSAULT', 'ROBBERY', 'RAPE']
colors = ['aqua', 'lime', 'purple']
cat_colors = dict(zip(categs, colors))
print(cat_colors)
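
One way such a dictionary can be applied is with `Series.map`. In this sketch the `BURGLARY` value and the `gray` fallback color are assumptions added for illustration, showing what happens to a category that has no entry in the dictionary.

```python
import pandas as pd

cat_colors = {'FELONY ASSAULT': 'aqua', 'ROBBERY': 'lime', 'RAPE': 'purple'}

# Hypothetical offenses, including one category that has no assigned color.
offenses = pd.Series(['ROBBERY', 'RAPE', 'BURGLARY', 'FELONY ASSAULT'])

# map() looks each value up in the dict; missing keys become NaN,
# so fillna() supplies a fallback color for unmapped categories.
point_colors = offenses.map(cat_colors).fillna('gray')
print(point_colors.tolist())
```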

## 7. Customize the Figure

Create a function named `create_figure()` which:
- accepts a `webgl` parameter with default `False`
- returns a Bokeh `Figure` with `background_fill_color` set to `black`
- sets `x_range`/`y_range` to the projected extent of NYC computed in step 5 (`extent_xs`, `extent_ys`)
- sets figure `grid.grid_line_alpha` to 0
- sets figure `axis.visible` to False

*Note: for certain glyph types, Bokeh supports WebGL*

In [None]:
def create_figure(webgl=False):
    fig = figure(plot_width=900,
                 plot_height=700,
                 x_range=extent_xs,
                 y_range=extent_ys,
                 background_fill_color='black',
                 webgl=webgl)
    fig.grid.grid_line_alpha = 0
    # fig.grid.visible = False
    fig.axis.visible = False
    return fig

## 8. Sample the Data

Create a new dataframe called `smaller_df` which is a `sample` of 2500 incidents.

In [None]:
smaller_df = df.sample(n=2500)
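
`sample` draws different rows on every run. If a repeatable plot is wanted, a seed can be passed through `random_state`; the sketch below uses a made-up `demo_df` in place of the real data, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical frame standing in for the 100K-row `df`.
demo_df = pd.DataFrame({'x': range(100000)})

# The same random_state yields the same rows on every run.
sample_a = demo_df.sample(n=2500, random_state=42)
sample_b = demo_df.sample(n=2500, random_state=42)
print(len(sample_a), sample_a.index.equals(sample_b.index))
```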

## 9. Display Data with Glyphs

Using the `create_figure` function you defined above, generate a plot which displays the contents of `smaller_df` as `lime`-colored circle glyphs of size four.
- use a `ColumnDataSource` to wrap `smaller_df` and populate glyphs as `circle(source=your_column_datasource)`
- add `STAMEN_TONER` tiles

**Note:** to do this will require creating a `ColumnDataSource`.

In [None]:
data_source = ColumnDataSource(smaller_df)
fig = create_figure()
fig.add_tile(STAMEN_TONER, alpha=.3)
fig.circle(x='x', y='y', color='lime', source=data_source, alpha=.5, size=4)
show(fig)

## 10. Plotting Categories

Create a plot which adds a circle glyph layer for each offense category in the `smaller_df` dataframe
 - use a different color for each category corresponding to those in the `cat_colors` variable
 - add a legend using the `legend` property of the circle glyph

**Note:** to do this will require creating a `ColumnDataSource`.

In [None]:
fig = create_figure()
fig.add_tile(STAMEN_TONER, alpha=.3)
for cat, color in cat_colors.items():
    fig.circle(x='x',
               y='y',
               color=color,
               source=ColumnDataSource(smaller_df[smaller_df.Offense == cat]),
               alpha=.5,
               size=4,
               legend=cat)
show(fig)

## 11. Plot by Time Group

Create a plot which displays offenses by time of day, grouped by `early_morning`, `morning`, `afternoon`, and `night`
 - instead of adding a different glyph layer for each group, store each point's color in a `colors` column
 - consider adding an `alpha` property to `fig.circle()`

In [None]:
# Map each time_of_day category to a color; rename_categories returns a new categorical
df['colors'] = df.time_of_day.cat.rename_categories(['aqua', 'greenyellow', 'orange', 'hotpink'])
smaller_df = df.sample(n=2500)
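
Categorical labels can be swapped for colors without re-binning the data. Here is a minimal sketch with pandas' `rename_categories` and an explicit label-to-color dictionary; the four sample hours and the explicit bin edges are made up for illustration.

```python
import pandas as pd

# Hypothetical hours with explicit bin edges chosen for illustration.
hours = pd.Series([1, 7, 13, 22])
time_of_day = pd.cut(hours, bins=[-1, 5, 11, 17, 23],
                     labels=['early_morning', 'morning', 'afternoon', 'night'])

# rename_categories returns a new categorical; the original is left unchanged.
colors = time_of_day.cat.rename_categories(
    {'early_morning': 'aqua', 'morning': 'greenyellow',
     'afternoon': 'orange', 'night': 'hotpink'})
print(colors.tolist())
```

Passing a dictionary keeps each label tied to its color by name rather than by position in a list.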

In [None]:
fig = create_figure()
fig.add_tile(STAMEN_TONER, alpha=.3)
fig.circle(x='x',
           y='y',
           color='colors',
           source=ColumnDataSource(smaller_df),
           alpha=1,
           line_color=None,
           size=4)
show(fig)

## 12. Display points using Bokeh WebGL

Create the figure with `webgl=True` and plot every row of `df` instead of the 2,500-row sample.

In [None]:
fig = create_figure(webgl=True)
fig.circle(x='x',
           y='y',
           color='colors',
           source=ColumnDataSource(df),
           alpha=.2,
           line_color=None,
           size=4)
show(fig)
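
A low `alpha` delays over-saturation but does not remove it. Assuming the renderer uses standard "over" alpha blending, n overlapping points with the same alpha combine to an opacity of 1 - (1 - alpha)^n. The sketch below evaluates that formula; the function name `stacked_opacity` is made up for illustration.

```python
def stacked_opacity(alpha, n_points):
    """Opacity of n overlapping points under standard 'over' compositing."""
    return 1 - (1 - alpha) ** n_points


# How quickly alpha=0.2 points saturate as they pile up on the same pixel.
for n in (1, 5, 10, 20):
    print(n, round(stacked_opacity(0.2, n), 3))
```

With `alpha=.2`, about ten overlapping points are already close to 90% opaque, so dense areas of a 100K-point plot look nearly solid.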
