## Visualizing Spatial Data with Pandas and Bokeh

[Bokeh](http://bokeh.pydata.org/en/latest/) is a relatively new Python visualization library whose plots are rendered in the web browser by JavaScript. It is inspired by D3 and is intended to handle very large datasets, up to millions of data points.

>Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications. ([Bokeh Website](http://bokeh.pydata.org/en/latest/))

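The main advantage of Bokeh over matplotlib for this kind of plot is that its visualizations are interactive: because they are rendered by JavaScript in the browser, readers can pan, zoom, and hover over individual points.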

From the U.K. accident data, we can plot the location of accidents for which latitude and longitude values are provided.
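The reading step below calls `dropna()` on all three columns, which also discards rows whose `Date` is missing. If only the coordinates matter, `dropna(subset=...)` restricts the check to those columns. A minimal sketch on a small, made-up DataFrame (the values are invented for illustration, not taken from the accident file):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: some accidents are missing one or both coordinates
df = pd.DataFrame({
    "Longitude": [-0.27, np.nan, -0.24, -0.38],
    "Latitude": [51.72, 51.70, np.nan, 51.66],
    "Date": ["25/12/1999", "17/12/1999", "15/12/1999", "02/12/1999"],
})

# Drop a row only if Longitude or Latitude is missing; Date is not checked
located = df.dropna(subset=["Longitude", "Latitude"])
print(len(located))  # 2
```

Rows 1 and 2 are dropped because each lacks one coordinate; a missing `Date` alone would not remove a row here.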

In [1]:
```python
import os
import sqlite3 as sqlite

import numpy as np
import pandas as pd

# Folder that holds the accident CSV file
DATADIR = os.path.join(os.path.expanduser("~"), "DATA", "Misc")
print(os.path.exists(DATADIR))
```

```
True
```


In [2]:
```python
from bokeh.io import output_notebook
```

### This enables drawing directly in the notebook

In [3]:
```python
output_notebook()
```

### Read in the data

In [4]:
```python
# Keep only the coordinate and date columns, and drop rows with any missing value
data = pd.read_csv(os.path.join(DATADIR, "Accidents7904.csv"),
                   usecols=["Longitude", "Latitude", "Date"]).dropna()
data
```

|         | Longitude | Latitude  | Date       |
|--------:|----------:|----------:|:-----------|
| 4883216 | -0.271752 | 51.715661 | 25/12/1999 |
| 4883217 | -0.239977 | 51.695136 | 17/12/1999 |
| 4883218 | -0.270037 | 51.715096 | 15/12/1999 |
| 4883219 | -0.263233 | 51.711309 | 02/12/1999 |
| 4883220 | -0.227225 | 51.688200 | 04/12/1999 |
| 4883221 | -0.375451 | 51.690074 | 29/12/1999 |
| 4883222 | -0.279194 | 51.717928 | 17/12/1999 |
| 4883223 | -0.384303 | 51.664928 | 17/12/1999 |
| 4883224 | -0.372406 | 51.674116 | 17/11/1999 |
| 4883225 | -0.241739 | 51.698220 | 23/11/1999 |
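The `Date` column is still plain text in day/month/year order (e.g. `25/12/1999`). A sketch of the same `usecols` and `dropna` pattern on a tiny inline CSV, followed by parsing the dates with an explicit format so that `02/12/1999` becomes 2 December rather than 12 February. The CSV rows are made up, and `Accident_Index` is only a stand-in for an extra column that `usecols` skips:

```python
import io

import pandas as pd

# A tiny inline CSV standing in for Accidents7904.csv (values invented)
csv_text = """Accident_Index,Longitude,Latitude,Date
a1,-0.271752,51.715661,25/12/1999
a2,,51.695136,17/12/1999
a3,-0.263233,51.711309,02/12/1999
"""

data = (
    pd.read_csv(io.StringIO(csv_text),
                usecols=["Longitude", "Latitude", "Date"])
    .dropna()  # row a2 has no Longitude, so it is removed
    # Parse day-first dates explicitly instead of letting pandas guess the order
    .assign(Date=lambda d: pd.to_datetime(d["Date"], format="%d/%m/%Y"))
)
print(data["Date"].dt.month.tolist())  # [12, 12]
print(data["Date"].dt.day.tolist())    # [25, 2]
```

`pd.to_datetime(..., dayfirst=True)` is a looser alternative; an explicit `format` fails loudly on any row that does not match instead of silently reinterpreting it.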


#### We can use the [``sample``](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html) method to draw a random subset of the DataFrame

In [5]:
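```python
# Plot a random 2,000-row subset rather than every accident
subdata = data.sample(2000)
# Centre the map on the mean position of the sampled points
mean_long = subdata['Longitude'].mean()
mean_lat = subdata['Latitude'].mean()
```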
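Because `sample` draws rows at random, the subset and therefore the map centre change slightly every time the cell runs. Passing `random_state` makes the draw reproducible. A minimal sketch on synthetic coordinates (the generated points and the seed `42` are arbitrary assumptions, not the accident data):

```python
import numpy as np
import pandas as pd

# Synthetic points scattered around a made-up centre near London
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Longitude": rng.normal(-0.25, 0.5, size=10_000),
    "Latitude": rng.normal(51.7, 0.5, size=10_000),
})

# The same random_state selects the same rows, so the map centre is stable
sub_a = data.sample(2000, random_state=42)
sub_b = data.sample(2000, random_state=42)
assert sub_a.index.equals(sub_b.index)

# Series.mean() is the pandas equivalent of np.mean on a column
mean_long = sub_a["Longitude"].mean()
mean_lat = sub_a["Latitude"].mean()
print(mean_lat, mean_long)
```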


In [6]:
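```python
from bokeh.io import show
from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, DataRange1d,
    PanTool, WheelZoomTool, BoxSelectTool, HoverTool
)

# Centre the Google map on the mean position of the sampled accidents
map_options = GMapOptions(lat=mean_lat, lng=mean_long,
                          map_type="roadmap", zoom=6)

plot = GMapPlot(
    x_range=DataRange1d(),
    y_range=DataRange1d(),
    map_options=map_options,
    # Recent Bokeh versions require a Google Maps API key; reading it from a
    # GOOGLE_API_KEY environment variable is an assumption, not the original setup
    api_key=os.environ.get("GOOGLE_API_KEY", ""),
)
plot.title.text = "U.K. Road Accidents"

# The "lat"/"lon" column names are referenced by the glyph and the tooltips
source = ColumnDataSource(data=dict(
    lat=subdata['Latitude'],
    lon=subdata['Longitude'],
))

# One small blue dot per accident
circle = Circle(x="lon", y="lat", size=2,
                fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)

# Explicit tooltips; appending to the defaults would list "index" twice
hover = HoverTool(tooltips=[("index", "$index"), ("(lat, lon)", "(@lat, @lon)")])
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), hover)
show(plot)
```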
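`GMapPlot` depends on Google Maps and its API key. An alternative, not used in this notebook, is a tile-based Bokeh figure, which expects coordinates in Web Mercator (EPSG:3857) metres rather than degrees. The spherical Web Mercator projection is `x = R * lon` and `y = R * ln(tan(pi/4 + lat/2))`, with angles in radians and `R = 6378137` m; it is only defined for latitudes up to about ±85.05°. A sketch of that conversion in NumPy (the function name `to_web_mercator` is hypothetical):

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS84 semi-major axis)


def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y


# The origin (0, 0) plus one point from the accident data shown above
x, y = to_web_mercator([0.0, -0.271752], [0.0, 51.715661])
print(x, y)
```

Applied to `subdata`, the returned arrays could be stored as new columns and plotted on a `bokeh.plotting.figure` created with `x_axis_type="mercator"` and `y_axis_type="mercator"`, so the axes still display degrees.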