## Visualizing Spatial Data with Pandas and Bokeh

[Bokeh](http://bokeh.pydata.org/en/latest/) is a Python library for interactive visualization that renders its plots in the browser through JavaScript. It is modeled after D3 but is intended to handle much larger datasets, on the order of millions of data points.

>Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications. ([Bokeh Website](http://bokeh.pydata.org/en/latest/))

The advantage of Bokeh over matplotlib is that its visualizations are interactive: because plots are rendered in the browser with JavaScript, they can be panned, zoomed, and inspected with hover tools.

From the U.K. accident data, we can plot the location of accidents for which latitude and longitude values are provided.

In [1]:
!pip install bokeh=='0.12.16'

Collecting bokeh==0.12.16
  Downloading https://files.pythonhosted.org/packages/cd/47/201408029628164342e65a4552ee00abc79ea7be1b64031281b81b0e2f4d/bokeh-0.12.16.tar.gz (14.7MB)
Building wheels for collected packages: bokeh
  Building wheel for bokeh (setup.py) ... done
  Stored in directory: /home/gastonq/.cache/pip/wheels/ff/28/51/22e8d08e9d5383ee1de981aaa8ff7bc53c7d65022e5101400f
Successfully built bokeh
Installing collected packages: bokeh
  Found existing installation: bokeh 1.0.4
    Uninstalling bokeh-1.0.4:
      Successfully uninstalled bokeh-1.0.4
Successfully installed bokeh-0.12.16


In [3]:
import os
import sqlite3 as sqlite

import numpy as np
import pandas as pd

# Directory that holds the accident CSV file.
DATADIR = os.path.join(os.path.expanduser("~"), "DATA", "Misc")
print(os.path.exists(DATADIR))

True


In [4]:

from bokeh.io import output_notebook

### This enables drawing directly in the notebook

In [5]:
output_notebook()

### Read in the data

In [21]:
# Read the first 5,000 rows, keeping only the columns needed for the map.
data = pd.read_csv(
    os.path.join(DATADIR, "Accidents7904.csv"),
    nrows=5000,
    usecols=["Longitude", "Latitude", "Date", "Number_of_Casualties"],
)  # .dropna()
data.head()

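| | Longitude | Latitude | Number_of_Casualties | Date |
|---|---|---|---|---|
| 0 | NaN | NaN | 1 | 18/01/1979 |
| 1 | NaN | NaN | 1 | 01/01/1979 |
| 2 | NaN | NaN | 3 | 01/01/1979 |
| 3 | NaN | NaN | 2 | 01/01/1979 |
| 4 | NaN | NaN | 1 | 01/01/1979 |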
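
The first rows shown above have no longitude or latitude, so they cannot be placed on a map. A minimal sketch of keeping only the located accidents follows; the frame `toy` and its values are hypothetical stand-ins that reuse the column names from the data, not rows from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the accident data (values are made up).
toy = pd.DataFrame({
    "Longitude": [np.nan, -0.1278, np.nan, -2.2426],
    "Latitude": [np.nan, 51.5074, 53.4808, 53.4808],
    "Number_of_Casualties": [1, 2, 1, 3],
})

# Keep only rows where both coordinates are present.
located = toy.dropna(subset=["Longitude", "Latitude"])
print(len(toy), len(located))
```

Restricting `dropna` to the two coordinate columns avoids discarding rows that are merely missing a value in some unrelated column.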


In [14]:
data.describe()
#data.dtypes

| | Location_Easting_OSGR | Location_Northing_OSGR | Longitude | Latitude | Police_Force | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Day_of_Week | Local_Authority_(District) | ... | Pedestrian_Crossing-Human_Control | Pedestrian_Crossing-Physical_Facilities | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Special_Conditions_at_Site | Carriageway_Hazards | Urban_or_Rural_Area | Did_Police_Officer_Attend_Scene_of_Accident | LSOA_of_Accident_Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 997.0 | 997.0 | 0.0 | 0.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | ... | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 0.0 |
| mean | 294988.335005 | 501388.164493 | NaN | NaN | 1.0 | 2.859 | 1.73 | 1.223 | 3.937 | 30.379 | ... | -0.905 | -0.644 | 2.514 | 6.874 | 1.939 | -1.0 | 0.01 | -1.0 | -1.0 | NaN |
| std | 108430.874548 | 291362.859755 | NaN | NaN | 0.0 | 0.375846 | 0.737322 | 0.565323 | 1.682219 | 80.495583 | ... | 0.293362 | 1.241292 | 1.518588 | 2.309428 | 0.78479 | 0.0 | 0.161015 | 0.0 | 0.0 | NaN |
| min | 44670.0 | 1000.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | -1.0 | -1.0 | 1.0 | 2.0 | 1.0 | -1.0 | -1.0 | -1.0 | -1.0 | NaN |
| 25% | 226570.0 | 240000.0 | NaN | NaN | 1.0 | 3.0 | 1.0 | 1.0 | 3.0 | 8.0 | ... | -1.0 | -1.0 | 1.0 | 8.0 | 1.0 | -1.0 | 0.0 | -1.0 | -1.0 | NaN |
| 50% | 294080.0 | 514000.0 | NaN | NaN | 1.0 | 3.0 | 2.0 | 1.0 | 4.0 | 17.0 | ... | -1.0 | -1.0 | 4.0 | 8.0 | 2.0 | -1.0 | 0.0 | -1.0 | -1.0 | NaN |
| 75% | 360870.0 | 762000.0 | NaN | NaN | 1.0 | 3.0 | 2.0 | 1.0 | 5.0 | 26.0 | ... | -1.0 | -1.0 | 4.0 | 8.0 | 3.0 | -1.0 | 0.0 | -1.0 | -1.0 | NaN |
| max | 572080.0 | 993000.0 | NaN | NaN | 1.0 | 3.0 | 5.0 | 5.0 | 7.0 | 513.0 | ... | 0.0 | 5.0 | 6.0 | 8.0 | 3.0 | -1.0 | 2.0 | -1.0 | -1.0 | NaN |
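
A count of 0.0 for `Longitude` and `Latitude` in the summary above means those columns hold no usable values in the rows that were described. The sketch below shows one way to check this directly; `toy` is a hypothetical miniature table with made-up values, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the accident table (values are made up).
toy = pd.DataFrame({
    "Location_Easting_OSGR": [294080.0, 360870.0, np.nan, 226570.0],
    "Longitude": [np.nan, np.nan, np.nan, np.nan],
    "Number_of_Casualties": [1, 1, 3, 2],
})

# Non-null values per column (the same quantity as the "count" row of describe()).
non_null = toy.count()

# Fraction of missing values per column.
missing_share = toy.isna().mean()

# Columns with no usable values at all.
empty_columns = non_null[non_null == 0].index.tolist()
print(empty_columns)
```

If the coordinate columns turn out to be empty for the rows that were read, a different slice of the file is needed before anything can be plotted.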


#### We can use the [``sample``](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html) method to draw a random subset of rows from a DataFrame

In [24]:
subdata = data.sample(2000)
mean_long = np.mean(subdata['Longitude'])
mean_lat  = np.mean(subdata['Latitude'])
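
Two details of sampling are easy to miss: `sample` raises a `ValueError` when asked for more rows than the frame contains (unless `replace=True`), and each call returns a different subset unless `random_state` is fixed. A minimal sketch under those assumptions follows; `toy`, the seed values, and the coordinate ranges are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates standing in for the accident locations.
rng = np.random.RandomState(0)
toy = pd.DataFrame({
    "Longitude": rng.uniform(-6.0, 2.0, size=500),
    "Latitude": rng.uniform(50.0, 58.0, size=500),
})

# Cap the sample size at the number of available rows.
n = min(2000, len(toy))

# A fixed random_state makes the subset reproducible between runs.
subset = toy.sample(n=n, random_state=42)

# Series.mean() skips NaN values by default.
center_lon = subset["Longitude"].mean()
center_lat = subset["Latitude"].mean()
```

The two means can then serve as the map center, in the same way `mean_long` and `mean_lat` are used in the next cell.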


In [30]:
from bokeh.io import show
from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, Range1d,
    PanTool, WheelZoomTool, BoxSelectTool, HoverTool
)

hover = HoverTool()

# Center the map on the mean coordinates of the sampled accidents.
map_options = GMapOptions(lat=mean_lat,
                          lng=mean_long,
                          map_type="roadmap", zoom=6)

# Google Maps typically requires an API key; if the map fails to load,
# set plot.api_key to your own key.
plot = GMapPlot(
    x_range=Range1d(),
    y_range=Range1d(),
    map_options=map_options
)
plot.title.text = "U.K. Road Accidents"

# Column names used here ("lat", "lon") are referenced by the glyph below.
source = ColumnDataSource(
    data=dict(
        lat=subdata['Latitude'],
        lon=subdata['Longitude'],
    )
)

hover.tooltips.append(('index', '$index'))

# Draw each accident as a small blue circle.
circle = Circle(x="lon", y="lat", size=2,
                fill_color="blue", fill_alpha=0.8,
                line_color=None)
plot.add_glyph(source, circle)

plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), hover)
show(plot)