# Interactive Plotting w/ Bokeh

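Today we'll cover some more advanced techniques for interactive plots: checking a file's character encoding, documenting functions with docstrings, and plotting crime data on a Google Maps layer with Bokeh.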

In [None]:
import pandas as pd
import chardet
from config import gmaps_key
from pathlib import Path
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, gmap
from bokeh import events
from bokeh.models import CustomJS, Div, Button, GMapOptions, Dropdown, ColumnDataSource, HoverTool
from bokeh.layouts import column, row
output_notebook()

data_path = Path.cwd() / 'data/boston_crime.csv'

## Character Encodings
Depending on the application that created a CSV and where it was created, the file may use different character encodings. For example, older Excel versions saved files in Windows-specific encodings such as the one Python calls 'cp1252', and other regions use encodings suited to their local languages. This matters less in modern work, since most new data is UTF-8, but you will still run into non-standard encodings every so often.

[Here are the encodings read_csv supports.](https://docs.python.org/3/library/codecs.html#standard-encodings)
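
To see why the encoding matters, here is a minimal sketch using only the standard library. The sample string is made up for illustration and does not come from the dataset. It shows the two ways a wrong guess can go: a hard error, or silently garbled text.

```python
# Illustrative only: the sample string is invented, not taken from the dataset.
text = "Café"

# The bytes an older Windows tool might have written.
raw_cp1252 = text.encode("cp1252")

# Reading those bytes as UTF-8 fails outright.
try:
    raw_cp1252.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# The reverse mistake fails silently and yields garbled text ("mojibake").
mojibake = text.encode("utf-8").decode("cp1252")

print(utf8_failed)
print(mojibake)
```

The silent case is the more dangerous one, because nothing tells you the text was decoded with the wrong codec.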

The easiest way to know which encoding you're dealing with is to have the person providing the data save it as UTF-8. When that's not possible, there's chardet, a Python library that uses statistical heuristics to guess a file's encoding. It is slow on large files and isn't always accurate, but it's usually worth a try.

I don't recommend running the code chunk below. It reads the entire file before guessing, and this dataset is enormous, so it takes a very long time. The output does state that the encoding is 'latin-1', which is correct.

In [None]:
def detect_encoding(data_path):
    """
    detect_encoding()
    Takes in a Path object and prints the predicted encoding and confidence.
    
    Gets: data_path, a Path object
    Returns: nothing
    """
    with open(data_path, 'rb') as read_file:
        print(chardet.detect(read_file.read()))
        
        
detect_encoding(data_path)
# results in 'latin-1' but takes forever
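
When a full detection pass is too slow, one cheap alternative is to try decoding a sample of bytes with a short list of likely codecs. The sketch below is only an assumption about how that could look: `first_matching_encoding` and its candidate list are hypothetical names chosen for illustration, not part of chardet or Pandas.

```python
def first_matching_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first codec in candidates that decodes raw without error.

    Gets: raw, a bytes object; candidates, codec names in priority order
    Returns: the matching codec name, or None if every candidate fails
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical samples, not taken from the dataset.
print(first_matching_encoding("café".encode("utf-8")))
print(first_matching_encoding(b"caf\xe9"))
print(first_matching_encoding(b"\x81"))
```

Order matters here: 'latin-1' maps every possible byte to a character, so it never fails and has to come last. A successful decode also only means the bytes are valid in that codec, not that the guess is right, so it is still worth looking at a few decoded rows.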

## Comments
The best way to document functions in Python is, in my opinion and per PEP 8's recommendation, to use docstrings. The triple-quoted strings at the top of the functions in this notebook are examples of docstrings.

For more on docstrings, [see here](https://www.python.org/dev/peps/pep-0257/).
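
As a small illustration of why docstrings beat ordinary comments, the sketch below shows that a docstring stays attached to the function and can be read back at runtime. The `clip` function is a made-up example, not part of this notebook's pipeline.

```python
import inspect


def clip(value, low, high):
    """Limit value to the closed interval [low, high].

    Gets: value, low, high, comparable numbers
    Returns: low if value < low, high if value > high, else value
    """
    return max(low, min(value, high))


# inspect.getdoc returns the docstring with its indentation cleaned up.
summary = inspect.getdoc(clip).splitlines()[0]
print(summary)
```

In a notebook, `help(clip)` or `clip?` displays the same text, which a `#` comment inside the function body would not.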

In [None]:
def import_data(data_path):
    """
    import_data(data_path)
    Receives a Path object and uses that to read in a csv
    and return it. Currently hardcoding encoding because this
    will only be used for one csv.
    
    Gets: data_path, a Path object
    Returns: a Pandas DataFrame
    """
    return pd.read_csv(data_path, encoding='latin-1')
    
    
df = import_data(data_path)

In [None]:
print(df.shape)
print(df.head())

In [None]:
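# Drop rows with missing coordinates.
df = df[df['Lat'].notnull() & df['Long'].notnull()]
# Keep only points inside a rough bounding box around Boston.
df = df[(df['Lat'] > 41) & (df['Lat'] < 43)]
df = df[(df['Long'] > -73) & (df['Long'] < -69)]
# Keep a random 5% sample of the remaining rows.
df = df.sample(frac=.05, axis='index')
print(df.shape)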

In [None]:
source = ColumnDataSource(df)

map_options = GMapOptions(lat=42.359955, lng=-71.059886, map_type="roadmap", zoom=11)
tooltips = [
    ("Date", "@OCCURRED_ON_DATE"),
    ("Offense Description", "@OFFENSE_DESCRIPTION"),
]


p = gmap(gmaps_key, title="Boston Crime", map_options=map_options, tools="box_select")
p.circle('Long', 'Lat', size=2, fill_alpha=0.6, line_color=None, source=source)
div = Div(width=400)
layout = row(p, div)
p.add_tools(HoverTool(tooltips=tooltips))

p.js_on_event(events.SelectionGeometry, CustomJS(args=dict(div=div), code="""
div.text = "Selection! <p> <p>" + JSON.stringify(cb_obj.geometry, undefined, 2);
"""))

show(layout)