<h1>Getting started in Bokeh!</h1>

I hope that "Exploring Measles with interactive plots" ignited your curiosity about the Bokeh Python library. In this notebook I'm going to cover the basics of creating a Bokeh plot and adding inspectors that allow us to interact with the plot. By the end of this tutorial we will have built a line plot that shows the per-capita incidence of measles in New York for the year 1935. You will be able to hover over each point and see how many cases occurred in that week.

<h2>The basics: Bokeh layers</h2>

Each Bokeh plot can be thought of as consisting of layers:
* The very first layer is the 'figure'. Think of this as the base of your plot, the area in which it will sit. Every other feature on the plot is added through this figure object.
* On top of the figure we add glyphs. Glyphs are the visual components of a plot that represent our data. They are linked to a data source and can be points, lines, or various other shapes.
* We can then add styles to our plot to customise its look.
* Finally, we can add a layer of inspectors. These are interactive features that let us hover over the plot and see more data for each of the plotted points.
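To make the four layers concrete, here is a minimal sketch that builds each one in turn. It is not part of the original tutorial: the data, colour and title are placeholders, and it assumes a reasonably recent Bokeh that accepts `width`/`height` and the `scatter` glyph method. Nothing is displayed because `show` is not called.

```python
from bokeh.models import HoverTool
from bokeh.plotting import figure

# Layer 1: the figure, the base that every other feature is attached to
p = figure(width=400, height=300, title="Layers sketch")

# Layer 2: glyphs; each glyph method returns a renderer linked to its data
points = p.scatter(x=[1, 2, 3], y=[4, 6, 5], size=8)
trend = p.line(x=[1, 2, 3], y=[4, 6, 5])

# Layer 3: styles, set as attributes on the glyphs and on the figure
points.glyph.fill_color = "orange"
p.title.text_font_size = "14pt"

# Layer 4: an inspector (a hover tool) reporting the cursor position
p.add_tools(HoverTool(tooltips=[("x", "$x"), ("y", "$y")]))
```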

In [1]:
#Import Bokeh
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In [2]:
# Widen the notebook container to 80% of the browser window
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))

We import `figure` from `bokeh.plotting` above, and also `output_notebook` and `show` from `bokeh.io`; these two allow us to view our Bokeh outputs in Jupyter Notebooks.

In [4]:
output_notebook()

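The `output_notebook` call above tells Bokeh to display plots inline in the notebook. It also loads BokehJS and prints a message confirming that it has loaded successfully.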

Below I'm going to get us started with a simple plot of 5 data points. Notice how we start with a figure object, where we define figure-level settings such as the height, width, and title, and then we add a glyph layer using the `circle` method. It is within this method call that we define the data we want to plot and all visual aspects of that glyph.

In [5]:
plot = figure(height=400, width=600, title="My first line plot", toolbar_location = None)
plot.circle(x = [1,2,3,4,5], y = [2,3,5,7,8], size = 8, fill_color = "blue", fill_alpha = 0.5, line_alpha = 0.5)

In [6]:
show(plot)

Let's play with the visual elements of the plot above. Make the following changes to the code below and run the two cells:
* Make the plot wider
* Add an additional data point with the coordinates (8, 5)
* Change the fill colour to "red"
* Add an `x_range` argument to the figure definition and set it to a tuple, e.g. (0,10) for a range of 0 to 10; this fixes the x-axis to that range

In [19]:
plot = figure(height=400, width=___, title="My first line plot", x_range= ___, toolbar_location = None)
plot.circle(x = [1,2,3,4,5,_], y = [2,3,5,7,8,_], size = 8, fill_color = "___", fill_alpha = 0.5, line_alpha = 0.5)

In [20]:
show(plot)
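Behind the scenes, Bokeh turns an `x_range` tuple into a range model. A minimal sketch, assuming Bokeh's `Range1d` model and arbitrary limits of -5 to 15 (chosen so as not to give away the exercise), showing that the range can also be changed after the figure is created:

```python
from bokeh.models import Range1d
from bokeh.plotting import figure

# A (start, end) tuple passed as x_range is converted into a Range1d model
p = figure(width=600, height=400, x_range=(-5, 15))
print(type(p.x_range).__name__, p.x_range.start, p.x_range.end)

# The range is an ordinary model, so its limits can be edited later
p.x_range.end = 20
```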

<h2>Add axis title</h2>

We can add axis titles by setting the `axis_label` attribute of the figure's `xaxis` and `yaxis`. Fill in the empty strings in the code below to add axis titles to our plot:

In [21]:
plot.xaxis.axis_label = ""
plot.yaxis.axis_label = ""

In [22]:
show(plot)
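`plot.xaxis` actually returns a list-like collection of every x axis on the figure (usually just one), and assigning `axis_label` to it sets the label on all of them. A small sketch with placeholder labels; `axis_label_text_font_style` is shown as one example of the label styling attributes:

```python
from bokeh.plotting import figure

p = figure(width=600, height=400)
p.line(x=[1, 2, 3], y=[1, 4, 9])

# Assigning to xaxis/yaxis applies the value to every axis in the collection
p.xaxis.axis_label = "Input value"
p.yaxis.axis_label = "Output value"
p.yaxis.axis_label_text_font_style = "normal"

# Individual axes can also be addressed by index
print(p.xaxis[0].axis_label)
```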

<h2>Tools</h2>

Let's start adding some interactive features to the plot. So far we have been setting `toolbar_location` to `None`, which hides the toolbar so no tools are displayed. By default Bokeh provides a set of tools to interact with your plot, and you can also specify where the toolbar is located (e.g. "above", "below", "left" or "right"). Try out some of the tools in the plot below.

In [23]:
plot = figure(height=400, width=600, title="My first line plot", toolbar_location="above")
plot.circle(x = [1,2,3,4,5], y = [2,3,5,7,8], size = 8, fill_color = "blue", 
            fill_alpha = 0.5, line_alpha = 0.5)
show(plot)
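The default toolbar contains pan, wheel zoom, box zoom, save, reset and help. If you only want some of them, the `tools` argument of `figure` accepts a comma-separated string of tool names. A minimal sketch reusing the data above, with an arbitrary choice of three tools and the toolbar placed below the plot:

```python
from bokeh.models import PanTool, ResetTool, WheelZoomTool
from bokeh.plotting import figure

# Request a specific set of tools by name and put the toolbar under the plot
p = figure(width=600, height=400, tools="pan,wheel_zoom,reset",
           toolbar_location="below")
p.scatter(x=[1, 2, 3, 4, 5], y=[2, 3, 5, 7, 8], size=8)
```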

<h2>Adding multiple Glyphs</h2>

We can add multiple glyphs to the same plot by calling another glyph method on our figure object. Complete the code below to add a line glyph to the plot. It will use the same data as the circle glyph. Give the line a colour of "red".

In [25]:
plot = figure(height=400, width=600, title="My first line plot", toolbar_location="above")
plot.circle(x = [1,2,3,4,5], y = [2,3,5,7,8], size = 8, fill_color = "blue", 
            line_color = "green", fill_alpha = 0.5, line_alpha = 0.5)
plot.line(x = [1,2,3,4,5], y = [2,3,5,7,8], line_color = "___")
show(plot)
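Each glyph method call adds one more renderer to the figure. A small sketch (with placeholder styling) that also shows what happens to the data when it is passed as plain lists: every glyph gets its own, separate data source.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 8]

p = figure(width=600, height=400)
# Each glyph method call adds one more renderer to the same figure
dots = p.scatter(x=x, y=y, size=8, fill_color="blue")
trend = p.line(x=x, y=y, line_width=2)

# Because the data was passed as plain lists, each glyph got its own
# data source holding a separate copy of x and y
print(len(p.renderers), dots.data_source is trend.data_source)
```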

<h2>ColumnDataSource</h2>

It might seem a little inefficient in the plot above that we have specified the same data separately for each glyph. Bokeh actually has a dedicated data object that holds the underlying data for our plots. In fact, under the hood Bokeh has been taking our inputs for x and y and creating a `ColumnDataSource` object from them. So what is this `ColumnDataSource` thing?

The ColumnDataSource class holds the data for Bokeh plots and is a powerful concept: when the same ColumnDataSource object is used across multiple plots, it allows for linkages between them, such as selecting points in one plot and seeing the same points highlighted in another. At its core, a ColumnDataSource is a mapping where the keys are column names and the values are the contents of those columns. ColumnDataSource objects can be created from Python dictionaries, but can also be created from Pandas DataFrames.
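A minimal sketch of the linking idea, assuming two side-by-side figures that each have a box-select tool. Because both glyphs draw from the same `src`, selecting points in one figure highlights the same rows in the other once the layout is shown. The `y2` column is made-up illustration data.

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source: column names map to equal-length lists of values
src = ColumnDataSource(data={"x": [1, 2, 3, 4, 5],
                             "y": [2, 3, 5, 7, 8],
                             "y2": [8, 7, 5, 3, 2]})

# Two figures whose glyphs refer to columns by name in the same source
left = figure(width=300, height=300, tools="box_select,reset")
right = figure(width=300, height=300, tools="box_select,reset")
r1 = left.scatter(x="x", y="y", size=8, source=src)
r2 = right.scatter(x="x", y="y2", size=8, source=src)

# show(layout) would display the two linked figures side by side
layout = row(left, right)

# The underlying data is available as a dict-like object on .data
print(src.column_names, src.data["y2"])
```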

Below I am creating a ColumnDataSource object from a Python dictionary. We will use this object to recreate the plot above. Set the arguments of our glyphs as follows:
* Set the `source` attribute to equal our ColumnDataSource object
* Set `x` to equal the name of our 'x' column (pass the value to x as a string)
* Do the same for `y` but refer to the 'y' column

In [26]:
from bokeh.models import ColumnDataSource

data = {'x': [1,2,3,4,5],
       'y': [2,3,5,7,8],
       'text': ["This", "Bokeh", "stuff", "is", "pretty cool!"]}

src = ColumnDataSource(data)

In [33]:
plot = figure(height=400, width=600, title="My first line plot", toolbar_location="above")
plot.circle(x = "_", y = "_", size = 8, fill_color = "blue", 
            line_color = "blue", fill_alpha = 0.5, line_alpha = 0.5, source = ___)
plot.line(x = "_", y = "_", line_color = "red", source = src)
show(plot)

<h2>Inspectors</h2>

You're probably wondering what that text column I added to the ColumnDataSource object is for. Well, now it's time to talk about inspectors! Inspectors add even more interaction to our plot: we can hover over each data point and 'inspect' it, giving the viewer access to more of the underlying data.

To add a hover inspector, we pass the `tooltips` argument in our figure definition. `tooltips` takes a list of tuples that specifies what we want to see when we hover over a data point.

The tuples in this list have the format `("Label", "field")`. The second element of each tuple, the field, can take two types of values:
* A special field, which is prefixed with a dollar sign, e.g. `"$x"`, which shows the x-coordinate under the cursor
* A data field from the ColumnDataSource object, which is prefixed with an '@' symbol, e.g. `"@text"`, which shows the value of the text column for the point being hovered over
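Both kinds of field can be mixed in one list. A small sketch with a hypothetical `label` column (a different name from the exercise's `text` column), which also uses the `$index` special field to show the row number of the hovered point:

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

src = ColumnDataSource(data={"x": [1, 2, 3],
                             "y": [2, 3, 5],
                             "label": ["low", "mid", "high"]})

# "$..." fields come from the plot itself (row index, cursor position);
# "@..." fields look up a column of the data source for the hovered point
tooltips = [("row", "$index"), ("cursor x", "$x"), ("label", "@label")]

p = figure(width=400, height=300, tooltips=tooltips)
p.scatter(x="x", y="y", size=10, source=src)

# Passing tooltips to figure adds a HoverTool to the toolbar
hover = [t for t in p.tools if isinstance(t, HoverTool)][0]
print(hover.tooltips)
```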

Complete the tooltips list below so that "x" and "y" correspond to the x and y coordinates of the mouse, and "text" corresponds to the "text" column in the ColumnDataSource object `src` we defined above.

In [36]:
TOOL_TIPS = [("x", "__"), ("y", "__"), ("text", "_____")]

In [37]:
plot = figure(height=400, width=600, title="My first line plot", toolbar_location="above",
             tooltips=TOOL_TIPS)
plot.circle(x = "x", y = "y", size = 8, fill_color = "blue", 
            line_color = "blue", fill_alpha = 0.5, line_alpha = 0.5, source = src)
plot.line(x = "x", y = "y", line_color = "red", source = src)

show(plot)

<h2>Plotting measles cases in New York in 1935</h2>

In [53]:
import pandas as pd
measles_data = pd.read_csv("measles.csv")
# "week" encodes year and week number as YYYYWW, e.g. 193501 = week 1 of 1935
measles_data["year"] = measles_data["week"].apply(lambda x: int(str(x)[0:4]))
measles_data["week_num"] = measles_data["week"].apply(lambda x: int(str(x)[4:7]))
# Keep only the New York rows for 1935
new_york = measles_data[(measles_data["year"] == 1935) & (measles_data["state_name"] == "NEW YORK")]

Above I have imported data on measles cases in the United States and created a subset of this dataset for New York in 1935. We are going to plot the weekly incidence of measles and make the plot interactive, so you can hover over each data point and see the number of cases reported that week.

To show you what columns exist in the DataFrame, let's call the `head` method.

In [54]:
new_york.head()

| Unnamed: 0 | week | state | state_name | disease | cases | incidence_per_capita | year | week_num |
|---|---|---|---|---|---|---|---|---|
| 15485 | 193501 | NY | NEW YORK | MEASLES | 671 | 5.02 | 1935 | 1 |
| 15529 | 193502 | NY | NEW YORK | MEASLES | 1110 | 8.3 | 1935 | 2 |
| 15575 | 193503 | NY | NEW YORK | MEASLES | 826 | 6.18 | 1935 | 3 |
| 15620 | 193504 | NY | NEW YORK | MEASLES | 823 | 6.15 | 1935 | 4 |
| 15666 | 193505 | NY | NEW YORK | MEASLES | 1091 | 8.16 | 1935 | 5 |
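The `week` column packs the year and the week number into one six-digit integer (193501 is week 1 of 1935), which is why the preparation cell splits it into `year` and `week_num`. Assuming every code really is a six-digit integer, as in the rows shown, integer division and remainder give the same split without converting to strings. This is an alternative sketch, not the approach used in the cell above:

```python
import pandas as pd

# Week codes taken from the head() output above: YYYYWW stored as one integer
weeks = pd.DataFrame({"week": [193501, 193502, 193503]})

# Integer division drops the last two digits; the remainder keeps only them
weeks["year"] = weeks["week"] // 100
weeks["week_num"] = weeks["week"] % 100
print(weeks)
```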


**NB. We don't actually have to define a ColumnDataSource object when using a Pandas DataFrame; we can pass the DataFrame directly to the `source` argument and Bokeh converts it for us.**
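To see what that conversion produces, here is a sketch using the five rows printed by `head()` above, reduced to three columns. It plots `cases` rather than the exercise's `incidence_per_capita`, so the exercise is left for you. Note that the DataFrame's index also ends up in the source, as a column named `index`:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# The five rows shown by head() above, reduced to a few columns
df = pd.DataFrame({"week_num": [1, 2, 3, 4, 5],
                   "cases": [671, 1110, 826, 823, 1091],
                   "incidence_per_capita": [5.02, 8.3, 6.18, 6.15, 8.16]})

p = figure(width=600, height=400,
           tooltips=[("Incidence per capita", "@incidence_per_capita")])
# Passing the DataFrame as source: Bokeh wraps it in a ColumnDataSource
r = p.line(x="week_num", y="cases", source=df)

print(type(r.data_source).__name__, r.data_source.column_names)
```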

Start by defining our tooltips list. We want to show the number of cases and the week number for each point plotted on our graph, so choose the relevant columns to refer to:

In [55]:
TOOL_TIPS = [("Total Cases", "_____"), ("Week Number", "_____")]

Define a figure object called `plot`. There are 52 weeks in a year, so let's set the x_range to span 1 to 52. Also give the plot a sensible title.

In [56]:
plot = figure(x_range=(_,__), width=800, height=500, title="______", tooltips=TOOL_TIPS)

Add a circle glyph and a line glyph in the cell below. For the `source` argument pass the DataFrame `new_york`. Set x to the "week_num" column and y to the "incidence_per_capita" column. Set the other visual attributes however you see fit.

In [63]:
plot.circle(x = "_____", y = "_____", size = 10, source = _____, fill_color="red")
plot.line(x = "_____", y = "_____", line_width = 1, source = _____, line_color="blue")

Give our plot some sensible axis titles.

In [64]:
plot.xaxis.axis_label = "_____"
plot.yaxis.axis_label = "_____"

In [65]:
show(plot)

Congratulations! You have created a fully interactive Bokeh plot of measles cases in New York in 1935! If this has sparked your curiosity, I highly recommend checking out the following resources:

* The Bokeh documentation, which can be found on their <a href="https://bokeh.pydata.org/en/latest/">website.</a>
* William Koehrsen's three-part series on building a dashboard with Bokeh is fantastic: <a href="https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-one-getting-started-a11655a467d4">part 1</a>, <a href="https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-ii-interactions-a4cf994e2512">part 2</a>, and <a href="https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-iii-a-complete-dashboard-dc6a86aa6e23">part 3.</a>
* Eugine Kang gives a nice basic overview <a href="https://medium.com/@kangeugine/bokeh-rshiny-replacement-ac74694bbe3f">here.</a>
* Mandi Cai from freeCodeCamp gives a great comparison of Bokeh and D3 <a href="https://medium.freecodecamp.org/charting-the-waters-between-bokeh-and-d3-73b3ee517478"> here.</a>