# Plotting cumulative cases from the 2014-2015 Ebola Outbreak

In this example, we explore an open Ebola dataset by Caitlin Rivers: https://github.com/cmrivers/ebola

We want to show how the outbreak worsened in the winter of 2014 by plotting the cumulative cases for each country.

This tutorial covers the `line` and `multi_line` glyph methods of a `figure` from the `plotting` module, and the `TimeSeries` function from the `charts` module.

In [None]:
import pandas as pd

Since we are rendering plots in the notebook, we first need to direct Bokeh's output to the notebook with `output_notebook()`.

In [None]:
from bokeh.plotting import output_notebook
output_notebook()

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/cmrivers/ebola/master/country_timeseries.csv')

In [None]:
df.head()

In [None]:
# We need a datetime object and not a string representation of the date
df['datetime'] = pd.to_datetime(df['Date'])
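To see what `pd.to_datetime` changes, here is a minimal sketch on a tiny made-up frame (the dates and counts are hypothetical, not taken from the Ebola dataset). The `Date` column holds plain strings; after conversion, each value is a pandas `Timestamp` that supports date arithmetic.

```python
import pandas as pd

# Made-up rows for illustration only; the Date column holds plain strings
toy = pd.DataFrame({'Date': ['2015-01-05', '2015-01-04', '2015-01-03'],
                    'Cases_Guinea': [30, 20, 10]})
print(type(toy['Date'].iloc[0]))  # <class 'str'>

# Convert the strings into pandas Timestamps
toy['datetime'] = pd.to_datetime(toy['Date'])
print(type(toy['datetime'].iloc[0]))  # <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# Real dates support arithmetic, e.g. the time span covered by the rows
span = toy['datetime'].max() - toy['datetime'].min()
print(span)  # 2 days 00:00:00
```

With strings, the same `max() - min()` would fail, which is one reason to convert before plotting.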

Bokeh has three levels of interface: the low-level `glyphs` layer, the intermediate `plotting` layer, and the high-level `charts` layer.
Here we create a `figure` object from the `plotting` layer and draw a line on it.

The `line` method takes its x and y values as a pandas Series or an array.

In [None]:
from bokeh.plotting import figure, show

p = figure()
p.line(x=df['datetime'], y=df['Cases_Guinea'])
show(p)

To tell Bokeh to read our dates as dates rather than as numbers, we pass `x_axis_type='datetime'` to `figure`.
We can also pass `color` and `line_width` values to `line`.

In [None]:
p = figure(x_axis_type='datetime')
p.line(x=df['datetime'], y=df['Cases_Guinea'], color="red", line_width=2)
show(p)
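Why does the axis type matter? Bokeh transmits datetime values to the browser as numbers (milliseconds since the Unix epoch, 1970-01-01), so on a plain numeric axis the dates appear as large raw numbers. The sketch below uses only pandas and two hypothetical dates to compute that millisecond form; it does not call Bokeh itself.

```python
import pandas as pd

# Hypothetical dates, for illustration only
dates = pd.to_datetime(pd.Series(['2014-10-01', '2014-11-01']))

# Milliseconds elapsed since 1970-01-01: the numeric form a date takes
# when the axis is not told to format it as a date
epoch = pd.Timestamp('1970-01-01')
millis = (dates - epoch) // pd.Timedelta(milliseconds=1)
print(millis.tolist())  # [1412121600000, 1414800000000]
```

Numbers of this size are what the first, unformatted plot puts on its x axis; `x_axis_type='datetime'` turns them back into readable dates.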

Finally, we can add a legend by passing the label we want, as a string, to the `legend` argument of `line`.

In [None]:
p = figure(x_axis_type='datetime')
p.line(df['datetime'], df['Cases_Guinea'], color="red", line_width=2, legend='Cases_Guinea')
show(p)

# Plot the rest of the data

We have shown how to plot a single line, but what about the other countries?
One way is to call `line` once for each country we want to plot.

In [None]:
p = figure(x_axis_type='datetime')
p.line(df['datetime'], df['Cases_Guinea'], color="#1f77b4", line_width=2, legend='Cases_Guinea')
p.line(df['datetime'], df['Cases_Liberia'], color="#ff7f0e", line_width=2, legend='Cases_Liberia')
p.line(df['datetime'], df['Cases_Mali'], color="#2ca02c", line_width=2, legend='Cases_Mali')
p.line(df['datetime'], df['Cases_Nigeria'], color="#d62728", line_width=2, legend='Cases_Nigeria')
p.line(df['datetime'], df['Cases_Senegal'], color="#9467bd", line_width=2, legend='Cases_Senegal')
p.line(df['datetime'], df['Cases_SierraLeone'], color="#8c564b", line_width=2, legend='Cases_SierraLeone')
p.line(df['datetime'], df['Cases_Spain'], color="#e377c2", line_width=2, legend='Cases_Spain')
p.line(df['datetime'], df['Cases_UnitedStates'], color="#7f7f7f", line_width=2, legend='Cases_UnitedStates')
show(p)

This approach is repetitive.
If we want to plot many lines at once, we can use the `multi_line` method instead of calling `line` repeatedly.

`multi_line` takes a list of sequences for each axis: `xs` holds one sequence of x values per line, and `ys` holds the matching sequence of y values.
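The list-of-lists shape is easy to get wrong, so here is a minimal sketch of what the inputs look like for two lines. `check_multi_line_inputs` is a hypothetical helper written for this illustration (it is not part of Bokeh), and it assumes, as in this notebook, that one color is supplied per line. The data are made up.

```python
def check_multi_line_inputs(xs, ys, colors):
    """Hypothetical helper: verify the list-of-lists shape used with multi_line.

    Returns the number of lines that would be drawn.
    """
    # One entry per line in each outer list
    if not (len(xs) == len(ys) == len(colors)):
        raise ValueError("xs, ys and colors need one entry per line")
    # Each line's x values must pair up with its y values
    for i, (x, y) in enumerate(zip(xs, ys)):
        if len(x) != len(y):
            raise ValueError(f"line {i}: {len(x)} x values but {len(y)} y values")
    return len(xs)


# Made-up data for two lines, for illustration only
xs = [[1, 2, 3], [1, 2, 3]]        # x values of line 1, then line 2
ys = [[10, 20, 30], [5, 15, 25]]   # matching y values
colors = ['#1f77b4', '#ff7f0e']    # one color per line

n_lines = check_multi_line_inputs(xs, ys, colors)
print(n_lines)  # 2
```

Inputs of this shape would then go into a single call such as `p.multi_line(xs=xs, ys=ys, color=colors)`.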

In [None]:
# We create a list of the column names we will use to subset the dataframe
type_country_name = ['Cases_Guinea', 'Cases_Liberia', 'Cases_Mali', 'Cases_Nigeria',
                     'Cases_Senegal', 'Cases_SierraLeone', 'Cases_Spain', 'Cases_UnitedStates']

# Every line shares the same dates, so we repeat the datetime column
# once per country instead of hard-coding the count
time = [df['datetime']] * len(type_country_name)

# Here we are generating the list of data we are subsetting from our dataframe
case_country_values = [df[x] for x in type_country_name]

# and a list of colors, one for each line
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
          '#9467bd', '#8c564b', '#e377c2', '#7f7f7f']
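Typing the column names by hand works, but they can also be selected from the frame by prefix. Below is a minimal sketch on a hypothetical miniature frame that follows the same `Cases_`/`Deaths_` naming scheme (the values are made up).

```python
import pandas as pd

# Hypothetical miniature frame with the CSV's column naming scheme
toy = pd.DataFrame({
    'datetime': pd.to_datetime(['2014-12-01', '2014-12-02']),
    'Cases_Guinea': [1, 2],
    'Cases_Liberia': [3, 4],
    'Deaths_Guinea': [0, 1],
})

# Pick every column whose name starts with "Cases_"
case_cols = [c for c in toy.columns if c.startswith('Cases_')]

# One y series per country, and the shared dates repeated once per country
ys = [toy[c] for c in case_cols]
xs = [toy['datetime']] * len(case_cols)
print(case_cols)  # ['Cases_Guinea', 'Cases_Liberia']
```

Because `xs` is built from the length of `case_cols`, the x and y lists stay the same length if a country is added or removed.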

The syntax for `multi_line` is similar to `line`, but now each axis argument takes a list of sequences.
As of Bokeh 0.9.1, you can't pass a list of values to `legend`, so this plot has no legend.

In [None]:
p = figure(x_axis_type='datetime')
p.multi_line(xs=time, ys=case_country_values, color=colors)
show(p)

# Using the higher-level charts module

In [None]:
# directly read in the Date column as datetime
df = pd.read_csv('https://raw.githubusercontent.com/cmrivers/ebola/master/country_timeseries.csv',
                 parse_dates=['Date'])
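`parse_dates` performs the string-to-date conversion while the file is read, replacing the separate `pd.to_datetime` step. Here is a minimal sketch that reads a tiny made-up CSV from memory with `io.StringIO` instead of downloading the real file; the rows are hypothetical.

```python
import io

import pandas as pd

# A tiny made-up CSV held in memory, standing in for the downloaded file
csv_text = """Date,Day,Cases_Guinea
2015-01-05,2,30
2015-01-04,1,20
"""

df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Date is already a datetime column; the other columns are numeric
print(df_demo.dtypes)
```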

In [None]:
# drop the unnecessary column
df.drop('Day', axis=1, inplace=True)
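With `inplace=True`, `drop` modifies the frame and returns `None`. If you would rather keep the original frame, the non-mutating form uses the `columns` keyword and returns a new frame. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({'Date': pd.to_datetime(['2015-01-05', '2015-01-04']),
                    'Day': [2, 1],
                    'Cases_Guinea': [30, 20]})

# Returns a new frame without 'Day'; toy itself is left unchanged
trimmed = toy.drop(columns=['Day'])
print(trimmed.columns.tolist())  # ['Date', 'Cases_Guinea']
```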

In [None]:
from bokeh.charts import TimeSeries
from bokeh.io import show

In [None]:
ts = TimeSeries(df, index='Date', legend=True)
show(ts)

The plot above isn't very useful: it draws both cases and deaths on the same axes, which creates a lot of clutter.

In [None]:
# Subset the data to only have the "Date" column and all other columns that begin with "Cases"
df2 = df.filter(regex="Date|^Cases")
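`DataFrame.filter(regex=...)` keeps the columns whose names match the pattern anywhere (a regex search). In `Date|^Cases`, `Date` may appear anywhere in a name, while `^Cases` must appear at the start. A minimal sketch on hypothetical column names that follow the CSV's scheme:

```python
import pandas as pd

# Hypothetical one-row frame using the CSV's column naming scheme
toy = pd.DataFrame({'Date': [pd.Timestamp('2015-01-05')], 'Day': [2],
                    'Cases_Guinea': [30], 'Cases_Liberia': [40],
                    'Deaths_Guinea': [5], 'Deaths_Liberia': [6]})

# Keep 'Date' and every column that starts with 'Cases'
kept = toy.filter(regex='Date|^Cases').columns.tolist()
print(kept)  # ['Date', 'Cases_Guinea', 'Cases_Liberia']
```

A stricter pattern such as `^Date$|^Cases` would also exclude any other column that merely contains the text `Date`.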

In [None]:
# replot using only the cases data
ts = TimeSeries(df2, index='Date', legend=True)
show(ts)

Finally, we can clean up the plot by adding a title and a y-axis label.

In [None]:
ts = TimeSeries(df2, index='Date', legend=True, title='Ebola Cases Chart', ylabel='Count')
show(ts)