# bokeh_wrap demo

Examples using **`bokeh_wrap.py`**, a convenience wrapper around two bokeh plotting functions, one for each of these plot types:

1. Histogram with **`bokeh_wrap.hist()`**
2. Time series with **`bokeh_wrap.timeplot()`**

## 0. Import the module (in this same directory)

In [None]:
import bokeh_wrap as bw

## 1. Histogramming

#### 1.1.  Read some data from a text file into a Pandas DataFrame

Using a Pandas DataFrame as the ColumnDataSource for a bokeh plot is *not* required, though DataFrames [are particularly convenient](https://bokeh.pydata.org/en/0.12.7/docs/user_guide/data.html#columndatasource).

In [None]:
import pandas as pd

data = pd.read_table("data.txt")
print(data.shape)
data.head()

Suppose these data are the runtimes (in arbitrary units) of some process. We want to explore:
1. Are there outliers? If so, how far out are they, how many are there, and how much of a problem do they pose?
2. Is there a single population of values, or are there several?
3. Where is the median? Where would a cut-off such as the 90th percentile lie?
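
As a rough numerical companion to these questions, the sketch below computes a median, a 90th percentile, and an outlier count. It runs on made-up data rather than `data.txt`; the `demo` frame, the random seed, and the choice of Tukey's 1.5 × IQR rule are all assumptions for illustration, not something the wrapper prescribes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the notebook's data: mostly ~10 units, plus a few slow runs.
rng = np.random.default_rng(0)
runtime = np.concatenate([rng.normal(10.0, 1.0, 500), [40.0, 55.0, 80.0]])
demo = pd.DataFrame({"runtime": runtime})

median = demo["runtime"].median()
p90 = demo["runtime"].quantile(0.90)

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = demo["runtime"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (demo["runtime"] < q1 - 1.5 * iqr) | (demo["runtime"] > q3 + 1.5 * iqr)

print(f"median={median:.2f}  p90={p90:.2f}  outliers={is_outlier.sum()}")
```

Numbers like these answer questions 1 and 3, but not question 2: spotting multiple populations is easier from the shape of a histogram.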

Pandas DataFrames do have `describe()` and `hist()` methods, but out of the box they are not as flexible as I would like.

In [None]:
data.describe()
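
For what it is worth, `describe()` does accept a `percentiles` argument, which goes some way toward the cut-off question. A minimal sketch on made-up data (the `demo` frame and seed are assumptions, not the contents of `data.txt`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
demo = pd.DataFrame({"runtime": rng.normal(10.0, 1.0, 200)})  # made-up runtimes

# Ask for the median plus the 90th and 99th percentiles instead of the default quartiles.
summary = demo.describe(percentiles=[0.5, 0.9, 0.99])
print(summary)
```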

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline
data.hist()

#### 1.2.  Pass the DataFrame to function `bw.hist()`

In [None]:
bw.hist(df=data, colname='runtime')
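
For readers curious about what a histogram wrapper has to do, one common approach with bokeh is to bin the values with `numpy.histogram` and draw each bin as a rectangle. The sketch below shows only the binning step on made-up data; it is not necessarily how `bw.hist()` is implemented, and the bin count of 30 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(10.0, 1.0, 1000)  # made-up runtimes

# counts has one entry per bin; edges has one more entry than counts.
counts, edges = np.histogram(values, bins=30)

# Each bar runs from its left edge to its right edge, with height equal to its count.
left, right = edges[:-1], edges[1:]

# With bokeh installed, these arrays could feed a quad glyph, for example:
#   from bokeh.plotting import figure, show
#   p = figure(title="runtime")
#   p.quad(top=counts, bottom=0, left=left, right=right)
#   show(p)
```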

## 2.  Time series plot

#### 2.1.  Read in some time-stamped values
Note: the data do not need to be sorted in time order, ascending or descending. However, the time column must be of type **`numpy.datetime64`**, or be converted to it.

In [None]:
tsdata = pd.read_csv('time_data.csv')
tsdata.head()

Check the data type of the **`started_at`** column:

In [None]:
type(tsdata.started_at.values[0])

Convert that column from **`str`** to **`numpy.datetime64`**:

In [None]:
# Convert the whole column in one vectorized call rather than element by element
tsdata['started_at'] = pd.to_datetime(tsdata['started_at'])
type(tsdata.started_at.values[0])
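
The same check-and-convert step can be tried without `time_data.csv`. The sketch below uses a small made-up frame (`demo_ts` and its timestamps are assumptions for illustration), deliberately out of time order, and confirms the element type before and after conversion.

```python
import numpy as np
import pandas as pd

# Made-up rows, deliberately out of time order, with timestamps stored as text.
demo_ts = pd.DataFrame({
    "started_at": ["2017-09-03 12:00:00", "2017-09-01 08:30:00", "2017-09-02 23:15:00"],
    "runtime": [11.2, 9.8, 10.5],
})

before = type(demo_ts["started_at"].values[0])  # str
demo_ts["started_at"] = pd.to_datetime(demo_ts["started_at"])
after = type(demo_ts["started_at"].values[0])   # numpy.datetime64

print(before, after)
# A dtype-level check that does not depend on inspecting a single element.
print(pd.api.types.is_datetime64_any_dtype(demo_ts["started_at"]))
```

The dtype-level check is handy when a column might be empty, since it avoids indexing into `values`.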

#### 2.2. Pass the DataFrame to function `bw.timeplot()`

In [None]:
bw.timeplot(df=tsdata, timecol='started_at', datacol='runtime', plottype='scatter')