# Data Visualisation
## Creating plots with Bokeh

### Bokeh - why is it useful?

- Facilitates creation of **interactive charts** for web browsers
    - Add **data filters** which automatically update the plot
    - Add **tooltips** which show additional data when a point is hovered on
    - Tools to **zoom in** on a plot and **save an image** of it
    
- **HTML** and **JavaScript** output, produced with **Python** code
    - Facilitates simple **sharing** of HTML files with people who do not use Python or Jupyter notebooks

### Bokeh terminology

**Figures**: the '**canvas**' onto which we plot
- axes, grids and tools (such as zoom, scroll and save)


**Glyphs**: typically our **datapoints**
- the lines, circles, bars, etc which are plotted onto a figure


**Widgets**: additional **user interface components** outside of a plot
- sliders, buttons and drop-down menus


**Models**: the **lower-level objects** with which Bokeh visualisations are built
- can be modified when more specific configuration is required 
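
The sketch below maps each of these terms onto an object. It assumes the lowercase `figure` name is importable from `bokeh.plotting`; the data values and the slider settings are made up for illustration.

```python
from bokeh.models import GlyphRenderer, LinearAxis, Slider
from bokeh.plotting import figure

# Figure: the canvas, created with default axes, grids and tools
p = figure(width=300, height=300)

# Glyph: markers drawn onto the figure (made-up data)
renderer = p.scatter(x=[1, 2, 3], y=[4, 6, 5], size=10)

# Widget: a user interface component that lives outside the plot
slider = Slider(start=0, end=10, value=5, step=1, title='Example slider')

# Model: one of the lower-level objects behind the figure, here its x-axis
x_axis = p.xaxis[0]
x_axis.axis_label = 'x'

print(type(renderer).__name__, type(x_axis).__name__, type(slider).__name__)
```

Nothing is displayed here because `show()` is not called; the point is only to see which object corresponds to which term.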

#### Importing Bokeh modules

In [None]:
from bokeh.plotting import Figure, output_notebook, show, save
from bokeh.models import ColumnDataSource, HoverTool, GroupFilter, CDSView

output_notebook()

import pandas as pd
df = pd.read_csv('data/mpg.csv')

- The `Figure` **class** in `bokeh.plotting` facilitates the creation of plots by adding a **default set of tools and styles**
- Running `output_notebook()` results in our plots being shown in the notebook when `show()` is run
- The `bokeh.plotting.save` function allows us to **save an HTML file** of our plots

We have also imported some **classes** from the `bokeh.models` module for use in the following examples. `bokeh.models` is a **lower-level interface**, i.e. more complex but allowing more customisation.

- Classes (which by convention have Capitalised names) allow us to create (or **instantiate**) objects with pre-defined **methods** and **attributes**; an object created from a given class is said to be an **instance** of that class
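
As a minimal illustration of this vocabulary, the sketch below defines a hypothetical `Canvas` class (it is not part of Bokeh) and creates an instance of it.

```python
class Canvas:
    """Hypothetical class for illustration only; it is not part of Bokeh."""

    def __init__(self, width, height):
        # Attributes: values stored on each instance
        self.width = width
        self.height = height
        self.shapes = []

    def add_shape(self, name):
        # Method: defined once in the class, callable on every instance
        self.shapes.append(name)
        return len(self.shapes)


# Instantiate the class: `small` is an instance of Canvas
small = Canvas(width=300, height=300)
small.add_shape('circle')

print(isinstance(small, Canvas))
print(small.width, small.shapes)
```

Creating a `Figure` and calling `.scatter()` on it follows the same pattern: instantiate first, then call a method on the instance.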

Creating an instance of `Figure` and then calling `show` outputs an empty plot (with the default tools next to it).

 - Bokeh also gives us a `WARNING` that the `Plot has no renderers`, i.e. nothing to place onto it.

In [None]:
weight_mpg = Figure(width=300, height=300)
show(weight_mpg)

In [None]:
weight_mpg.scatter(x=df['weight'], y=df['mpg'])
show(weight_mpg)

`weight_mpg` (our instance of `Figure`) has access to the methods defined in the `Figure` **class definition**, including the `.scatter()` method.

- Similarly to the `pandas` and `seaborn` plotting methods seen previously, we provide arguments for the `x` and `y` parameters

- `x` and `y` can be of various data types (lists, Series, DataFrame columns) but **must be the same length**

In [None]:
origin_colors = {'usa':'#CE1141', 'japan': 'orange', 'europe': 'blue'}
df['color_column'] = [origin_colors[x] for x in df['origin']]

origin_eff = Figure(width=400, height=300)
origin_eff.scatter(x=df['horsepower'], y=df['weight'],
                   fill_color=df['color_column'], size=8)

show(origin_eff)

- We added `color_column` to `df`, using a dictionary to map each `origin` value to a colour
- We created a new `Figure` and used the `.scatter()` method as before
- We passed `df['color_column']` to the `fill_color` parameter
- We modified the `size` of our datapoints
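
The dictionary lookup used to build `color_column` can be tried on its own. This sketch uses a short made-up list in place of `df['origin']`, and the `'grey'` fallback colour is an assumption added for illustration.

```python
origin_colors = {'usa': '#CE1141', 'japan': 'orange', 'europe': 'blue'}

# Made-up values standing in for df['origin']
origins = ['usa', 'japan', 'usa', 'europe']

# Same list comprehension as in the cell above: look up each origin in the dictionary
colors = [origin_colors[o] for o in origins]
print(colors)

# A plain lookup raises KeyError for an origin missing from the dictionary;
# .get() with a fallback colour (an assumption here) avoids that
colors_safe = [origin_colors.get(o, 'grey') for o in origins + ['unknown']]
print(colors_safe[-1])
```

With pandas, `df['origin'].map(origin_colors)` produces the same column without an explicit loop.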

In [None]:
data = ColumnDataSource(df)
origin_tt = Figure(tools='', tooltips=[('Model', '@name'), ('Country', '@origin')],
                   width=300, height=300)

origin_tt.scatter(source=data, x='horsepower', y='acceleration',
                  fill_color='color_column', size=8,
                  legend_group='origin')

show(origin_tt);

- When creating the `Figure` we passed an empty string to `tools`, which removed the default tools from the toolbar
- The `tooltips` parameter allows us to show information when the user hovers on a datapoint
    - Each tuple in the list is in the format `(displayed_feature_name, @name_of_column_in_df)`

- We created a `ColumnDataSource` instance called `data` from `df`
    - In previous examples, Bokeh built its own `ColumnDataSource` instance using the values passed to `x` and `y`
    - A `ColumnDataSource` is the object Bokeh uses to supply data to glyphs and plots. More information can be found [here](https://docs.bokeh.org/en/latest/docs/user_guide/data.html#userguide-data-cds)
- We used `data` as the `source` argument when calling the `.scatter()` method. This allowed us to use the column names from `df` for other parameters, without using `df[...]` each time
- The `legend_group` parameter allowed us to create a legend based on the `origin` column
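
To see the difference between passing a `ColumnDataSource` and passing raw sequences, here is a small sketch. It assumes the lowercase `figure` name is importable from `bokeh.plotting`, and the three rows of data are made up to stand in for `df`.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows standing in for a few columns of df
cars = {
    'horsepower': [130, 165, 95],
    'acceleration': [12.0, 11.5, 15.0],
    'name': ['car a', 'car b', 'car c'],
}

# Explicit source: columns are referred to by name in scatter() and in tooltips
source = ColumnDataSource(data=cars)
p = figure(width=300, height=300, tools='', tooltips=[('Model', '@name')])
renderer = p.scatter(source=source, x='horsepower', y='acceleration', size=8)
print(source.column_names)
print(renderer.data_source is source)

# Raw sequences: Bokeh builds a ColumnDataSource for us behind the scenes
auto = figure(width=300, height=300)
auto_renderer = auto.scatter(x=cars['horsepower'], y=cars['acceleration'])
print(sorted(auto_renderer.data_source.column_names))
```

In the second case the automatically built source only holds the plotted values, so extra columns such as `name` are not available to tooltips.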
    

In [None]:
save(origin_tt, filename='data/hp-acc.html', resources='inline', title='Horsepower vs Acceleration');

- The `save` function from `bokeh.plotting` allows us to create an **HTML file** of the plot
    - The file is called `hp-acc.html` and has been saved to the `data` folder
    - Passing `'inline'` to `resources` embeds the required JavaScript within the file, so it can be viewed without an internet connection
    - The assigned `title` is the page title: it appears in the file's HTML `<title>` element and in the browser tab
- The HTML file can be viewed in any **web browser** 
    - Hovering on a datapoint reveals the features specified in `tooltips`
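
The sketch below saves a small made-up plot to a temporary folder and checks that the title ends up in the file. The file name `example.html` is hypothetical, and the `INLINE` object from `bokeh.resources` is used here as an explicit alternative to the `'inline'` string.

```python
import os
import tempfile

from bokeh.plotting import figure, save
from bokeh.resources import INLINE

# Small made-up plot to save
p = figure(width=300, height=300)
p.scatter(x=[1, 2, 3], y=[3, 1, 2])

# Hypothetical output location inside a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), 'example.html')

# save() writes the file and returns its path
saved = save(p, filename=out_path, resources=INLINE, title='Example plot')

with open(saved, encoding='utf-8') as f:
    html = f.read()

print(os.path.exists(saved))
print('Example plot' in html)
```

Because the JavaScript is embedded, the resulting file is larger than one which loads Bokeh from a content delivery network, but it is fully self-contained.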

In [None]:
from bokeh.models.widgets import Tabs, Panel

wght_panel = Panel(child=origin_eff, title='Weight')
acc_panel = Panel(child=origin_tt, title='Acceleration')

both = Tabs(tabs=[wght_panel, acc_panel])
show(both)

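In the cell above, each plot is wrapped in a `Panel` with its own `title`, and the panels are combined into a `Tabs` layout. Bokeh offers several other layout options, including columns, rows, `gridplot` and `layout` objects. More information on layouts can be found [here](https://docs.bokeh.org/en/latest/docs/user_guide/layout.html).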
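
As a rough sketch of the row and column options, the example below arranges made-up plots with `row` and `column` from `bokeh.layouts`. It assumes the lowercase `figure` name is importable from `bokeh.plotting`.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure

# Two small made-up plots to place side by side
left = figure(width=250, height=250, title='Left')
left.scatter(x=[1, 2, 3], y=[1, 4, 9])

right = figure(width=250, height=250, title='Right')
right.line(x=[1, 2, 3], y=[9, 4, 1])

side_by_side = row(left, right)

# Two more plots to stack vertically
top = figure(width=250, height=150, title='Top')
top.scatter(x=[1, 2], y=[2, 1])

bottom = figure(width=250, height=150, title='Bottom')
bottom.scatter(x=[1, 2], y=[1, 2])

stacked = column(top, bottom)

print(len(side_by_side.children), len(stacked.children))
```

Either layout object can then be passed to `show()` or `save()` in place of a single figure.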

### Pandas-Bokeh

Pandas-Bokeh is a package which brings together Bokeh and the pandas `.plot()` method, simplifying the creation of interactive Bokeh plots from pandas DataFrames.

In [None]:
pd.set_option('plotting.backend', 'pandas_bokeh')

import warnings
warnings.filterwarnings('ignore')

The pandas `pd.set_option()` function allows us to set `pandas_bokeh` as the pandas `plotting.backend`
- `pandas_bokeh` will be used when the DataFrame `.plot()` method is called

- `warnings.filterwarnings('ignore')` suppresses all warnings; it is used here because (at the time of writing) a warning is sometimes given about the future deprecation of a Bokeh feature used by pandas_bokeh

In [None]:
fig = df.plot(x='weight', y='mpg', kind='scatter', category='origin', alpha=0.6);

- We can call the `.plot()` method on the DataFrame as we have seen previously
    - `x`, `y`, `kind`, and `alpha` parameters work as previously seen
    - `category` is a **pandas-bokeh** parameter, which works similarly to `hue` in seaborn
        - **clickable legend** provides datapoint filtering

In [None]:
fig.title.text = 'Car weight vs fuel efficiency'
fig.title.text_font_size = '14pt'
fig.xaxis.axis_label, fig.yaxis.axis_label = ('Weight (lb)', 'Fuel efficiency (mpg)')
fig.xaxis.minor_tick_line_width, fig.yaxis.minor_tick_line_width = (0, 0)
fig.xaxis.axis_label_text_font_style, fig.yaxis.axis_label_text_font_style = ('bold', 'bold')
fig.toolbar_location = None
show(fig)

We can assign the underlying Bokeh Figure to a variable (here `fig`) and then customise it by setting **Bokeh Figure attributes**.

- We changed the `axis_label` and its `font_style` for both `xaxis` and `yaxis`
- We changed the `title.text` and its `text_font_size`
- We hid the minor tick marks on both axes by setting `minor_tick_line_width` to `0`
- We removed the toolbar by setting `toolbar_location` to `None`
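
The same attributes can be set on a figure created directly with Bokeh. The sketch below uses made-up data and labels, and assumes the lowercase `figure` name is importable from `bokeh.plotting`.

```python
from bokeh.plotting import figure

# Small made-up plot to customise
p = figure(width=300, height=300, title='Placeholder')
p.scatter(x=[1, 2, 3], y=[2, 4, 8])

# Title text and size
p.title.text = 'Example title'
p.title.text_font_size = '14pt'

# Axis labels and label style
p.xaxis.axis_label = 'x values'
p.yaxis.axis_label = 'y values'
p.xaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_style = 'bold'

# Hide the minor ticks and the toolbar
p.xaxis.minor_tick_line_width = 0
p.yaxis.minor_tick_line_width = 0
p.toolbar_location = None

print(p.title.text, p.xaxis[0].axis_label, p.toolbar_location)
```

`p.xaxis` refers to every x-axis on the figure, so an assignment through it applies to each of them; indexing with `[0]` returns a single axis object.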

## Jupyter Exercise 

Plotting with bokeh
 
Open file `bokeh-workbook.ipynb`