# Bokeh: quick (and partial) summary - part 1
Marco Chierici & Giuseppe Jurman

May 24th, 2022

(partially abridged from [Data Visualisation with Bokeh](https://github.com/ernestoarbitrio/bokeh-data-visualisation) and the [Bokeh User Guide](https://docs.bokeh.org/en/latest/docs/user_guide.html))

**Bokeh** \ˈbō-kā\ is a Python interactive visualization library that targets modern web browsers for presentation. Bokeh provides elegant, concise construction of novel graphics with high-performance interactivity over very large or streaming datasets in a quick and easy way.

To offer both simplicity and the powerful and flexible features needed for advanced customizations, Bokeh exposes two interface levels to users:

- a low-level `bokeh.models` interface that provides the most flexibility to application developers;
- a higher-level `bokeh.plotting` interface centered around composing visual glyphs.

We start with an introduction to the bokeh.plotting interface.
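To make the difference between the two levels concrete, here is a minimal sketch (our own illustration, not taken from the Bokeh guide) that builds a similar scatter plot twice. The first version wires `bokeh.models` objects together by hand: `Plot`, `ColumnDataSource`, `DataRange1d`, the `Circle` glyph and two `LinearAxis` objects. The second uses a single `figure()` plus glyph-method call. It assumes a recent Bokeh 2.x/3.x release, and the radius value is arbitrary.

```python
from bokeh.models import Circle, ColumnDataSource, DataRange1d, LinearAxis, Plot
from bokeh.plotting import figure

xs, ys = [1, 2, 3, 4, 5], [6, 7, 2, 4, 5]

# bokeh.models: data source, ranges, glyph and axes are created
# and connected explicitly
source = ColumnDataSource(data=dict(x=xs, y=ys))
low_plot = Plot(x_range=DataRange1d(), y_range=DataRange1d())
low_plot.add_glyph(source, Circle(x="x", y="y", radius=0.1))
low_plot.add_layout(LinearAxis(), "below")
low_plot.add_layout(LinearAxis(), "left")

# bokeh.plotting: figure() supplies ranges, axes, grids and tools, and the
# glyph method builds the ColumnDataSource from the plain lists
high_plot = figure()
high_plot.circle(xs, ys, radius=0.1)
```

With `bokeh.models` nothing is added for you. The low-level `Plot` above has no grid lines and no tools unless you add them, whereas `figure()` supplies sensible defaults.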

## Defining key concepts

### Glossary

Here is a small glossary of some of the most important concepts in Bokeh.

**Application**

A Bokeh application is a rendered Bokeh document, running in a browser.

**BokehJS**

The JavaScript client library that actually renders the visuals and handles the UI interactions for Bokeh plots and widgets in the browser. Typically, users will not have to think about this aspect of Bokeh much (“We write the JavaScript, so you don’t have to!”) but it is good to have basic knowledge of this dichotomy. 

**Documents**

An organizing data structure for Bokeh applications. Documents contain all the Bokeh Models and data needed to render an interactive visualization or application in the browser.

**Embedding**

Various methods of including Bokeh plots and widgets into web apps and pages, or the IPython notebook.

**Glyphs**

The basic visual building blocks of Bokeh plots, e.g. lines, rectangles, squares, wedges, patches, etc. The bokeh.plotting interface provides a convenient way to create plots centered around glyphs.

**Models**

The lowest-level objects that comprise Bokeh “scenegraphs”. These live in the bokeh.models interface. Most users will not use this level of interface to assemble plots directly. However, ultimately all Bokeh plots consist of collections of models, so it is important to understand them enough to configure their attributes and properties.

**Server**

The Bokeh server is an optional component that can be used for sharing and publishing Bokeh plots and apps, for handling streaming of large data sets, or for enabling sophisticated user interactions based on widgets and selections.

**Widgets**

User interface elements outside of a Bokeh plot, such as sliders, drop-down menus, buttons, etc. Events and updates from widgets can trigger additional computations or cause Bokeh plots to update. Widgets can be used both in standalone applications and with the Bokeh server.
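Models and Documents can be inspected directly from Python. The sketch below uses the public `bokeh.model.Model` and `bokeh.document.Document` classes. It checks that a figure is itself a Model, collects the other Models it references (title, ranges, axes, renderers, ...), and places the figure in a Document as a root. The data and title are arbitrary.

```python
from bokeh.document import Document
from bokeh.model import Model
from bokeh.plotting import figure

p = figure(title="demo")
p.scatter([1, 2, 3], [4, 5, 6], size=10)

# the figure itself is a Model ...
is_model = isinstance(p, Model)

# ... and so is everything it holds; references() returns the set of all
# Models reachable from p (including p itself)
referenced = p.references()
print(is_model, len(referenced), "models referenced")

# a Document collects root Models together with everything they reference
doc = Document()
doc.add_root(p)
```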

### Getting Started

Let's begin with some examples.

Plotting data in basic Python lists as a scatter plot, complete with zoom, pan, save, and other tools, is simple and straightforward:

```python
from bokeh.plotting import figure, output_notebook, show
output_notebook()
```

```python
p = figure(width=400, height=400)

# add a circle renderer with a size, color, and alpha
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20,
         color="navy", alpha=0.5)

# show the results
show(p)
```

### Multiple lines

```python
p = figure(width=400, height=400)
p.multi_line([[1, 3, 2], [3, 4, 6, 6]], [[2, 1, 4], [4, 7, 8, 5]],
             color=["firebrick", "navy"], alpha=[0.8, 0.3], line_width=4)
show(p)
```

### Plotting bars

```python
p = figure(width=400, height=400)
p.vbar(x=[1, 2, 3], width=0.5, bottom=0,
       top=[1.2, 2.5, 3.7], color="red")
show(p)
```

```python
p = figure(width=400, height=400)
p.hbar(y=[1, 2, 3], height=0.5, left=0,
       right=[1.2, 2.5, 3.7], color="navy")
show(p)
```

### Plotting a line

```python
# prepare some demo data
x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 7, 2, 4, 5, 10, 4]

# create a new plot with a title and axis labels
p = figure(title="line example", x_axis_label='x',
           y_axis_label='y', width=500, height=400)

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)
```

### Twin Axes

```python
from numpy import pi, arange, sin, linspace
from bokeh.models import LinearAxis, Range1d

x = arange(-2*pi, 2*pi, 0.1)
y = sin(x)
y2 = linspace(0, 100, len(y))

p = figure(x_range=(-6.5, 6.5), y_range=(-1.1, 1.1))

p.circle(x, y, color="red")

# a second, named y range with its own axis on the left
p.extra_y_ranges = {"foo": Range1d(start=0, end=100)}
p.circle(x, y2, color="blue", y_range_name="foo")
p.add_layout(LinearAxis(y_range_name="foo"), 'left')

show(p)
```

The basic steps to creating plots with the [bokeh.plotting](http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh-plotting) interface are:

 - Prepare some data (in this case plain Python lists).
 - Tell Bokeh where to generate output (in this case using `output_notebook()`).
 - Call `figure()` to create a plot with some overall options like title, tools and axis labels.
 - Add renderers (in this case, `Figure.line`) for our data to the plot, with visual customizations like colors, legends and widths.
 - Ask Bokeh to `show()` or `save()` the results.

Steps three and four can be repeated to create more than one plot, as shown in some of the examples below.
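For use outside a notebook, the same five steps can be written as a standalone script. This is a sketch assuming you want an HTML file rather than inline notebook output: `output_file()` replaces `output_notebook()` in step 2 and `save()` replaces `show()` in step 5. The file name and the temporary directory are arbitrary choices, made here only so that the example is self-contained.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# 1. prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. tell Bokeh where to generate output: a standalone HTML file
html_path = os.path.join(tempfile.mkdtemp(), "lines.html")
output_file(html_path, title="Steps example")

# 3. create a plot with some overall options
p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")

# 4. add a renderer with visual customizations
p.line(x, y, legend_label="Temp.", line_width=2)

# 5. write the HTML file (show(p) would also open it in a browser)
saved = save(p)
print("written to", saved)
```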

The [bokeh.plotting](http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh-plotting) interface is also quite handy if we need to customize the output a bit more by adding more data series, glyphs, logarithmic axes, and so on. It’s also possible to easily combine multiple glyphs together on one plot as shown below:

```python
# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# create a new plot
p = figure(
    tools="pan,box_zoom,reset,save",
    y_axis_type="log", title="log axis example",
    x_axis_label='sections', y_axis_label='particles',
    width=700, height=350)

# add some renderers
p.line(x, x, legend_label="y=x")
p.circle(x, x, legend_label="y=x", fill_color="white", size=8)

p.line(x, y0, legend_label="y=x^2", line_width=3)

p.line(x, y1, legend_label="y=10^x", line_color="red")
p.circle(x, y1, legend_label="y=10^x", fill_color="red", line_color="red",
         size=6)

p.line(x, y2, legend_label="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)
```

Since Bokeh 0.12.9 it is possible to define a click policy for the legend. For example, to hide a series when its legend entry is clicked:

```python
p.legend.location = "top_left"
p.legend.click_policy = "hide"  # "mute" is the other option
```

```python
show(p)
```

## Sample Data
Some of the examples included in the Bokeh source make use of sample data files that are distributed separately. To download this data, execute the following command at a Bash or Windows command prompt:

```bash
bokeh sampledata
```

## Concepts

Let’s consider the plots above, and use them to help define some core concepts.

### Plot

Plots are a central concept in Bokeh. They are containers that hold all the various objects (renderers, guides, data, and tools) that comprise the final visualization that is presented to users. The bokeh.plotting interface provides a Figure class to help with assembling all the necessary objects, and a convenience function figure() for creating Figure objects.

### Glyphs

Glyphs are the basic visual marks that Bokeh can display. At the lowest level, there are glyph objects, such as `Line`. If you are using the low-level `bokeh.models` interface, it is your responsibility to create and coordinate all the various Bokeh objects, including glyph objects and their data sources. To make life easier, the `bokeh.plotting` interface exposes higher-level glyph methods such as the `Figure.line` method used in the line example above. The log-axis example also adds calls to `Figure.circle` to display circle and line glyphs together on the same plot. Besides lines and circles, Bokeh makes many additional glyphs and markers available.

The visual appearance of a glyph is tied directly to the data values that are associated with the glyph’s various attributes. In the example above we see that positional attributes like x and y can be set to vectors of data. But glyphs also have some combination of Line Properties, Fill Properties, and Text Properties to control their appearance. All of these attributes can be set with “vectorized” values as well. We will show examples of this below.
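Here is a short sketch of how scalar and vectorized visual properties end up on a renderer. It uses `Figure.scatter` (not shown above, but part of `bokeh.plotting`) and relies on Bokeh's behavior of storing list-valued properties as extra columns of the automatically created `ColumnDataSource`, named after the property. The values are arbitrary.

```python
from bokeh.plotting import figure

p = figure(width=300, height=300)

# a scalar (line_width) applies to every marker; a list (size, fill_color)
# gives one value per data point
r = p.scatter([1, 2, 3], [3, 1, 2], size=[10, 20, 30],
              fill_color=["red", "green", "blue"], line_width=2)

glyph = r.glyph          # the low-level glyph model (here a Scatter)
source = r.data_source   # the ColumnDataSource Bokeh created for us
print(type(glyph).__name__, sorted(source.column_names))
```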

### Guides and Annotations

Bokeh plots can also have other visual components that aid presentation or help the user make comparisons. These fall into two categories. Guides are visual aids that help users judge distances, angles, etc. These include grid lines or bands, axes (such as linear, log, or datetime) that may have ticks and tick labels as well. Annotations are visual aids that label or name parts of the plot. These include titles, legends, etc.
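The convenience attributes `xaxis`, `yaxis`, `ygrid` and `title` give access to a figure's guides and annotations. The sketch below styles a few of them and adds a `Span` annotation, which is a horizontal reference line at y=4. The labels, dash style and threshold are arbitrary illustrative choices.

```python
from bokeh.models import Span
from bokeh.plotting import figure

p = figure(title="Guides and annotations", width=400, height=300)
p.line([0, 1, 2, 3], [0, 1, 4, 9])

# guides: axes and grids
p.xaxis.axis_label = "x"
p.yaxis.axis_label = "x squared"
p.ygrid.grid_line_dash = "dashed"

# annotations: the title, plus a horizontal reference line
p.title.text_font_size = "14pt"
threshold = Span(location=4, dimension="width", line_color="red")
p.add_layout(threshold)  # annotations go in the plot's "center" by default
```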

### Ranges

Ranges describe the data-space bounds of a plot. By default, plots generated with the bokeh.plotting interface come configured with DataRange1d objects that try to automatically set the plot bounds to encompass all the available data. But it is possible to supply explicit Range1d objects for fixed bounds. As a convenience these can also typically be spelled as 2-tuples or lists:

`p = figure(x_range=[0,10], y_range=(10, 20))`
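The following sketch contrasts the default auto-ranging with fixed ranges. It relies on `figure()` turning a two-element list or tuple of numbers into a `Range1d`, as described above. The bounds are arbitrary.

```python
from bokeh.models import DataRange1d, Range1d
from bokeh.plotting import figure

# default: DataRange1d, bounds computed from the data in the browser
auto = figure()

# fixed bounds, spelled as a list or a tuple ...
fixed = figure(x_range=[0, 10], y_range=(10, 20))

# ... or as an explicit Range1d, whose bounds can be changed later
explicit = figure(x_range=Range1d(start=0, end=10))
explicit.x_range.end = 5
```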


### Resources

To generate plots, the BokehJS client library (JavaScript and CSS code) must be loaded into the browser. By default, the `output_file()` function loads BokehJS from the Bokeh CDN (https://cdn.bokeh.org). However, you can also configure Bokeh to generate static HTML files with the BokehJS resources embedded directly inside, by passing the argument `mode="inline"` to the `output_file()` function.
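The sketch below saves the same small plot twice, once with the default resource mode and once with `mode="inline"`, and compares the file sizes. The file names and temporary directory are arbitrary. Exact sizes depend on the Bokeh version, but the inline file should be much larger because it embeds BokehJS.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

p = figure(width=300, height=300)
p.line([1, 2, 3], [3, 1, 2])

tmp = tempfile.mkdtemp()

# default mode: the HTML file only links to BokehJS on the CDN
cdn_path = os.path.join(tmp, "cdn.html")
output_file(cdn_path)
save(p)

# mode="inline": BokehJS code is embedded, so the file also works offline
inline_path = os.path.join(tmp, "inline.html")
output_file(inline_path, mode="inline")
save(p)

cdn_size = os.path.getsize(cdn_path)
inline_size = os.path.getsize(inline_path)
print(f"CDN: {cdn_size} bytes, inline: {inline_size} bytes")
```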

### More examples

Here are a few more examples to demonstrate other common tasks and use-cases with the `bokeh.plotting` interface.

#### Vectorized colors and sizes

This example shows how it is possible to provide sequences of data values for glyph attributes like `fill_color` and `radius`. Other things to look out for in this example:
- supplying an explicit list of tool names to `figure()`
- setting the `x_range` and `y_range` explicitly
- turning the glyph outline off (by setting `line_color=None`)
- using NumPy arrays for supplying data

```python
import numpy as np

# prepare some data
N = 4000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150)
    for r, g in zip(50+2*x, 30+2*y)
]

TOOLS = "crosshair,pan,wheel_zoom,box_zoom,reset,box_select,lasso_select"

# create a new plot with the tools above, and explicit ranges
p = figure(tools=TOOLS, x_range=(0, 100), y_range=(0, 100), width=500,
           height=500)

# add a circle renderer with vectorized colors and sizes
p.circle(x, y, radius=radii, fill_color=colors, fill_alpha=0.6,
         line_color=None)

# show the results
show(p)
```

```python
colors[:30]
```

```
['#982196', '#415a96', '#ef4096', '#aeb596', '#aa3396',
 '#959996', '#7c4796', '#ce6796', '#605e96', '#cba696',
 '#edc996', '#4cb396', '#6d6396', '#b9bd96', '#55d696',
 '#c13d96', '#339796', '#f2c496', '#cbb996', '#ed8d96',
 '#338296', '#d29e96', '#dc3996', '#c9c496', '#e37796',
 '#abc896', '#55ae96', '#f2b696', '#d77196', '#c58896']
```

# Data Sources and Transformations

We've seen how Bokeh can work well with Python lists, NumPy arrays, Pandas Series, etc. At lower levels, these inputs are converted to a Bokeh `ColumnDataSource`. This data type is the central data source object used throughout Bokeh. Although Bokeh often creates them for us transparently, there are times when it is useful to create them explicitly.

In later sections we will see features like hover tooltips, computed transforms, and CustomJS interactions that make use of the `ColumnDataSource`, so let's take a quick look now.

## Creating with Python Dicts
The `ColumnDataSource` can be imported from `bokeh.models`:

```python
from bokeh.models import ColumnDataSource
```

The `ColumnDataSource` is a mapping of column names (strings) to sequences of values. Here is a simple example. The mapping is provided by passing a Python `dict` with string keys and simple Python lists as values. The values could also be NumPy arrays or Pandas Series.

***NOTE: ALL the columns in a `ColumnDataSource` must always be the SAME length.***
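Beyond construction, a `ColumnDataSource` can be inspected and modified from Python through its `data` dict, its `column_names` and its `add()` method. This minimal sketch reuses the toy columns from the next cell plus a hypothetical `size` column, and every column keeps the same length:

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data={
    'x': [1, 2, 3, 4, 5],
    'y': [3, 7, 8, 5, 1],
})

# add a new column of the same length; add() returns the column name
name = source.add([10, 15, 20, 25, 30], name='size')

# replace the values of an existing column (length unchanged)
source.data['y'] = [1, 2, 3, 4, 5]

print(sorted(source.column_names))
```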


```python
source = ColumnDataSource(data={
    'x' : [1, 2, 3, 4, 5],
    'y' : [3, 7, 8, 5, 1],
})
```

Up until now we have called functions like `p.circle` by passing in literal lists or arrays of data directly. When we do this, Bokeh creates a `ColumnDataSource` for us automatically. But it is possible to specify a `ColumnDataSource` explicitly by passing it as the `source` argument to a glyph method. Whenever we do this, if we want a property (like `"x"` or `"y"` or `"fill_color"`) to have a sequence of values, we pass the ***name of the column*** that we would like to use for that property:

```python
p = figure(width=400, height=400)
p.circle('x', 'y', size=20, source=source)
show(p)
```

## Creating with Pandas DataFrames

It's also simple to create `ColumnDataSource` objects directly from Pandas data frames. To do this, just pass the data frame to `ColumnDataSource` when you create it:

```python
from bokeh.sampledata.iris import flowers as df

source = ColumnDataSource(df)

p = figure(width=400, height=400)
p.circle('petal_length', 'petal_width', source=source)
show(p)
```
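One detail worth knowing when converting DataFrames: Bokeh also turns the DataFrame index into a column, named `index` when the index has no name. Here is a minimal sketch with a small hand-made DataFrame (hypothetical values, not the iris sample):

```python
import pandas as pd
from bokeh.models import ColumnDataSource

df_small = pd.DataFrame({'petal_length': [1.4, 4.7, 6.0],
                         'petal_width': [0.2, 1.4, 2.5]})

source = ColumnDataSource(df_small)

# the unnamed RangeIndex shows up as an extra "index" column
print(sorted(source.column_names))
```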

## Automatic Conversion

If you do not need to share data sources, it may be convenient to pass dicts, Pandas `DataFrame` or `GroupBy` objects directly to glyph methods, without explicitly creating a `ColumnDataSource`. In this case, a `ColumnDataSource` will be created automatically.

```python
from bokeh.sampledata.iris import flowers as df

p = figure(width=400, height=400)
p.circle('petal_length', 'petal_width', source=df, color='green')
show(p)
```
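To check what happens behind the scenes, you can look at the `data_source` of the renderer returned by a glyph method. Here is a sketch with toy data, where the DataFrame columns `a` and `b` are hypothetical:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

p = figure(width=300, height=300)

# literal lists: Bokeh builds a ColumnDataSource with columns "x" and "y"
r1 = p.scatter([1, 2, 3], [4, 5, 6])

# a DataFrame passed as source= is converted as well
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [6, 5, 4]})
r2 = p.scatter('a', 'b', source=df_small, color='green')

print(type(r1.data_source).__name__, r1.data_source.column_names)
print(type(r2.data_source).__name__, r2.data_source.column_names)
```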

## Transformations

In addition to being configured with names of columns from data sources, glyph properties may also be configured with transform objects that represent transformations of columns. These live in the `bokeh.transform` module. It is important to note that when using these objects, the transformations occur *in the browser, not in Python*.

The first transform we look at is the `cumsum` transform, which can generate a new sequence of values from a data source column by cumulatively summing the values in the column. This can be useful for pie or donut type charts as seen below.

```python
from math import pi
import pandas as pd
from bokeh.palettes import Category20c
from bokeh.transform import cumsum

x = { 'United States': 157, 'United Kingdom': 93, 'Japan': 89, 'China': 63,
      'Germany': 44, 'India': 42, 'Italy': 40, 'Australia': 35, 'Brazil': 32,
      'France': 31, 'Taiwan': 31, 'Spain': 29 }

data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})
data['color'] = Category20c[len(x)]

# represent each value as an angle = value / total * 2pi
data['angle'] = data['value']/data['value'].sum() * 2*pi

data.head()
```

|   | country | value | color | angle |
|---|---|---|---|---|
| 0 | United States | 157 | #3182bd | 1.437988 |
| 1 | United Kingdom | 93 | #6baed6 | 0.851802 |
| 2 | Japan | 89 | #9ecae1 | 0.815165 |
| 3 | China | 63 | #c6dbef | 0.577027 |
| 4 | Germany | 44 | #e6550d | 0.403003 |


```python
p = figure(height=350, title="Pie Chart", toolbar_location=None,
           tools="hover", tooltips="@country: @value")

p.wedge(x=0, y=1, radius=0.4,
        # use cumsum to cumulatively sum the values for start and end angles
        start_angle=cumsum('angle', include_zero=True),
        end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='country',
        source=data)

p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None

show(p)
```
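The `cumsum` transform runs in the browser, but its result is easy to reproduce in Python. This NumPy sketch uses a hypothetical four-value subset of the data above. `end_angle` is the running total of the angles, while `include_zero=True` shifts the same running total so that the first wedge starts at 0 and the final total is dropped.

```python
from math import pi
import numpy as np

# hypothetical subset: the first four values of the pie chart data
values = np.array([157, 93, 89, 63])
angles = values / values.sum() * 2 * pi

# cumsum('angle'): running total, used as end_angle
end = np.cumsum(angles)

# cumsum('angle', include_zero=True): prepend 0 and drop the last total,
# so each wedge starts where the previous one ended
start = np.concatenate(([0.0], end[:-1]))

for s, e in zip(start, end):
    print(f"wedge from {s:.3f} to {e:.3f} rad")
```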

The next transform we look at is the `linear_cmap` transform, which can generate a new sequence of colors by applying a linear colormapping to a data source column.

```python
from bokeh.transform import linear_cmap
# Palette reference here: https://bokeh.pydata.org/en/latest/docs/reference/palettes.html

N = 4000
data = dict(x=np.random.random(size=N) * 100,
            y=np.random.random(size=N) * 100,
            r=np.random.random(size=N) * 1.5)

p = figure()

p.circle('x', 'y', radius='r', source=data, fill_alpha=0.6,
         # color map based on the x-coordinate
         color=linear_cmap('x', 'Viridis256', 0, 100))

show(p)
```

See Bokeh's [palette reference](https://docs.bokeh.org/en/latest/docs/reference/palettes.html) for the full list of available palettes.

Change the code above to use `log_cmap` and observe the results. Try changing `low` and `high` and specifying `low_color` and `high_color`.
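Here is one possible solution to the exercise, as a sketch. A logarithm is undefined at 0, so `low` is set to 1 here. The `low`, `high`, `low_color` and `high_color` values are arbitrary choices, not prescribed ones. Points whose `x` falls outside `[low, high]` are drawn with `low_color` or `high_color`.

```python
import numpy as np
from bokeh.plotting import figure
from bokeh.transform import log_cmap

N = 4000
data = dict(x=np.random.random(size=N) * 100,
            y=np.random.random(size=N) * 100,
            r=np.random.random(size=N) * 1.5)

# log-scaled color mapping on x; values below 1 or above 80 get the
# out-of-range colors
mapper = log_cmap('x', 'Viridis256', low=1, high=80,
                  low_color='lightgrey', high_color='red')

p = figure()
p.circle('x', 'y', radius='r', source=data, fill_alpha=0.6, color=mapper)
# show(p)  # display in a notebook or browser
```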

## Dealing with datetime and categorical axes

### Datetime axes
Dealing with date and time series is another common task. Bokeh has a sophisticated `DatetimeAxis` that can change the displayed ticks based on the current scale of the plot. There are some inputs for which Bokeh will automatically default to DatetimeAxis, but you can always explicitly ask for one by passing the value "datetime" to the `x_axis_type` or `y_axis_type` parameters to `figure()`. A few things of interest to look out for in this example:

- setting the width and height arguments to `figure()`
- customizing plots and other objects by assigning values to their attributes
- accessing guides and annotations with convenience Figure attributes: `legend, grid, xgrid, ygrid, axis, xaxis, yaxis`
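Here is a minimal sketch of asking for a datetime axis explicitly. It uses a synthetic daily series (hypothetical data generated with NumPy), so it does not need the sample data download below:

```python
import numpy as np
from bokeh.models import DatetimeAxis
from bokeh.plotting import figure

# synthetic daily series: January and February 2022
dates = np.arange('2022-01-01', '2022-03-01', dtype='datetime64[D]')
values = np.cumsum(np.random.randn(len(dates)))

p = figure(width=600, height=250, x_axis_type="datetime")
p.line(dates, values)
p.xaxis.axis_label = "Date"
```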

```python
# Run this only once: downloads the Bokeh sample data sets
from bokeh.sampledata import download
download()
```


```python
from bokeh.sampledata.stocks import AAPL
```

```python
AAPL.keys()
```

```
dict_keys(['date', 'open', 'high', 'low', 'close', 'volume', 'adj_close'])
```

```python
for key in AAPL:
    print(key, AAPL[key][:5])
```

```
date ['2000-03-01', '2000-03-02', '2000-03-03', '2000-03-06', '2000-03-07']
open [118.56, 127.0, 124.87, 126.0, 126.44]
high [132.06, 127.94, 128.23, 129.13, 127.44]
low [118.5, 120.69, 120.0, 125.0, 121.12]
close [130.31, 122.0, 128.0, 125.69, 122.87]
volume [38478000, 11136800, 11565200, 7520000, 9767600]
adj_close [31.68, 29.66, 31.12, 30.56, 29.87]
```


```python
# prepare data
aapl = np.array(AAPL['adj_close'])
aapl_dates = np.array(AAPL['date'], dtype=np.datetime64)

# 30-day moving average via convolution with a flat window
window_size = 30
window = np.ones(window_size)/float(window_size)
aapl_avg = np.convolve(aapl, window, 'same')

# create a new plot with a datetime axis type
p = figure(width=800, height=350, x_axis_type="datetime")

# add renderers
p.circle(aapl_dates, aapl, size=4, color='darkgrey',
         alpha=0.2, legend_label='close')
p.line(aapl_dates, aapl_avg, color='red', legend_label='avg')

# NEW: customize by setting attributes
p.title.text = "AAPL One-Month Average"
p.legend.location = "top_left"
p.grid.grid_line_alpha = 0
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Price'

p.ygrid.band_fill_color = "gray"
p.ygrid.band_fill_alpha = 0.1

p.legend.click_policy = "hide"  # enable the click policy on the legend

# show the results
show(p)
```

### Categorical axes
To tell Bokeh that the x-axis is categorical, we pass the list of factors (here, a list of fruit names) as the `x_range` argument to `figure`:

```python
p = figure(x_range=fruits, ... )
```

Note that passing the list of factors is a convenient shorthand notation for creating a `FactorRange`. The equivalent explicit notation is:
```python
p = figure(x_range=FactorRange(factors=fruits), ... )
```
This more explicit form is useful when you want to customize the `FactorRange`, e.g. by changing the range or category padding.
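Here is a sketch of the explicit form with some padding set up front. `range_padding` and `factor_padding` are `FactorRange` properties, and the values used here are arbitrary:

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

# shorthand: a list of strings becomes a FactorRange
p1 = figure(x_range=fruits)

# explicit form: same factors, plus padding options
p2 = figure(x_range=FactorRange(factors=fruits, range_padding=0.1,
                                factor_padding=0.2))
p2.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
```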

Next we can call `vbar` with the list of fruit name factors as the x coordinate, the bar height as the top coordinate, and optionally any width or other properties that we would like to set:
```python
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
```

Putting it all together, we get the following plot:

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes',
          'Strawberries']

p = figure(x_range=fruits, height=250, title="Fruit Counts",
           toolbar_location=None, tools="")

p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.6)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)
```

Often we may want bars shaded in different colors. One way is to supply all the colors up front: put all the data, including the color of each bar, in a `ColumnDataSource`, then pass the name of the column containing the colors to `vbar` as the `color` (or `line_color`/`fill_color`) argument. This is shown below:

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums',
          'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

source = ColumnDataSource(data=dict(fruits=fruits, counts=counts,
                                    color=Spectral6))

p = figure(x_range=fruits, y_range=(0, 9),
           height=250, title="Fruit Counts",
           toolbar_location=None, tools="")

p.vbar(x='fruits', top='counts', width=0.6,
       color='color', legend_field="fruits", source=source)

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"

show(p)
```

### Grouped

When creating bar charts, it is often desirable to visually display the data according to sub-groups. There are two basic methods that can be used, depending on your use case: using nested categorical coordinates, or applying visual dodges.

### Nested Categories

If the coordinates of a plot range and data have two or three levels, then Bokeh will automatically group the factors on the axis, including a hierarchical tick labeling with separators between the groups. In the case of bar charts, this results in bars grouped together by the top-level factors. This is probably the most common way to achieve grouped bars, especially if you are starting from “tidy” data.

The example below shows this approach by creating a single column of coordinates that are each 2-tuples of the form (fruit, year). Accordingly, the plot groups the axes by fruit type, with a single call to `vbar`:

```python
from bokeh.models import ColumnDataSource, FactorRange

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums',
          'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ())
# like an hstack in numpy

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), height=250,
           title="Fruit Counts by Year",
           toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
```

```python
counts
```

```
(2, 5, 3, 1, 3, 2, 4, 3, 4, 3, 2, 4, 2, 4, 5, 4, 6, 3)
```
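The `sum(zip(...), ())` line above interleaves the three yearly lists fruit by fruit, so that `counts` lines up with the `(fruit, year)` factors in `x`. This sketch spells out the two steps and compares the result with `itertools.chain.from_iterable`, an equivalent that avoids building intermediate tuples:

```python
from itertools import chain

data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# zip groups the three years per fruit: (2, 5, 3), (1, 3, 2), ...
per_fruit = list(zip(data['2015'], data['2016'], data['2017']))

# sum(..., ()) concatenates those tuples one by one
counts_sum = sum(per_fruit, ())

# chain.from_iterable yields the same flat sequence lazily
counts_chain = tuple(chain.from_iterable(per_fruit))

print(counts_sum)
```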

We can also apply a color mapping, similar to the earlier example. To obtain the same grouped bar plot of fruits data as above, except with the bars shaded by year, change the `vbar` call to use `factor_cmap` for the `fill_color`:

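```python
from bokeh.models import FactorRange
from bokeh.palettes import Spectral10
from bokeh.transform import factor_cmap

# start from a fresh figure so the colored bars are not drawn over the previous ones
p = figure(x_range=FactorRange(*x), height=250, title="Fruit Counts by Year",
           toolbar_location=None, tools="")
# use the palette to colormap based on the year, i.e. the x[1:2] part of each factor
p.vbar(x='x', top='counts', width=0.9, source=source,
       line_color="white",
       fill_color=factor_cmap('x', palette=Spectral10,
                              factors=years, start=1, end=2))
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
show(p)
```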

Another method for achieving grouped bars is to explicitly specify a visual displacement for the bars. Such a visual offset is also referred to as a **dodge**.

In this scenario, our data is not “tidy”. Instead of a single table with rows indexed by factors (fruit, year), we have separate series for each year. We can plot all the year series using separate calls to `vbar`, but since every bar in each group has the same fruit factor, the bars would overlap visually. We can prevent this overlap and distinguish the bars by using the `dodge()` function to provide an offset for each different call to `vbar`:

```python
from bokeh.models import ColumnDataSource
from bokeh.transform import dodge

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

source = ColumnDataSource(data=data)

p = figure(x_range=fruits, y_range=(0, 10), height=250,
           title="Fruit Counts by Year",
           toolbar_location=None, tools="")

p.vbar(x=dodge('fruits', -0.25, range=p.x_range),
       top='2015', width=0.2, source=source,
       color="#c9d9d3", legend_label="2015")

p.vbar(x=dodge('fruits',  0.0,  range=p.x_range),
       top='2016', width=0.2, source=source,
       color="#718dbf", legend_label="2016")

p.vbar(x=dodge('fruits',  0.25, range=p.x_range),
       top='2017', width=0.2, source=source,
       color="#e84d60", legend_label="2017")

p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
```
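A little arithmetic shows why the offsets above do not overlap. Assume each fruit occupies one unit along Bokeh's categorical axis, which is why widths such as 0.9 fill most of a category. A bar with offset `o` and width `w` then spans `[o - w/2, o + w/2]` around the category center:

```python
# offsets and bar width from the dodge example
offsets = [-0.25, 0.0, 0.25]
width = 0.2

# extent of each bar relative to the center of its fruit category
spans = [(o - width / 2, o + width / 2) for o in offsets]

# space between neighboring bars (positive means no overlap)
gaps = [spans[i + 1][0] - spans[i][1] for i in range(len(spans) - 1)]

for left, right in spans:
    print(f"{left:+.2f} .. {right:+.2f}")
print("gaps:", [round(g, 2) for g in gaps])
```

With these values the three bars stay within half a unit of the category center, leaving a small gap between neighbors.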

Another common operation on bar charts is to stack bars on top of one another. Bokeh makes this easy to do with the specialized `hbar_stack()` and `vbar_stack()` functions. The example below shows the fruits data from above, but with the bars for each fruit type stacked instead of grouped:

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
colors = ["#34A212", "#D3F543", "#98F5FF"]

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 4, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

source = ColumnDataSource(data=data)

p = figure(x_range=fruits, height=250, title="Fruit Counts by Year",
           toolbar_location=None, tools="")

p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
             legend_label=years)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
```
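`vbar_stack` is a convenience wrapper. Roughly, it issues one `vbar` per column, with `bottom` and `top` built from the `stack()` transform in `bokeh.transform`, which sums columns in the browser. The sketch below writes that out by hand with the same data. Treat it as an illustration of the idea rather than a drop-in replacement.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import stack

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
colors = ["#34A212", "#D3F543", "#98F5FF"]
source = ColumnDataSource(data={'fruits': fruits,
                                '2015': [2, 1, 4, 3, 2, 4],
                                '2016': [5, 3, 4, 2, 4, 6],
                                '2017': [3, 2, 4, 4, 5, 3]})

p = figure(x_range=fruits, height=250, toolbar_location=None, tools="")

# bar i goes from the sum of the previous year columns (bottom)
# to the sum including its own column (top)
for i, (year, color) in enumerate(zip(years, colors)):
    p.vbar(x='fruits', width=0.9, source=source, color=color,
           bottom=stack(*years[:i]), top=stack(*years[:i + 1]),
           legend_label=year)
```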

Sometimes we may want to stack bars that have both positive and negative extents. The example below shows how it is possible to create such a stacked bar chart that is split by positive and negative values:

```python
from bokeh.palettes import GnBu3, OrRd3

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums',
          'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

exports = {'fruits' : fruits,
           '2015'   : [2, 1, 4, 3, 2, 4],
           '2016'   : [5, 3, 4, 2, 4, 6],
           '2017'   : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
           '2015'   : [-1, 0, -1, -3, -2, -1],
           '2016'   : [-2, -1, -3, -1, -2, -2],
           '2017'   : [-1, -2, -1, 0, -2, -2]}

p = figure(y_range=fruits, height=250,
           x_range=(-16, 16),
           title="Fruit import/export, by year",
           toolbar_location=None)

# positive values stack to the right, negative values to the left
p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3,
             source=ColumnDataSource(exports),
             legend_label=["%s exports" % x for x in years])

p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3,
             source=ColumnDataSource(imports),
             legend_label=["%s imports" % x for x in years])

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None

show(p)
```

## Mixing Categorical Levels

If you have created a range with nested categories as above, it is possible to plot glyphs using only the "outer" categories, if desired. The plot below shows monthly values grouped by quarter as bars. The data for these are in the familiar format:

    factors = [("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"), ....]

The plot also overlays a line representing average quarterly values, and this is accomplished by using only the "quarter" part of each nested category:

    p.line(x=["Q1", "Q2", "Q3", "Q4"], y=....)

```python
factors = [("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
           ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
           ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
           ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec")]

p = figure(x_range=FactorRange(*factors), height=250)

x = [ 10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16 ]
p.vbar(x=factors, top=x, width=0.9, alpha=0.5)

# only the outer ("quarter") level is used for the line and its markers
qs, aves = ["Q1", "Q2", "Q3", "Q4"], [12, 9, 13, 14]
p.line(x=qs, y=aves, color="red", line_width=3)
p.circle(x=qs, y=aves, line_color="red", fill_color="white", size=10)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None

show(p)
```

## Using Pandas `GroupBy`

We may want to make charts based on the results of "group by" operations. Bokeh can utilize Pandas `GroupBy` objects directly to make this simpler. Let's take a look at how Bokeh deals with `GroupBy` objects by examining the "mpg" data set.

```python
from bokeh.sampledata.autompg import autompg_clean as df

df.cyl = df.cyl.astype(str)
df.head()
```

|   | mpg | cyl | displ | hp | weight | accel | yr | origin | name | mfr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | North America | chevrolet chevelle malibu | chevrolet |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | North America | buick skylark 320 | buick |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | North America | plymouth satellite | plymouth |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | North America | amc rebel sst | amc |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | North America | ford torino | ford |


Suppose we would like to display some values grouped according to `"cyl"`. If we create `group = df.groupby('cyl')` and then call `group.describe()`, we can see that Pandas automatically computes various statistics for each group.

```python
group = df.groupby('cyl')
group.describe()
```

| cyl | mpg count | mpg mean | mpg std | mpg min | mpg 25% | mpg 50% | mpg 75% | mpg max | displ count | displ mean | ... | accel 75% | accel max | yr count | yr mean | yr std | yr min | yr 25% | yr 50% | yr 75% | yr max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4.0 | 20.55 | 2.564501 | 18.0 | 18.75 | 20.25 | 22.05 | 23.7 | 4.0 | 72.5 | ... | 13.5 | 13.5 | 4.0 | 75.5 | 3.696846 | 72.0 | 72.75 | 75.0 | 77.75 | 80.0 |
| 4 | 199.0 | 29.28392 | 5.670546 | 18.0 | 25.0 | 28.4 | 32.95 | 46.6 | 199.0 | 109.670854 | ... | 18.0 | 24.8 | 199.0 | 77.030151 | 3.737484 | 70.0 | 74.0 | 77.0 | 80.0 | 82.0 |
| 5 | 3.0 | 27.366667 | 8.228204 | 20.3 | 22.85 | 25.4 | 30.9 | 36.4 | 3.0 | 145.0 | ... | 20.0 | 20.1 | 3.0 | 79.0 | 1.0 | 78.0 | 78.5 | 79.0 | 79.5 | 80.0 |
| 6 | 83.0 | 19.973494 | 3.828809 | 15.0 | 18.0 | 19.0 | 21.0 | 38.0 | 83.0 | 218.361446 | ... | 17.6 | 21.0 | 83.0 | 75.951807 | 3.264381 | 70.0 | 74.0 | 76.0 | 78.0 | 82.0 |
| 8 | 103.0 | 14.963107 | 2.836284 | 9.0 | 13.0 | 14.0 | 16.0 | 26.6 | 103.0 | 345.009709 | ... | 14.0 | 22.2 | 103.0 | 73.902913 | 3.021214 | 70.0 | 72.0 | 73.0 | 76.0 | 81.0 |


Bokeh allows us to create a `ColumnDataSource` directly from Pandas `GroupBy` objects, and when this happens, the data source is automatically filled with the summary values from `group.describe()`. Observe the column names below, which correspond to the output above.

```python
source = ColumnDataSource(group)

",".join(source.column_names)
```

```
'cyl,mpg_count,mpg_mean,mpg_std,mpg_min,mpg_25%,mpg_50%,mpg_75%,mpg_max,displ_count,displ_mean,displ_std,displ_min,displ_25%,displ_50%,displ_75%,displ_max,hp_count,hp_mean,hp_std,hp_min,hp_25%,hp_50%,hp_75%,hp_max,weight_count,weight_mean,weight_std,weight_min,weight_25%,weight_50%,weight_75%,weight_max,accel_count,accel_mean,accel_std,accel_min,accel_25%,accel_50%,accel_75%,accel_max,yr_count,yr_mean,yr_std,yr_min,yr_25%,yr_50%,yr_75%,yr_max'
```
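The flattened names come from joining the two column levels of `describe()` with an underscore. This minimal sketch uses a tiny hypothetical DataFrame (not the autompg sample) to make that visible:

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# tiny hypothetical data set with a string group key
df_small = pd.DataFrame({'cyl': ['4', '4', '6'], 'mpg': [30.0, 28.0, 18.0]})
group = df_small.groupby('cyl')

# describe() has two column levels, e.g. ("mpg", "mean");
# the group key "cyl" becomes a regular column
source = ColumnDataSource(group)
print(source.column_names)
```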

Knowing these column names, we can immediately create bar charts based on Pandas `GroupBy` objects. The example below plots the average MPG per number of cylinders, i.e. columns `"mpg_mean"` vs `"cyl"`:

```python
from bokeh.palettes import Spectral5

cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))

p = figure(height=350, x_range=group)
p.vbar(x='cyl', top='mpg_mean', width=1, line_color="white",
       fill_color=cyl_cmap, source=source)

p.xgrid.grid_line_color = None
p.xaxis.axis_label = "number of cylinders"
p.yaxis.axis_label = "Mean MPG"
p.y_range.start = 0

show(p)
```

## Up next

In the next lecture we'll cover annotation tools, interaction with widgets and JavaScript callbacks, and mapping of geographical data.