
# Plotting with Bokeh 

This tutorial helps you get started with Bokeh, a visualization framework for Python.
There are several good reasons to use Bokeh. First, it provides a rich API: many visualization types are supported, and many attributes are exposed for customization without the API becoming "low level". Second, it is called directly from Python but renders visualizations for the web. On the one hand, you don't have to switch to a separate web framework to bring your interactive visualizations to the public; on the other hand, you can program the complete data visualization process in Python.

We start with the very basics: how to create a Bokeh plot, how to link multiple plots, and how to handle categorical data. Later, we look at the Bokeh server, which lets you handle user interactions and run computations in response to events raised by the user.

Every section contains an example, followed by an exercise to solve.

## 1. Basics

Look at the code in the following code block. After the usual Python import statements, we first tell Bokeh that we are working in a Jupyter notebook environment using `output_notebook()`; then a simple line plot is created. The most important thing about Bokeh is the following recurring pattern:
*Every* plot in Bokeh should consist of three basic elements:

- A **ColumnDataSource**, which holds the data to be plotted
- A **Figure**, which is the canvas everything is rendered on
- Multiple **Glyphs** which represent our data, or a part of it

For more information about the basics, you may want to look at the
Bokeh Getting Started page: https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html

In [1]:
import numpy as np

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

# Uncomment this line to output the result into a local file
# output_file("01_basic_plotting.html")

# Now, we first create a ColumnDataSource to contain our data.
# Make sure that all columns have the same length.
source = ColumnDataSource(data = dict(
    xs = np.arange(0,10),
    ys = np.random.randint(0,100, 10)
))

# Second we create a figure, where we want to display our data.
p = figure(title="01 - Basic Plotting with Bokeh", x_axis_label='x', y_axis_label='y', height=300)

# Finally we add a line glyph to represent our data.
# The data is given by referencing the columns of the ColumnDataSource.
p.line("xs", "ys", legend_label="Random Value.", line_width=2, source=source)

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"

show(p)

As you can see, a few lines of code are sufficient to produce this simple plot.

- First, we define a ColumnDataSource to hold our data. In essence, it wraps a Python dictionary in which each key is a *column name* and the associated list contains the data to be visualized. (We will see later that passing a pandas.DataFrame is just as simple.)

- Next, we define a canvas to draw on using the `figure(...)` function. We now have data in a format suitable for Bokeh and a place to put our visualization, so all that remains is to draw the data.

- This is done using the figure's `line(...)` method, which draws our data with a given glyph (in this case a line glyph). Each glyph in Bokeh has certain properties. For example, a line has multiple *x* and *y* coordinates, a circle may have an additional *radius*, and a bar of a bar chart may have *height*, *width*, *right* and *left* properties. A glyph is connected to a *ColumnDataSource* by assigning column names to these properties. In the above example, we connect `xs` to the x coordinates of the line and `ys` to its y coordinates.
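
To make the three-part pattern concrete, here is a minimal sketch that builds the three objects without rendering anything. The column names `xs` and `ys` mirror the example above, the squared values are made up, and the two `print` calls only illustrate how the pieces reference each other.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# 1. The data: a dict mapping column names to equally long sequences.
source = ColumnDataSource(data=dict(
    xs=np.arange(0, 10),
    ys=np.arange(0, 10) ** 2,   # made-up values for illustration
))

# 2. The canvas.
p = figure(title="Pattern sketch")

# 3. A glyph whose x and y properties are bound to column names.
renderer = p.line("xs", "ys", source=source)

# The columns are available by name ...
print(sorted(source.data.keys()))      # ['xs', 'ys']
# ... and the renderer keeps a reference to the source it draws from.
print(renderer.data_source is source)  # True
```

Calling `show(p)` afterwards would render the plot as in the cell above.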


## *Exercise 1: Creating a basic plot*

Try to implement such a visualization yourself as a warm-up exercise. 

Do the following:

1. Implement a visualization with a ColumnDataSource, Figure and a *circle* glyph using Bokeh
2. In the final visualization, 10 circles should be arranged in a diagonal line from (x, y) = (0, 0) to (x, y) = (1, 1).
3. Add a third column "size" to the ColumnDataSource with random values between 0 and 20, and bind it to the `size` attribute of the circle glyph.

A full list of all glyphs and their attributes can be found on the Bokeh docs page here:
https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html#userguide-plotting

The result should look similar to this: 
![Solution E1](images/sol_e1.png)

In [12]:
# (10 min)

import numpy as np

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()
output_file("02_basic_plotting.html")
#######################
source = ColumnDataSource(data = dict(
    xs = np.linspace(0, 1, 10),
    ys = np.linspace(0, 1, 10),
    size = np.random.random(10)*20
))

# Second we create a figure, where we want to display our data.
p = figure(title="Exercise 1", x_axis_label='x', y_axis_label='y', height=300)

# Finally we add a circle glyph to represent our data.
# The data is given by referencing the columns of the ColumnDataSource.
p.circle("xs", "ys", size="size", source=source)

#######################

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"
show(p)


## 2. Multiple Glyphs
Of course, Bokeh does not limit you to a single glyph or ColumnDataSource at a time.
Adding multiple glyphs is simply a matter of calling the different glyph functions one after another and binding the corresponding columns to the attributes of the respective glyph.

Bokeh provides a large collection of such glyphs, ranging from circles and lines up to images. Every glyph has different attributes, so make sure to check the Bokeh docs. A full guide on all glyphs can be found here: https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html#userguide-plotting

In [None]:
import numpy as np

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.colors import RGB

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

# Uncomment this line to output the result into a local file
# output_file("01_basic_plotting.html")

# Now, we first create a ColumnDataSource to contain our data.
# Make sure that all columns have the same length.
source = ColumnDataSource(data = dict(
    xs = np.arange(0,30),
    ys = np.random.randint(0,100, 30),
))

# Second we create a figure, where we want to display our data.
p = figure(title="02 - Multiple Glyphs", x_axis_label='x', y_axis_label='y', height=300)

# We then create two glyphs, a line and a circle glyph with the same x and y coordinates.
p.line("xs", "ys", legend_label="Random Value 2.", line_width=2, color=RGB(113,125,165), source=source)
p.circle("xs", "ys", size=5, line_width=2, color="white", line_color="black", fill_alpha=0.5, source=source)

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"

show(p)

## *Exercise 2: Multiple Glyphs*

Create a plot with two different glyphs. 

1. Create a ColumnDataSource containing a histogram of 10'000 samples drawn from a Gaussian distribution. The histogram should have 20 bins.
    - Tip: Look at the numpy.random.normal and numpy.histogram functions.
2. Draw the histogram with a vbar glyph and a line glyph.
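
As a starting point for the tip above, the following sketch shows what `numpy.histogram` returns. The variable names, the seed, and the mean and standard deviation are assumptions chosen for illustration; binding the result to the glyphs is left to the exercise.

```python
import numpy as np

np.random.seed(0)  # seeded only to make the sketch reproducible
samples = np.random.normal(loc=0.0, scale=1.0, size=10_000)

# np.histogram returns the counts per bin and the bin edges.
# There is always one more edge than there are bins.
counts, edges = np.histogram(samples, bins=20)

# Bar positions and widths can be derived from the edges.
centers = (edges[:-1] + edges[1:]) / 2
widths = np.diff(edges)

print(len(counts), len(edges))  # 20 21
print(counts.sum())             # 10000
```

Columns such as `centers` and `counts` have the same length, so they fit into one ColumnDataSource.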

The result should look similar to this: 
![Solution E2](images/sol_e2.png)

In [None]:
# (15 min)

import numpy as np

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.colors import RGB

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

#######################
# HERE COMES YOUR CODE#
#######################

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"
show(p)

## 3. Multiple Figures in one View and linked panning

In the previous examples we have seen how to create a single figure. Bokeh also lets you arrange several figures next to each other using the `layout` function. Look at the following example.

Three figures are created and passed into the `layout` function. It accepts a nested list of figures (and widgets, as we will see later), which Bokeh turns into a layout. For simpler layouts, Bokeh also provides the `column`, `row` and `gridplot` functions. However, you can essentially stick to `layout`, since it is the most versatile.

Note that these plots also provide a first type of interaction, namely linked panning. When you pan the top-left plot, the top-right figure moves along both axes accordingly, whereas the lower plot only follows along the x-axis.

This is achieved by passing the `x_range` and `y_range` of an already created figure to the new figure. The figures then share that range.
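
The sharing can be checked directly in Python. The sketch below creates three empty figures (the titles are made up) and compares their range objects; linked figures refer to the very same object, which is why panning one moves the other.

```python
from bokeh.plotting import figure

s1 = figure(title="first")

# Passing s1's ranges makes s2 use the very same range objects.
s2 = figure(title="both axes linked", x_range=s1.x_range, y_range=s1.y_range)

# Only the x range is shared here, so s3 gets its own y range.
s3 = figure(title="x axis linked", x_range=s1.x_range)

print(s2.x_range is s1.x_range, s2.y_range is s1.y_range)  # True True
print(s3.x_range is s1.x_range, s3.y_range is s1.y_range)  # True False
```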


In [None]:
from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.layouts import layout

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

source = ColumnDataSource(dict(
    x = list(range(11)),
    y0 = list(range(11)),
    y1 = [10-xx for xx in list(range(11))],
    y2 = [abs(xx-5) for xx in list(range(11))]
))


# create a new plot
s1 = figure(width=250, height=250, title=None)
s1.circle("x", "y0", size=10, color="navy", alpha=0.5, source=source)

# create a new plot and share both ranges
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle("x", "y1", size=10, color="firebrick", alpha=0.5, source=source)

# create a new plot and share only one range
s3 = figure(width=250, height=250, x_range=s1.x_range, title=None)
s3.square("x", "y2", size=10, color="olive", alpha=0.5, source=source)

p = layout(
    [[s1, s2], 
     [s3]]
)

# show the results
show(p)

## *Exercise 3: Multiple Figures in one View and linked panning*

Try it yourself. Extend the above example with a fourth figure and put it into the bottom-right.

1. Add a new column to the ColumnDataSource and fill it with random y-coordinates between 0 and 5
2. Put the new figure into the currently empty bottom-right of the view
3. Link its x-axis to the top-right figure and its y-axis to the bottom-left figure
4. Use the `sizing_mode` parameter of the `layout` function to make the view fill the whole width of the area.

Tip: Read about sizing modes in Bokeh here: https://docs.bokeh.org/en/latest/docs/user_guide/layout.html?highlight=sizing%20mode#sizing-mode

In [None]:
# (10 min)
import numpy as np

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.layouts import layout

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

#TODO : Add another y column with random values between 0 and 5
source = ColumnDataSource(dict(
    x = list(range(11)),
    y0 = list(range(11)),
    y1 = [10-xx for xx in list(range(11))],
    y2 = [abs(xx-5) for xx in list(range(11))]
))


# create a new plot
s1 = figure(width=250, height=250, title=None)
s1.circle("x", "y0", size=10, color="navy", alpha=0.5, source=source)

# create a new plot and share both ranges
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle("x", "y1", size=10, color="firebrick", alpha=0.5, source=source)

# create a new plot and share only one range
s3 = figure(width=250, height=250, x_range=s1.x_range, title=None)
s3.square("x", "y2", size=10, color="olive", alpha=0.5, source=source)

#TODO: Add another figure with a glyph of your choice (which has not been used yet) 
# and link the axes, as described in the exercise description. 


#TODO: put it into the layout and adapt the sizing mode as described in the exercise.
p = layout(
    [[s1, s2], 
     [s3]]
)

# show the results
show(p)

## 4. Handling Categorical Data

In data science, we often have to deal with categorical data (also referred to as nominal data). In such cases, one axis of the figure represents a category. The following example is based on the Swiss government's statistics on vehicles currently licensed in Switzerland.

We first import the necessary Bokeh modules, as before. Since the dataset is stored in a CSV file, we read it using pandas.

The dataframe consists of two columns, *VehicleGroup* and *Count*; the first one represents a category. We now want to create a basic bar plot that compares the number of vehicles per category. The nice thing about Bokeh is that it integrates well with pandas: to convert a pandas.DataFrame into a Bokeh ColumnDataSource, simply pass it to the ColumnDataSource constructor.
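
The conversion can be tried without the CSV file. The sketch below uses a small DataFrame with made-up numbers (not the real dataset) that has the same two columns, and shows which columns end up in the ColumnDataSource.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up numbers, only to illustrate the conversion (not the real dataset)
df = pd.DataFrame({
    "VehicleGroup": ["Passenger cars", "Motorcycles", "Trailers"],
    "Count": [700, 200, 100],
})
df["RelativeAmount"] = df["Count"] / df["Count"].sum() * 100

source = ColumnDataSource(df)

# Every DataFrame column becomes a column of the source.
# The DataFrame index is added as an extra column as well.
print(sorted(source.data.keys()))
print(list(source.data["VehicleGroup"]))
```

The categories in `source.data["VehicleGroup"]` are what the next cell passes to `x_range`.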

In [3]:
import numpy as np
import pandas as pd 

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

# Read the CSV file and take a look at it 
df = pd.read_csv("data/swiss_vehicles_simple.csv")
print(df)

# Create relative amounts
df['RelativeAmount'] = df['Count'].divide(df['Count'].sum()) * 100

# Cast a pandas.DataFrame to a Bokeh.ColumnDataSource
source = ColumnDataSource(df)

# We create the figure and set the x_range to be categorical by passing the respective x values to it. 
# You can access a column in the ColumnDataSource using the .data dict. The order you pass the x values 
# will define the order of the glyphs. 
p = figure(title="04 - Handling Categorical Data", 
           x_axis_label='Vehicle group', 
           y_axis_label='percentage',
           x_range=source.data['VehicleGroup'], y_range=[0, 100.0])

# Finally we add a vbar glyph to represent our data.
# The data is given by referencing the columns of the ColumnDataSource.
p.vbar(x="VehicleGroup", top="RelativeAmount", width=0.8, source=source)

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"

show(p)



## *Exercise 4: Handling Categorical Data*

The above plot works, but several things could be better. First, the bars should be sorted in descending order from left to right, so that the reader can directly see the ranking within the dataset. Second, many of the bars are very small because the category "passenger cars" is so large, which makes them hard to compare. It would be best to put a text label with the corresponding percentage directly above each bar.

1. Sort the categories by value in a descending manner from left to right. 
2. Add text glyphs on each bar with the corresponding relative amount and a % sign. 
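
Both steps start on the pandas side. The following sketch uses made-up numbers and a hypothetical `Label` column to show one possible way to prepare the data; creating the glyphs is left to the exercise.

```python
import pandas as pd

# Made-up numbers, only to illustrate the two steps (not the real dataset)
df = pd.DataFrame({
    "VehicleGroup": ["Motorcycles", "Passenger cars", "Trailers"],
    "RelativeAmount": [20.0, 70.0, 10.0],
})

# Step 1: sort descending, so the largest category comes first.
df = df.sort_values("RelativeAmount", ascending=False).reset_index(drop=True)

# Step 2: a text column with one decimal and a percent sign.
df["Label"] = df["RelativeAmount"].map("{:.1f}%".format)

print(df["VehicleGroup"].tolist())  # ['Passenger cars', 'Motorcycles', 'Trailers']
print(df["Label"].tolist())         # ['70.0%', '20.0%', '10.0%']
```

Since the order of the values passed to `x_range` defines the order of the bars, sorting before creating the ColumnDataSource is one way to get sorted bars.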

The result should look like this:
![Solution E3](images/sol_e3.png)

In [None]:
# (10 min)

import numpy as np
import pandas as pd 

from bokeh.plotting import figure, output_file, show, output_notebook
from bokeh.models import ColumnDataSource

# Since this is a jupyter notebook, we specify a jupyter output mode
output_notebook()

# Read the CSV file and take a look at it 
df = pd.read_csv("data/swiss_vehicles_simple.csv")
print(df)

# Create relative amounts
df['RelativeAmount'] = df['Count'].divide(df['Count'].sum()) * 100

# Cast a pandas.DataFrame to a Bokeh.ColumnDataSource
source = ColumnDataSource(df)

# We create the figure and set the x_range to be categorical by passing the respective x values to it. 
# You can access a column in the ColumnDataSource using the .data dict. The order you pass the x values 
# will define the order of the glyphs. 
p = figure(title="04 - Handling Categorical Data", 
           x_axis_label='Vehicle group', 
           y_axis_label='percentage',
           x_range=source.data['VehicleGroup'], y_range=[0, 100.0])

# Finally we add a vbar glyph to represent our data.
# The data is given by referencing the columns of the ColumnDataSource.
p.vbar(x="VehicleGroup", top="RelativeAmount", width=0.8, source=source)

# Make sure the plot fills the complete width
p.sizing_mode = "stretch_width"

show(p)


# Interactive Visualization with Bokeh
## 5. Introduction

Until now, the visualizations have been fairly static. However, since this is an *interactive* visualization course, we of course want to look at Bokeh's features in that regard. Take a look at the following example. In addition to the figure showing the temperature, a slider is provided to smooth the signal. Such functionality is very typical in visualization tasks, because data signals tend to be noisy and hard to read without appropriate tools given to the user.

Bokeh does this by employing a server-client architecture. Your visualization thus consists of two parts. The client is the visualization in the browser. Whenever the user interacts with the visualization by *selecting*, *clicking* or *changing* something, the server is informed, and a callback lets you define how to react to the event. In the example below, whenever the value of the slider changes, `callback` gets called on the server, and the data is replaced with a smoothed version according to the slider value.
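
The callback signature can be explored without a running server. In the sketch below the value is set from Python instead of the browser, and a hypothetical `calls` list records what the callback receives; this only illustrates the `(attr, old, new)` arguments, not the full server round trip.

```python
from bokeh.models import Slider

calls = []  # hypothetical helper list, only used to record the calls

def callback(attr, old, new):
    # attr: name of the changed property
    # old / new: its previous and its current value
    calls.append((attr, old, new))

slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
slider.on_change("value", callback)

# In a running app the browser changes the value; here we set it from Python.
slider.value = 5

print(calls)
```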

Note that in Jupyter such server applications need to live in a function. By convention, we call it `bkapp`.

Bokeh provides many interactions "out of the box", meaning that you only have to configure them rather than program them manually. A full collection of well-described examples can be found in the Bokeh docs here: https://docs.bokeh.org/en/latest/docs/user_guide/interaction.html.

Here we will only cover custom callbacks, for cases where Bokeh does not provide a built-in solution.

In [None]:
from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook

from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

output_notebook()

def bkapp(doc):
    df = sea_surface_temperature.copy()
    source = ColumnDataSource(data=df)

    plot = figure(x_axis_type='datetime', y_range=(0, 25),
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.line('time', 'temperature', source=source)

    def callback(attr, old, new):
        if new == 0:
            data = df
        else:
            data = df.rolling('{0}D'.format(new)).mean()
        source.data = ColumnDataSource.from_df(data)

    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change('value', callback)
    
    lt = layout([slider, plot], sizing_mode="stretch_width")
    
    doc.add_root(lt)

show(bkapp)

## 6. Changing the data to visualize

One classic application of interactive visualization is letting the user filter the complete dataset by certain values.
In the following example, the number of vehicles per 1000 citizens is displayed per canton; the user can select the canton from a dropdown.

### **Tremendously Important** 

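Note that whenever you change the data of a visualization, don't change only a single value or a single column; replace the complete data instead. An in-place edit of a single value is not noticed by Bokeh, and replacing one column with data of a different length leaves the columns inconsistent. If you don't follow this advice, Bokeh will definitely, but unexpectedly, turn to the Dark Side and cast spells full of dark magic and undefined behaviour against you.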

**So don't ever do this**: 

```python
source.data['y'][10] = 100
```

**or this**

```python
new_data = [0,1,2,3,4,5] # the new data
source.data['y'] = new_data
```

**Instead do this:**

```python
# Update your data 
xs = source.data['xs']
ys = update_my_data_function()

# Replace the complete data field, and you are safe!
source.data = dict(
    xs = xs, 
    ys = ys
)
```
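
The difference can be observed by listening for changes of the `data` property. This is a minimal sketch with made-up values and a hypothetical `events` list; it assumes nothing beyond a plain ColumnDataSource outside of a server.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(xs=[0, 1, 2], ys=[5, 6, 7]))

events = []  # hypothetical helper list, records every change notification
source.on_change("data", lambda attr, old, new: events.append(attr))

# In-place edit of a single element: no change notification is sent.
source.data["ys"][0] = 100
n_after_inplace = len(events)

# Replacing the complete dict: the change is registered.
source.data = dict(xs=[0, 1, 2], ys=[100, 6, 8])
n_after_replace = len(events)

print(n_after_inplace, n_after_replace)
```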


In [None]:
import numpy as np
import pandas as pd 

from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook



output_notebook()

def bkapp(doc):
    # Load the data
    df = pd.read_csv("data/swiss_vehicles_vehicles_per_capita.csv")
    
    # Extract all Cantons
    headers = df.columns.tolist()
    headers.remove("Year")
    headers.sort()
    
    # Compute the mean for each year, again we need to exclude the year. 
    df['mean'] = df.loc[:, df.columns != 'Year'].mean(axis=1)
    
    # Create a ColumnDataSource, values will always contain the current selection. 
    source = ColumnDataSource(dict(
        year = df['Year'], 
        values = df['Thurgau'],
        mean = df['mean']
    ))
    
    # Let's create the Figure
    plot = figure(y_axis_label='Vehicles per 1000 citizens',
                  title="Vehicles per 1000 citizens of Switzerland", y_range=[200,700])
    
    # The line with the selectable values
    plot.line('year', 'values', source=source, legend_label="Current Selection")
    # The line with the mean values
    plot.line('year', 'mean', source=source, color="gray", legend_label="Mean", line_alpha=0.5)
    
    # This is the callback which is called as soon as the user selects a new canton. 
    # the new parameter contains the new selected value
    def callback(attr, old, new):
        # We simply get the data for that canton, and replace the complete dict
        new_data = dict(
            year = df['Year'], 
            values = df[new],
            mean = df['mean']
        )
        source.data = new_data
    
    # We create the dropdown menu with the different cantons
    dropdown = Select(value="Thurgau", options=headers, title="Select Canton")
    
    # And connect the on_change event to our callback function
    dropdown.on_change('value', callback)
    
    lt = layout([dropdown, plot], sizing_mode="stretch_width")
    
    doc.add_root(lt)

show(bkapp)

## *Exercise 6: Changing the data to visualize*

Let's add a range slider to the existing visualization to set the year range.
To do so, take a look at Bokeh's RangeSlider.

1. Add a range slider to the view and set the range to be between 1980 and 2020. 
2. Update the callback function such that whenever the range is changed the visualization gets updated. 

**Hint 1**: In situations where multiple settings affect the plot, the simplest solution is typically to have a single `update` function that is used as the callback for every widget. You can retrieve the current value of any widget in Bokeh by reading `my_widget.value`. E.g.
`slider.value` retrieves a number, and `dropdown.value` contains the canton currently selected in the dropdown.

**Hint 2**: Boolean indexing helps you filter pandas.DataFrames by value.
E.g. `df[df['Year'] >= 1990]` returns all rows with a year of 1990 or later. When chaining multiple conditions,
make sure to use the element-wise operators `&` and `|` rather than the Python keywords `and` and `or`, and put each condition in parentheses:
```python
    df[(df['Year'] <= 1990) | (df['Year'] >= 2015)]
```
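
A small self-contained sketch of such filtering follows. The DataFrame values and the `start` and `end` variables are made up for illustration; in the exercise they would come from the CSV file and from the widget.

```python
import pandas as pd

# Made-up values, only to illustrate the filtering
df = pd.DataFrame({
    "Year": [1985, 1990, 2000, 2015, 2020],
    "Thurgau": [400, 450, 520, 600, 640],
})

start, end = 1990, 2015

# Rows inside the year range: both conditions must hold, hence `&`.
in_range = df[(df["Year"] >= start) & (df["Year"] <= end)]

# Rows outside of it: one condition is enough, hence `|`.
outside = df[(df["Year"] < start) | (df["Year"] > end)]

print(in_range["Year"].tolist())  # [1990, 2000, 2015]
print(outside["Year"].tolist())   # [1985, 2020]
```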


In [None]:
# (15 min)
import numpy as np
import pandas as pd 

from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Select, RangeSlider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook


output_notebook()

def bkapp(doc):
    # Load the data
    df = pd.read_csv("data/swiss_vehicles_vehicles_per_capita.csv")
    
    # Extract all Cantons
    headers = df.columns.tolist()
    headers.remove("Year")
    headers.sort()

    # Compute the mean for each year, again we need to exclude the year. 
    df['mean'] = df.loc[:, df.columns != 'Year'].mean(axis=1)
    
    # Create a ColumnDataSource, values will always contain the current selection. 
    source = ColumnDataSource(dict(
        year = df['Year'], 
        values = df['Thurgau'],
        mean = df['mean']
    ))
    
    # Let's create the Figure
    plot = figure(y_axis_label='Vehicles per 1000 citizens',
                  title="Vehicles per 1000 citizens of Switzerland", y_range=[200,700])
    
    # The line with the selectable values
    plot.line('year', 'values', source=source, legend_label="Current Selection")
    # The line with the mean values
    plot.line('year', 'mean', source=source, color="gray", legend_label="Mean", line_alpha=0.5)
    
    def update(attr, old, new):
        # TODO setup the new update callback
        # Fetch the values directly from the Widgets
        # Remember to not let Bokeh turn to the dark side! 
        pass
        
    # We create the dropdown menu with the different cantons
    dropdown = Select(value="Thurgau", options=headers, title="Select Canton")
    dropdown.on_change('value', update)
    
    # TODO: Add a RangeSlider and connect it to the update function 
    range_slider = ...
    
    lt = layout([dropdown, 
                 plot], sizing_mode="stretch_width")
    
    doc.add_root(lt)

show(bkapp)


## 7. Working with selections

Another important aspect of interactive data visualization is selections. Bokeh provides several tools to support selections, which are documented here: https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#lassoselecttool

Most importantly, plots and glyphs that share the same ColumnDataSource automatically share their selection, since it is a property of the ColumnDataSource. Look at the example below. It uses a simple model that computes the probability of unleashing Bokeh's dark arts based on expertise, attentivity and number of neighbours. Two figures are displayed, which share the ColumnDataSource. Using the lasso select tool, you can select a number of points in one figure, and they are also selected in the other figure.

Further note that whenever the selection changes, the indices (rows) of the selected data points in the ColumnDataSource are printed.
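
The selection mechanism can also be tried without any plot. In this sketch the selection is set from Python rather than by a tool in the browser; the data values and the `seen` list are made up for illustration.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(x=[1, 2, 3, 4], y=[4, 3, 2, 1]))

seen = []  # hypothetical helper list, records the reported selections

def update(attr, old, new):
    # `new` holds the row indices of the selected data points
    seen.append(list(new))

source.selected.on_change("indices", update)

# A selection tool in the browser would set this; here we set it from Python.
source.selected.indices = [0, 2]

# The indices can be used to look up the selected rows in any column.
selected_y = [source.data["y"][i] for i in source.selected.indices]
print(seen, selected_y)
```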

In [None]:
from random import choices

import numpy as np
import pandas as pd 

from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Select, RangeSlider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook

classes = ["did it correctly", "messed with the dark arts"]
def dark_arts_affine(expertise, attentivity, neighbours):
    prob = expertise * attentivity + (neighbours / 4 * 0.5)
    # choices() returns a list, so we take its single element as the label
    return prob, choices(classes, [prob, 1 - prob])[0]

output_notebook()

def bkapp(doc):
    
    # Create a ColumnDataSource, values will always contain the current selection. 
    expertise = np.random.normal(0.5, 0.2, 100)
    attentivity = np.random.normal(0.5, 0.1, 100) 
    neighbours = np.random.randint(0, 4, 100) 
    probability = []
    label = []
    
    for i in range(100):
        c, l = dark_arts_affine(expertise[i],attentivity[i], neighbours[i])
        probability.append(c)
        label.append(l)
               
    source = ColumnDataSource(dict(
        expertise = expertise ,
        attentivity = attentivity, 
        neighbours = neighbours,
        probability=probability,
        label = label
    ))
    
    
    # Let's create the Figure and add a Lasso Tool
    plot1 = figure(y_axis_label='Attentivity', 
                  x_axis_label='Expertise',
                  title="Expertise and Attentivity", 
                   tools=['save', 'lasso_select'])
    
    # The circles showing expertise vs. attentivity
    plot1.circle('expertise', 'attentivity', source=source, legend_label="Current Selection")
    
    
    # Let's create the second Figure
    plot2 = figure(y_axis_label='Probability', 
                  x_axis_label='Neighbours',
                  title="Probability and Neighbours",
                  tools=['save', 'lasso_select'])
    
    # The circles showing neighbours vs. probability
    plot2.circle('neighbours', 'probability', source=source, legend_label="Current Selection")
    
    
    def update(attr, old, new):
        print(new)
    source.selected.on_change("indices", update)

    
    lt = layout([[plot1, plot2]], 
                sizing_mode="stretch_width")
    
    doc.add_root(lt)

show(bkapp)

## *Exercise 7: Working with selections*
In this final exercise, the goal is to implement the callback from the above example, add a third plot, and connect it to the selection updates of the upper two plots. When the user selects a subset of the dataset in the upper figures, the lower one should show how many of the students have solved the Bokeh "black magic" problem correctly.

1. Implement a second ColumnDataSource which holds the two classes and their frequencies
2. Implement a third Figure which displays this data in a horizontal bar chart (as shown in the image below)
3. Implement the update function which computes the frequencies once the user performs a selection.
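
Counting the classes for a given selection needs no Bokeh at all. The sketch below shows one possible approach with `collections.Counter`; the `labels` and `selected_rows` values are made up and stand in for the label column of the source and the indices passed to the callback.

```python
from collections import Counter

classes = ["did it correctly", "messed with the dark arts"]

# Made-up stand-ins for the label column and the selected row indices
labels = ["did it correctly", "messed with the dark arts",
          "did it correctly", "did it correctly"]
selected_rows = [0, 1, 3]

# Count how often each class occurs among the selected rows.
counts = Counter(labels[i] for i in selected_rows)

# A Counter returns 0 for classes that are absent from the selection.
frequencies = [counts[c] for c in classes]
print(frequencies)  # [2, 1]
```

Keeping `frequencies` in the same order as `classes` makes it straightforward to replace the complete data of the second ColumnDataSource in one go.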

The result should look like this:
![Solution E6](images/sol_e6.png)


In [None]:
# (20 min)

from random import choices

import numpy as np
import pandas as pd 

from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Select, RangeSlider
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.io import show, output_notebook

classes = ["did it correctly", "messed with the dark arts"]
def dark_arts_affine(expertise, attentivity, neighbours):
    prob = expertise * attentivity + (neighbours / 4 * 0.5)
    return prob, choices(classes, [prob, 1 - prob])[0]

output_notebook()

def bkapp(doc):
    
    # Create a ColumnDataSource, values will always contain the current selection. 
    expertise = np.random.normal(0.5, 0.2, 100)
    attentivity = np.random.normal(0.5, 0.1, 100) 
    neighbours = np.random.randint(0, 4, 100) 
    probability = []
    label = []
    
    for i in range(100):
        c, l = dark_arts_affine(expertise[i],attentivity[i], neighbours[i])
        probability.append(c)
        label.append(l)
               
    source = ColumnDataSource(dict(
        expertise = expertise ,
        attentivity = attentivity, 
        neighbours = neighbours,
        probability=probability,
        label = label
    ))
    
    
    # Let's create the Figure and add a Lasso Tool
    plot1 = figure(y_axis_label='Attentivity', 
                  x_axis_label='Expertise',
                  title="Expertise and Attentivity", 
                   tools=['save', 'lasso_select'])
    
    # The circles showing expertise vs. attentivity
    plot1.circle('expertise', 'attentivity', source=source, legend_label="Current Selection")
    
    
    # Let's create the second Figure
    plot2 = figure(y_axis_label='Probability', 
                  x_axis_label='Neighbours',
                  title="Probability and Neighbours",
                  tools=['save', 'lasso_select'])
    
    # The circles showing neighbours vs. probability
    plot2.circle('neighbours', 'probability', source=source, legend_label="Current Selection")
    
    
    # TODO: Implement a second ColumnDataSource which holds the data for the third figure
    source2 = ...
    
    # TODO: Create the third figure with the horizontal bar chart here
    plot3 = ...
    
    def update(attr, old, new):
        selected_rows = new
        if len(selected_rows) == 0:
            return
        # TODO # 
        # Write an update function which computes the frequencies of the two classes in the 
        # currently selected rows 
    
    #Connect the selection change of the first source to the update
    source.selected.on_change("indices", update)

    lt = layout([
        [plot1, plot2],
        [plot3]], 
        sizing_mode="stretch_width")
    
    doc.add_root(lt)

show(bkapp)