# Manual labelling to create a training set with [Bokeh](https://docs.bokeh.org/en/latest/docs/first_steps.html#first-steps)
#### Author: Raphael Attie, NASA/GSFC & George Mason University: email at rattie@gmu.edu (NASA email currently unavailable, do not use it)

## Manual labelling with Bokeh 
self-made example - no reference available

In [4]:
from bokeh.plotting import figure
from bokeh.layouts import column
from bokeh.models import CDSView, ColumnDataSource, IndexFilter, RadioGroup
from bokeh.io import show, output_notebook
output_notebook()

See the documentation for the `on_change` behavior of the `ColumnDataSource` object, inherited from `ColumnarDataSource`:
https://docs.bokeh.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnarDataSource.on_change

In [12]:
import random
from functools import partial
import numpy as np
import pandas as pd

# Get some data into a dataframe
x1 = list(range(0, 26))
y1 = random.sample(range(0, 100), 26)

# Prepare an output dataframe that will contain the labels (initially, there are none)
df = pd.DataFrame({'x':x1, 'y':y1, 'label': 'no label'})
df.set_index('x', inplace=True)

# Create your labels
labels = ['class 1', 'class 2', 'class 3']
# Create associated visual aid: e.g. 1 color per class. 
colors = ['red', 'black', 'blue']
# Initialize a basic container of the labelled datasets (list of 3 lists)
clusters = [[], [], []]

# Define the app. The "doc" argument is the container of your app. We add to it the graphs and GUI elements
def bkapp1(doc):
        
    # Load your data into the Bokeh data source object. It is similar to a Pandas dataframe, which can also be passed in directly. 
    source = ColumnDataSource(dict(x1=x1, y1=y1))
    # Define the tools that will appear in the figure toolbar. 
    # The selection tool is necessary, e.g. `box_select`.
    tools = ['box_select', 'hover', 'reset']
    
    # Create the figure axis
    plot = figure(x_axis_label='x',
                  y_axis_label='Temperature (Celsius)',
                  title="Test saving selection for labelling",
                  tools=tools)
    # Populate it with the data visualization (aka glyphs). Here we choose a scatter plot. 
    # Specify some more visual aid when items are selected.
    scatter = plot.scatter(x='x1', y='y1', source=source,
                          selection_color="firebrick",
                          nonselection_fill_alpha=0.4)

    # In our manual labelling app, we could use some buttons to choose which class we are labelling. 
    # Buttons are part of the many widgets offered in Bokeh. 
    # Let's choose a group of radio buttons, for mutually exclusive classes (one button checked at a time) 
    radio_group = RadioGroup(labels=["Cluster 1", "Cluster 2", "Cluster 3"], active=0)

    
    # Define a function that will run when a radio button is clicked
    def my_radio_handler(new):
        print(f'Radio button {new} selected.')
    # Assign it to the "on_click" action of the radio group instance
    radio_group.on_click(my_radio_handler)

    # Below we get to the most important interaction function: the "callback" function. 
    # Callbacks define what happens when you interact with your data through a widget. Here the interaction depends on the state 
    # of a widget (our radio buttons), so the callback must be able to query that state: the radio group object must be known to it.
    # This kind of interaction is not explicitly documented in the Bokeh intro tutorials; 
    # it relies on a few Python language features. 
    # Callbacks are normally defined as callback(attr, old, new), where `attr` refers to the changed attribute's name, 
    # and `old` and `new` refer to the previous and updated values of the attribute.
    # Nothing prevents you from passing more objects, though. 
    # Here, we pass our radio buttons by building a `partial` callback function with functools.partial(). 
    def callback1(attr, old, new, widget):
        # Let's use some shared variables. More elegant ways exist. 
        global clusters, colors, labels
        # Find which radio button (of the widget instance) is currently checked, i.e. `active`
        a = widget.active
        # `new` holds the updated value of the attribute watched by the callback. 
        # Here that is the list of indices of the newly selected data points,
        # so we append it to the list of the active class in our global output variable.
        clusters[a] = clusters[a] + new
        # Assign the label in the dataframe. Indexing with .loc makes repeated indices harmless: 
        # a point selected twice simply receives the same label twice. 
        df.loc[clusters[a], 'label'] = labels[a]

        # Plot a so-called "view" to overlay the selected data as we go. This avoids hard copies by using Bokeh `IndexFilter`
        view = CDSView(source=source, filters=[IndexFilter(clusters[a])])
        plot.scatter(x='x1', y='y1', source=source, view=view, 
                     legend_label=f'{labels[a]}', # Get a dynamic legend, to show a new legend when we add a new label
                     size=20, line_width=2,
                     fill_color=None, line_color=colors[a],
                     nonselection_fill_alpha=0.4,
                     nonselection_line_alpha=1.0)
        
        # Display how many we selected in each class, just as a visual check
        nclasses = [len(df[df['label'] == s]) for s in labels]
        plot.title.text = f'Class 1: {nclasses[0]} -- Class 2: {nclasses[1]} -- Class 3: {nclasses[2]}'
        
    # Our callback function is complete. Let's add it to what shall trigger it: a change in the manual data selection
    # More info at: https://docs.bokeh.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnarDataSource.on_change
    scatter.data_source.selected.on_change('indices', partial(callback1, widget=radio_group))

    # Add our GUI and graph in a column layout
    doc.add_root(column(radio_group, plot))
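
The `partial` trick used for `callback1` can be tried without Bokeh. The sketch below is only an illustration: `FakeRadioGroup`, `on_selection` and `assigned` are hypothetical names that do not come from Bokeh or from the app above, and the callback is called by hand instead of being triggered by a selection. It shows that the bound widget is read when the callback runs, not when `partial` is created.

```python
from functools import partial


class FakeRadioGroup:
    """Hypothetical stand-in for a radio group; only `active` is modelled."""

    def __init__(self, active=0):
        self.active = active


labels = ['class 1', 'class 2', 'class 3']
assigned = {}  # maps a data index to its latest label


def on_selection(attr, old, new, widget):
    # Same signature as callback1: three standard arguments plus the widget
    for index in new:
        assigned[index] = labels[widget.active]


radio = FakeRadioGroup(active=1)
# Bind the extra argument; the result accepts the usual (attr, old, new)
bound = partial(on_selection, widget=radio)

bound('indices', [], [3, 7])      # labelled while button 1 is active
radio.active = 2                  # simulate a click on another button
bound('indices', [3, 7], [7, 9])  # index 7 is relabelled, index 9 is new

print(assigned)  # {3: 'class 2', 7: 'class 3', 9: 'class 3'}
```

Keeping one label per index in a dictionary is a design choice of this sketch only; the app above stores one list of indices per class instead.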

In [14]:
# Start and show the app
show(bkapp1)

Radio button 1 selected.
Radio button 2 selected.
Radio button 1 selected.
Radio button 0 selected.


In [15]:
df

| x | y | label |
|---|---|---|
| 0 | 95 | class 1 |
| 1 | 33 | no label |
| 2 | 99 | class 1 |
| 3 | 84 | class 1 |
| 4 | 83 | class 1 |
| 5 | 3 | class 3 |
| 6 | 21 | class 3 |
| 7 | 80 | class 1 |
| 8 | 91 | class 1 |
| 9 | 30 | class 3 |


Echoing Addison Howard's talk: your source data may be public, but this labelled dataset, in the form of a Pandas dataframe that you can export as a standalone file (csv, hdf5, pickle, etc.), is the byproduct you can keep private to satisfy the competition requirements. 
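
As a minimal sketch of such an export, the snippet below writes the labelled rows to CSV and reads them back. It assumes a small dataframe shaped like the one above (the three rows are copied from the displayed output) and uses an in-memory buffer as a stand-in for a real file path, which you would choose yourself.

```python
import io

import pandas as pd

# Small stand-in for the labelled dataframe produced by the app
df = pd.DataFrame({'x': [0, 1, 2],
                   'y': [95, 33, 99],
                   'label': ['class 1', 'no label', 'class 1']}).set_index('x')

# Keep only the rows that received a label
labelled = df[df['label'] != 'no label']

# Write to CSV; replace the buffer with a file path to save on disk
buffer = io.StringIO()
labelled.to_csv(buffer)

# Read it back, restoring `x` as the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col='x')
print(restored)
```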

#### Creating training sets and labelled data is one of the main bottlenecks of Machine Learning. These interactive tools are examples of how to create them and make them "AI-ready". (See Barbara's talk after this one)

#### This tutorial will be made available on HelioNauts (https://helionauts.org), NASA's new permanent forum for Heliophysics. 

## Another example: Real-time interactive data processing 
From https://github.com/bokeh/bokeh/blob/2.3.3/examples/howto/server_embed/notebook_embed.ipynb

In [1]:
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure
from bokeh.io import show, output_notebook

from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

output_notebook()

In [2]:
def bkapp(doc):
    df = sea_surface_temperature.copy()
    source = ColumnDataSource(data=df)

    plot = figure(x_axis_type='datetime', y_range=(0, 25),
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.line('time', 'temperature', source=source)

    def callback(attr, old, new):
        if new == 0:
            data = df
        else:
            data = df.rolling('{0}D'.format(new)).mean()
        source.data = ColumnDataSource.from_df(data)

    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change('value', callback)

    doc.add_root(column(slider, plot))
    

In [3]:
show(bkapp)
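
The smoothing in `callback` relies on a time-based rolling window in Pandas. The sketch below illustrates what `rolling('3D').mean()` computes, using made-up daily values rather than the sea surface temperature sample data. The names `index`, `series` and `smoothed` are chosen for this illustration only.

```python
import pandas as pd

# Six made-up daily measurements on a datetime index
index = pd.date_range('2021-01-01', periods=6, freq='D')
series = pd.DataFrame({'temperature': [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]},
                      index=index)

# '3D' is a time window: each value becomes the mean of the samples that
# fall within the 3 days ending at (and including) its own timestamp
smoothed = series.rolling('3D').mean()

# The first rows average fewer samples because the window is not yet full
print(smoothed['temperature'].tolist())  # [10.0, 11.0, 12.0, 14.0, 16.0, 18.0]
```

A time-based window needs a sorted datetime index, which is why `x_axis_type='datetime'` and the `time` column go together in the app above.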