# Deploying interactive Bokeh visualizations

[Bokeh](https://bokeh.pydata.org/en/latest/) is a powerful library for creating plots and charts in Python. You have probably already played with [matplotlib](https://matplotlib.org/) or [seaborn](https://seaborn.pydata.org/), but Bokeh makes it ridiculously easy to let the user interact with your charts.

In this notebook, we provide a step-by-step guide on how to create a line plot with Bokeh that can easily be deployed as a web app. This is a special extended version of a [tool](https://github.com/laurafroelich/swung_viz_log/blob/master/code/holoMagic.py) that Duncan, Jo, Laura, Sean and I developed during the [2018 Subsurface Hackathon](https://agilescientific.com/events/subsurface2018) in Copenhagen.

The code is modified from the wonderful [stocks](https://github.com/bokeh/bokeh/tree/master/examples/app/stocks) and [weather](https://github.com/bokeh/bokeh/tree/master/examples/app/weather) examples in the bokeh gallery.

## Drilling down the rabbit hole

The context for our example comes from visualising log data in oil-well exploration. Our final goal is to set up the following visualization. Don't worry, we'll go through the individual parts in a second.

<img src='./bokeh.png'></img>

What is shown on the left is a plot of a quantity measured by a drilling head as it moves down into different formations. The measured quantity is typically referred to as a *Curve*. Here, we show a curve for a measure of radioactivity, but the user can choose a different one in the drop-down menu.

The *Group* refers to an interval of the depth axis with specific geological properties. Hence, selecting a group corresponds roughly to subsetting the data by depth.

Finally, the *Well* specifies which of possibly several exploration sites should be investigated.

Next to the plot we show basic associated summary statistics, such as maximum, minimum, mean and standard deviation.

The complete code is in the Python scripts [holoMagic.py](./holoMagic.py) and [holoHelper.py](./holoHelper.py).

## Bokeh visualizations

First, we import pandas, numpy and several Bokeh objects. We also need a few helper functions, which are explained later for readers who want all the details.

In [None]:
import pandas as pd
import numpy as np

from bokeh.io import curdoc
from bokeh.layouts import row, column
from bokeh.models import ColumnDataSource, Select, PreText

from holoHelper import get_dataset, make_plot, select_top_base, update_text, update_plot

Now, we specify where to find the data and which statistics to look at. These statistics are displayed in ``PreText`` bokeh objects.

In [None]:
#file names and considered stats
DATA_PATH = "../data/EAGE2018/"
wld_fname = "{}well_log_data.txt".format(DATA_PATH)

###Set up the statistics text fields
STAT_NAMES = ['Max ', 'Min ', 'Mean ', 'Std ']
stats = [PreText(text=prop, width=500, height=1) for prop in STAT_NAMES]

Besides the text fields, we also need the drop down menus. For the purpose of this tutorial, we hardwire them into the code.

In [None]:
#Titles and initial values for drop-downs
TITLES = ['Curve', 'Group', 'Well']
INITS = ['Gamma', 'AA', 'A']

#Options to choose from
CURVE_OPTIONS = ['Gamma', 'Res']
GROUP_OPTIONS = ['HH', 'GG', 'FF', 'EE', 'DD', 'CC', 'BB', 'AA', 'All']
WELL_OPTIONS = ['X-27', 'I_A', 'D', 'B', 'B_AT2', 'B_A', 'AA', 'A']
OPTIONS = [CURVE_OPTIONS, GROUP_OPTIONS, WELL_OPTIONS]

#Define drop-down menus
selects = [Select(value=init, title=title, options=option) for init, title, option in zip(INITS, TITLES, OPTIONS)]

When the user makes a selection in the drop-down menu, this should correspond to a change in the plot and the statistics. Therefore, we attach the function `update_plot_text` to the `on_change` event of the drop-down menus.

In [None]:
def update_plot_text(attrname, old, new):
    curve, group, well = [select.value for select in selects]

    top,base = select_top_base(group, well)

    # Argument order must match the helper signatures: (curve, well, top, base, target)
    update_text(curve, well, top, base, stats)
    update_plot(curve, well, top, base, source)
    
for select in selects:
    select.on_change('value', update_plot_text)
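
To make the callback signature concrete, here is a minimal sketch that runs outside of a Bokeh server. The widget `demo` and the handler `log_change` are invented for illustration; the only assumption taken from the code above is that handlers registered with `on_change` receive the attribute name, the old value and the new value.

```python
from bokeh.models import Select

events = []

def log_change(attr, old, new):
    # Handlers registered via on_change receive (attribute name, old value, new value).
    events.append((attr, old, new))

# Illustrative widget, not part of the app above.
demo = Select(title="Curve", value="Gamma", options=["Gamma", "Res"])
demo.on_change("value", log_change)

# Assigning the property from Python fires the handler, just like a user selection would.
demo.value = "Res"
```

In the app itself the handler ignores these three arguments and simply re-reads the current value of every drop-down, which keeps one function usable for all three menus.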

Now, all that is left to be done is to construct the plot and add everything to the root of the current document. There you go!

In [None]:
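# Apply the initial group so that the first plot matches the drop-down defaults
top, base = select_top_base(INITS[1], INITS[2])
source = get_dataset(INITS[0], INITS[2], top, base)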
plot = make_plot(source, INITS[0])
controls = column(*selects, *stats)

curdoc().add_root(row(plot, controls))
curdoc().title = "Log quality visualisation"

If you put the code into ``log_quality.py``, you can start the Bokeh server via ``bokeh serve --show log_quality.py``. The [bokeh docs](https://bokeh.pydata.org/en/latest/docs/user_guide/server.html) give you all the further details you might be interested in.

## Helper functions

Still eager to learn more about Bokeh? Then let's look at the helper functions in detail. We import the usual suspects.

In [None]:
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.palettes import Blues4

import pandas as pd
import numpy as np
import json

We also specify the file path and the statistics to be considered.

In [None]:
#File path
DATA_PATH = "../data/EAGE2018/"
wld_fname = "{}well_log_data.txt".format(DATA_PATH)
group_fname = "{}EAGE_Hackathon_2018_Well_".format(DATA_PATH)

#considered statistics
STAT_NAMES = ['Max ', 'Min ', 'Mean ', 'Std ']

#Fix the maximum possible depth
MAX_DEPTH=5000

We convert the JSON file into a pandas dataframe.

In [None]:
#convert json data into pandas
with open(wld_fname, 'r') as f:
    j_data = json.load(f)

# Each item of the JSON list becomes one frame; concatenate them in a single step
# (DataFrame.append was removed in pandas 2.0)
p_data = pd.concat([pd.DataFrame(item) for item in j_data])

The first helper function selects the relevant pieces of the large pandas dataframe and puts them into a Bokeh `ColumnDataSource` object.

In [None]:
def get_dataset(curve, well, top=0, base=MAX_DEPTH):
    """Prepare dataset for plotting

    Select curve data from a well in a group bounded by a specified top and base
    # Arguments
        curve: curve to be considered
        well: well to be considered
        top: top coordinates of the group to be considered
        base: base coordinates of the group to be considered
    # Returns
        A bokeh ColumnDataSource object containing the pandas data
    """
    src = p_data[(p_data['Depth']>top) & (p_data['Depth']<base)]
    cur_df = (src[src['Well'] == well])[['Depth', curve]]
    cur_df=cur_df.set_index('Depth').copy()
    cur_df.columns = ['val']

    #reverse depth direction for drawing
    cur_df=cur_df.sort_index()
    cur_df.index = -cur_df.index

    return ColumnDataSource(data=cur_df)
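
To see what the depth filter and the sign flip do, here is a minimal sketch on a made-up frame. The column names `Depth`, `Well` and `Gamma` follow the code above; the values and the helper name `slice_curve` are invented for illustration, and the final conversion to a `ColumnDataSource` is left out.

```python
import pandas as pd

# Toy stand-in for p_data; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Well": ["A", "A", "A", "B"],
    "Depth": [100.0, 200.0, 300.0, 150.0],
    "Gamma": [55.0, 60.0, 72.0, 40.0],
})

def slice_curve(df, curve, well, top, base):
    """Hypothetical helper mirroring the selection steps of get_dataset."""
    in_range = (df["Depth"] > top) & (df["Depth"] < base)
    out = df.loc[in_range & (df["Well"] == well), ["Depth", curve]]
    out = out.set_index("Depth").sort_index()
    out.columns = ["val"]
    # Negate the depths so that deeper samples end up lower on the y-axis.
    out.index = -out.index
    return out

sliced = slice_curve(toy, "Gamma", "A", top=150, base=5000)
print(sliced)
```

Only the two samples of well A below 150 survive the filter, and their depths come out as negative numbers.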

Once we have a Bokeh `ColumnDataSource` object, we push it through the plotting pipeline.

In [None]:
def make_plot(current, curve, plot_width=800, plot_height=1000, alpha=.3, font_style="bold"):
    """Show plot of current data 

    Compute plot of current data 
    # Arguments
        current: bokeh Source object containing the current data
        curve: curve to be plotted
        plot_width: width of plot
        plot_height: height of plot
        alpha: alpha-value of plot
        font_style: font style for plot
    # Returns
        A plot object for the current data
    """

    #plot the figure (note: Bokeh 3.x expects width/height instead of plot_width/plot_height)
    plot = figure(plot_width=plot_width, plot_height=plot_height, tools="", toolbar_location=None)
    plot.line(y='Depth', x='val', source=current, color=Blues4[1])

    #set plot meta_data
    plot.yaxis.axis_label = "Depth"
    plot.axis.axis_label_text_font_style = font_style
    plot.grid.grid_line_alpha = alpha

    return plot

The groups are given as names, so we need to look up their top and base depths in a per-well CSV file.

In [None]:
def select_top_base(group, well):
    """Extract top and base coordinates
    
    Extract top and base coordinates for a specified group and well
    # Arguments
        group: group for which top and base coords are to be queried
        well: well to be considered
    # Returns
        A pair consisting of the top and base coordinate of the specified group
    """

    group_file = "{}EAGE_Hackathon_2018_{}{}{}".format(DATA_PATH,"Well_", well,".csv")
    group_df = pd.read_csv(group_file)

    base = MAX_DEPTH
    top = 0
    if group!='All':
        base_top = group_df[(group_df['name'] == group) & (group_df['Surface'] == 'group')]
        top_sel = base_top[base_top['Obs#'] == 'Top']['MD']
        base_sel = base_top[base_top['Obs#'] == 'Base']['MD']
        if len(top_sel)>0:
            top = top_sel.values[0]
        if len(base_sel)>0:
            base = base_sel.values[0]
    return top, base
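
The lookup can be tried without the hackathon data. The sketch below feeds a small in-memory CSV through the same filters; the column names `name`, `Surface`, `Obs#` and `MD` are taken from the function above, while the rows and the helper name `lookup` are made up.

```python
import io
import pandas as pd

# Made-up rows; only the column names are taken from select_top_base.
toy_csv = io.StringIO(
    "name,Surface,Obs#,MD\n"
    "AA,group,Top,1200.5\n"
    "AA,group,Base,1350.0\n"
    "BB,group,Top,1350.0\n"
)
group_df = pd.read_csv(toy_csv)

def lookup(group_df, group, default_top=0, default_base=5000):
    """Hypothetical variant of select_top_base that works on an in-memory frame."""
    rows = group_df[(group_df["name"] == group) & (group_df["Surface"] == "group")]
    top_sel = rows.loc[rows["Obs#"] == "Top", "MD"]
    base_sel = rows.loc[rows["Obs#"] == "Base", "MD"]
    # Fall back to the full depth range when a marker is missing.
    top = top_sel.iloc[0] if len(top_sel) > 0 else default_top
    base = base_sel.iloc[0] if len(base_sel) > 0 else default_base
    return top, base

aa = lookup(group_df, "AA")
bb = lookup(group_df, "BB")
```

Group `BB` has no base marker in the toy data, so its base falls back to the default maximum depth, which is the same behaviour as in `select_top_base`.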

Updating the plot is refreshingly simple as we only need to update the attached source.

In [None]:
def update_plot(curve, well, top, base, source):
    """Updates the plot

    Updates the plotted curve after user interaction
    # Arguments
        curve: curve to be considered
        well: well to be considered
        top: top coordinates of the group to be considered
        base: base coordinates of the group to be considered
        source: source object to be modified
    """
    new_source = get_dataset(curve, well, top, base)
    source.data.update(new_source.data)

To update the text, we construct a new pandas dataframe and then recompute the statistics.

In [None]:
def update_text(curve, well, top, base, stats):
    """Updates the text

    Updates the displayed statistics after user interaction
    # Arguments
        curve: curve to be considered
        well: well to be considered
        top: top coordinates of the group to be considered
        base: base coordinates of the group to be considered
        stats: text fields to be modified
    """
    src = p_data[(p_data['Depth']>top) & (p_data['Depth']<base)]
    df = src[src['Well'] == well]
    df = df[curve]

    stat_vals = [df.max(), df.min(), df.mean(), df.std()]
    for stat, name, stat_val in zip(stats, STAT_NAMES, stat_vals):
        stat.text = "{0}{1:.2f}".format(name, stat_val)
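
As a quick check of the formatting step, the following sketch builds the same strings with the standard library only. The sample values are invented; `statistics.stdev` is used because it computes the sample standard deviation, which is also the default of pandas' `std()`.

```python
import statistics

STAT_NAMES = ['Max ', 'Min ', 'Mean ', 'Std ']
values = [55.0, 60.0, 72.0]  # invented sample readings

stat_vals = [max(values), min(values), statistics.mean(values), statistics.stdev(values)]
# Same format string as in update_text: label followed by the value with two decimals.
lines = ["{0}{1:.2f}".format(name, val) for name, val in zip(STAT_NAMES, stat_vals)]
print(lines)  # ['Max 72.00', 'Min 55.00', 'Mean 62.33', 'Std 8.74']
```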