# Data Sources and Transformations

We've seen how Bokeh can work well with Python lists, NumPy arrays, Pandas series, etc. At lower levels, these inputs are converted to a Bokeh `ColumnDataSource`. This data type is the central data source object used throughout Bokeh. Although Bokeh often creates them for us transparently, there are times when it is useful to create them explicitly.

In later sections we will see features like hover tooltips, computed transforms, and CustomJS interactions that make use of the `ColumnDataSource`, so let's take a quick look now. 


In [1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In [2]:
output_notebook()

## Creating with Python Dicts

The `ColumnDataSource` can be imported from `bokeh.models`:

In [3]:
from bokeh.models import ColumnDataSource

The `ColumnDataSource` is a mapping of column names (strings) to sequences of values. Here is a simple example. The mapping is provided by passing a Python `dict` with string keys and simple Python lists as values. The values could also be NumPy arrays or Pandas Series.

***NOTE: ALL the columns in a `ColumnDataSource` must always be the SAME length.***


In [4]:
source = ColumnDataSource(data={
    'x' : [1, 2, 3, 4, 5],
    'y' : [3, 7, 8, 5, 1],
})
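
As a quick check of the same-length rule, the sketch below rebuilds the source from the cell above and inspects it. It assumes only that `column_names` and `data` are available on the source; the variable names `names`, `lengths`, and `all_same_length` are illustrative.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data={
    'x': [1, 2, 3, 4, 5],
    'y': [3, 7, 8, 5, 1],
})

# Column names are the keys of the dict that was passed in
names = sorted(source.column_names)

# Map each column name to its length, then confirm there is only one distinct length
lengths = {name: len(values) for name, values in source.data.items()}
all_same_length = len(set(lengths.values())) == 1
```

Running a check like this before plotting can catch a column that was built with the wrong number of entries.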

Up until now we have called functions like `p.circle` by passing in literal lists or arrays of data directly. When we do this, Bokeh creates a `ColumnDataSource` for us automatically. But it is also possible to specify a `ColumnDataSource` explicitly by passing it as the `source` argument to a glyph method. Whenever we do this, if we want a property (like `"x"` or `"y"` or `"fill_color"`) to have a sequence of values, we pass the ***name of the column*** that we would like to use for that property:

In [5]:
p = figure(plot_width=400, plot_height=400)
p.circle('x', 'y', size=20, source=source)
show(p)
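
The same idea extends to visual properties such as `fill_color`. The sketch below is an illustration under assumptions: the `colors` column and its values are made up, and `p.scatter` (which draws circle markers by default) stands in for `p.circle` so the snippet runs on a wide range of Bokeh versions.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={
    'x': [1, 2, 3],
    'y': [3, 7, 8],
    'colors': ['navy', 'firebrick', 'olive'],  # hypothetical per-point colors
})

p = figure()

# 'x', 'y' and 'colors' are column names that Bokeh looks up in `source`
renderer = p.scatter('x', 'y', size=20, fill_color='colors', source=source)

# The renderer keeps a reference to the source it was given
uses_our_source = renderer.data_source is source
```

Calling `show(p)` afterwards would draw three markers, each filled with the color from its row.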

In [7]:
# Exercise: create a column data source with NumPy arrays as column values and plot it



## Creating with Pandas DataFrames

It's also simple to create `ColumnDataSource` objects directly from Pandas data frames. To do this, just pass the data frame to `ColumnDataSource` when you create it:

In [9]:
import numpy as np
import pandas as pd
data = np.random.uniform(0,1,[30,4])
df = pd.DataFrame(data)
df.columns = ['a','b','c','d']
source = ColumnDataSource(df)

Now we can use it as we did above by passing the column names to glyph methods:

In [10]:
p = figure(plot_width=400, plot_height=400)
p.circle(x='a',
         y='b',
         source=source)
show(p)
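
One detail worth knowing when converting a data frame: the frame's index is carried over as a column too. The sketch below uses a small made-up frame to show which column names end up in the source; an unnamed index is stored under the name `index`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# A small illustrative frame with a default (unnamed) integer index
df = pd.DataFrame({'a': [0.1, 0.2, 0.3], 'b': [0.9, 0.8, 0.7]})
source = ColumnDataSource(df)

# The frame's columns become source columns, and the index comes along too
columns = set(source.column_names)
```

This matters for hover tooltips and callbacks, where the index column can be referenced by name like any other.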

In [None]:
# Exercise: create a column data source from a data frame and plot it



# Linked Interactions

It is possible to link various interactions between different Bokeh plots. For instance, the ranges of two (or more) plots can be linked, so that when one of the plots is panned (or zoomed, or otherwise has its range changed) the other plots will update in unison. It is also possible to link selections between two plots, so that when items are selected on one plot, the corresponding items on the second plot also become selected.

## Linked panning
Linked panning (when multiple plots have ranges that stay in sync) is simple to spell with Bokeh. You simply share the appropriate range objects between two (or more) plots. The example below shows how to accomplish this by linking the ranges of three plots in various ways:

In [11]:
from bokeh.layouts import gridplot

x = list(range(11))
y0, y1, y2 = x, [10-i for i in x], [abs(i-5) for i in x]

plot_options = dict(plot_width=250, plot_height=250, tools='pan,wheel_zoom')

# create a new plot
s1 = figure(**plot_options)
s1.circle(x, y0, size=10, color="navy")

# create a new plot and share both ranges
s2 = figure(x_range=s1.x_range, y_range=s1.y_range, **plot_options)
s2.triangle(x, y1, size=10, color="firebrick")

# create a new plot and share only one range
s3 = figure(x_range=s1.x_range, **plot_options)
s3.square(x, y2, size=10, color="olive")

p = gridplot([[s1, s2, s3]])

# show the results
show(p)
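
Linking works because the plots hold the very same range objects, not copies. The following sketch, with figure sizes and glyphs omitted for brevity, checks that identity directly; the variable names `shares_x`, `shares_y`, and `s3_shares_y` are illustrative.

```python
from bokeh.plotting import figure

s1 = figure(tools='pan,wheel_zoom')

# Share both ranges with s1
s2 = figure(x_range=s1.x_range, y_range=s1.y_range, tools='pan,wheel_zoom')

# Share only the x range with s1
s3 = figure(x_range=s1.x_range, tools='pan,wheel_zoom')

shares_x = s2.x_range is s1.x_range      # same object, so x panning stays in sync
shares_y = s2.y_range is s1.y_range      # same object, so y panning stays in sync
s3_shares_y = s3.y_range is s1.y_range   # s3 got its own y range
```

Since `s3` has its own y range, panning vertically in `s1` would leave `s3` unchanged, while horizontal panning would move all three.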

In [None]:
# EXERCISE: create two plots in a gridplot, and link their ranges


## Linked brushing

Linking selections is accomplished in a similar way, by sharing data sources between plots. Note that normally, with ``bokeh.plotting`` and ``bokeh.charts``, a default data source for simple plots is created automatically. However, to share a data source between plots, we must create it by hand and pass it explicitly. This is illustrated in the example below:

In [12]:
from bokeh.models import ColumnDataSource

x = list(range(-20, 21))
y0, y1 = [abs(xx) for xx in x], [xx**2 for xx in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "box_select,lasso_select,help"

# create a new plot and add a renderer
left = figure(tools=TOOLS, width=300, height=300)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, width=300, height=300)
right.circle('x', 'y1', source=source)

p = gridplot([[left, right]])

show(p)
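
The selection is stored on the shared data source, which is why both plots react to it. The sketch below is a hedged illustration: it assumes a Bokeh version that exposes the selection as `source.selected.indices`, and it uses `scatter` in place of `circle` for version compatibility.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x = list(range(-20, 21))
source = ColumnDataSource(data=dict(
    x=x,
    y0=[abs(v) for v in x],
    y1=[v**2 for v in x],
))

TOOLS = "box_select,lasso_select"

left = figure(tools=TOOLS)
r_left = left.scatter('x', 'y0', source=source)

right = figure(tools=TOOLS)
r_right = right.scatter('x', 'y1', source=source)

# Both renderers point at one and the same source object
same_source = r_left.data_source is r_right.data_source

# Selecting rows programmatically (assumed API: source.selected.indices)
source.selected.indices = [0, 1, 2]
selected = list(source.selected.indices)
```

Because the selection is a property of the source, the first three points would appear selected in both plots when the layout is shown.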

In [None]:
# EXERCISE: create two plots in a gridplot that share a data source, and link their selections


# Hover Tools

Bokeh has a Hover Tool that allows additional information to be displayed in a popup whenever the user hovers over a specific glyph. Basic hover tool configuration amounts to providing a list of ``(name, format)`` tuples. The full details can be found in the User's Guide [here](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#hovertool).

The example below shows some basic usage of the Hover tool with a circle glyph, using hover information stored in the columns of a `ColumnDataSource`:

In [14]:
from bokeh.models import HoverTool

source = ColumnDataSource(
        data=dict(
            x=[1, 2, 3, 4, 5],
            y=[2, 5, 8, 2, 7],
            other_var=['A', 'b', 'C', 'd', 'E'],
        )
    )

hover = HoverTool(
        tooltips=[
            ("index", "$index"),
            ("(x,y)", "($x, $y)"),
            ("my other var", "@other_var"),
        ]
    )

p = figure(plot_width=300, plot_height=300, tools=[hover], title="Mouse over the dots")

p.circle('x', 'y', size=20, source=source)

show(p)
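
Tooltip values come in two flavors: fields starting with `$` are special values such as the row index or the cursor position, while fields starting with `@` read from a column of the data source. The sketch below, with a made-up `label` column, also uses the `{0.00}` number-format suffix and then checks that every `@` field names a real column.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(
    x=[1, 2, 3],
    y=[2.0, 5.5, 8.25],
    label=['A', 'b', 'C'],  # hypothetical extra column, shown only in the tooltip
))

hover = HoverTool(tooltips=[
    ("row", "$index"),      # special field: index of the hovered row
    ("y", "@y{0.00}"),      # column field, formatted with two decimals
    ("label", "@label"),    # column field, shown as-is
])

p = figure(tools=[hover])
p.scatter('x', 'y', size=20, source=source)

# Collect the column names referenced by the @ fields
referenced = [value.split('{')[0].lstrip('@')
              for _, value in hover.tooltips if value.startswith('@')]
missing = [name for name in referenced if name not in source.data]
```

A tooltip that refers to a column missing from the source does not raise an error in Python, so a check like `missing` can save some debugging.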

# Widgets

Bokeh supports direct integration with a small basic widget set. These can be used in conjunction with a Bokeh Server, or with ``CustomJS`` models, to add more interactive capability to your documents. You can see a complete list, with example code, in the [Adding Widgets](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction.html#adding-widgets) section of the User's Guide.

To use the widgets, include them in a layout like you would a plot object:

Keep in mind that an example whose widgets trigger Python callbacks has to run in the Bokeh server. To run the server, put the code in a standalone `.py` file and execute it using

    bokeh serve myfile.py --show
    
   

In [None]:
from bokeh.layouts import widgetbox
from bokeh.models.widgets import Slider


slider = Slider(start=0, end=10, value=1, step=.1, title="foo")
show(widgetbox(slider))


In [None]:
# Updating a plot using a widget. This won't run in the notebook; it needs a Bokeh server.

import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider
from bokeh.plotting import figure

# Set up the model (function)
def fun_pow(x, exp):
    return x**exp

# Make the widget
slider = Slider(start=0, end=10, value=1, step=.1, title="exponent")

# Set up the data, starting from the slider's initial value
interval = 0.001
x = np.arange(0, 1, interval)
ds = ColumnDataSource(data=dict(x=x, y=fun_pow(x, slider.value)))

# Set up the plot
p = figure(plot_width=400, plot_height=400)
p.line('x', 'y', source=ds, line_width=3, line_alpha=0.6)

# Define the update callback; the server runs it whenever the slider value changes
def update_data(attribute, old, new):
    ds.data = dict(x=x, y=fun_pow(x, new))

slider.on_change('value', update_data)

# Add the layout to the current document. `bokeh serve` displays it,
# so no call to show() is needed.
curdoc().add_root(row(slider, p, width=800))

## Callbacks for widgets

Widgets that have values associated can have small JavaScript actions attached to them. These actions (also referred to as "callbacks") are executed whenever the widget's value is changed. In order to make it easier to refer to specific Bokeh models (e.g., a data source, or a glyph) from JavaScript, the ``CustomJS`` object also accepts a dictionary of "args" that map names to Python Bokeh models. The corresponding JavaScript models are made available automatically to the ``CustomJS`` code.

The example below shows an action attached to a slider that updates a data source whenever the slider is moved:

In [None]:
from bokeh.layouts import column
from bokeh.models import CustomJS, ColumnDataSource, Slider

x = [x*0.005 for x in range(0, 201)]

source = ColumnDataSource(data=dict(x=x, y=x))

plot = figure(plot_width=400, plot_height=400)
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

slider = Slider(start=0.1, end=6, value=1, step=.1, title="power")

update_curve = CustomJS(args=dict(source=source, slider=slider),
                        code="""
                        // `source` and `slider` are the models passed in through args
                        var data = source.data;
                        var f = slider.value;
                        var x = data['x'];
                        var y = data['y'];
                        for (var i = 0; i < x.length; i++) {
                            y[i] = Math.pow(x[i], f);
                        }
                        // tell Bokeh the data was modified in place
                        source.change.emit();
                        """)

slider.js_on_change('value', update_curve)


show(column(slider, plot))
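
To make the role of `args` concrete, the sketch below builds a callback and inspects the mapping on the Python side. The JavaScript body is a placeholder that only logs values, and the variable names are illustrative.

```python
from bokeh.models import ColumnDataSource, CustomJS, Slider

source = ColumnDataSource(data=dict(x=[0, 1], y=[0, 1]))
slider = Slider(start=0.1, end=6, value=1, step=0.1, title="power")

# Each key in `args` becomes a variable name inside the JavaScript code
callback = CustomJS(args=dict(source=source, slider=slider),
                    code="console.log(slider.value, source.data);")
slider.js_on_change('value', callback)

# The names available to the JavaScript code, and the models they refer to
arg_names = sorted(callback.args)
same_source = callback.args['source'] is source
```

Any name used in the JavaScript code that is not a key of `args` (or a built-in such as `cb_obj`) would be undefined in the browser.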

# Callbacks for selections
It's also possible to make JavaScript actions that execute whenever a user selection (e.g., box, point, lasso) changes. This is done by attaching the same kind of CustomJS object to whatever data source the selection is made on.

The example below is a bit more sophisticated, and demonstrates updating one glyph's data source in response to another glyph's selection:


In [None]:
from random import random

x = [random() for x in range(500)]
y = [random() for y in range(500)]
color = ["navy"] * len(x)

s = ColumnDataSource(data=dict(x=x, y=y, color=color))
p = figure(plot_width=400, plot_height=400, tools="lasso_select", title="Select Here")
p.circle('x', 'y', color='color', size=8, source=s, alpha=0.4)

s2 = ColumnDataSource(data=dict(xm=[0,1],ym=[0.5, 0.5]))
p.line(x='xm', y='ym', color="orange", line_width=5, alpha=0.6, source=s2)

# Assumes a Bokeh version where the selection lives on `source.selected.indices`
s.selected.js_on_change('indices', CustomJS(args=dict(s=s, s2=s2), code="""
    var inds = s.selected.indices;
    var d = s.data;
    var ym = 0;

    if (inds.length == 0) { return; }

    // reset every point, then highlight the selected ones
    for (var i = 0; i < d['color'].length; i++) {
        d['color'][i] = "navy";
    }
    for (var i = 0; i < inds.length; i++) {
        d['color'][inds[i]] = "firebrick";
        ym += d['y'][inds[i]];
    }

    // move the horizontal line to the mean y of the selection
    ym /= inds.length;
    s2.data['ym'] = [ym, ym];

    s.change.emit();
    s2.change.emit();
"""))

show(p)
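
The logic inside the JavaScript callback can be easier to follow in Python. The helper below is a hypothetical stand-in, not part of Bokeh: given the `y` column and the selected indices, it returns the new color column and the mean `y` of the selection.

```python
def recolor_and_mean(y, selected, base="navy", highlight="firebrick"):
    """Hypothetical helper mirroring the selection callback's arithmetic."""
    # Start with every point in the base color
    colors = [base] * len(y)

    # With nothing selected there is no mean; the caller would leave the plot alone
    if not selected:
        return colors, None

    # Highlight the selected points and average their y values
    for i in selected:
        colors[i] = highlight
    mean_y = sum(y[i] for i in selected) / len(selected)
    return colors, mean_y


colors, mean_y = recolor_and_mean([0.2, 0.4, 0.9, 0.6], [1, 3])
```

Here points 1 and 3 are highlighted and `mean_y` is the average of their `y` values, which is where the orange line would be drawn.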

# Integration Example

It is also possible to specify the visual appearance of glyphs when they are "inspected", e.g. by a hover tool. This is accomplished by setting an optional `hover_glyph` on the glyph renderer:

    r.hover_glyph = Circle(fill_alpha=1, fill_color="firebrick", line_color=None) 

Or, if using `bokeh.plotting` glyph methods, by passing `hover_fill_alpha`, etc. to the glyph method. Let's look at an example that works together with a `HoverTool` configured for "hline" hit-testing.

In [15]:
from bokeh.models.tools import HoverTool, WheelZoomTool
import pandas as pd
from bokeh.models import ColumnDataSource

data = pd.read_excel(r'sample_data\brue.xlsx', index_col='Date', parse_dates=True)

event = data['1993-09-19':'1993-09-22']
x, y = event.index.to_series(), event['P']

ds = ColumnDataSource(event)

# Figure setup
p = figure(width=600, 
           height=300, 
           x_axis_type="datetime", 
           title='Hover over points')

# Adding a line plot to the figure
p.line(x, y, line_width=1, color='gray')

# Draw circles on top of the line (as markers)
cr = p.circle(x, y, size=10.0,
              fill_color="grey", hover_fill_color="firebrick",
              fill_alpha=0.05, hover_alpha=0.3,
              line_color=None, hover_line_color="white")

# Configuring the Hover tool to display data
hover_tool = HoverTool(tooltips=None,  # tooltips are assigned below
                       line_policy='interp', # prev, next, nearest, interp, none
                       renderers=[cr], # renderers the tool responds to
                       mode='hline')  # mouse, hline, vline

hover_tool.tooltips = [
                        ("index", "$index"),
                        ("(x,y)", "($x, $y)"),
                        ("y", "$y"),
                      ]

# Here we add the tools. In this case the hover and mouse wheel zoom
mouse_zoom = WheelZoomTool()
p.add_tools(hover_tool, mouse_zoom)

# Display the plot
show(p)

In [16]:
p = figure(width=600, height=300, x_axis_type="datetime", title='Hover over points')
p.line(x='Date', y='P', line_width=1, color='gray', source=ds)


cr = p.circle(x='Date', y='P', size=10.0,
              fill_color="grey", hover_fill_color="firebrick",
              fill_alpha=0.05, hover_alpha=0.3,
              line_color=None, hover_line_color="white",
              source = ds)

hover_tool = HoverTool(tooltips=None,  # tooltips are assigned below
                      line_policy = 'interp', # prev, next, nearest, interp, none
                      renderers=[cr], # renderers the tool responds to
                      mode='hline')  # mouse, hline, vline

hover_tool.tooltips = [
                        ("index", "$index"),
                        ("(x,y)", "($x, $y)"),
                        ("y", "$y"),
                        ('Q', '@Q')
#                         ("fill color", "$color[hex, swatch]:fill_color"),
                    ]

mouse_zoom = WheelZoomTool()
p.add_tools(hover_tool, mouse_zoom)
show(p)
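
The `hover_*` keyword arguments used in the cells above are what create the renderer's `hover_glyph`. The sketch below uses made-up data and `scatter` in place of `circle` for version compatibility, and simply confirms that a hover glyph was attached.

```python
from bokeh.plotting import figure

p = figure()

# Passing a hover_* property asks Bokeh to build a separate glyph for hovered points
styled = p.scatter([1, 2, 3], [3, 2, 1], size=10,
                   fill_color="grey", hover_fill_color="firebrick")

has_hover_glyph = styled.hover_glyph is not None
```

The hover glyph only becomes visible when the plot also has a `HoverTool` that inspects this renderer.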

# Integration example using widgets

Remember that widgets with Python callbacks won't work in notebooks. They are meant to run in the server, and for this reason they should be deployed as an independent application. To make this run, copy the contents of the following cell into a `.py` file and then execute it from the command line

    bokeh serve my_file_name.py --show
    
With this command, the server executes `my_file_name.py` and opens the result in a browser


In [None]:
# Integration example using the server. It won't work here!

import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput
from bokeh.plotting import figure, show

# Set up data
N = 200
x = np.linspace(0, 4*np.pi, N)
y = np.sin(x)
source = ColumnDataSource(data=dict(x=x, y=y))

# Set up plot
plot = figure(plot_height=400, plot_width=400, title="my sine wave",
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[0, 4*np.pi], y_range=[-2.5, 2.5])

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)


# Set up widgets
text = TextInput(title="title", value='my sine wave')
offset = Slider(title="offset", value=0.0, start=-5.0, end=5.0, step=0.1)
amplitude = Slider(title="amplitude", value=1.0, start=-5.0, end=5.0, step=0.1)
phase = Slider(title="phase", value=0.0, start=0.0, end=2*np.pi)
freq = Slider(title="frequency", value=1.0, start=0.1, end=5.1, step=0.1)


# Set up callbacks
def update_title(attrname, old, new):
    plot.title.text = text.value

text.on_change('value', update_title)

def update_data(attrname, old, new):

    # Get the current slider values
    a = amplitude.value
    b = offset.value
    w = phase.value
    k = freq.value

    # Generate the new curve
    x = np.linspace(0, 4*np.pi, N)
    y = a*np.sin(k*x + w) + b

    source.data = dict(x=x, y=y)

for w in [offset, amplitude, phase, freq]:
    w.on_change('value', update_data)


# Set up layouts and add to document
inputs = widgetbox(text, offset, amplitude, phase, freq)

curdoc().add_root(row(inputs, plot, width=800))
curdoc().title = "Sliders"
# No call to show() is needed: `bokeh serve` renders the current document
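
Since server callbacks cannot be exercised from the notebook, one option is to keep the numerical part in a plain function that can be checked anywhere. The sketch below is a suggestion rather than part of the original app: `sine_curve` is a hypothetical helper that computes the same curve as `update_data`.

```python
import numpy as np


def sine_curve(amplitude, offset, phase, freq, n=200):
    """Hypothetical helper: the data dict that update_data would assign to source.data."""
    x = np.linspace(0, 4 * np.pi, n)
    y = amplitude * np.sin(freq * x + phase) + offset
    return dict(x=x, y=y)


# The same values the sliders would provide, passed in by hand
data = sine_curve(amplitude=2.0, offset=1.0, phase=0.0, freq=1.0)
```

Inside the app, the callback body would then reduce to `source.data = sine_curve(amplitude.value, offset.value, phase.value, freq.value)`.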