# Bokeh-based example with Spark

This notebook shows how to display histogrammar plots, rendered with Bokeh, inline in the notebook.
The example closely follows this tutorial: http://histogrammar.org/docs/tutorials/python-bokeh/

We launch the notebook in distributed mode with Spark enabled, e.g.:

```bash
IPYTHON_OPTS="notebook --no-browser --port=8889 --ip=127.0.0.1" pyspark --master yarn-client --num-executors 10 --executor-cores 2 --executor-memory 5g
```
   
   
As usual when working with Spark in a notebook, we first check that the `SparkContext` variable is available:

In [3]:
sc

<pyspark.context.SparkContext at 0x7f2b0d267160>

Next, we follow the example by booking and filling a couple of histograms:

In [4]:
from histogrammar import *
from histogrammar.plot.bokeh import plot,save,view

In [5]:
simple = [3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7]

In [6]:
one = Histogram(5, -5.0, 8.0, lambda x: x)
two = Histogram(5, -3.0, 7.0, lambda x: x)

In [7]:
labeling = Label(one=one, two=two)
for x in simple:
    labeling.fill(x)
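
To make the binning in the cells above concrete, here is a minimal pure-Python sketch of how values end up in equal-width bins. It does not use histogrammar: the helpers `bin_index` and `count_bins` are hypothetical, and the sketch assumes that the lower edge of the range is inclusive, the upper edge is exclusive, and out-of-range values are set aside as underflow or overflow rather than counted in a bin.

```python
import math

def bin_index(x, num, low, high):
    """Return the bin index of x, or None if x lies outside [low, high).

    Hypothetical helper, assuming equal-width bins.
    """
    if x < low or x >= high:
        return None  # would be counted as underflow/overflow
    return int(math.floor(num * (x - low) / (high - low)))

def count_bins(data, num, low, high):
    """Count how many values of data fall into each of the num bins."""
    counts = [0] * num
    for x in data:
        i = bin_index(x, num, low, high)
        if i is not None:
            counts[i] += 1
    return counts

values = [3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7]

# Same binning parameters as the histograms `one` and `two` above.
counts_one = count_bins(values, 5, -5.0, 8.0)
counts_two = count_bins(values, 5, -3.0, 7.0)

print(counts_one)  # [2, 4, 2, 1, 1]
print(counts_two)  # [3, 2, 2, 1, 0]
```

Under these assumptions the second binning counts only eight of the ten values, because -4.7 and 7.3 fall outside the range from -3.0 to 7.0.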

In [8]:
glyph_one = one.bokeh()
plot_one = plot(glyph_one)

The usual approach is to save the histogram plot to disk as an HTML file. This is still possible:

In [9]:
save(plot_one,"python_plot_one.html")

In an IPython notebook, however, it is more convenient to view the plots inline:

In [10]:
from bokeh.plotting import show
from bokeh.io import output_notebook

In [11]:
output_notebook()

In [12]:
show(plot_one)

# Aggregation

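Next, we fill a histogram in a distributed way: the data is turned into a Spark RDD and the histogram is filled with `aggregate`, using the `increment` and `combine` functions imported from histogrammar.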

In [13]:
simple_rdd = sc.parallelize(simple)

histogram = Histogram(5, -5.0, 8.0, lambda x: x)

two = simple_rdd.aggregate(histogram, increment, combine)
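
The aggregation pattern used above can be sketched without Spark. Spark's `RDD.aggregate(zeroValue, seqOp, combOp)` folds each partition into a copy of the zero value with `seqOp` and then merges the per-partition results with `combOp`. The sketch below imitates this with `functools.reduce`. It assumes that `increment` fills one datum into a histogram and that `combine` merges two histograms, which is how the call above uses them. The class `SimpleHist`, the functions `seq_op` and `comb_op`, and the partition layout are all hypothetical stand-ins, not histogrammar or Spark code.

```python
from functools import reduce

class SimpleHist:
    """Hypothetical stand-in for a histogram: equal-width bins, counts only."""

    def __init__(self, num, low, high, counts=None):
        self.num, self.low, self.high = num, low, high
        self.counts = list(counts) if counts is not None else [0] * num

    def fill(self, x):
        # Values outside [low, high) are ignored in this sketch.
        if self.low <= x < self.high:
            i = int(self.num * (x - self.low) / (self.high - self.low))
            self.counts[i] += 1

    def __add__(self, other):
        merged = [a + b for a, b in zip(self.counts, other.counts)]
        return SimpleHist(self.num, self.low, self.high, merged)

def seq_op(hist, datum):
    """Plays the role of `increment`: fill one datum, return the histogram."""
    hist.fill(datum)
    return hist

def comb_op(h1, h2):
    """Plays the role of `combine`: merge two partial histograms."""
    return h1 + h2

values = [3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7]

# Pretend the data is split over three partitions (layout chosen arbitrarily).
partitions = [values[0:4], values[4:7], values[7:10]]

# Each partition starts from its own empty histogram, as with a zero value.
partials = [reduce(seq_op, part, SimpleHist(5, -5.0, 8.0)) for part in partitions]

# The partial results are then merged into one histogram.
total = reduce(comb_op, partials, SimpleHist(5, -5.0, 8.0))

print(total.counts)  # [2, 4, 2, 1, 1]
```

Because the merge is a plain bin-by-bin sum, the result does not depend on how the data happens to be partitioned, which is what makes this pattern suitable for distributed filling.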

In [14]:
glyph_two = two.bokeh(glyphType="histogram",fillColor="red")
plot_two = plot(glyph_two)

show(plot_two)