# Part III: Advanced data visualisation

![](img/visualization.jpeg)

In this section we'll learn how to:
- Use [Bokeh](https://bokeh.pydata.org/en/latest/) as a low-level visualisation tool to build arbitrary visualisations
- Create interactive plots

In [1]:
import pandas

movies = pandas.read_json('data/movies_clean.jsonl.bz2', lines=True).set_index('imdb_id')

Continuing the previous examples about movie ratings, we will visualise the data again, this time with an interactive plot. One of the best tools for interactive data visualisation in the Python ecosystem is `bokeh`.

Unlike matplotlib (the backend used by `pandas`), `bokeh` does not provide ready-made high-level plots such as histograms or bar plots. Instead, it lets us draw shapes directly, so we can create arbitrary plots.

We will iteratively create a plot, one step at a time, so we can better understand the concepts.

To let `bokeh` know that our front end is a Jupyter notebook, we will start by executing the next two lines:

In [2]:
import bokeh.io

bokeh.io.output_notebook()

The next step is to define a figure. Figures are created with the function `bokeh.plotting.figure` and rendered with `bokeh.io.show(our_figure)`.

**Exercise:** Start by creating an empty figure with the default parameters, and display it.
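
One possible sketch for this exercise is shown below. The variable name `empty_figure` is only an illustration, and the `show` call is left as a comment so the snippet can also run outside a notebook.

```python
import bokeh.plotting

# Create a figure with all the default settings; nothing is drawn on it yet.
empty_figure = bokeh.plotting.figure()

# In the notebook, after calling bokeh.io.output_notebook(), render it with:
# bokeh.io.show(empty_figure)
```

The result is an empty canvas with axes and the default toolbar. `bokeh` may warn that the figure has no renderers, which is expected at this point.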

The first thing we can do is set the properties of the figure in the constructor.

The parameters we may want to update are:
- `width`
- `height`
- `title`
- `x_axis_label`
- `y_axis_label`

**Exercise:** Set the values of the previous parameters when creating the figure, in preparation for plotting the rating of the movies as a function of the number of votes.
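
A minimal sketch of such a figure follows. The sizes and label texts are arbitrary choices for illustration, not values prescribed by this tutorial.

```python
import bokeh.plotting

# The number of votes goes on the x axis and the rating on the y axis,
# since we want to plot the rating as a function of the number of votes.
p = bokeh.plotting.figure(
    width=700,
    height=400,
    title="Movie rating by number of votes",
    x_axis_label="Number of votes",
    y_axis_label="Rating",
)

# In the notebook, render it with:
# bokeh.io.show(p)
```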

Now that we have got the canvas of our plot, we will see how to add shapes to it.

To draw circles, our figure has a method `.circle()`. The position is specified by the parameters `x` and `y`. Other useful parameters are:
- `size`
- `fill_color`
- `fill_alpha`
- `line_color`

**Exercise:** Draw a single circle at position `0, 0` of the plot. Set other parameters to make the circle look beautiful to you.

**Exercise:** If you want to draw more than one circle, the best way is to provide lists instead of scalars in the parameters `x` and `y`. Draw 3 circles in different positions.

With 3 points, it starts to make sense to interact with the plot, for example by zooming in. Next, we will add our real data.
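
As a rough sketch of the two circle exercises, the snippet below draws three circles at once. The coordinates, colours and sizes are arbitrary. Note that newer Bokeh releases may emit a deprecation warning for `.circle()` used with `size` and suggest `.scatter()` with the same arguments instead.

```python
import bokeh.plotting

p = bokeh.plotting.figure(width=400, height=300, title="Three circles")

# Lists in `x` and `y` draw one circle per (x, y) pair.
# Passing scalars (for example x=0, y=0) would draw a single circle.
renderer = p.circle(
    x=[0, 1, 2],
    y=[0, 2, 1],
    size=20,               # diameter in screen units (pixels)
    fill_color="orange",
    fill_alpha=0.5,        # 0 is fully transparent, 1 is opaque
    line_color="black",
)

# In the notebook, render it with:
# bokeh.io.show(p)
```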

**Exercise:** Plot the data from our `DataFrame` by specifying in the parameters `x` and `y` of `.circle()` the columns to plot.
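
A sketch of this step is shown below. To keep it self-contained it uses a tiny stand-in `DataFrame` called `sample` (a hypothetical name) instead of `movies`. The first row reuses the rating and vote count of the sample movie shown later in this section, and the other rows are made up.

```python
import bokeh.plotting
import pandas

# Hypothetical stand-in for `movies`; in the notebook, use `movies` instead.
sample = pandas.DataFrame({
    "num_ratings": [127507.0, 50000.0, 900.0],  # only the first value is real
    "rate": [8.2, 6.5, 7.1],                    # only the first value is real
})

p = bokeh.plotting.figure(
    width=700,
    height=400,
    x_axis_label="Number of votes",
    y_axis_label="Rating",
)

# The columns are passed directly, one value per circle.
renderer = p.circle(x=sample["num_ratings"], y=sample["rate"], size=5, fill_alpha=0.3)

# In the notebook, render it with:
# bokeh.io.show(p)
```

A small `size` and a low `fill_alpha` are a design choice for dense data: overlapping points stay distinguishable.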

While this is already useful, to get the most out of interactivity we want to attach to each point all the relevant data from the `DataFrame`, which we will display later.

This is done in `bokeh` by creating a data source. A data source is a representation of the data that is sent to the `bokeh` front end, so that the data is available there.

A data source can be created with `bokeh.plotting.ColumnDataSource(data=data)`, where `data` is a dictionary in which each key is a column name and each value is the list of values for that column. Conveniently, a `pandas` `DataFrame` can be converted to this format with the method `.to_dict(orient='list')`.
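
To make the expected format concrete, here is a small sketch of what `.to_dict(orient='list')` returns. The `DataFrame` `sample` and its second row are made up for illustration.

```python
import pandas

# Hypothetical two-row DataFrame; the second row is invented.
sample = pandas.DataFrame({
    "title": ["PK", "Example movie"],
    "rate": [8.2, 6.5],
})

# One key per column, each mapped to the list of values of that column.
data = sample.to_dict(orient="list")
print(data)
# {'title': ['PK', 'Example movie'], 'rate': [8.2, 6.5]}
```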

**Exercise:** Create a data source with the following columns from our `DataFrame`: `['title', 'photo', 'rate', 'num_ratings', 'summary']`. Pass it to `.circle()` with the parameter `source=source` to tell `bokeh` to use our data, and replace the values of `x` and `y` with the names of the columns.
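
The following sketch shows one way this could look. It again uses a hypothetical stand-in `DataFrame` named `sample` with placeholder URLs and summaries; in the notebook, `movies` would take its place.

```python
import bokeh.plotting
import pandas

columns = ["title", "photo", "rate", "num_ratings", "summary"]

# Hypothetical stand-in for `movies`; URLs and summaries are placeholders.
sample = pandas.DataFrame({
    "title": ["PK", "Example movie"],
    "photo": ["https://example.com/pk.jpg", "https://example.com/other.jpg"],
    "rate": [8.2, 6.5],
    "num_ratings": [127507.0, 50000.0],
    "summary": ["Placeholder summary.", "Another placeholder summary."],
})

# Build the data source from the selected columns only.
source = bokeh.plotting.ColumnDataSource(data=sample[columns].to_dict(orient="list"))

p = bokeh.plotting.figure(
    width=700,
    height=400,
    x_axis_label="Number of votes",
    y_axis_label="Rating",
)

# `x` and `y` are now column names that bokeh looks up in `source`.
renderer = p.circle(x="num_ratings", y="rate", source=source, size=8, fill_alpha=0.5)

# In the notebook, render it with:
# bokeh.io.show(p)
```

The columns that are not plotted (`title`, `photo`, `summary`) still travel with the data source, which is what makes them available to the front end.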

A useful thing to do is to show a tooltip for every point in the plot, so we can easily identify which movie it is.

To do that, we will first create a small HTML template, which we will later add as a tooltip.

**Exercise:** With the data from the next row (or any other you prefer), create a short HTML template that lets us quickly identify the movie.

In [16]:
movies[['title', 'photo', 'rate', 'num_ratings', 'summary']].loc['tt2338151'].to_dict()

{'title': 'PK',
 'photo': 'https://m.media-amazon.com/images/M/MV5BMTYzOTE2NjkxN15BMl5BanBnXkFtZTgwMDgzMTg0MzE@._V1_UY268_CR1,0,182,268_AL_.jpg',
 'rate': 8.2,
 'num_ratings': 127507.0,
 'summary': 'An alien on Earth loses the only device he can use to communicate with his spaceship. His innocent nature and child-like questions force the country to evaluate the impact of religion on its people.'}

In [14]:
%%html
<p>Replace this with your HTML code. Keep the Jupyter magic %%html, so you can "execute" this cell and see the result.</p>

Once we have the HTML document, we need to create a Python variable with its content, replacing the values of the sample movie with the name of the corresponding column prefixed with `@`. For example `tooltips = '<h1>@title</h1>'`.

**Exercise:** Create the Python template for `bokeh`, and pass it to the `figure` constructor through the `tooltips` parameter.
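
As an illustration, a template along these lines could be used. The HTML layout is just one possibility, and the figure settings are arbitrary. The `@`-prefixed names refer to columns of the data source passed to `.circle()`.

```python
import bokeh.plotting

# Each @name is replaced with the value of that column for the hovered point.
tooltips = """
<div style="width: 300px;">
    <img src="@photo" height="120" alt="Movie poster">
    <h3>@title</h3>
    <p>Rating: @rate (@num_ratings votes)</p>
    <p>@summary</p>
</div>
"""

p = bokeh.plotting.figure(
    width=700,
    height=400,
    x_axis_label="Number of votes",
    y_axis_label="Rating",
    tooltips=tooltips,
)

# Glyphs added afterwards with `source=source` will show this tooltip on hover.
```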