# Interactive Data Visualisation with Bokeh

In [None]:
import pandas as pd
import pandas_bokeh
from bokeh.plotting import Figure, show, output_notebook
import warnings
warnings.filterwarnings('ignore')

### Bokeh.plotting

Let's start with a simple plot using `bokeh.plotting`. The functions we need were imported above, so we can now run `output_notebook()`. This makes our plots appear in the notebook rather than on a separate page.

In [None]:
output_notebook()

x = [0, 10, 20, 30, 40, 50]
y0 = [0, 1, 4, 9, 16, 25]
y1 = [25, 16, 9, 4, 1, 0]

**Q1)** Create a figure assigned to the variable `s` and then plot the data in `x` and `y0` from above. Plot both `line` and `circle` glyphs with the data, and experiment with the `size`, `alpha`, and `color` parameters for each. `show` the plot in the notebook.

In [None]:
#add your code below


### Bokeh.models.widgets

Run the following code cell to import the `Tabs` and `Panel` widgets.

In [None]:
from bokeh.models.widgets import Tabs, Panel

**Q2)** We're now going to use the `Tabs` and `Panel` widgets to have multiple plots available to view via tabbed navigation.

First, create another figure like the one created above, except using `y1` instead of `y0`, and assigning it to the variable `d`. Make the glyph styling a little different, for example by using a different colour.

*It's worth noting here that the manner in which Bokeh creates the output means that we need to create a new figure from scratch rather than attempting to copy and then modify the previous figure.*

*If we subsequently want to update specific attributes of each figure with the same values, we can do so using iteration.*

In [None]:
#add your code below



**Q3)** We need to create a `Panel` object for each `figure` we want to include, using the `child` argument to specify the figure and add a `title` for each one:

- 'Supply' for the `s` panel
- 'Demand' for the `d` panel

Next, create a `Tabs` object, giving a list of the panels created as the `tabs` argument, and then `show` the result.

Look at the [documentation](https://docs.bokeh.org/en/latest/docs/reference/models/layouts.html?highlight=panel#panel) for help and examples.

In [None]:
#add your code below



Note that the figures in each panel remain independent of one another. 

If we want to update properties with a common value across all graphs, we can use iteration. Set the following properties to the same values of your choice for figures `s` and `d`:

`xaxis.axis_label`, `yaxis.axis_label`, `plot_width`, `plot_height`, `toolbar_location`  

See the [documentation](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html) for more details on toolbar configuration options.

In [None]:
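for chart in [s, d]:
    # Apply the same settings to both figures
    chart.xaxis.axis_label = "Quantity"
    chart.yaxis.axis_label = "Price"
    chart.plot_width = 600
    chart.plot_height = 400
    chart.toolbar_location = None

# `both` is assumed to be the name given to the Tabs object created in Q3
show(both)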


### Pandas-Bokeh

[Pandas-Bokeh](https://github.com/PatrikHlobil/Pandas-Bokeh) is a library which simplifies the creation of Bokeh plots when using Pandas DataFrames as the data source. It was integrated more directly with Pandas in Pandas version 0.25 (which is why we checked our version at the top of the notebook).

This integration allows us to set bokeh as our default plotting backend for Pandas, thus replacing matplotlib when `df.plot` is used and adding Bokeh methods to `pd.plotting`. Note that not all plot types are supported, but what is available should be more than sufficient for most purposes.

In [None]:
pd.set_option('plotting.backend', 'pandas_bokeh')

In [None]:
import warnings
warnings.filterwarnings("ignore")

First we'll load some data. The dataset gives details of 350,000+ domestic commercial flights in the USA between 1990 and 2009.

It's important to understand that **much of the work required for visualisation comes in the processing of the data**, even when the data is as clean and tidy as the file we will be using below, so we will walk through this together.

Take a moment to look at the cell below and understand what's going on; we are loading a comma-separated values file, adding column headers, and then using Pandas `datetime` methods to extract the year and month for each row.  
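
As a small illustration of the date step, the sketch below parses two made-up values and pulls out the year and month. It assumes the `Date` column holds year and month in `YYYYMM` form, which is what the `'%Y%m'` format string used later in the notebook suggests; the values and variable names here are for illustration only.

```python
import pandas as pd

# Made-up values in YYYYMM form (an assumption about how Date is stored)
dates = pd.Series(['199001', '200912'])

# Parse to timestamps; the day defaults to the first of the month
parsed = pd.to_datetime(dates, format='%Y%m')

# The .dt accessor exposes the components of each timestamp
years = parsed.dt.year
months = parsed.dt.month

print(parsed.tolist())
print(years.tolist(), months.tolist())
```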

More information about the dataset can be found in the `.yaml` file in the `data` folder.

In [None]:
df = pd.read_csv('data/flights.csv',
                 names=['Origin', 'Destination','Origin_City', 'Destination_City', 
                        'Passengers', 'Seats', 'Flights', 'Distance','Date', 
                        'Origin_City_Popn','Destination_City_Popn'])

df.head()

In [None]:
df.shape

In [None]:
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m')
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df.head()

We will now calculate some further columns derived from columns of interest, and check that our extended DataFrame looks as expected:

In [None]:
df['Empty_Seats'] = df['Seats'] - df['Passengers']
df['Spare_Capacity_%'] = (1 - df['Passengers'] / df['Seats']) * 100
df['Passenger_Miles'] = df['Passengers'] * df['Distance']
df.head()

We would like to look at how the volume of flights and passengers from a specific airport has progressed over time. There are a number of approaches we could take to achieve this; here we use `.groupby`. Note how this creates a `MultiIndex` DataFrame.

In [None]:
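# numeric_only=True restricts the sum to numeric columns, which also avoids errors on newer Pandas versions
org = df.groupby(['Origin', 'Year']).sum(numeric_only=True)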
org.head()

Notice how as a result of using `.sum` in the `.groupby`, some columns (such as `Date`) have been dropped because they could not be summed, while others (such as `Month` and `Spare_Capacity_%`) no longer contain values which are particularly useful.  

We'll now extract only the columns of use and the rows for a specific airport. Note the assignment of the airport code to a variable `airport`; this should make things easier if we subsequently want to re-use our code for analysis of a different airport, perhaps within a function.
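
The re-use idea mentioned above could be sketched as a small function. This is only one possible shape: the function name `airport_totals`, its parameters, and the toy data are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the flights data (values are made up for illustration)
toy = pd.DataFrame({
    'Origin': ['JFK', 'JFK', 'JFK', 'LAX'],
    'Year': [1990, 1990, 1991, 1990],
    'Passengers': [100, 50, 200, 80],
    'Flights': [2, 1, 3, 1],
})

def airport_totals(flights, airport, columns):
    """Return yearly totals of `columns` for a single origin airport."""
    totals = flights.groupby(['Origin', 'Year'])[columns].sum()
    # Selecting on the outer index level leaves a frame indexed by Year only
    return totals.loc[airport]

jfk = airport_totals(toy, 'JFK', ['Passengers', 'Flights'])
print(jfk)
```

Calling the function with a different airport code, for example `airport_totals(toy, 'LAX', ['Passengers'])`, repeats the same steps without copying any cells.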

In [None]:
airport = 'JFK'
ap = org.loc[airport][['Passengers', 'Passenger_Miles','Flights', 'Seats', 'Empty_Seats']]
ap.head()

The values in each column are of quite different magnitudes. To see more easily how they progress over time relative to one another, let's express each value as a multiple of its 1990 value, using 1990 as the base year:

In [None]:
base = ap.iloc[0]
ap_90 = ap / base
ap_90.head()
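
To see what the division by the base row does, here is a minimal sketch on made-up numbers. The DataFrame and its values are illustrative only; the point is that dividing a DataFrame by a Series matches the Series labels to the column names, so each column is scaled by its own first-year value.

```python
import pandas as pd

# Made-up yearly totals, indexed by year (illustrative values only)
toy = pd.DataFrame(
    {'Passengers': [1000, 1100, 1500], 'Flights': [10, 12, 11]},
    index=pd.Index([1990, 1991, 1992], name='Year'),
)

base = toy.iloc[0]     # first row: the 1990 values, as a Series
rebased = toy / base   # each column is divided by its own base value

print(rebased)
```

Every column starts at 1.0 in the base year, so a value of 1.5 in a later year means a 50% increase relative to 1990, whatever the original units were.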

**Q4)** We now have a tidy DataFrame with `Year` as the index and the progression of various metrics for a given airport over the time period.   

Try using the `.plot` method on the `ap_90` DataFrame and see what you get:

In [None]:
#add your code below



Pandas-Bokeh has done a lot of work for us here - try clicking on the different labels in the legend, and hovering over the lines. The package has created a Bokeh `Figure` object, and from the `bokeh.models` module used `ColumnDataSource` to interpret the DataFrame and `HoverTool` to add further interactivity to the chart.

**Q5)** The above plot was produced without providing any [keyword arguments](https://treyhunner.com/2018/04/keyword-arguments-in-python/) to the `.plot` method, but we can provide further [optional parameter values](https://github.com/PatrikHlobil/Pandas-Bokeh#lineplot) when using it for customisation.


We can assign the resulting Bokeh figure to a variable `fig`, and then make further modifications to it. When doing so we may wish to use the `show_figure=False` argument with the `.plot` method so we don't display it unnecessarily.  

Use the `.plot` method on `ap_90` to assign a figure to a variable `fig`, which does not have a `hovertool` and uses a `colormap` of your choice from the [palettes](https://bokeh.pydata.org/en/latest/docs/reference/palettes.html), while preventing the figure being displayed.  

Have a go at updating the figure using Bokeh methods, such as `toolbar_location`, `legend.location`, and `plot_width`. Remember we can then use `show` to see it.

In [None]:
#add your code below

#fig = ap_90.plot()
#fig.yaxis.axis_label = 'Base Year = 1990'
#show(fig)



**Q6)** What do you think has happened to the **average spare capacity** at which the planes have run, and the **average distance** of a flight from this airport? 

- Use `.groupby` on `df` to find the `mean` of all the values for each airport in each year, grouping by `['Origin', 'Year']`.
- Then create a new DataFrame containing only the two columns of interest, `['Distance', 'Spare_Capacity_%']`, and the rows for the given airport. Assign it to the variable `ap_mean`, which is the name the plotting code later in the notebook expects.
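
The two steps above can be tried out on a toy DataFrame first. This is a sketch under assumptions: the data is made up, and the result is named `toy_mean` (a hypothetical name) so that it does not clash with anything in the notebook.

```python
import pandas as pd

# Toy stand-in for the flights data (values are made up for illustration)
toy = pd.DataFrame({
    'Origin': ['JFK', 'JFK', 'JFK', 'LAX'],
    'Year': [1990, 1990, 1991, 1990],
    'Distance': [1000, 2000, 1500, 300],
    'Spare_Capacity_%': [40.0, 20.0, 35.0, 50.0],
})

airport = 'JFK'
cols = ['Distance', 'Spare_Capacity_%']

# Mean per airport and year, then keep only the rows for one airport
toy_mean = toy.groupby(['Origin', 'Year'])[cols].mean().loc[airport]

print(toy_mean)
```

Note that this averages rows with equal weight: a row describing one flight counts as much as a row describing a hundred, so the result is not a passenger-weighted or flight-weighted average.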

In [None]:
#add your code below



We could do as we did previously with the `.sum` data and use the 1990 values as a base. But sometimes it may be useful or necessary to see the absolute values on the chart.

### Further customisation with Bokeh

In [None]:
from bokeh.models import LinearAxis, Range1d, HoverTool

Using the bokeh `Figure` class, we could create a plot which displays the progression of distance and spare capacity over time, with the ability to view the values via the plot. 

The plot will include:
- a secondary y-axis
- an appropriate range and scale for each y-axis
- a tooltip showing the values in an appropriate format
- labels for each y-axis
- a legend and title  

You may find it helpful to investigate the classes imported above from `bokeh.models`.

In [None]:
hover = HoverTool(
        tooltips=[
            ("Distance", "@{Distance}{int} miles"),
            ("Capacity", "@{Spare_Capacity_%}{1.1}%")])

mn = Figure(plot_width=800,plot_height=400,
            title='Distance and spare capacity averages 1990 - 2009',            
            tools=[hover],
            toolbar_location=None,
            y_axis_label='Distance (miles)',
            x_axis_label='Year')

mn.y_range = Range1d(0, 2000)
mn.extra_y_ranges = {"y2_range": Range1d(start=25, end=50)}
mn.add_layout(LinearAxis(y_range_name="y2_range", axis_label="Spare Capacity (%)"), 'right')

# ap_mean is the DataFrame of yearly means for the airport (see Q6)
mn.line("Year", "Distance", color="#3288bd", source=ap_mean, legend_label="Distance (miles)")
# The second line is drawn against the extra y-range defined above
mn.line("Year", "Spare_Capacity_%", color="#99d594", source=ap_mean, legend_label="Spare Capacity (%)", y_range_name="y2_range")

mn.legend.location = "bottom_left"
mn.legend.click_policy = "hide"

show(mn)

Finally, we'll take a look at the use of `layouts` to display multiple plots together. 

The `pd.plotting.plot_grid` method takes a list of lists containing plots and will display each list in turn as a row in a grid.

Using this method, let's create a layout which displays the plot of the summed data over time for the given airport from earlier in the notebook (`fig`), with the plot of the averaged data for distance and capacity that we just produced (`mn`) below it.

Assign this layout to a variable `grid` and then use the `save` method (imported from `bokeh.plotting`) to create a file called `flights.html`, using `inline` for the `resources` parameter and `Flight Data` for the `title` parameter.

In [None]:
from bokeh.plotting import save


grid = pd.plotting.plot_grid([[fig], [mn]])
save(grid, filename='flights.html', resources='inline', title='Flight Data')
