# Aim

Now that we've pre-processed our data and created some standalone Bokeh graphs, we can take these interactive graphs to the next level by adding drop-down menus and hooking the notebook up to a local server.

First, let's get the drop-downs working so that we can switch between colouring our data points by region, urban-rural classification, or Shannon index. We also want drop-downs for our religion dataset, so that we can switch between religions and explore the relationship between the percentage of a religious group in a local authority (LA) and that group's contribution to the sexual orientation (SO) non-response (NR) rate.

## Import libraries

In [1]:
# used to manipulate dataframes
import pandas as pd

# used to create static visualisations
import seaborn as sns
import matplotlib.pyplot as plt

# used to create interactive visualisations
from bokeh.io import show, curdoc, output_notebook
from bokeh.layouts import column, row
from bokeh.models import (
    ColumnDataSource,
    ColorBar,
    BasicTicker,
    PrintfTickFormatter,
    LinearColorMapper,
    Select,
    HTMLTemplateFormatter,
    DataTable,
    TableColumn,
    Div,
)
from bokeh.palettes import Category10
from bokeh.plotting import figure


## Read-in data

We'll now read in the data pre-processed in our previous notebooks (started in [Main_Lang_NR_SO.ipynb](./Main_Lang_NR_SO.ipynb) and finished in [Religion_1_SO.ipynb](./Religion_1_SO.ipynb)).

In [2]:

df = pd.read_csv('../Data/final_lang_so.csv')

# Let's take a quick glance

df.head()

| Unnamed: 0.1 | Unnamed: 0 | LA_name | Observation | Non_Eng_Percentages | NR_rate | region | Urb_Rur | Shannon_idx |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Adur | 1971 | 3.14 | 6.47 | South East | Predominantly Urban | 0.176281 |
| 1 | 1 | Allerdale | 1073 | 1.15 | 6.18 | North West | Predominantly Rural | 0.050053 |
| 2 | 2 | Amber Valley | 1850 | 1.51 | 6.77 | East Midlands | Predominantly Urban | 0.104063 |
| 3 | 3 | Arun | 9469 | 5.89 | 7.09 | South East | Predominantly Urban | 0.146576 |
| 4 | 4 | Ashfield | 3944 | 3.22 | 6.77 | East Midlands | Predominantly Urban | 0.176282 |
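The `Unnamed: 0` and `Unnamed: 0.1` columns above look like leftover index columns. That is an assumption (the earlier notebooks' save steps aren't shown here), but it is exactly what pandas produces when a frame is written with `to_csv`, which writes the index by default, and read back more than once. A minimal sketch of how the columns arise and two ways to avoid them; `round_trip` is a hypothetical helper and the two rows are taken from the output above:

```python
import io

import pandas as pd

# Toy frame built from two rows of the head() output above
df = pd.DataFrame({"LA_name": ["Adur", "Allerdale"], "NR_rate": [6.47, 6.18]})


def round_trip(frame, **to_csv_kwargs):
    """Write `frame` to an in-memory CSV and read it straight back."""
    buf = io.StringIO()
    frame.to_csv(buf, **to_csv_kwargs)
    buf.seek(0)
    return pd.read_csv(buf)


# Default to_csv also writes the index, which comes back as an extra column
once = round_trip(df)
print(list(once.columns))  # ['Unnamed: 0', 'LA_name', 'NR_rate']

# Repeating the default save/reload cycle stacks up a second unnamed column
twice = round_trip(once)
print(sum(c.startswith("Unnamed") for c in twice.columns))  # 2

# Fix 1: don't write the index in the first place
clean = round_trip(df, index=False)
print(list(clean.columns))  # ['LA_name', 'NR_rate']

# Fix 2: if the file already contains the index, read it back as the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(list(restored.columns))  # ['LA_name', 'NR_rate']
```

Either fix keeps the dataframe free of columns that carry no information.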


# Interactive scatterplots

## Non-English + Non-response

This scatterplot shows the relationship between the percentage of non-English speakers and the SO non-response rate for our 331 local authorities in England and Wales.

In terms of interactive elements, we have included:

* tooltips - hovering over a data point reveals information about it
* legends - the `click_policy` property is set to `"hide"`, so when a user clicks on a legend entry (for instance 'East Midlands' for region), the data points in that category disappear from the plot.
* dropdowns - a `Select` widget offers different colourings (default, region, urban-rural classification, and Shannon index), and a Python callback updates the graph whenever the user selects an option.
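To make the tooltip syntax concrete: in each `(label, value)` pair, `@column` stands for that column's value in the hovered row and `$index` for the row's index. The sketch below is a toy imitation of that substitution, not Bokeh's implementation; `render_tooltip` is a hypothetical helper, and it ignores Bokeh features such as format specs (`@col{0.2f}`) or braced names (`@{col name}`). The row values come from the first row of the data shown earlier.

```python
import re


def render_tooltip(tooltips, row, index):
    """Toy imitation of hover-tool substitution: '@col' -> row[col], '$index' -> index."""
    def fill(text):
        text = text.replace("$index", str(index))
        # Replace each @name with the value of that column in this row
        return re.sub(r"@(\w+)", lambda m: str(row[m.group(1)]), text)

    return [(label, fill(value)) for label, value in tooltips]


# Same tooltip spec as the cell below
tool = [
    ("index", "$index"),
    ("(x,y)", "(@Non_Eng_Percentages, @NR_rate)"),
    ("name", "@LA_name"),
]

# First row of the data (Adur)
row = {"LA_name": "Adur", "Non_Eng_Percentages": 3.14, "NR_rate": 6.47}

for label, text in render_tooltip(tool, row, 0):
    print(f"{label}: {text}")
# index: 0
# (x,y): (3.14, 6.47)
# name: Adur
```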

In [3]:
# Prepare data source
source = ColumnDataSource(df)


# Bokeh has a hover tool, allowing you to hover over data points to reveal info
# To configure the tool, we set its tooltips argument: a list of (label, value) tuples,
# where "@column" refers to a column in df and "$index" to the row index

tool = [
    ("index", "$index"),
    ("(x,y)", "(@Non_Eng_Percentages, @NR_rate)"),
    ("name", "@LA_name"),
]

# Plot 0 (Default): set title, x and y labels, and tooltips
p0 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

# Create scatterplot, taking the x and y values from columns of the data source
p0.scatter("Non_Eng_Percentages", "NR_rate", source=source, fill_alpha=0.5, size=10)


# Plot 1 (By Region)
p1 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)


# To colour data points by region, loop over each unique region paired with a colour
# (Category10 has 10 colours; zip stops at the shorter input, so any region
# beyond the tenth would be silently dropped)
for region, color in zip(df.region.unique(), Category10[10]):
    # Subset the dataframe to the rows for this region
    b = df[df.region == region]
    # Plot this region's data points in its own colour, with its own legend entry
    p1.scatter(x='Non_Eng_Percentages', y='NR_rate', size=10, alpha=0.5, color=color,
               legend_label=region, muted_color=color, muted_alpha=0.1, source=ColumnDataSource(b))


# Set location of legend
p1.legend.location = "bottom_right"
# Set click policy to hide:
# when a legend entry is clicked, its data points are removed from the graph
p1.legend.click_policy = "hide"
# Set legend title
p1.legend.title = "Regions"

# Plot 2 (Urban vs Rural)
p2 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

# Same approach as Plot 1, one colour and legend entry per urban-rural class
for urb_rur, color in zip(df.Urb_Rur.unique(), Category10[10]):
    c = df[df.Urb_Rur == urb_rur]
    p2.scatter(x='Non_Eng_Percentages', y='NR_rate', size=10, alpha=0.5, color=color,
               legend_label=urb_rur, muted_color=color, muted_alpha=0.1, source=ColumnDataSource(c))

p2.legend.location = "bottom_right"
p2.legend.click_policy = "hide"
p2.legend.title = "Urban-Rural"



# Plot 3 (Shannon Index)

# Create a colour-mapper object spanning the full range of Shannon index values
# Viridis256 chosen because it's good at representing continuous variables
color_map = LinearColorMapper(palette="Viridis256", low=df.Shannon_idx.min(), high=df.Shannon_idx.max())

p3 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

# Colour each data point by passing its Shannon_idx value through the colour mapper
p3.scatter("Non_Eng_Percentages", "NR_rate", source=source, fill_alpha=0.5, size=10,
           color={'field': 'Shannon_idx', 'transform': color_map})

# Create colour bar and set the color_mapper parameter
color_bar = ColorBar(color_mapper=color_map,
                     title='Shannon Index',
                     ticker=BasicTicker(desired_num_ticks=5),
                     formatter=PrintfTickFormatter(format='%.2f'))

# Add the colour bar to the right of the p3 graph
p3.add_layout(color_bar, 'right')

# Create dropdown selection menu (the initial value must be one of the options)
dropdown = Select(title="Color By:", value="Default", options=["Default", "Region", "Urban", "Shannon Index"])

# Map each dropdown option to the plot it should reveal
plots = {"Default": p0, "Region": p1, "Urban": p2, "Shannon Index": p3}

# Define the update function: show only the plot matching the selected option
def update_scatterplots(attr, old, new):
    for option, plot in plots.items():
        plot.visible = (option == new)

# Set initial visibility (only the default plot is shown)
update_scatterplots("value", None, dropdown.value)

# Add the callback to the dropdown menu
dropdown.on_change('value', update_scatterplots)

# Create a layout with the dropdown menu and the scatterplots
layout = column(dropdown, p0, p1, p2, p3)

# Add the layout to the document
curdoc().add_root(layout)
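The cell above uses two colouring strategies: categorical colours for region and urban-rural class (one palette entry per category via `zip`), and a continuous colour scale for the Shannon index via `LinearColorMapper`. The sketch below mimics both in plain Python. `assign_category_colors` and `linear_color` are hypothetical helpers, and `linear_color` only approximates what a linear colour mapper does (Bokeh's own edge handling, e.g. its `low_color`/`high_color` options, may differ).

```python
def assign_category_colors(categories, palette):
    """Pair each unique category (in first-seen order) with one palette colour."""
    # dict.fromkeys keeps first-seen order, like pandas' Series.unique()
    unique = list(dict.fromkeys(categories))
    # zip() stops at the shorter input, so surplus categories would silently
    # vanish; fail loudly instead
    if len(unique) > len(palette):
        raise ValueError(f"{len(unique)} categories but only {len(palette)} colours")
    return dict(zip(unique, palette))


def linear_color(value, low, high, palette):
    """Pick a palette entry for `value` by linear interpolation between low and high."""
    if high == low:
        return palette[0]
    # Normalise to [0, 1], clamping values outside [low, high] to the ends
    frac = min(max((value - low) / (high - low), 0.0), 1.0)
    # Scale to a palette index; frac == 1.0 would give len(palette), so cap it
    idx = min(int(frac * len(palette)), len(palette) - 1)
    return palette[idx]


regions = ["South East", "North West", "East Midlands", "South East", "East Midlands"]
print(assign_category_colors(regions, ["red", "green", "blue"]))
# {'South East': 'red', 'North West': 'green', 'East Midlands': 'blue'}

grey4 = ["#000000", "#555555", "#aaaaaa", "#ffffff"]
print(linear_color(0.5, 0.0, 1.0, grey4))  # #aaaaaa
```

The explicit length check is the main design difference from the notebook's bare `zip`: with more categories than colours, the notebook's loop would quietly omit the extra categories from the plot.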


## Religion + Non-response

This second scatterplot shows the relationship between the percentage of each religious group in a local authority and that group's contribution to the non-response rate, for our 331 local authorities in England and Wales.

In terms of interactive elements, we have included:

* tooltips - hovering over a data point reveals information about it
* dropdowns - a `Select` widget lists the religious categories (Christian, Buddhist, Jewish, etc.), and a Python callback updates the graph whenever the user selects an option.
* custom formatter - the `create_formatter` function fills our HTML template with the selected religion. When the selection changes, the `update_plot` callback calls `update_highlighted_rows`, which gives both DataTables a new formatter so that the row for the selected religion is shown in bold red.
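The highlighting works by plain string substitution: `create_formatter` replaces the word `selected_religion` in the template with a quoted literal, so the template's `if` compares each row's `Religion_categories` against a fixed string. The sketch below reproduces that substitution and a Python equivalent of the template's branch; `bake_template`, `bake_template_safe` and `render_cell` are hypothetical helpers, and the cell values are toy numbers.

```python
import json

# Same template as in the cell below
TEMPLATE = """
<% if (Religion_categories == selected_religion) { %>
    <span style="color: red; font-weight: bold"><%= value %></span>
<% } else { %>
    <span style="color: black;"><%= value %></span>
<% } %>
"""


def bake_template(selected_religion):
    # Same substitution as create_formatter: wrap the name in single quotes
    return TEMPLATE.replace("selected_religion", f"'{selected_religion}'")


def bake_template_safe(selected_religion):
    # Variant: json.dumps yields a valid JavaScript string literal even if the
    # name contains quotes
    return TEMPLATE.replace("selected_religion", json.dumps(selected_religion))


def render_cell(religion_category, value, selected_religion):
    """Python equivalent of the template's branch for a single cell."""
    if religion_category == selected_religion:
        return f'<span style="color: red; font-weight: bold">{value}</span>'
    return f'<span style="color: black;">{value}</span>'


print(render_cell("Muslim", 42, "Muslim"))
# <span style="color: red; font-weight: bold">42</span>
```

Quoting with `json.dumps` is a design option rather than a fix the notebook needs: none of the current religion names contain a quote character.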

In [6]:
# Read in pre-processed data for religion

rel = pd.read_csv('../Data/religion_so_cleaned.csv')

# We'll also read in the dataframes containing our religion totals and non-response info

totals = pd.read_csv('../Data/rel_totals_so.csv')
nr_totals = pd.read_csv('../Data/rel_nr_totals_so.csv')

In [8]:

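# Prepare data: 'selected_religion' holds the x values (percentage of the chosen
# religion in each LA) and 'selected_percentages' the y values (its non-response rate).
# Both start out as Christian and are swapped by the update_plot callback.

rel['selected_religion'] = rel['Christian_Percentage']
rel['selected_percentages'] = rel['Christian_NR']

source = ColumnDataSource(rel)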

# Define tooltips

tool = [
    ("index", "$index"),
    ("(x,y)", "(@selected_religion{0.2f}, @selected_percentages{0.2f})"),
    ("name", "@LA_name"),
]

# Custom cell formatter created to highlight chosen religion in data table

template = """
<% if (Religion_categories == selected_religion) { %>
    <span style="color: red; font-weight: bold"><%= value %></span>
<% } else { %>
    <span style="color: black;"><%= value %></span>
<% } %>
"""

def create_formatter(selected_religion):
    formatter = HTMLTemplateFormatter(template=template.replace("selected_religion", f"'{selected_religion}'"))
    return formatter

    
# Create select widget

options = ['Christian', 'Muslim', 'Jewish', 'Buddhist', 'Hindu', 'Sikh', 'Other religion', 'No religion']  # Update with all available religious groups
select_religion = Select(title="Religious Group:", value='Christian', options=options)

# Define callback for updating data source
# (update_highlighted_rows is defined further down; that's fine, because the
# name is only looked up when the callback actually runs)

def update_plot(attr, old, new):
    selected_religion = select_religion.value
    rel['selected_religion'] = rel[f'{selected_religion}_Percentage']
    rel['selected_percentages'] = rel[f'{selected_religion}_NR']
    source.data = source.from_df(rel)
    update_highlighted_rows(selected_religion)

    
# Attach callback to the select widget
# Update the plot when the value in the dropdown changes
select_religion.on_change('value', update_plot)


# Create DataTable for layout1

source1 = ColumnDataSource(totals)

columns1 = [
    TableColumn(field="Religion_categories", title="Religion", formatter=create_formatter('Christian')),
    TableColumn(field="Observation", title="Observation", formatter=create_formatter('Christian')),
    TableColumn(field="Percentages", title="Percentages", formatter=create_formatter('Christian')),
]

# Create heading for first DataTable
heading1 = Div(text="<h1>Totals</h1>", width=300)

data_table1 = DataTable(source=source1, columns=columns1, editable=False, width=500, index_position=None)


# Define first layout

layout1 = column(heading1, data_table1)

# Create DataTable for layout2

source2 = ColumnDataSource(nr_totals)

columns2 = [
    TableColumn(field="Religion_categories", title="Religion", formatter=create_formatter('Christian')),
    TableColumn(field="Observation", title="Observation", formatter=create_formatter('Christian')),
    TableColumn(field="NR_rate", title="Non response rate", formatter=create_formatter('Christian')),
    TableColumn(field="Per_Total", title="% of total NR", formatter=create_formatter('Christian')),
]

heading2 = Div(text="<h1>Non-response rates</h1>", width=300)

data_table2 = DataTable(source=source2, columns=columns2, editable=False, width=700, index_position=None)

layout2 = column(heading2, data_table2)


# Create figure
p4 = figure(title="Relationship between % of religious group in given LA, and their non-response rate",
            y_axis_label="Non-response Rate", x_axis_label="Percentage of religious group in given LA", tooltips=tool)

# Scatter plot
p4.scatter("selected_religion", "selected_percentages", source=source, fill_alpha=0.5, size=10)



# Define callback for updating rows with custom cell formatter

def update_highlighted_rows(selected_religion):
    formatter = create_formatter(selected_religion)
    for col in columns1:
        col.formatter = formatter
    for col in columns2:
        col.formatter = formatter
    data_table1.columns = columns1
    data_table2.columns = columns2




# Initial update of the highlighted rows
update_highlighted_rows(select_religion.value)

# Define layout: dropdown above the scatterplot, with the two tables side by side below
layout = column(select_religion, p4)
tables = row(layout1, layout2)

# Add the layout to the document
curdoc().add_root(column(layout, tables))
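The core of `update_plot` is a column swap: the plot always reads the generic `selected_religion` and `selected_percentages` columns, and the callback points them at `{religion}_Percentage` and `{religion}_NR`. The sketch below shows the same idea on a plain dict of lists (the shape of `ColumnDataSource.data`). `select_religion_columns` is a hypothetical helper and the numbers are toy values, not census figures. It also makes one assumption explicit: every dropdown option, including 'Other religion' and 'No religion', needs matching columns in the CSV.

```python
def select_religion_columns(data, religion):
    """Return a copy of `data` with the generic plot columns pointed at `religion`."""
    pct_col = f"{religion}_Percentage"
    nr_col = f"{religion}_NR"
    # Fail clearly if a dropdown option has no matching columns in the data
    missing = [col for col in (pct_col, nr_col) if col not in data]
    if missing:
        raise KeyError(f"missing columns for {religion!r}: {missing}")
    new_data = dict(data)  # shallow copy; the input dict is left untouched
    new_data["selected_religion"] = list(data[pct_col])
    new_data["selected_percentages"] = list(data[nr_col])
    return new_data


# Toy data in ColumnDataSource.data shape: column name -> list of values
data = {
    "LA_name": ["LA_A", "LA_B"],
    "Christian_Percentage": [1.0, 2.0],
    "Christian_NR": [3.0, 4.0],
    "Other religion_Percentage": [5.0, 6.0],
    "Other religion_NR": [7.0, 8.0],
}

updated = select_religion_columns(data, "Other religion")
print(updated["selected_religion"], updated["selected_percentages"])
# [5.0, 6.0] [7.0, 8.0]
```

Building the complete new mapping first and then assigning it in one go (as the notebook does with `source.data = ...`) means the plot receives a single update in which every column has the same length.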


## Bokeh server

You'll notice that, unlike in previous notebooks (e.g. [Main_Lang_NR_SO.ipynb](./Main_Lang_NR_SO.ipynb) and [Religion_1_SO.ipynb](./Religion_1_SO.ipynb)), the cells above contain no `show()` command to render the graphs. Instead, they add their layouts to the current document with `curdoc().add_root(...)`. This is because our dropdown widgets now trigger Python callbacks. Calling `show()` on a layout only produces standalone HTML and JavaScript, with no Python process behind it to run those callbacks, so responding to user interactions with Python code requires a Bokeh server.

## Python callbacks

A Python callback is a function that is passed to another piece of code so that it can be executed later. For instance, in the code above `update_scatterplots(attr, old, new)` serves as the callback: it is designed to run whenever the user selects a value in the dropdown menu. The line `dropdown.on_change('value', update_scatterplots)` registers `update_scatterplots` as a callback that Bokeh calls whenever the dropdown's `value` property changes, passing it the name of the property (`attr`) and its old and new values.
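To make the mechanism concrete, here is a toy, framework-free sketch of callback registration. `ToySelect` is a hypothetical class, not Bokeh's `Select`: it only mimics the `on_change(attr, *callbacks)` registration and the `(attr, old, new)` call signature, and in this toy the callbacks fire only when the value actually changes.

```python
from collections import defaultdict


class ToySelect:
    """Hypothetical stand-in for a dropdown widget that stores callbacks per property."""

    def __init__(self, value):
        self._value = value
        self._callbacks = defaultdict(list)

    def on_change(self, attr, *callbacks):
        # Register one or more callbacks for the named property
        self._callbacks[attr].extend(callbacks)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        if old != new:
            # Call every registered callback with (attr, old, new)
            for callback in self._callbacks["value"]:
                callback("value", old, new)


log = []


def update_scatterplots(attr, old, new):
    log.append((attr, old, new))


dropdown = ToySelect("Default")
dropdown.on_change("value", update_scatterplots)
dropdown.value = "Region"  # simulates a user picking "Region"
print(log)  # [('value', 'Default', 'Region')]
```

The key point carries over to Bokeh: the function is handed over once at registration time, and the widget decides when to call it.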

## Activating the Bokeh server

Now, to the good stuff. To see these callbacks in action and interact with the new visualisations, you'll need to do the following:

1. Make sure you have cloned the GitHub repo and installed the required libraries (serving a notebook directly also requires the `nbconvert` package)
2. Open your terminal (macOS or Linux) or command prompt (Windows)
3. Navigate to the subfolder that holds this file: SO_outputs.ipynb
4. Run `bokeh serve --show SO_outputs.ipynb`

A new browser tab should then open automatically with the visualisations displayed. Go ahead and try the dropdowns for each plot!

## BUT! 

If that sounds intimidating, or you'd rather just see the finished product (built in [main.py](../main.py)), follow this link: [census-visualisations](https://census-visualisations.herokuapp.com/main).

# Conclusion

Now that we've hooked this Jupyter notebook and its new drop-downs up to a local server, we can move on to the final stage. All that's left to do is combine the sexual orientation and gender identity outputs into a single application and get it running on a remote server!

## Confused?

Fair enough. We now have the visualisations from this notebook running on a local server (i.e., only accessible from your own computer). But what if you want to share them with a friend who lives down the road, or on the other side of the world? That's where remote servers come in. We can deploy our Bokeh application to a cloud platform like Heroku, which then runs the Bokeh server for us. Heroku essentially offers us a virtual plot of land on which to host our application so that anyone in the world can access it. Clicking the [census-visualisations](https://census-visualisations.herokuapp.com/main) link takes you to our remote application!


So, if that interests you, please proceed to [main.py](../main.py), the script containing the final, complete application build.