# Aim

Now that we've pre-processed our data and created some standalone Bokeh graphs, we can take these interactive graphs to the next level: adding drop-downs and looking at how to hook up the notebook to a local Bokeh server.

But for now, let's get these drop-downs working so that we can switch between colouring our data points by region, urban-rural classification, or Shannon index. We also want drop-downs for our religion dataset, so that we can switch between religions and explore the relationship between the % of each religious group in an LA and that group's contribution to the GI NR rate.

## Import libraries

In [1]:
# used to manipulate dataframes
import pandas as pd

# used to create visualisations
import seaborn as sns
import matplotlib.pyplot as plt

# used to create interactive visualisations
from bokeh.io import show, curdoc, output_notebook
from bokeh.layouts import column
from bokeh.models import (
    ColumnDataSource,
    ColorBar,
    BasicTicker,
    PrintfTickFormatter,
    LinearColorMapper,
    Select,
)
from bokeh.models.annotations import LabelSet
from bokeh.palettes import Category10
from bokeh.plotting import figure


## Read-in data

We have some pre-processed data from our previous notebooks (started in [Main_Lang_NR_GI.ipynb](./Main_Lang_NR_GI.ipynb), and finished in [Religion_1_GI.ipynb](./Religion_1_GI.ipynb)) that we will read in now.

In [2]:

df = pd.read_csv('../Data/final_lang_gi.csv')

# Let's take a quick glance

df.head()

Unnamed: 0,LA_name,Observation,Non_Eng_Percentages,NR_rate,region,Urb_Rur,Shannon_idx
0,Adur,1971,3.14,4.68,South East,Predominantly Urban,0.205076
1,Allerdale,1073,1.15,4.61,North West,Predominantly Rural,0.099143
2,Amber Valley,1850,1.51,5.44,East Midlands,Predominantly Urban,0.142831
3,Arun,9469,5.89,5.44,South East,Predominantly Urban,0.201664
4,Ashfield,3944,3.22,5.64,East Midlands,Predominantly Urban,0.225514


# Interactive scatterplots

## Non-English + Non-response

This first scatterplot shows the relationship between the % of Non-English speakers and % of GI non-response for our 331 local authorities in England and Wales. 

In terms of interactive elements, we have included:

* tooltips - allow us to hover over data points to reveal information
* legends - the 'click_policy' property is set to "hide" so that when a user clicks on a legend entry (for instance 'East Midlands' in terms of region) the data points in that category disappear from the plot.
* dropdowns - we create a 'dropdown' widget with different options (region, urban-rural classification, and Shannon index) and implement Python callbacks so that when a user selects an option, the graph updates.
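
The dropdown behaviour listed above can be sketched in isolation. This is a minimal sketch with made-up data, assuming one figure per colouring option; the `plots` dictionary and the `show_only` callback are hypothetical names used only for illustration. Only the figure matching the selected option is left visible. Python callbacks like this one only fire when the layout is run by a Bokeh server.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

# Toy data standing in for the real dataframe (values are made up)
toy = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 5, 6]})

# One figure per dropdown option
plots = {}
for name in ["Default", "Region"]:
    fig = figure(title=name)
    fig.scatter("x", "y", source=toy, size=10)
    plots[name] = fig

dropdown = Select(title="Color By:", value="Default", options=list(plots))

def show_only(attr, old, new):
    """Show the figure whose key matches the selected option; hide the rest."""
    for name, fig in plots.items():
        fig.visible = (name == new)

# Apply the initial state, then register the callback
show_only("value", None, dropdown.value)
dropdown.on_change("value", show_only)

layout = column(dropdown, *plots.values())
```

Keeping the option-to-figure mapping in one dictionary means a new colouring option only needs one extra entry, rather than another branch in the callback.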



In [3]:
# Prepare data sources
source = ColumnDataSource(df)


# Bokeh has a hover tool, which reveals information when you hover over a data point
# To configure the tool, we set the tooltips argument

# We define a list of (label, value) tuples; "@name" refers to a column in df

tool = [
    ("index", "$index"),
    ("(x,y)", "(@Non_Eng_Percentages, @NR_rate)"),
    ("name", "@LA_name"),
]

# Create first graph figure, set title and x and y labels

p0 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

# Create scatterplot and x and y values from columns
p0.scatter("Non_Eng_Percentages", "NR_rate", source=source, fill_alpha=0.5, size=10)


# Plot 1 (By Region)
p1 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)


# To colour each data point by region, we loop over each unique region paired with a colour
for region, color in zip(df.region.unique(), Category10[10]):
    # Subset the dataframe to the current region
    b = df[df.region == region]
    # Plot that region's data points in its colour
    # (scatter draws circles by default; circle(size=...) is deprecated in recent Bokeh releases)
    p1.scatter(x='Non_Eng_Percentages', y='NR_rate', size=10, alpha=0.5, color=color,
               legend_label=region, muted_color=color, muted_alpha=0.1, source=ColumnDataSource(b))

    
# Set location of legend
p1.legend.location = "bottom_right"
# Set click policy to hide
# When a legend entry is clicked, its data points are hidden from the graph
p1.legend.click_policy = "hide"
# Set legend title
p1.legend.title = "Regions"

# Plot 2 (Urban vs Rural)
p2 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

# Same approach as Plot 1, this time grouping by urban-rural classification
for urb_rur, color in zip(df.Urb_Rur.unique(), Category10[10]):
    c = df[df.Urb_Rur == urb_rur]
    p2.scatter(x='Non_Eng_Percentages', y='NR_rate', size=10, alpha=0.5, color=color,
               legend_label=urb_rur, muted_color=color, muted_alpha=0.1, source=ColumnDataSource(c))

p2.legend.location = "bottom_right"
p2.legend.click_policy = "hide"
p2.legend.title = "Urban-Rural"



# Plot 3 (Shannon Index)

# Created color map object in Bokeh
# Viridis256 chosen because it's good at representing continuous variables
color_map = LinearColorMapper(palette="Viridis256", low=df.Shannon_idx.min(), high=df.Shannon_idx.max())

p3 = figure(title="Relationship between Non-response Rate and Non-English Speakers",
            x_axis_label="Percentage of Non-English Speakers",
            y_axis_label="Non-response Rate",
            tooltips=tool)

p3.scatter("Non_Eng_Percentages", "NR_rate", source=source, fill_alpha=0.5, size=10,
           color={'field': 'Shannon_idx', 'transform': color_map})

# Create colour bar and set the color_mapper parameter 
color_bar = ColorBar(color_mapper=color_map,
                     title='Shannon Index',
                     ticker=BasicTicker(desired_num_ticks=5),
                     formatter=PrintfTickFormatter(format='%.2f'))

# Add the colour bar to the right of the p3 graph
p3.add_layout(color_bar, 'right')

# Create dropdown selection menu (the initial value must be one of the options)
dropdown = Select(title="Color By:", value="Default", options=["Default", "Region", "Urban", "Shannon Index"])

# Map each dropdown option to the figure it should display
plots = {"Default": p0, "Region": p1, "Urban": p2, "Shannon Index": p3}

# Define the update function
def update_scatterplots(attr, old, new):
    # Show only the figure that matches the selected option
    for option, plot in plots.items():
        plot.visible = (option == new)

# Set initial visibility (only the default plot is shown)
update_scatterplots("value", None, "Default")

# Add the callback to the dropdown menu
dropdown.on_change('value', update_scatterplots)

# Create a layout with the dropdown menu and the scatterplots
layout = column(dropdown, p0, p1, p2, p3)

# Add the layout to the document
curdoc().add_root(layout)
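
`curdoc().add_root(layout)` takes effect when the code is run by a Bokeh server (for example with `bokeh serve`). Executed as an ordinary notebook cell it renders nothing, and the Python callback does not fire. One way to get a live server inside the notebook is to wrap the layout in a function that receives a document and pass that function to `show`. The following is a minimal sketch with toy data, not the notebook's actual app: `make_app` and `resize` are hypothetical names, and the `notebook_url` value is an assumption that must match the address shown in your browser.

```python
from bokeh.io import output_notebook, show
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

def make_app(doc):
    """Build the layout and attach it to the document supplied by the server."""
    source = ColumnDataSource(data={"x": [1, 2, 3], "y": [3, 1, 2]})  # toy values
    fig = figure(title="Toy plot")
    renderer = fig.scatter("x", "y", source=source, size=10)

    select = Select(title="Marker size:", value="10", options=["5", "10", "20"])

    def resize(attr, old, new):
        # Runs in Python on the server each time the dropdown value changes
        renderer.glyph.size = int(new)

    select.on_change("value", resize)
    doc.add_root(column(select, fig))

# In a notebook cell (assumed address; adjust to your own Jupyter URL):
# output_notebook()
# show(make_app, notebook_url="localhost:8888")
```

The same function body can be reused in a standalone script by calling `make_app(curdoc())` and launching it with `bokeh serve`.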


## Religion + Non-response 

This second scatterplot shows the relationship between the % of each religious group and its contribution to the non-response rate for our 331 local authorities in England and Wales.

In terms of interactive elements, we have included:

* tooltips - allow us to hover over data points to reveal information
* dropdowns - we create a 'dropdown' widget for the different religious categories (Christian, Buddhist, Jewish, etc.) and implement Python callbacks so that when a user selects an option, the graph updates.
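
The column-swapping idea behind this dropdown can be sketched on its own. The snippet below is a minimal sketch with made-up numbers: a plain dictionary stands in for the religion dataframe, the plot would always read from two fixed columns (`x` and `y`, names chosen here for illustration), and the hypothetical callback `switch_group` copies the chosen group's `_Percentage` and `_NR` columns into them.

```python
from bokeh.models import ColumnDataSource, Select

# Made-up values; keys follow the <group>_Percentage / <group>_NR naming pattern
toy = {
    "LA_name": ["A", "B"],
    "Christian_Percentage": [50.0, 60.0],
    "Christian_NR": [1.5, 1.7],
    "Muslim_Percentage": [2.0, 3.0],
    "Muslim_NR": [0.05, 0.02],
}

def columns_for(group):
    """Return the x/y columns for one group."""
    return {
        "x": list(toy[f"{group}_Percentage"]),
        "y": list(toy[f"{group}_NR"]),
    }

# The source starts out showing the default group
source = ColumnDataSource(data={"LA_name": toy["LA_name"], **columns_for("Christian")})

def switch_group(attr, old, new):
    # Replace x and y together so every column keeps the same length
    source.data.update(columns_for(new))

select = Select(title="Religious Group:", value="Christian", options=["Christian", "Muslim"])
select.on_change("value", switch_group)
```

Because the option text is used to build the column names, each option has to match a column prefix exactly.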

In [4]:
# Read-in pre-processed data for religion

rel = pd.read_csv('../Data/religion_gi_cleaned.csv')

In [5]:
rel

Unnamed: 0,LA_name,Total_Observation,No religion_Percentage,No religion_Observation,Christian_Percentage,Christian_Observation,Buddhist_Percentage,Buddhist_Observation,Hindu_Percentage,Hindu_Observation,...,Other religion_Percentage,Other religion_Observation,Buddhist_NR,Christian_NR,Hindu_NR,Jewish_NR,Muslim_NR,No religion_NR,Other religion_NR,Sikh_NR
0,Adur,49937,47.51,23725,49.18,24557,0.51,256,0.34,170,...,0.73,366,0.02,1.46,0.00,0.00,0.06,1.34,0.03,0.00
1,Allerdale,75913,33.10,25128,65.88,50010,0.27,205,0.08,57,...,0.38,285,0.01,1.72,0.00,0.00,0.02,0.97,0.03,0.01
2,Amber Valley,99178,46.62,46233,51.74,51317,0.26,261,0.18,178,...,0.69,685,0.02,1.92,0.00,0.00,0.01,1.39,0.05,0.01
3,Arun,131269,39.56,51925,58.33,76574,0.36,469,0.23,303,...,0.63,822,0.03,2.06,0.02,0.01,0.04,1.22,0.05,0.00
4,Ashfield,96860,49.49,47936,48.56,47039,0.24,236,0.32,312,...,0.58,557,0.03,1.87,0.02,0.00,0.05,1.67,0.04,0.02
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
326,Wrexham,103236,41.62,42968,56.16,57980,0.35,361,0.26,264,...,0.46,470,0.02,2.64,0.01,0.01,0.10,1.88,0.04,0.01
327,Wychavon,104729,34.63,36263,63.55,66551,0.30,309,0.15,157,...,0.49,517,0.02,2.01,0.00,0.00,0.04,0.98,0.04,0.00
328,Wyre,89649,32.76,29371,65.81,58994,0.31,280,0.14,128,...,0.50,446,0.02,1.82,0.00,0.00,0.03,0.98,0.03,0.00
329,Wyre Forest,79812,37.87,30225,60.03,47913,0.27,214,0.14,112,...,0.55,441,0.02,2.29,0.00,0.00,0.06,1.33,0.03,0.02


In [6]:
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select, HoverTool
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

# Prepare data
rel['selected_religion'] = rel['Christian_Percentage']  # Default religion
rel['selected_percentages'] = rel['Christian_NR']

source = ColumnDataSource(rel)

# Define tooltips
tool = [
    ("index", "$index"),
    ("(x,y)", "(@selected_religion{0.2f}, @selected_percentages{0.2f})"),
    ("name", "@LA_name"),
]

# Create figure (the hover tool is added explicitly below, so no tooltips argument here;
# passing tooltips to figure() as well would create a second, duplicate hover tool)
p_2 = figure(title="Relationship between % of religious group in given LA, and their non-response rate",
             x_axis_label="Percentage of religious group in given LA",
             y_axis_label="Non-response Rate")


# Scatter plot
p_2.scatter("selected_religion", "selected_percentages", source=source, fill_alpha=0.5, size=10)

# Add a single hover tool; 'mouse' mode shows the tooltip for the point under the cursor
hover_tool = HoverTool(tooltips=tool, mode='mouse')
p_2.add_tools(hover_tool)

# Define callback for updating data source
def update_plot(attr, old, new):
    selected_religion = select_religion.value
    rel['selected_religion'] = rel[f'{selected_religion}_Percentage']
    rel['selected_percentages'] = rel[f'{selected_religion}_NR']
    source.data = source.from_df(rel)

# Create select widget
# Each option must match a column prefix in rel (e.g. 'Other religion_Percentage', 'Other religion_NR')
options = ['Christian', 'Muslim', 'Jewish', 'Buddhist', 'Hindu', 'Sikh', 'Other religion']
select_religion = Select(title="Religious Group:", value='Christian', options=options)
select_religion.on_change('value', update_plot)

# Layout
layout = column(select_religion, p_2)

# Display output

curdoc().add_root(layout)