# Activity 6.02: Extending Plots with Widgets

In this activity, you will combine what you have already learned about Bokeh with the pandas skills you acquired earlier for DataFrame handling. We will create an interactive visualization that lets us explore the results of the 2016 Rio Olympics.

Our dataset contains the following columns:

- id: Unique ID of the athlete
- name: Name of the athlete
- nationality: Nationality of the athlete
- sex: Male or female
- dob: Date of birth of the athlete
- height: Height of the athlete in meters
- weight: Weight of the athlete in kilograms
- sport: Sport the athlete competes in
- gold: Number of gold medals the athlete won
- silver: Number of silver medals the athlete won
- bronze: Number of bronze medals the athlete won

We want to use the nationality, gold, silver, and bronze columns to create a custom visualization that lets us dig through the Olympians.

Our visualization will display each participating country as a point in a coordinate system. The x-axis represents the number of medals won and the y-axis represents the number of athletes. Using interactive widgets, we will be able to filter the displayed countries by a maximum number of medals and a maximum number of athletes.
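The following minimal pure-Python sketch shows how athletes become points. Every athlete row adds one to its country's athlete count (y) and adds its gold, silver, and bronze medals to the country's medal total (x). The rows, the country codes, and the `country_points` helper are hypothetical and only illustrate the idea. The cells that follow compute the same quantities with pandas `groupby`.

```python
from collections import defaultdict

# Hypothetical rows of (nationality, gold, silver, bronze); not taken from the dataset
rows = [
    ("AAA", 1, 0, 0),
    ("AAA", 0, 1, 1),
    ("AAA", 0, 0, 0),
    ("BBB", 1, 0, 0),
]

def country_points(rows):
    """Map each country to (total medals, number of athletes), i.e. one (x, y) point."""
    medals = defaultdict(int)
    athletes = defaultdict(int)
    for nationality, gold, silver, bronze in rows:
        medals[nationality] += gold + silver + bronze  # x: all medals of the country's athletes
        athletes[nationality] += 1                     # y: one athlete per row
    return {country: (medals[country], athletes[country]) for country in athletes}

points = country_points(rows)
print(points)  # {'AAA': (3, 3), 'BBB': (1, 1)}
```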

There are many interactive widgets to choose from. We will focus on only two of them to keep the concepts easy to follow. In the end, we will have a visualization that filters countries by the number of medals they won and the number of athletes they sent to the Olympics. When we hover over a data point, it also shows more information about that country.

In [1]:
import pandas as pd 
import bokeh.plotting as plt 
from bokeh.models.sources import ColumnDataSource
import bokeh.io as io 
io.output_notebook()
import ipywidgets
import random

In [2]:
df = pd.read_csv('../../Datasets/olympia2016_athletes.csv')
df

| | id | name | nationality | sex | dob | height | weight | sport | gold | silver | bronze |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 736041664 | A Jesus Garcia | ESP | male | 10/17/69 | 1.72 | 64.0 | athletics | 0 | 0 | 0 |
| 1 | 532037425 | A Lam Shin | KOR | female | 9/23/86 | 1.68 | 56.0 | fencing | 0 | 0 | 0 |
| 2 | 435962603 | Aaron Brown | CAN | male | 5/27/92 | 1.98 | 79.0 | athletics | 0 | 0 | 1 |
| 3 | 521041435 | Aaron Cook | MDA | male | 1/2/91 | 1.83 | 80.0 | taekwondo | 0 | 0 | 0 |
| 4 | 33922579 | Aaron Gate | NZL | male | 11/26/90 | 1.81 | 71.0 | cycling | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11533 | 265605954 | Zurian Hechavarria | CUB | female | 8/10/95 | 1.64 | 58.0 | athletics | 0 | 0 | 0 |
| 11534 | 214461847 | Zuzana Hejnova | CZE | female | 12/19/86 | 1.73 | 63.0 | athletics | 0 | 0 | 0 |
| 11535 | 88361042 | di Xiao | CHN | male | 5/14/91 | 1.85 | 100.0 | wrestling | 0 | 0 | 0 |
| 11536 | 900065925 | le Quoc Toan Tran | VIE | male | 4/5/89 | 1.60 | 56.0 | weightlifting | 0 | 0 | 0 |


In [3]:
# list of countries
countries = df['nationality'].unique()
countries

array(['ESP', 'KOR', 'CAN', 'MDA', 'NZL', 'AUS', 'USA', 'ETH', 'BRN',
       'IOA', 'GBR', 'UZB', 'RSA', 'EGY', 'MAR', 'QAT', 'SUD', 'ALG',
       'DEN', 'NED', 'DJI', 'SEN', 'CMR', 'ITA', 'NIG', 'SWE', 'GHA',
       'AFG', 'AZE', 'KSA', 'BAN', 'NGR', 'RUS', 'IND', 'HUN', 'KAZ',
       'BDI', 'ERI', 'POL', 'BRA', 'GEO', 'CZE', 'SEY', 'GAM', 'LTU',
       'IRI', 'ROU', 'CUB', 'SLO', 'BAH', 'ARG', 'PUR', 'FRA', 'RWA',
       'TOG', 'MDV', 'TUN', 'ISR', 'LAT', 'JOR', 'MAS', 'LIB', 'LBA',
       'PLE', 'IRQ', 'TUR', 'VEN', 'JPN', 'TPE', 'KGZ', 'CHN', 'MEX',
       'GRE', 'IRL', 'JAM', 'SUI', 'BAR', 'HON', 'ANG', 'GER', 'COL',
       'URU', 'MNE', 'SRB', 'BUL', 'FIN', 'UKR', 'BLR', 'SMR', 'COK',
       'SAM', 'AUT', 'BEL', 'KEN', 'SVK', 'POR', 'ECU', 'UAE', 'NAM',
       'GUY', 'EST', 'SKN', 'ARU', 'PAN', 'PER', 'TAN', 'FIJ', 'GUI',
       'NOR', 'ARM', 'THA', 'SIN', 'TKM', 'CRO', 'BIH', 'TGA', 'MAW',
       'DOM', 'GUA', 'MKD', 'TJK', 'CYP', 'CHI', 'MLT', 'ZIM', 'TTO',
       'CRC', 'BOL',
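The printed array is long, so counting the countries can be easier than reading them. This sketch runs on a hypothetical miniature table. `unique()` returns the codes in order of first appearance. `nunique()` returns only how many distinct codes there are, and by default it ignores missing values.

```python
import pandas as pd

# Hypothetical miniature version of the athletes table
toy = pd.DataFrame({"nationality": ["ESP", "KOR", "ESP", "CAN", "KOR"]})

# unique() keeps first-appearance order; nunique() only counts the distinct values
print(toy["nationality"].unique().tolist())  # ['ESP', 'KOR', 'CAN']
print(toy["nationality"].nunique())          # 3
```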

In [4]:
athletes_per_country = df.groupby(by=['nationality']).size()
athletes_per_country

nationality
AFG     3
ALB     6
ALG    68
AND     5
ANG    26
       ..
VIE    23
VIN     4
YEM     3
ZAM     7
ZIM    35
Length: 207, dtype: int64
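You can also get the same per-country counts with `value_counts()`. The two methods differ in ordering. `groupby(...).size()` is sorted by the group key (the country code). `value_counts()` puts the largest counts first, which is handy for a quick look at the biggest teams. Here is a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical miniature version of the athletes table
toy = pd.DataFrame({"nationality": ["ESP", "KOR", "ESP", "CAN", "ESP"]})

# groupby(...).size() is sorted by the group key (country code) ...
by_group = toy.groupby("nationality").size()
# ... while value_counts() is sorted by count, largest first
by_count = toy["nationality"].value_counts()

print(list(by_group.index))  # ['CAN', 'ESP', 'KOR']
print(by_count.idxmax())     # ESP
```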

In [5]:
# select the medal columns with a list; a tuple of keys is deprecated and fails in pandas 2.0
medals_per_country = df.groupby(by=['nationality'])[['gold', 'silver', 'bronze']].sum()
medals_per_country


| nationality | gold | silver | bronze |
|---|---|---|---|
| AFG | 0 | 0 | 0 |
| ALB | 0 | 0 | 0 |
| ALG | 0 | 2 | 0 |
| AND | 0 | 0 | 0 |
| ANG | 0 | 0 | 0 |
| ... | ... | ... | ... |
| VIE | 1 | 1 | 0 |
| VIN | 0 | 0 | 0 |
| YEM | 0 | 0 | 0 |
| ZAM | 0 | 0 | 0 |
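If you prefer a single aggregation step, pandas named aggregation can compute the athlete count and the medal sums in one `groupby`. This is an alternative to the two separate cells above, shown on hypothetical data with made-up country codes. The output column names `athletes` and `medals` are our own choice.

```python
import pandas as pd

# Hypothetical rows in the same shape as the athletes table
toy = pd.DataFrame({
    "nationality": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "gold":   [0, 0, 1, 0, 0],
    "silver": [1, 1, 0, 1, 0],
    "bronze": [0, 0, 0, 0, 0],
})

# One groupby that produces both the athlete count (rows per group) and the medal sums
summary = toy.groupby("nationality").agg(
    athletes=("gold", "size"),
    gold=("gold", "sum"),
    silver=("silver", "sum"),
    bronze=("bronze", "sum"),
)
# Total medals per country, i.e. the x value of the plot
summary["medals"] = summary[["gold", "silver", "bronze"]].sum(axis=1)
print(summary)
```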


In [6]:
medals_per_country.loc['ALG', 'gold']

0
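`.loc` accepts either a `(row, column)` pair or just a row label. A pair returns one scalar, as in the cell above. A row label returns that country's medals as a Series. `.at` is the dedicated accessor for a single scalar. Here is a sketch on a hypothetical two-country table:

```python
import pandas as pd

# Hypothetical medal table indexed by country code, like medals_per_country
medals = pd.DataFrame(
    {"gold": [0, 1], "silver": [2, 1], "bronze": [0, 0]},
    index=pd.Index(["AAA", "BBB"], name="nationality"),
)

# .loc with a (row, column) pair and .at both return a single scalar
print(medals.loc["AAA", "silver"])  # 2
print(medals.at["AAA", "silver"])   # 2

# .loc with only a row label returns that country's medals as a Series
print(medals.loc["BBB"].sum())      # 2
```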

In [7]:
# get a random 6-digit hex color string such as '#1a2b3c' to tell the countries apart
# (the leading '#' is required for a valid hex color)
def get_random_color():
    return f'#{random.randint(0, 0xFFFFFF):06x}'
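The plot is redrawn whenever a slider moves, and each redraw requests fresh random colors. As a result, a country gets a new color every time. To keep colors stable, one option is to derive the color from the country code itself. The `get_country_color` helper below is a hypothetical sketch, not part of the original activity. It uses SHA-256 from `hashlib` rather than Python's built-in `hash()`, because string hashes are randomized per interpreter session. Like any hex color passed to Bokeh, the result starts with `#`.

```python
import hashlib

def get_country_color(country):
    """Hypothetical helper: derive a stable '#rrggbb' color from a country code.

    The same code always maps to the same color, so a country keeps its
    color when the sliders redraw the plot.
    """
    digest = hashlib.sha256(country.encode("utf-8")).hexdigest()
    return "#" + digest[:6]  # first 6 hex digits of the digest

print(get_country_color("ESP") == get_country_color("ESP"))  # True
```

To try it, you could build the color column as `[get_country_color(country) for country in filtered_countries]` instead of calling `get_random_color()`.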

In [8]:
def get_datasource(filtered_countries):
    # medal rows (gold, silver, bronze) of the selected countries
    medals = medals_per_country.loc[filtered_countries]
    return ColumnDataSource(data=dict(
        color=[get_random_color() for _ in filtered_countries],
        countries=list(filtered_countries),
        gold=medals['gold'].tolist(),
        silver=medals['silver'].tolist(),
        bronze=medals['bronze'].tolist(),
        # x: total medals per country, y: number of athletes per country
        x=medals.sum(axis=1).tolist(),
        y=athletes_per_country.loc[filtered_countries].tolist()
    ))
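Bokeh expects all columns of a `ColumnDataSource` to have the same length, with one entry per glyph. `get_datasource` builds each column separately, so a quick length check can help if you change it. The `check_column_lengths` helper below is hypothetical plain Python, and the sample columns are made up.

```python
def check_column_lengths(data):
    """Hypothetical helper: return the common column length, or raise if the columns differ."""
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # an empty dict has no columns, so report length 0
    return next(iter(lengths.values()), 0)

# Made-up columns in the same layout as the dict passed to ColumnDataSource
data = dict(
    countries=["AAA", "BBB"],
    x=[2, 2],
    y=[2, 3],
    color=["#1f77b4", "#ff7f0e"],
)
print(check_column_lengths(data))  # 2
```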

In [9]:
def get_plot(max_athletes, max_medals):
    # keep only countries with at most max_athletes athletes and at most max_medals medals in total
    countries_by_athletes = list(athletes_per_country[athletes_per_country <= max_athletes].index)
    countries_by_medals = list(medals_per_country[medals_per_country.sum(axis=1) <= max_medals].index)
    filtered_countries = df[df['nationality'].isin(countries_by_athletes) & df['nationality'].isin(countries_by_medals)]['nationality'].unique()

    # create the DataSource for the remaining countries
    data_source = get_datasource(filtered_countries)

    TOOLTIPS = [
        ('Country', '@countries'),
        ('Num of Athletes', '@y'),
        ('Gold', '@gold'),
        ('Silver', '@silver'),
        ('Bronze', '@bronze')
    ]

    plot = plt.figure(
        title='Rio Olympics 2016 - Medal comparison',
        x_axis_label='Number of Medals',
        y_axis_label='Number of Athletes',
        # width/height replace plot_width/plot_height, which were removed in Bokeh 3.0
        width=800,
        height=500,
        tooltips=TOOLTIPS
    )

    # one semi-transparent circle per country, colored by the 'color' column
    plot.circle('x', 'y', source=data_source, size=20, color='color', alpha=0.5)

    return plot
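The filtering in `get_plot` goes back through `df` to find the matching countries. Since `athletes_per_country` and `medals_per_country` are both indexed by nationality, you can also combine the two conditions directly as aligned boolean masks. This sketch of that alternative uses hypothetical data and a hypothetical `filter_countries` helper. The result is ordered by the Series index (country code in the notebook) rather than by first appearance in `df`, which does not matter for a scatter plot.

```python
import pandas as pd

# Hypothetical per-country aggregates, indexed by country code like the notebook's objects
athletes = pd.Series({"AAA": 3, "BBB": 120, "CCC": 40})
medals = pd.DataFrame(
    {"gold": [0, 30, 1], "silver": [0, 20, 2], "bronze": [1, 10, 0]},
    index=["AAA", "BBB", "CCC"],
)

def filter_countries(athletes, medals, max_athletes, max_medals):
    """Return the codes of countries with at most max_athletes athletes and max_medals medals."""
    # both objects share the same country index, so the masks are combined label by label
    keep = (athletes <= max_athletes) & (medals.sum(axis=1) <= max_medals)
    return keep[keep].index.tolist()

print(filter_countries(athletes, medals, max_athletes=50, max_medals=5))  # ['AAA', 'CCC']
```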

In [10]:
imax_medals = medals_per_country.sum(axis=1).max()
imax_athletes = athletes_per_country.max()

In [11]:
# configure widget elements: one slider per axis, both starting at the maximum (no filtering)
slider_athletes = ipywidgets.IntSlider(
    value=imax_athletes,
    min=0,
    max=imax_athletes,
    step=1,
    description='Max. Athletes:',
    continuous_update=False,  # redraw only when the slider is released
    orientation='vertical',
    layout={'width': '100px'}
)
slider_medals = ipywidgets.IntSlider(
    value=imax_medals,
    min=0,
    max=imax_medals,
    step=1,
    description='Max. Medals:',
    continuous_update=False,
    orientation='horizontal',
)


# interact calls update_plot with the current slider values whenever one of them changes
@ipywidgets.interact(max_athletes=slider_athletes, max_medals=slider_medals)
def update_plot(max_athletes, max_medals):
    plt.show(get_plot(max_athletes, max_medals))