# Data visualization on the web

In this tutorial we learn how to create web-ready interactive data visualizations from Jupyter with the **Bokeh** library.

## Case study: display the distribution of artistic periods in the '90s

We want to display how frequently each artistic period is addressed in the collections over time, comparing the number of occurrences of every period across the years. To visualize the distribution and compare its variation across years we use a **stacked bar chart**. The x axis holds the artistic periods. Each bar stacks the counts of that period for every year of the '90s, so the height of each segment is the count for one year. We show absolute counts, i.e. the actual number of occurrences, not their percentage.
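
The following sketch uses invented counts for three periods in two years to show the difference between counts and percentages. The chart stacks the absolute counts. A percentage view would instead divide every count by its year's total.

```python
# Hypothetical counts of three periods in two years (toy data, not from the CSV)
counts = {
    "1990": [6, 3, 1],
    "1991": [2, 2, 0],
}

# Absolute counts: total number of occurrences per year
totals = {year: sum(values) for year, values in counts.items()}

# Percentages: each count divided by its year's total, so each year's values sum to 100
percentages = {
    year: [round(100 * v / totals[year], 1) for v in values]
    for year, values in counts.items()
}

print(totals)       # {'1990': 10, '1991': 4}
print(percentages)  # {'1990': [60.0, 30.0, 10.0], '1991': [50.0, 50.0, 0.0]}
```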

To do so, we reuse the csv table `periods_dates.csv` created in the previous tutorials. Each row of this table is a unique combination of `"collection","period_label","year"`.
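
The sketch below shows the expected layout. It assumes the header row literally contains `collection`, `period_label` and `year`; the sample rows are invented. Reading the file with `csv.DictReader` lets you access the columns by name rather than by position (`row[1]`, `row[2]`).

```python
import csv
import io

# Hypothetical sample rows in the same layout as periods_dates.csv
# (header: collection, period_label, year); the values are invented.
sample_csv = """collection,period_label,year
coll_A,Renaissance,1990
coll_A,Baroque,1990
coll_B,Renaissance,1991
"""

# DictReader uses the header row as keys, so columns are accessed by name
reader = csv.DictReader(io.StringIO(sample_csv))
records = [(row["period_label"], row["year"]) for row in reader]
print(records)
# [('Renaissance', '1990'), ('Baroque', '1990'), ('Renaissance', '1991')]
```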

To visualize this information in a stacked bar chart we need a new matrix. Its columns are the unique artistic periods, and each row holds the counts of those periods in a given year. According to the [Bokeh documentation](https://docs.bokeh.org/en/latest/docs/gallery/bar_stacked.html), the chart needs a structure like the following one. One key lists the categories of the x axis, and every other key holds the values of one stacked layer (here, one year):

```python
my_fake_final_data = {"my_periods": ["period_1", "period_2", "period_n"],
                      "1950": [6, 15, 3],
                      "1951": [4, 9, 1],
                      "2020": [3, 0, 1]}
```
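
The sketch below connects the matrix and the dictionary by rebuilding the example above from a matrix with one row per year and one column per period. It reuses the same toy values, and a plain list of lists stands in for the NumPy array used later. Each matrix row becomes the list stored under its year.

```python
# Toy values from the example above: one row per year, one column per period
periods = ["period_1", "period_2", "period_n"]
years = ["1950", "1951", "2020"]
counts = [
    [6, 15, 3],  # 1950
    [4, 9, 1],   # 1951
    [3, 0, 1],   # 2020
]

# One key for the x-axis categories, then one key per stacked layer (year)
data = {"my_periods": periods}
for row_index, year in enumerate(years):
    data[year] = list(counts[row_index])

print(data)
# {'my_periods': ['period_1', 'period_2', 'period_n'], '1950': [6, 15, 3], '1951': [4, 9, 1], '2020': [3, 0, 1]}
```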

We build this dictionary from the rows of the csv in three steps. First we collect the unique periods and the unique years of the '90s. Then we fill a matrix of counts with one row per year and one column per period. Finally we turn each row of the matrix into the list stored under its year.

```python
import csv
import numpy as np

data = {}  # final data for Bokeh

# read the (period_label, year) pair of every row once
with open('periods_dates.csv', newline='') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    next(rows)  # skip the header
    records = [(row[1], row[2]) for row in rows]

data_periods = []  # unique periods, in order of first appearance
years = []         # unique years of the '90s
for period, year in records:
    if period not in data_periods:
        data_periods.append(period)
    if year not in years and 1990 <= int(year) <= 1999:
        years.append(year)
years.sort()

# count matrix: one row per year, one column per period
periods_years_matrix = np.zeros((len(years), len(data_periods)))
for period, year in records:
    if year in years:
        periods_years_matrix[years.index(year), data_periods.index(period)] += 1

# put everything together: one list of counts per year
data["periods"] = data_periods
for j, year in enumerate(years):
    data[year] = periods_years_matrix[j].tolist()  # plain Python floats
print(data)
```

```
{'periods': ['15th century', 'Middle Ages', 'Byzantine art', '19th century', 'neoclassicism', '14th century', 'Renaissance', 'Baroque', 'early Christian art and architecture', '18th century', 'modern art', '16th century', 'Romanticism', 'Cold War', 'Mannerism', '17th century'], '1990': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 9.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1991': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 9.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1992': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 9.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1993': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 8.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1994': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 8.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1995': [6.0, 3.0, 2.0, 1.0, 2.0, 1.0, 8.0, 5.0, 2.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1996': [5.0, 2.0, 1.0, 1.0, 2.0, 1.0, 7.0, 5.0, 1.0, 3.0, 1.0, 6.0, 1.0, 0.0, 2.0, 6.0], '1997': [5.0, 2.0, 1.0, 1.0, 2.0, 1.0, 6.0, 4.0, 1.0, 2.0, 0.0, 6.0, 1.0, 0.0, 2.0, 5.0], ...
```
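
The matrix is filled by counting matching rows. The same dictionary can also be built in a single pass with `collections.Counter`, keyed by `(year, period)` pairs. The sketch below runs on an invented in-memory sample; to use the real data, swap `io.StringIO(sample_csv)` for the opened `periods_dates.csv` file. Unlike the NumPy version, it yields integer counts. As in the tutorial's code, periods are collected from all rows, so a period absent from the '90s (here `Cold War`) appears with zeros.

```python
import csv
import io
from collections import Counter

# Invented sample in the layout of periods_dates.csv
sample_csv = """collection,period_label,year
coll_A,Renaissance,1990
coll_B,Renaissance,1990
coll_A,Baroque,1991
coll_C,Cold War,1985
"""

reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header
records = [(row[1], row[2]) for row in reader]  # (period_label, year)

# Unique periods in order of first appearance, and sorted years of the '90s
periods = list(dict.fromkeys(period for period, _ in records))
years = sorted({year for _, year in records if 1990 <= int(year) <= 1999})

# One pass: count every (year, period) pair
pair_counts = Counter((year, period) for period, year in records)

data = {"periods": periods}
for year in years:
    # a Counter returns 0 for pairs that never occur
    data[year] = [pair_counts[(year, period)] for period in periods]

print(data)
# {'periods': ['Renaissance', 'Baroque', 'Cold War'], '1990': [2, 0, 0], '1991': [0, 1, 0]}
```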

```python
from bokeh.io import output_file, show
from bokeh.plotting import figure
import random

output_file("periods_by_year.html")  # the chart is saved to this standalone HTML file

# pick one random color per year (one per stacked layer)
number_of_colors = len(years)
color = ["#" + ''.join(random.choice('0123456789ABCDEF') for _ in range(6))
         for _ in range(number_of_colors)]

# x axis: the artistic periods; the hover tool shows the year ($name) and its count
# (`height` replaces `plot_height`, which was removed in Bokeh 3.0)
p = figure(x_range=data_periods, height=650, title="Periods counting by year",
           toolbar_location=None, tools="hover", tooltips="$name: @$name")

# one stacked layer per year, read from the keys of `data`
p.vbar_stack(years, x='periods', width=0.8, color=color, source=data,
             legend_label=years)

p.y_range.start = 0
p.x_range.range_padding = 0.2
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "vertical"
p.xaxis.major_label_orientation = "vertical"
show(p)
```
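
Random hex colors can come out nearly identical or hard to tell apart. One alternative is to space the hues evenly with Python's standard `colorsys` module. The helper below, `distinct_colors`, is hypothetical and not part of Bokeh. Its result could be passed as `color=distinct_colors(len(years))`. Bokeh also ships ready-made palettes in `bokeh.palettes`, which may be preferable.

```python
import colorsys

def distinct_colors(n, saturation=0.65, value=0.85):
    """Hypothetical helper: return n hex colors with evenly spaced hues."""
    colors = []
    for k in range(n):
        # hue goes from 0 up to (n-1)/n, so no two colors share a hue
        r, g, b = colorsys.hsv_to_rgb(k / n, saturation, value)
        colors.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

print(distinct_colors(10))
```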

As a result, `show(p)` writes the chart to `periods_by_year.html` and opens it in the browser. This is a standalone page containing the HTML and JavaScript needed to render the chart. It lets you use charts that are only available in Python libraries (or that are hard to build directly with JavaScript libraries) and reuse them in the final web visualization.