Note: This notebook contains two Bokeh visualizations and they'll be displayed inline.

Data Visialization 1

The following is a visualization of total expenditure on health (in USD) made per capita in 184 countries in a span of 15 years starting from 1995. This dataset was downloaded from gapminder.org.

In [21]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, output_file, show
import pandas as pd
import numpy as np
from bokeh.models import ColumnDataSource,HoverTool

output_notebook()

#Importing dataset and cleaning it
health_exp = pd.read_csv("Health Expenditure.csv", encoding = "ISO-8859-1")
health_exp = health_exp.dropna()
health_exp.head(10)

Unnamed: 0,Per capita total expenditure on health,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
3,Albania,105.870766,145.874213,156.120295,183.802111,225.253214,266.395628,279.082481,303.114667,326.034134,379.826416,406.901526,450.944023,493.882557,557.387321,591.483421,577.275816
4,Algeria,192.243807,180.373326,196.124726,206.13679,202.622356,188.073681,212.235307,225.208426,234.901186,227.205538,221.502879,232.780189,273.055581,299.398716,364.54394,330.006155
6,Andorra,1468.685064,1596.140032,1785.686282,2319.864615,1825.072993,1966.806392,1954.305347,2152.617156,2225.100628,2327.568749,2476.411767,2850.048649,3022.875316,3218.212284,3403.376051,3254.524918
7,Angola,68.573562,43.045968,48.128401,41.652303,44.315755,56.246126,82.23639,63.357726,72.091317,62.970625,71.333321,104.313599,130.931971,190.840727,297.288639,168.130557
9,Antigua and Barbuda,501.813555,502.271727,527.249278,543.902633,565.256756,584.6704,579.822814,595.292678,641.4622,689.30638,734.3333,815.175285,895.974838,886.946267,872.534416,990.552253
10,Argentina,648.112322,663.869736,753.113154,800.222491,849.996066,839.338708,829.943512,657.983669,724.223721,806.31034,915.848622,1016.675343,1124.978869,1217.506224,1385.692886,1286.678797
11,Armenia,89.368248,88.193249,93.680663,102.497062,106.600987,127.525792,136.143116,142.495032,170.739065,191.321885,199.103285,199.733059,217.078453,229.774733,244.933956,238.522122
13,Australia,1596.537533,1700.61543,1804.000654,1941.546408,2085.299782,2252.727448,2373.557882,2543.450068,2658.094774,2860.819645,2958.261806,3137.686345,3313.792536,3408.736167,3483.764413,3441.042109
14,Austria,2237.858435,2305.099607,2446.137722,2603.340383,2733.147526,2861.531273,2905.428323,3057.574228,3198.567633,3393.159124,3475.558191,3722.113902,3880.597143,4128.47947,4288.341805,4387.915343
15,Azerbaijan,87.393149,94.391572,89.283375,98.837604,105.075028,102.399157,109.149367,121.051224,199.430617,269.191888,342.615176,366.528931,382.515013,386.061118,553.557168,579.072675


In [22]:
countries = health_exp['Per capita total expenditure on health'].tolist()
countries = countries[::-1]
years = health_exp.columns.values.tolist()[1:]

data = {}
data['countries'] = countries
for year in years:
    data[year] = health_exp[year].tolist()

#Defining the ColumnDataSource object
source = ColumnDataSource(data)

#Defining figure object
p = figure(x_range=(0,10000), y_range=countries, plot_height=1000, plot_width = 900, title="Healthcare Expenditure")

shades = ['red','blue','green','yellow','black','brown','olive','beige','orange','pink','khaki','grey','navy','teal','magenta','turquoise']
#Defining horizontal stacked bar
p.hbar_stack(years, y='countries', height=0.8, color=shades, source=source, legend=["%s " % year for year in years])


p.ygrid.grid_line_color = None
p.xaxis.axis_label = 'Per capita total expenditure on health (In USD)'
p.yaxis.axis_label = 'Countries'
p.axis.major_label_text_font_size = "5pt"
p.legend.location = "center_left"
p.legend.click_policy="hide"

show(p)

The visualization above is a cluster of stacked horizontal bar glyphs and as an interactive feature, glyphs for individual years can be hidden or be made to appear at the click of button by interaction with the elements in the legend.

From this visualization, it's very intuitive to infer cumulative expenditure on health per capita starting from 1995, of a country.
For example, Denmark's cumulative expenditure per capita in the year 2001 is a little over 15500$.

In a similar fashion, to infer a specific year's datum associated with a country, just gauge the length of the appropriate colored horizontal bar.
For example, Canada's expenditure on health per capita in 2001 is approximately 2700$.

To drill down on a specific year or to hide a certain year's data, please hide/unhide them by interacting with the legend.
Please remember to zoom in using the 'wheel zoom' tool on the right of the plot for convenience.

In [23]:
output_file('Health Expenditure.html')

Data Visualization 2

The following is a visualization of sugar consumption per person per day (in grams) in 151 countries between 1961 and 2004.

In [24]:
from bokeh.models import (ColumnDataSource,HoverTool,LinearColorMapper,BasicTicker,PrintfTickFormatter,ColorBar)

#Importing dataset
data = pd.read_csv('indicator sugar_consumption.csv',encoding = 'ISO-8859-1')
data = data.dropna()
#data = data.fillna(-1) 
#Dropping records that have not one data point
data.head(10)

Unnamed: 0,Countries,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004
3,Albania,30.14,30.14,32.88,35.62,35.62,35.62,38.36,38.36,38.36,...,54.8,68.49,60.27,60.27,57.53,65.75,68.49,71.23,65.75,65.75
4,Algeria,46.58,49.32,46.58,49.32,46.58,46.58,49.32,49.32,46.58,...,73.97,73.97,79.45,54.8,60.27,82.19,79.45,82.19,84.93,84.93
7,Angola,24.66,30.14,32.88,30.14,30.14,30.14,30.14,35.62,30.14,...,24.66,27.4,32.88,30.14,38.36,27.4,32.88,35.62,35.62,35.62
9,Antigua and Barbuda,95.89,126.03,104.11,106.85,123.29,131.51,131.51,134.25,134.25,...,90.41,90.41,87.67,93.15,90.41,93.15,93.15,93.15,95.89,95.89
10,Argentina,106.85,98.63,73.97,109.59,101.37,106.85,109.59,106.85,106.85,...,128.77,123.29,120.55,117.81,128.77,128.77,128.77,126.03,112.33,112.33
13,Australia,153.43,147.95,147.95,150.69,153.43,150.69,147.95,150.69,153.43,...,131.51,123.29,134.25,131.51,117.81,123.29,123.29,126.03,128.77,128.77
14,Austria,109.59,115.07,109.59,106.85,112.33,109.59,106.85,115.07,98.63,...,128.77,128.77,126.03,123.29,123.29,126.03,123.29,120.55,123.29,123.29
16,Bahamas,93.15,93.15,87.67,95.89,98.63,98.63,98.63,109.59,131.51,...,126.03,131.51,117.81,120.55,117.81,117.81,112.33,104.11,126.03,126.03
18,Bangladesh,24.66,27.4,27.4,30.14,32.88,41.1,41.1,38.36,41.1,...,16.44,16.44,19.18,19.18,16.44,16.44,19.18,13.7,16.44,16.44
19,Barbados,142.47,142.47,145.21,142.47,142.47,145.21,142.47,142.47,136.99,...,136.99,156.16,158.9,145.21,147.95,147.95,150.69,156.16,156.16,156.16


In [25]:
#pivoting the dataframe to prepare it for conversion into ColumnDataSource object
data = data.set_index('Countries')
data.columns.name = 'Year'
pivoted_df = pd.DataFrame(data.stack(), columns=['Sugar']).reset_index()

#Defining the ColumnDataSource object
source = ColumnDataSource(pivoted_df)

In [26]:
#Defining linear mapper function using the palette above to associate them with sugar consumption levels
colormap = LinearColorMapper(palette=['#fee5d9','#fcae91','#fb6a4a','#de2d26','#a50f15'],
                             low=pivoted_df['Sugar'].min(), high=pivoted_df['Sugar'].max())

In [27]:
#Making lists of countries and years to define the figure
countries = list(data.index)
years = list(data.columns)

#Defining figure object and styling it
p = figure(title="Sugar Consumption",
           x_range=years, y_range=list(countries[::-1]),
           plot_height=1200, plot_width=900,
           x_axis_location="above", toolbar_location='below', tools="hover,pan,wheel_zoom,box_zoom,reset,save")

p.xaxis.axis_label = 'Years'
p.yaxis.axis_label = 'Countries'
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"


In [28]:
#Defining plot point
p.rect(x="Year", y="Countries", height=1, width=1, source=source, fill_color={'field': 'Sugar', 'transform': colormap},
       line_color=None)

#Defining legend and adding it to figure
legend = ColorBar(color_mapper=colormap, ticker=BasicTicker(desired_num_ticks=5),
                  formatter=PrintfTickFormatter(format="%d%%"), major_label_text_font_size="5pt", 
                  location=(0, 0), 
                  )
p.add_layout(legend, 'right')

In [29]:
#Defining hover tool
p.select_one(HoverTool).tooltips = [('Country & Year', '@Countries @Year'),
                                    ('Sugar Consumption per person (g per day)', '@Sugar')]

show(p)

This linear colormap visualization presents the sugar consumption per person per day (in grams) across various demographies in a timeframe between 1961 and 2004. The colors represent which category of percent value a data point has over the average.

The interactive feature that powers this visualization is the hover tool which gives one the ability to view a pop-up tip when hovering the mouse pointer on a data point. The single hue color palette helps in emphasizing countries that have a history of high sugar consumption, which is pretty sweet (Pun unintended).

An interesting point observed is that countries where sugar consumption is high tend to have higher health expenditure. Of course, world economy, in addition to other factors come into play, but a rather fair inference can be made from this correlation.

In [30]:
output_file('Sugar Consumption.html')