In [1]:
import pandas as pd
import numpy as np 
import ipywidgets 
import bqplot
import matplotlib.pyplot as plt
%matplotlib inline
import datetime as dt

## Data preprocessing 

Read the dataset and clean the data. 

In [2]:
building = pd.read_csv('building_inventory.csv',
                  na_values={'Congress Dist':0,
                            'Year Acquired':0,
                            'Square Footage':0})
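
To see what the `na_values` mapping does, here is a minimal sketch on a made-up three-row CSV (the rows are invented for illustration; only the column names come from the dataset). A `0` in a listed column is read as missing (`NaN`), which keeps placeholder zeros out of later sums and group keys.

```python
import io
import pandas as pd

# Toy CSV invented for illustration; these are not rows from building_inventory.csv
csv_text = """Agency Name,Year Acquired,Square Footage
Agency A,1975,1200
Agency B,0,800
Agency C,1990,0
"""

# zeros in the listed columns are parsed as NaN
toy = pd.read_csv(io.StringIO(csv_text),
                  na_values={'Year Acquired': 0, 'Square Footage': 0})

# count how many entries were turned into NaN in each column
missing_counts = toy.isna().sum()
print(toy)
print(missing_counts)
```

Because the affected columns now contain `NaN`, pandas stores them as floats, which is why the years show up later as `1753.0`, `1802.0`, and so on.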

First, check the value range of the columns and get the unique values for a basic exploration. 

In [3]:
total_square=building.groupby('Year Acquired')['Square Footage'].sum()
total_square.index

Float64Index([1753.0, 1802.0, 1810.0, 1832.0, 1837.0, 1838.0, 1839.0, 1840.0,
              1841.0, 1843.0,
              ...
              2010.0, 2011.0, 2012.0, 2013.0, 2014.0, 2015.0, 2016.0, 2017.0,
              2018.0, 2019.0],
             dtype='float64', name='Year Acquired', length=171)

In [4]:
agency = building['Agency Name'].unique()
agency

array(['Department of Natural Resources', 'Department of Corrections',
       'Department of Human Services', 'Department of Transportation',
       'Department of State Police', 'Department of Military Affairs',
       'Department of Agriculture', 'Governors State University',
       'Department of Central Management Services',
       'Illinois State University', 'Historic Preservation Agency',
       'Department of Juvenile Justice', 'Southern Illinois University',
       'Illinois Medical District Commission', 'University of Illinois',
       "Department of Veterans' Affairs", 'Chicago State University',
       'Northern Illinois University', 'Office of the Secretary of State',
       'Illinois Emergency Management Agency',
       'Western Illinois University', 'Eastern Illinois University',
       'Northeastern Illinois University',
       'Illinois Community College Board',
       'Illinois Board of Higher Education',
       'IL State Board of Education', 'Department of Revenue',


In [5]:
con_dis = building['Congressional Full Name'].unique()
con_dis

array(['Cheri Bustos', 'John Shimkus', 'Adam Kinzinger',
       'Darin M. LaHood', 'Bill Foster', 'Mike Bost',
       'Daniel William Lipinski', 'Rodney L. Davis', 'Peter J. Roskam',
       nan, 'Randy Hultgren', 'Danny K. Davis', 'Tammy Duckworth',
       'Janice Schakowsky', 'Robin Kelly', 'Bobby L. Rush', 'Robert Dold',
       'Mike Quigley', 'Luis Gutierrez'], dtype=object)

In [6]:
building_group = building.groupby(['Agency Name', 'Congressional Full Name'])['Square Footage'].sum()
building_group

Agency Name                        Congressional Full Name
Appellate Court / Fifth District   Mike Bost                     15124.0
Appellate Court / Fourth District  Rodney L. Davis               16400.0
Appellate Court / Second District  Tammy Duckworth               43330.0
Appellate Court / Third District   Adam Kinzinger                18700.0
Chicago State University           Bobby L. Rush               1219492.0
                                                                 ...    
University of Illinois             Danny K. Davis              6363904.0
                                   Robin Kelly                 3643049.0
                                   Rodney L. Davis            14695427.0
Western Illinois University        Cheri Bustos                 385896.0
                                   Darin M. LaHood             1962213.0
Name: Square Footage, Length: 146, dtype: float64

## Build the data matrix of ordinal data points


This part follows the pivot-table approach described at https://blog.algorexhealth.com/2017/09/10-heatmaps-10-python-libraries/

To plot the ordinal data, a matrix first needs to be constructed, with one row per agency and one column per congressional representative. Its values drive the color scale of the heatmap.

In [7]:
building_matrix = building.pivot_table(index = 'Agency Name', columns='Congressional Full Name', values = 'Square Footage',aggfunc = sum)
building_matrix

Agency Name \ Congressional Full Name,Adam Kinzinger,Bill Foster,Bobby L. Rush,Cheri Bustos,Daniel William Lipinski,Danny K. Davis,Darin M. LaHood,Janice Schakowsky,John Shimkus,Luis Gutierrez,Mike Bost,Mike Quigley,Peter J. Roskam,Randy Hultgren,Robert Dold,Robin Kelly,Rodney L. Davis,Tammy Duckworth
Appellate Court / Fifth District,,,,,,,,,,,15124.0,,,,,,,
Appellate Court / Fourth District,,,,,,,,,,,,,,,,,16400.0,
Appellate Court / Second District,,,,,,,,,,,,,,,,,,43330.0
Appellate Court / Third District,18700.0,,,,,,,,,,,,,,,,,
Chicago State University,,,1219492.0,,,,,,,,,,,,,,,
Department of Agriculture,,,,29350.0,,,,,41984.0,,536232.0,,,,,,2000832.0,
Department of Central Management Services,44130.0,,,151963.0,,2088840.0,54014.0,,70160.0,,98355.0,9932.0,,,443865.0,,1003106.0,65268.0
Department of Corrections,2862863.0,2598339.0,,1518546.0,,,1656696.0,,2908649.0,,2594507.0,,,,,49572.0,931578.0,
Department of Human Services,206088.0,66673.0,449547.0,247839.0,,304039.0,1887569.0,,394598.0,,1579965.0,362890.0,,,234642.0,1253943.0,192934.0,913263.0
Department of Juvenile Justice,,,,227480.0,,,,,209238.0,,63508.0,,72411.0,538807.0,,,36538.0,
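
As a cross-check on what the pivot table contains, the sketch below builds the same kind of matrix on toy records (agency and representative names are placeholders invented for illustration) and compares it with the grouped sums from the earlier cell reshaped with `unstack()`. It is meant only to show that the two routes agree, not to replace the cell above.

```python
import pandas as pd

# Toy records invented for illustration; names are placeholders
toy = pd.DataFrame({
    'Agency Name': ['Agency A', 'Agency A', 'Agency A', 'Agency B'],
    'Congressional Full Name': ['Rep X', 'Rep X', 'Rep Y', 'Rep Y'],
    'Square Footage': [100.0, 50.0, 30.0, 70.0],
})

# one row per agency, one column per representative, summed square footage
matrix = toy.pivot_table(index='Agency Name',
                         columns='Congressional Full Name',
                         values='Square Footage',
                         aggfunc='sum')

# same table from the grouped sums: move the inner index level into columns
grouped = toy.groupby(['Agency Name', 'Congressional Full Name'])['Square Footage'].sum()
alt = grouped.unstack()

print(matrix)

# combinations that never occur (Agency B / Rep X here) are NaN in both tables
same = matrix.fillna(-1).to_numpy().tolist() == alt.fillna(-1).to_numpy().tolist()
print(same)
```

The `NaN` cells matter for the heatmap: they mark agency and representative pairs with no buildings at all, which is different from a total of zero.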


## Grid heatmap

This cell tests whether the heatmap renders correctly and produces the plot needed. I also tried to fix the lack of space for the labels.

One remaining problem is that I could not find a way to show the x and y labels in full. The y-axis labels and the legend labels can be made visible by setting the figure margins.

In [29]:
# x and y data
x = building_matrix.index
y = building_matrix.columns

# scales 
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.OrdinalScale()
col_sc = bqplot.ColorScale(schema='RdPu')

# axis 
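# the 'column' scale (x_sc) carries the representatives and the 'row' scale (y_sc) carries the agencies
x_ax = bqplot.Axis(scale=x_sc, label='Congressional Full Name', tick_rotate = 90,  tick_style = {'font-size': 8})
y_ax = bqplot.Axis(scale=y_sc, label='Agency Name', orientation='vertical',  tick_style = {'font-size': 8})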
c_ax = bqplot.ColorAxis(scale = col_sc, orientation='vertical', side='right')

# marks
heat_map = bqplot.GridHeatMap(color= building_matrix, 
                              row= x.tolist(), 
                              column= y.tolist(),
                             scales = {'color':col_sc, 'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             anchor_style={'fill':'blue'})
# interactivity
mySelectedLable = ipywidgets.Label()# print out info about our selection 

def get_data_value(change):
    # to make sure we only support single selections 
    if len(change['owner'].selected) == 1: # only one selection
        
        i,j = change['owner'].selected[0]
        v = building_matrix.iloc[i, j] 
        mySelectedLable.value = 'Total Square Footage(sft)=' + str(v)
        print(building_matrix.index[i])
        print(building_matrix.columns[j])
        
heat_map.observe(get_data_value, 'selected')
# put all together
fig = bqplot.Figure(marks=[heat_map], axes = [c_ax, y_ax, x_ax], fig_margin={'top':60, 'bottom':100, 'left':150, 'right':100})
fig.layout.max_width = '600px'
fig.layout.min_height = '800px'

myDashboard = ipywidgets.VBox([mySelectedLable, fig])
myDashboard

VBox(children=(Label(value=''), Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale(), side='right…
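
The handler above treats each entry of `selected` as a (row, column) position in `building_matrix`. The lookup can be isolated in a plain function so it can be checked without any widget. This is a minimal sketch: the function name `describe_selection` and the toy matrix are hypothetical, and the message shown for empty cells is my own choice.

```python
import pandas as pd

def describe_selection(matrix, selected):
    """Return a label string for a single selected (row, column) cell, else None."""
    if selected is None or len(selected) != 1:
        return None  # ignore empty and multi-cell selections
    i, j = selected[0]
    value = matrix.iloc[i, j]
    agency = matrix.index[i]
    congress = matrix.columns[j]
    if pd.isna(value):
        # NaN means this agency has no buildings for this representative
        return f'{agency} / {congress}: no buildings recorded'
    return f'{agency} / {congress}: Total Square Footage (sq ft) = {value:,.0f}'

# Toy matrix invented for illustration
toy_matrix = pd.DataFrame([[150.0, 30.0], [float('nan'), 70.0]],
                          index=['Agency A', 'Agency B'],
                          columns=['Rep X', 'Rep Y'])

label_text = describe_selection(toy_matrix, [[0, 1]])
empty_text = describe_selection(toy_matrix, [[1, 0]])
print(label_text)
print(empty_text)
```

Inside the handler, the result could be assigned to the label with something like `mySelectedLable.value = describe_selection(building_matrix, change['owner'].selected) or ''`, which would also replace the two `print` calls.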

## Line plot

In [9]:
heat_map.keys

['_model_module',
 '_model_module_version',
 '_model_name',
 '_view_count',
 '_view_module',
 '_view_module_version',
 '_view_name',
 'anchor_style',
 'apply_clip',
 'color',
 'column',
 'column_align',
 'display_format',
 'display_legend',
 'enable_hover',
 'font_style',
 'interactions',
 'labels',
 'null_color',
 'opacity',
 'preserve_domain',
 'row',
 'row_align',
 'scales',
 'scales_metadata',
 'selected',
 'selected_style',
 'stroke',
 'tooltip',
 'tooltip_location',
 'tooltip_style',
 'unselected_style',
 'visible']

In [10]:
# create plot elements 
x_scl = bqplot.LinearScale()
y_scl = bqplot.LogScale() # log scale, since total square footage differs hugely between years

ax_xcl = bqplot.Axis(label='Year Acquired', scale=x_scl) # years
ax_ycl = bqplot.Axis(label='Total Square Footage', scale=y_scl,
                     orientation='vertical', side='left')

line = bqplot.Lines(x= total_square.index,
                   y = total_square,
                   scales = {'x':x_scl, 'y':y_scl})

# figure 
fig = bqplot.Figure(axes=[ax_xcl, ax_ycl], marks=[line])
fig

Figure(axes=[Axis(label='Year Acquired', scale=LinearScale()), Axis(label='Total Square Footage', orientation=…
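
One thing to watch with the log scale (an observation about pandas that I have not checked against this dataset): because zeros in `Square Footage` were read as `NaN`, a year whose buildings all lack a square footage sums to `0.0` by default, and zero has no position on a log axis. The sketch below uses toy numbers invented for illustration and shows how `min_count=1` turns such years into `NaN` so they can be dropped before plotting.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration: 1960 has only a missing square footage
toy = pd.DataFrame({
    'Year Acquired': [1950.0, 1950.0, 1960.0, 1970.0],
    'Square Footage': [100.0, 400.0, np.nan, 2500.0],
})

# default sum: a year whose values are all NaN is reported as 0.0
default_total = toy.groupby('Year Acquired')['Square Footage'].sum()

# min_count=1: such a year becomes NaN instead, so it can be dropped
strict_total = toy.groupby('Year Acquired')['Square Footage'].sum(min_count=1)
plottable = strict_total.dropna()

print(default_total)
print(plottable)
```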

## Add interactivity

In [11]:
# CREATE LABEL - # 1
mySelectedLable = ipywidgets.Label()# print out info about our selection 



In [17]:
# create heatmap elements
# x and y data
x = building_matrix.index
y = building_matrix.columns

# scales 
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.OrdinalScale()
col_sc = bqplot.ColorScale(schema='BuPu')

# axes, remember to rotate the x tick labels
# the 'column' scale (x_sc) carries the representatives and the 'row' scale (y_sc) carries the agencies
x_ax = bqplot.Axis(scale=x_sc, label='Congressional Full Name', tick_rotate = 90,  tick_style = {'font-size': 8})
y_ax = bqplot.Axis(scale=y_sc, label='Agency Name', orientation='vertical',  tick_style = {'font-size': 8})
c_ax = bqplot.ColorAxis(scale = col_sc, orientation='vertical', side='right')

# marks
heat_map = bqplot.GridHeatMap(color= building_matrix, 
                              row= x.tolist(), 
                              column= y.tolist(),
                             scales = {'color':col_sc, 'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             anchor_style={'fill':'blue'})
# note: fig_margin is a Figure option, not a GridHeatMap one, so it is set on fig_heatmap below

In [18]:
# create plot elements 
x_scl = bqplot.LinearScale()
y_scl = bqplot.LogScale() # log scale: for many agencies the total square footage differs hugely between years

# axes 
ax_xcl = bqplot.Axis(label='Year Acquired', scale=x_scl, tick_rotate = 90,  tick_style = {'font-size': 8}) # years
ax_ycl = bqplot.Axis(label='Total Square Footage', scale=y_scl,
                     orientation='vertical', side='left',  tick_style = {'font-size': 8})
# mark
line = bqplot.Lines(x= total_square.index,
                   y = total_square,
                   scales = {'x':x_scl, 'y':y_scl})

# figure 
fig = bqplot.Figure(axes=[ax_xcl, ax_ycl], marks=[line])


In [19]:
# link lineplot with heatmap 
def get_data_value(change):
    # to make sure we only support single selections 
    if len(change['owner'].selected) == 1: # only one selection
        
        i,j = change['owner'].selected[0]
        v = building_matrix.iloc[i, j] 
        
        mySelectedLable.value = 'Total Square Footage(sft)=' + str(v)
#         print(building_matrix.index[i])
#         print(building_matrix.columns[j])
        
        # get the agency name and congressional representative name for the selected cell
        agency = building_matrix.index[i]
        congress = building_matrix.columns[j]
        
        df_square = building[(building['Agency Name']== agency) & (building['Congressional Full Name']== congress)]
        total_square = df_square.groupby('Year Acquired')['Square Footage'].sum()
        
        line.x = total_square.index
        line.y = total_square
        
        
# make sure to observe the change         
heat_map.observe(get_data_value, 'selected')
# put all together
fig = bqplot.Figure(marks=[heat_map], axes = [c_ax, y_ax, x_ax], )

In [20]:
# create figure objects
fig_heatmap = bqplot.Figure(marks=[heat_map], axes = [c_ax, y_ax, x_ax],fig_margin={'top':20, 'bottom':100, 'left':150, 'right':100})
fig_line = bqplot.Figure(marks=[line], axes = [ax_xcl, ax_ycl],fig_margin={'top':20, 'bottom':100, 'left':60, 'right':60})

In [21]:
# put it all together finally ! as a dashboard
fig_heatmap.layout.min_width = '500px'
fig_line.layout.min_width = '500px'
plots = ipywidgets.HBox([fig_heatmap, fig_line])
myDashboard = ipywidgets.VBox([mySelectedLable, plots])

myDashboard

VBox(children=(Label(value='Total Square Footage(sft)=516516.0'), HBox(children=(Figure(axes=[ColorAxis(orient…

Something strange happens when I select different cells: after about 7 or 8 selections, the line plot stops refreshing. I don't know why; no warnings or errors are raised.

Also, the labels at the bottom are still incomplete.

These two problems were still unsolved at the time of submission.
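
One possible explanation for the stalled refresh, which I have not verified, is that `line.x` and `line.y` are assigned one after the other, so the widget briefly holds arrays of different lengths. The sketch below is a helper (the name `line_data_for_cell` is hypothetical, and the data is invented for illustration) that prepares both arrays together and drops years without any recorded square footage.

```python
import numpy as np
import pandas as pd

def line_data_for_cell(df, agency, congress):
    """Return (years, totals) as equal-length NumPy arrays for one heatmap cell."""
    mask = (df['Agency Name'] == agency) & (df['Congressional Full Name'] == congress)
    totals = df[mask].groupby('Year Acquired')['Square Footage'].sum(min_count=1)
    totals = totals.dropna()  # years with only missing values are left out
    return totals.index.to_numpy(dtype=float), totals.to_numpy(dtype=float)

# Toy records invented for illustration
toy = pd.DataFrame({
    'Agency Name': ['Agency A', 'Agency A', 'Agency A', 'Agency B'],
    'Congressional Full Name': ['Rep X', 'Rep X', 'Rep X', 'Rep X'],
    'Year Acquired': [1950.0, 1950.0, 1980.0, 1960.0],
    'Square Footage': [100.0, 50.0, 300.0, 70.0],
})

years, totals = line_data_for_cell(toy, 'Agency A', 'Rep X')
no_years, no_totals = line_data_for_cell(toy, 'Agency B', 'Rep Y')
print(years, totals)
```

In the handler, the two assignments could then be wrapped in `with line.hold_sync():`, the ipywidgets context manager that batches trait updates into a single message, so that both arrays reach the front end together. Whether this removes the stall in this notebook is untested.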

## Things to think about 

* Can you keep the x and y ranges static on the line plot? 

    I believe a fixed range can be set on the x and y axes, but I haven't figured out how.
    
    
* Can you change the style?

    I found it hard to adjust the margins or other parameters so that the labels of each axis are shown in full. However, the line plot has a parameter called line_style, so the style can be changed.
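
For the first question, one approach that should work is to compute the limits once from the whole dataset and give them to the scales, instead of letting each selection rescale the axes. The sketch below only computes the limits, on toy records invented for illustration; the variable names are my own.

```python
import pandas as pd

# Toy records invented for illustration
toy = pd.DataFrame({
    'Agency Name': ['Agency A', 'Agency A', 'Agency A', 'Agency B'],
    'Congressional Full Name': ['Rep X', 'Rep X', 'Rep X', 'Rep X'],
    'Year Acquired': [1950.0, 1950.0, 1980.0, 1960.0],
    'Square Footage': [100.0, 50.0, 300.0, 70.0],
})

# totals per (agency, representative, year): every point the line plot can show
keys = ['Agency Name', 'Congressional Full Name', 'Year Acquired']
per_cell = toy.groupby(keys)['Square Footage'].sum(min_count=1).dropna()

# fixed limits shared by all selections
x_min, x_max = float(toy['Year Acquired'].min()), float(toy['Year Acquired'].max())
y_min, y_max = float(per_cell.min()), float(per_cell.max())
print(x_min, x_max, y_min, y_max)
```

These limits could then be passed when the scales are created, for example `bqplot.LinearScale(min=x_min, max=x_max)` and `bqplot.LogScale(min=y_min, max=y_max)`; both scale classes have `min` and `max` attributes. For the style question, `bqplot.Lines` accepts `line_style` values such as `'dashed'` or `'dotted'`, and also has `colors` and `stroke_width`.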