# Heat Maps and Trees

Today we will be going over more ways to visualize your data. As usual, if you have any questions, feel free to ask!

Documentation continues to be your best friend:
* http://bokeh.pydata.org
* https://pygraphviz.github.io

### Installation

In [None]:
#!brew install graphviz
#!pip install pygraphviz --install-option="--include-path=/usr/local/include/graphviz/" \
#--install-option="--library-path=/usr/local/lib/graphviz"

### Import Statements

In [None]:
from datascience import *
from IPython.display import Image
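# note: bokeh.charts only ships with older Bokeh releases; it was later
# split out into the separate bkcharts package
from bokeh.charts import HeatMap, output_file, show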
import pygraphviz as PG
import numpy as np
%matplotlib inline

### Reading in the data

In [None]:
tu38 = Table.read_table('TU 38 master - Sheet1.csv')
tu38

In [None]:
public = Table.read_table('pubschls.csv')
public

In [None]:
graffiti = Table.read_table('Graffiti.csv')
graffiti

### Heat Maps

Per Wikipedia: "A heat map (or heatmap) is a graphical representation of data where the individual values contained in a matrix are represented as colors." Heat maps are a good way to visualize the relationship between two variables, and they scale easily to large datasets. To create one, we pick two variables to compare and a metric that quantifies their relationship. In this first dataset, we will look at the relationship between deities (or groups of deities) and the substances offered to them, and we will color the chart by how often each pair appears together.
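
Before using the real table, here is a minimal sketch of the idea in plain Python. The rows are made up for illustration (they are not taken from the TU 38 data), and dividing by the largest count is just one assumed way to turn frequencies into a color intensity.

```python
from collections import Counter

# Hypothetical (substance, deity) rows, made up for illustration only
offerings = [
    ("beer", "Anu"),
    ("beer", "Anu"),
    ("milk", "Anu"),
    ("beer", "Ishtar"),
    ("milk", "Ishtar"),
    ("milk", "Ishtar"),
    ("milk", "Ishtar"),
]

# Frequency of each pair appearing together
pair_counts = Counter(offerings)

# Scale each count to the range 0..1 so it could drive a color intensity
max_count = max(pair_counts.values())
intensity = {pair: count / max_count for pair, count in pair_counts.items()}

for pair in sorted(intensity):
    print(pair, pair_counts[pair], round(intensity[pair], 2))
```

The pair that appears most often gets intensity 1.0, and every other pair gets a proportionally lighter value.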

In [None]:
# grouping the data by SUBSTANCE and DEITY
groupeded = tu38.group(['substance', 'deity'])

# converting to a pandas dataframe
groupeded = groupeded.to_df()
groupeded

In [None]:
# making a bokeh heatmap from the df GROUPEDED
hm = HeatMap(groupeded, x='substance', y='deity', values='count',
             title='Substance x Deity', stat=None)

output_file('tu38_heatmap.html')
show(hm)

What we just did is very similar to a pivot table, but using colors to display intensity instead of numbers.

In [None]:
# this pivot table holds the same counts that the heat map above shows as colors
pivoted = tu38.pivot('deity', 'substance')
data = pivoted.to_df()

w_index = data.set_index('substance')
w_index
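
To make the connection between the grouped counts and the pivot table concrete, here is a rough sketch of the same reshaping in plain Python. The rows are hypothetical, and the nested dictionary is only a stand-in for the table that `pivot` returns.

```python
from collections import Counter

# Hypothetical (substance, deity) rows, made up for illustration only
rows = [("beer", "Anu"), ("beer", "Anu"), ("milk", "Anu"), ("milk", "Ishtar")]
counts = Counter(rows)

substances = sorted({s for s, _ in rows})
deities = sorted({d for _, d in rows})

# One row per substance, one column per deity; pairs that never occur become 0
grid = {s: {d: counts.get((s, d), 0) for d in deities} for s in substances}

print("substance", *deities)
for s in substances:
    print(s, *(grid[s][d] for d in deities))
```

Each cell of this grid is the number a heat map would translate into a color.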

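Next we are going to use public school data. We will visualize the relationship between counties and funding types, and quantify it with the average of the DOC column for each pair. DOC appears to be the District Ownership Code from the California Department of Education school directory, a numeric code for the type of district, so treat its average as a stand-in value for practicing the technique rather than a meaningful statistic.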

In [None]:
# grouping PUBLIC by COUNTY and FUNDINGTYPE, and using np.mean to calculate 
# the average of all columns for that group
public_grouped = public.group(['County', 'FundingType'], np.mean).to_df()
public_grouped
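
For reference, grouping by two columns and averaging a third can be sketched without any table library. The county names, funding labels, and values below are hypothetical and are not taken from `pubschls.csv`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (county, funding_type, value) rows, made up for illustration
schools = [
    ("Alameda", "Directly funded", 10.0),
    ("Alameda", "Directly funded", 20.0),
    ("Alameda", "Locally funded", 30.0),
    ("Fresno", "Locally funded", 40.0),
]

# Collect the values that belong to each (county, funding_type) pair
groups = defaultdict(list)
for county, funding, value in schools:
    groups[(county, funding)].append(value)

# Average the values within each group
averages = {key: mean(values) for key, values in groups.items()}

for key in sorted(averages):
    print(key, averages[key])
```

Each (county, funding type) pair ends up with one number, which is what the heat map uses for its color.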

In [None]:
# heat map that uses the average DOC for its color saturation
hm = HeatMap(public_grouped, x='County', y='FundingType', values='DOC mean',
             title='County x Funding Type', stat=None)

output_file('school_heatmap.html')
show(hm)

### Trees

Sometimes it is helpful to put your data in a tree structure to represent relationships. To quote Wikipedia, "A tree structure or tree diagram is a way of representing the hierarchical nature of a structure in a graphical form." We will start with a simple example, then move on to a multi-layered tree.

In [None]:
# selecting columns that will represent the relationship
# that we want to display
graffiti_sub = graffiti.select('Temple', 'Code')

# picking out a single temple to show as a tree
temple_d = graffiti_sub.where('Temple', 'D')
temple_d

In [None]:
# initializing a pygraphviz tree object
B = PG.AGraph()

# apply already visits every row of the table, so one pass per pair of
# adjacent columns is enough to add all of the pairs to the tree
count = 0
while count < temple_d.num_columns-1:
    temple_d.apply((lambda x,y: B.add_edge(x, y)), [count,count+1])
    count += 1

    
# save the graph in dot format
B.write('ademo.dot')

# pygraphviz renders graphs in neato by default, 
# so you need to specify dot as the layout engine
B.layout(prog='dot')

# creating a png
B.draw('file.png')

# displaying that png
Image('file.png')

The graph above is a simple example of the kind of structure that we can build: a single temple (D) is the root, and every branch connects it to one of its codes. As our trees get more complicated and have overlapping edges and nodes, we will have to get more creative about how we construct them.
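
To see what a tree built from pairs amounts to, here is a small sketch that turns (parent, child) rows into DOT text by hand. The codes are hypothetical, and the output only approximates the kind of text a `.dot` file holds; it is not meant to match what pygraphviz writes line for line.

```python
# Hypothetical (temple, code) rows, made up for illustration only
rows = [("D", "A1"), ("D", "B2"), ("D", "A1"), ("D", "C3")]

# Keep each edge once, in the order it first appears
edges = []
for parent, child in rows:
    if (parent, child) not in edges:
        edges.append((parent, child))

# Hand-written DOT text for an undirected graph
lines = ["strict graph {"]
lines += ['    "{}" -- "{}";'.format(parent, child) for parent, child in edges]
lines.append("}")
dot_text = "\n".join(lines)

print(dot_text)
```

Every row contributes one edge from the temple to a code, and repeated rows collapse into a single edge.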

In [None]:
# selecting out features that we care about for our tree
slct = tu38.select('deity', 'position', 'utensil', 'substance')
slct

In [None]:
# we are going to make a tree for just Anu, so we subset the table
anu = slct.where('deity', 'Anu')
anu

In [None]:
# we are going to create IDs for each of the values, like we did with networks,
# but instead of numbers, each ID is the value joined with everything above it
# in the tree, so the same value under different parents gets its own node
permutations = anu.group(['deity', 'position', 'utensil', 'substance']).drop('count')
original = permutations.copy()

count = 0
while count < permutations.num_columns-1:
    changed = permutations.apply((lambda x,y: x + ' ' + y), [count,count+1])
    count += 1
    permutations[permutations.labels[count]] = changed
permutations

In [None]:
# creating a dictionary where we will map the permutations to their original values
label_dictionary = {}

for row in range(original.num_rows):
    graph_row = original.take(row)
    key_row = permutations.take(row)
    for i in range(original.num_columns):
        label_dictionary[key_row.get(i)[0]] = graph_row.get(i)[0]
label_dictionary
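
The two cells above can be sketched in plain Python with `itertools.accumulate`. The rows are hypothetical, and `path_ids` is a helper name invented for this sketch. The point is that "bowl" appears in both rows but ends up with a different ID in each, while both IDs still map back to the label "bowl".

```python
from itertools import accumulate

# Hypothetical (deity, position, utensil, substance) rows, for illustration only
rows = [
    ("Anu", "first", "bowl", "beer"),
    ("Anu", "second", "bowl", "milk"),
]

def path_ids(row):
    """Join each value with everything before it in the row."""
    return list(accumulate(row, lambda prefix, value: prefix + " " + value))

ids = [path_ids(row) for row in rows]

# Map every ID back to the original value it should display
labels = {}
for row in rows:
    for node_id, value in zip(path_ids(row), row):
        labels[node_id] = value

for node_id in sorted(labels):
    print(repr(node_id), "->", labels[node_id])
```

Both rows share the root ID "Anu", so the tree still starts from a single node.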

In [None]:
# initializing the tree
B = PG.AGraph()

# creating all of the nodes
for x in label_dictionary.keys():
    B.add_node(x, label=label_dictionary[x])

# connecting the nodes with the proper edges
count = 0
while count < permutations.num_columns-1:
    permutations.apply((lambda x,y: B.add_edge(x, y)), [count,count+1])
    count += 1
    
B.write('ademo.dot')
B.layout(prog='dot')
B.draw('file.png')
Image('file.png')

In [None]:
# the difference between this tree and the last one is 'strict=False'
# this allows a line for each edge, adding weight to the connection
B = PG.AGraph(strict=False)

for x in label_dictionary.keys():
    B.add_node(x, label=label_dictionary[x])

count = 0
while count < permutations.num_columns-1:
    permutations.apply((lambda x,y: B.add_edge(x, y)), [count,count+1])
    count += 1
    
B.write('ademo.dot')
B.layout(prog='dot')
B.draw('file.png')
Image('file.png')
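
The effect of `strict=False` can be pictured as the difference between a set of edges and a count of edges. This is only an illustrative sketch with hypothetical edges; it does not call pygraphviz.

```python
from collections import Counter

# Hypothetical edges; the same pair appears more than once
edges = [("Anu", "bowl"), ("Anu", "bowl"), ("Anu", "cup"), ("Anu", "bowl")]

# Strict view: each distinct edge is kept once
strict_edges = sorted(set(edges))

# Non-strict view: every occurrence is kept, so repeats act as a weight
edge_weights = Counter(edges)

print(strict_edges)
for edge in sorted(edge_weights):
    print(edge, edge_weights[edge])
```

In the drawn graph, an edge with a weight of 3 would show up as three parallel lines between the same two nodes.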

In [None]:
# this tree shows all of the possible connections that each level has
# it is only a level-to-level display and does not show relationships beyond the adjacent node
B = PG.AGraph()

# apply already visits every row, so one pass per pair of adjacent columns is enough
count = 0
while count < original.num_columns-1:
    original.apply((lambda x,y: B.add_edge(x, y)), [count,count+1])
    count += 1
    
B.write('ademo.dot')
B.layout(prog='dot')
B.draw('file.png')
Image('file.png')