# Formatting a DataFrame into a Heatmap

Date: 2023-03-14  
Author: Jason Beach  
Categories: Introduction_Tutorial, Data_Science 
Tags: pandas, heatmap 

<!--eofm-->

Reports are the primary way for analysts to get their work in front of executives and other stakeholders. Creating reports in a visually succinct, but still usable, style is important for maximizing audience reach. One of the most common visual tools for analysts is the heatmap inside an Excel workbook. This post describes how to format a pandas DataFrame as a heatmap so that the report can be generated automatically.

Let's start by loading the Titanic survival dataset into a DataFrame. We will only use a small subset of the data to keep the examples manageable.

<ABSTRACT>

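```python
import pandas as pd

# Load the extended Titanic survival dataset.
table_url = 'https://raw.githubusercontent.com/alexisperrier/packt-aml/master/ch4/extended_titanic.csv'
df = pd.read_csv(table_url)
df.shape  # (1309, 19)

# Keep the first four rows (labels 0-3; .loc slicing is inclusive)
# and the first five columns.
df = df.loc[:3, df.columns[:5]]
df.shape  # (4, 5)
```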
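If the download is unavailable, a minimal offline stand-in can be typed in by hand. This is only a convenience for experimenting: the values are copied from the table printed below, so they may differ slightly from the raw CSV (for example in how the names are punctuated).

```python
import pandas as pd

# Hand-typed copy of the four rows and five columns printed below,
# so the styling examples can run without network access.
df = pd.DataFrame({
    'pclass':   [3, 3, 2, 2],
    'survived': [1, 0, 0, 1],
    'name':     ['de Mulder Mr. Theodore', 'Edvardsson Mr. Gustaf Hjalmar',
                 'Veal Mr. James', 'Wilhelms Mr. Charles'],
    'sex':      ['male', 'male', 'male', 'male'],
    'age':      [30.0, 18.0, 40.0, 31.0],
})
print(df.shape)  # (4, 5)
```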

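```python
# Display age with one decimal place.
df.style.format(formatter={'age': "{:.1f}"})
```

|   | pclass | survived | name | sex | age |
|---|--------|----------|------|-----|-----|
| 0 | 3 | 1 | de Mulder Mr. Theodore | male | 30.0 |
| 1 | 3 | 0 | Edvardsson Mr. Gustaf Hjalmar | male | 18.0 |
| 2 | 2 | 0 | Veal Mr. James | male | 40.0 |
| 3 | 2 | 1 | Wilhelms Mr. Charles | male | 31.0 |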


We can start by highlighting cells the way we would with a bright yellow highlighter pen. The most direct approach is to write a function that takes a single cell value and returns a CSS string, then apply it to the `age` column with `Styler.applymap`.

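```python
def highlight_age_cells(val):
    # Yellow background for ages above 32; an empty string leaves the cell unstyled.
    return 'background-color: yellow' if val > 32 else ''

# applymap calls the function once per cell of the subset.
# (pandas 2.1 renamed Styler.applymap to Styler.map and deprecated applymap.)
df.style.applymap(highlight_age_cells, subset='age')
```

|   | pclass | survived | name | sex | age |
|---|--------|----------|------|-----|-----|
| 0 | 3 | 1 | de Mulder Mr. Theodore | male | 30.0 |
| 1 | 3 | 0 | Edvardsson Mr. Gustaf Hjalmar | male | 18.0 |
| 2 | 2 | 0 | Veal Mr. James | male | 40.0 |
| 3 | 2 | 1 | Wilhelms Mr. Charles | male | 31.0 |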
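Because a rendered table in a post cannot always show the colors, here is a small sketch for previewing the CSS strings a rule produces. It assumes the rule returns an empty string for cells that should stay unstyled, and it uses `Series.map`, which, like `Styler.applymap`, calls the function once per value.

```python
import pandas as pd

def highlight_age_cells(val):
    # Yellow background for ages above 32; '' means "no style".
    return 'background-color: yellow' if val > 32 else ''

# The four ages from the table above.
ages = pd.Series([30.0, 18.0, 40.0, 31.0], name='age')

# Element-wise call, one result per cell, as Styler.applymap does.
css = ages.map(highlight_age_cells)
print(css.tolist())
# ['', '', 'background-color: yellow', '']
```

Only the third row (age 40.0) is highlighted.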



This can easily be extended to the traffic-light colors (green, yellow, red) that are ubiquitous in the corporate world.

```python
def highlight_age_cells(val):
    # Traffic-light bands; ages outside 0-50 (or missing) stay unstyled.
    match val:
        case _ if 0 <= val < 20:
            bg_color, font_color = '#58f931', 'black'   # green
        case _ if 20 <= val < 31:
            bg_color, font_color = '#FFEB51', 'black'   # yellow
        case _ if 31 <= val < 50:
            bg_color, font_color = '#ff2e1b', 'white'   # red
        case _:
            return ''
    return f'background-color: {bg_color}; color: {font_color}'

# match/case requires Python 3.10 or later.
df.style.applymap(highlight_age_cells, subset='age')
```

|   | pclass | survived | name | sex | age |
|---|--------|----------|------|-----|-----|
| 0 | 3 | 1 | de Mulder Mr. Theodore | male | 30.0 |
| 1 | 3 | 0 | Edvardsson Mr. Gustaf Hjalmar | male | 18.0 |
| 2 | 2 | 0 | Veal Mr. James | male | 40.0 |
| 3 | 2 | 1 | Wilhelms Mr. Charles | male | 31.0 |
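Hand-written range checks make it easy to leave a gap between two bands. A vectorized alternative, sketched here as an assumption rather than the post's own method, uses `pd.cut` with right-open bins so that the bands [0, 20), [20, 31) and [31, 50) share their edges. `traffic_light_css` is a hypothetical helper name; the colors are the ones from the cell above.

```python
import pandas as pd

# right=False makes every bin [left, right), so adjacent bands share an edge
# and no age between two bands is skipped.
BINS = [0, 20, 31, 50]
BAND_CSS = [
    'background-color: #58f931; color: black',  # green:  0 <= age < 20
    'background-color: #FFEB51; color: black',  # yellow: 20 <= age < 31
    'background-color: #ff2e1b; color: white',  # red:    31 <= age < 50
]

def traffic_light_css(ages: pd.Series) -> pd.Series:
    """Return one CSS string per age; '' for missing ages or ages outside every band."""
    bands = pd.cut(ages, bins=BINS, right=False, labels=BAND_CSS)
    # Convert the categorical result to plain strings; NaN (no band) becomes ''.
    return bands.astype(object).fillna('')

ages = pd.Series([30.0, 18.0, 40.0, 31.0, 20.5, 65.0], name='age')
result = traffic_light_css(ages).tolist()
print(result)
```

Since it works on a whole column at once, it should plug into `df.style.apply(traffic_light_css, subset='age')`.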


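The same idea can be applied to an entire row: `Styler.apply` with `axis=1` passes each row to the function as a Series and expects one CSS string back per cell. Formatting can also be purely aesthetic, similar to the ruled rows of an accountant's ledger.

The commented-out call at the end of the cell exports the styled DataFrame to an Excel file (this requires the `openpyxl` package).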

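```python
def highlight_cells(row):
    # Highlight the whole row when the passenger is older than 32.
    style = 'background-color: yellow' if row['age'] > 32 else ''
    # Return one CSS string per cell in the row.
    return [style] * len(row)

# Uncomment .to_excel(...) to write the styled table to an Excel workbook.
df.style.apply(highlight_cells, axis=1)#.to_excel('styled_df.xlsx', engine='openpyxl')
```

|   | pclass | survived | name | sex | age |
|---|--------|----------|------|-----|-----|
| 0 | 3 | 1 | de Mulder Mr. Theodore | male | 30.0 |
| 1 | 3 | 0 | Edvardsson Mr. Gustaf Hjalmar | male | 18.0 |
| 2 | 2 | 0 | Veal Mr. James | male | 40.0 |
| 3 | 2 | 1 | Wilhelms Mr. Charles | male | 31.0 |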
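For a purely aesthetic ledger look, one option (an assumption, not part of the original notebook) is to shade every other row. `Styler.apply` with `axis=None` passes the whole DataFrame and expects a same-shaped DataFrame of CSS strings back. The helper name `ledger_stripes` and the light-grey `#f2f2f2` shade are hypothetical choices.

```python
import pandas as pd

def ledger_stripes(data: pd.DataFrame, shade: str = '#f2f2f2') -> pd.DataFrame:
    """Shade every other row, like the alternating lines of a ledger.

    Intended for df.style.apply(ledger_stripes, axis=None); the default
    light-grey shade is an arbitrary, hypothetical choice.
    """
    # Start with no styling anywhere, same shape and labels as the data.
    css = pd.DataFrame('', index=data.index, columns=data.columns)
    # Positional slicing, so it works whatever the index labels are.
    css.iloc[1::2, :] = f'background-color: {shade}'
    return css

df = pd.DataFrame({'pclass': [3, 3, 2, 2], 'age': [30.0, 18.0, 40.0, 31.0]})
stripes = ledger_stripes(df)
print(stripes)
```

Because each Styler method returns the Styler, this can be chained with other rules, e.g. `df.style.apply(ledger_stripes, axis=None).apply(highlight_cells, axis=1)`.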


For a more complex heatmap, the kind an analyst would typically deliver, write one rule per column and dispatch to it from a single row-wise function.

```python
def task_age(v):
    # Highlight passengers older than 30.
    return 'background-color: yellow' if v > 30 else ''

def task_pclass(v):
    # Highlight third-class passengers.
    return 'background-color: yellow' if v > 2 else ''
```

```python
def highlight_cells(row):
    # Dispatch each cell in the row to the rule for its column.
    formats = []
    for key, val in row.items():
        match key:
            case 'pclass': tmp = task_pclass(val)
            case 'age': tmp = task_age(val)
            case _: tmp = ''   # other columns stay unstyled
        formats.append(tmp)
    return formats
```

```python
df.style.apply(highlight_cells, axis=1)#.to_excel('styled_df.xlsx', engine='openpyxl')
```

|   | pclass | survived | name | sex | age |
|---|--------|----------|------|-----|-----|
| 0 | 3 | 1 | de Mulder Mr. Theodore | male | 30.0 |
| 1 | 3 | 0 | Edvardsson Mr. Gustaf Hjalmar | male | 18.0 |
| 2 | 2 | 0 | Veal Mr. James | male | 40.0 |
| 3 | 2 | 1 | Wilhelms Mr. Charles | male | 31.0 |
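As the number of rules grows, the `match` statement can be replaced by a dictionary from column name to rule. The sketch below (the names `RULES` and `rule_grid` are hypothetical) builds the full grid of CSS strings for `df.style.apply(rule_grid, axis=None)`. Chaining per-column calls such as `df.style.applymap(task_pclass, subset='pclass').applymap(task_age, subset='age')` is another option, since each Styler method returns the Styler.

```python
import pandas as pd

def task_age(v):
    return 'background-color: yellow' if v > 30 else ''

def task_pclass(v):
    return 'background-color: yellow' if v > 2 else ''

# Column name -> per-cell rule; columns not listed stay unstyled.
RULES = {'pclass': task_pclass, 'age': task_age}

def rule_grid(data: pd.DataFrame) -> pd.DataFrame:
    """Same-shaped DataFrame of CSS strings, for df.style.apply(rule_grid, axis=None)."""
    css = pd.DataFrame('', index=data.index, columns=data.columns)
    for col, rule in RULES.items():
        if col in data.columns:
            css[col] = data[col].map(rule)
    return css

df = pd.DataFrame({'pclass': [3, 3, 2, 2], 'age': [30.0, 18.0, 40.0, 31.0]})
grid = rule_grid(df)
print(grid)
```

Adding a rule for another column then only means adding one entry to the dictionary.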


This is a powerful technique. However, when the data is much larger, or contains categorical or free-text data such as questionnaire responses, this kind of heatmap quickly becomes unwieldy.

For those cases, take a look at [nlp-heatmap](https://github.com/IMTorgOpenDataTools/nlp-heatmap), which is designed specifically for these more complex situations.