<img src="https://snipboard.io/Kx6OAi.jpg">

# 4. Advanced Pandas: Customizing DataFrames
<div style="margin-top: -20px;">Author:  David Yerrington</div>

## Learning Objectives

- Understanding visual aspects of DataFrame output
- Applying basic CSS properties to DataFrames
- Using custom functions to style DataFrames

### Prerequisite Knowledge
- Basic Pandas 
  - Difference between Series and DataFrame
  - Bitmasks, query function, selecting data
  - Aggregations

## Environment Setup

Don't forget to set up your Python environment from [the setup guide](../environment.md) if you haven't done so yet.


### Imports

In [8]:
import pandas as pd, numpy as np

### Dataset:  Yet more Pokemon!

In [10]:
df = pd.read_csv("../data/pokemon.csv", encoding = "utf8")

# Styling DataFrames

There are many reasons why you may want to change how a DataFrame is presented.  Just as plotting can help tell a story that numbers can't, it is very practical to tailor your DataFrames to point out outliers, distinguish specific categories, or simply make your data easy to scan for publications.

In this module, we will mostly be coding examples of the core constructs that support changing the default aesthetics of Pandas DataFrames.

### Apply vs Applymap
#### Apply

As a refresher, `.apply()` can be used on `axis=0` (columns) or `axis=1` (rows).  The axis changes what is sent to the function supplied to `.apply()`.  If your DataFrame is 10 (columns) x 20 (rows), `.apply([function])` is called once for each of the 10 columns when `axis = 0` (default), or once for each of the 20 rows when `axis = 1`.

![](https://snipboard.io/BRMnEI.jpg)
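
To make the difference between the two axes concrete, here is a minimal sketch that records the length of every input `.apply()` hands to the function. The `demo` frame and the `count_calls` helper are hypothetical names used only for illustration, and the call counts assume a recent version of Pandas.

```python
import pandas as pd

# Hypothetical 3-column x 5-row frame, used only for illustration.
demo = pd.DataFrame({"a": range(5), "b": range(5, 10), "c": range(10, 15)})

def count_calls(frame, axis):
    """Return the length of each Series that .apply() passes to the function."""
    lengths = []

    def record(series):
        lengths.append(len(series))
        return series.sum()

    frame.apply(record, axis=axis)
    return lengths

column_calls = count_calls(demo, axis=0)  # each input is a column (5 values)
row_calls = count_calls(demo, axis=1)     # each input is a row (3 values)
print(column_calls)
print(row_calls)
```

Each entry in the returned list corresponds to one call, so the list length is the number of calls and the values are the size of what the function saw.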

#### Applymap

Applymap is useful for working on and referencing all of your data in the context of each individual cell.  With `.apply()`, the input to your function on each call is a Series: either a whole column, as long as your DataFrame has rows (iteration by column), or a whole row, as long as your DataFrame has columns (iteration by row).  With `.applymap()`, the input is a single value of your DataFrame.  So if your DataFrame is 10x10, `.applymap([function])` is called once for each of the 100 items contained within it.

![](https://snipboard.io/e9QED3.jpg)

### Question:  Without running the code example below, what do you think the result would be?

```python
data = [
   (1, 10, 100),
   (2, 20, 200),
   (3, 30, 300),
]
test = pd.DataFrame(data)
def my_function(val):
    print(f"Hey I'm being called on {val}!")
    return val + 1
test.applymap(my_function)
```

> If you're using an older version of Pandas, the apply/applymap functions will be called twice on the first iteration because of a legacy optimization.  More recent versions of Pandas don't behave this way.

In [17]:
test

Unnamed: 0,0,1,2
0,1,10,100
1,2,20,200
2,3,30,300


In [16]:
# Now let's try it out here!
data = [
   (1, 10, 100),
   (2, 20, 200),
   (3, 30, 300),
]
test = pd.DataFrame(data)
def my_function(val):
    print(f"Hey I'm being called on {val}!")
    return val + 1
test.applymap(my_function)



Hey I'm being called on 1!
Hey I'm being called on 2!
Hey I'm being called on 3!
Hey I'm being called on 10!
Hey I'm being called on 20!
Hey I'm being called on 30!
Hey I'm being called on 100!
Hey I'm being called on 200!
Hey I'm being called on 300!


Unnamed: 0,0,1,2
0,2,11,101
1,3,21,201
2,4,31,301
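
In Pandas 2.1 and later, `DataFrame.applymap` is deprecated in favor of the equivalent `DataFrame.map`. A minimal sketch of a call that works on either side of that change follows; the `elementwise` helper is a hypothetical name introduced here, not part of Pandas.

```python
import pandas as pd

def elementwise(frame, func):
    """Call func on every cell, using DataFrame.map when this Pandas version has it."""
    if hasattr(pd.DataFrame, "map"):   # Pandas >= 2.1
        return frame.map(func)
    return frame.applymap(func)        # older Pandas

test = pd.DataFrame([(1, 10, 100), (2, 20, 200), (3, 30, 300)])
result = elementwise(test, lambda val: val + 1)
print(result)
```

The result is a new DataFrame; `test` itself is left unchanged.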


### Working with Style

Now that we know which aspects of a DataFrame we can access with `.apply` and `.applymap`, we can customize many aspects of how DataFrames are displayed.

Using the `df.style` object it's possible to access the "style" attribute that's attached to every single element displayed in the HTML of the DataFrame rendering in Jupyter.  When we use `.apply` and `.applymap` with `df.style`, the return of our functions used with these methods modifies the CSS "style" attribute of the post-rendered DataFrame HTML rather than the actual data.

> What's been updated is the post-rendered HTML.  A rendered DataFrame is just specific HTML tags wrapped around each data element.

### CSS in a Nutshell

If we're going to change the CSS style properties of our post-rendered HTML, it is useful to understand a few CSS properties.  CSS is a vast standard that allows web developers to control the look and feel of HTML documents (i.e., the elements of a Document Object Model).  

It's impossible to become an expert in CSS without practice, but the most useful CSS style properties to know are:

 - `color`:  Controls the text color [[ref](https://www.w3schools.com/cssref/pr_text_color.asp)]
 - `font-family`: Change the font [[ref](https://www.w3schools.com/cssref/pr_font_font-family.asp)]
 - `background-color`: Change an element's background color [[ref](https://www.w3schools.com/cssref/pr_background-color.asp)]
 
 > Learn more about CSS @ [W3Schools](https://www.w3schools.com/cssref/)

### Practicing CSS

Jupyter allows us to render basic HTML tags right in a "markdown" type cell.  Let's explore these properties in more detail.

#### Div with style = `color: red`
HTML
>```HTML
><div style = "color: red">This is text!</div>
>```

Output:
><div style = "color: red">This is text!</div>


#### Div with style = `color: red; background-color: black;`

HTML
>```HTML
><div style = "color: red; background-color: black;">This is text!</div>
>```

Output:
><div style = "color: red; background-color: black;">This is text!</div>
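
Since the styling functions later in this module return these same `property: value;` strings, it can help to build them programmatically. The following is only a sketch: `css` is a hypothetical helper, not something Pandas or Jupyter provides.

```python
def css(**properties):
    """Build a CSS declaration string; underscores in names become hyphens."""
    return " ".join(
        f"{name.replace('_', '-')}: {value};" for name, value in properties.items()
    )

style = css(color="red", background_color="black")
print(style)  # color: red; background-color: black;
print(f'<div style="{style}">This is text!</div>')
```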


## Question:  Can you search the web for the CSS style property that makes the font bold?

Much of what we learn on the job at the lowest level is researched.  When thinking about how to change the aesthetics of your DataFrame output, as long as we know that the "style" HTML attribute is responsible for applying CSS properties, we only need to search for them.

```HTML
<div style="???">Am I bold text?</div>
```

> Solution (double click this frame and edit below)
> <div style="font-weight: bold;">Am I bold text?</div>

### Applying styles properties to DataFrames
Using `.applymap`, we will write a function that changes the text color of `Type 1` based on the Pokemon type, using a dictionary with keys that match the `Type 1` values and corresponding colors for the text.

With `.apply` and `.applymap`, we can also pass `subset = [list of columns]` to choose which columns are styled while still showing the other columns for context.

> For a list of all CSS colors by name, check out: https://www.w3schools.com/cssref/css_colors.asp

In [28]:
type_map = dict(
    Fire    = "red",
    Grass   = "green",
    Water   = "blue",
    Bug     = "brown",
    Normal  = "mediumseagreen",
    Poison  = "darkgreen",
    Ghost   = "grey",
)

def style_type(val):
    # We will update this function
    # Values missing from type_map get no extra style
    color = type_map.get(val)
    return f"color: {color};" if color else ""

df.head(10).style.applymap(style_type)

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,4,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,5,Charmander,Fire,,39,52,43,60,50,65,1,False
5,6,Charmeleon,Fire,,58,64,58,80,65,80,1,False
6,7,Charizard,Fire,Flying,78,84,78,109,85,100,1,False
7,8,Mega Charizard X,Fire,Dragon,78,130,111,130,85,100,1,False
8,9,Mega Charizard Y,Fire,Flying,78,104,78,159,115,100,1,False
9,10,Squirtle,Water,,44,48,65,50,64,43,1,False


### Applying to the row `axis=1`

Applying custom functions to the row axis is a little different from what we would expect from `.applymap()` because we're dealing with a series of multiple items rather than single values.  The return is expected to be a list of style properties matching the length of the input.

For example, if we applied on row `axis = 1` with the subset `['Name', 'Type 1', 'Type 2']`, each row will have 3 elements.  The expected return is `["Style for 'Name'", "Style for 'Type 1'", "Style for 'Type 2'"]`.
> ![](https://snipboard.io/ruEj8l.jpg)

In [31]:
type_map = dict(
    Fire    = "red",
    Grass   = "green",
    Water   = "blue",
    Bug     = "brown",
    Normal  = "mediumseagreen",
    Poison  = "darkgreen",
    Ghost   = "grey",
)

def row_color(row):
    color = type_map.get(row['Type 1'])
    if not color:
        # Types missing from type_map keep the default look
        return ["" for _ in row]
    return [f"background-color: {color}; color: white;" for _ in row]

df.head().style.apply(row_color, subset = ['Name', 'Type 1', 'Type 2'], axis = 1)

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,4,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,5,Charmander,Fire,,39,52,43,60,50,65,1,False


### Question:  How would we also change the text color of "Legendary" to "yellow" if the value == `True`, and to "red" otherwise?

In [37]:
df.tail().style \
    .apply(row_color, subset = ['Name', 'Type 1', 'Type 2'], axis = 1) \
    .applymap(lambda val: "color: yellow" if val == True else "color: red", subset = "Legendary")

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
795,796,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
796,797,Mega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
797,798,Hoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
798,799,Hoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True
799,800,Volcanion,Fire,Water,80,110,120,130,90,70,6,True


### Built-in style Methods

#### `.background_gradient`

> The `cmap` parameter accepts matplotlib color map presets as values.  Read more about them @ https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

In [40]:
df.head().style.background_gradient()

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,4,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,5,Charmander,Fire,,39,52,43,60,50,65,1,False


#### Inline `.bar` Charts

> Pandas bar docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.formats.style.Styler.bar.html

In [45]:
df.head(15).style \
    .bar(subset = ['HP']) \
    .bar(subset = ['Attack'], color = "lightblue") \
    .bar(subset = ['Defense'], color = "pink")

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,4,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,5,Charmander,Fire,,39,52,43,60,50,65,1,False
5,6,Charmeleon,Fire,,58,64,58,80,65,80,1,False
6,7,Charizard,Fire,Flying,78,84,78,109,85,100,1,False
7,8,Mega Charizard X,Fire,Dragon,78,130,111,130,85,100,1,False
8,9,Mega Charizard Y,Fire,Flying,78,104,78,159,115,100,1,False
9,10,Squirtle,Water,,44,48,65,50,64,43,1,False


### Adding Captions with `.set_caption("your caption!")`

Sometimes you will take a screenshot of a sample of data in a DataFrame.  It's helpful to add a caption to it so you don't have to add one later in post production.

In [49]:
columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
# Code below
df[columns].corr().style.background_gradient().set_caption('This is a corr chart')

Unnamed: 0,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
HP,1.0,0.422386,0.239622,0.36238,0.378718,0.175952
Attack,0.422386,1.0,0.438687,0.396362,0.26399,0.38124
Defense,0.239622,0.438687,1.0,0.223549,0.510747,0.015227
Sp. Atk,0.36238,0.396362,0.223549,1.0,0.506121,0.473018
Sp. Def,0.378718,0.26399,0.510747,0.506121,1.0,0.259133
Speed,0.175952,0.38124,0.015227,0.473018,0.259133,1.0


### Highlighting the min or max

In [53]:
columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
# Code below
df[columns].head(10).style.highlight_max(color = "red").highlight_min(color = "lightblue")

Unnamed: 0,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,45,49,49,65,65,45
1,60,62,63,80,80,60
2,80,82,83,100,100,80
3,80,100,123,122,120,80
4,39,52,43,60,50,65
5,58,64,58,80,65,80
6,78,84,78,109,85,100
7,78,130,111,130,85,100
8,78,104,78,159,115,100
9,44,48,65,50,64,43


### Question:  What appears to be happening here?  Why do you think we see multiple elements highlighted?

### Lastly, using `.format()`

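`.format()` behaves like `.applymap()`, but it changes the displayed text of each cell rather than its CSS, and you can configure which columns map to which function or format string.  It is useful for applying a money format to values, changing the case of text, or otherwise changing how data is displayed.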

In [54]:
df.head(10).style.format({
    "Name": lambda val: val.upper(),
    "Type 1": lambda val: val.lower(),
    "HP": "${:20,.2f}"
})

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,BULBASAUR,grass,Poison,$ 45.00,49,49,65,65,45,1,False
1,2,IVYSAUR,grass,Poison,$ 60.00,62,63,80,80,60,1,False
2,3,VENUSAUR,grass,Poison,$ 80.00,82,83,100,100,80,1,False
3,4,MEGA VENUSAUR,grass,Poison,$ 80.00,100,123,122,120,80,1,False
4,5,CHARMANDER,fire,,$ 39.00,52,43,60,50,65,1,False
5,6,CHARMELEON,fire,,$ 58.00,64,58,80,65,80,1,False
6,7,CHARIZARD,fire,Flying,$ 78.00,84,78,109,85,100,1,False
7,8,MEGA CHARIZARD X,fire,Dragon,$ 78.00,130,111,130,85,100,1,False
8,9,MEGA CHARIZARD Y,fire,Flying,$ 78.00,104,78,159,115,100,1,False
9,10,SQUIRTLE,water,,$ 44.00,48,65,50,64,43,1,False
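
The string `"${:20,.2f}"` used for `HP` is an ordinary Python format specification: `20` is a minimum field width, `,` adds thousands separators, and `.2f` fixes two decimal places. A quick sketch in plain Python (the sample numbers are arbitrary) shows the padding the width adds; browsers typically collapse that run of spaces when the table is rendered, which is why the output above shows a single space after the `$`.

```python
padded = "${:20,.2f}".format(45)       # number right-aligned in a 20-character field
compact = "${:,.2f}".format(1234.5)    # same format without the field width

print(repr(padded))
print(len(padded))   # 21: the "$" plus the 20-character field
print(compact)       # $1,234.50
```

Dropping the width, as in `"${:,.2f}"`, is one way to avoid the gap if it is not wanted.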


## Summary

- `df.style` is how we access the "style" attributes of a DataFrame as it is rendered to HTML.
- CSS is a standard that defines how HTML elements can be styled visually (and other important HTML aspects).
- `df.style.applymap(func)` styles individual cells.
>Expects a `style` string as the return to your function.
- `df.style.apply(func, axis=1)` styles row cells (`axis=0`, the default, styles column cells).
> Expects a series of `style` strings as the return to your function.

- Built-in style methods
  - `.background_gradient` - Changes the background based on a matplotlib color map (`cmap`), like a heatmap.
  - `.bar` - Draws bars as the background representing the displayed values in a column series.
  - `.set_caption` - Adds a caption to the top of your DataFrame, useful for taking screenshots.
  - `.highlight_min` - Highlights the minimum value of each column in a subset.
  - `.highlight_max` - Highlights the maximum value of each column in a subset.
  - `.format` - Flexible mapping method that matches column keys to functions or format strings, which are applied cell by cell like `.applymap()` but change the displayed text.