# Week 8 Fun Pandas Tricks and Final Thoughts

Week 8 reading: **Pandas for Everyone** chapters 17 - 18 (pages 305 - 311), and Appendices

Outline:

* Giving color to Pandas
    1. Color a cell based on a value
    2. Highlight max in column
    3. Highlight row based on column value
    4. Applying to entire Dataframe
    5. Built-in styles
        * Highlight nulls
        * Heatmap style

## Overview

A Pandas DataFrame can change how its cells, rows, columns, or the entire table are displayed. You can write functions that set a style based on criteria you choose. The overview below restates selected parts of the Pandas documentation on the subject (https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html).

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

%matplotlib inline
sns.set()

In [48]:
# Create a random dataframe with a couple of NaNs.
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[0, 2] = np.nan
df

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1.0 | 1.329212 | NaN | -0.31628 | -0.99081 |
| 1 | 2.0 | -1.070816 | -1.438713 | 0.564417 | 0.295722 |
| 2 | 3.0 | -1.626404 | 0.219565 | 0.678805 | 1.889273 |
| 3 | 4.0 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5.0 | 1.453425 | 1.057737 | 0.165562 | 0.515018 |
| 5 | 6.0 | -1.336936 | 0.562861 | 1.392855 | -0.063328 |
| 6 | 7.0 | 0.121668 | 1.207603 | -0.00204 | 1.627796 |
| 7 | 8.0 | 0.354493 | 1.037528 | -0.385684 | 0.519818 |
| 8 | 9.0 | 1.686583 | -1.325963 | 1.428984 | -2.089354 |
| 9 | 10.0 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |


### Color a cell based on a value

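DataFrames have a `style` property that returns a Styler object. Its `apply()` and `applymap()` methods take a function and use it to conditionally format cells.

* `applymap(func)`: applies `func` to individual elements
* `apply(func)`: applies `func` to a column, row, or the whole table, depending on `axis`

In both cases, `func` returns CSS styles as strings such as `'color: red'`. In pandas 2.1 and later, `Styler.applymap()` is deprecated in favor of the equivalent `Styler.map()`.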
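
To make the difference concrete, the sketch below calls two style functions directly, outside of `df.style`. The names `bold_if_large` and `underline_min` are made up for illustration; the only point is the shape of what each kind of function returns.

```python
import pandas as pd

def bold_if_large(val):
    # Element-wise style function: one scalar in, one CSS string out.
    return 'font-weight: bold' if val > 2 else ''

def underline_min(s):
    # Column- or row-wise style function: one Series in,
    # one CSS string per element out (same length as the Series).
    return ['text-decoration: underline' if v == s.min() else '' for v in s]

col = pd.Series([3, 1, 2], name='x')

cell_style = bold_if_large(col.iloc[0])
column_styles = underline_min(col)

print(cell_style)      # font-weight: bold
print(column_styles)   # ['', 'text-decoration: underline', '']
```

With a Styler, the first function would be passed to `applymap()` and the second to `apply()`.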


Let's color cells with negative numbers red.

In [63]:
def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    numbers, `'color: black'` otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color


In [53]:
s = df.style.applymap(color_negative_red)
s

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
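
One detail worth noticing: the comparison `val < 0` is `False` for NaN, so the missing cell in column C is styled black, and the same comparison would raise a `TypeError` on a string column. The sketch below is one possible defensive variant; the name `color_negative_red_safe` and the choice to leave such cells unstyled are assumptions made for illustration, not part of the original notebook.

```python
import math
import numbers

def color_negative_red_safe(val):
    """Return 'color: red' for negative numbers, 'color: black' for other
    numbers, and an empty style for NaN or non-numeric values."""
    # Skip anything that is not a real number (strings, None, ...).
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(val, bool) or not isinstance(val, numbers.Real):
        return ''
    # NaN compares False against everything, so handle it explicitly.
    if math.isnan(val):
        return ''
    return 'color: red' if val < 0 else 'color: black'

print(color_negative_red_safe(-1.5))          # color: red
print(color_negative_red_safe(2))             # color: black
print(repr(color_negative_red_safe(float('nan'))))  # ''
print(repr(color_negative_red_safe('abc')))   # ''
```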


### Highlight max in column

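Because the maximum depends on the whole column rather than on a single cell, we use the `apply()` function. By default (`axis=0`), it passes each column to our function as a Series.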

In [60]:
def highlight_max(s):
    '''
    Highlight the maximum in a Series yellow.
    '''
    is_max = s == s.max()  # Boolean Series: True where the value equals the max
    return ['background-color: yellow' if v else '' for v in is_max]  # one CSS string per element

In [61]:
df.style.apply(highlight_max)

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
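
The mask `s == s.max()` does most of the work in `highlight_max`. The small sketch below runs the same two lines on a made-up Series to show how the mask behaves with a missing value and with a tie; the Series values are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 3.0])

# s.max() skips NaN by default, and NaN never compares equal to anything,
# so the NaN entry is False in the mask. Both tied maxima are True.
is_max = s == s.max()
styles = ['background-color: yellow' if v else '' for v in is_max]

print(is_max.tolist())  # [False, False, True, True]
print(styles)
```

Every element that ties for the maximum is highlighted, which may or may not be what you want for your data.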


Styles can also be chained:

In [62]:
(df.style
   .applymap(color_negative_red)
   .apply(highlight_max))

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |


### Highlight row based on column value

With `axis=1`, `apply()` passes each row to the function as a Series, and the function must return a list with one style per column.

The function below colors a row yellow when its value in the B column is greater than 1.

NOTE: The syntax `['background-color: yellow']*5` returns a list with that element repeated 5 times, one for each of the five columns A through E.

In [69]:
def highlight_greaterthan_1(s):
    # s is one row (a Series); return one CSS string per column (5 here).
    if s.B > 1.0:
        return ['background-color: yellow']*5
    else:
        return ['background-color: white']*5

In [70]:
df.style.apply(highlight_greaterthan_1, axis=1)

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
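
Hard-coding `5` ties the function to this particular DataFrame. A minimal sketch of a more general version follows; the name `highlight_rows_above` and its `column`, `threshold`, and `color` parameters are hypothetical, and unlike the original it leaves non-matching rows unstyled instead of painting them white.

```python
import pandas as pd

def highlight_rows_above(row, column='B', threshold=1.0, color='yellow'):
    """Return one CSS string per cell in `row`, highlighting the whole
    row when row[column] is greater than `threshold`."""
    css = 'background-color: {}'.format(color) if row[column] > threshold else ''
    # len(row) adapts to however many columns the DataFrame has.
    return [css] * len(row)

# Two made-up rows, represented the way apply(axis=1) would pass them.
row_hit = pd.Series({'A': 1.0, 'B': 1.33, 'C': 0.2})
row_miss = pd.Series({'A': 2.0, 'B': -1.07, 'C': -1.4})

print(highlight_rows_above(row_hit))
print(highlight_rows_above(row_miss))
```

Because `Styler.apply()` forwards extra keyword arguments to the function, this could be used as `df.style.apply(highlight_rows_above, column='B', threshold=1.0, axis=1)`.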


### Applying to entire Dataframe

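To style the whole table at once, pass `axis=None` to `apply()`. The function then receives the entire DataFrame and must return a DataFrame of CSS strings with the same index and columns. The version of `highlight_max` below replaces the earlier one and handles both the Series and the DataFrame case. Let's use it to highlight the maximum value in the entire DataFrame.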

In [65]:
def highlight_max(data, color='yellow'):
    '''
    highlight the maximum in a Series or DataFrame
    '''
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_max = data == data.max()
        return [attr if v else '' for v in is_max]
    else:  # from .apply(axis=None)
        is_max = data == data.max().max()
        return pd.DataFrame(np.where(is_max, attr, ''),
                            index=data.index, columns=data.columns)


In [66]:
df.style.apply(highlight_max, color='darkorange', axis=None)

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
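
To see what a table-wise style function actually returns, the sketch below runs the DataFrame branch of the logic on a tiny made-up frame, without a Styler. The name `highlight_table_max` and the frame `small` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def highlight_table_max(data, color='yellow'):
    """Return a DataFrame of CSS strings shaped like `data`, with the
    background set only where a cell equals the overall maximum."""
    attr = 'background-color: {}'.format(color)
    # data.max() gives one max per column; the second .max() reduces
    # those to a single value for the whole table.
    is_max = data == data.max().max()
    return pd.DataFrame(np.where(is_max, attr, ''),
                        index=data.index, columns=data.columns)

small = pd.DataFrame({'x': [1.0, 5.0], 'y': [np.nan, 2.0]}, index=['r0', 'r1'])
styles = highlight_table_max(small, color='darkorange')
print(styles)
```

The returned frame has the same index and columns as the input, which is how the Styler knows which CSS string belongs to which cell.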


### Built-in Styles

Several situations come up often enough that Pandas has given us pre-made styles.


**Highlight null**

In [71]:
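# Newer pandas versions (1.5 and later) name this keyword `color`;
# `null_color` is the older spelling used when this notebook was run.
df.style.highlight_null(null_color='red')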

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
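
For comparison, a hand-written element-wise function with roughly the same effect might look like the sketch below. This is an illustration of the idea, not how pandas implements `highlight_null`; the name `highlight_missing` is hypothetical.

```python
import numpy as np
import pandas as pd

def highlight_missing(val, color='red'):
    # Element-wise style: set a background only where pd.isna() is True.
    return 'background-color: {}'.format(color) if pd.isna(val) else ''

values = [1.0, np.nan, None, -0.3]
missing_styles = [highlight_missing(v) for v in values]
print(missing_styles)
# ['', 'background-color: red', 'background-color: red', '']
```

A function like this would be passed to `applymap()`, which is more typing than the built-in but leaves room for custom rules.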


**Heatmap styling**

`background_gradient()` requires matplotlib. Any matplotlib colormap works; here we build one with Seaborn's `light_palette()`, which looks nicer.

In [72]:
cm = sns.light_palette("green", as_cmap=True)

s = df.style.background_gradient(cmap=cm)
s

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 1 | 1.32921 | nan | -0.31628 | -0.99081 |
| 1 | 2 | -1.07082 | -1.43871 | 0.564417 | 0.295722 |
| 2 | 3 | -1.6264 | 0.219565 | 0.678805 | 1.88927 |
| 3 | 4 | 0.961538 | 0.104011 | -0.481165 | 0.850229 |
| 4 | 5 | 1.45342 | 1.05774 | 0.165562 | 0.515018 |
| 5 | 6 | -1.33694 | 0.562861 | 1.39285 | -0.063328 |
| 6 | 7 | 0.121668 | 1.2076 | -0.00204021 | 1.6278 |
| 7 | 8 | 0.354493 | 1.03753 | -0.385684 | 0.519818 |
| 8 | 9 | 1.68658 | -1.32596 | 1.42898 | -2.08935 |
| 9 | 10 | -0.12982 | 0.631523 | -0.586538 | 0.29072 |
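
The idea behind a heatmap style is to scale each column to the range 0 to 1 and map that position to a color. The sketch below is a deliberately simplified illustration using straight-line interpolation between two RGB colors. It is not how pandas or matplotlib implement `background_gradient()` (real colormaps are not necessarily linear in RGB), and the name `gradient_css` and its `low` and `high` parameters are assumptions.

```python
import pandas as pd

def gradient_css(s, low=(255, 255, 255), high=(0, 128, 0)):
    """Map each value in `s` to a background color between `low` and `high`
    (RGB tuples) by linear interpolation over the column's min..max range."""
    smin, smax = s.min(), s.max()
    span = smax - smin
    styles = []
    for v in s:
        # Leave missing values, and columns with no spread, unstyled.
        if pd.isna(v) or span == 0:
            styles.append('')
            continue
        t = float((v - smin) / span)  # 0.0 at the column min, 1.0 at the max
        rgb = [int(round(lo + t * (hi - lo))) for lo, hi in zip(low, high)]
        styles.append('background-color: #{:02x}{:02x}{:02x}'.format(*rgb))
    return styles

gradient = gradient_css(pd.Series([0.0, 5.0, 10.0]))
print(gradient)
```

Because it takes a Series and returns one CSS string per element, a function like this could be passed to `df.style.apply()` column by column.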


Many other possibilities are available. See the styling section of the Pandas user guide for more (https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html).