In [23]:
import numpy as np, pandas as pd, seaborn as sns

# Pivot Tables

In this section we will use the classic Titanic dataset, available through the Seaborn library.

This contains a wealth of information on each passenger of that ill-fated voyage, including gender, age, class, fare paid, and much more.

In [24]:
Titanic=sns.load_dataset('titanic')
Titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


## Pivot Tables by Hand

Let's start by grouping the data by gender. Following the previous section, you might be tempted to apply the `groupby` method:

In [25]:
# for i in Titanic.groupby('sex'):
#     print(i[1])
Titanic.groupby('sex').describe()

Unnamed: 0_level_0,survived,survived,survived,survived,survived,survived,survived,survived,pclass,pclass,...,parch,parch,fare,fare,fare,fare,fare,fare,fare,fare
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
sex,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
female,314.0,0.742038,0.438211,0.0,0.0,1.0,1.0,1.0,314.0,2.159236,...,1.0,6.0,314.0,44.479818,57.997698,6.75,12.071875,23.0,55.0,512.3292
male,577.0,0.188908,0.391775,0.0,0.0,0.0,0.0,1.0,577.0,2.389948,...,0.0,5.0,577.0,25.523893,43.138263,0.0,7.8958,10.5,26.55,512.3292
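
The `describe()` output is wide; if only the survival rate matters, a single column can be selected before aggregating. The sketch below uses a small hand-made frame (`toy`, a hypothetical stand-in for the Titanic table, not real passenger data) so that the numbers are easy to check by eye.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 1, 0, 0, 1, 1],
})

# Select one column, then aggregate: one survival rate per sex
survival_by_sex = toy.groupby("sex")["survived"].mean()
print(survival_by_sex)
```

Because `survived` is coded as 0/1, its mean is the fraction of survivors in each group.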


Say we want to go deeper and study survival by both sex and class. We can do so as follows:

In [28]:
Titanic.groupby(['sex','class'])['survived'].aggregate('mean').unstack()


class,First,Second,Third
sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447
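
The chained call above does three things in a row: group by two keys, aggregate one column, and move the inner index level into the columns. The following sketch separates those steps on a hypothetical toy frame (`toy` and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "class":    ["First", "First", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 1, 1, 0, 0, 1, 0, 1],
})

# Steps 1 and 2: group by both keys and average 'survived'.
# The result is a Series indexed by a (sex, class) MultiIndex.
grouped = toy.groupby(["sex", "class"])["survived"].mean()

# Step 3: unstack() moves the innermost index level ('class') to the columns
table = grouped.unstack()
print(table)
```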


Although this gives a clear picture of how both sex and class affected survival, the code is cluttered. Pivot tables can solve this problem for us.

## Pivot Table Syntax

The previous code can be rewritten more readably with the `pivot_table` method of DataFrames:

In [29]:
Titanic.pivot_table('survived',index='sex',columns='class')

class,First,Second,Third
sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447
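
As a quick sanity check, the two approaches can be compared directly. This is a minimal sketch on a hypothetical toy frame (`toy` is made up for illustration); it only compares the numeric values and the row and column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "class":    ["First", "First", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 1, 1, 0, 0, 1, 0, 1],
})

by_hand = toy.groupby(["sex", "class"])["survived"].mean().unstack()
pivoted = toy.pivot_table("survived", index="sex", columns="class")

# Both tables hold the same numbers under the same labels
same = np.allclose(by_hand.to_numpy(), pivoted.to_numpy())
print(same)
```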


As expected, women and passengers in higher classes were more likely to survive: nearly all first-class women survived (about 97%), whereas third-class men had the lowest survival rate (about 14%).

### Multi-level Pivot Tables

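Just like `groupby`, the grouping in pivot tables can be specified with multiple levels. For example, we can look at age as a third dimension.

**Note**
We bin the ages with the pandas `cut` function; the `'Teen'` label here covers every passenger aged 18 or under, not only teenagers.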

In [61]:
age=pd.cut(Titanic['age'],[0,18,80],labels=['Teen','Adult'])
Titanic.pivot_table('survived',index=['sex',age],columns='class')

Unnamed: 0_level_0,class,First,Second,Third
sex,age,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
female,Teen,0.909091,1.0,0.511628
female,Adult,0.972973,0.9,0.423729
male,Teen,0.8,0.6,0.215686
male,Adult,0.375,0.071429,0.133663
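
To see how `pd.cut` assigns the labels, here is a small sketch with made-up ages. By default each bin is open on the left and closed on the right, so the edges `[0, 18, 80]` define the intervals (0, 18] and (18, 80], and missing ages stay missing.

```python
import numpy as np
import pandas as pd

# Made-up ages, including a boundary value (18.0) and a missing value
ages = pd.Series([0.5, 18.0, 18.5, 40.0, np.nan])

# Same bin edges and labels as in the cell above
binned = pd.cut(ages, [0, 18, 80], labels=["Teen", "Adult"])
print(binned)
```

A passenger aged exactly 18 therefore falls in the first bin, and passengers with no recorded age are left out of the pivot table because their bin is `NaN`.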


We can apply the same strategy when working with the columns as well; let's add info on the fare paid using `pd.qcut` to automatically compute quantiles:



In [67]:
fareBreakdown=pd.qcut(Titanic['fare'],q=2,labels=['Lower','Upper'])
Titanic.pivot_table('survived',index=['sex',age],columns=[fareBreakdown,'class'])

Unnamed: 0_level_0,fare,Lower,Lower,Lower,Upper,Upper,Upper
Unnamed: 0_level_1,class,First,Second,Third,First,Second,Third
sex,age,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
female,Teen,,1.0,0.714286,0.909091,1.0,0.318182
female,Adult,,0.88,0.444444,0.972973,0.914286,0.391304
male,Teen,,0.0,0.26087,0.8,0.818182,0.178571
male,Adult,0.0,0.098039,0.125,0.391304,0.030303,0.192308
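
Unlike `pd.cut`, which takes explicit bin edges, `pd.qcut` picks the edges from the data so that each bin holds roughly the same number of rows. With `q=2` the split point is the median. A minimal sketch with made-up fares:

```python
import pandas as pd

# Made-up fares; the median is (13.0 + 30.0) / 2 = 21.5
fares = pd.Series([7.25, 8.05, 13.0, 30.0, 53.1, 71.28])

# q=2 splits at the median into a lower and an upper half
halves = pd.qcut(fares, q=2, labels=["Lower", "Upper"])
print(halves)
```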


### Additional Pivot Table Options

The full call signature of `pivot_table` is as follows:
```python
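# call signature as of Pandas 0.18
# (`data` is the first argument of the top-level function; the
# `DataFrame.pivot_table` method takes the DataFrame itself instead)
pd.pivot_table(
    data,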
    values=None, 
    index=None, 
    columns=None,
    aggfunc='mean', 
    fill_value=None, 
    margins=False,
    dropna=True, 
    margins_name='All'
)
```

We have already seen examples of the first three arguments, `values`, `index`, and `columns` (four if you count `data`, which is the DataFrame itself). Let's look at some of the rest here.

The `aggfunc` keyword controls what type of aggregation is applied, which is the mean by default. It accepts either a string naming a common aggregation (`'sum'`, `'mean'`, `'count'`, `'min'`, `'max'`) or a function that implements one (`np.sum`, `min`, `max`, `sum`), passed without parentheses.
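
The sketch below passes `aggfunc` once as a string and once as a function. The frame `toy` and the helper `fare_range` are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "class":    ["First", "First", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 1, 1, 0, 0, 1, 0, 1],
    "fare":     [80.0, 90.0, 8.0, 7.5, 50.0, 60.0, 7.0, 9.0],
})

def fare_range(values):
    """Hypothetical helper: spread between the largest and smallest fare."""
    return values.max() - values.min()

# aggfunc as a string: number of survivors per cell
survivors = toy.pivot_table("survived", index="sex", columns="class", aggfunc="sum")

# aggfunc as a function object (note: no parentheses after the name)
spread = toy.pivot_table("fare", index="sex", columns="class", aggfunc=fare_range)

print(survivors)
print(spread)
```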

`aggfunc` can also be given as a dictionary that maps **each column to the function** to apply to it:

In [71]:
Titanic.pivot_table(index='sex',columns='class',aggfunc={'survived':'sum','fare':'mean'})

Unnamed: 0_level_0,fare,fare,fare,survived,survived,survived
class,First,Second,Third,First,Second,Third
sex,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
female,106.125798,21.970121,16.11881,91,70,72
male,67.226127,19.741782,12.661633,45,17,47
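
To make the shape of this result easier to follow, here is the same kind of call on a hypothetical toy frame (`toy` is made up for illustration). Only the columns named in the dictionary appear in the output, and they form the top level of the column index.

```python
import pandas as pd

# Hypothetical toy data; 'age' is deliberately not mentioned in aggfunc
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "class":    ["First", "First", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 1, 1, 0, 0, 1, 0, 1],
    "fare":     [80.0, 90.0, 8.0, 7.5, 50.0, 60.0, 7.0, 9.0],
    "age":      [30, 41, 22, 19, 54, 36, 25, 28],
})

summary = toy.pivot_table(index="sex", columns="class",
                          aggfunc={"survived": "sum", "fare": "mean"})

# The top column level names the aggregated column, the second the class
print(summary)

# Selecting one top-level key returns an ordinary sex-by-class table
print(summary["fare"])
```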


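Notice how we've omitted the `values` argument entirely. The dictionary already names the columns to aggregate alongside the functions themselves, so `values` is inferred from its keys.

At times it is useful to compute totals along each grouping, which can be done with the `margins` keyword: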

In [72]:
Titanic.pivot_table('survived',index='sex',columns='class',margins=True)

class,First,Second,Third,All
sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
female,0.968085,0.921053,0.5,0.742038
male,0.368852,0.157407,0.135447,0.188908
All,0.62963,0.472826,0.242363,0.383838
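
With the default `aggfunc='mean'`, each `All` entry is the mean over all rows in that group, not the average of the cell means next to it. The sketch below shows the difference on a deliberately unbalanced toy frame (`toy` is hypothetical, made up for illustration).

```python
import pandas as pd

# Hypothetical, unbalanced toy data: one first-class and three third-class women
toy = pd.DataFrame({
    "sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "class":    ["First", "Third", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 0, 1, 0, 0],
})

with_totals = toy.pivot_table("survived", index="sex", columns="class", margins=True)
print(with_totals)

# Pooled rate for women: 2 survivors out of 4 rows
pooled_female = toy.loc[toy["sex"] == "female", "survived"].mean()
print(pooled_female)
```

Here the female cell means are 1.0 and about 0.33, whose plain average would be about 0.67, while the `All` column reports the pooled rate of 0.5. The label of the totals row and column can be changed with `margins_name`.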
