# Pivot Tables and Cross-Tabulation

A *pivot table* is a data summarization tool frequently found in spreadsheet programs and other data analysis software. It aggregates a table of data by one or more keys, arranging the data in a rectangle with some of the group keys along the rows and some along the columns. Pivot tables in pandas are made possible by the *groupby* facility described in this chapter combined with reshape operations utilizing hierarchical indexing. DataFrame has a *pivot_table* method, and additionally there is a top-level *pandas.pivot_table* function. In addition to providing a convenience interface to *groupby*, *pivot_table* can also add partial totals, also known as *margins*.
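As a minimal sketch of the two entry points mentioned above, the snippet below calls both the DataFrame method and the top-level function on a tiny made-up table. The frame `df` and its values are hypothetical stand-ins, not the tips data.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male"],
    "smoker": ["No", "Yes", "No", "No", "Yes"],
    "tip": [3.0, 2.0, 4.0, 2.0, 5.0],
})

# Method form: the DataFrame is the implicit first argument.
by_method = df.pivot_table(values="tip", index="sex", columns="smoker")

# Function form: the DataFrame is passed explicitly.
by_function = pd.pivot_table(df, values="tip", index="sex", columns="smoker")

print(by_method)
print(by_method.equals(by_function))
```

Both calls take the same keyword arguments, so which one to use is mostly a matter of style.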

Returning to the tipping data set, suppose I wanted to compute a table of group means (the default *pivot_table* aggregation type) arranged by *sex* and *smoker* on the rows:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [2]:
tips = pd.read_csv('../../CSV Files/O_Reilly/ch08/tips.csv')

In [3]:
tips.pivot_table(index=['sex', 'smoker'])

| sex | smoker | size | tip | total_bill |
|---|---|---|---|---|
| Female | No | 2.592593 | 2.773519 | 18.105185 |
| Female | Yes | 2.242424 | 2.931515 | 17.977879 |
| Male | No | 2.71134 | 3.113402 | 19.791237 |
| Male | Yes | 2.5 | 3.051167 | 22.2845 |
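A table of group means like this one can also be sketched with a plain *groupby*. The example below uses a small made-up frame (hypothetical values, with only numeric columns besides the keys) and checks that both routes give the same numbers.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male"],
    "smoker": ["No", "Yes", "No", "No", "Yes"],
    "size": [2, 3, 4, 2, 2],
    "tip": [3.0, 2.0, 4.0, 2.0, 5.0],
})

# pivot_table with only row keys: the default aggregation is the mean.
pivoted = df.pivot_table(index=["sex", "smoker"])

# The same group means computed directly with groupby.
grouped = df.groupby(["sex", "smoker"]).mean()

print(pivoted)
print(grouped)
```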


In [4]:
tips['tip_pct'] = tips['tip'] / tips['total_bill']

The table of group means above could have been easily produced using *groupby* directly. Now, suppose we want to aggregate only *tip_pct* and *size*, and additionally group by *day*. I'll put *smoker* in the table columns and *day* in the rows:

In [5]:
tips.pivot_table(['tip_pct', 'size'], index=['sex', 'day'],
                columns=['smoker'])

| sex | day | size (smoker: No) | size (smoker: Yes) | tip_pct (smoker: No) | tip_pct (smoker: Yes) |
|---|---|---|---|---|---|
| Female | Fri | 2.5 | 2.0 | 0.165296 | 0.209129 |
| Female | Sat | 2.307692 | 2.2 | 0.147993 | 0.163817 |
| Female | Sun | 3.071429 | 2.5 | 0.16571 | 0.237075 |
| Female | Thur | 2.48 | 2.428571 | 0.155971 | 0.163073 |
| Male | Fri | 2.0 | 2.125 | 0.138005 | 0.14473 |
| Male | Sat | 2.65625 | 2.62963 | 0.162132 | 0.139067 |
| Male | Sun | 2.883721 | 2.6 | 0.158291 | 0.173964 |
| Male | Thur | 2.5 | 2.3 | 0.165706 | 0.164417 |
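Placing a key on the columns corresponds roughly to grouping by all keys and then unstacking one of them. The following is a sketch of that equivalence on a made-up frame (hypothetical values), restricted to a single value column for brevity.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male", "Female"],
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes", "Yes", "No"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25, 0.15, 0.30],
})

# Row keys sex/day, column key smoker, default mean aggregation.
pivoted = df.pivot_table("tip_pct", index=["sex", "day"], columns="smoker")

# Group by all three keys, then move smoker from the row index to the columns.
reshaped = (
    df.groupby(["sex", "day", "smoker"])["tip_pct"]
    .mean()
    .unstack("smoker")
)

print(pivoted)
print(reshaped)
```

Combinations with no observations (here Female, Sat, Yes) show up as NaN in both results.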


This table could be augmented to include partial totals by passing *margins=True*. This has the effect of adding *All* row and column labels, with corresponding values being the group statistics for all the data within a single tier. In the example below, the *All* values are means without taking into account smoker vs. non-smoker (the *All* columns) or any of the two levels of grouping on the rows (the *All* row):

In [6]:
tips.pivot_table(['tip_pct', 'size'], index=['sex', 'day'],
                 columns='smoker', margins=True)

| sex | day | size (smoker: No) | size (smoker: Yes) | size (All) | tip_pct (smoker: No) | tip_pct (smoker: Yes) | tip_pct (All) |
|---|---|---|---|---|---|---|---|
| Female | Fri | 2.5 | 2.0 | 2.111111 | 0.165296 | 0.209129 | 0.199388 |
| Female | Sat | 2.307692 | 2.2 | 2.25 | 0.147993 | 0.163817 | 0.15647 |
| Female | Sun | 3.071429 | 2.5 | 2.944444 | 0.16571 | 0.237075 | 0.181569 |
| Female | Thur | 2.48 | 2.428571 | 2.46875 | 0.155971 | 0.163073 | 0.157525 |
| Male | Fri | 2.0 | 2.125 | 2.1 | 0.138005 | 0.14473 | 0.143385 |
| Male | Sat | 2.65625 | 2.62963 | 2.644068 | 0.162132 | 0.139067 | 0.151577 |
| Male | Sun | 2.883721 | 2.6 | 2.810345 | 0.158291 | 0.173964 | 0.162344 |
| Male | Thur | 2.5 | 2.3 | 2.433333 | 0.165706 | 0.164417 | 0.165276 |
| All | | 2.668874 | 2.408602 | 2.569672 | 0.159328 | 0.163196 | 0.160803 |
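To make the meaning of the *All* entries concrete, here is a small sketch on made-up data (hypothetical values). It illustrates that each margin is the aggregate over all underlying rows in that tier, which is generally not the same as averaging the cell means.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Male", "Male"],
    "smoker": ["No", "No", "Yes", "No", "Yes"],
    "tip": [1.0, 2.0, 6.0, 4.0, 2.0],
})

table = df.pivot_table("tip", index="sex", columns="smoker", margins=True)
print(table)

# Female/All is the mean of the three Female rows: (1 + 2 + 6) / 3 = 3.0,
# not the mean of the two cell means (1.5 and 6.0), which would be 3.75.
female_all = table.loc["Female", "All"]

# All/No ignores sex entirely: (1 + 2 + 4) / 3.
all_no = table.loc["All", "No"]

# All/All is the mean over every row: 15 / 5 = 3.0.
grand = table.loc["All", "All"]
```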


To use a different aggregation function, pass it to *aggfunc*. For example, *'count'* or *len* will give you a cross-tabulation (count or frequency) of group sizes:

In [7]:
tips.pivot_table('tip_pct', index=['sex', 'smoker'], columns='day',
                 aggfunc=len, margins=True)

| sex | smoker | day: Fri | day: Sat | day: Sun | day: Thur | All |
|---|---|---|---|---|---|---|
| Female | No | 2 | 13 | 14 | 25 | 54 |
| Female | Yes | 7 | 15 | 4 | 7 | 33 |
| Male | No | 2 | 32 | 43 | 20 | 97 |
| Male | Yes | 8 | 27 | 15 | 10 | 60 |
| All | | 19 | 87 | 76 | 62 | 244 |
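For frequency counts specifically, pandas also provides *pandas.crosstab*. The sketch below, on a made-up frame (hypothetical values), compares a counting pivot table with the corresponding *crosstab* call. Note that *'count'* counts non-missing values of the chosen column, whereas *len* counts rows; the two agree here because the made-up data has no missing values.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male"],
    "day": ["Fri", "Sat", "Fri", "Sat", "Sat"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25, 0.15],
})

# Group sizes via pivot_table with a counting aggregation.
counts = df.pivot_table("tip_pct", index="sex", columns="day",
                        aggfunc="count", margins=True)

# The same frequency table via crosstab, which needs no value column.
xtab = pd.crosstab(df["sex"], df["day"], margins=True)

print(counts)
print(xtab)
```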


If some combinations are empty (or otherwise NA), you may wish to pass a *fill_value*:

In [8]:
tips.pivot_table('size', index=['time', 'sex', 'smoker'],
                 columns='day', aggfunc='sum', fill_value=0)

| time | sex | smoker | day: Fri | day: Sat | day: Sun | day: Thur |
|---|---|---|---|---|---|---|
| Dinner | Female | No | 2 | 30 | 43 | 2 |
| Dinner | Female | Yes | 8 | 33 | 10 | 0 |
| Dinner | Male | No | 4 | 85 | 124 | 0 |
| Dinner | Male | Yes | 12 | 71 | 39 | 0 |
| Lunch | Female | No | 3 | 0 | 0 | 60 |
| Lunch | Female | Yes | 6 | 0 | 0 | 17 |
| Lunch | Male | No | 0 | 0 | 0 | 50 |
| Lunch | Male | Yes | 5 | 0 | 0 | 23 |
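The effect of *fill_value* is easiest to see side by side. This sketch uses a made-up frame (hypothetical values) in which several time/day combinations never occur.

```python
import pandas as pd

# Hypothetical miniature table (made-up values, not the real tips data).
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch"],
    "day": ["Sat", "Sun", "Thur"],
    "size": [2, 4, 3],
})

# Without fill_value, combinations that never occur are left as NaN.
sparse = df.pivot_table("size", index="time", columns="day", aggfunc="sum")

# With fill_value=0, those empty cells are replaced by 0.
filled = df.pivot_table("size", index="time", columns="day",
                        aggfunc="sum", fill_value=0)

print(sparse)
print(filled)
```

Whether 0 is a sensible fill depends on the aggregation: it reads naturally for sums and counts, but could be misleading for means.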


![pivot_table options](../../Pictures/pivot_table%20options.png)