# Data Aggregation and group operations

## Index

1. Groupby mechanics
2. Iterating over groups
   - 2.1. Selecting a column or subset of columns
3. Data aggregation
   - 3.1. Column-wise and multiple function application
   - 3.2. Apply: General split-apply-combine
   - 3.3. Quantile and bucket analysis
   - 3.4. Example: filling missing values with group-specific values
   - 3.5. Pivot tables and cross-tabulations

## 1. Groupby mechanics

In [1]:
import pandas as pd
import numpy as np


In [6]:
df = pd.DataFrame({
    'data1': np.random.randn(5) * 50 + 20,
    'data2': np.random.randn(5) * 20 + 5,
    'key1': ['one', 'two', 'one', 'two', 'one'],
    'key2': list('aabba')
})
df

        data1      data2 key1 key2
0    5.895791  25.110625  one    a
1   -1.629359  -2.198525  two    a
2   83.233262  29.210019  one    b
3  -22.817568  -5.629852  two    b
4   22.142696  32.904751  one    a


A groupby in pandas is lazy: calling `groupby` only sets up the grouping and does not compute anything yet.

In [10]:
df.groupby('key1')

<pandas.core.groupby.DataFrameGroupBy object at 0x7f98a4b95b38>

It returns a DataFrameGroupBy object, which provides many aggregation methods (sum, mean, ...).
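To check what the GroupBy object holds before anything is aggregated, it can be inspected directly. A minimal sketch on a small made-up frame (the values are illustrative, not the random data of this notebook), using the standard GroupBy members `ngroups`, `groups` and `get_group`:

```python
import pandas as pd

# Small made-up frame with the same kind of keys as the example above
df = pd.DataFrame({
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'key1': ['one', 'two', 'one', 'two', 'one'],
})

gb = df.groupby('key1')  # no aggregation has been computed yet

print(gb.ngroups)           # number of distinct keys
print(gb.groups)            # mapping: key -> row labels of that group
print(gb.get_group('one'))  # the sub-DataFrame for a single key
```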

In [12]:
gb = df.groupby('key1')

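# numeric_only=True restricts the sum to data1 and data2; without it, pandas 2.0+
# also "sums" the string column key2 by concatenating its values
gb.sum(numeric_only=True)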

           data1      data2
key1
one   111.271749  87.225395
two   -24.446927  -7.828377


In [13]:
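# numeric_only=True: pandas 2.0+ raises a TypeError when averaging the string column key2
gb.mean(numeric_only=True)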

           data1      data2
key1
one    37.090583  29.075132
two   -12.223464  -3.914188


In [15]:
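# numeric_only=True: pandas 2.0+ raises a TypeError on the string column key2 otherwise
gb.std(numeric_only=True)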

          data1     data2
key1
one   40.778065  3.898814
two   14.982327  2.426315


In [17]:
# Group by any key we want (numeric_only=True skips the string column key1)

df.groupby('key2').mean(numeric_only=True)

          data1      data2
key2
a      8.803043  18.605617
b     30.207847  11.790083


In [20]:
# We can also group by several columns

df.groupby(['key1', 'key2']).sum()

                data1      data2
key1 key2
one  a      28.038487  58.015375
     b      83.233262  29.210019
two  a      -1.629359  -2.198525
     b     -22.817568  -5.629852


In [22]:
# Size (number of rows) of each group

df.groupby(['key1', 'key2']).size()

key1  key2
one   a       2
      b       1
two   a       1
      b       1
dtype: int64
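`size` counts the rows of each group, missing values included, whereas `count` counts only the non-missing values of a column. A sketch with a hypothetical NaN in `data1` shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in data1 (in group 'two')
df = pd.DataFrame({
    'key1': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, np.nan, 3.0, 4.0, 5.0],
})

gb = df.groupby('key1')
sizes = gb.size()             # rows per group, NaN rows included
counts = gb['data1'].count()  # non-missing data1 values per group
print(sizes)
print(counts)
```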

## 2. Iterating over groups



If we need something more detailed than a sum or a mean, we can iterate over the groups; each iteration yields the group key and the corresponding sub-DataFrame:

In [27]:
for name, group in df.groupby('key1'):
    print(name, group)

one        data1      data2 key1 key2
0   5.895791  25.110625  one    a
2  83.233262  29.210019  one    b
4  22.142696  32.904751  one    a
two        data1     data2 key1 key2
1  -1.629359 -2.198525  two    a
3 -22.817568 -5.629852  two    b


In [29]:
gb

<pandas.core.groupby.DataFrameGroupBy object at 0x7f98a4b95fd0>

In [31]:
list(gb)

[('one',        data1      data2 key1 key2
  0   5.895791  25.110625  one    a
  2  83.233262  29.210019  one    b
  4  22.142696  32.904751  one    a), ('two',        data1     data2 key1 key2
  1  -1.629359 -2.198525  two    a
  3 -22.817568 -5.629852  two    b)]

Converting the GroupBy object to a list forces the grouping and returns a list of (key, sub-DataFrame) tuples: one for 'one' and one for 'two'.
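Since iterating yields (key, sub-DataFrame) pairs, a common idiom is to collect the pieces in a dictionary keyed by group. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'key1': ['one', 'two', 'one', 'two', 'one'],
})

# Materialize the groups as a dict: key -> sub-DataFrame
pieces = dict(list(df.groupby('key1')))
print(pieces['two'])
```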

### 2.1. Selecting a column or subset of columns

In [33]:
df.groupby('key1').mean(numeric_only=True)  # numeric_only needed in pandas 2.0+ (key2 is text)

           data1      data2
key1
one    37.090583  29.075132
two   -12.223464  -3.914188


In [35]:
# If we only want the result for one column

df.groupby('key1')['data1'].mean()

key1
one    37.090583
two   -12.223464
Name: data1, dtype: float64
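Selecting with a single column name gives a SeriesGroupBy (the result is a Series), while a list of names gives a DataFrameGroupBy (the result is a DataFrame). Grouping the column directly by another Series is an equivalent spelling. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
    'key1': ['one', 'two', 'one', 'two', 'one'],
})

as_series = df.groupby('key1')['data1'].mean()       # Series
as_frame = df.groupby('key1')[['data1']].mean()      # one-column DataFrame
same_thing = df['data1'].groupby(df['key1']).mean()  # equivalent to as_series

print(type(as_series).__name__, type(as_frame).__name__)
```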

In [38]:
df.groupby(['key1', 'key2'])['data1'].mean()

key1  key2
one   a       14.019244
      b       83.233262
two   a       -1.629359
      b      -22.817568
Name: data1, dtype: float64

In [40]:
df.groupby(['key1', 'key2'])['data1'].mean().index

# MultiIndex

MultiIndex(levels=[['one', 'two'], ['a', 'b']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['key1', 'key2'])
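A Series with a MultiIndex like this can be reshaped into a two-way table with `unstack`, which moves the inner level (`key2`) into the columns. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'key1': ['one', 'two', 'one', 'two', 'one'],
    'key2': list('aabba'),
})

means = df.groupby(['key1', 'key2'])['data1'].mean()
table = means.unstack()  # rows: key1, columns: key2
print(table)
```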

## 3. Data aggregation

In [41]:
df.groupby('key1')['data1'].quantile(0.9)

key1
one    71.015149
two    -3.748180
Name: data1, dtype: float64

In [44]:
# We can also use our own functions instead of the built-in ones (such as quantile).

# An aggregation that is not predefined:

def peak_to_peak(series):
    return series.max() - series.min()


In [46]:
df.groupby('key1')['data1'].agg(peak_to_peak)

# The only requirement is that the function accepts a Series and reduces it to a single value

key1
one    77.337471
two    21.188209
Name: data1, dtype: float64
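Any callable that reduces a Series to one value works, including a lambda. Custom functions run as regular Python code once per group, so on large data they are usually slower than the optimized built-ins passed by name (such as `'max'` or `'mean'`). A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'key1': ['one', 'two', 'one', 'two', 'one'],
})

def peak_to_peak(series):
    return series.max() - series.min()

named = df.groupby('key1')['data1'].agg(peak_to_peak)
inline = df.groupby('key1')['data1'].agg(lambda s: s.max() - s.min())  # same result
print(named)
```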

In [47]:
# Download a data file (IPython shell command)

!wget https://github.com/wesm/pydata-book/raw/1st-edition/ch08/tips.csv 

--2018-05-25 18:40:32--  https://github.com/wesm/pydata-book/raw/1st-edition/ch08/tips.csv
Resolving github.com (github.com)... 192.30.253.113, 192.30.253.112
Connecting to github.com (github.com)|192.30.253.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv [following]
--2018-05-25 18:40:33--  https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.132.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.132.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7943 (7,8K) [text/plain]
Saving to: ‘tips.csv’


2018-05-25 18:40:33 (109 MB/s) - ‘tips.csv’ saved [7943/7943]



In [49]:
tips = pd.read_csv('tips.csv')
tips.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [51]:
tips.shape

(244, 7)

In [55]:
# Mean tip by sex

tips.groupby('sex')['tip'].mean()

sex
Female    2.833448
Male      3.089618
Name: tip, dtype: float64

In [56]:
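# This also works, although it averages every numeric column first and only
# then keeps tip (numeric_only=True skips the string columns in pandas 2.0+)

tips.groupby('sex').mean(numeric_only=True)['tip']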

sex
Female    2.833448
Male      3.089618
Name: tip, dtype: float64

In [58]:
# ...because the grouped mean of the whole table is a DataFrame
tips.groupby('sex').mean(numeric_only=True)

        total_bill       tip      size
sex
Female   18.056897  2.833448  2.459770
Male     20.744076  3.089618  2.630573


### 3.1. Column-wise and multiple function application

In [61]:
tips['tip_pct'] = tips['tip'] / tips['total_bill']

tips.head()

   total_bill   tip     sex smoker  day    time  size   tip_pct
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808


In [65]:
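# agg accepts a function name (string), a function, or a list of them.
# The numeric columns are selected first: pandas 2.0+ raises a TypeError
# when asked to average the string columns smoker, day and time

tips.groupby('sex')[['total_bill', 'tip', 'size', 'tip_pct']].agg('mean')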

        total_bill       tip      size   tip_pct
sex
Female   18.056897  2.833448  2.459770  0.166491
Male     20.744076  3.089618  2.630573  0.157651


In [69]:
tips.groupby(['sex'])['tip_pct'].agg(['mean','std', peak_to_peak])

# Computes these three statistics of tip_pct for each sex

            mean       std  peak_to_peak
sex
Female  0.166491  0.053632      0.360233
Male    0.157651  0.064778      0.674707


Exercise: compute the z-score of each tip percentage relative to its sex group.


In [76]:
# First compute the mean and standard deviation per sex

med_std_sex = tips.groupby(['sex'])['tip_pct'].agg(['mean','std'])
med_std_sex

            mean       std
sex
Female  0.166491  0.053632
Male    0.157651  0.064778


In [78]:
med_std_sex.columns

Index(['mean', 'std'], dtype='object')

In [82]:
# sex is not one of the columns: it is the index

med_std_sex_2 = med_std_sex.reset_index()

In [83]:
med_std_sex_2.columns

Index(['sex', 'mean', 'std'], dtype='object')

In [86]:
# Join with the original table (merge uses the common column, sex)

join_tips = tips.merge(med_std_sex_2)

join_tips.head()

   total_bill   tip     sex smoker  day    time  size   tip_pct      mean       std
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447  0.166491  0.053632
1       24.59  3.61  Female     No  Sun  Dinner     4  0.146808  0.166491  0.053632
2       35.26  5.00  Female     No  Sun  Dinner     4  0.141804  0.166491  0.053632
3       14.83  3.02  Female     No  Sun  Dinner     2  0.203641  0.166491  0.053632
4       10.33  1.67  Female     No  Sun  Dinner     3  0.161665  0.166491  0.053632


In [88]:
join_tips['Z-score'] = (join_tips['tip_pct'] - join_tips['mean'])/join_tips['std']
join_tips.head()

   total_bill   tip     sex smoker  day    time  size   tip_pct      mean       std    Z-score
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447  0.166491  0.053632  -1.995908
1       24.59  3.61  Female     No  Sun  Dinner     4  0.146808  0.166491  0.053632  -0.367005
2       35.26  5.00  Female     No  Sun  Dinner     4  0.141804  0.166491  0.053632  -0.460306
3       14.83  3.02  Female     No  Sun  Dinner     2  0.203641  0.166491  0.053632   0.692697
4       10.33  1.67  Female     No  Sun  Dinner     3  0.161665  0.166491  0.053632  -0.089978
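The same z-scores can be computed without building and merging a summary table: `transform` returns a result aligned with the original rows, so it combines directly with the column and the original row order is kept. A sketch on a hypothetical six-row version of `tips` (only the two columns needed):

```python
import pandas as pd

# Made-up mini version of the tips table
tips = pd.DataFrame({
    'sex': ['Female', 'Male', 'Female', 'Male', 'Female', 'Male'],
    'tip_pct': [0.10, 0.15, 0.20, 0.25, 0.30, 0.20],
})

grouped = tips.groupby('sex')['tip_pct']
# transform broadcasts each group's statistic back to that group's rows
tips['z_score'] = (tips['tip_pct'] - grouped.transform('mean')) / grouped.transform('std')
print(tips)
```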


In [None]:
# Another way, without reset_index: join on the index of the right-hand table

tips.merge(..., left_on = 'sex', right_index = True)
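A runnable sketch of that alternative, assuming the right-hand table is the per-sex `agg(['mean', 'std'])` result, so that `sex` lives in its index; the data is a made-up three-row sample:

```python
import pandas as pd

tips = pd.DataFrame({
    'sex': ['Female', 'Male', 'Female'],
    'tip_pct': [0.10, 0.15, 0.20],
})

stats = tips.groupby('sex')['tip_pct'].agg(['mean', 'std'])  # 'sex' is the index

# Match tips' 'sex' column against the index of stats: no reset_index needed
joined = tips.merge(stats, left_on='sex', right_index=True)
print(joined)
```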

In [90]:
# When applying several functions, we can give them aliases

# This is what we did before

tips.groupby(['sex'])['tip_pct'].agg(['mean','std'])

            mean       std
sex
Female  0.166491  0.053632
Male    0.157651  0.064778


In [91]:
tips.groupby(['sex'])['tip_pct'].agg([('media','mean'),('desv', 'std')])

           media      desv
sex
Female  0.166491  0.053632
Male    0.157651  0.064778
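Since pandas 0.25 the same renaming can also be written as named aggregation, where each keyword is an output column and its value an `(input column, function)` pair. A sketch on made-up data:

```python
import pandas as pd

tips = pd.DataFrame({
    'sex': ['Female', 'Male', 'Female', 'Male'],
    'tip_pct': [0.10, 0.20, 0.30, 0.40],
})

# Named aggregation: output_column=(input_column, function)
summary = tips.groupby('sex').agg(
    media=('tip_pct', 'mean'),
    desv=('tip_pct', 'std'),
)
print(summary)
```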


We can apply a different aggregation function to each column by passing a dictionary:

In [92]:
functions_to_use_to_aggregate = {
    'tip': ['mean', 'std', np.sum],
    'total_bill': 'sum'}

functions_to_use_to_aggregate

# Named this way because it will be passed to agg

{'tip': ['mean', 'std', <function numpy.core.fromnumeric.sum>],
 'total_bill': 'sum'}

In [96]:
multiple_aggregations = tips.groupby('smoker').agg(functions_to_use_to_aggregate)
multiple_aggregations

             tip                    total_bill
            mean       std     sum         sum
smoker
No      2.991854  1.377190  451.77     2897.43
Yes     3.008710  1.401468  279.81     1930.34


In [94]:
# Each column gets exactly the functions listed for it in the dictionary

In [98]:
multiple_aggregations['tip']['mean']

smoker
No     2.991854
Yes    3.008710
Name: mean, dtype: float64

In [100]:
multiple_aggregations[('tip','mean')]  # Also with a tuple

smoker
No     2.991854
Yes    3.008710
Name: (tip, mean), dtype: float64
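If the two-level column labels are inconvenient, one common approach is to join the levels into flat names. A sketch on a made-up frame (the `_` separator is an arbitrary choice):

```python
import pandas as pd

tips = pd.DataFrame({
    'smoker': ['No', 'Yes', 'No', 'Yes'],
    'tip': [1.0, 2.0, 3.0, 4.0],
    'total_bill': [10.0, 20.0, 30.0, 40.0],
})

result = tips.groupby('smoker').agg({'tip': ['mean', 'sum'], 'total_bill': 'sum'})
# Columns form a two-level MultiIndex such as ('tip', 'mean');
# joining the levels gives flat names such as 'tip_mean'
result.columns = ['_'.join(col) for col in result.columns]
print(result)
```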

### 3.2. Apply: General split-apply-combine

In [106]:
gb = tips.groupby('smoker')
gb

<pandas.core.groupby.DataFrameGroupBy object at 0x7f98a3eafcc0>

In [107]:
tips.head()

   total_bill   tip     sex smoker  day    time  size   tip_pct
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808


In [108]:
# Define a function that returns the n rows with the largest values in `column`

def top(df, n=2, column='tip'):
    return df.sort_values(by=column)[-n:]

top(tips)

     total_bill   tip   sex smoker  day    time  size   tip_pct
212       48.33   9.0  Male     No  Sat  Dinner     4  0.186220
170       50.81  10.0  Male    Yes  Sat  Dinner     3  0.196812


In [109]:
# Pass the function to apply: it runs on each group and the results are concatenated

gb = tips.groupby('smoker')
gb.apply(top)

            total_bill    tip   sex smoker  day    time  size   tip_pct
smoker
No     23        39.42   7.58  Male     No  Sat  Dinner     4  0.192288
       212       48.33   9.00  Male     No  Sat  Dinner     4  0.186220
Yes    183       23.17   6.50  Male    Yes  Sun  Dinner     4  0.280535
       170       50.81  10.00  Male    Yes  Sat  Dinner     3  0.196812


In [115]:
# How do we change the arguments?

gb = tips.groupby('smoker')
gb.apply(lambda df: top(df, n=5))  # With a lambda

# apply calls the function with each group as its only positional argument,
# so the lambda is one way to fix the extra argument n=5

            total_bill    tip     sex smoker   day    time  size   tip_pct
smoker
No     47        32.40   6.00    Male     No   Sun  Dinner     4  0.185185
       141       34.30   6.70    Male     No  Thur   Lunch     6  0.195335
       59        48.27   6.73    Male     No   Sat  Dinner     4  0.139424
       23        39.42   7.58    Male     No   Sat  Dinner     4  0.192288
       212       48.33   9.00    Male     No   Sat  Dinner     4  0.186220
Yes    211       25.89   5.16    Male    Yes   Sat  Dinner     4  0.199305
       181       23.33   5.65    Male    Yes   Sun  Dinner     2  0.242177
       214       28.17   6.50  Female    Yes   Sat  Dinner     3  0.230742
       183       23.17   6.50    Male    Yes   Sun  Dinner     4  0.280535
       170       50.81  10.00    Male    Yes   Sat  Dinner     3  0.196812


In [113]:
# More simply: extra keyword arguments passed to apply are forwarded to the function

gb.apply(top, n = 3)

            total_bill    tip     sex smoker  day    time  size   tip_pct
smoker
No     59        48.27   6.73    Male     No  Sat  Dinner     4  0.139424
       23        39.42   7.58    Male     No  Sat  Dinner     4  0.192288
       212       48.33   9.00    Male     No  Sat  Dinner     4  0.186220
Yes    214       28.17   6.50  Female    Yes  Sat  Dinner     3  0.230742
       183       23.17   6.50    Male    Yes  Sun  Dinner     4  0.280535
       170       50.81  10.00    Male    Yes  Sat  Dinner     3  0.196812
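For this particular `top` function, `apply` is not strictly needed: sorting once and taking `GroupBy.tail(n)` selects each group's n largest tips (up to how ties are broken) and keeps a flat index. A sketch on a made-up five-row frame:

```python
import pandas as pd

# Made-up frame: three non-smoker rows and two smoker rows
tips = pd.DataFrame({
    'smoker': ['No', 'Yes', 'No', 'Yes', 'No'],
    'tip': [1.0, 5.0, 3.0, 2.0, 4.0],
})

# Sort ascending by tip, then keep the last (largest) 2 rows of each group;
# tail preserves the order of the sorted frame
largest = tips.sort_values(by='tip').groupby('smoker').tail(2)
print(largest)
```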


#### Suppressing the group keys

In [118]:
tips.groupby('smoker', group_keys = True).apply(top)


            total_bill    tip   sex smoker  day    time  size   tip_pct
smoker
No     23        39.42   7.58  Male     No  Sat  Dinner     4  0.192288
       212       48.33   9.00  Male     No  Sat  Dinner     4  0.186220
Yes    183       23.17   6.50  Male    Yes  Sun  Dinner     4  0.280535
       170       50.81  10.00  Male    Yes  Sat  Dinner     3  0.196812


In [119]:
tips.groupby('smoker', group_keys = False).apply(top)


     total_bill    tip   sex smoker  day    time  size   tip_pct
23        39.42   7.58  Male     No  Sat  Dinner     4  0.192288
212       48.33   9.00  Male     No  Sat  Dinner     4  0.186220
183       23.17   6.50  Male    Yes  Sun  Dinner     4  0.280535
170       50.81  10.00  Male    Yes  Sat  Dinner     3  0.196812


This is useful for stratified sampling.
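Recent pandas versions (1.1 and later) also provide `GroupBy.sample`, which draws a fixed number of rows from each group directly. A sketch on made-up data:

```python
import pandas as pd

# Made-up frame: 6 non-smokers and 4 smokers
tips = pd.DataFrame({
    'smoker': ['No'] * 6 + ['Yes'] * 4,
    'tip': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Draw 2 rows from each smoker group; random_state makes it reproducible
sample = tips.groupby('smoker').sample(n=2, random_state=0)
print(sample)
```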

### 3.3. Quantile and bucket analysis

In [122]:
pd.cut?

# Bins values into discrete intervals

In [127]:
pd.cut(tips['total_bill'], 5).head()

# Bin the total_bill values into 5 equal-width intervals

0    (12.618, 22.166]
1     (3.022, 12.618]
2    (12.618, 22.166]
3    (22.166, 31.714]
4    (22.166, 31.714]
Name: total_bill, dtype: category
Categories (5, interval[float64]): [(3.022, 12.618] < (12.618, 22.166] < (22.166, 31.714] < (31.714, 41.262] < (41.262, 50.81]]

In [126]:
pd.cut(tips['total_bill'], 5).unique()

[(12.618, 22.166], (3.022, 12.618], (22.166, 31.714], (31.714, 41.262], (41.262, 50.81]]
Categories (5, interval[float64]): [(3.022, 12.618] < (12.618, 22.166] < (22.166, 31.714] < (31.714, 41.262] < (41.262, 50.81]]

In [None]:
# For each value, this tells us which bin it falls into

In [129]:
pd.cut(tips['total_bill'], range(0,30,5)).head()
# Edges 0, 5, ..., 25: bills above 25 fall outside every bin and become NaN

0    (15, 20]
1    (10, 15]
2    (20, 25]
3    (20, 25]
4    (20, 25]
Name: total_bill, dtype: category
Categories (5, interval[int64]): [(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25]]
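With explicit edges, values outside the covered range are not assigned to any interval and become NaN (intervals are closed on the right by default). A small sketch with made-up bills:

```python
import pandas as pd

bills = pd.Series([3.0, 12.0, 24.0, 31.0])

# Edges 0, 5, ..., 25 give the intervals (0, 5], (5, 10], ..., (20, 25]
binned = pd.cut(bills, range(0, 30, 5))
print(binned)  # 31.0 lies above the last edge, so its bin is NaN
```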

In [137]:
# Another method: bins based on quantiles

quantile_series = pd.qcut(tips['total_bill'], 10)

# qcut chooses the edges so that each interval holds roughly the same number of observations

In [138]:
# Number of observations per decile

tips.groupby(quantile_series).size()

total_bill
(3.069, 10.34]      26
(10.34, 12.636]     23
(12.636, 14.249]    24
(14.249, 16.222]    25
(16.222, 17.795]    24
(17.795, 19.818]    24
(19.818, 22.508]    25
(22.508, 26.098]    24
(26.098, 32.235]    24
(32.235, 50.81]     25
dtype: int64
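Passing `labels=False` to `qcut` returns the bin number instead of the interval, which is convenient as a grouping key. A sketch with eight made-up bills split into quartiles:

```python
import pandas as pd

bills = pd.Series([5.0, 8.0, 12.0, 15.0, 20.0, 25.0, 30.0, 45.0])

# Quartile number (0..3) for each value instead of interval labels
quartile = pd.qcut(bills, 4, labels=False)
print(bills.groupby(quartile).size())
```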

In [139]:
tips.groupby(quantile_series).mean(numeric_only=True)  # numeric_only needed in pandas 2.0+

                  total_bill       tip      size   tip_pct
total_bill
(3.069, 10.34]      8.828462  1.844615  1.923077  0.215923
(10.34, 12.636]    11.610870  1.886522  2.000000  0.162514
(12.636, 14.249]   13.330417  2.238333  2.000000  0.167743
(14.249, 16.222]   15.332400  2.415600  2.160000  0.158184
(16.222, 17.795]   16.880000  2.953750  2.416667  0.175093
(17.795, 19.818]   18.572917  2.953750  2.583333  0.159171
(19.818, 22.508]   20.971200  3.395600  2.600000  0.161685
(22.508, 26.098]   24.206667  3.715833  2.958333  0.153092
(26.098, 32.235]   28.842917  3.707500  3.458333  0.128327
(32.235, 50.81]    39.184000  4.851600  3.600000  0.123403


### 3.4. Example: filling missing values with group-specific values

In [141]:
provinces = ['M','Va', 'So', 'O', 'Ac', 'S']

groups = ['C', 'C', 'C', 'N', 'N', 'N']

df = pd.DataFrame(np.random.randn(6) * 1000000, index = provinces)
df

            0
M    51695.68
Va   110474.5
So   827366.0
O   -113016.9
Ac  1102069.0
S    225235.1


In [142]:
# Group according to the list of groups (one label per row)

df.groupby(groups).sum()

           0
C   989536.1
N  1214287.0


In [145]:
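# Set So, O and Ac to NaN; .iloc avoids chained assignment (df[0][2:5] = ...),
# which may not modify df under pandas copy-on-write
df.iloc[2:5, 0] = np.nan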
df

                0
M    51695.681206
Va  110474.464400
So            NaN
O             NaN
Ac            NaN
S   225235.119360


In [147]:
# Compute the mean of each group:

means = df.groupby(groups).mean()
means

               0
C   81085.072803
N  225235.119360


In [151]:
# Impute the missing values group by group

df.groupby(groups).apply(lambda df: df.fillna(df.mean()))

# Each NaN was filled with the mean of its own group; apply then concatenated
# the groups, which is why the group key appears as an extra index level

                  0
C M    51695.681206
  Va  110474.464400
  So   81085.072803
N O   225235.119360
  Ac  225235.119360
  S   225235.119360
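A variant that keeps the original row order and index: `transform('mean')` broadcasts each group's mean back to the rows of that group, and `fillna` replaces only the missing entries. A sketch with made-up values for the same provinces and groups:

```python
import numpy as np
import pandas as pd

provinces = ['M', 'Va', 'So', 'O', 'Ac', 'S']
groups = ['C', 'C', 'C', 'N', 'N', 'N']

# Made-up values; So, O and Ac are missing
s = pd.Series([1.0, 3.0, np.nan, np.nan, np.nan, 10.0], index=provinces)

# Group means (NaN ignored), aligned with the original index
group_means = s.groupby(groups).transform('mean')
filled = s.fillna(group_means)
print(filled)
```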


### 3.5. Pivot tables and cross-tabulations

In [153]:
tips.head()

   total_bill   tip     sex smoker  day    time  size   tip_pct
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808


In [156]:
# Pivot table (the default aggregation is the mean)

pivoted = tips.pivot_table(index = ['sex', 'smoker'])
pivoted

                   size       tip   tip_pct  total_bill
sex    smoker
Female No      2.592593  2.773519  0.156921   18.105185
       Yes     2.242424  2.931515  0.182150   17.977879
Male   No      2.711340  3.113402  0.160669   19.791237
       Yes     2.500000  3.051167  0.152771   22.284500


In [158]:
# Only some columns:

pivoted = tips.pivot_table(['tip','size'], index = ['sex', 'smoker'])
pivoted


                   size       tip
sex    smoker
Female No      2.592593  2.773519
       Yes     2.242424  2.931515
Male   No      2.711340  3.113402
       Yes     2.500000  3.051167
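The section title also mentions cross-tabulations: `pd.crosstab` counts how often each combination of two factors occurs, and `margins=True` adds row and column totals labelled `All`. The same counts can be obtained from `pivot_table` with `aggfunc='count'`. A sketch on made-up data:

```python
import pandas as pd

tips = pd.DataFrame({
    'sex': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'smoker': ['No', 'No', 'Yes', 'Yes', 'No'],
    'tip': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Frequency table: number of rows for each (sex, smoker) pair, plus totals
counts = pd.crosstab(tips['sex'], tips['smoker'], margins=True)

# Same counts (without totals) through pivot_table
counts_pt = tips.pivot_table(index='sex', columns='smoker', values='tip', aggfunc='count')
print(counts)
print(counts_pt)
```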


In [159]:
df.pivot_table?