In [1]:
import pandas as pd

# Creating a pivot table with the `pivot` and `pivot_table` methods
- A pivot table is useful for aggregating information from a table that repeats the same values many times
- It is a way to condense the data into a more compact form
- Let's load a dataset with this behavior: it repeats each salesman's sales across the dates:

In [2]:
salesmen = pd.read_csv("../data/salesmen.csv", parse_dates=["Date"])
salesmen.head()


,Date,Salesman,Revenue
0,2016-01-01,Bob,7172
1,2016-01-02,Bob,6362
2,2016-01-03,Bob,5982
3,2016-01-04,Bob,7917
4,2016-01-05,Bob,7837


- First, let's create a new DataFrame with the `pivot` method

In [3]:
pivot_df = salesmen.pivot(index="Date", columns="Salesman", values="Revenue")
pivot_df.head()

Salesman,Bob,Dave,Jeb,Oscar,Ronald
Date,,,,,
2016-01-01,7172,1864,4430,5250,2639
2016-01-02,6362,8278,8026,8661,4951
2016-01-03,5982,4226,5188,7075,2703
2016-01-04,7917,3868,3144,2524,4258
2016-01-05,7837,2287,938,2793,7771


- Notice that we now have a new table in which the salesmen have become columns. The hundreds of repeated rows are gone
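
One detail worth illustrating: `pivot` only reshapes, so it needs each (index, column) pair to appear at most once. The sketch below uses a small hypothetical table (`long_df` and its values are made up for illustration, not taken from `salesmen.csv`) to show what happens when a pair is duplicated, and how `pivot_table` handles it by aggregating.

```python
import pandas as pd

# Hypothetical long-format data: Bob has two rows for the same date.
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-01"]),
    "Salesman": ["Bob", "Bob", "Dave"],
    "Revenue": [100, 50, 70],
})

# `pivot` cannot decide which of Bob's two values to keep, so it raises.
try:
    long_df.pivot(index="Date", columns="Salesman", values="Revenue")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# `pivot_table` aggregates the duplicated pair instead (here with a sum).
wide_df = long_df.pivot_table(
    index="Date", columns="Salesman", values="Revenue", aggfunc="sum"
)
print(pivot_failed)
print(wide_df)
```

If the data is known to have unique pairs, as the salesmen data appears to, `pivot` is enough; otherwise `pivot_table` with an explicit `aggfunc` is the safer choice.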

## To illustrate `pivot_table`, let's load another dataset
- The idea is the same as in Excel: take the entries of the DataFrame and aggregate them

In [4]:
foods = pd.read_csv("../data/foods.csv")
foods.head()

FileNotFoundError: [Errno 2] No such file or directory: '../data/foods.csv'

- Let's check the average spend by gender:

In [7]:
foods.pivot_table(values="Spend", index="Gender", aggfunc="mean")

,Spend
Gender,
Female,50.709629
Male,49.397623
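
With only `index` and `values` set, this kind of aggregation gives the same numbers as a `groupby`. The sketch below checks that on a small hypothetical stand-in (`toy` and its values are assumptions for illustration, not rows from `foods.csv`).

```python
import pandas as pd

# Hypothetical stand-in for the foods data (values are made up).
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Item": ["Burger", "Sushi", "Burger", "Sushi"],
    "Spend": [10.0, 30.0, 20.0, 60.0],
})

# Average spend per gender, computed two ways.
by_pivot = toy.pivot_table(values="Spend", index="Gender", aggfunc="mean")
by_group = toy.groupby("Gender")[["Spend"]].mean()

print(by_pivot)
print(by_group)
```

`pivot_table` becomes more convenient than `groupby` once a `columns` argument is added, since it spreads one of the grouping keys across the columns in a single call.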


- Now, the total spend by gender:

In [8]:
foods.pivot_table(values="Spend", index="Gender", aggfunc="sum")

,Spend
Gender,
Female,25963.33
Male,24106.04


- We can easily change the index. For example, let's use `Item`:

In [11]:
foods.pivot_table(values="Spend", index="Item", aggfunc="sum")

,Spend
Item,
Burger,7765.73
Burrito,8270.44
Chalupa,7644.52
Donut,8758.76
Ice Cream,8886.99
Sushi,8742.93


- Or we can use multiple indexes:

In [12]:
foods.pivot_table(values="Spend", index=["Gender", "Item"], aggfunc="sum")

,,Spend
Gender,Item,
Female,Burger,4094.3
Female,Burrito,4257.82
Female,Chalupa,4152.26
Female,Donut,4743.0
Female,Ice Cream,4032.87
Female,Sushi,4683.08
Male,Burger,3671.43
Male,Burrito,4012.62
Male,Chalupa,3492.26
Male,Donut,4015.76


- Going further, we can compute the average spend by both gender and item, broken down by city:
    - Note that this is a complex query that pandas solves quickly and easily
    - `aggfunc` can take, for example: `mean`, `count`, `max`, `min`, `sum`...

In [14]:
foods.pivot_table(values="Spend", index=["Gender", "Item"], columns="City", aggfunc="mean")

,City,New York,Philadelphia,Stamford
Gender,Item,,,
Female,Burger,51.626667,52.87871,45.037778
Female,Burrito,42.563043,52.098571,53.532647
Female,Chalupa,46.135789,52.291563,64.094
Female,Donut,46.670323,54.642,48.734118
Female,Ice Cream,56.356296,46.225625,46.910455
Female,Sushi,47.75129,58.096,45.622187
Male,Burger,58.822273,44.675238,46.424516
Male,Burrito,55.976,43.764333,46.438929
Male,Chalupa,49.1108,48.444783,50.011304
Male,Donut,44.842333,37.859394,49.004483
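
The `aggfunc` values listed above can also be combined: passing a list computes several aggregations in one call. A minimal sketch, again on a hypothetical `toy` table with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for the foods data (values are made up).
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Spend": [10.0, 30.0, 20.0, 60.0],
})

# A list of function names yields one group of columns per function.
summary = toy.pivot_table(
    values="Spend", index="Gender", aggfunc=["mean", "sum", "count"]
)
print(summary)

# The columns form a MultiIndex of (function, value column) pairs.
total_spend = summary[("sum", "Spend")]
```

Selecting with a tuple such as `("sum", "Spend")` picks out a single aggregation from the combined result.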


- Going even further, we can use multiple columns:

In [15]:
foods.pivot_table(values="Spend", index=["Gender", "Item"], columns=["Frequency", "City"], aggfunc="mean")

,Frequency,Daily,Daily,Daily,Monthly,Monthly,Monthly,Never,Never,Never,Often,...,Once,Seldom,Seldom,Seldom,Weekly,Weekly,Weekly,Yearly,Yearly,Yearly
,City,New York,Philadelphia,Stamford,New York,Philadelphia,Stamford,New York,Philadelphia,Stamford,New York,...,Stamford,New York,Philadelphia,Stamford,New York,Philadelphia,Stamford,New York,Philadelphia,Stamford
Gender,Item,,,,,,,,,,,,,,,,,,,,,
Female,Burger,43.778333,77.226667,48.22,57.286667,53.7625,59.6225,97.89,54.7425,45.485,23.74,...,31.683333,31.58,58.435714,48.765,92.175,16.0,31.004,64.825,61.585,51.171667
Female,Burrito,44.89,53.595,39.126,40.913333,17.14,67.94,47.4325,63.716667,52.334286,34.533333,...,56.003333,83.77,49.5275,78.163333,13.23,31.41,46.182,35.63,38.916667,43.245
Female,Chalupa,43.19,23.49,95.7,79.185,72.49,80.99,35.15,30.4925,52.12,39.73,...,40.59,40.0,54.902,58.416667,42.88,28.136667,68.23,52.606667,56.048889,69.632
Female,Donut,39.841667,61.85,41.45,71.1325,50.25,45.86,56.07,72.263333,52.443333,32.6575,...,79.12,30.27,45.8125,34.886667,71.39,69.6,55.0075,62.95,58.41,56.12
Female,Ice Cream,65.5475,59.23,46.44,46.265,37.255,41.95,68.716667,39.0075,77.66,58.065,...,55.866,80.783333,50.775,58.865,56.905,47.546667,25.006,37.9175,39.965,15.24
Female,Sushi,40.535,58.088333,56.181429,46.58,78.71,54.195,69.33,47.645,19.56,49.134286,...,38.95,87.7,62.848,27.82,51.36125,55.666667,52.56,46.482,48.616667,45.3425
Male,Burger,63.892,37.566667,49.43,62.43,71.046667,13.58,90.32,8.655,,27.735,...,36.293333,75.226667,47.015,53.25,69.69,33.296667,77.5525,24.805,49.34,45.014
Male,Burrito,78.736667,41.44,69.0575,49.18,29.86,39.866667,28.926667,47.29,70.368,47.48,...,15.075,67.466667,27.71,9.84,64.185,48.208333,40.4625,55.175,59.255,32.83
Male,Chalupa,27.045,68.7025,48.16,66.752,45.35,57.293333,39.818,48.596,46.233333,62.88,...,,11.69,65.375,34.804,54.4,33.92,44.37,55.913333,34.405,58.095
Male,Donut,46.0,47.6775,64.71,45.9325,51.858,29.8825,43.926,26.825,54.91,46.6,...,27.978,16.25,33.003333,40.8275,37.22,38.6,62.254,35.775,22.305,16.52
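
Some cells in the table above are empty: with this many groups, not every combination has rows to aggregate, and pandas reports those as `NaN`. The sketch below reproduces that on a hypothetical `toy` table (made-up values) and shows the `fill_value` parameter of `pivot_table` as one possible way to replace them.

```python
import pandas as pd

# Hypothetical data: there is no row for a Male customer in Stamford.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male"],
    "City": ["New York", "Stamford", "New York"],
    "Spend": [10.0, 30.0, 20.0],
})

# The missing combination shows up as NaN.
sparse = toy.pivot_table(
    values="Spend", index="Gender", columns="City", aggfunc="mean"
)

# `fill_value` replaces the missing cells in the result.
filled = toy.pivot_table(
    values="Spend", index="Gender", columns="City", aggfunc="mean", fill_value=0
)
print(sparse)
print(filled)
```

Filling with 0 is only a display choice here: a 0 in a table of means can be misread as a real average, so leaving `NaN` is often the more honest option.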


# Using the `melt()` method
- It is essentially the inverse operation of `pivot()` (an aggregation done by `pivot_table()` cannot be undone)
- It takes data laid out in wide, pivoted form and turns it back into a long table

In [5]:
sales = pd.read_csv("../data/quarters.csv")
sales.head()

,Salesman,Q1,Q2,Q3,Q4
0,Boris,602908,233879,354479,32704
1,Bob,43790,514863,297151,544493
2,Tommy,392668,113579,430882,247231
3,Travis,834663,266785,749238,570524
4,Donald,580935,411379,110390,651572


- Basically, we will move the columns Q1 to Q4 into rows of the table:

In [6]:
melt_sales = pd.melt(sales, id_vars="Salesman")
melt_sales

,Salesman,variable,value
0,Boris,Q1,602908
1,Bob,Q1,43790
2,Tommy,Q1,392668
3,Travis,Q1,834663
4,Donald,Q1,580935
5,Ted,Q1,656644
6,Jeb,Q1,486141
7,Stacy,Q1,479662
8,Morgan,Q1,992673
9,Boris,Q2,233879


- Note that `variable` holds the names of the columns we melted and `value` holds the value each of them takes
- We can, of course, control their names:

In [20]:
melt_sales = pd.melt(sales, id_vars="Salesman", var_name="Quarter", value_name="Revenue")
melt_sales.head()

,Salesman,Quarter,Revenue
0,Boris,Q1,602908
1,Bob,Q1,43790
2,Tommy,Q1,392668
3,Travis,Q1,834663
4,Donald,Q1,580935
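
To see the inverse relationship with `pivot`, the sketch below melts a two-row, two-quarter excerpt of the table above and pivots it back. The names `wide`, `long` and `back` are introduced here for illustration, and `value_vars` is used to state explicitly which columns get melted.

```python
import pandas as pd

# Two salesmen and two quarters, copied from the table shown earlier.
wide = pd.DataFrame({
    "Salesman": ["Boris", "Bob"],
    "Q1": [602908, 43790],
    "Q2": [233879, 514863],
})

# Wide to long: one row per (Salesman, Quarter) pair.
long = pd.melt(
    wide,
    id_vars="Salesman",
    value_vars=["Q1", "Q2"],
    var_name="Quarter",
    value_name="Revenue",
)

# Long back to wide: the quarters become columns again.
back = long.pivot(index="Salesman", columns="Quarter", values="Revenue")
print(long)
print(back)
```

After the round trip `Salesman` is the index rather than a regular column, and the rows come back sorted by name; calling `back.reset_index()` restores it as a column.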
