# Pivoting & Melting DataFrames

## Pivoting DataFrames

Pivoting reshapes a dataframe so that relationships in the data are easier to see. The pandas `pivot` method lets you choose which column supplies the row labels (the `index` parameter) and which supplies the column labels (the `columns` parameter) of the new dataframe. Each combination of `index` and `columns` values must be unique, otherwise `pivot` raises an error.

You can also choose which of the remaining columns to use as values with the `values` parameter; if it is omitted, all remaining columns are used.

In [27]:
import pandas as pd
import numpy as np

In [30]:
users = pd.read_csv('./data/users.csv', index_col=0)
users

Unnamed: 0,weekday,city,visitors,signups
0,Sun,Austin,139,7
1,Sun,Dallas,237,12
2,Mon,Austin,326,3
3,Mon,Dallas,456,5


In [34]:
users.pivot(
    index='city', # variable used to index the rows
    columns='weekday' # variable used to index the columns
)

Unnamed: 0_level_0,visitors,visitors,signups,signups
weekday,Mon,Sun,Mon,Sun
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Austin,326,139,3,7
Dallas,456,237,5,12


In [35]:
users.pivot(
    index='weekday',
    columns='city'
)

Unnamed: 0_level_0,visitors,visitors,signups,signups
city,Austin,Dallas,Austin,Dallas
weekday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Mon,326,456,3,5
Sun,139,237,7,12


In [54]:
users.pivot(
    index='weekday',
    columns='city',
    values='visitors'
)

city,Austin,Dallas
weekday,Unnamed: 1_level_1,Unnamed: 2_level_1
Mon,326,456
Sun,139,237


In [58]:
people = users.set_index(['weekday', 'city'])
people = people.sort_index()
people

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,signups
weekday,city,Unnamed: 2_level_1,Unnamed: 3_level_1
Mon,Austin,326,3
Mon,Dallas,456,5
Sun,Austin,139,7
Sun,Dallas,237,12


`pivot` looks up the `index` and `columns` names among the dataframe's columns, so it will not work directly when those variables are stored in a multilevel (hierarchical) index.

In [62]:
try:
    people.pivot(index='city', columns='weekday', values='visitors')
except Exception as error:
    print('Error raised', error)

Error raised 'city'
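One way around this error, sketched here under the assumption that the index levels can simply be turned back into columns, is to call `reset_index` before `pivot`. The `users` data is rebuilt inline so the snippet runs on its own; `by_city` is just an illustrative name.

```python
import pandas as pd

# Same small dataset as `users`, rebuilt inline so the sketch is self-contained
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
})
people = users.set_index(['weekday', 'city']).sort_index()

# reset_index moves both index levels back into ordinary columns,
# so pivot can find 'city' and 'weekday' again
by_city = people.reset_index().pivot(
    index='city',
    columns='weekday',
    values='visitors'
)
print(by_city)
```

This gives the same table as pivoting the original `users` dataframe with `index='city'` and `columns='weekday'`.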


To reshape such dataframes we can **flatten** the index with the pandas `unstack` method, which moves one index level into the columns and produces a result similar to `pivot`.

In [84]:
flattened = people.unstack(level='city')
print(type(flattened))
print(flattened.index.name)
print(flattened.columns.names)
flattened

<class 'pandas.core.frame.DataFrame'>
weekday
[None, 'city']


Unnamed: 0_level_0,visitors,visitors,signups,signups
city,Austin,Dallas,Austin,Dallas
weekday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Mon,326,456,3,5
Sun,139,237,7,12


In [71]:
users.pivot(
    index='weekday',
    columns='city'
)

Unnamed: 0_level_0,visitors,visitors,signups,signups
city,Austin,Dallas,Austin,Dallas
weekday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Mon,326,456,3,5
Sun,139,237,7,12


In [72]:
flattened = people.unstack(level='weekday')
print(type(flattened))
flattened

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0_level_0,visitors,visitors,signups,signups
weekday,Mon,Sun,Mon,Sun
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Austin,326,139,3,7
Dallas,456,237,5,12


In [75]:
users.pivot(
    index='city',
    columns='weekday'
)

Unnamed: 0_level_0,visitors,visitors,signups,signups
weekday,Mon,Sun,Mon,Sun
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Austin,326,139,3,7
Dallas,456,237,5,12


In [76]:
people

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,signups
weekday,city,Unnamed: 2_level_1,Unnamed: 3_level_1
Mon,Austin,326,3
Mon,Dallas,456,5
Sun,Austin,139,7
Sun,Dallas,237,12


We can use the `swaplevel` method (together with `sort_index`) to swap the outer and inner index levels; it returns a new dataframe.

In [85]:
swapped = people.swaplevel().sort_index()
swapped

Unnamed: 0_level_0,Unnamed: 1_level_0,visitors,signups
city,weekday,Unnamed: 2_level_1,Unnamed: 3_level_1
Austin,Mon,326,3
Austin,Sun,139,7
Dallas,Mon,456,5
Dallas,Sun,237,12


## Pivot Tables

`pivot` requires each index/column value pair to be unique; otherwise you receive a duplicate entries error. In these cases use the pandas `pivot_table` method. Pivot tables carry out a reduction on the values that share the same pair using an aggregate function, which by default is the mean. Other aggregate functions can be selected with the `aggfunc` parameter.
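The difference can be seen on a tiny example. The `visits` dataframe below is hypothetical data made up for illustration, with the pair (`Sun`, `Austin`) deliberately repeated; it is a sketch of the behaviour described above, not part of the datasets used in this notebook.

```python
import pandas as pd

# Hypothetical data: the ('Sun', 'Austin') pair appears twice
visits = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Sun', 'Mon'],
    'city': ['Austin', 'Austin', 'Dallas', 'Austin'],
    'visitors': [100, 200, 237, 326],
})

# pivot cannot decide which of the two values to keep, so it raises
try:
    visits.pivot(index='weekday', columns='city', values='visitors')
except ValueError as error:
    print('pivot failed:', error)

# pivot_table aggregates the duplicates instead (mean by default)
mean_visits = visits.pivot_table(
    index='weekday', columns='city', values='visitors'
)

# aggfunc selects a different reduction, here a sum
total_visits = visits.pivot_table(
    index='weekday', columns='city', values='visitors', aggfunc='sum'
)
print(mean_visits)
print(total_visits)
```

Combinations with no rows at all, such as (`Mon`, `Dallas`) here, come out as missing values.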

In [111]:
titanic = pd.read_csv('./data/train.csv')
titanic.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [132]:
titanic.pivot_table(
    index='Sex',
    columns='Survived',
    values=['PassengerId'],
    aggfunc='count'
)

Unnamed: 0_level_0,PassengerId,PassengerId
Survived,0,1
Sex,Unnamed: 1_level_2,Unnamed: 2_level_2
female,81,233
male,468,109


In [134]:
titanic.pivot_table(
    index='Pclass',
    columns='Sex',
    values=['PassengerId'],
    aggfunc='count',
    margins=True # give a total value
)

Unnamed: 0_level_0,PassengerId,PassengerId,PassengerId
Sex,female,male,All
Pclass,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
1,94,122,216
2,76,108,184
3,144,347,491
All,314,577,891


## Melting DataFrames

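The pandas `melt` function allows you to **unpivot** a dataframe, turning columns into rows.

* use `id_vars` to specify the identifier columns that should remain as columns in the re-shaped dataframe.
* use `value_vars` to specify which columns should be unpivoted into rows; if omitted, all columns not listed in `id_vars` are used.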
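As a minimal sketch of the two parameters, the snippet below melts the `users` data (rebuilt inline so it runs on its own). The output column names `metric` and `count` are arbitrary choices for this illustration, passed through `var_name` and `value_name`.

```python
import pandas as pd

# Same small dataset as `users`, rebuilt inline so the sketch is self-contained
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
})

melted = pd.melt(
    users,
    id_vars=['weekday', 'city'],          # kept as identifier columns
    value_vars=['visitors', 'signups'],   # unpivoted into rows
    var_name='metric',                    # column holding the old column names
    value_name='count'                    # column holding the old cell values
)
print(melted)
```

Each original row contributes one output row per column listed in `value_vars`, so the four rows of `users` become eight.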

In [98]:
users

Unnamed: 0,weekday,city,visitors,signups
0,Sun,Austin,139,7
1,Sun,Dallas,237,12
2,Mon,Austin,326,3
3,Mon,Dallas,456,5


In [99]:
pivoted = users.pivot(index='weekday', columns='city')
pivoted

Unnamed: 0_level_0,visitors,visitors,signups,signups
city,Austin,Dallas,Austin,Dallas
weekday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Mon,326,456,3,5
Sun,139,237,7,12


In [109]:
try:
    pd.melt(pivoted, id_vars=['weekday', 'city'])
except Exception as error:
    print(error)

'weekday'
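The call fails because, after pivoting, `weekday` is the index and `city` is a level of the column labels, so neither is a column that `melt` can look up. A possible fix, sketched here for the simpler case where only one value column is pivoted (so the columns stay single-level), is to call `reset_index` first. The name `unpivoted` is illustrative.

```python
import pandas as pd

# Same small dataset as `users`, rebuilt inline so the sketch is self-contained
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
})

# Pivot a single value column so the result has ordinary single-level columns
pivoted = users.pivot(index='weekday', columns='city', values='visitors')

# reset_index turns the 'weekday' index back into a column that melt can find
unpivoted = pd.melt(
    pivoted.reset_index(),
    id_vars=['weekday'],
    var_name='city',         # the old column labels (Austin, Dallas)
    value_name='visitors'    # the old cell values
)
print(unpivoted)
```

The result holds the same `weekday`, `city` and `visitors` values as the original `users` dataframe, though the row order may differ.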
