# Pandas Reshaping

This notebook covers two pandas reshaping tools that most relational database management systems do not offer directly: stacking (`stack()` / `unstack()`) and pivot tables (`pivot_table()`).

In [1]:
import pandas as pd
import numpy as np

## Stacking

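The `stack()` method "compresses" a level in the DataFrame's columns: it moves the innermost column level into the row index. For a DataFrame with a single level of columns, the result is a Series.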

In [6]:
tuples = list(zip(*[
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]))

index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df

                     A         B
first second
bar   one     1.704238  0.541993
      two    -1.143423  0.583134
baz   one    -0.918292 -0.525323
      two     0.412571  0.107841
foo   one    -1.207709 -0.085743
      two     0.820842  0.384716
qux   one     0.465889 -0.380897
      two     0.012261  0.741264


In [7]:
df2 = df[:4]
df2

                     A         B
first second
bar   one     1.704238  0.541993
      two    -1.143423  0.583134
baz   one    -0.918292 -0.525323
      two     0.412571  0.107841


In [8]:
stacked = df2.stack()
stacked

first  second   
bar    one     A    1.704238
               B    0.541993
       two     A   -1.143423
               B    0.583134
baz    one     A   -0.918292
               B   -0.525323
       two     A    0.412571
               B    0.107841
dtype: float64

In [9]:
stacked.dtype

dtype('float64')
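
As a small check of what `stack()` does to the index, here is a minimal sketch on hand-picked values rather than random ones. The names `toy` and `stacked_toy` and the numbers are made up for illustration; they are not part of the notebook above.

```python
import pandas as pd

# Hypothetical toy data, chosen so the result is easy to verify by eye.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two")], names=["first", "second"]
)
toy = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 4.0]}, index=index)

stacked_toy = toy.stack()

# The column labels A/B become a third, unnamed index level,
# and the two-column DataFrame collapses into a single Series.
print(type(stacked_toy).__name__)             # Series
print(stacked_toy.index.nlevels)              # 3
print(stacked_toy.loc[("bar", "two", "B")])   # 4.0
```

Each value is now addressed by three labels (`first`, `second`, and the former column name) instead of two row labels plus a column.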

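The inverse operation is `unstack()`, which by default unstacks the last index level. Passing a level number (or level name) unstacks that level instead.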

In [18]:
stacked.unstack(0)

first          bar       baz
second
one    A  1.704238 -0.918292
       B  0.541993 -0.525323
two    A -1.143423  0.412571
       B  0.583134  0.107841


In [19]:
stacked.unstack(1)

second        one       two
first
bar    A  1.704238 -1.143423
       B  0.541993  0.583134
baz    A -0.918292  0.412571
       B -0.525323  0.107841
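
The relationship between `stack()` and `unstack()` can be sketched as a round trip. This is an illustration on made-up values (`toy`, `stacked_toy`, `round_trip`, `by_name` and `by_pos` are hypothetical names), and it assumes the data contains no missing values.

```python
import pandas as pd

# Hypothetical toy data with the same index layout as the example above.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
toy = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]}, index=index
)
stacked_toy = toy.stack()

# With no argument, unstack() moves the innermost index level
# (the former column labels A/B) back into the columns.
round_trip = stacked_toy.unstack()
print(round_trip.equals(toy))

# A level can be selected by name as well as by position.
by_name = stacked_toy.unstack("first")
by_pos = stacked_toy.unstack(0)
print(by_name.equals(by_pos))
```

Selecting levels by name tends to be easier to read than positional numbers once an index has more than two levels.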


## Pivot Tables

In [20]:
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})

In [21]:
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

C             bar       foo
A     B
one   A -0.686573  1.074451
      B  0.511225 -0.800466
      C  0.188460 -0.980315
three A -1.334193       NaN
      B       NaN  0.957545
      C -0.330885       NaN
two   A       NaN -2.260016
      B  0.994628       NaN
      C       NaN  0.629763
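
`pivot_table()` groups the rows by the `index` and `columns` keys and aggregates `values` within each group, using the mean by default. Combinations that never occur in the data come out as missing (`NaN`), which is why the table above has empty cells. The sketch below shows this on made-up data and then changes the aggregation with the `aggfunc` and `fill_value` parameters; `toy`, `mean_table` and `sum_table` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data: the (one, foo) pair occurs twice, (two, bar) never.
toy = pd.DataFrame({
    "A": ["one", "one", "one", "two"],
    "C": ["foo", "foo", "bar", "foo"],
    "D": [1.0, 3.0, 10.0, 5.0],
})

# Default aggregation: the mean of D within each (A, C) group.
mean_table = pd.pivot_table(toy, values="D", index="A", columns="C")
print(mean_table.loc["one", "foo"])            # 2.0
print(pd.isna(mean_table.loc["two", "bar"]))   # True

# aggfunc picks a different aggregation; fill_value replaces missing cells.
sum_table = pd.pivot_table(
    toy, values="D", index="A", columns="C", aggfunc="sum", fill_value=0
)
print(sum_table)
```

Whether to fill missing cells depends on the analysis: a zero is reasonable for sums and counts, but it can be misleading for means.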
