# Chapter 8: Join, Combine and Reshape

## Hierarchical indexing

Hierarchical indexing gives an axis two or more index levels, which lets you work with higher-dimensional data in a lower-dimensional form. For example:

In [3]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [3]:
data = pd.Series(np.random.randn(9), index=[['a','a','a','b','b','c','c','d','d'], [1,2,3,1,3,1,2,2,3]])
data

a  1   -0.431083
   2   -0.983507
   3   -2.188526
b  1    0.662896
   3   -0.432764
c  1   -0.665135
   2    0.992855
d  2    0.362436
   3    1.394354
dtype: float64

In [4]:
data['b']

1    0.662896
3   -0.432764
dtype: float64

In [5]:
data.unstack()

          1         2         3
a -0.431083 -0.983507 -2.188526
b  0.662896       NaN -0.432764
c -0.665135  0.992855       NaN
d       NaN  0.362436  1.394354


## Reordering and sorting levels

In [8]:
data.sort_index(level=1)

a  1   -0.431083
b  1    0.662896
c  1   -0.665135
a  2   -0.983507
c  2    0.992855
d  2    0.362436
a  3   -2.188526
b  3   -0.432764
d  3    1.394354
dtype: float64

In [11]:
data.swaplevel(0,1)

1  a   -0.431083
2  a   -0.983507
3  a   -2.188526
1  b    0.662896
3  b   -0.432764
1  c   -0.665135
2  c    0.992855
   d    0.362436
3  d    1.394354
dtype: float64

To move data between the columns and the index, use the `set_index` and `reset_index` methods. `set_index` removes one or more columns and turns them into index levels; `reset_index` does the inverse, moving index levels back into columns.

To keep the columns in the frame as well as in the index, pass `drop=False` to `set_index`.
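
A minimal sketch of that round trip follows. The frame and its column names (`a`, `b`, `c`) are made up for illustration and do not come from the chapter's data.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
frame = pd.DataFrame({'a': range(4),
                      'b': ['one', 'one', 'two', 'two'],
                      'c': [0, 1, 0, 1]})

# Move columns 'b' and 'c' out of the frame and into a two-level row index.
indexed = frame.set_index(['b', 'c'])

# Same index, but 'b' and 'c' also stay in the frame as ordinary columns.
kept = frame.set_index(['b', 'c'], drop=False)

# reset_index moves the index levels back into columns (placed first).
restored = indexed.reset_index()

print(list(indexed.columns))   # ['a']
print(list(kept.columns))      # ['a', 'b', 'c']
print(list(restored.columns))  # ['b', 'c', 'a']
```

Note that `reset_index` puts the former index levels in front, so the column order differs from the original frame even though the data is the same.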

## Combining and merging data sets

* `DataFrame.join` -> Like merge, but joins on the index by default
* `pandas.merge` -> Database-like merge; the `how` argument selects the join type:
    * inner
    * left
    * right
    * outer
* `pandas.concat` -> Concatenates (stacks) objects along an axis
* `combine_first` -> Splices overlapping data together (fills missing values in one object with values from another)
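
The four operations above can be sketched on two small frames. The frames `left` and `right`, the series `s1` and `s2`, and all their values are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; names and values are illustrative only.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

# merge: database-style join on a shared column.
inner = pd.merge(left, right, on='key', how='inner')  # keys present in both
outer = pd.merge(left, right, on='key', how='outer')  # union of all keys

# join: the same idea, but rows are matched on the index.
joined = left.set_index('key').join(right.set_index('key'), how='left')

# concat: stack the objects on top of each other along the row axis.
stacked = pd.concat([left, right], ignore_index=True)

# combine_first: take values from s1, falling back to s2 where s1 is missing.
s1 = pd.Series([1.0, np.nan, 3.0], index=['x', 'y', 'z'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['x', 'y', 'z'])
patched = s1.combine_first(s2)

print(inner)
print(outer)
print(joined)
print(stacked)
print(patched)
```

With `how='left'` or `how='outer'`, rows without a match on the other side get `NaN` in the columns that come from that side.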



## Pivoting long <-> wide formats


In [4]:
df = DataFrame({'key' : ['foo','bar','baz'],
              'A' : [1,2,3],
              'B' : [4,5,6],
              'C' : [7,8,9]})
df

   key  A  B  C
0  foo  1  4  7
1  bar  2  5  8
2  baz  3  6  9


In [8]:
melted = pd.melt(df, id_vars='key')
melted

   key variable  value
0  foo        A      1
1  bar        A      2
2  baz        A      3
3  foo        B      4
4  bar        B      5
5  baz        B      6
6  foo        C      7
7  bar        C      8
8  baz        C      9


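`melt` keeps `key` as an identifier column and puts the remaining columns in long form: each combination of `key` and original column name gets its own row, so the frame grows in rows rather than in columns. The reverse operation, `pivot`, spreads these rows back out into columns.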

In [9]:
# pandas 2.0 and later accept these arguments by keyword only
reshaped = melted.pivot(index='key', columns='variable', values='value')
reshaped


variable  A  B  C
key
bar       2  5  8
baz       3  6  9
foo       1  4  7
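
The pivoted result keeps `key` as the index and `variable` as the name of the column axis. One way to get back to a flat frame like the original is sketched below; it rebuilds the example frame so that it runs on its own, and the names `restored` and `expected` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

melted = pd.melt(df, id_vars='key')
reshaped = melted.pivot(index='key', columns='variable', values='value')

# Move 'key' from the index back into a column, then drop the leftover
# 'variable' label on the column axis.
restored = reshaped.reset_index()
restored.columns.name = None

# pivot sorts the rows by key, so compare against the sorted original.
expected = df.sort_values('key').reset_index(drop=True)
print(restored)
print(expected)
```

Apart from the row order, which `pivot` sorts by `key`, the two printed frames should hold the same data.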
