Hierarchical indexing is an important feature of pandas that enables you to have multiple (two or more) index levels on an axis.

In [174]:
import pandas as pd
import numpy as np
data=pd.Series(np.random.randn(9),index=[['a','a','a','b','b','b','c','c','c'],[1,2,3,4,5,6,7,8,9]])
print(data)

a  1   -0.885389
   2   -0.975715
   3    2.167961
b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64


In [175]:
print(data.index)

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 4),
            ('b', 5),
            ('b', 6),
            ('c', 7),
            ('c', 8),
            ('c', 9)],
           )


In [176]:
data['b':'c']

b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64

In [177]:
data.loc[['b','c']]

b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64

In [178]:
data.loc[:,[2,5]]

a  2   -0.975715
b  5    1.488219
dtype: float64

In [179]:
data.loc[:,:]

a  1   -0.885389
   2   -0.975715
   3    2.167961
b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64

In [180]:
data.loc[['a','b'],:]

a  1   -0.885389
   2   -0.975715
   3    2.167961
b  4   -0.122855
   5    1.488219
   6    1.492552
dtype: float64

Hierarchical indexing plays an important role in reshaping data and in group-based operations such as forming a pivot table.

For example, you can rearrange the data into a DataFrame using its unstack method:

In [181]:
data

a  1   -0.885389
   2   -0.975715
   3    2.167961
b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64

In [182]:
data.unstack()

Unnamed: 0,1,2,3,4,5,6,7,8,9
a,-0.885389,-0.975715,2.167961,,,,,,
b,,,,-0.122855,1.488219,1.492552,,,
c,,,,,,,1.540954,1.007483,-0.41436


In [183]:
data.unstack().stack()

a  1   -0.885389
   2   -0.975715
   3    2.167961
b  4   -0.122855
   5    1.488219
   6    1.492552
c  7    1.540954
   8    1.007483
   9   -0.414360
dtype: float64

With a DataFrame, either axis can have a hierarchical index:

In [184]:
df=pd.DataFrame(np.arange(12).reshape(4,3),index=[['a','b','c','d'],[1,1,2,2]],columns=[['ankit','kiio','summi'],['ankit','kiio','kiio']])
print(df)

    ankit kiio summi
    ankit kiio  kiio
a 1     0    1     2
b 1     3    4     5
c 2     6    7     8
d 2     9   10    11


In [185]:
df.index.names=['key1','key2']
df.columns.names=['column1','column2']
print(df)

column1   ankit kiio summi
column2   ankit kiio  kiio
key1 key2                 
a    1        0    1     2
b    1        3    4     5
c    2        6    7     8
d    2        9   10    11


A MultiIndex can be created by itself and then reused. A two-level column index with the same level names as the preceding DataFrame (though with different second-level labels) can be created like this:

In [186]:
print(pd.MultiIndex.from_arrays([['ankit','kiio','summi'],['kiio','kiio','summi']],names=['column1','column2']))

MultiIndex([('ankit',  'kiio'),
            ( 'kiio',  'kiio'),
            ('summi', 'summi')],
           names=['column1', 'column2'])
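
As a small sketch of the reuse mentioned above, a MultiIndex built once with from_arrays can be passed as the columns of more than one DataFrame. The variable names `cols`, `first`, and `second` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Build the two-level column index once.
cols = pd.MultiIndex.from_arrays(
    [['ankit', 'kiio', 'summi'], ['ankit', 'kiio', 'kiio']],
    names=['column1', 'column2'],
)

# Reuse the same index object as the columns of two different frames.
first = pd.DataFrame(np.arange(6).reshape(2, 3), columns=cols)
second = pd.DataFrame(np.ones((2, 3)), columns=cols)

print(first.columns.equals(second.columns))
print(first['ankit'])  # partial selection on the outer column level
```

Building the index separately keeps the level names and labels in one place when several objects need to share them.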


In [187]:
print(df.swaplevel('key2','key1'))

column1   ankit kiio summi
column2   ankit kiio  kiio
key2 key1                 
1    a        0    1     2
     b        3    4     5
2    c        6    7     8
     d        9   10    11


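swaplevel takes two level numbers or names and returns a new object with the levels interchanged; the data is otherwise unaltered. sort_index, on the other hand, sorts the data by the values in a chosen level.
When swapping levels, it is common to also call sort_index so that the result is lexicographically sorted by the indicated level: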

In [188]:
df.sort_index(level=1)

column1   ankit kiio summi
column2   ankit kiio  kiio
key1 key2                 
a    1        0    1     2
b    1        3    4     5
c    2        6    7     8
d    2        9   10    11


Data selection performance is much better on hierarchically indexed objects if the index is lexicographically sorted starting with the outermost level, that is, the result of calling sort_index(level=0) or sort_index().

In [189]:
print(df.swaplevel(0, 1).sort_index(level=0))

column1   ankit kiio summi
column2   ankit kiio  kiio
key2 key1                 
1    a        0    1     2
     b        3    4     5
2    c        6    7     8
     d        9   10    11
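
One way to see whether an index is sorted in this sense is the `is_monotonic_increasing` property. The sketch below uses a hypothetical frame (not the `df` above, whose swapped index happens to be sorted already) in which swapping levels leaves the index unsorted until sort_index is called.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose inner level repeats within each outer label.
frame = pd.DataFrame(
    np.arange(8).reshape(4, 2),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=['x', 'y'],
)
frame.index.names = ['key1', 'key2']

swapped = frame.swaplevel('key1', 'key2')      # levels swapped, row order unchanged
sorted_frame = swapped.sort_index(level=0)     # sorted starting with the outermost level

print(swapped.index.is_monotonic_increasing)
print(sorted_frame.index.is_monotonic_increasing)
```

The first check reports an unsorted index because the rows still follow the original key1 order; the second reports a sorted one.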


In [190]:
# sum(level=...) is deprecated in recent pandas; group by the index level instead
print(df.groupby(level='key2').sum())

column1 ankit kiio summi
column2 ankit kiio  kiio
key2                    
1           3    5     7
2          15   17    19


In [191]:
frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),'c': ['one', 'one', 'one', 'two', 'two','two', 'two'],'d': [0, 1, 2, 0, 1, 2, 3]})
frame2 = frame.set_index(['c', 'd'],drop=False)
print(frame2)

       a  b    c  d
c   d              
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3


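set_index creates a new DataFrame using one or more of its columns as the index. By default the columns are removed from the DataFrame, but you can leave them in by passing drop=False, as in the cell above.

reset_index does the opposite of set_index; the hierarchical index levels are moved into the columns: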

In [192]:
frame2 = frame.set_index(['c', 'd'])
print(frame2.reset_index())

     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1


pandas.merge connects rows in DataFrames based on one or more keys. This will be familiar to users of SQL or other relational databases, as it implements database join operations.

pandas.concat concatenates or “stacks” together objects along an axis.

The combine_first instance method enables splicing together overlapping data to fill in missing values in one object with values from another.
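
Since combine_first is not demonstrated elsewhere in these notes, here is a minimal sketch with two hypothetical Series, `a` and `b`, whose indexes only partly overlap. Missing values in `a` are filled from `b` where `b` has a value, and the result covers the union of both indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'a' has gaps, 'b' overlaps it on labels 'b' and 'c'.
a = pd.Series([np.nan, 2.5, np.nan], index=['a', 'b', 'c'])
b = pd.Series([1.0, 2.0, 3.0], index=['b', 'c', 'd'])

# Values from 'a' take priority; 'b' only fills the holes and adds new labels.
filled = a.combine_first(b)
print(filled)
```

Label 'a' stays missing because neither object supplies a value for it, while 'c' is patched from `b` and 'd' is added from `b`.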

By default, merge performs an 'inner' join; the keys in the result are the intersection, or the common set found in both tables. Other possible options are 'left', 'right', and 'outer'.

The outer join takes the union of the keys, combining the effect of applying both left and right joins.

In [193]:
import pandas as pd
df1= pd.DataFrame({'key':['a','b','c','a','b'],'name':['ankit','kiioo','soumi','summi','samiksha']})
df1

Unnamed: 0,key,name
0,a,ankit
1,b,kiioo
2,c,soumi
3,a,summi
4,b,samiksha


In [194]:
df2= pd.DataFrame({'key':['a','b'],'Marks':[80,100]})
df2

Unnamed: 0,key,Marks
0,a,80
1,b,100


In [195]:
pd.merge(df1,df2)

Unnamed: 0,key,name,Marks
0,a,ankit,80
1,a,summi,80
2,b,kiioo,100
3,b,samiksha,100


In [196]:
pd.merge(df1,df2,how='inner')

Unnamed: 0,key,name,Marks
0,a,ankit,80
1,a,summi,80
2,b,kiioo,100
3,b,samiksha,100


In [197]:
pd.merge(df1,df2,how='left')

Unnamed: 0,key,name,Marks
0,a,ankit,80.0
1,b,kiioo,100.0
2,c,soumi,
3,a,summi,80.0
4,b,samiksha,100.0


In [198]:
pd.merge(df1,df2,how='right')

Unnamed: 0,key,name,Marks
0,a,ankit,80
1,a,summi,80
2,b,kiioo,100
3,b,samiksha,100


In [199]:
print(pd.merge(df1, df2, how='outer'))

  key      name  Marks
0   a     ankit   80.0
1   a     summi   80.0
2   b     kiioo  100.0
3   b  samiksha  100.0
4   c     soumi    NaN
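
To see which side each row of an outer join came from, merge accepts indicator=True, which adds a `_merge` column. The sketch below uses hypothetical frames `people` and `marks`; the extra key 'd' is an assumption added so that both one-sided cases show up.

```python
import pandas as pd

people = pd.DataFrame({'key': ['a', 'b', 'c', 'a', 'b'],
                       'name': ['ankit', 'kiioo', 'soumi', 'summi', 'samiksha']})
# Key 'd' exists only on the right side (illustrative).
marks = pd.DataFrame({'key': ['a', 'b', 'd'], 'Marks': [80, 100, 60]})

# '_merge' is 'both', 'left_only', or 'right_only' for each result row.
merged = pd.merge(people, marks, on='key', how='outer', indicator=True)
print(merged)

# Keys that appear on only one side of the join.
unmatched = merged.loc[merged['_merge'] != 'both', ['key', '_merge']]
print(unmatched)
```

Rows marked left_only are the ones a left join would keep with missing marks; rows marked right_only are the ones a right join would keep with missing names.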


To merge with multiple keys, pass a list of column names:

In [200]:
left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],'key2': ['one', 'two', 'one'],'lval': [1, 2, 3]})
left

Unnamed: 0,key1,key2,lval
0,foo,one,1
1,foo,two,2
2,bar,one,3


In [201]:
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],'key2': ['one', 'one', 'one', 'two'],'rval': [4, 5, 6, 7]})
right

Unnamed: 0,key1,key2,rval
0,foo,one,4
1,foo,one,5
2,bar,one,6
3,bar,two,7


In [202]:
pd.merge(left, right, on=['key1', 'key2'], how='outer')

Unnamed: 0,key1,key2,lval,rval
0,foo,one,1.0,4.0
1,foo,one,1.0,5.0
2,foo,two,2.0,
3,bar,one,3.0,6.0
4,bar,two,,7.0


In [205]:
pd.merge(left, right, on='key1', suffixes=('_left', '_right'))

Unnamed: 0,key1,key2_left,lval,key2_right,rval
0,foo,one,1,one,4
1,foo,one,1,one,5
2,foo,two,2,one,4
3,foo,two,2,one,5
4,bar,one,3,one,6
5,bar,one,3,two,7


In some cases, the merge key(s) in a DataFrame will be found in its index. In this case, you can pass left_index=True or right_index=True (or both) to indicate that the index should be used as the merge key:

In [206]:
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],'value': range(6)})
left1

Unnamed: 0,key,value
0,a,0
1,b,1
2,a,2
3,a,3
4,b,4
5,c,5


In [207]:
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
right1

Unnamed: 0,group_val
a,3.5
b,7.0


In [208]:
pd.merge(left1, right1, left_on='key', right_index=True, how='outer')

Unnamed: 0,key,value,group_val
0,a,0,3.5
2,a,2,3.5
3,a,3,3.5
1,b,1,7.0
4,b,4,7.0
5,c,5,


In [209]:
left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],index=['a', 'c', 'e'],columns=['Ohio', 'Nevada'])
left2

Unnamed: 0,Ohio,Nevada
a,1.0,2.0
c,3.0,4.0
e,5.0,6.0


In [210]:
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],index=['b', 'c', 'd', 'e'],columns=['Missouri', 'Alabama'])
right2

Unnamed: 0,Missouri,Alabama
b,7.0,8.0
c,9.0,10.0
d,11.0,12.0
e,13.0,14.0


In [211]:
pd.merge(left2, right2, how='outer', left_index=True, right_index=True)

Unnamed: 0,Ohio,Nevada,Missouri,Alabama
a,1.0,2.0,,
b,,,7.0,8.0
c,3.0,4.0,9.0,10.0
d,,,11.0,12.0
e,5.0,6.0,13.0,14.0


DataFrame has a convenient join instance method for merging by index. It can also be used to combine many DataFrame objects that have the same or similar indexes but non-overlapping columns. In the prior example, we could have written:

In [212]:
left2.join(right2, how='outer')

Unnamed: 0,Ohio,Nevada,Missouri,Alabama
a,1.0,2.0,,
b,,,7.0,8.0
c,3.0,4.0,9.0,10.0
d,,,11.0,12.0
e,5.0,6.0,13.0,14.0
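
The correspondence between join and an index-on-index merge can be checked directly. This sketch rebuilds `left2` and `right2` so that it runs on its own and compares the two results; on this data they are expected to match.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])

joined = left2.join(right2, how='outer')
merged = pd.merge(left2, right2, how='outer', left_index=True, right_index=True)

# DataFrame.equals treats NaNs in the same position as equal.
print(joined.equals(merged))
```

join is the shorter spelling when the keys are already in the index; merge is the more general tool when keys live in columns.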


In [213]:
left1

Unnamed: 0,key,value
0,a,0
1,b,1
2,a,2
3,a,3
4,b,4
5,c,5


In [214]:
right1

Unnamed: 0,group_val
a,3.5
b,7.0


In [215]:
left1.join(right1, on='key')

Unnamed: 0,key,value,group_val
0,a,0,3.5
1,b,1,7.0
2,a,2,3.5
3,a,3,3.5
4,b,4,7.0
5,c,5,


In [218]:
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],index=['a', 'c', 'e', 'f'],columns=['New York', 'Oregon'])
another

Unnamed: 0,New York,Oregon
a,7.0,8.0
c,9.0,10.0
e,11.0,12.0
f,16.0,17.0


In [219]:
left2.join([right2, another], how='outer')

Unnamed: 0,Ohio,Nevada,Missouri,Alabama,New York,Oregon
a,1.0,2.0,,,7.0,8.0
c,3.0,4.0,9.0,10.0,9.0,10.0
e,5.0,6.0,13.0,14.0,11.0,12.0
b,,,7.0,8.0,,
d,,,11.0,12.0,,
f,,,,,16.0,17.0


Another kind of data combination operation is referred to interchangeably as concatenation, binding, or stacking.

NumPy’s concatenate function can do this with NumPy arrays:

In [220]:
arr = np.arange(12).reshape((3, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [221]:
np.concatenate([arr, arr], axis=1)

array([[ 0,  1,  2,  3,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  4,  5,  6,  7],
       [ 8,  9, 10, 11,  8,  9, 10, 11]])

In [222]:
s1 = pd.Series([0, 1], index=['a', 'b'])
s1

a    0
b    1
dtype: int64

In [223]:
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s2

c    2
d    3
e    4
dtype: int64

In [224]:
s3 = pd.Series([5, 6], index=['f', 'g'])
s3

f    5
g    6
dtype: int64

In [225]:
pd.concat([s1, s2, s3])

a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64

In [226]:
pd.concat([s1, s2, s3],axis=1)

Unnamed: 0,0,1,2
a,0.0,,
b,1.0,,
c,,2.0,
d,,3.0,
e,,4.0,
f,,,5.0
g,,,6.0


In [227]:
s1 = pd.Series([0, 1], index=['a', 'b'])
s1

a    0
b    1
dtype: int64

In [228]:
s4 = pd.concat([s1, s3])
s4

a    0
b    1
f    5
g    6
dtype: int64

In [229]:
pd.concat([s1, s4], axis=1, join='inner')

Unnamed: 0,0,1
a,0,0
b,1,1


Suppose instead you wanted to create a hierarchical index on the concatenation axis. To do this, use the keys argument:

In [230]:
result = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])
result

one    a    0
       b    1
two    a    0
       b    1
three  f    5
       g    6
dtype: int64

In [231]:
result.unstack()

Unnamed: 0,a,b,f,g
one,0.0,1.0,,
two,0.0,1.0,,
three,,,5.0,6.0


In [232]:
pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])

Unnamed: 0,one,two,three
a,0.0,,
b,1.0,,
c,,2.0,
d,,3.0,
e,,4.0,
f,,,5.0
g,,,6.0


In [233]:
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],columns=['one', 'two'])
df1

Unnamed: 0,one,two
a,0,1
b,2,3
c,4,5


In [234]:
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],columns=['three', 'four'])
df2

Unnamed: 0,three,four
a,5,6
c,7,8


In [235]:
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])

  level1     level2     
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0


In [236]:
pd.concat({'level1': df1, 'level2': df2}, axis=1)

  level1     level2     
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0


In [237]:
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],names=['upper', 'lower'])

upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0


Hierarchical indexing provides a consistent way to rearrange data in a DataFrame. There are two primary actions:

stack: This “rotates” or pivots from the columns in the data to the rows.

unstack: This pivots from the rows into the columns.

Using the stack method on this data pivots the columns into the rows, producing a Series:

In [238]:
df1

Unnamed: 0,one,two
a,0,1
b,2,3
c,4,5


In [239]:
result = df1.stack()
result

a  one    0
   two    1
b  one    2
   two    3
c  one    4
   two    5
dtype: int64

By default, the innermost level is unstacked (the same is true of stack). You can unstack a different level by passing a level number or name:

In [240]:
result.unstack()

Unnamed: 0,one,two
a,0,1
b,2,3
c,4,5


In [241]:
result.unstack(0)

Unnamed: 0,a,b,c
one,0,2,4
two,1,3,5


In [242]:
result.unstack(1)

Unnamed: 0,one,two
a,0,1
b,2,3
c,4,5
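
The cells above select the level by number. As a sketch of selecting by name instead, the example below builds an equivalent Series whose index levels carry the illustrative names 'letter' and 'number' (these names are assumptions, not part of the original data).

```python
import numpy as np
import pandas as pd

# Same values as the stacked result above, but with named index levels.
idx = pd.MultiIndex.from_product([['a', 'b', 'c'], ['one', 'two']],
                                 names=['letter', 'number'])
stacked = pd.Series(np.arange(6), index=idx)

# Unstack the outer level by name; equivalent to stacked.unstack(0).
by_name = stacked.unstack('letter')
print(by_name)
```

Naming the levels makes the call self-describing and keeps it correct if the level order later changes.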


When you unstack a DataFrame, the level unstacked becomes the lowest level of the columns in the result:

In [243]:
df = pd.DataFrame({'left': result, 'right': result + 5},columns=pd.Index(['left', 'right'], name='side'))
df

side   left  right
a one     0      5
  two     1      6
b one     2      7
  two     3      8
c one     4      9
  two     5     10
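
The frame above is only the setup, so the following sketch carries out the unstack step itself. It rebuilds the same data in a self-contained way (the `idx` construction is an assumption standing in for `df1.stack()`) and moves the outer row level into the columns, where it ends up beneath the existing 'side' level.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['a', 'b', 'c'], ['one', 'two']])
result = pd.Series(np.arange(6), index=idx)
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

# Move the outer row level ('a', 'b', 'c') into the columns.
wide = df.unstack(0)
print(wide)
print(wide.columns.names)
```

The columns are now a two-level MultiIndex with 'side' on top and the unstacked labels as the innermost level.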


A common way to store multiple time series in databases and CSV files is the so-called long or stacked format.
Let’s load some example data and do a small amount of time series wrangling and other data cleaning:

In [244]:
data = pd.read_csv('macrodata.csv')
data.head(2)

Unnamed: 0.1,Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,3/31/1959,1959,1,2710.349,1707.4,286.898,470.045,1886.9,28.98,139.7,2.82,5.8,177.146,0.0,0.0
1,6/30/1959,1959,2,2778.801,1733.7,310.859,481.301,1919.7,29.15,141.7,3.08,5.1,177.83,2.34,0.74


In [245]:
data.columns

Index(['Unnamed: 0', 'year', 'quarter', 'realgdp', 'realcons', 'realinv',
       'realgovt', 'realdpi', 'cpi', 'm1', 'tbilrate', 'unemp', 'pop', 'infl',
       'realint'],
      dtype='object')

In [246]:
periods = pd.PeriodIndex(year=data.year, quarter=data.quarter,name='date')
periods

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='period[Q-DEC]', name='date', length=203)

In [247]:
columns = pd.Index(['realgdp', 'infl', 'unemp'], name='item')
columns

Index(['realgdp', 'infl', 'unemp'], dtype='object', name='item')

In [248]:
data = data.reindex(columns=columns)
data.head(2)

item,realgdp,infl,unemp
0,2710.349,0.0,5.8
1,2778.801,2.34,5.1


In [249]:
data.index = periods.to_timestamp('D', 'end')
data.head(2)

item                            realgdp  infl  unemp
date                                                
1959-03-31 23:59:59.999999999  2710.349  0.00    5.8
1959-06-30 23:59:59.999999999  2778.801  2.34    5.1


In [250]:
ldata = data.stack().reset_index().rename(columns={0: 'value'})
ldata[:10]

Unnamed: 0,date,item,value
0,1959-03-31 23:59:59.999999999,realgdp,2710.349
1,1959-03-31 23:59:59.999999999,infl,0.0
2,1959-03-31 23:59:59.999999999,unemp,5.8
3,1959-06-30 23:59:59.999999999,realgdp,2778.801
4,1959-06-30 23:59:59.999999999,infl,2.34
5,1959-06-30 23:59:59.999999999,unemp,5.1
6,1959-09-30 23:59:59.999999999,realgdp,2775.488
7,1959-09-30 23:59:59.999999999,infl,2.74
8,1959-09-30 23:59:59.999999999,unemp,5.3
9,1959-12-31 23:59:59.999999999,realgdp,2785.204


This is the so-called long format for multiple time series, or other observational data with two or more keys (here, our keys are date and item).       
Each row in the table represents a single observation.
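
For readers without macrodata.csv at hand, the same wide-to-long step can be sketched on a tiny hand-built table. The two rows reuse the first two observations shown above; the frame name `wide` and the plain dates are illustrative.

```python
import pandas as pd

# Small wide table: one row per date, one column per series.
wide = pd.DataFrame(
    {'realgdp': [2710.349, 2778.801], 'infl': [0.00, 2.34], 'unemp': [5.8, 5.1]},
    index=pd.Index(pd.to_datetime(['1959-03-31', '1959-06-30']), name='date'),
)
wide.columns.name = 'item'

# Stack the columns into rows, then turn both index levels back into columns.
long_df = wide.stack().reset_index().rename(columns={0: 'value'})
print(long_df)
```

Each (date, item) pair becomes one row, so two dates and three series give six observations.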

In [251]:
pivoted = ldata.pivot(index='date', columns='item', values='value')
pivoted

item                           infl    realgdp unemp
date                                                
1959-03-31 23:59:59.999999999  0.00   2710.349   5.8
1959-06-30 23:59:59.999999999  2.34   2778.801   5.1
1959-09-30 23:59:59.999999999  2.74   2775.488   5.3
1959-12-31 23:59:59.999999999  0.27   2785.204   5.6
1960-03-31 23:59:59.999999999  2.31   2847.699   5.2
...                             ...        ...   ...
2008-09-30 23:59:59.999999999 -3.16  13324.600   6.0
2008-12-31 23:59:59.999999999 -8.79  13141.920   6.9
2009-03-31 23:59:59.999999999  0.94  12925.410   8.1
2009-06-30 23:59:59.999999999  3.37  12901.504   9.2


By omitting the values argument, you obtain a DataFrame with hierarchical columns:

In [252]:
pivoted = ldata.pivot(index='date', columns='item')
print(pivoted[:5])

                              value                
item                           infl   realgdp unemp
date                                               
1959-03-31 23:59:59.999999999  0.00  2710.349   5.8
1959-06-30 23:59:59.999999999  2.34  2778.801   5.1
1959-09-30 23:59:59.999999999  2.74  2775.488   5.3
1959-12-31 23:59:59.999999999  0.27  2785.204   5.6
1960-03-31 23:59:59.999999999  2.31  2847.699   5.2


Note that pivot is equivalent to creating a hierarchical index using set_index followed by a call to unstack:

In [253]:
unstacked = ldata.set_index(['date', 'item']).unstack('item')
print(unstacked)

                              value                 
item                           infl    realgdp unemp
date                                                
1959-03-31 23:59:59.999999999  0.00   2710.349   5.8
1959-06-30 23:59:59.999999999  2.34   2778.801   5.1
1959-09-30 23:59:59.999999999  2.74   2775.488   5.3
1959-12-31 23:59:59.999999999  0.27   2785.204   5.6
1960-03-31 23:59:59.999999999  2.31   2847.699   5.2
...                             ...        ...   ...
2008-09-30 23:59:59.999999999 -3.16  13324.600   6.0
2008-12-31 23:59:59.999999999 -8.79  13141.920   6.9
2009-03-31 23:59:59.999999999  0.94  12925.410   8.1
2009-06-30 23:59:59.999999999  3.37  12901.504   9.2
2009-09-30 23:59:59.999999999  3.56  12990.341   9.6

[203 rows x 3 columns]
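
The equivalence can also be checked on a small self-contained example. The frame `long_df` below is hypothetical long-format data with the same three column names as `ldata`; the period-style date strings are illustrative.

```python
import pandas as pd

# Hypothetical long-format data with two keys, 'date' and 'item'.
long_df = pd.DataFrame({
    'date': ['1959Q1', '1959Q1', '1959Q2', '1959Q2'],
    'item': ['infl', 'unemp', 'infl', 'unemp'],
    'value': [0.00, 5.8, 2.34, 5.1],
})

via_pivot = long_df.pivot(index='date', columns='item', values='value')
via_unstack = long_df.set_index(['date', 'item'])['value'].unstack('item')

print(via_pivot)
print(via_pivot.equals(via_unstack))
```

Selecting the 'value' column before unstacking corresponds to passing values='value' to pivot; without it, both routes produce hierarchical columns.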


An inverse operation to pivot for DataFrames is pandas.melt. Rather than transforming one column into many in a new DataFrame, it merges multiple columns into one, producing a DataFrame that is longer than the input. Let’s look at an example:

In [256]:
import pandas as pd
import numpy as np

# Create sample DataFrame
data = {
    'side': ['left', 'right', 'top', 'bottom'],
    'length': [10, 12, 8, 15],
    'width': [5, 6, 4, 7],
    'height': [3, 4, 2, 5]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print("\n" + "="*50 + "\n")

# Perform the melt operation
melted = pd.melt(df, id_vars=['side'])
print("Melted DataFrame:")
print(melted)

Original DataFrame:
     side  length  width  height
0    left      10      5       3
1   right      12      6       4
2     top       8      4       2
3  bottom      15      7       5


Melted DataFrame:
      side variable  value
0     left   length     10
1    right   length     12
2      top   length      8
3   bottom   length     15
4     left    width      5
5    right    width      6
6      top    width      4
7   bottom    width      7
8     left   height      3
9    right   height      4
10     top   height      2
11  bottom   height      5


Using pivot, we can reshape back to the original layout:

In [258]:
reshaped = melted.pivot(index='side', columns='variable', values='value')
print(reshaped)

variable  height  length  width
side                           
bottom         5      15      7
left           3      10      5
right          4      12      6
top            2       8      4


Because pivot creates an index from the column used as the row labels, we may want to use reset_index to move the data back into a column:

In [259]:
print(reshaped.reset_index())

variable    side  height  length  width
0         bottom       5      15      7
1           left       3      10      5
2          right       4      12      6
3            top       2       8      4


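You can also name the columns to use as value columns explicitly with value_vars. Here all three measurement columns are listed, so the result matches the earlier melt; listing fewer would restrict the output to that subset: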

In [262]:
print(pd.melt(df, id_vars=['side'], value_vars=['length', 'width','height']))


      side variable  value
0     left   length     10
1    right   length     12
2      top   length      8
3   bottom   length     15
4     left    width      5
5    right    width      6
6      top    width      4
7   bottom    width      7
8     left   height      3
9    right   height      4
10     top   height      2
11  bottom   height      5
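
For comparison, here is a sketch in which value_vars really is a subset. The frame is rebuilt so the example runs on its own; only 'length' and 'width' are melted, and 'height' does not appear in the result.

```python
import pandas as pd

df = pd.DataFrame({
    'side': ['left', 'right', 'top', 'bottom'],
    'length': [10, 12, 8, 15],
    'width': [5, 6, 4, 7],
    'height': [3, 4, 2, 5],
})

# Only 'length' and 'width' are melted; 'height' is left out of the result.
subset = pd.melt(df, id_vars=['side'], value_vars=['length', 'width'])
print(subset)
```

The result has eight rows (four sides times two measured columns) instead of twelve.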


pandas.melt can also be used without any identifier columns:

In [263]:
pd.melt(df, value_vars=['side', 'length', 'width','height'])

Unnamed: 0,variable,value
0,side,left
1,side,right
2,side,top
3,side,bottom
4,length,10
5,length,12
6,length,8
7,length,15
8,width,5
9,width,6
