In [1]:
import pandas as pd

Let's start by creating a simple DataFrame of weekdays, monuments, and (imaginary) numbers of visitors and guides. We will then use the pivot function to rearrange the columns of the DataFrame and get a different view of our data.

In [14]:
data = {
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
}

tourist = pd.DataFrame(data)
tourist

   guides   monument  visitors weekday
0     100  taj mahal     14579     sat
1     125  taj mahal     29435     sun
2      75   red fort     12561     sat
3     100   red fort     14989     sun


In [15]:
tourist.pivot(index='weekday', columns='monument', values='visitors')

monument  red fort  taj mahal
weekday
sat          12561      14579
sun          14989      29435


In [16]:
# To pivot all the remaining columns, we can omit the values parameter
tourist.pivot(index='weekday', columns='monument')

           guides           visitors
monument red fort taj mahal red fort taj mahal
weekday
sat            75       100    12561     14579
sun           100       125    14989     29435
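
To see how the two pivot calls above relate, here is a minimal sketch that rebuilds the same `tourist` frame. The names `wide`, `visitors_from_wide` and `visitors_only` are only illustrative, and the order of the top-level columns may differ between pandas versions.

```python
import pandas as pd

tourist = pd.DataFrame({
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
})

# Pivot every remaining column: the result has two column levels.
wide = tourist.pivot(index='weekday', columns='monument')

# Selecting the top-level 'visitors' label drops that level ...
visitors_from_wide = wide['visitors']

# ... which should match the pivot that names values='visitors' directly.
visitors_only = tourist.pivot(index='weekday', columns='monument',
                              values='visitors')
same = visitors_from_wide.equals(visitors_only)

# A single cell is addressed with a (top level, monument) tuple.
sun_taj = wide.loc['sun', ('visitors', 'taj mahal')]
print(same, sun_taj)
```

Omitting `values` is convenient for a first look, while naming it keeps the column labels flat.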


#### Note: pivot requires each index/column pair to be unique in order to reshape the data

We will look at how to handle duplicate pairs using pivot_table in the next chapter.
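
The uniqueness requirement can be illustrated with a small sketch. The frame `dupes` and its extra row are hypothetical data invented for this example; the only assumption about pandas is that `pivot` raises a `ValueError` when a pair is repeated.

```python
import pandas as pd

# Hypothetical data: the pair ('sat', 'taj mahal') appears twice.
dupes = pd.DataFrame({
    'weekday': ['sat', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'taj mahal'],
    'visitors': [14579, 15000, 29435],
})

# duplicated() flags repeated (weekday, monument) pairs before we pivot.
has_duplicates = bool(dupes.duplicated(subset=['weekday', 'monument']).any())

try:
    dupes.pivot(index='weekday', columns='monument', values='visitors')
    pivot_failed = False
except ValueError as err:
    # pandas cannot decide which of the two values belongs in the cell.
    pivot_failed = True
    print(err)
```

Checking with `duplicated` first is one way to decide whether `pivot` is enough or whether the values need to be aggregated.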

## stacking / unstacking DataFrames

Hierarchical indexes can also be used to create a pivot-like dataset. We will start by building a DataFrame from our visitor data with two levels of indexing.

In [21]:
df = pd.DataFrame(data)
df = df.set_index(['weekday', 'monument'])
df.sort_index()

                   guides  visitors
weekday monument
sat     red fort       75     12561
        taj mahal     100     14579
sun     red fort      100     14989
        taj mahal     125     29435


### unstack

Now we will look at unstacking. It is useful when we have a long, thin dataset: unstacking makes it short and wide by moving a level of the row index into the columns.

In [22]:
# Now we can use the unstack method to get pivot-like data
df.unstack(level='monument')

           guides           visitors
monument red fort taj mahal red fort taj mahal
weekday
sat            75       100    12561     14579
sun           100       125    14989     29435


Unlike the pivot call that selected a single values column, this result has hierarchical columns. It matches the pivot we made without the values parameter.

We can also use an integer to indicate the level. 

In [23]:
df.unstack(level=1)

           guides           visitors
monument red fort taj mahal red fort taj mahal
weekday
sat            75       100    12561     14579
sun           100       125    14989     29435
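
As a quick check that the level name and the integer position refer to the same level, the following sketch rebuilds `df` and compares the results. The variable names are illustrative. It also relies on `unstack` defaulting to `level=-1`, the innermost index level, which here is `monument`.

```python
import pandas as pd

df = pd.DataFrame({
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
}).set_index(['weekday', 'monument'])

by_name = df.unstack(level='monument')
by_position = df.unstack(level=1)
by_default = df.unstack()  # level=-1, the innermost index level

print(by_name.equals(by_position), by_name.equals(by_default))

# Unstacking the other level moves weekday into the columns instead.
by_weekday = df.unstack(level='weekday')
print(by_weekday)
```

Using the level name tends to be more robust than the position if the index levels are later reordered.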


### stack

The opposite of unstack is stack. It converts a wide dataset into a long, thin one by moving a level of the column labels into the row index, which reduces the number of columns.

Another way to look at this is that unstacking gives pivot-like data, whereas stacking gives a grouped dataset.


In [32]:
print("\n\n Original Dataset")
print(df)

print("\n\n Pivot like dataset")
df2 = df.unstack(level=1)
print(df2)

print("\n\n Stacked dataset")
stacked = df2.stack(level='monument')
stacked



 Original Dataset
                   guides  visitors
weekday monument                   
sat     taj mahal     100     14579
sun     taj mahal     125     29435
sat     red fort       75     12561
sun     red fort      100     14989


 Pivot like dataset
           guides           visitors          
monument red fort taj mahal red fort taj mahal
weekday                                       
sat            75       100    12561     14579
sun           100       125    14989     29435


 Stacked dataset


                   guides  visitors
weekday monument
sat     red fort       75     12561
        taj mahal     100     14579
sun     red fort      100     14989
        taj mahal     125     29435


Suppose that instead of having weekday as the first level and monument as the second level, we want them the other way round. We can do this using the swaplevel method we saw earlier.

In [36]:
swapped = stacked.swaplevel(0,1)
swapped

                   guides  visitors
monument  weekday
red fort  sat          75     12561
taj mahal sat         100     14579
red fort  sun         100     14989
taj mahal sun         125     29435


In [37]:
# Now let's sort the index so that the rows are grouped by monument
swapped.sort_index()

                   guides  visitors
monument  weekday
red fort  sat          75     12561
          sun         100     14989
taj mahal sat         100     14579
          sun         125     29435
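
The swap can also be written with level names instead of positions. The sketch below rebuilds the two-level frame and compares three spellings; `reorder_levels` is included as an alternative that is not used elsewhere in this chapter, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
}).set_index(['weekday', 'monument'])

by_position = df.swaplevel(0, 1)
by_name = df.swaplevel('weekday', 'monument')
reordered = df.reorder_levels(['monument', 'weekday'])

print(by_position.equals(by_name), by_name.equals(reordered))

# swaplevel only relabels the index; sort_index() then groups the rows.
grouped = by_name.sort_index()
print(grouped)
```

With only two levels the three calls are interchangeable; `reorder_levels` becomes more useful when an index has three or more levels.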


In [45]:
# Note this set of transformations carefully
print("\n\n Original DataFrame with multilevel index")
print(df)
print("\n Index")
print(df.index)

print("\n\n Sorted DataFrame")
print( df.sort_index() )

print("\n\n Unstacking on monument")
print(df.unstack(level='monument'))

print("\n\n Stacking on monument")
df.unstack(level='monument').stack(level='monument')



 Original DataFrame with multilevel index
                   guides  visitors
weekday monument                   
sat     taj mahal     100     14579
sun     taj mahal     125     29435
sat     red fort       75     12561
sun     red fort      100     14989

 Index
MultiIndex(levels=[['sat', 'sun'], ['red fort', 'taj mahal']],
           labels=[[0, 1, 0, 1], [1, 1, 0, 0]],
           names=['weekday', 'monument'])


 Sorted DataFrame
                   guides  visitors
weekday monument                   
sat     red fort       75     12561
        taj mahal     100     14579
sun     red fort      100     14989
        taj mahal     125     29435


 Unstacking on monument
           guides           visitors          
monument red fort taj mahal red fort taj mahal
weekday                                       
sat            75       100    12561     14579
sun           100       125    14989     29435


 Stacking on monument


                   guides  visitors
weekday monument
sat     red fort       75     12561
        taj mahal     100     14579
sun     red fort      100     14989
        taj mahal     125     29435


Notice how, after unstacking and then stacking again, we get back the original DataFrame with a sorted index.
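
The round trip can be checked programmatically. This is a sketch rather than a guarantee: it assumes there are no missing combinations, since missing values could change the dtypes, and it realigns the columns before comparing because their order may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
}).set_index(['weekday', 'monument'])

# Move monument into the columns, then back into the row index.
round_trip = df.unstack(level='monument').stack(level='monument')

# Align the column order with the original before comparing.
round_trip = round_trip[df.columns]

matches_sorted = round_trip.equals(df.sort_index())
print(matches_sorted)
```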

## melting DataFrames

We can move multiple columns into a single column by melting them. This again produces a long table instead of a wide one. The melted columns are transformed into two columns: 'variable' and 'value'.


In [55]:
# Let's reset the multi-level index to get a flat dataset
flatDf = df.reset_index()
print(flatDf)


# Let's use melt to collapse all the columns into just two: 'variable' and 'value'
pd.melt(flatDf)

  weekday   monument  guides  visitors
0     sat  taj mahal     100     14579
1     sun  taj mahal     125     29435
2     sat   red fort      75     12561
3     sun   red fort     100     14989


   variable      value
0   weekday        sat
1   weekday        sun
2   weekday        sat
3   weekday        sun
4  monument  taj mahal
5  monument  taj mahal
6  monument   red fort
7  monument   red fort
8    guides        100
9    guides        125


Note that all the column labels are now in a column called 'variable' and the corresponding values are in the 'value' column.

This form is not very useful by itself. Let's specify which columns are our id columns so that only the remaining columns are collapsed. This does away with the dedicated columns for guides and visitors.

In [57]:
# Now we will melt the flat dataset to do away with dedicated columns for guides and visitors
# We mention which columns should not be collapsed using the 'id_vars' parameter 
pd.melt(flatDf, id_vars=['monument', 'weekday'])

    monument weekday  variable  value
0  taj mahal     sat    guides    100
1  taj mahal     sun    guides    125
2   red fort     sat    guides     75
3   red fort     sun    guides    100
4  taj mahal     sat  visitors  14579
5  taj mahal     sun  visitors  29435
6   red fort     sat  visitors  12561
7   red fort     sun  visitors  14989
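
The default 'variable' and 'value' labels can be replaced through the `var_name` and `value_name` parameters of `melt`. The sketch below uses the hypothetical labels 'measure' and 'amount', and then undoes the melt with the `set_index` and `unstack` steps covered earlier in this chapter.

```python
import pandas as pd

flat = pd.DataFrame({
    'weekday': ['sat', 'sun', 'sat', 'sun'],
    'monument': ['taj mahal', 'taj mahal', 'red fort', 'red fort'],
    'visitors': [14579, 29435, 12561, 14989],
    'guides': [100, 125, 75, 100],
})

# 'measure' and 'amount' are illustrative names, not pandas defaults.
long = pd.melt(flat, id_vars=['monument', 'weekday'],
               var_name='measure', value_name='amount')
print(long)

# Undo the melt: index by the id columns plus 'measure',
# then move 'measure' back into the columns.
restored = (long.set_index(['monument', 'weekday', 'measure'])['amount']
                .unstack(level='measure'))
print(restored)
```

The restored frame is indexed by monument and weekday, with one column each for guides and visitors.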
