### Reshaping dataframes

This part of the section shows different ways to reshape a datatable so that the same data can be viewed in different layouts.

* [Crosstab](#useful-crosstab)
* [Merge](#useful-merge)
* [Melt](#useful-melt)
* [Pivot](#useful-pivot)
* [Stack/Unstack](#useful-stack)


In [None]:
import pandas as pd

#### Crosstab <a class="anchor" id="useful-crosstab"></a>

Suppose we have a datatable with categorical attributes, and we wish to count the items in each combination of categories. Crosstab is the function to use in such a case.

For example, given below is a dataset of cars with categorical attributes. The categories describe the buying price of the car, its maintenance cost, boot space, etc. The full description can be found here: https://archive.ics.uci.edu/ml/datasets/Car+Evaluation


In [None]:
df_cars = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data', header=None)
df_cars.columns = ['Buying_Price', 'Maintenance', 'Doors', 'Persons', 'Boot_Space', 'Safety', 'Acceptability']

df_cars.sample(5)

In [None]:
df_cars.shape

In [None]:
# We want to see the number of cars across different price ranges and acceptability levels.
# Using crosstab, we can read off the numbers quite easily. For example, there are 39 low-priced cars
# that are also rated highly acceptable. margins=True adds the row and column totals ('All').


pd.crosstab( df_cars['Buying_Price'] , df_cars['Acceptability'], margins=True )
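
Counts are often easier to compare as proportions. The sketch below uses a small made-up table (the rows are hypothetical, not taken from the UCI data) to show the `normalize` argument of `pd.crosstab`, which rescales each row so that it sums to 1.

```python
import pandas as pd

# Hypothetical miniature of the cars table (made-up rows, not the UCI data).
df_demo = pd.DataFrame({
    'Buying_Price': ['low', 'low', 'high', 'high', 'low', 'med'],
    'Acceptability': ['acc', 'vgood', 'unacc', 'unacc', 'acc', 'acc'],
})

# Raw counts for every (price, acceptability) combination.
counts = pd.crosstab(df_demo['Buying_Price'], df_demo['Acceptability'])

# normalize='index' converts each row into proportions of that row's total.
shares = pd.crosstab(df_demo['Buying_Price'], df_demo['Acceptability'],
                     normalize='index')

print(counts)
print(shares)
```

Combinations that never occur in the data appear as 0 in the table rather than being left out.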

#### Merge (similar to join in SQL) <a class="anchor" id="useful-merge"></a>

In [None]:
import pandas as pd

df_bikes = pd.read_csv('misc/bike_price.csv')
df_type = pd.read_csv('misc/bike_type.csv')

In [None]:
# Inner join: keep only the rows whose TypeNumber appears in both tables.
pd.merge(df_bikes, df_type, on='TypeNumber', how='inner')
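
The `how` argument plays the role of the join type in SQL. The following is a minimal sketch with two made-up tables standing in for the CSV files above (the column names other than `TypeNumber`, and all values, are assumptions for illustration). It compares an inner join with a left join.

```python
import pandas as pd

# Hypothetical stand-ins for the bike price and bike type tables.
prices = pd.DataFrame({
    'Model': ['A1', 'B2', 'C3'],
    'TypeNumber': [1, 2, 9],      # type 9 has no entry in the types table
    'Price': [500, 750, 300],
})
types = pd.DataFrame({
    'TypeNumber': [1, 2, 3],
    'TypeName': ['Road', 'Mountain', 'Hybrid'],
})

# Inner join: only TypeNumbers present in both tables survive.
inner = pd.merge(prices, types, on='TypeNumber', how='inner')

# Left join: every row of prices is kept; unmatched rows get NaN for TypeName.
left = pd.merge(prices, types, on='TypeNumber', how='left')

print(inner)
print(left)
```

With these tables the inner join drops model `C3`, while the left join keeps it with a missing `TypeName`.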

#### Melt <a class="anchor" id="useful-melt"></a>

Melt is a function that reshapes the dataframe from wide to long format by converting column names into values.

In [None]:
temp_week = {
    'Channel': [ 'BT-TV' ,'CNN','BBC', 'Google'],
    'Mon': [26,26,27,25],
    'Tue': [25,26,27,25],
    'Wed': [27,26,27,25],
    'Thu': [29,28,28,28],
    'Fri': [26,26,27,26],
    'Sat': [26,24,27,25],
    'Sun': [23,23,23,22]
       
}
df = pd.DataFrame(data=temp_week)
df = df[['Channel', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']]  # fix the column order explicitly (older pandas versions sorted dict keys alphabetically)
df

In [None]:
df.set_index(keys='Channel')

In [None]:
temp_df = pd.melt(df, id_vars=['Channel'], var_name='Day', value_name='Temperature')
temp_df

#### Pivot <a class="anchor" id="useful-pivot"></a>

Pivot is the reverse of melt: it turns the values of one column back into column names.

In [None]:
temp_df.pivot(index='Channel', columns='Day', values='Temperature')
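
To check that pivot undoes melt, the sketch below runs a round trip on a small made-up table (the values are hypothetical). Note that `pivot` leaves `Channel` as the index and names the column axis `Day`, so both are reset before comparing with the starting table.

```python
import pandas as pd

# Hypothetical wide table: one row per channel, one column per day.
wide = pd.DataFrame({
    'Channel': ['BBC', 'CNN'],
    'Mon': [27, 26],
    'Tue': [27, 26],
})

# Wide -> long: the day column names become values in a 'Day' column.
long_df = pd.melt(wide, id_vars=['Channel'], var_name='Day',
                  value_name='Temperature')

# Long -> wide: the values in 'Day' become column names again.
back = long_df.pivot(index='Channel', columns='Day', values='Temperature')

# Move 'Channel' back to a regular column and drop the 'Day' axis label.
restored = back.reset_index().rename_axis(columns=None)

print(long_df)
print(restored)
```

The round trip is only exact here because the channel and day names already happen to be in sorted order; `pivot` sorts both the index and the new columns.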

In [None]:
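# pivot_table can be used to get aggregate measures, but the preferred way to do this is groupby.
# Passing the aggregation by name ('mean') avoids the deprecation warning that newer pandas
# versions raise for aggfunc=np.mean.
pd.pivot_table(temp_df, columns=['Day'], values='Temperature', aggfunc='mean')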
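
For comparison, here is a minimal sketch of the same aggregation written with `groupby`, on a small made-up long-format table (the values are hypothetical). It uses `index='Day'` in `pivot_table`, rather than `columns`, so that both results are laid out one row per day.

```python
import pandas as pd

# Hypothetical long-format table, shaped like the output of melt.
long_df = pd.DataFrame({
    'Channel': ['BBC', 'CNN', 'BBC', 'CNN'],
    'Day': ['Mon', 'Mon', 'Tue', 'Tue'],
    'Temperature': [27, 26, 27, 25],
})

# Mean temperature per day via pivot_table (returns a DataFrame).
by_pivot = pd.pivot_table(long_df, index='Day', values='Temperature',
                          aggfunc='mean')

# The same aggregation via groupby (returns a Series).
by_group = long_df.groupby('Day')['Temperature'].mean()

print(by_pivot)
print(by_group)
```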

#### Stack/Unstack <a class="anchor" id="useful-stack"></a>

These functions are similar to pivot and melt, except that they work on multi-level indexed tables. Let us suppose we have a dataframe with a multi-level index as shown below. We can convert the TypeNumber index level into columns using unstack.

In [None]:
df_bikes.sort_values(by=['TypeNumber'],inplace=True)
df_multi = df_bikes.set_index(['TypeNumber','Model'])
df_multi

In [None]:
df_unstacked = df_multi.unstack(level='TypeNumber')
df_unstacked

In [None]:
# One would expect stack to undo the unstack operation, but it does not quite do so.
# The reason is that stack() has its own rule for where the stacked level goes: it becomes the
# innermost index level, so the result is indexed by (Model, TypeNumber), not (TypeNumber, Model).
# Understanding the exact behaviour involves some info about multi-indexes which we have not discussed.

df_unstacked.stack('TypeNumber')
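
The following sketch reproduces the round trip on a small made-up table (models and prices are hypothetical) and then realigns the result with the original. It assumes two things about pandas behaviour: the cells that `unstack` could not fill become NaN, which turns the prices into floats, and whether `stack` keeps or drops those NaN rows depends on the pandas version, so they are dropped explicitly here.

```python
import pandas as pd

# Hypothetical bikes table with a two-level index (TypeNumber, Model).
bikes = pd.DataFrame({
    'TypeNumber': [1, 1, 2],
    'Model': ['A1', 'B2', 'C3'],
    'Price': [500, 750, 300],
}).set_index(['TypeNumber', 'Model'])

# TypeNumber moves from the index to the columns; missing cells become NaN.
unstacked = bikes.unstack(level='TypeNumber')

# TypeNumber moves back to the index, but as the innermost level.
restacked = unstacked.stack('TypeNumber')

# Drop NaN filler rows, swap the two levels back, and sort to match the original.
aligned = restacked.dropna().swaplevel().sort_index()

print(restacked.index.names)
print(aligned)
```

After `swaplevel` the index is again ordered as (TypeNumber, Model), although the prices remain floats.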