# Chapter 30: Melting, Transposing, and Stacking Data

## 30.1 Melting Data

There are two ways to organize the same data:
- wide/stacked/record
- long/tidy

The following table is an example of the wide format:

| name | age | test1 | test2 | teacher |
| --- | --- | --- | --- | --- |
| Adam | 15 | 95 | 80 | Ashby |
| Bob | 16 | 81 | 82 | Ashby |
| Dave | 16 | 89 | 84 | Jones |
| Fred | 16 | NaN | 88 | Jones |

In a long format, each row contains a single fact. The long version of our scores looks like this:

| name | age | test | score |
| --- | --- | --- | --- |
| Adam | 15 | test1 | 95 |
| Bob  | 16 | test1 | 81 |
| Dave | 16 | test1 | 89 |
| Fred | 16 | test1 | NaN |
| Adam | 15 | test2 | 80 |
| Bob  | 16 | test2 | 82 |
| Dave | 16 | test2 | 84 |
| Fred | 16 | test2 | 88 |

In [1]:
import pandas as pd
scores = pd.DataFrame({'name': ['Adam', 'Bob', 'Dave', 'Fred'],
                       'age': [15, 16, 16, 16],
                       'test1': [95, 81, 89, None],
                       'test2': [80, 82, 84, 88],
                       'teacher': ['Ashby', 'Ashby', 'Jones', 'Jones']})

In [2]:
scores

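|  | name | age | test1 | test2 | teacher |
| --- | --- | --- | --- | --- | --- |
| 0 | Adam | 15 | 95.0 | 80 | Ashby |
| 1 | Bob | 16 | 81.0 | 82 | Ashby |
| 2 | Dave | 16 | 89.0 | 84 | Jones |
| 3 | Fred | 16 | NaN | 88 | Jones |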


- We keep the name and age as dimensions and pull out the test scores as facts.

In [3]:
scores.melt(id_vars=['name', 'age'],
            value_vars=['test1', 'test2'])

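|  | name | age | variable | value |
| --- | --- | --- | --- | --- |
| 0 | Adam | 15 | test1 | 95.0 |
| 1 | Bob | 16 | test1 | 81.0 |
| 2 | Dave | 16 | test1 | 89.0 |
| 3 | Fred | 16 | test1 | NaN |
| 4 | Adam | 15 | test2 | 80.0 |
| 5 | Bob | 16 | test2 | 82.0 |
| 6 | Dave | 16 | test2 | 84.0 |
| 7 | Fred | 16 | test2 | 88.0 |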


- If we want a more descriptive name for the column that identifies the fact (``variable`` by default), we can pass it as the ``var_name`` parameter.
- We can rename the column that holds the values (``value`` by default) with the ``value_name`` parameter.

In [4]:
scores.melt(id_vars=['name', 'age'],
            value_vars=['test1', 'test2'],
            var_name='test',
            value_name='score')

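|  | name | age | test | score |
| --- | --- | --- | --- | --- |
| 0 | Adam | 15 | test1 | 95.0 |
| 1 | Bob | 16 | test1 | 81.0 |
| 2 | Dave | 16 | test1 | 89.0 |
| 3 | Fred | 16 | test1 | NaN |
| 4 | Adam | 15 | test2 | 80.0 |
| 5 | Bob | 16 | test2 | 82.0 |
| 6 | Dave | 16 | test2 | 84.0 |
| 7 | Fred | 16 | test2 | 88.0 |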


- Columns that appear in neither ``id_vars`` nor ``value_vars`` are dropped. If we want to preserve the teacher information, we need to include it in ``id_vars``.

In [5]:
scores.melt(id_vars=['name', 'age', 'teacher'],
            value_vars=['test1', 'test2'],
            var_name='test',
            value_name='score')

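|  | name | age | teacher | test | score |
| --- | --- | --- | --- | --- | --- |
| 0 | Adam | 15 | Ashby | test1 | 95.0 |
| 1 | Bob | 16 | Ashby | test1 | 81.0 |
| 2 | Dave | 16 | Jones | test1 | 89.0 |
| 3 | Fred | 16 | Jones | test1 | NaN |
| 4 | Adam | 15 | Ashby | test2 | 80.0 |
| 5 | Bob | 16 | Ashby | test2 | 82.0 |
| 6 | Dave | 16 | Jones | test2 | 84.0 |
| 7 | Fred | 16 | Jones | test2 | 88.0 |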
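
To see what `.melt` does with columns we leave out, here is a minimal sketch that rebuilds the `scores` frame from this section. The variable names `no_teacher` and `melt_rest` are illustrative only and not part of the original notes.

```python
import pandas as pd

scores = pd.DataFrame({'name': ['Adam', 'Bob', 'Dave', 'Fred'],
                       'age': [15, 16, 16, 16],
                       'test1': [95, 81, 89, None],
                       'test2': [80, 82, 84, 88],
                       'teacher': ['Ashby', 'Ashby', 'Jones', 'Jones']})

# teacher is in neither id_vars nor value_vars, so it does not survive the melt
no_teacher = scores.melt(id_vars=['name', 'age'],
                         value_vars=['test1', 'test2'],
                         var_name='test',
                         value_name='score')
print(list(no_teacher.columns))

# with value_vars omitted, every non-id column is melted, teacher included,
# so the 'score' column ends up mixing numbers and teacher names
melt_rest = scores.melt(id_vars=['name', 'age'],
                        var_name='test',
                        value_name='score')
print(melt_rest['test'].unique())
```

Listing `value_vars` explicitly, as the cells above do, avoids accidentally melting a descriptive column such as `teacher` into the facts.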


## 30.2 Un-melting Data

In [7]:
melted = scores.melt(id_vars=['name', 'age', 'teacher'],
                     value_vars=['test1', 'test2'],
                     var_name='test',
                     value_name='score')
melted

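|  | name | age | teacher | test | score |
| --- | --- | --- | --- | --- | --- |
| 0 | Adam | 15 | Ashby | test1 | 95.0 |
| 1 | Bob | 16 | Ashby | test1 | 81.0 |
| 2 | Dave | 16 | Jones | test1 | 89.0 |
| 3 | Fred | 16 | Jones | test1 | NaN |
| 4 | Adam | 15 | Ashby | test2 | 80.0 |
| 5 | Bob | 16 | Ashby | test2 | 82.0 |
| 6 | Dave | 16 | Jones | test2 | 84.0 |
| 7 | Fred | 16 | Jones | test2 | 88.0 |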


In [8]:
# unmelt using pivot table method
(melted
 .pivot_table(index=['name', 'age', 'teacher'],
              columns='test',
              values='score')
 .reset_index())

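| test | name | age | teacher | test1 | test2 |
| --- | --- | --- | --- | --- | --- |
| 0 | Adam | 15 | Ashby | 95.0 | 80.0 |
| 1 | Bob | 16 | Ashby | 81.0 | 82.0 |
| 2 | Dave | 16 | Jones | 89.0 | 84.0 |
| 3 | Fred | 16 | Jones | NaN | 88.0 |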


In [9]:
# unmelt using groupby method
(melted
 .groupby(['name', 'age', 'teacher', 'test'])
 .score
 .mean()
 .unstack()
 .reset_index()
)

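| test | name | age | teacher | test1 | test2 |
| --- | --- | --- | --- | --- | --- |
| 0 | Adam | 15 | Ashby | 95.0 | 80.0 |
| 1 | Bob | 16 | Ashby | 81.0 | 82.0 |
| 2 | Dave | 16 | Jones | 89.0 | 84.0 |
| 3 | Fred | 16 | Jones | NaN | 88.0 |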
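
Both un-melting approaches aggregate with the mean (`pivot_table` uses `aggfunc='mean'` by default, and the groupby version calls `.mean()` explicitly). With one score per student and test this simply recovers the original values, but duplicate rows would be averaged silently. The sketch below illustrates this under an assumption: the `retake` row is hypothetical data added only for the demonstration.

```python
import pandas as pd

scores = pd.DataFrame({'name': ['Adam', 'Bob', 'Dave', 'Fred'],
                       'age': [15, 16, 16, 16],
                       'test1': [95, 81, 89, None],
                       'test2': [80, 82, 84, 88],
                       'teacher': ['Ashby', 'Ashby', 'Jones', 'Jones']})

melted = scores.melt(id_vars=['name', 'age', 'teacher'],
                     value_vars=['test1', 'test2'],
                     var_name='test',
                     value_name='score')

# one score per (name, age, teacher, test): the mean is just that score
wide = (melted
        .pivot_table(index=['name', 'age', 'teacher'],
                     columns='test',
                     values='score')
        .reset_index())
print(wide)

# hypothetical extra row: Adam retakes test1 and scores 85
retake = pd.DataFrame({'name': ['Adam'], 'age': [15], 'teacher': ['Ashby'],
                       'test': ['test1'], 'score': [85.0]})
with_retake = pd.concat([melted, retake], ignore_index=True)

# the two test1 scores for Adam (95 and 85) are now averaged
averaged = (with_retake
            .pivot_table(index=['name', 'age', 'teacher'],
                         columns='test',
                         values='score')
            .reset_index())
print(averaged)
```

If averaging is not what we want, a different `aggfunc` (for example `'max'` or `'first'`) can be passed to `pivot_table`.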


## 30.4 Stacking & Unstacking

- ``.unstack`` moves an index level into the columns. We use this operation on multi-index data, moving one of the index levels into the columns (which creates hierarchical columns when the data is already a dataframe).
- ``.stack`` does the reverse: it moves a column level into the index.
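
As a rough illustration that the two operations undo each other, the sketch below unstacks a multi-index series and stacks it back. It rebuilds the `scores` frame used earlier; the names `counts`, `wide`, and `back` are illustrative. Whether `.stack` drops the missing combinations on its own depends on the pandas version, so the sketch calls `.dropna()` explicitly.

```python
import pandas as pd

scores = pd.DataFrame({'name': ['Adam', 'Bob', 'Dave', 'Fred'],
                       'age': [15, 16, 16, 16],
                       'test1': [95, 81, 89, None],
                       'test2': [80, 82, 84, 88],
                       'teacher': ['Ashby', 'Ashby', 'Jones', 'Jones']})

# a Series indexed by (name, age)
counts = scores.groupby(['name', 'age']).size()

# age moves from the index into the columns; missing combinations become NaN
wide = counts.unstack()
print(wide)

# age moves back into the index; drop the NaN combinations and restore integers
back = wide.stack().dropna().astype('int64')
print(back)
```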

In [16]:
(scores
.groupby(['name', 'age'])
.size()
.unstack()
)

| age | 15 | 16 |
| --- | --- | --- |
| name |  |  |
| Adam | 1.0 | NaN |
| Bob | NaN | 1.0 |
| Dave | NaN | 1.0 |
| Fred | NaN | 1.0 |


- By default, ``.unstack`` moves the innermost index level (here ``age``). To move a different level, we can specify it by position (0 for ``name``, 1 for ``age``) or by the name of the index level.

In [17]:
# specifying position
(scores
.groupby(['name', 'age'])
.size()
.unstack(0)
)

| name | Adam | Bob | Dave | Fred |
| --- | --- | --- | --- | --- |
| age |  |  |  |  |
| 15 | 1.0 | NaN | NaN | NaN |
| 16 | NaN | 1.0 | 1.0 | 1.0 |


In [19]:
# specifying the index level by name
(scores
.groupby(['name', 'age'])
.size()
.unstack('name')
)

| name | Adam | Bob | Dave | Fred |
| --- | --- | --- | --- | --- |
| age |  |  |  |  |
| 15 | 1.0 | NaN | NaN | NaN |
| 16 | NaN | 1.0 | 1.0 | 1.0 |


In [18]:
(scores
.groupby(['name', 'age'])
.size()
.unstack(1)
)

| age | 15 | 16 |
| --- | --- | --- |
| name |  |  |
| Adam | 1.0 | NaN |
| Bob | NaN | 1.0 |
| Dave | NaN | 1.0 |
| Fred | NaN | 1.0 |
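
The cells above suggest that a level can be addressed interchangeably by position or by name. A small sketch to confirm this, again rebuilding `scores` (the variable names are illustrative):

```python
import pandas as pd

scores = pd.DataFrame({'name': ['Adam', 'Bob', 'Dave', 'Fred'],
                       'age': [15, 16, 16, 16],
                       'test1': [95, 81, 89, None],
                       'test2': [80, 82, 84, 88],
                       'teacher': ['Ashby', 'Ashby', 'Jones', 'Jones']})

counts = scores.groupby(['name', 'age']).size()

# level 0 is 'name', so these two calls address the same level
by_position = counts.unstack(0)
by_name = counts.unstack('name')
print(by_position.equals(by_name))

# with no argument, the last (innermost) level is moved, which here is 'age'
default = counts.unstack()
print(default.equals(counts.unstack(1)))
print(default.equals(counts.unstack('age')))
```

Referring to levels by name tends to be more robust, since it keeps working if the order of the groupby keys changes.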


## 30.5 Stacking

In [28]:
gb = (scores
      .groupby(['teacher', 'age'])
      .min())
gb

|  |  | name | test1 | test2 |
| --- | --- | --- | --- | --- |
| teacher | age |  |  |  |
| Ashby | 15 | Adam | 95.0 | 80 |
| Ashby | 16 | Bob | 81.0 | 82 |
| Jones | 16 | Dave | 89.0 | 84 |


In [34]:
teachers = gb.unstack()
teachers

|  | name | name | test1 | test1 | test2 | test2 |
| --- | --- | --- | --- | --- | --- | --- |
| age | 15 | 16 | 15 | 16 | 15 | 16 |
| teacher |  |  |  |  |  |  |
| Ashby | Adam | Bob | 95.0 | 81.0 | 80.0 | 82.0 |
| Jones | NaN | Dave | NaN | 89.0 | NaN | 84.0 |


In [35]:
teachers.stack()

|  |  | name | test1 | test2 |
| --- | --- | --- | --- | --- |
| teacher | age |  |  |  |
| Ashby | 15 | Adam | 95.0 | 80.0 |
| Ashby | 16 | Bob | 81.0 | 82.0 |
| Jones | 16 | Dave | 89.0 | 84.0 |
