# Meet the hardest functions of Pandas, Part III
## Finally, master the when and how of `melt()` and `pivot()`
<img src='images/gym_male.jpg'></img>

### Introduction

### Setup

In [3]:
# Load necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Render sharper figures and avoid blurry images
%config InlineBackend.figure_format = 'retina'
# Larger scale for plots in notebooks
sns.set_context('talk')

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Enable multiple cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

Let's start with a deliberately trivial example. I will create a 1x1 dataframe that holds a city name and its temperature for a single day. Then, I will call `melt()` on it to see what effect it has:

In [75]:
df = pd.DataFrame({'New York': [25]})
df

Unnamed: 0,New York
0,25


In [76]:
df.melt()

Unnamed: 0,variable,value
0,New York,25


So, without any parameters, `melt()` takes a column and turns it into a row described by two new columns, `variable` and `value` (the index does not count as a column). Let's add two more cities as columns:

In [77]:
df = pd.DataFrame({'New york': [25],
                   'Paris': [27],
                   'London': [30]})
df

Unnamed: 0,New york,Paris,London
0,25,27,30


Notice that a dataframe in this format is neither clean nor easy to work with: the city names are data, yet they are stored as column labels. Ideally, we would turn the columns into rows, with each city's temperature value to its right:

In [78]:
df.melt()

Unnamed: 0,variable,value
0,New york,25
1,Paris,27
2,London,30


Let's add more temperatures for the cities:

In [83]:
df_larger = pd.DataFrame({'New york': [25, 27, 23, 25, 29],
                          'Paris': [27, 22, 24, 26, 28],
                          'London': [30, 31, 33, 29, 25]})
df_larger

Unnamed: 0,New york,Paris,London
0,25,27,30
1,27,22,31
2,23,24,33
3,25,26,29
4,29,28,25


What do you think will happen if we call `melt()` on this version of the dataframe? Watch:

In [84]:
df_larger.melt()

Unnamed: 0,variable,value
0,New york,25
1,New york,27
2,New york,23
3,New york,25
4,New york,29
5,Paris,27
6,Paris,22
7,Paris,24
8,Paris,26
9,Paris,28


Just as expected, it converts each column value into a row. For example, think of one column as a set of key-value pairs. New York's temperatures are \[25, 27, 23, 25, 29\], so there are 5 key-value pairs, each pairing the column name with one temperature. When we use `melt()`, `pandas` takes each of those pairs and displays it as a single row in two columns. After it is done with `New york`, it moves on to the other columns.
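To make the key-value picture concrete, here is a minimal sketch that rebuilds the melted table by hand with a plain loop and compares it with what `melt()` returns. The names `pairs` and `manual` are only illustrative, and the loop is a mental model of the result, not a description of how `pandas` implements `melt()` internally.

```python
import pandas as pd

df_larger = pd.DataFrame({'New york': [25, 27, 23, 25, 29],
                          'Paris': [27, 22, 24, 26, 28],
                          'London': [30, 31, 33, 29, 25]})

# Walk the columns in order and emit one (column name, value) pair per cell
pairs = [(column, value)
         for column in df_larger.columns
         for value in df_larger[column]]
manual = pd.DataFrame(pairs, columns=['variable', 'value'])

melted = df_larger.melt()

# 3 columns x 5 rows give 15 key-value pairs, one per row of the melted frame
print(len(melted))
print(manual['variable'].tolist() == melted['variable'].tolist())
print(manual['value'].tolist() == melted['value'].tolist())
```

Both comparisons should print `True`, which shows that the melted frame is just the list of pairs stacked column by column.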

When `melt()` displays each key-value pair in two columns, it gives the columns default names which are `variable` and `value`. It is possible to change them to something that makes more sense:

In [82]:
df_larger.melt(var_name='city', value_name='temperature')

Unnamed: 0,city,temperature
0,New york,25
1,New york,27
2,New york,23
3,New york,25
4,New york,29
5,Paris,27
6,Paris,22
7,Paris,24
8,Paris,26
9,Paris,28


`var_name` and `value_name` can be used to change the labels of the melted dataframe's columns.
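As a quick check, the sketch below compares the two routes to the same labels: passing `var_name` and `value_name` to `melt()`, and melting with the defaults followed by `rename()`. The variable names `direct` and `renamed` are only illustrative.

```python
import pandas as pd

df_larger = pd.DataFrame({'New york': [25, 27, 23, 25, 29],
                          'Paris': [27, 22, 24, 26, 28],
                          'London': [30, 31, 33, 29, 25]})

# Route 1: name the two new columns while melting
direct = df_larger.melt(var_name='city', value_name='temperature')

# Route 2: melt with the default names, then rename afterwards
renamed = df_larger.melt().rename(columns={'variable': 'city',
                                           'value': 'temperature'})

print(list(direct.columns))
print(direct.values.tolist() == renamed.values.tolist())
```

The two routes should produce the same table, so the parameters mainly save a step and keep the intent visible in a single call.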

If we keep adding columns, `melt()` will still convert each value into a row with two columns: one holding the name of the column the value came from, the other holding the value itself.

Now, let's get a little serious. Say we have this dataframe:

In [89]:
temperatures = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Berlin', 'Amsterdam'],
    'day1': [23, 25, 27, 26, 24],
    'day2': [22, 21, 25, 26, 23],
    'day3': [26, 25, 24, 27, 23],
    'day4': [23, 21, 22, 26, 27],
    'day5': [27, 26, 27, 24, 28]
})
temperatures

Unnamed: 0,city,day1,day2,day3,day4,day5
0,New York,23,22,26,23,27
1,London,25,21,25,21,26
2,Paris,27,25,24,22,27
3,Berlin,26,26,27,26,24
4,Amsterdam,24,23,23,27,28


A wide layout like this is awkward to work with. The dataset holds temperatures for 5 cities over 5 days, but the days are spread across separate columns, so even a simple computation such as a mean per city or per day has to be written across columns instead of on a single temperature column. Let's try melting the dataframe:

In [90]:
temperatures.melt()

Unnamed: 0,variable,value
0,city,New York
1,city,London
2,city,Paris
3,city,Berlin
4,city,Amsterdam
5,day1,23
6,day1,25
7,day1,27
8,day1,26
9,day1,24


This is not what we want: `melt()` turned the city names into rows too. Ideally, we would keep the cities as a column and turn only the remaining columns into rows. `melt()` has a parameter called `id_vars` to do just that.

If we want to turn only some of the columns into rows, pass the columns to keep to `id_vars`, typically as a list (a list works even for a single column). `id_vars` stands for identifier variables.

In [92]:
temperatures.melt(id_vars=['city'])

Unnamed: 0,city,variable,value
0,New York,day1,23
1,London,day1,25
2,Paris,day1,27
3,Berlin,day1,26
4,Amsterdam,day1,24
5,New York,day2,22
6,London,day2,21
7,Paris,day2,25
8,Berlin,day2,26
9,Amsterdam,day2,23
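The call above passes a one-element list. As a small sketch, assuming a reasonably recent `pandas` version, a bare column label appears to be accepted as well and should give the same result; the variable names `with_list` and `with_label` are only illustrative.

```python
import pandas as pd

temperatures = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Berlin', 'Amsterdam'],
    'day1': [23, 25, 27, 26, 24],
    'day2': [22, 21, 25, 26, 23],
    'day3': [26, 25, 24, 27, 23],
    'day4': [23, 21, 22, 26, 27],
    'day5': [27, 26, 27, 24, 28]
})

# id_vars as a one-element list, as in the cell above
with_list = temperatures.melt(id_vars=['city'])

# id_vars as a bare label (assumed to be treated like a one-element list)
with_label = temperatures.melt(id_vars='city')

# 5 cities x 5 day columns give 25 rows; 'city' stays as its own column
print(with_list.shape)
print(with_list.equals(with_label))
```

Using a list everywhere is still a reasonable habit, since the same call then extends naturally to several identifier columns.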


We have the table in a better shape but the column names are not exactly what we want. Instead of changing them manually after melting the table, we can directly do it with `melt()`:

In [96]:
temperatures.melt(id_vars=['city'], var_name='date', value_name='temperature').sample(5)

Unnamed: 0,city,date,temperature
15,New York,day4,23
6,London,day2,21
21,London,day5,26
22,Paris,day5,27
17,Paris,day4,22
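One payoff of the long format is that grouped summaries become short and uniform. The sketch below rebuilds the `temperatures` dataframe from above and computes the mean temperature per city and per day with `groupby()`; the names `long_df`, `per_city`, and `per_day` are only illustrative.

```python
import pandas as pd

temperatures = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Berlin', 'Amsterdam'],
    'day1': [23, 25, 27, 26, 24],
    'day2': [22, 21, 25, 26, 23],
    'day3': [26, 25, 24, 27, 23],
    'day4': [23, 21, 22, 26, 27],
    'day5': [27, 26, 27, 24, 28]
})

long_df = temperatures.melt(id_vars=['city'],
                            var_name='date',
                            value_name='temperature')

# Every temperature now lives in one column, so both summaries
# use the same pattern and differ only in the grouping key
per_city = long_df.groupby('city')['temperature'].mean()
per_day = long_df.groupby('date')['temperature'].mean()

print(per_city)
print(per_day)
```

The same pattern carries over to other aggregations such as `max()` or `std()`, and to plotting libraries that expect one observation per row.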


This is the same dataframe with different column labels. Now it is time to work on a real-world dataset to drive the point home. I will load the New York Stock Exchange (NYSE) dataset, which can be downloaded from Kaggle using this [link](https://www.kaggle.com/dgawlik/nyse/download):