In [1]:
import pandas as pd
import datetime
from os import listdir
from os.path import isfile, join

Introduction
===

Let's create a table that we might use in a report, recording the heights of players on our TV high school's sports teams:

The `columns` argument lets us label our data as we create the DataFrame:

In [2]:
df = pd.DataFrame([["Basketball", 9,15],["Soccer",12,12],["Horse Racing", 20,0]],
                  columns=["Sport","<6ft",">6ft"])

In [3]:
df

|   | Sport | <6ft | >6ft |
|---|---|---|---|
| 0 | Basketball | 9 | 15 |
| 1 | Soccer | 12 | 12 |
| 2 | Horse Racing | 20 | 0 |


Melt
===

The DataFrame above is nice for _reading_, but not great for _processing_. Ideally, we'd want each row to represent a single 'data point', but currently each row represents _two_ data points:

 1. The number of people shorter than 6ft on a team
 2. The number of people taller than 6ft on a team
 
Let's _melt_ it.


Unpivot
---

'Melting' is really "unpivoting", but that name may not help much in understanding what's going on.

Main idea: take what is currently a column and make its name a _value_ in a new column.

In our case, we want to take the column '<6ft' and make its name a value in a new column called 'height'.

'>6ft' also becomes a value in this column.


What about the old values?
---

The values that used to sit in those columns now become values in an additional column.
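To see both halves of the idea at once, here is a rough sketch that performs the same unpivot by hand, without `pd.melt`. It recreates the `df` from above so the cell stands alone; the names `value_columns`, `records` and `manual` are our own, chosen only for illustration. Each old column name lands in 'height' and each old cell value lands in 'number of members'.

```python
import pandas as pd

df = pd.DataFrame(
    [["Basketball", 9, 15], ["Soccer", 12, 12], ["Horse Racing", 20, 0]],
    columns=["Sport", "<6ft", ">6ft"],
)

# Columns whose *names* become values in the new 'height' column
value_columns = ["<6ft", ">6ft"]

# One record per (height band, sport) pair: the old column name goes into
# 'height' and the old cell value goes into 'number of members'.
# Looping over columns first mirrors the row order pd.melt produces.
records = [
    {"Sport": row["Sport"], "height": col, "number of members": row[col]}
    for col in value_columns
    for _, row in df.iterrows()
]
manual = pd.DataFrame(records)
print(manual)
```

Looping with `iterrows` is fine for a toy table like this, but it is slow on large frames, which is one good reason to let `pd.melt` do the work.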

Let's look at our table again:

In [4]:
df

|   | Sport | <6ft | >6ft |
|---|---|---|---|
| 0 | Basketball | 9 | 15 |
| 1 | Soccer | 12 | 12 |
| 2 | Horse Racing | 20 | 0 |


Problem
===

How do we determine which columns stay and which get 'melted' together?

Solution
===

We tell pandas which columns stay (`id_vars`), plus the names of the two new columns:

 1. The new column for what used to be the column names: `var_name`
 2. The new column for what used to be the column values: `value_name`
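If you leave out `var_name` and `value_name`, pandas falls back to the generic names 'variable' and 'value'. `pd.melt` also accepts `value_vars`, which picks the columns to melt; when it is omitted, every column not listed in `id_vars` is melted. A small sketch of both, reusing the same data (the names `default_names` and `short_only` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [["Basketball", 9, 15], ["Soccer", 12, 12], ["Horse Racing", 20, 0]],
    columns=["Sport", "<6ft", ">6ft"],
)

# Without var_name/value_name, pandas uses 'variable' and 'value'
default_names = pd.melt(df, id_vars=["Sport"])
print(default_names.columns.tolist())  # ['Sport', 'variable', 'value']

# value_vars restricts which columns get melted; here only '<6ft'
short_only = pd.melt(
    df,
    id_vars=["Sport"],
    value_vars=["<6ft"],
    var_name="height",
    value_name="number of members",
)
print(short_only)
```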

In [7]:
melted = pd.melt(df, id_vars=["Sport"], var_name="height", value_name="number of members")
melted

|   | Sport | height | number of members |
|---|---|---|---|
| 0 | Basketball | <6ft | 9 |
| 1 | Soccer | <6ft | 12 |
| 2 | Horse Racing | <6ft | 20 |
| 3 | Basketball | >6ft | 15 |
| 4 | Soccer | >6ft | 12 |
| 5 | Horse Racing | >6ft | 0 |
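With one data point per row, questions about the data become operations on named columns ('height', 'number of members') rather than on the original column layout. As an illustration (our own addition, not part of the original analysis), here is a sketch that totals members per height band and then filters to the taller players:

```python
import pandas as pd

df = pd.DataFrame(
    [["Basketball", 9, 15], ["Soccer", 12, 12], ["Horse Racing", 20, 0]],
    columns=["Sport", "<6ft", ">6ft"],
)
melted = pd.melt(df, id_vars=["Sport"], var_name="height", value_name="number of members")

# Total members in each height band across all sports
per_height = melted.groupby("height")["number of members"].sum()
print(per_height)  # <6ft -> 41, >6ft -> 27

# Keep only the rows for players taller than 6ft
tall = melted[melted["height"] == ">6ft"]
print(tall)
```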


In [8]:
# Sort by sport, then by height, so the order within each sport is explicit
melted = melted.sort_values(by=["Sport", "height"])

In [9]:
melted

|   | Sport | height | number of members |
|---|---|---|---|
| 0 | Basketball | <6ft | 9 |
| 3 | Basketball | >6ft | 15 |
| 2 | Horse Racing | <6ft | 20 |
| 5 | Horse Racing | >6ft | 0 |
| 1 | Soccer | <6ft | 12 |
| 4 | Soccer | >6ft | 12 |
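Because melting is unpivoting, `DataFrame.pivot` can take us back to the wide layout. A minimal sketch, assuming each (Sport, height) pair appears only once, which `pivot` requires (the name `wide` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [["Basketball", 9, 15], ["Soccer", 12, 12], ["Horse Racing", 20, 0]],
    columns=["Sport", "<6ft", ">6ft"],
)
melted = pd.melt(df, id_vars=["Sport"], var_name="height", value_name="number of members")

# pivot() is the inverse: 'height' values become column labels again
wide = melted.pivot(index="Sport", columns="height", values="number of members")

# pivot() moves 'Sport' into the index and names the column axis 'height';
# undo both so the frame is shaped like the original
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

`pivot` sorts the sports alphabetically, so the rows come back in a different order than the original table.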
