# Reshaping Data
*Curtis Miller*

In this notebook I demonstrate reshaping data, converting between "long-form format" and "wide-form format."

The dataset we will use in this notebook is the population pyramid dataset seen in the first section. I load that dataset in now.

In [None]:
import pandas as pd

In [None]:
pop_pyramid = pd.read_csv("PopPyramids.csv", index_col=["Country", "Year", "Age"])
pop_pyramid.drop(columns="Region", inplace=True)    # Drop unwanted column (positional axis argument is no longer accepted in pandas 2.0+)

In [None]:
pop_pyramid.head()

## Stacking and Unstacking

Stack the data. Stacking moves the column labels into the innermost level of the row index, so the result is a one-dimensional `Series`.

In [None]:
pp_stack = pop_pyramid.stack()
pp_stack
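To see what stacking does without the full dataset, here is a minimal sketch on a toy `DataFrame`. The values and the name `toy` are made up for illustration; only the index and column names mirror the population pyramid data.

```python
import pandas as pd

# Hypothetical toy data standing in for PopPyramids.csv (values are invented)
toy = pd.DataFrame(
    {
        "Male Population": [10, 12, 20, 22],
        "Female Population": [11, 13, 21, 23],
    },
    index=pd.MultiIndex.from_tuples(
        [("A", 2000, 0), ("A", 2000, 1), ("B", 2000, 0), ("B", 2000, 1)],
        names=["Country", "Year", "Age"],
    ),
)

toy_stack = toy.stack()    # Column labels become a fourth index level

print(type(toy_stack).__name__)    # The result is a Series, not a DataFrame
print(toy_stack.index.nlevels)     # Three original levels plus the column labels
print(toy_stack.loc[("A", 2000, 0, "Female Population")])    # Look up a single value
```

Each value is now addressed by four labels: country, year, age, and the former column name.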

Unstack the data into a wide format.

In [None]:
pp_stack.unstack()    # Original format; equivalent, in this case, to pp_stack.unstack(level=3)

In [None]:
pp_stack.unstack(level=0)

In [None]:
pp_stack.unstack(level=1)    # A different unstacking

In [None]:
pp_stack.unstack(level=2)

In [None]:
pp_stack.head()

In [None]:
pp_stack.unstack(level="Age").head()    # Can use names directly when they exist

In [None]:
pp_stack.unstack(level=["Country", "Year"])    # Lists also work; listing out what becomes columns
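The effect of the `level` argument is easier to check on a small example. The sketch below uses a toy `Series` (invented values, hypothetical name `toy_series`) with the same three index level names, and compares the shapes produced by different choices of `level`.

```python
import pandas as pd

# Hypothetical toy Series with a three-level index (values are invented)
idx = pd.MultiIndex.from_product(
    [["A", "B"], [2000, 2010], [0, 1]],
    names=["Country", "Year", "Age"],
)
toy_series = pd.Series(range(8), index=idx)

# The chosen level leaves the row index and becomes the column labels
by_age = toy_series.unstack(level="Age")
by_country_year = toy_series.unstack(level=["Country", "Year"])

print(by_age.shape)                        # Rows: (Country, Year) pairs; columns: ages
print(by_country_year.shape)               # Rows: ages; columns: (Country, Year) pairs
print(list(by_country_year.columns.names))
```

Whatever levels are named in `level` end up on the columns; the remaining levels stay on the rows.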

## `MultiIndex` to Columns

Perhaps we don't want a `MultiIndex` at all and would rather have its levels as ordinary columns. The `reset_index()` method does this.

In [None]:
pp_nomulti = pop_pyramid.reset_index()
pp_nomulti.head()
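As a small illustration on invented data, the sketch below resets the index of a toy `DataFrame` and then rebuilds it with `set_index()`, which is the inverse operation. The names `toy`, `toy_flat`, and `restored` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (values are invented)
toy = pd.DataFrame(
    {"Male Population": [10, 12], "Female Population": [11, 13]},
    index=pd.MultiIndex.from_tuples(
        [("A", 2000, 0), ("A", 2000, 1)],
        names=["Country", "Year", "Age"],
    ),
)

toy_flat = toy.reset_index()    # Index levels become ordinary columns
print(list(toy_flat.columns))

# set_index() goes the other way, turning columns back into index levels
restored = toy_flat.set_index(["Country", "Year", "Age"])
print(restored.equals(toy))
```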

What would stacking/unstacking this object look like?

In [None]:
pp_nomulti.stack()

In [None]:
pp_nomulti.stack().unstack()

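**Warning:** In the process of doing this, we may have changed the type of the data. Stacking puts every value into a single `Series`, so when the columns have mixed types (here strings and numbers) the result has `object` dtype, and unstacking does not restore the original column dtypes.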
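The dtype change can be observed directly. This is a minimal sketch on an invented two-row frame (the name `toy_flat` is hypothetical); it also shows `infer_objects()`, one pandas method that can be used to recover numeric dtypes afterwards.

```python
import pandas as pd

# Hypothetical toy frame mixing a string column with integer columns
toy_flat = pd.DataFrame(
    {"Country": ["A", "B"], "Year": [2000, 2000], "Male Population": [10, 20]}
)
print(toy_flat.dtypes)    # Year and Male Population start out as integers

# Stacking mixes strings and integers in one Series, which forces object dtype
round_trip = toy_flat.stack().unstack()
print(round_trip.dtypes)  # Every column is now object

# infer_objects() asks pandas to pick better dtypes for object columns
recovered = round_trip.infer_objects()
print(recovered.dtypes)
```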

## Melting and Casting

Melting takes wide-form format data and transforms it into long-form format. A simple melt may look like so:

In [None]:
pd.melt(pp_nomulti)

In [None]:
# A more digestible illustration
pp_nomulti.head()

In [None]:
pd.melt(pp_nomulti.head())    # This is smaller

We get a `DataFrame` with a column for the variable and a column for the value of that variable. The only hint of which original row a value came from is its position in the `DataFrame` (in other words, no hint at all).
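A tiny invented example makes the structure of a bare melt concrete. The frame below (hypothetical name `toy_flat`) has two rows and two columns, so melting it yields four rows, one per original cell.

```python
import pandas as pd

# Hypothetical two-row, two-column frame (values are invented)
toy_flat = pd.DataFrame({"Country": ["A", "B"], "Male Population": [10, 20]})

bare = pd.melt(toy_flat)    # No id_vars: every column is melted

print(bare.shape)                 # One row per original cell
print(list(bare.columns))         # Default column names chosen by melt
print(list(bare["variable"]))     # Former column names, repeated once per row
```

Note that the country names and the population counts end up in the same `value` column, with nothing linking `"A"` to `10`.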

We can specify which columns we wish to keep, melting the rest.

In [None]:
pd.melt(pp_nomulti, id_vars=["Year", "Age", "Country"])

Or we can specify which variables to melt (or both at the same time).

In [None]:
pd.melt(pp_nomulti.head(), value_vars=["Both Sexes Population", "Male Population", "Female Population"])

In [None]:
pd.melt(pp_nomulti.head(),
        id_vars=["Year", "Age", "Country"],
        value_vars=["Both Sexes Population", "Male Population", "Female Population"])
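The same call can be traced by hand on a toy frame. In this sketch the data and the names `toy_flat` and `toy_melt` are invented; the column names follow the population pyramid dataset.

```python
import pandas as pd

# Hypothetical toy frame (values are invented)
toy_flat = pd.DataFrame(
    {
        "Country": ["A", "A"],
        "Year": [2000, 2000],
        "Age": [0, 1],
        "Male Population": [10, 12],
        "Female Population": [11, 13],
    }
)

toy_melt = pd.melt(
    toy_flat,
    id_vars=["Country", "Year", "Age"],                      # Kept as identifier columns
    value_vars=["Male Population", "Female Population"],     # Melted into variable/value
)
print(toy_melt)
```

Each original row appears once per melted column, and the identifier columns are repeated so that every value can still be traced back to its country, year, and age.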

Casting takes a melted `DataFrame` and reshapes it into wide-form format. Casting might look like so:

In [None]:
pp_melt = pd.melt(pp_nomulti, id_vars=["Year", "Age", "Country"])
pp_melt.head()

In [None]:
pd.pivot_table(pp_melt, values="value", index=["Year", "Age", "Country"], columns="variable")

This recovers the structure we had before, because every row is uniquely identified by its id variables. If rows were not uniquely identified, aggregation would occur, like so:

In [None]:
pp_melt2 = pd.melt(pp_nomulti.head(), value_vars=["Both Sexes Population", "Male Population", "Female Population"])
pp_melt2    # Notice rows are not uniquely identified by any id variables

In [None]:
pd.pivot_table(pp_melt2, values="value", columns="variable")    # Aggregated (mean, the default aggfunc)
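To make the aggregation visible, the sketch below pivots a small invented melted frame (hypothetical name `toy_melt`) twice: once with the default `aggfunc`, which is the mean, and once with `aggfunc="sum"`.

```python
import pandas as pd

# Hypothetical melted data with no id variables (values are invented)
toy_melt = pd.DataFrame(
    {
        "variable": [
            "Male Population", "Male Population",
            "Female Population", "Female Population",
        ],
        "value": [10, 12, 11, 13],
    }
)

# Default aggregation: the mean of each variable's values
default_agg = pd.pivot_table(toy_melt, values="value", columns="variable")

# Explicitly request a sum instead
summed = pd.pivot_table(toy_melt, values="value", columns="variable", aggfunc="sum")

print(default_agg)
print(summed)
```

With the values above, the male population entries 10 and 12 average to 11 and sum to 22, so the two tables differ.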

This will be discussed more in a later section.