# Rearranging and reshaping data


## 1. Pivoting a single variable
Suppose you run a blog for a band and want to log how many visitors you have had and how many of them signed up for your newsletter. To help plan tours later, you also track which city each visitor is in.

Make a note of which variable you want to use to index the rows ('weekday'), which variable you want to use to index the columns ('city'), and which variable will populate the values in the cells ('visitors'). Try to visualize what the result should be.

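As a separate illustration, consider a DataFrame with the columns 'id', 'treatment', 'gender', and 'response'. We can use 'treatment' to index the rows, 'gender' to index the columns, and 'response' to populate the cells. Prior to pivoting, the DataFrame looks like this: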

```
   id treatment gender  response
0   1         A      F         5
1   2         A      M         3
2   3         B      F         8
3   4         B      M         9
```

After pivoting:

```
gender     F  M
treatment      
A          5  3
B          8  9

```
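The before-and-after tables above can be reproduced with a short sketch. The variable names `trials` and `trials_pivot` are assumptions made for illustration; only the column names and values come from the tables.

```python
import pandas as pd

# Hypothetical name 'trials'; the data matches the "before" table above.
trials = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "treatment": ["A", "A", "B", "B"],
    "gender": ["F", "M", "F", "M"],
    "response": [5, 3, 8, 9],
})

# Rows come from 'treatment', columns from 'gender', cells from 'response'.
trials_pivot = trials.pivot(index="treatment", columns="gender", values="response")
print(trials_pivot)
```

The 'id' column is simply left out of the result because it is not named in `index`, `columns`, or `values`.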
In this exercise, your job is to pivot users so that the focus is on 'visitors', with the columns indexed by 'city' and the rows indexed by 'weekday'.

In [1]:
import pandas as pd

In [4]:
# Preloading dataframe
weekday = ["Sun", "Sun", "Mon", "Mon"]
city = ["Austin", "Dallas", "Austin", "Dallas"]
visitors = [139, 237, 326, 456]
signups = [7, 12, 3, 5]
users = pd.DataFrame({"weekday": weekday, "city": city, "visitors": visitors, "signups": signups})
users

```
   weekday    city  visitors  signups
0      Sun  Austin       139        7
1      Sun  Dallas       237       12
2      Mon  Austin       326        3
3      Mon  Dallas       456        5
```


In [5]:
# Pivot the users DataFrame: visitors_pivot
visitors_pivot = users.pivot(index = "weekday", columns = "city", values = "visitors")

# Print the pivoted DataFrame
visitors_pivot

```
city     Austin  Dallas
weekday
Mon         326     456
Sun         139     237
```


Well done! Notice how in the pivoted DataFrame, the index is labeled 'weekday', the columns are labeled 'city', and the values are populated by the number of visitors.

## 2. Pivoting all variables
If you do not select any particular variables, all of them will be pivoted. In this case - with the users DataFrame - both 'visitors' and 'signups' will be pivoted, creating hierarchical column labels.

In [12]:
# Pivot users with signups indexed by weekday and city: signups_pivot
signups_pivot = users.pivot(values = "signups", columns= "city", index = "weekday")

# Print signups_pivot
signups_pivot

```
city     Austin  Dallas
weekday
Mon           3       5
Sun           7      12
```


In [13]:
# Pivot users pivoted by both signups and visitors: pivot
pivot = users.pivot(index = "weekday", columns = "city")

# Print the pivoted DataFrame
pivot

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


Great work! Notice how in the second DataFrame, both 'signups' and 'visitors' were pivoted by default since you didn't provide an argument for the values parameter.
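As a minimal sketch of what the hierarchical column labels allow, the outer label selects a whole block of columns and a tuple selects a single column. The variable names `visitors_block` and `mon_dallas_signups` are illustrative assumptions.

```python
import pandas as pd

users = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
    "signups": [7, 12, 3, 5],
})

# No 'values' argument: both remaining columns are pivoted.
pivot = users.pivot(index="weekday", columns="city")

# The outer level ('visitors' or 'signups') selects one block of city columns.
visitors_block = pivot["visitors"]
print(visitors_block)

# A (variable, city) tuple addresses a single cell together with a row label.
mon_dallas_signups = pivot.loc["Mon", ("signups", "Dallas")]
print(mon_dallas_signups)
```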

## 3. Stacking & unstacking I
You are now going to practice stacking and unstacking DataFrames. Load the `users` DataFrame you have been working with, this time with a MultiIndex. Explore it to see the data layout. Pay attention to the index, and notice that the index levels are ['city', 'weekday']. So 'weekday' - the second entry - has position 1. This position is what corresponds to the level parameter in .stack() and .unstack() calls. Alternatively, you can specify 'weekday' as the level instead of its position.

Your job in this exercise is to unstack users by 'weekday'. You will then use .stack() on the unstacked DataFrame to see if you get back the original layout of users.
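The equivalence between a level's position and its name can be checked directly. This is a small sketch that rebuilds `users` with the same MultiIndex; the names `by_position` and `by_name` are assumptions for illustration.

```python
import pandas as pd

users = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
    "signups": [7, 12, 3, 5],
}).set_index(["city", "weekday"]).sort_index()

# The index levels are ['city', 'weekday'], so 'weekday' sits at position 1.
by_position = users.unstack(level=1)
by_name = users.unstack(level="weekday")

print(by_position.equals(by_name))
```

Referring to the level by name tends to be more robust, since it keeps working if the order of the index levels changes.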

In [16]:
# Setting multilevel indexing for the users dataframe
users.set_index(["city", "weekday"], inplace=True)
users

```
                visitors  signups
city   weekday
Austin Sun           139        7
Dallas Sun           237       12
Austin Mon           326        3
Dallas Mon           456        5
```


In [19]:
# Sorting multi-level index
users.sort_index(inplace=True)
users

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [20]:
# Unstack users by 'weekday': byweekday
byweekday = users.unstack(level="weekday")

# Print the byweekday DataFrame
byweekday

```
        visitors      signups
weekday      Mon  Sun     Mon Sun
city
Austin       326  139       3   7
Dallas       456  237       5  12
```


In [21]:
# Stack byweekday by 'weekday' and print it
byweekday.stack(level="weekday")

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


Great work! By unstacking and then stacking users by 'weekday', you ended up with the same layout as the original DataFrame.

## 4. Stacking & unstacking II
You are now going to continue working with the `users` DataFrame. 

Your job in this exercise is to unstack and then stack the 'city' level, as you did previously for 'weekday'. Note that you won't get the same DataFrame.

In [22]:
users

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [23]:
# Unstack users by 'city': bycity
bycity = users.unstack(level="city")

# Print the bycity DataFrame
bycity

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


In [24]:
# Stack bycity by 'city' and print it
bycity.stack(level="city")

```
                visitors  signups
weekday city
Mon     Austin       326        3
        Dallas       456        5
Sun     Austin       139        7
        Dallas       237       12
```


Fantastic work! Hopefully this exercise and the previous one have developed your intuition for how stacking and unstacking work.
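To make explicit why the round trip through 'city' does not reproduce `users`, the sketch below compares the index level order before and after. The stacked level is appended as the innermost index level, so 'city' ends up second instead of first. The name `round_trip` is an assumption for illustration.

```python
import pandas as pd

users = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
    "signups": [7, 12, 3, 5],
}).set_index(["city", "weekday"]).sort_index()

bycity = users.unstack(level="city")
round_trip = bycity.stack(level="city")

# Same values, but the index levels are now in a different order.
print(list(users.index.names))
print(list(round_trip.index.names))
print(round_trip.equals(users))
```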

## 5. Restoring the index order
Continuing from the previous exercise, you will now use `.swaplevel(0, 1)` to flip the index levels. Swapping does not reorder the rows, so the resulting index will not be sorted; to sort it, follow up with `.sort_index()`. You will then obtain the original DataFrame. Note that slicing a MultiIndex that is not sorted can fail.

To begin, print both `users` and `bycity`. The goal here is to convert `bycity` back to something that looks like `users`.
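The effect of an unsorted index can be seen in a small sketch. It assumes a frame shaped like the stacked result of the previous exercise, rebuilt here so that the block is self-contained; `swapped`, `restored`, and `slice_failed` are illustrative names. On the pandas versions I am aware of, label slicing on the unsorted index raises `UnsortedIndexError` (a subclass of `KeyError`), but the sketch only records whether a failure occurred rather than relying on it.

```python
import pandas as pd

# Stand-in for the stacked frame: indexed by (weekday, city) and sorted.
newusers = pd.DataFrame({
    "weekday": ["Mon", "Mon", "Sun", "Sun"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [326, 456, 139, 237],
    "signups": [3, 5, 7, 12],
}).set_index(["weekday", "city"]).sort_index()

# Swapping the levels relabels the index but leaves the row order alone.
swapped = newusers.swaplevel(0, 1)
print(swapped.index.is_monotonic_increasing)  # False

# Label slicing on the unsorted index may raise; record the outcome.
try:
    swapped.loc["Austin":"Dallas"]
    slice_failed = False
except KeyError:
    slice_failed = True
print("slice on unsorted index failed:", slice_failed)

# After sorting, the same kind of slice works.
restored = swapped.sort_index()
austin = restored.loc["Austin":"Austin"]
print(austin)
```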

In [26]:
# Printing bycity dataframe
bycity

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


In [25]:
# Stack 'city' back into the index of bycity: newusers
newusers = bycity.stack(level="city")
newusers

```
                visitors  signups
weekday city
Mon     Austin       326        3
        Dallas       456        5
Sun     Austin       139        7
        Dallas       237       12
```


In [27]:
# Swap the levels of the index of newusers: newusers
newusers = newusers.swaplevel(0,1)

# Print newusers and verify that the index is not sorted
newusers

```
                visitors  signups
city   weekday
Austin Mon           326        3
Dallas Mon           456        5
Austin Sun           139        7
Dallas Sun           237       12
```


In [28]:
# Sort the index of newusers: newusers
newusers = newusers.sort_index()

# Print newusers and verify that the index is now sorted
newusers

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [29]:
# Verify that the new DataFrame is equal to the original
newusers.equals(users)

True

Wonderful work! It's now time to learn about melting DataFrames!

## 6. Adding names for readability
You are now going to practice melting DataFrames. 

The goal of melting is to restore a pivoted DataFrame to its original form, or to change it from a wide shape to a long shape. You can explicitly specify the columns that should remain in the reshaped DataFrame with `id_vars`, and list which columns to convert into values with `value_vars`. If you don't pass a name to the values in `pd.melt()`, you will lose the name of your variable. You can fix this by using the `value_name` keyword argument.

Your job in this exercise is to melt `visitors_by_city_weekday` to move the city names from the column labels to values in a single column called `'city'`. If you were to use just `pd.melt(visitors_by_city_weekday)`, you would obtain the following result:
```
      city value
0  weekday   Mon
1  weekday   Sun
2   Austin   326
3   Austin   139
4   Dallas   456
5   Dallas   237
```
Therefore, you have to specify the `id_vars` keyword argument to ensure that `'weekday'` is retained in the reshaped DataFrame, and the `value_name` keyword argument to rename the `value` column to `'visitors'`.
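The roles of the melt keyword arguments can be sketched on a small stand-alone frame. The frame `wide` is an assumption built to look like `visitors_by_city_weekday` after its index is reset; because its columns carry no name here, `var_name="city"` is passed explicitly.

```python
import pandas as pd

# Hypothetical wide frame: one row per weekday, one column per city.
wide = pd.DataFrame({
    "weekday": ["Mon", "Sun"],
    "Austin": [326, 139],
    "Dallas": [456, 237],
})

long = pd.melt(
    wide,
    id_vars="weekday",               # kept as an identifier column
    value_vars=["Austin", "Dallas"],  # column labels that become values
    var_name="city",                 # name for the former column labels
    value_name="visitors",           # name for the cell values
)
print(long)
```

Leaving out `value_vars` melts every column that is not listed in `id_vars`, which gives the same result for this frame.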

In [35]:
# Preparing data for this exercise
visitors_by_city_weekday = users.visitors.unstack(level = "city")
visitors_by_city_weekday

```
city     Austin  Dallas
weekday
Mon         326     456
Sun         139     237
```


In [36]:
# Reset the index: visitors_by_city_weekday
visitors_by_city_weekday = visitors_by_city_weekday.reset_index()

# Print visitors_by_city_weekday
visitors_by_city_weekday

```
city weekday  Austin  Dallas
0        Mon     326     456
1        Sun     139     237
```


In [37]:
# Melt visitors_by_city_weekday: visitors
visitors = pd.melt(visitors_by_city_weekday, id_vars="weekday", value_name="visitors")

# Print visitors
visitors

```
   weekday    city  visitors
0      Mon  Austin       326
1      Sun  Austin       139
2      Mon  Dallas       456
3      Sun  Dallas       237
```


Well done! Notice how your melted DataFrame now has a `'city'` column with `Austin` and `Dallas` as its values. In the original DataFrame, they were columns themselves. Also note how specifying the `value_name` parameter has renamed the `'value'` column to `'visitors'`.

## 7. Going from wide to long
You can move multiple columns into a single column (making the data long and skinny) by "melting" multiple columns. In this exercise, you will practice doing this.

In [43]:
# Preparing dataframe for this exercise
users = users.swaplevel(0,1).reset_index()
users

```
   weekday    city  visitors  signups
0      Mon  Austin       326        3
1      Sun  Austin       139        7
2      Mon  Dallas       456        5
3      Sun  Dallas       237       12
```


In [44]:
# Melt users: skinny
skinny = pd.melt(users, id_vars=["weekday", "city"])

# Print skinny
skinny

```
   weekday    city  variable  value
0      Mon  Austin  visitors    326
1      Sun  Austin  visitors    139
2      Mon  Dallas  visitors    456
3      Sun  Dallas  visitors    237
4      Mon  Austin   signups      3
5      Sun  Austin   signups      7
6      Mon  Dallas   signups      5
7      Sun  Dallas   signups     12
```


Well done! Here, because you didn't specify the `var_name` or `value_name` parameters, the melted DataFrame has the default variable and value column names.
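For comparison, here is a sketch of the same melt with explicit names in place of the defaults. The labels `'metric'` and `'count'` are arbitrary choices made for illustration, not names from the exercise.

```python
import pandas as pd

users = pd.DataFrame({
    "weekday": ["Mon", "Sun", "Mon", "Sun"],
    "city": ["Austin", "Austin", "Dallas", "Dallas"],
    "visitors": [326, 139, 456, 237],
    "signups": [3, 7, 5, 12],
})

# 'metric' and 'count' are hypothetical labels replacing 'variable' and 'value'.
skinny_named = pd.melt(
    users,
    id_vars=["weekday", "city"],
    var_name="metric",
    value_name="count",
)
print(skinny_named)
```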

## 8. Obtaining key-value pairs with melt()
Sometimes, all you need is some key-value pairs, and the context does not matter. If said context is in the index, you can easily obtain what you want. For example, in the `users` DataFrame, the `visitors` and `signups` columns lend themselves well to being represented as key-value pairs. So if you created a hierarchical index with `'city'` and `'weekday'` columns as the index, you can easily extract key-value pairs for the `'visitors'` and `'signups'` columns by melting users and specifying `col_level=0`.

In [46]:
# Dataframe to use for this exercise
users

```
   weekday    city  visitors  signups
0      Mon  Austin       326        3
1      Sun  Austin       139        7
2      Mon  Dallas       456        5
3      Sun  Dallas       237       12
```


In [63]:
# Set the new index: users_idx
users_idx = users.set_index(["city", "weekday"])

# Print the users_idx DataFrame
users_idx

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [64]:
# Obtain the key-value pairs: kv_pairs
kv_pairs = pd.melt(users_idx, col_level=0)

# Print the key-value pairs
kv_pairs

```
   variable  value
0  visitors    326
1  visitors    139
2  visitors    456
3  visitors    237
4   signups      3
5   signups      7
6   signups      5
7   signups     12
```


Great work! It's always worth keeping in mind whether any aspects of your data lend themselves well to being represented as key-value pairs.
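The key-value result drops the index entirely, which is the point when the context does not matter. If the context is needed after all, `pd.melt()` accepts `ignore_index=False` in pandas 1.1 and later; the sketch below shows both variants side by side. The name `with_context` is an assumption for illustration.

```python
import pandas as pd

users_idx = pd.DataFrame({
    "weekday": ["Mon", "Sun", "Mon", "Sun"],
    "city": ["Austin", "Austin", "Dallas", "Dallas"],
    "visitors": [326, 139, 456, 237],
    "signups": [3, 7, 5, 12],
}).set_index(["city", "weekday"])

# Default behaviour: the (city, weekday) index is discarded.
kv_pairs = pd.melt(users_idx, col_level=0)
print(kv_pairs)

# Assumes pandas >= 1.1: keep the original index alongside each pair.
with_context = pd.melt(users_idx, ignore_index=False)
print(with_context)
```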

## 9. Setting up a pivot table
A pivot table allows you to see all of your variables as a function of two other variables. In this exercise, you will use the `.pivot_table()` method to see how the `users` DataFrame entries appear when presented as functions of the `'weekday'` and `'city'` columns. That is, with the rows indexed by `'weekday'` and the columns indexed by `'city'`.

In [67]:
# Starting Dataframe
users

```
   weekday    city  visitors  signups
0      Mon  Austin       326        3
1      Sun  Austin       139        7
2      Mon  Dallas       456        5
3      Sun  Dallas       237       12
```


In [68]:
users.pivot_table(index = "weekday", columns=["city"])

```
        signups        visitors
city     Austin Dallas   Austin Dallas
weekday
Mon           3      5      326    456
Sun           7     12      139    237
```


Excellent! Notice the labels of the index and the columns are `'weekday'` and `'city'`, respectively - exactly as you specified.
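One practical difference from `.pivot()` is that `.pivot_table()` aggregates rows that share the same index/column pair, using the mean by default, whereas `.pivot()` raises an error. The sketch below assumes a variant of the data with a second, made-up Monday entry for Dallas; the names `users_dup` and `pivot_failed` are illustrative.

```python
import pandas as pd

# Hypothetical data: ('Mon', 'Dallas') appears twice.
users_dup = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas", "Dallas"],
    "visitors": [139, 237, 326, 456, 544],
})

# .pivot() cannot decide which of the duplicate values to keep.
try:
    users_dup.pivot(index="weekday", columns="city", values="visitors")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print("pivot failed on duplicates:", pivot_failed)

# .pivot_table() averages the duplicates: (456 + 544) / 2 = 500.
table = users_dup.pivot_table(index="weekday", columns="city", values="visitors")
print(table)
```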

## 10. Using other aggregations in pivot tables
You can also use other aggregation functions within a pivot table by specifying the `aggfunc` parameter. In this exercise, you will practice using the `'count'` and `len` aggregation functions on the `users` DataFrame. They produce the same result here because `users` has no missing values: `'count'` skips missing entries, whereas `len` counts every row.

In [69]:
# Use a pivot table to display the count of each column: count_by_weekday1
count_by_weekday1 = users.pivot_table(index = "weekday", aggfunc = "count")

# Print count_by_weekday1
count_by_weekday1

```
         city  signups  visitors
weekday
Mon         2        2         2
Sun         2        2         2
```


In [70]:
# Replace 'aggfunc='count'' with 'aggfunc=len': count_by_weekday2
count_by_weekday2 = users.pivot_table(index = "weekday", aggfunc= len)

# Verify that the same result is obtained
count_by_weekday1.equals(count_by_weekday2)

True

Well done! As expected, both the `len` and `'count'` aggregation functions produced the same result, since there are no missing values in `users`.
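The two aggregations stop agreeing once a value is missing. A minimal sketch, assuming a copy of the data in which one Monday visitor count is unknown (the frame `users_nan` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second Monday visitor count is missing.
users_nan = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "visitors": [139.0, 237.0, 326.0, np.nan],
})

# 'count' ignores missing values; len counts every row in the group.
counted = users_nan.pivot_table(index="weekday", values=["visitors"], aggfunc="count")
sized = users_nan.pivot_table(index="weekday", values=["visitors"], aggfunc=len)

print(counted)
print(sized)
```

Which of the two is appropriate depends on whether rows with missing values should contribute to the tally.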

## 11. Using margins in pivot tables
Sometimes it's useful to add totals in the margins of a pivot table. You can do this with the argument `margins=True`. In this exercise, you will practice using `margins` in a pivot table along with a new aggregation function: `sum`.

In [75]:
# Starting Dataframe
users

```
   weekday    city  visitors  signups
0      Mon  Austin       326        3
1      Sun  Austin       139        7
2      Mon  Dallas       456        5
3      Sun  Dallas       237       12
```


In [72]:
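# Create the DataFrame with the appropriate pivot table: signups_and_visitors
# (values is listed explicitly so the non-numeric 'city' column is not summed)
signups_and_visitors = users.pivot_table(index="weekday", values=["signups", "visitors"], aggfunc="sum")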

# Print signups_and_visitors
signups_and_visitors

```
         signups  visitors
weekday
Mon            8       782
Sun           19       376
```


In [73]:
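# Add in the margins: signups_and_visitors_total
# (values is listed explicitly so the non-numeric 'city' column is not summed)
signups_and_visitors_total = users.pivot_table(index="weekday", values=["signups", "visitors"], aggfunc="sum", margins=True)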

# Print signups_and_visitors_total
signups_and_visitors_total

```
         signups  visitors
weekday
Mon            8       782
Sun           19       376
All           27      1158
```


Fantastic! Take a look at how specifying `margins=True` added an 'All' row containing the total of each column.
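As a quick check, the 'All' row can be compared against the plain column sums, and its label can be changed with the `margins_name` parameter. This is a sketch that rebuilds `users`; the label `'Total'` is an arbitrary choice for illustration.

```python
import pandas as pd

users = pd.DataFrame({
    "weekday": ["Mon", "Sun", "Mon", "Sun"],
    "city": ["Austin", "Austin", "Dallas", "Dallas"],
    "visitors": [326, 139, 456, 237],
    "signups": [3, 7, 5, 12],
})

totals = users.pivot_table(
    index="weekday",
    values=["signups", "visitors"],
    aggfunc="sum",
    margins=True,
)

# The margin row should match a direct sum over each column.
print(totals.loc["All", "visitors"] == users["visitors"].sum())
print(totals.loc["All", "signups"] == users["signups"].sum())

# 'Total' is a hypothetical label replacing the default 'All'.
totals_named = users.pivot_table(
    index="weekday",
    values=["signups", "visitors"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(totals_named)
```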