# Pivot tables
* A pivot table is itself a DataFrame that compares two groups on some shared columns:
  1. the rows represent one variable that you're interested in,
  2. the columns represent another variable, and
  3. the cell content is some *aggregate* value of a third column
* Often a pivot table includes marginal values as well, which are comparisons across multiple groups (more in a minute)

In [None]:
# Here we have a world university ranking dataset (the filename suggests the CWUR rankings)
import pandas as pd
import numpy as np
df = pd.read_csv('datasets/cwurData.csv')
df.head()

Let's say we want to create a new column called *Rank_Level*, where institutions with world ranking 1-100 are
categorized as *first tier*, those with world ranking 101-200 are *second tier*, those ranked 201-300 are
*third tier*, and anything beyond 300 falls under *other* top universities.

Try it now!
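
One possible approach, sketched on a tiny made-up DataFrame rather than the real file. The column name `world_rank`, the tier labels, and the helper `rank_level` are assumptions chosen for illustration; adjust them to whatever your data actually uses.

```python
import pandas as pd

# Hypothetical miniature stand-in for the rankings data.
df = pd.DataFrame({
    "institution": ["A", "B", "C", "D", "E"],
    "world_rank": [1, 100, 101, 250, 301],
})

def rank_level(world_rank):
    """Map a world ranking to one of the four tiers described above."""
    if world_rank <= 100:
        return "First Tier"
    elif world_rank <= 200:
        return "Second Tier"
    elif world_rank <= 300:
        return "Third Tier"
    return "Other"

# apply() calls rank_level once per value in the world_rank column.
df["Rank_Level"] = df["world_rank"].apply(rank_level)
print(df)
```

`pd.cut` with explicit bin edges would be another reasonable way to do the same binning.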

* Let's pivot! We need two columns, say the *country* and our *rank level*; these will become our new rows (index) and columns (labels)
* Now we need one column of interest for the cell value; let's use the *score*
* Then we need an aggregation function to apply to *score*; let's use `np.mean`

* Essentially this means we're comparing two groups, "Countries" vs. "Rank Level", with respect to score using an average. Think for a moment how you might tackle this with group by...
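
A minimal sketch of that pivot, using a small made-up DataFrame so it runs on its own (the data values and the `Rank_Level` labels are assumptions). It passes the string `"mean"` as the aggregation, which computes the same average as `np.mean` and avoids a deprecation warning in recent pandas versions. The last lines show the group-by route for comparison.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})

# Rows are countries, columns are tiers, cells are the mean score.
pivot = df.pivot_table(values="score", index="country",
                       columns="Rank_Level", aggfunc="mean")
print(pivot)

# The same comparison via groupby: aggregate, then move the tier into columns.
by_group = df.groupby(["country", "Rank_Level"])["score"].mean().unstack()
print(by_group)
```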

* Notice that there are some NaN values, e.g. Argentina only has observations in the "Other" universities category
* Pivot tables aren't limited to one aggregation! We could use multiple functions and see those results with hierarchical column labels
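
As a rough illustration, passing a list of aggregations to `pivot_table` produces a two-level column index, with the function name on top and the tier underneath. The data below is made up, and the choice of `"mean"` and `"max"` is only an example.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})

# Two aggregations at once: the columns become (function, tier) pairs.
multi = df.pivot_table(values="score", index="country",
                       columns="Rank_Level", aggfunc=["mean", "max"])
print(multi)
```

Country and tier combinations with no observations still show up as NaN under each function.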

In [None]:
# we can also provide those marginal values
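
A small sketch of marginal values, again on made-up data: setting `margins=True` adds an `All` row and an `All` column, each computed with the same aggregation over the underlying observations.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})

# margins=True appends an "All" row and column holding the overall means.
with_margins = df.pivot_table(values="score", index="country",
                              columns="Rank_Level", aggfunc="mean",
                              margins=True)
print(with_margins)
```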


In [None]:
# A pivot table is just a multi-level dataframe


How would we query this if we want to get the average scores of First Tier universities broken down by country?
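
One way to answer that, assuming the pivot table was built with several aggregations so its columns are (function, tier) pairs. The name `new_df`, the data, and the `"First Tier"` label are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})
new_df = df.pivot_table(values="score", index="country",
                        columns="Rank_Level", aggfunc=["mean", "max"])

# Select the outer column level first, then the inner one.
first_tier = new_df["mean"]["First Tier"]
print(first_tier)

# The result is a Series indexed by country, so Series methods apply,
# e.g. finding the country with the highest first-tier average.
print(first_tier.idxmax())
```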

* Let's get weird. We can `stack` and `unstack` columns in our dataframe.
* `stack` pivots the lowermost column index to become the innermost row index; `unstack` is the inverse
* Let's look back at that pivot table...

In [None]:
new_df.head() #we want to take the tier of uni and move it to a row index, so we are stacking....
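
A sketch of the stacking step on made-up data (the name `new_df` and the tier labels are assumptions). Calling `stack()` moves the innermost column level, here the tier, into the row index, leaving only the aggregation names as columns. How rows that are entirely NaN are handled can differ between pandas versions.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})
new_df = df.pivot_table(values="score", index="country",
                        columns="Rank_Level", aggfunc=["mean", "max"])

# Move the tier from the columns into the rows: the index becomes
# (country, Rank_Level) and the columns are just the function names.
stacked = new_df.stack()
print(stacked.head())
```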

In [None]:
# It can get complex! You are just comparing two groups and a value (or multiple values in this case!)
# We can unstack() all the way if we want to, which means move a row index into a column index
new_df.head()
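
A sketch of `unstack` on made-up data (the name `new_df` and the tier labels are assumptions). It shows two things: unstacking a stacked table brings the tier back into the columns, and unstacking a table whose row index has only one level moves that level into the columns too, which leaves a Series with a three-level index.

```python
import pandas as pd

# Hypothetical data: country, tier, and score for six institutions.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Japan", "Japan", "Argentina"],
    "Rank_Level": ["First Tier", "First Tier", "Other",
                   "First Tier", "Second Tier", "Other"],
    "score": [100.0, 90.0, 44.0, 60.0, 48.0, 45.0],
})
new_df = df.pivot_table(values="score", index="country",
                        columns="Rank_Level", aggfunc=["mean", "max"])

# Round trip: stack the tier into the rows, then unstack it back out.
restored = new_df.stack().unstack()
print(restored)

# Unstacking "all the way": country is the only row level, so it moves
# into the column index and the result is a Series.
flat = new_df.unstack()
print(flat.head())
```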