In [None]:
import pandas as pd
import numpy as np

np.random.seed(1349)

In [None]:
df = pd.read_csv('students.csv')

## Reshaping

We will talk about reshaping operations in more detail when we discuss tidy data, but for now we will focus on a couple of common operations that can be used to summarize our data by different subgroups.

### `pd.crosstab`

As an example of `pd.crosstab`, we will count the number of students passing math in each classroom.

In [None]:
# We will use our student grades DataFrame, df.

df

We'll use the `pd.crosstab` function to count the number of occurrences of each subgroup (i.e. each unique combination of classroom and whether or not the student is passing math).
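As a rough sketch of what that call could look like, the cell below builds a small stand-in DataFrame instead of reading `students.csv`. The name `students_demo`, the column values, and the string labels in `passing_math` are all assumptions made for illustration; the real `df` may store this status differently.

```python
import pandas as pd

# Hypothetical stand-in for the student grades data (not the contents of students.csv).
students_demo = pd.DataFrame({
    'classroom': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'passing_math': ['failing', 'passing', 'passing',
                     'passing', 'passing', 'failing', 'failing'],
})

# Rows are classrooms, columns are passing status, cells are student counts.
counts = pd.crosstab(students_demo.classroom, students_demo.passing_math)
counts
```

The first argument supplies the row labels and the second supplies the column labels, so swapping them transposes the table.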

We can also view subtotals by setting `margins` to `True`.
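A minimal sketch of the subtotals, again on made-up data (the `students_demo` frame and its values are assumptions, not the real student grades):

```python
import pandas as pd

# Hypothetical stand-in for the student grades data.
students_demo = pd.DataFrame({
    'classroom': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'passing_math': ['failing', 'passing', 'passing',
                     'passing', 'passing', 'failing', 'failing'],
})

# margins=True appends an 'All' row and an 'All' column holding the subtotals.
with_totals = pd.crosstab(
    students_demo.classroom, students_demo.passing_math, margins=True
)
with_totals
```

The bottom-right `All` cell holds the total number of rows counted.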

The `pd.crosstab` function will also let us view the counts as proportions of the total by setting `normalize` to `True`.
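Here is a small sketch of normalizing, using the same kind of made-up data (`students_demo` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the student grades data.
students_demo = pd.DataFrame({
    'classroom': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'passing_math': ['failing', 'passing', 'passing',
                     'passing', 'passing', 'failing', 'failing'],
})

# normalize=True divides every count by the grand total,
# so the cells are fractions that sum to 1.
proportions = pd.crosstab(
    students_demo.classroom, students_demo.passing_math, normalize=True
)
proportions
```

The cells are fractions between 0 and 1; multiplying the table by 100 gives percentages.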

### `.pivot_table`

Here we use the `.pivot_table` method to create our summary. This method produces output similar to an Excel pivot table. We supply up to 4 things here:

- which values will make up the rows (the `index`)
- which values will make up the columns (the `columns`)
- the values we are aggregating (the `values`)
- an aggregation method (`aggfunc`); we can omit this, and `mean` will be used by default

For an example using the `pivot_table` method, we'll calculate the average math grade for the combination of `classroom` and `passing_math` status.
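A minimal sketch of that calculation follows. It runs on a made-up DataFrame; the name `students_demo`, the grade column name `math`, and all of the values are assumptions, since the contents of `students.csv` are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the student grades data.
students_demo = pd.DataFrame({
    'classroom': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'math': [62, 88, 94, 98, 79, 65, 58],
    'passing_math': ['failing', 'passing', 'passing',
                     'passing', 'passing', 'failing', 'failing'],
})

# aggfunc is omitted, so the default (mean) is used.
avg_math = students_demo.pivot_table(
    index='classroom', columns='passing_math', values='math'
)
avg_math
```

Each cell is the mean math grade for one classroom and passing status.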

For another example, we'll create a DataFrame that represents various orders at a restaurant.

In [None]:
n = 40

orders = pd.DataFrame({
    'drink': np.random.choice(['Tea', 'Water', 'Water'], n),
    'meal': np.random.choice(['Curry', 'Yakisoba Noodle', 'Pad Thai'], n),
})

orders.sample(10)

#### `.map`

The `.map` method lets us look up each value in a column in a dictionary. We'll use it to calculate the total price for each order and save the result to a new column named `bill`. Let's do this step by step.

In [None]:
# Create a dictionary of prices for drinks and meals.

prices = {
    'Yakisoba Noodle': 9,
    'Curry': 11,
    'Pad Thai': 10,
    'Tea': 2,
    'Water': 0,
}

In [None]:
# Match the values in the 'drink' and 'meal' columns with the values in the 'prices' dictionary
# and add the two prices together. Save the result to a new column named 'bill'.

orders['bill'] = orders.drink.map(prices) + orders.meal.map(prices)

orders.sample(10)

Let's take a look at how many orders have each combination of meal and drink:

In [None]:
pd.crosstab(orders.drink, orders.meal)

In [None]:
pd.crosstab(orders.drink, orders.meal, normalize=True, margins=True)

And let's find out the average bill amount for each combination: 

In [None]:
orders.pivot_table(index='drink', columns='meal', values='bill', aggfunc='mean')

It's interesting to note that we could find the same information with a multi-level group by:

In [None]:
orders.groupby(['drink', 'meal']).bill.mean()

The choice between a group by and a pivot table here is mostly aesthetic, and you should use whichever makes more sense to you for the problem at hand.
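To make the comparison concrete, the sketch below reshapes a group by result into the same wide layout as a pivot table with `.unstack()`. The `orders_demo` frame and its bill amounts are made up for illustration and are not drawn from the random `orders` data above.

```python
import pandas as pd

# Hypothetical orders; bills are stored as floats.
orders_demo = pd.DataFrame({
    'drink': ['Tea', 'Tea', 'Water', 'Water', 'Water'],
    'meal': ['Curry', 'Pad Thai', 'Curry', 'Curry', 'Pad Thai'],
    'bill': [13.0, 12.0, 11.0, 11.0, 10.0],
})

# Wide layout straight from pivot_table.
pivoted = orders_demo.pivot_table(
    index='drink', columns='meal', values='bill', aggfunc='mean'
)

# Same averages from a multi-level group by; unstack() moves the
# inner index level ('meal') into the columns.
grouped = orders_demo.groupby(['drink', 'meal']).bill.mean().unstack()

print(pivoted)
print(grouped)
```

Without `.unstack()`, the group by result is a Series with a two-level index, which is the long form of the same table.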

### Transposing

In [None]:
df.T

In [None]:
df.describe().T