# Data Aggregation
*Curtis Miller*

Now that we can form groups, let's look at how to compute useful quantities for each group.

## Group Quantities

Let's first pick up where the last video left off.

In [None]:
import pandas as pd

In [None]:
pop_pyramids = pd.read_csv("PopPyramids.csv")
pop_pyramids = pop_pyramids.loc[:, ["Year", "Country", "Age", "Male Population", "Female Population"]]
pop_pyramids.columns = pd.Index(["Year", "Country", "Age", "Male", "Female"])
pop_pyramids = pop_pyramids.loc[pop_pyramids.Age != "Total"]
pop_pyramids = pd.melt(pop_pyramids, id_vars=["Year", "Country", "Age"], var_name="Sex", value_name="Population")
pop_pyramids.head(21)

In [None]:
pop_pyramids.tail(21)

In [None]:
pop_pyramids_16 = pop_pyramids.loc[pop_pyramids.Year == 2016].drop("Year", axis=1)    # 2016 data

# The groups
yeargroup = pop_pyramids.groupby("Year")
agegroup16 = pop_pyramids_16.groupby("Age")
countrygroup16 = pop_pyramids_16.groupby("Country")
sexgroup16 = pop_pyramids_16.groupby("Sex")
cyagroup = pop_pyramids.groupby(["Country", "Year", "Age"])

# A preview of the groups
sexgroup16.groups

## Group-Level Calculations
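Group objects provide many of the same methods as a `DataFrame` (`sum()`, `mean()`, `std()`, and so on). Calling one of them returns that statistic computed separately for each group.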
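As a minimal sketch of what these calls return, the toy frame below uses made-up numbers (not taken from `PopPyramids.csv`) in the same long layout as `pop_pyramids_16`. The names `toy`, `toy_country`, `country_total`, and `country_mean` are hypothetical. Selecting the `Population` column before aggregating keeps the text columns out of the calculation, since recent pandas versions no longer drop them silently.

```python
import pandas as pd

# Hypothetical toy data in the same long format as pop_pyramids_16
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Age": ["0-4", "5-9", "0-4", "5-9", "0-4", "5-9", "0-4", "5-9"],
    "Sex": ["Male", "Male", "Female", "Female", "Male", "Male", "Female", "Female"],
    "Population": [10, 12, 9, 11, 20, 22, 19, 21],
})

toy_country = toy.groupby("Country")

# Select the numeric column, then reduce each group to a single number
country_total = toy_country["Population"].sum()
country_mean = toy_country["Population"].mean()

print(country_total)    # one total per country
print(country_mean)     # one mean per country
```

The result is indexed by the grouping key, so `country_total["A"]` looks up the total for country `A`.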

In [None]:
yeargroup[["Population"]].sum()    # Total population per year

In [None]:
agegroup16[["Population"]].sum()    # Total population per age group in 2016

In [None]:
agegroup16[["Population"]].mean()    # What is the mean number of people in age groups?

In [None]:
agegroup16[["Population"]].std()    # Standard deviation?

In [None]:
agegroup16.describe()     # A detailed description

In [None]:
sexgroup16.describe()

In [None]:
countrygroup16[["Population"]].quantile(0.9)    # 90th percentile of population within each country

## `aggregate()`

The group method `aggregate()` (or equivalently `agg()`) computes group-level statistics as we have been doing, but it also accepts custom functions and can compute several statistics for each group in one call.
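A small sketch of both uses, again on made-up data: `toy`, `toy_sex`, and the custom function `value_range` are hypothetical names introduced only for illustration. Built-in statistics are passed here by their string names, which pandas also accepts.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Population": [10, 20, 60, 5, 15, 25],
})

def value_range(x):
    """Custom statistic: largest value minus smallest value in the group."""
    return x.max() - x.min()

toy_sex = toy.groupby("Sex")["Population"]

single = toy_sex.agg("sum")                             # one statistic -> Series
several = toy_sex.agg(["sum", "mean", value_range])     # several -> one column each

print(single)
print(several)
```

When a list is passed, each result column is named after the function that produced it, so the custom column here is called `value_range`.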

In [None]:
import numpy as np

In [None]:
countrygroup16.agg(np.sum)

In [None]:
countrygroup16.agg([np.sum, np.mean, np.std])

In [None]:
# A function computing the inter-quartile range (IQR): 75th percentile minus 25th percentile
iqr = lambda x: np.percentile(x, 75) - np.percentile(x, 25)
iqr(np.array([1, 2, 3, 4, 5, 6]))
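As a quick sanity check of the IQR definition, the sketch below computes the two quartiles separately. `np.percentile` takes its second argument on a 0 to 100 scale, so the quartiles are requested as `25` and `75`. The names `iqr_def` and `data` are hypothetical and chosen so they do not clash with the notebook's own `iqr`.

```python
import numpy as np

def iqr_def(x):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

data = np.array([1, 2, 3, 4, 5, 6])

print(np.percentile(data, 25))    # 2.25 (default linear interpolation)
print(np.percentile(data, 75))    # 4.75
print(iqr_def(data))              # 2.5
```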

In [None]:
sexgroup16[["Population"]].agg(iqr)

In [None]:
iqr.__name__

In [None]:
sexgroup16[["Population"]].agg([np.sum, iqr])    # Notice how IQR is named

In [None]:
sexgroup16[["Population"]].agg([("Total", np.sum), ("IQR", iqr)])    # (name, function) pairs set the column names
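Another way to control the output names, available in pandas 0.25 and later, is named aggregation with keyword arguments. The sketch below uses made-up data. `toy`, `iqr_lambda`, and `summary` are hypothetical names, and this is an alternative shown for comparison rather than the approach used in the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Sex": ["Male"] * 4 + ["Female"] * 4,
    "Population": [1, 2, 3, 4, 10, 20, 30, 40],
})

iqr_lambda = lambda x: np.percentile(x, 75) - np.percentile(x, 25)
print(iqr_lambda.__name__)    # <lambda>, which is why an explicit name helps

# Each keyword becomes an output column: name=(input column, function)
summary = toy.groupby("Sex").agg(
    Total=("Population", "sum"),
    IQR=("Population", iqr_lambda),
)
print(summary)
```

Because the input column is named in each pair, this form does not require selecting `Population` beforehand.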