# Aggregating

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('data/dc-wikia-data-clean.csv')

## Basic summarizing and descriptive statistics

In [None]:
df.mean(numeric_only=True)  # restrict to numeric columns; text columns such as 'sex' have no mean

In [None]:
df.std(numeric_only=True)  # restrict to numeric columns, as above

In [None]:
df['sex'].unique()

In [None]:
df['sex'].value_counts()
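
The two calls above treat missing values differently: `unique` reports a missing value as one of the distinct entries, while `value_counts` drops missing values unless asked otherwise. The sketch below illustrates this on a small, made-up Series (`toy` is a hypothetical stand-in, not the DC dataset).

```python
import pandas as pd

# Hypothetical toy data with one missing entry; not the DC dataset.
toy = pd.Series(['Male', 'Female', 'Male', None], name='sex')

# unique() keeps the missing value as one of the distinct entries.
distinct = toy.unique()

# value_counts() drops missing values by default.
counts = toy.value_counts()

# dropna=False counts the missing values as their own category.
with_missing = toy.value_counts(dropna=False)

# normalize=True returns fractions of the non-missing total instead of counts.
shares = toy.value_counts(normalize=True)

print(counts)
print(with_missing)
print(shares)
```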

In [None]:
df['year'].min()

In [None]:
df['year'].max()
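
Both extremes can also be requested in a single call. A minimal sketch on made-up values (`years` is a hypothetical Series, not the real `year` column):

```python
import pandas as pd

# Hypothetical toy data; the real 'year' column comes from the CSV above.
years = pd.Series([1939, 1987, 2011, None], name='year')

# agg accepts a list of function names and returns one value per name.
extremes = years.agg(['min', 'max'])

# describe() bundles count, mean, std, min, quartiles and max.
summary = years.describe()

print(extremes)
print(summary)
```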

## `groupby`

**Figure copied from [Jake VanderPlas's book](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.08-Aggregation-and-Grouping.ipynb).**

![title](figures/jake-vanderplas-split-apply-combine.png)
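
The split, apply, combine steps in the figure can be spelled out by hand to see what `groupby` does in one call. This is only an illustrative sketch on a made-up frame (`toy`, `key` and `data` are hypothetical names), not how pandas implements it internally.

```python
import pandas as pd

# Hypothetical toy data, not the DC dataset used in this notebook.
toy = pd.DataFrame({
    'key': ['A', 'B', 'C', 'A', 'B', 'C'],
    'data': [1, 2, 3, 4, 5, 6],
})

# Split: one sub-frame per distinct key.
pieces = {key: group for key, group in toy.groupby('key')}

# Apply: reduce each piece to a single number.
sums = {key: group['data'].sum() for key, group in pieces.items()}

# Combine: gather the per-group results into one Series.
manual = pd.Series(sums, name='data').rename_axis('key')

# groupby performs the same three steps in one call.
builtin = toy.groupby('key')['data'].sum()

print(manual)
print(builtin)
```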

### Basic built-in aggregation functions

The built-in aggregations include `count`, `sum`, `mean`, `median`, `std`, `var`, `min`, `max`, `prod`, `first` and `last`.

In [None]:
df.groupby('sex').count()

In [None]:
df.groupby('sex').mean(numeric_only=True)  # average only the numeric columns

Grouping by several columns gives a result with a multi-level index (search for `MultiIndex` for more info).

In [None]:
df.groupby(['sex', 'align']).count()
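
A multi-level result can be awkward to work with at first. The sketch below shows three common ways to handle it, using a made-up frame (`toy` and its values are hypothetical; only the column names mirror the dataset).

```python
import pandas as pd

# Hypothetical toy data; column names mirror the dataset, values are made up.
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male'],
    'align': ['Good', 'Bad', 'Good', 'Good', 'Good'],
    'appearances': [10, 20, 30, 40, 50],
})

# Counting one column gives a Series indexed by (sex, align) pairs.
counts = toy.groupby(['sex', 'align'])['appearances'].count()

# Select one combination with a tuple of labels.
male_good = counts.loc[('Male', 'Good')]

# Turn the index levels back into ordinary columns.
flat = counts.reset_index(name='n')

# Pivot the inner level into columns; absent combinations are filled with 0.
wide = counts.unstack(fill_value=0)

print(flat)
print(wide)
```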

### Custom aggregations

Specifying pandas built-in functions by name.

In [None]:
df.groupby('sex').agg({'page_id': 'count'})

Using multiple functions for the same column.

In [None]:
df.groupby('sex').agg({'appearances': ['mean', 'std']})
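
Passing a list of functions produces two-level column labels. If flat column names are preferred, named aggregation (keyword arguments of the form `new_name=(column, function)`) is one alternative. A sketch on made-up data, where `toy`, `mean_appearances` and `std_appearances` are hypothetical names chosen for illustration:

```python
import pandas as pd

# Hypothetical toy data; values are made up.
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female'],
    'appearances': [10, 30, 20, 60],
})

# A list of functions yields MultiIndex columns:
# ('appearances', 'mean') and ('appearances', 'std').
nested = toy.groupby('sex').agg({'appearances': ['mean', 'std']})

# Named aggregation yields flat, explicitly named columns.
named = toy.groupby('sex').agg(
    mean_appearances=('appearances', 'mean'),
    std_appearances=('appearances', 'std'),
)

print(nested)
print(named)
```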

Using custom Python functions.

In [None]:
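def values_range(x):
    # x is the Series of values for one group; Series.max/min skip missing values,
    # whereas the built-in max/min can give wrong results when NaN is present.
    return x.max() - x.min()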

In [None]:
df.groupby('sex').agg({'appearances': values_range}).head(10)
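
A custom aggregation receives the values of one group as a Series and should return a single value. The sketch below checks this on made-up data that includes a missing entry, and shows an equivalent inline `lambda` (`toy` is hypothetical; the function here is a local version written with Series methods).

```python
import pandas as pd

# Hypothetical toy data with one missing value; not the DC dataset.
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Female'],
    'appearances': [10.0, 30.0, 20.0, 60.0, None],
})

def values_range(x):
    # x is the Series of values belonging to one group.
    return x.max() - x.min()

# Pass the function object itself, per column, through a dict.
by_function = toy.groupby('sex').agg({'appearances': values_range})

# The same aggregation written inline on a single column.
by_lambda = toy.groupby('sex')['appearances'].agg(lambda x: x.max() - x.min())

print(by_function)
print(by_lambda)
```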

## *Exercise*

Among bisexual characters, which sex is the most common? Is the same true for homosexual characters?