**Aggregation Functions**

Aggregation refers to using one value to describe multiple datapoints. Calculating an average is the classic example of aggregation, because we use one value (the average) to describe the 'center' of multiple datapoints. Aggregations like the average are also called summary statistics because they summarize an entire group of data using a single statistic. Using the .describe() method, we can calculate several of the most common aggregations, such as the mean, minimum, and maximum, all at once:

In [8]:
import pandas as pd

df = pd.read_csv('filename.csv')

df['Points Per Game'].describe()

count    10.000000
mean     20.140000
std       8.524761
min       5.500000
25%      13.200000
50%      22.150000
75%      27.450000
max      28.800000
Name: Points Per Game, dtype: float64

This output tells us, for example, that the average points per game was 20.14, with a maximum of 28.8. If we want to compute aggregations individually, we can apply individual pandas methods to one or more columns:

In [9]:

# syntax to summarize a single column: df['column_name'].summary_method()
# syntax to summarize multiple columns: df[['column_name1', 'column_name2']].summary_method()

Built-in summary methods include:

- .mean() returns the mean
- .median() returns the median
- .std() returns the standard deviation
- .max() and .min() return the maximum and minimum values respectively
- .nunique() returns the count of unique values
- .count() returns the count of non-null values
- .sum() returns the sum
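
As a minimal sketch of the methods listed above, the following builds a small hypothetical DataFrame and applies a few of them to one column and to two columns at once. The player labels, the `Rebounds Per Game` column, and all values are made up for illustration; they are not taken from the dataset used earlier.

```python
import pandas as pd

# Hypothetical data for illustration only (one missing value on purpose)
df = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Points Per Game': [10.0, 20.0, 30.0, None],
    'Rebounds Per Game': [4.0, 6.0, 8.0, 10.0],
})

# Single column: missing values are skipped by default
single_mean = df['Points Per Game'].mean()     # 20.0
non_null = df['Points Per Game'].count()       # 3 non-null values
unique_players = df['Player'].nunique()        # 4 distinct players

# Multiple columns: the result is a Series indexed by column name
multi_max = df[['Points Per Game', 'Rebounds Per Game']].max()
print(multi_max)
```

Note that selecting several columns requires double square brackets, because the inner list of names is what gets passed to the DataFrame.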

**Aggregating Booleans**

Here’s a sample Boolean column with its representation as 1s and 0s:

![image.png](attachment:image.png)

When we apply .sum() to the Boolean column, we get 1+0+1+1 = 3 – exactly the number of True entries. And this will always work, because the False entries are 0, and so disappear when we compute the sum.

If we apply .mean() to the Boolean column, it divides the sum of the True entries (3) by the total number of entries (4), giving 0.75. In other words, 75% of the entries are True.
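
The Boolean aggregations above can be checked with a short sketch. The Series below is an assumption: it reproduces the True, False, True, True pattern implied by the sum 1+0+1+1, and the names `made_playoffs` and `scores` are hypothetical.

```python
import pandas as pd

# Hypothetical Boolean column matching the 1+0+1+1 example
made_playoffs = pd.Series([True, False, True, True])

true_count = made_playoffs.sum()    # True counts as 1, False as 0
true_share = made_playoffs.mean()   # number of True entries / total entries

# The same idea works on a condition built on the fly (values are made up)
scores = pd.Series([5.5, 22.0, 27.5, 28.8])
share_over_20 = (scores > 20).mean()   # fraction of scores above 20

print(true_count, true_share, share_over_20)
```

Taking the mean of a comparison such as `scores > 20` is a common way to get the proportion of rows that satisfy a condition without filtering first.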