# Aggregation 📚

### After this encounter we will have covered 
- how to apply different aggregation methods to your datasets
- what .groupby() does and some options for applying it to your datasets

In [None]:
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

## 1. Applying aggregation methods: 

In [None]:
df = pd.read_csv("large_countries_2015.csv", sep = ",")
df.info()

In [None]:
# express the population in thousands (integer part only)
df["population"] = (df["population"]/1000).astype(int)

In [None]:
df.head(2)

Let's apply some aggregation methods!
Intuition of "aggregation": take some rows, apply some kind of "operation" to them and return a summarized version of these rows.
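As a minimal sketch of that intuition, the toy dataframe below (made-up values, not taken from `large_countries_2015.csv`) collapses three rows into a single value per column:

```python
import pandas as pd

# Toy data with made-up values, for illustration only
toy = pd.DataFrame({"population": [10, 20, 30], "fertility": [1.0, 2.0, 3.0]})

# Each aggregation reduces the three rows of a column to one value
total_population = toy["population"].sum()
mean_fertility = toy["fertility"].mean()

print(total_population, mean_fertility)
```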

In [None]:
df.sum()

If we apply .sum() to the complete dataframe, the values in string columns are concatenated rather than added up.
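If the concatenated strings are not wanted, one option is to restrict the sum to numeric columns. A small sketch on a made-up dataframe, assuming a pandas version in which `DataFrame.sum` accepts the `numeric_only` argument:

```python
import pandas as pd

# Toy data with made-up values; "country" is a string column
toy = pd.DataFrame(
    {"country": ["A", "B"], "population": [10, 20], "fertility": [1.5, 2.5]}
)

# Only numeric columns are summed; the string column is left out of the result
numeric_totals = toy.sum(numeric_only=True)
print(numeric_totals)
```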

In [None]:
df["population"].sum()

In [None]:
df[["population", "fertility"]].sum()

In [None]:
df["country"].count()

In [None]:
df["country"].value_counts()

In [None]:
df.describe()

.agg() can be used to aggregate more "modularly":

In [None]:
df.agg(
    {"population":"mean",
    "fertility":"median"
    }
)

In [None]:
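# select the numeric columns first: "median", "mean" and "std" are not defined for strings
df[["population", "fertility"]].agg(
    ["median","mean","std"]
)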

In [None]:
def double(x):
    return 2*x

In [None]:
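# Pass the function object itself: the string "double" is not the name of a pandas method.
# Note: double works element-wise, so combining it with a reducing function such as "mean"
# in a single .agg() call may be rejected by pandas with a ValueError.
df.agg(
    {"population":"mean",
    "fertility":double
    }
)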

In [None]:
# double is passed as a function object; the string "double" is not a pandas method name
df[["population", "fertility"]].agg(
    ["median","mean",double]
)

## 2. .groupby()

What DOES .groupby() actually do?
1. it **splits** the data
2. it **applies** some kind of operation ON THE GROUPED data
3. it **combines** the data back into a new (pandas) object (i.e. series or dataframe)
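The three steps can be sketched by hand and compared with a single .groupby() call. The dataframe and its `continent` column are hypothetical and only serve as an illustration:

```python
import pandas as pd

# Toy data with made-up values; "continent" is a hypothetical grouping column
toy = pd.DataFrame(
    {"continent": ["X", "X", "Y"], "population": [10, 20, 30]}
)

# Manual version of split / apply / combine
pieces = {}
for name in toy["continent"].unique():
    group = toy[toy["continent"] == name]      # split
    pieces[name] = group["population"].sum()   # apply
manual = pd.Series(pieces)                     # combine

# The same three steps in one call
grouped = toy.groupby("continent")["population"].sum()
print(grouped)
```

Both `manual` and `grouped` hold one summed value per continent.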

After grouping, we can combine aggregations and transformations, using .groupby() together with .agg() and our custom function from above.
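A possible sketch of this on made-up data: .agg() returns one row per group, while .transform() with the `double` function keeps one row per input row. The `continent` column and all values are assumptions for illustration, and using .transform() for the element-wise function is one option rather than necessarily the approach intended here.

```python
import pandas as pd

def double(x):
    return 2 * x

# Toy data with made-up values; "continent" is a hypothetical grouping column
toy = pd.DataFrame(
    {
        "continent": ["X", "X", "Y"],
        "population": [10, 20, 30],
        "fertility": [1.0, 3.0, 2.0],
    }
)

grouped = toy.groupby("continent")

# Aggregation: one row per group, a different method per column
summary = grouped.agg({"population": "sum", "fertility": "mean"})

# Transformation: the result has the same length as the original dataframe
doubled = grouped["population"].transform(double)

print(summary)
print(doubled)
```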

Applying transformations to selected cols:
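One way this could look, again on made-up data with a hypothetical `continent` column: select the columns after grouping, then call .transform() so that each row receives the value computed for its group.

```python
import pandas as pd

# Toy data with made-up values; "continent" is a hypothetical grouping column
toy = pd.DataFrame(
    {
        "continent": ["X", "X", "Y"],
        "population": [10, 20, 30],
        "fertility": [1.0, 3.0, 2.0],
    }
)

# Group means for two selected columns, broadcast back to every row
group_means = toy.groupby("continent")[["population", "fertility"]].transform("mean")

# Share of each row in the total population of its group
group_totals = toy.groupby("continent")["population"].transform("sum")
toy["population_share"] = toy["population"] / group_totals

print(group_means)
print(toy)
```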

Plotting examples:

## Comments and questions during the encounter