## Pandas `groupby()` vs `groupby().transform()`

### Warm-up

- `groupby` splits your data into groups or categories so you can analyze each one separately
- When you compute an aggregate statistic on the grouped data, the result contains that statistic once per group or category
- In this notebook, we look at other ways to structure the result of a `groupby` during analysis
- Run the code and answer the questions; we will discuss them as a class
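Here is a minimal sketch of the warm-up idea on a small made-up DataFrame. The `toy`, `group` and `value` names are hypothetical and not part of the penguins data. Aggregating a grouped column returns one value per group.

```python
import pandas as pd

# Hypothetical toy data: two groups, five rows
toy = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'value': [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Aggregate: one mean per group, indexed by the group labels
group_means = toy.groupby('group')['value'].mean()
print(group_means)  # a -> 2.0, b -> 4.0
```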

In [None]:
import pandas as pd
import seaborn as sns

In [None]:
penguins = sns.load_dataset('penguins')

In [None]:
penguins.head()

### 1. Calculate the mean of `bill_length_mm` across the whole dataset

In [None]:
penguins['bill_length_mm'].mean()
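The penguins data has some missing measurements. By default, `Series.mean()` skips `NaN` values (`skipna=True`). This sketch uses a small hypothetical Series to show the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NaN is ignored by default: (1.0 + 3.0) / 2
mean_skipping = s.mean()            # 2.0
# With skipna=False, any NaN makes the result NaN
mean_strict = s.mean(skipna=False)  # nan
```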

### 2. Calculate the mean of `bill_length_mm` per species

In [None]:
penguins.groupby('species')['bill_length_mm'].mean()
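The result of question 2 is a Series indexed by `species`. Because this notebook is about structuring grouped results, the sketch below shows two standard ways to get a flat DataFrame instead: `as_index=False` and `.reset_index()`. The toy data is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'species': ['A', 'A', 'B'],
    'bill_length_mm': [40.0, 42.0, 50.0],
})

# Default: a Series whose index holds the group labels
by_index = toy.groupby('species')['bill_length_mm'].mean()

# Option 1: keep the group labels as a regular column
flat1 = toy.groupby('species', as_index=False)['bill_length_mm'].mean()

# Option 2: move the index back into a column afterwards
flat2 = by_index.reset_index()
```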

### 3. What is the difference between the result in 2. and the code snippet below?

In [None]:
penguins.groupby('species')['bill_length_mm'].transform('mean')
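Unlike the aggregation in question 2, `transform('mean')` returns one value per original row. The result is aligned to the original index, and each row receives the mean of its own group. The following sketch uses hypothetical toy data with a non-default index so the alignment is visible:

```python
import pandas as pd

toy = pd.DataFrame(
    {'species': ['A', 'B', 'A', 'B'], 'bill_length_mm': [40.0, 50.0, 42.0, 54.0]},
    index=[10, 11, 12, 13],  # non-default index to show alignment
)

agg = toy.groupby('species')['bill_length_mm'].mean()
per_row = toy.groupby('species')['bill_length_mm'].transform('mean')

# agg has one entry per group; per_row has one entry per row
print(len(agg), len(per_row))  # 2 4
print(per_row.tolist())        # [41.0, 52.0, 41.0, 52.0]
```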

### 4. How could we add this to the DataFrame?

In [None]:
penguins['species_mean'] = penguins.groupby('species')['bill_length_mm'].transform('mean')
penguins
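The transformed Series shares the DataFrame's index, so you can assign it directly as a column. Another way to build the same column is to map each row's species onto the aggregated means from question 2. This sketch shows that alternative using `Series.map`; the toy data is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'species': ['A', 'B', 'A'], 'bill_length_mm': [40.0, 50.0, 42.0]})

# Approach used above: transform keeps the original row order and index
via_transform = toy.groupby('species')['bill_length_mm'].transform('mean')

# Alternative: aggregate once, then look up each row's species in the result
means = toy.groupby('species')['bill_length_mm'].mean()
via_map = toy['species'].map(means)

toy['species_mean'] = via_transform
```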

### 5. What exactly does `.transform()` do?

In [None]:
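# Store the per-species mean of bill_depth_mm in its own column,
# so it does not overwrite species_mean (the bill_length_mm mean) from question 4
penguins['species_mean_depth'] = penguins.groupby('species')['bill_depth_mm'].transform('mean')

# Show every row so you can see the same mean repeated within each species
pd.set_option('display.max_rows', None)
penguins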


In [None]:
# Restore the default row limit for later outputs
pd.reset_option('display.max_rows')
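Conceptually, `groupby(...).transform('mean')` does three things: it splits the rows by group, computes the statistic for each group, and writes that value back onto every row of the group. The loop below is a teaching sketch of that idea only, not how pandas implements it internally. The toy data is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'species': ['A', 'B', 'A', 'B'], 'bill_depth_mm': [18.0, 15.0, 20.0, 17.0]})

# Start with an empty float column aligned to the original rows
manual = pd.Series(index=toy.index, dtype='float64')
for species, group in toy.groupby('species'):
    # group is the sub-DataFrame for one species; its index points back to toy
    manual.loc[group.index] = group['bill_depth_mm'].mean()

builtin = toy.groupby('species')['bill_depth_mm'].transform('mean')
```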

### 6. BONUS: Can `.transform()` be used without `groupby()`?

In [None]:
def mean_diff(value):
    # Subtract the mean from every value: each value's deviation from the mean
    return value - value.mean()

penguins['bill_length_mm'].transform(mean_diff)  # equivalent: .transform(lambda x: x - x.mean())
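You can also pass the same `mean_diff` function to a grouped transform. In that case it receives each group's values separately, so the result is each row's deviation from its own group mean rather than from the overall mean. In this sketch on hypothetical toy data, the overall version is written as plain arithmetic for comparison.

```python
import pandas as pd

def mean_diff(value):
    return value - value.mean()

toy = pd.DataFrame({'species': ['A', 'A', 'B', 'B'], 'bill_length_mm': [40.0, 42.0, 50.0, 54.0]})

# Without groupby: deviation from the overall mean (46.5)
overall = toy['bill_length_mm'] - toy['bill_length_mm'].mean()
# With groupby: deviation from each species' own mean (41.0 and 52.0)
per_group = toy.groupby('species')['bill_length_mm'].transform(mean_diff)
```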

In [None]:
import numpy as np

# Element-wise square root: the output has the same length as the input
penguins['bill_length_mm'].transform(np.sqrt)
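`transform` expects an output with the same length and index as its input. The sketch below, on a hypothetical toy Series, shows that an aggregating function such as `'mean'` is rejected. pandas raises a `ValueError` here, though the exact message may differ between versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])

# Element-wise function: output has the same shape, so transform accepts it
roots = s.transform(np.sqrt)  # [1.0, 2.0, 3.0]

# Aggregating function: returns a single number, so transform refuses it
try:
    s.transform('mean')
    rejected = False
except ValueError:
    rejected = True
```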

In [None]:
df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'score': [80, 85, 90, 95, 100],
})

In [None]:
# Applied column by column: each value minus its own column's mean
df_transformed = df.transform(lambda x: x - x.mean())

In [None]:
df_transformed

In [None]:
df['age'].mean()  # 35.0, the value subtracted from every age in df_transformed
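On a DataFrame, `transform` applies the function to each column separately. Each column is therefore centered on its own mean: 35 for `age` and 90 for `score`. This sketch rebuilds the same `df` and checks that every transformed column now averages zero.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'score': [80, 85, 90, 95, 100],
})

df_transformed = df.transform(lambda x: x - x.mean())

# Each column was shifted by its own mean, so every transformed column averages 0
col_means = df.mean()                  # age: 35.0, score: 90.0
print(df_transformed['age'].tolist())  # [-10.0, -5.0, 0.0, 5.0, 10.0]
```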