## Pandas `groupby()` vs `groupby().transform()`

### Warm-up

- `groupby` is used to analyze the different groups or categories in your data.
- When an aggregate statistic is applied to a grouped object, the result is one value of that statistic per group or category.
- This notebook looks at other ways to structure the result of `groupby` during analysis.
- Run the code and answer the questions; we will discuss them as a class.

In [31]:
import pandas as pd
import seaborn as sns

In [32]:
penguins = sns.load_dataset('penguins')

In [33]:
penguins.head()

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


### 1. Calculate the mean of `bill_length_mm` across the whole dataset

In [34]:
penguins['bill_length_mm'].mean()

43.9219298245614

### 2. Calculate the mean of `bill_length_mm` per species

In [35]:
penguins.groupby('species')['bill_length_mm'].mean()

species
Adelie       38.791391
Chinstrap    48.833824
Gentoo       47.504878
Name: bill_length_mm, dtype: float64

### 3. How does the result in step 2 differ from the output of the code snippet below?

In [36]:
penguins.groupby('species')['bill_length_mm'].transform('mean')

0      38.791391
1      38.791391
2      38.791391
3      38.791391
4      38.791391
         ...    
339    47.504878
340    47.504878
341    47.504878
342    47.504878
343    47.504878
Name: bill_length_mm, Length: 344, dtype: float64
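One way to explore this question is to compare the shape and index of the two results on a small example. The sketch below uses a hypothetical five-row DataFrame (not the real penguins data) so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical toy data standing in for the penguins dataset.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [38.0, 40.0, 46.0, 48.0, 50.0],
})

# Aggregation: one value per group, indexed by the group key.
per_group = df.groupby("species")["bill_length_mm"].mean()

# Transformation: one value per original row, indexed like df.
per_row = df.groupby("species")["bill_length_mm"].transform("mean")

print(len(per_group), list(per_group.index))
print(len(per_row), per_row.index.equals(df.index))
```

The aggregated result has one entry per species, while the transformed result repeats each group's mean for every row that belongs to that group.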

### 4. How could we add this result to the DataFrame?
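One possible answer, sketched on a hypothetical toy DataFrame: because the output of `.transform()` keeps the original row index, it can be assigned directly as a new column. The column names `species_mean_bill_length` and `diff_from_species_mean` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the penguins dataset.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [38.0, 40.0, 46.0, 48.0, 50.0],
})

# The transformed Series aligns row by row with df, so plain assignment works.
df["species_mean_bill_length"] = (
    df.groupby("species")["bill_length_mm"].transform("mean")
)

# With the group mean on every row, row-level comparisons become simple.
df["diff_from_species_mean"] = (
    df["bill_length_mm"] - df["species_mean_bill_length"]
)

print(df)
```

The same pattern should carry over to `penguins`, assuming it has been loaded as in the cells above.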
### 5. What exactly does `.transform()` do?
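As a starting point for discussion: on a grouped object, `.transform()` calls the given function once per group and returns a result with the same length and index as the input. A scalar result is broadcast to every row of its group. The sketch below, on hypothetical toy data, passes a custom function that fills missing values with the mean of the row's own group.

```python
import pandas as pd

# Hypothetical toy data with one missing measurement.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [38.0, None, 40.0, 46.0, 50.0],
})

# The lambda receives each group's values as a Series and returns a Series
# of the same length, so the output lines up with the original rows.
filled = df.groupby("species")["bill_length_mm"].transform(
    lambda s: s.fillna(s.mean())
)

print(filled)
```

Here the missing Adelie value is replaced by the Adelie mean, and the Gentoo rows are left unchanged.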
### 6. Can `.transform()` be used without `groupby()`?

In [37]:
def convert_mm_to_m(x):
    # 1 m = 1000 mm, so divide millimetres by 1000 to get metres
    return x / 1000

In [38]:
penguins['bill_depth_mm'].transform(convert_mm_to_m)

0      0.0187
1      0.0174
2      0.0180
3         NaN
4      0.0193
        ...  
339       NaN
340    0.0143
341    0.0157
342    0.0148
343    0.0161
Name: bill_depth_mm, Length: 344, dtype: float64

In [12]:
penguins['bill_length_mm'].transform(lambda x: x/1000)

0      0.0391
1      0.0395
2      0.0403
3         NaN
4      0.0367
        ...  
339       NaN
340    0.0468
341    0.0504
342    0.0452
343    0.0499
Name: bill_length_mm, Length: 344, dtype: float64
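For a simple element-wise conversion like the two cells above, `.transform()` without `groupby()` should give the same result as ordinary vectorized arithmetic on the column. A minimal check on a hypothetical Series:

```python
import pandas as pd

# Hypothetical bill depths in millimetres, including one missing value.
s = pd.Series([18.7, 17.4, None, 19.3], name="bill_depth_mm")

via_transform = s.transform(lambda x: x / 1000)
via_arithmetic = s / 1000

# Both keep the original length and index; missing values stay missing.
print(via_transform.equals(via_arithmetic))
```

Without groups there is nothing to broadcast back to, so plain arithmetic is usually the simpler choice here.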