# Groupby

The `groupby` method lets you _group rows of data together and call aggregate functions on each group_. It is analogous to the `GROUP BY` clause in SQL.

In [1]:
import pandas as pd

# Create dataframe
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

df = pd.DataFrame(data)
df

| | Company | Person | Sales |
|---|---|---|---|
| 0 | GOOG | Sam | 200 |
| 1 | GOOG | Charlie | 120 |
| 2 | MSFT | Amy | 340 |
| 3 | MSFT | Vanessa | 124 |
| 4 | FB | Carl | 243 |
| 5 | FB | Sarah | 350 |


Use the `.groupby()` method to _group rows together based on a column name_, e.g. group by `Company`.  
NOTE: this creates a `DataFrameGroupBy` object; nothing is aggregated until you call an aggregate method on it.

```
# Input:
df.groupby('Company')

# Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10fa97e10>
```
You can save this object as a new variable and/or call aggregate methods on the object.
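As a minimal sketch of saving the grouped object and reusing it, the snippet below rebuilds the same data so it runs on its own. The variable names `by_company`, `total_sales`, and `mean_sales` are illustrative and not part of the original notebook.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Save the grouped object once, then reuse it for several aggregations
by_company = df.groupby('Company')

# Selecting a column first restricts the aggregation to that column,
# so the non-numeric Person column is never involved
total_sales = by_company['Sales'].sum()
mean_sales = by_company['Sales'].mean()
```

Selecting `'Sales'` before aggregating returns a Series indexed by company, which is one way to avoid passing `numeric_only=True`.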

In [11]:
df.groupby('Company').mean(numeric_only=True) # <- `numeric_only=True` skips the non-numeric `Person` column, which would otherwise raise an error in recent pandas versions

| Company | Sales |
|---|---|
| FB | 296.5 |
| GOOG | 160.0 |
| MSFT | 232.0 |
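To see what the aggregation is doing, the grouped object can be iterated directly: each step yields a group key and the sub-DataFrame of rows for that key. The following is a rough sketch that recomputes the per-company means by hand; the name `manual_means` is hypothetical.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Iterating over a groupby object yields (group key, sub-DataFrame) pairs,
# with keys in sorted order by default
manual_means = {}
for company, group in df.groupby('Company'):
    manual_means[company] = group['Sales'].mean()

print(manual_means)
```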


More examples of aggregate methods:

In [10]:
# Standard Deviation
df.groupby('Company').std(numeric_only=True)

| Company | Sales |
|---|---|
| FB | 75.660426 |
| GOOG | 56.568542 |
| MSFT | 152.735065 |


In [12]:
# Min (call .max() the same way for the maximum)
df.groupby('Company').min(numeric_only=True)

| Company | Sales |
|---|---|
| FB | 243 |
| GOOG | 120 |
| MSFT | 124 |
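If both the minimum and the maximum are wanted in one table, one option is the `.agg()` method with a list of function names. This is a small sketch on the same data; the name `sales_range` is illustrative.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# agg() with a list of names applies each function per group
# and returns one column per function
sales_range = df.groupby('Company')['Sales'].agg(['min', 'max'])
print(sales_range)
```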


In [16]:
# Count
df.groupby('Company').count()

| Company | Person | Sales |
|---|---|---|
| FB | 2 | 2 |
| GOOG | 2 | 2 |
| MSFT | 2 | 2 |


In [20]:
# describe() gives summary statistics per group. For example, the statistics for Google's sales
df.groupby('Company').describe().transpose()['GOOG']

Sales  count      2.000000
       mean     160.000000
       std       56.568542
       min      120.000000
       25%      140.000000
       50%      160.000000
       75%      180.000000
       max      200.000000
Name: GOOG, dtype: float64
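An alternative sketch that reaches the same statistics without transposing: select the `Sales` column first, then pick the `GOOG` row with `.loc`. The result has a flat index (`count`, `mean`, ...) instead of the two-level `(Sales, count)` index shown above. `get_group` is included to show the raw rows behind those numbers; the variable names are hypothetical.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# describe() on a single column returns one row of statistics per group
sales_stats = df.groupby('Company')['Sales'].describe()
goog_stats = sales_stats.loc['GOOG']

# get_group() returns the original rows that belong to one group
goog_rows = df.groupby('Company').get_group('GOOG')
```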