# Groupby
`groupby` lets us group rows that share the same value in a column and then apply an aggregate function (such as sum, mean, or standard deviation) to each group.
![Untitled-2024-02-28-2007.png](attachment:8373d7db-fbf0-4481-a15e-78244d505f4a.png)
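
To make the idea concrete, here is a minimal sketch of the split, apply, and combine steps done by hand and then with a single `groupby` call. It rebuilds a trimmed copy of this notebook's data so it runs on its own; `manual_totals` and `grouped_totals` are names chosen only for illustration.

```python
import pandas as pd

# Trimmed copy of the notebook's toy data (Person column left out)
df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs.
# Apply: sum the Sales column of each sub-DataFrame.
manual_totals = {company: int(group['Sales'].sum())
                 for company, group in df.groupby('Company')}

# Combine: groupby performs the same work in one call and returns a Series
grouped_totals = df.groupby('Company')['Sales'].sum()

print(manual_totals)
print(grouped_totals.to_dict())
```

Both approaches should give the same per-company totals; the `groupby` version simply hides the loop.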

In [3]:
import pandas as pd
import numpy as np

In [4]:
# Create dataframe
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

In [5]:
df = pd.DataFrame(data)

In [6]:
df

,Company,Person,Sales
0,GOOG,Sam,200
1,GOOG,Charlie,120
2,MSFT,Amy,340
3,MSFT,Vanessa,124
4,FB,Carl,243
5,FB,Sarah,350


In [10]:
byComp = df.groupby('Company') # groupby() returns a DataFrameGroupBy object

In [11]:
byComp

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001CD5CE959D0>

<p style="font-size: 18px; color: green;">Now we can call aggregate functions on this <code>groupby</code> object.</p>


In [15]:
byComp.sum(numeric_only=True)

Company,Sales
FB,593
GOOG,320
MSFT,464


<p style="font-size: 18px; color: green;">Pass <code>numeric_only=True</code> to restrict the aggregation to numeric columns. Without it, recent pandas versions also try to aggregate the non-numeric <code>Person</code> column.</p>
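
An alternative to `numeric_only=True` is to select the column before aggregating. The sketch below compares the two approaches on a copy of this notebook's data; the variable names `all_numeric`, `sales_series`, and `sales_frame` are chosen only for illustration.

```python
import pandas as pd

# Same toy data as the notebook, rebuilt so this block runs on its own
df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Option 1: aggregate all columns, keeping only the numeric ones
all_numeric = df.groupby('Company').sum(numeric_only=True)

# Option 2: select one column first; the result is a Series
sales_series = df.groupby('Company')['Sales'].sum()

# Option 3: double brackets keep the result as a DataFrame
sales_frame = df.groupby('Company')[['Sales']].sum()

print(all_numeric)
print(sales_series)
print(sales_frame)
```

Selecting the column explicitly makes it obvious which values are being aggregated, which can help when a DataFrame has many columns.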

In [16]:
byComp.mean(numeric_only=True)

Company,Sales
FB,296.5
GOOG,160.0
MSFT,232.0


In [17]:
byComp.std(numeric_only=True) # std stands for standard deviation

Company,Sales
FB,75.660426
GOOG,56.568542
MSFT,152.735065


#### Accessing a particular group's row

In [18]:
byComp.std(numeric_only=True).loc['FB']

Sales    75.660426
Name: FB, dtype: float64
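
`.loc['FB']` on the aggregated result returns the whole row for that group as a Series. To pull out a single number, a row label and a column label can be passed together. A minimal sketch, rebuilding the notebook's DataFrame so the block runs on its own; `stats` and `fb_std` are names chosen only for illustration.

```python
import pandas as pd

# Same toy data as the notebook
df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Standard deviation of the numeric columns, one row per company
stats = df.groupby('Company').std(numeric_only=True)

# Row label plus column label returns a single scalar instead of a Series
fb_std = stats.loc['FB', 'Sales']

print(fb_std)
```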

### Doing it all in a single line

In [19]:
df.groupby('Company').std(numeric_only=True).loc['FB']

Sales    75.660426
Name: FB, dtype: float64

In [20]:
df.groupby('Company').count()

Company,Person,Sales
FB,2,2
GOOG,2,2
MSFT,2,2


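## groupby.describe() method
This method returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column, computed separately for each group.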

In [22]:
df.groupby('Company').describe()

,Sales,Sales,Sales,Sales,Sales,Sales,Sales,Sales
Company,count,mean,std,min,25%,50%,75%,max
FB,2.0,296.5,75.660426,243.0,269.75,296.5,323.25,350.0
GOOG,2.0,160.0,56.568542,120.0,140.0,160.0,180.0,200.0
MSFT,2.0,232.0,152.735065,124.0,178.0,232.0,286.0,340.0


In [24]:
df.groupby('Company').describe().transpose()

,Company,FB,GOOG,MSFT
Sales,count,2.0,2.0,2.0
Sales,mean,296.5,160.0,232.0
Sales,std,75.660426,56.568542,152.735065
Sales,min,243.0,120.0,124.0
Sales,25%,269.75,140.0,178.0
Sales,50%,296.5,160.0,232.0
Sales,75%,323.25,180.0,286.0
Sales,max,350.0,200.0,340.0
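
The output of `describe()` on a grouped DataFrame has two header levels (column name, then statistic). As a sketch of how individual statistics can be read from it, the block below selects `Sales` before calling `describe()`, which gives a flat header, and also shows the two-level lookup. It rebuilds the notebook's data so it runs on its own; `desc`, `full`, `goog_mean`, and `fb_max` are names chosen only for illustration.

```python
import pandas as pd

# Same toy data as the notebook
df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Selecting the column first gives a single-level header
desc = df.groupby('Company')['Sales'].describe()
goog_mean = desc.loc['GOOG', 'mean']

# Without the selection, columns are (column name, statistic) pairs;
# indexing with 'Sales' drops the first level
full = df.groupby('Company').describe()
fb_max = full['Sales'].loc['FB', 'max']

print(desc)
print(goog_mean, fb_max)
```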
