## Groupby Method

The groupby() method in pandas splits a DataFrame or Series into groups based on the values of one or more columns. You can then operate on each group separately, for example by aggregating, transforming, or filtering it.

Here are some common use cases of the groupby() method in pandas:

Aggregation: You can use groupby() to group data based on one or more columns and apply an aggregation function to each group to get summary statistics. For example, you can group data by a categorical column and compute the mean, median, or sum of a numeric column for each group.
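As a minimal sketch of aggregation, the snippet below computes several summary statistics in one call with agg(). The `sales` DataFrame and its `region` and `amount` columns are made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "amount": [10, 30, 5, 15, 40],
})

# Group by the categorical column, then compute the mean, median,
# and sum of the numeric column for each group in one call.
summary = sales.groupby("region")["amount"].agg(["mean", "median", "sum"])
print(summary)
```

The result has one row per group and one column per statistic.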

Transformation: You can use groupby() to group data based on one or more columns and apply a function to each group to transform the data. For example, you can group data by a categorical column and normalize a numeric column within each group.
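A small sketch of transformation follows. "Normalize" is taken here to mean dividing each value by its group mean, which is only one possible choice; the `scores` DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
scores = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [10.0, 20.0, 100.0, 300.0],
})

# transform() returns one value per original row (not one per group),
# so the result lines up with the original DataFrame.
group_mean = scores.groupby("team")["score"].transform("mean")

# Assumed normalization: each score relative to its team's mean.
scores["score_rel"] = scores["score"] / group_mean
print(scores)
```

Unlike an aggregation, the output keeps the same number of rows as the input.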

Filtering: You can use groupby() to group data based on one or more columns and filter the groups based on a condition. For example, you can group data by a categorical column and filter out groups that have fewer than a certain number of observations.
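The sketch below illustrates filtering by group size. The `obs` DataFrame and the threshold of two rows are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
obs = pd.DataFrame({
    "category": ["x", "x", "x", "y", "z", "z"],
    "value": [1, 2, 3, 4, 5, 6],
})

# filter() calls the function once per group and keeps all rows of
# the groups for which it returns True. Here: groups with >= 2 rows.
kept = obs.groupby("category").filter(lambda g: len(g) >= 2)
print(kept)
```

The single-row group `y` is dropped; the rows of the remaining groups keep their original index labels.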

Iteration: You can use groupby() to iterate over groups of data and perform operations on each group separately.
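Iterating over a grouped object yields one (key, sub-DataFrame) pair per group. The following sketch uses a hypothetical `orders` DataFrame to show the pattern.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
orders = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "total": [3, 7, 5, 1],
})

row_counts = {}
for city, group in orders.groupby("city"):
    # `city` is the group key; `group` is a DataFrame holding
    # only the rows that belong to that key.
    row_counts[city] = len(group)
    print(city, group["total"].tolist())
```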

Overall, groupby() is a powerful method in pandas that allows you to perform complex operations on data by grouping it based on one or more columns. It is a key tool for data analysis and data manipulation in pandas.



In [2]:
import pandas as pd
# Create dataframe
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

In [3]:
df = pd.DataFrame(data)

In [4]:
df

| | Company | Person | Sales |
|---|---|---|---|
| 0 | GOOG | Sam | 200 |
| 1 | GOOG | Charlie | 120 |
| 2 | MSFT | Amy | 340 |
| 3 | MSFT | Vanessa | 124 |
| 4 | FB | Carl | 243 |
| 5 | FB | Sarah | 350 |


In [6]:
bycomp = df.groupby('Company')

### Finding the mean sales per company

In [9]:
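# numeric_only=True skips the non-numeric Person column
# (without it, pandas 2.0+ raises a TypeError here)
bycomp.mean(numeric_only=True)

| Company | Sales |
|---|---|
| FB | 296.5 |
| GOOG | 160.0 |
| MSFT | 232.0 |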


In [10]:
# Standard deviation of the numeric columns only
bycomp.std(numeric_only=True)

| Company | Sales |
|---|---|
| FB | 75.660426 |
| GOOG | 56.568542 |
| MSFT | 152.735065 |


In [11]:
# Sum the numeric columns only, so the Person strings are left out
df.groupby('Company').sum(numeric_only=True)

| Company | Sales |
|---|---|
| FB | 593 |
| GOOG | 320 |
| MSFT | 464 |


In [12]:
df.groupby('Company').count()

| Company | Person | Sales |
|---|---|---|
| FB | 2 | 2 |
| GOOG | 2 | 2 |
| MSFT | 2 | 2 |


In [13]:
df.groupby('Company').max()

| Company | Person | Sales |
|---|---|---|
| FB | Sarah | 350 |
| GOOG | Sam | 200 |
| MSFT | Vanessa | 340 |


In [14]:
df.groupby('Company').min()

| Company | Person | Sales |
|---|---|---|
| FB | Carl | 243 |
| GOOG | Charlie | 120 |
| MSFT | Amy | 124 |


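### These max and min rows do not necessarily occur in the data

max() and min() are computed for each column independently, so a result row can combine values from different original rows. For the Person column the comparison is alphabetical. For MSFT, max() returns Vanessa with 340, but in the original data Vanessa's sales are 124 and the 340 belongs to Amy. Compare with the original DataFrame: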

In [15]:
df

| | Company | Person | Sales |
|---|---|---|---|
| 0 | GOOG | Sam | 200 |
| 1 | GOOG | Charlie | 120 |
| 2 | MSFT | Amy | 340 |
| 3 | MSFT | Vanessa | 124 |
| 4 | FB | Carl | 243 |
| 5 | FB | Sarah | 350 |
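
If the goal is the actual row with the highest sales in each company, rather than a per-column maximum, one possible approach is to look up the row labels with idxmax(). This is a sketch, not part of the original notebook; the name `top_rows` is introduced here for illustration.

```python
import pandas as pd

# Same data as in the notebook above.
data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# idxmax() returns, for each company, the index label of the row
# with the largest Sales value; .loc then selects those full rows.
top_rows = df.loc[df.groupby('Company')['Sales'].idxmax()]
print(top_rows)
```

Each returned row is a real row of df, so Person and Sales belong together. Replacing idxmax() with idxmin() gives the rows with the lowest sales.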
