**<h1><center>Pandas - GroupBy</center></h1>**

The `groupby` function splits data into groups according to a given condition. We usually split the data in order to apply an operation to each group, such as aggregation, transformation, or filtration.
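
As a rough illustration of the three kinds of operation named above, the sketch below applies each one to a small made-up table. The `scores` DataFrame and its column names are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
scores = pd.DataFrame({
    "team": ["x", "x", "y", "y", "y"],
    "points": [10, 20, 5, 15, 40],
})
by_team = scores.groupby("team")["points"]

# Aggregation: one value per group.
totals = by_team.sum()

# Transformation: one value per original row, aligned to the original index.
centered = by_team.transform(lambda s: s - s.mean())

# Filtration: keep only the rows of groups that satisfy a condition.
big_teams = scores.groupby("team").filter(lambda g: len(g) >= 3)

print(totals)
print(centered)
print(big_teams)
```

Aggregation shrinks the result to one row per group, transformation keeps the original shape, and filtration returns a subset of the original rows.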

In [1]:
#import the pandas library
import pandas as pd
import numpy as np

ipl_data = {'Colleges': ['College1', 'College2', 'College3', 'College4',
                         'College5', 'College1'],
            'Year': [2014, 2015, 2014, 2016, 2017, 2015],
            'Rank': [1, 2, 1, 2, 3, 3],
            'Percentage': [75.6, 81.6, 82.8, 78, 85, 82]
            }

df = pd.DataFrame(ipl_data)

print(df)

   Colleges  Year  Rank  Percentage
0  College1  2014     1        75.6
1  College2  2015     2        81.6
2  College3  2014     1        82.8
3  College4  2016     2        78.0
4  College5  2017     3        85.0
5  College1  2015     3        82.0


```
Syntax : DataFrame.groupby(by=None,
                           axis=0,
                           level=None,
                           sort=True)

        by : mapping, function, label, or list of labels
        axis : {0 or 'index', 1 or 'columns'}, default 0 (deprecated since pandas 2.1)
        level : int, level name, or sequence of such, default None
        sort : bool, default True

        Returns : a groupby object that contains information about the groups.
```
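
To make the `by` and `sort` parameters more concrete, here is a minimal sketch that passes a function and a mapping as `by` and switches `sort` off. The `frame` data, its index labels, and the `mapping` dict are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data for illustration only.
frame = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=["b1", "a1", "b2", "a2"],
)

# `by` as a function: it is called on each index label.
by_letter = frame.groupby(lambda label: label[0])["value"].sum()
print(by_letter)

# `by` as a mapping from index labels to group names.
mapping = {"b1": "odd", "a1": "even", "b2": "odd", "a2": "even"}
by_mapping = frame.groupby(mapping)["value"].sum()
print(by_mapping)

# sort=False keeps the groups in order of first appearance instead of sorting the keys.
unsorted = frame.groupby(lambda label: label[0], sort=False)["value"].sum()
print(unsorted)
```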

In [2]:
print("Returns group by object",df.groupby('Year'))

Returns group by object <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7eb29bb981f0>


<u>View Groups</u> : The `.groups` attribute lets us view the groups of a groupby object. It returns a dict-like `PrettyDict` in which each key is a group name and each value holds the index labels of the rows in that group.

`Syntax : dataframe.groupby("column name").groups`

In [3]:
print(df.groupby('Year').groups)

{2014: [0, 2], 2015: [1, 5], 2016: [3], 2017: [4]}


In [4]:
print(type(df.groupby('Year').groups))

<class 'pandas.io.formats.printing.PrettyDict'>
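
Besides `.groups`, a groupby object offers a few other ways to inspect its groups. The sketch below shows `ngroups`, `size()`, and `indices` on a one-column frame; the `years` DataFrame is an assumption that reuses only the `Year` values from the example above.

```python
import pandas as pd

# Same Year values as the example DataFrame, rebuilt so the snippet is self-contained.
years = pd.DataFrame({"Year": [2014, 2015, 2014, 2016, 2017, 2015]})
grouped_years = years.groupby("Year")

print(grouped_years.ngroups)   # number of distinct groups
print(grouped_years.size())    # rows per group, as a Series
print(grouped_years.indices)   # dict of group key -> integer row positions
```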


In [None]:
print(df.groupby(by = df['Year']>2015).groups)

{False: [0, 1, 2, 5], True: [3, 4]}


In [6]:
grouped = df.groupby(by = df['Year']>2015)
print(grouped['Colleges'])
for name,group in grouped:
    print(name)
    print(type(group))
    print(group['Percentage'])
    # Pass the function name as a string; newer pandas versions warn when given np.mean
    print(group['Percentage'].agg(['mean']))

<pandas.core.groupby.generic.SeriesGroupBy object at 0x7eb2747ef850>
False
<class 'pandas.core.frame.DataFrame'>
0    75.6
1    81.6
2    82.8
5    82.0
Name: Percentage, dtype: float64
mean    80.5
Name: Percentage, dtype: float64
True
<class 'pandas.core.frame.DataFrame'>
3    78.0
4    85.0
Name: Percentage, dtype: float64
mean    81.5
Name: Percentage, dtype: float64


In [None]:
# Group by multiple columns
print(df.groupby(['Colleges','Year']).groups)

{('College1', 2014): [0], ('College1', 2015): [5], ('College2', 2015): [1], ('College3', 2014): [2], ('College4', 2016): [3], ('College5', 2017): [4]}


<u>Iterating through Groups</u> : With the groupby object in hand, we can iterate over it much as we would over `itertools.groupby`: each iteration yields a pair of the group name and the DataFrame holding that group's rows.

In [None]:

grouped = df.groupby('Year')

print(df.groupby('Year').groups)

for name,group in grouped:
    print(name)
    print(group)

{2014: [0, 2], 2015: [1, 5], 2016: [3], 2017: [4]}
2014
   Colleges  Year  Rank  Percentage
0  College1  2014     1        75.6
2  College3  2014     1        82.8
2015
   Colleges  Year  Rank  Percentage
1  College2  2015     2        81.6
5  College1  2015     3        82.0
2016
   Colleges  Year  Rank  Percentage
3  College4  2016     2        78.0
2017
   Colleges  Year  Rank  Percentage
4  College5  2017     3        85.0
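
For readers who know `itertools.groupby`, the following sketch puts the two side by side. It assumes a shortened, made-up `records` list; the main difference it tries to show is that `itertools.groupby` only merges consecutive items, so the input has to be sorted by the key first, while pandas groups by value regardless of row order.

```python
from itertools import groupby
import pandas as pd

# Hypothetical subset of the college data, for illustration only.
records = [("College1", 2014), ("College2", 2015), ("College3", 2014)]

def by_year(record):
    """Return the year of a (college, year) record."""
    return record[1]

# itertools.groupby only merges *consecutive* items, so sort by the key first.
plain = {
    year: [name for name, _ in items]
    for year, items in groupby(sorted(records, key=by_year), key=by_year)
}

# pandas groups by value, whatever the row order.
frame = pd.DataFrame(records, columns=["Colleges", "Year"])
with_pandas = {
    year: group["Colleges"].tolist()
    for year, group in frame.groupby("Year")
}

print(plain)
print(with_pandas)
```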


<u>Select a Group</u> : Using the `get_group()` method, we can select a single group.

In [None]:

grouped = df.groupby('Year')
print(grouped.get_group(2014))

   Colleges  Year  Rank  Percentage
0  College1  2014     1        75.6
2  College3  2014     1        82.8
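
When the grouping uses more than one column, `get_group()` expects a tuple with one value per column. The sketch below assumes a trimmed-down, made-up `colleges` DataFrame and also shows that asking for a key that is not present raises `KeyError`.

```python
import pandas as pd

# Hypothetical subset of the college data, for illustration only.
colleges = pd.DataFrame({
    "Colleges": ["College1", "College2", "College1"],
    "Year": [2014, 2015, 2015],
    "Percentage": [75.6, 81.6, 82.0],
})

pairs = colleges.groupby(["Colleges", "Year"])

# One value per grouping column, passed as a tuple.
one_group = pairs.get_group(("College1", 2015))
print(one_group)

# A key that matches no group raises KeyError.
try:
    pairs.get_group(("College2", 2014))
    missing = False
except KeyError:
    missing = True
    print("no such group")
```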


In [None]:
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Revenue': [100, 150, 120, 130, 80, 200, 90]
}

df = pd.DataFrame(data)

In [None]:
# Group by a plain label rather than a one-element list so each group name is a scalar
grouped = df.groupby('Product')
for name,group in grouped:
    print(name)
    print(group['Revenue'].sum())

A
390
B
480



In [None]:
grouped.groups

{'A': [0, 2, 4, 6], 'B': [1, 3, 5]}

In [None]:
grouped = df.groupby(['Product', 'Salesperson'])
grouped.groups

{('A', 'Alice'): [0, 2, 4, 6], ('B', 'Bob'): [1, 3, 5]}

In [None]:
for name, group in grouped:
    print(name)
    print(group['Revenue'].sum())

('A', 'Alice')
390
('B', 'Bob')
480


**<h3>Aggregation</h3>**

An aggregation function returns a single value for each group. Once the groupby object has been created, we can apply one or several aggregations to it with `agg`.

In [None]:

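# df was reassigned to the sales data above, so rebuild the college DataFrame first
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Colleges')

print(grouped['Percentage'].agg('mean'))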

Colleges
College1    78.8
College2    81.6
College3    82.8
College4    78.0
College5    85.0
Name: Percentage, dtype: float64


In [None]:
grouped = df.groupby('Colleges')
# std is the sample standard deviation (ddof=1), so single-row groups give NaN
print(grouped['Percentage'].agg(['sum', 'mean', 'std']))

            sum  mean       std
Colleges                       
College1  157.6  78.8  4.525483
College2   81.6  81.6       NaN
College3   82.8  82.8       NaN
College4   78.0  78.0       NaN
College5   85.0  85.0       NaN
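
When different columns need different aggregations, pandas also supports named aggregation, where each keyword argument to `agg` names an output column and gives a `(column, function)` pair. The sketch below assumes a made-up `colleges` subset, and the output names `best_rank`, `mean_pct`, and `n_rows` are chosen only for this example.

```python
import pandas as pd

# Hypothetical subset of the college data, for illustration only.
colleges = pd.DataFrame({
    "Colleges": ["College1", "College2", "College1"],
    "Rank": [1, 2, 3],
    "Percentage": [75.6, 81.6, 82.0],
})

# Each keyword is an output column; each value is (input column, aggregation name).
summary = colleges.groupby("Colleges").agg(
    best_rank=("Rank", "min"),
    mean_pct=("Percentage", "mean"),
    n_rows=("Percentage", "count"),
)
print(summary)
```

This avoids the nested column labels that appear when a list of functions is applied to several columns at once.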


In [None]:
import pandas as pd
import numpy as np
data = {
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Revenue': [100, 150, 120, 130, 80, 200, 90]
}

df = pd.DataFrame(data)

grouped = df.groupby('Product')
print(type(grouped))
# Function names as strings avoid the warning newer pandas versions raise for np.sum, np.mean, np.std
grouped['Revenue'].agg(['sum', 'mean', 'std'])

<class 'pandas.core.groupby.generic.DataFrameGroupBy'>


         sum   mean        std
Product                       
A        390   97.5  17.078251
B        480  160.0  36.055513
