What is Group By in Pandas?
Imagine you have sales data for several stores and want to know how much each store sold in total.
In simple words, groupby() collects rows that share the same value in one or more columns (just like grouping students by class or products by category) so that you can calculate a summary, such as a sum, an average, or a count, for each group.
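As a rough illustration of that idea, here is a plain-Python sketch of the split-apply-combine work that `df.groupby('Store')['Sales'].sum()` does for you. It uses the same six store/sales values as the DataFrame in the next cell; the names `stores`, `sales` and `totals` are just for this illustration and are not part of pandas.

```python
# Plain-Python sketch of what df.groupby('Store')['Sales'].sum() computes.
# These two lists mirror the Store and Sales columns used in the next cell.
stores = ['A', 'A', 'B', 'B', 'C', 'A']
sales = [100, 200, 300, 100, 400, 150]

totals = {}
for store, amount in zip(stores, sales):
    # split: the store label decides which bucket this row belongs to
    # apply + combine: add the row's sales to that bucket's running total
    totals[store] = totals.get(store, 0) + amount

print(totals)  # {'A': 450, 'B': 400, 'C': 400}
```

pandas does the same bookkeeping for you, only vectorized and for any number of columns and aggregation functions.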

In [3]:
import pandas as pd

data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'A'],
    'Sales': [100, 200, 300, 100, 400, 150],
}
df = pd.DataFrame(data)
print(df)

  Store  Sales
0     A    100
1     A    200
2     B    300
3     B    100
4     C    400
5     A    150


In [7]:
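# groupby() returns a DataFrameGroupBy object, not a result;
# apply an aggregation such as sum() or mean() to get numbers
grouped = df.groupby('Store')
grouped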

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000012A123EEC00>
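The object printed above is not a result yet. A minimal sketch of what it holds: iterating over it yields one `(key, sub-DataFrame)` pair per store, and `get_group()` returns the rows for a single key. The snippet rebuilds the same small DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B', 'C', 'A'],
    'Sales': [100, 200, 300, 100, 400, 150],
})
grouped = df.groupby('Store')

# Iterating yields (group key, sub-DataFrame) pairs, in sorted key order by default
for store, rows in grouped:
    print(store, rows['Sales'].tolist())
# A [100, 200, 150]
# B [300, 100]
# C [400]

# get_group() pulls out the original rows for a single key
print(grouped.get_group('B'))
```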

In [8]:
grouped['Sales'].sum()

Store
A    450
B    400
C    400
Name: Sales, dtype: int64

In [9]:
grouped['Sales'].mean()

Store
A    150.0
B    200.0
C    400.0
Name: Sales, dtype: float64

In [10]:
grouped['Sales'].count()

Store
A    3
B    2
C    1
Name: Sales, dtype: int64

In [11]:
grouped['Sales'].max()

Store
A    200
B    300
C    400
Name: Sales, dtype: int64

In [12]:
grouped['Sales'].min()

Store
A    100
B    100
C    400
Name: Sales, dtype: int64

In [14]:
df.groupby('Store')['Sales'].agg(['sum','mean','min','max','count'])

       sum   mean  min  max  count
Store
A      450  150.0  100  200      3
B      400  200.0  100  300      2
C      400  400.0  400  400      1
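If you want readable column names instead of `sum`, `mean` and so on, `agg()` also accepts keyword arguments ("named aggregation", available since pandas 0.25). The names `total`, `average` and `orders` below are arbitrary choices for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B', 'C', 'A'],
    'Sales': [100, 200, 300, 100, 400, 150],
})

# Named aggregation: each keyword becomes an output column name,
# and its value is the aggregation applied to 'Sales' within each store
summary = df.groupby('Store')['Sales'].agg(total='sum', average='mean', orders='count')
print(summary)
```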


Grouping by Multiple Columns

When you pass a list of columns to .groupby(), Pandas forms one group for each unique combination of values in those columns.

In [15]:
import pandas as pd
data = {
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Month': ['Jan', 'Feb', 'Jan', 'Jan', 'Feb', 'Jan', 'Feb'],
    'Sales': [100, 150, 200, 300, 100, 400, 500]
}
df = pd.DataFrame(data)
print(df)

  Store Month  Sales
0     A   Jan    100
1     A   Feb    150
2     A   Jan    200
3     B   Jan    300
4     B   Feb    100
5     C   Jan    400
6     C   Feb    500


In [17]:
grouped=df.groupby(['Store','Month'])['Sales'].sum()
print(grouped)

Store  Month
A      Feb      150
       Jan      300
B      Feb      100
       Jan      300
C      Feb      500
       Jan      400
Name: Sales, dtype: int64
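Because `groupby()` sorts the keys by default, `Feb` comes before `Jan` here (alphabetical, not calendar order). The result is a Series with a two-level index, so you can pick out one store with `.loc`, or use `unstack()` to spread the months into columns. A small sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Month': ['Jan', 'Feb', 'Jan', 'Jan', 'Feb', 'Jan', 'Feb'],
    'Sales': [100, 150, 200, 300, 100, 400, 500],
})
grouped = df.groupby(['Store', 'Month'])['Sales'].sum()

# Selecting on the outer level returns the months for one store
print(grouped.loc['A'])

# unstack() moves the inner index level (Month) into columns,
# giving one row per store and one column per month
wide = grouped.unstack()
print(wide)
```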


In [18]:
result=grouped.reset_index()
print(result)

  Store Month  Sales
0     A   Feb    150
1     A   Jan    300
2     B   Feb    100
3     B   Jan    300
4     C   Feb    500
5     C   Jan    400
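As an alternative to calling `reset_index()` afterwards, `groupby()` accepts `as_index=False`, which keeps `Store` and `Month` as regular columns from the start. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Month': ['Jan', 'Feb', 'Jan', 'Jan', 'Feb', 'Jan', 'Feb'],
    'Sales': [100, 150, 200, 300, 100, 400, 500],
})

# as_index=False leaves the group keys as ordinary columns,
# so no separate reset_index() step is needed
flat = df.groupby(['Store', 'Month'], as_index=False)['Sales'].sum()
print(flat)
```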


In [19]:
df.groupby(['Store', 'Month'])['Sales'].agg(['sum', 'mean', 'count'])


             sum   mean  count
Store Month
A     Feb    150  150.0      1
      Jan    300  150.0      2
B     Feb    100  100.0      1
      Jan    300  300.0      1
C     Feb    500  500.0      1
      Jan    400  400.0      1


In [22]:
import pandas as pd

In [29]:
# Point this at wherever Flavors.csv is saved on your machine
df = pd.read_csv(r"C:\Users\rohit\OneDrive\Desktop\analysis with alex\Python\pandas files\Flavors.csv")
df

                   Flavor  Base Flavor  Liked  Flavor Rating  Texture Rating  Total Rating
0     Mint Chocolate Chip      Vanilla    Yes           10.0             8.0          18.0
1               Chocolate    Chocolate    Yes            8.8             7.6          16.6
2                 Vanilla      Vanilla     No            4.7             5.0           9.7
3            Cookie Dough      Vanilla    Yes            6.9             6.5          13.4
4              Rocky Road    Chocolate    Yes            8.2             7.0          15.2
5               Pistachio      Vanilla     No            2.3             3.4           5.7
6             Cake Batter      Vanilla    Yes            6.5             6.0          12.5
7              Neapolitan      Vanilla     No            3.8             5.0           8.8
8  Chocolte Fudge Brownie    Chocolate    Yes            8.2             7.1          15.3


In [30]:
group_by_frame=df.groupby('Base Flavor')

In [32]:
group_by_frame.mean(numeric_only=True)


             Flavor Rating  Texture Rating  Total Rating
Base Flavor
Chocolate              8.4        7.233333         15.70
Vanilla                5.7        5.650000         11.35


In [34]:
df.groupby('Base Flavor').mean(numeric_only=True)


             Flavor Rating  Texture Rating  Total Rating
Base Flavor
Chocolate              8.4        7.233333         15.70
Vanilla                5.7        5.650000         11.35


In [37]:
df.groupby('Base Flavor').size().rename('Count')

Base Flavor
Chocolate    3
Vanilla      6
Name: Count, dtype: int64
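`size()` and `count()` give the same numbers for this file, but they differ when data is missing: `size()` counts rows per group, while `count()` counts only non-missing values. The tiny DataFrame below, with one `NaN` rating, is made up purely to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only: one Chocolate rating is missing
ratings = pd.DataFrame({
    'Base Flavor': ['Chocolate', 'Chocolate', 'Vanilla'],
    'Flavor Rating': [8.8, np.nan, 4.7],
})
g = ratings.groupby('Base Flavor')

# size() counts every row in each group, missing values included
print(g.size())
# count() counts only the non-missing values of the selected column
print(g['Flavor Rating'].count())
```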

In [38]:
df.groupby('Base Flavor')[['Flavor Rating','Texture Rating','Total Rating']].sum()

             Flavor Rating  Texture Rating  Total Rating
Base Flavor
Chocolate             25.2            21.7          47.1
Vanilla               34.2            33.9          68.1


In [39]:
df.groupby('Base Flavor')[['Flavor Rating','Texture Rating']].agg(['mean','max','count','sum'])

             Flavor Rating            Texture Rating
             mean   max  count   sum      mean  max  count   sum
Base Flavor
Chocolate     8.4   8.8      3  25.2  7.233333  7.6      3  21.7
Vanilla       5.7  10.0      6  34.2  5.650000  8.0      6  33.9
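When each column needs different statistics, `agg()` also accepts a dictionary that maps a column name to a list of functions. A sketch using the ratings from the Flavors.csv output above, retyped inline so it runs without the file; the particular functions chosen per column are only an example.

```python
import pandas as pd

# Ratings retyped from the Flavors.csv output above (rows 0 to 8)
df = pd.DataFrame({
    'Base Flavor': ['Vanilla', 'Chocolate', 'Vanilla', 'Vanilla', 'Chocolate',
                    'Vanilla', 'Vanilla', 'Vanilla', 'Chocolate'],
    'Flavor Rating': [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
    'Texture Rating': [8.0, 7.6, 5.0, 6.5, 7.0, 3.4, 6.0, 5.0, 7.1],
})

# Each key is a column; each value lists the aggregations for that column only
result = df.groupby('Base Flavor').agg({
    'Flavor Rating': ['mean', 'max'],
    'Texture Rating': ['min'],
})
print(result)
```

The result keeps the two-level column header seen above, but only with the statistics you asked for per column.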


In [41]:
df.groupby(['Base Flavor','Liked'])[['Flavor Rating']].agg(['mean','max','count','sum'])



                   Flavor Rating
                   mean   max  count   sum
Base Flavor Liked
Chocolate   Yes     8.4   8.8      3  25.2
Vanilla     No      3.6   4.7      3  10.8
            Yes     7.8  10.0      3  23.4


In [42]:
df.groupby('Base Flavor')[['Flavor Rating','Texture Rating','Total Rating']].describe()

             Flavor Rating                                      Texture Rating                    Total Rating
             count  mean       std  min    25%  50%  75%   max  count      mean  ...    75%  max  count   mean       std   min     25%   50%     75%   max
Base Flavor
Chocolate      3.0   8.4  0.346410  8.2  8.200  8.2  8.5   8.8    3.0  7.233333  ...  7.350  7.6    3.0  15.70  0.781025  15.2  15.250  15.3  15.950  16.6
Vanilla        6.0   5.7  2.710719  2.3  4.025  5.6  6.8  10.0    6.0  5.650000  ...  6.375  8.0    6.0  11.35  4.263684   5.7   9.025  11.1  13.175  18.0
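The `...` in that output is the notebook hiding columns because the table is wider than pandas' display limit. Two ways to get a readable view, sketched on ratings retyped from the Flavors.csv output: describe a single column, or transpose the result so the statistics run down the rows. Raising the limit with `pd.set_option('display.max_columns', None)` is another option.

```python
import pandas as pd

# Ratings retyped from the Flavors.csv output above (rows 0 to 8)
df = pd.DataFrame({
    'Base Flavor': ['Vanilla', 'Chocolate', 'Vanilla', 'Vanilla', 'Chocolate',
                    'Vanilla', 'Vanilla', 'Vanilla', 'Chocolate'],
    'Flavor Rating': [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
    'Texture Rating': [8.0, 7.6, 5.0, 6.5, 7.0, 3.4, 6.0, 5.0, 7.1],
})
grouped = df.groupby('Base Flavor')

# One column -> only 8 statistic columns, narrow enough to show in full
desc = grouped['Flavor Rating'].describe()
print(desc)

# Transposed: one row per (column, statistic) pair, one column per group
stats = grouped[['Flavor Rating', 'Texture Rating']].describe().T
print(stats)
```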
