# https://www.geeksforgeeks.org/python/pandas-groupby-summarising-aggregating-and-grouping-data-in-python/

## Pandas Groupby: Summarising, Aggregating, and Grouping data in Python
Last Updated : 23 Jul, 2025
GroupBy is a simple concept: we split the data into groups by category and apply a function to each group. Simple as it is, the technique is widely used in data science. Real projects involve large amounts of data and repeated exploration, and GroupBy lets us summarize, aggregate, and group that data efficiently.

# Summarize
Summarization means counting and describing the data present in a data frame. The describe() method reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column. Data types are reported separately by info().

describe(): This method returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column.
Syntax:

dataframe_name.describe()

unique(): This method is used to get all unique values from the given column.
Syntax:

dataframe['column_name'].unique()

nunique(): This method is similar to unique(), but it returns the number of unique values instead of the values themselves.
Syntax:

dataframe_name['column_name'].nunique()

info(): This method prints the column names, non-null counts, and data types of the data frame
Syntax:

dataframe.info()

columns: This attribute holds all the column names present in the data frame
Syntax:

dataframe.columns

Example:

We are going to analyze the student marks data in this example.

In [10]:
# importing pandas as pd for using data frame
import pandas as pd

# creating dataframe with student details
dataframe = pd.DataFrame({'id': [7058, 4511, 7014, 7033],
                          'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90],
                          'Chemistry_marks': [89, 99, 99, 90],
                          'telugu_marks': [99, 97, 88, 80],
                          'hindi_marks': [99, 97, 56, 67],
                          'social_marks': [79, 97, 78, 90], })

# display dataframe
dataframe

     id    name  Maths_marks  Chemistry_marks  telugu_marks  hindi_marks  social_marks
0  7058  sravan           99               89            99           99            79
1  4511   manoj           97               99            97           97            97
2  7014  aditya           88               99            88           56            78
3  7033   bhanu           90               90            80           67            90


In [11]:
# describing the data frame
print(dataframe.describe())

print("-----------------------------")
# finding unique values
print(dataframe['Maths_marks'].unique())

print("-----------------------------")
# counting unique values
print(dataframe['Maths_marks'].nunique())

print("-----------------------------")
# display the columns in the data frame
print(dataframe.columns)

print("-----------------------------")
# information about dataframe
# (info() prints its report directly and returns None,
# so it does not need to be wrapped in print())
dataframe.info()

                id  Maths_marks  Chemistry_marks  telugu_marks  hindi_marks  \
count     4.000000     4.000000             4.00       4.00000     4.000000   
mean   6404.000000    93.500000            94.25      91.00000    79.750000   
std    1262.128625     5.322906             5.50       8.75595    21.561926   
min    4511.000000    88.000000            89.00      80.00000    56.000000   
25%    6388.250000    89.500000            89.75      86.00000    64.250000   
50%    7023.500000    93.500000            94.50      92.50000    82.000000   
75%    7039.250000    97.500000            99.00      97.50000    97.500000   
max    7058.000000    99.000000            99.00      99.00000    99.000000   

       social_marks  
count      4.000000  
mean      86.000000  
std        9.128709  
min       78.000000  
25%       78.750000  
50%       84.500000  
75%       91.750000  
max       97.000000  
-----------------------------
[99 97 88 90]
-----------------------------
4
--------------
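Note that the describe() output above has no entry for the name column, because describe() summarizes only numeric columns by default. The following is a minimal sketch, assuming a shortened version of the same student data, of how the include parameter brings text columns into the summary:

```python
import pandas as pd

# shortened version of the student data used in this article
dataframe = pd.DataFrame({'id': [7058, 4511, 7014, 7033],
                          'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90]})

# default behaviour: only the numeric columns are summarized
numeric_summary = dataframe.describe()

# include='all' also summarizes non-numeric columns; statistics that
# do not apply to a column are filled with NaN
full_summary = dataframe.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.loc['unique', 'name'])
```

For a text column, the summary reports the count, the number of unique values, the most frequent value, and its frequency, rather than the mean and quartiles.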

# Aggregation
Aggregation computes a single summary value, such as the sum, mean, variance, or standard deviation, for every column in a data frame or for one particular column.

sum(): It returns the sum of the values in a column
Syntax:

dataframe['column'].sum()

mean(): It returns the mean of a particular column in a data frame
Syntax:

dataframe['column'].mean()

std(): It returns the standard deviation of that column.
Syntax:

dataframe['column'].std()

var(): It returns the variance of that column
Syntax:

dataframe['column'].var()

min(): It returns the minimum value in column
Syntax:

dataframe['column'].min()

max(): It returns maximum value in column
Syntax:

dataframe['column'].max()

Example:

The program below aggregates the student marks data.

In [12]:
# importing pandas as pd for using data frame
import pandas as pd

# creating dataframe with student details
dataframe = pd.DataFrame({'id': [7058, 4511, 7014, 7033],
                          'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90],
                          'Chemistry_marks': [89, 99, 99, 90],
                          'telugu_marks': [99, 97, 88, 80],
                          'hindi_marks': [99, 97, 56, 67],
                          'social_marks': [79, 97, 78, 90], })

# display dataframe
dataframe

     id    name  Maths_marks  Chemistry_marks  telugu_marks  hindi_marks  social_marks
0  7058  sravan           99               89            99           99            79
1  4511   manoj           97               99            97           97            97
2  7014  aditya           88               99            88           56            78
3  7033   bhanu           90               90            80           67            90


In [13]:
dataframe.dtypes

id                  int64
name               object
Maths_marks         int64
Chemistry_marks     int64
telugu_marks        int64
hindi_marks         int64
social_marks        int64
dtype: object

In [14]:
# getting all minimum values from
# all columns in a dataframe
print(dataframe.min())
print("-----------------------------------------")

# minimum value from a particular
# column in a data frame
print(dataframe['Maths_marks'].min())
print("-----------------------------------------")

# computing maximum values
print(dataframe.max())
print("-----------------------------------------")

# computing sum
print(dataframe.sum())
print("-----------------------------------------")

# finding count
print(dataframe.count())
print("-----------------------------------------")


# computing standard deviation
print(dataframe.std())
print("-----------------------------------------")

# computing variance
print(dataframe.var())

id                   4511
name               aditya
Maths_marks            88
Chemistry_marks        89
telugu_marks           80
hindi_marks            56
social_marks           78
dtype: object
-----------------------------------------
88
-----------------------------------------
id                   7058
name               sravan
Maths_marks            99
Chemistry_marks        99
telugu_marks           99
hindi_marks            99
social_marks           97
dtype: object
-----------------------------------------
id                                  25616
name               sravanmanojadityabhanu
Maths_marks                           374
Chemistry_marks                       377
telugu_marks                          364
hindi_marks                           319
social_marks                          344
dtype: object
-----------------------------------------
id                 4
name               4
Maths_marks        4
Chemistry_marks    4
telugu_marks       4
hindi_marks        4
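social_marks       4
dtype: int64
-----------------------------------------

TypeError: could not convert string to float: 'sravan'

The error comes from dataframe.std(). In pandas 2.0 and later, std() and var() no longer skip non-numeric columns by default, so the text column name cannot be converted to a number. Passing numeric_only=True, as in dataframe.std(numeric_only=True), restricts the calculation to the numeric columns.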
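One way to avoid a TypeError like the one above is to keep text columns out of the calculation. The following is a minimal sketch, assuming a shortened version of the same student data, that shows two options: the numeric_only parameter and an explicit column selection with select_dtypes().

```python
import pandas as pd

# shortened version of the student data used in this article
dataframe = pd.DataFrame({'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90],
                          'social_marks': [79, 97, 78, 90]})

# option 1: ask pandas to skip the non-numeric columns
std_values = dataframe.std(numeric_only=True)
var_values = dataframe.var(numeric_only=True)

# option 2: select the numeric columns explicitly, then aggregate
numeric_columns = dataframe.select_dtypes(include='number')
std_selected = numeric_columns.std()

print(std_values)
print(var_values)
```

Both options give the same standard deviations. The second makes the choice of columns visible in the code, which can help when several aggregations are applied to the same selection.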

# Grouping
The groupby() method groups the rows of a dataframe by the values in one or more columns. A groupby operation involves one or more of the following steps:

Splitting: The data is split into groups according to some condition on the dataset.
Applying: A function is applied to each group independently.
Combining: The results from each group are combined into a single data structure.
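The three steps above can be sketched by hand and compared with a single groupby() call. This is a minimal illustration, not how pandas implements grouping internally. The section column is a hypothetical addition, used here so that each group contains more than one student.

```python
import pandas as pd

# 'section' is a hypothetical column added for illustration
dataframe = pd.DataFrame({'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'section': ['A', 'A', 'B', 'B'],
                          'Maths_marks': [99, 97, 88, 90]})

# splitting: one sub-frame per distinct section
pieces = {section: dataframe[dataframe['section'] == section]
          for section in dataframe['section'].unique()}

# applying: compute the mean Maths mark of each piece independently
means = {section: piece['Maths_marks'].mean()
         for section, piece in pieces.items()}

# combining: collect the per-group results into one Series
manual_result = pd.Series(means, name='Maths_marks').sort_index()

# groupby() performs the same three steps in one call
grouped_result = dataframe.groupby('section')['Maths_marks'].mean()

print(manual_result)
print(grouped_result)
```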
Example 1:

In [15]:
# importing pandas as pd for using data frame
import pandas as pd

# creating dataframe with student details
dataframe = pd.DataFrame({'id': [7058, 4511, 7014, 7033],
                          'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90],
                          'Chemistry_marks': [89, 99, 99, 90],
                          'telugu_marks': [99, 97, 88, 80],
                          'hindi_marks': [99, 97, 56, 67],
                          'social_marks': [79, 97, 78, 90], })


# group by name and take the first row of each group
print(dataframe.groupby('name').first())

print("---------------------------------")
# group by name with social_marks sum
print(dataframe.groupby('name')['social_marks'].sum())
print("---------------------------------")

# group by name with maths_marks count
print(dataframe.groupby('name')['Maths_marks'].count())
print("---------------------------------")

# printing the GroupBy object without an aggregation
# shows only its representation, not the grouped data
print(dataframe.groupby('name')['Maths_marks'])

          id  Maths_marks  Chemistry_marks  telugu_marks  hindi_marks  \
name                                                                    
aditya  7014           88               99            88           56   
bhanu   7033           90               90            80           67   
manoj   4511           97               99            97           97   
sravan  7058           99               89            99           99   

        social_marks  
name                  
aditya            78  
bhanu             90  
manoj             97  
sravan            79  
---------------------------------
name
aditya    78
bhanu     90
manoj     97
sravan    79
Name: social_marks, dtype: int64
---------------------------------
name
aditya    1
bhanu     1
manoj     1
sravan    1
Name: Maths_marks, dtype: int64
---------------------------------
<pandas.core.groupby.generic.SeriesGroupBy object at 0x1130ad050>
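The last line of the output above is only the representation of the SeriesGroupBy object, since no aggregation was applied to it. The following is a minimal sketch, assuming a shortened version of the same student data, of some ways to inspect what such an object contains:

```python
import pandas as pd

# shortened version of the student data used in this article
dataframe = pd.DataFrame({'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90]})

grouped = dataframe.groupby('name')['Maths_marks']

# number of groups, and the row labels that belong to each group
print(grouped.ngroups)
print(grouped.groups)

# iterating yields (group key, Series of that group's values) pairs,
# with the keys in sorted order by default
for name, marks in grouped:
    print(name, marks.tolist())

# get_group() returns the values of a single group
manoj_marks = grouped.get_group('manoj')
print(manoj_marks)
```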


Example 2:

In [16]:
# importing pandas as pd for using data frame
import pandas as pd

# creating dataframe with student details
dataframe = pd.DataFrame({'id': [7058, 4511, 7014, 7033],
                          'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'Maths_marks': [99, 97, 88, 90],
                          'Chemistry_marks': [89, 99, 99, 90],
                          'telugu_marks': [99, 97, 88, 80],
                          'hindi_marks': [99, 97, 56, 67],
                          'social_marks': [79, 97, 78, 90], })

# group by name and take the first row of each group
print(dataframe.groupby('name').first())

print("------------------------")
# group by name with social_marks sum
print(dataframe.groupby('name')['social_marks'].sum())
print("------------------------")
# group by name with maths_marks count
print(dataframe.groupby('name')['Maths_marks'].count())

          id  Maths_marks  Chemistry_marks  telugu_marks  hindi_marks  \
name                                                                    
aditya  7014           88               99            88           56   
bhanu   7033           90               90            80           67   
manoj   4511           97               99            97           97   
sravan  7058           99               89            99           99   

        social_marks  
name                  
aditya            78  
bhanu             90  
manoj             97  
sravan            79  
------------------------
name
aditya    78
bhanu     90
manoj     97
sravan    79
Name: social_marks, dtype: int64
------------------------
name
aditya    1
bhanu     1
manoj     1
sravan    1
Name: Maths_marks, dtype: int64
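Every name in the examples above is unique, so each group holds exactly one row and count() always returns 1. Grouping is more informative when a key repeats. The following is a minimal sketch using a hypothetical section column (not part of the original data) and the agg() method to compute several aggregations per group in one call:

```python
import pandas as pd

# 'section' is a hypothetical column added for illustration
dataframe = pd.DataFrame({'name': ['sravan', 'manoj', 'aditya', 'bhanu'],
                          'section': ['A', 'A', 'B', 'B'],
                          'Maths_marks': [99, 97, 88, 90],
                          'social_marks': [79, 97, 78, 90]})

# named aggregation: output_column=(input_column, function)
summary = dataframe.groupby('section').agg(
    students=('name', 'count'),
    maths_mean=('Maths_marks', 'mean'),
    social_max=('social_marks', 'max'),
)

print(summary)
```

Each row of the result describes one section, and each keyword passed to agg() becomes a column in the output.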
