# Pandas GroupBy and Aggregation Methods Explained Using Students Dataset

## Introduction
This notebook demonstrates the use of the `groupby` method in pandas with the students dataset. It covers basic and advanced examples with detailed explanations and comments.


## Importing Libraries

In [1]:
import pandas as pd
import numpy as np


# Loading the Dataset
 - We'll start by loading the students dataset.



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load the students dataset
df = pd.read_csv('/content/drive/MyDrive/0.Latest_DS_Course/Pandas/Data/Students.csv')
# Display the first few rows of the dataset
df.head(10)

,Name,Grade,Age,Gender,Score
0,Student1,Grade 8,14,Male,85
1,Student2,Grade 7,13,Female,92
2,Student3,Grade 9,15,Male,78
3,Student4,Grade 6,12,Female,95
4,Student5,Grade 8,14,Male,89
5,Student6,Grade 7,13,Female,88
6,Student7,Grade 9,15,Male,76
7,Student8,Grade 6,12,Female,94
8,Student9,Grade 8,14,Male,87
9,Student10,Grade 7,13,Female,91


# Understanding the Dataset
 - Let's get a basic understanding of the dataset by checking its columns and some basic statistics.


In [4]:
# Display basic information about the dataset
df.info()

# Display basic statistics of the dataset
df.describe()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    20 non-null     object
 1   Grade   20 non-null     object
 2   Age     20 non-null     int64 
 3   Gender  20 non-null     object
 4   Score   20 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 932.0+ bytes


,Age,Score
count,20.0,20.0
mean,13.5,87.0
std,1.147079,7.290946
min,12.0,74.0
25%,12.75,83.25
50%,13.5,88.5
75%,14.25,92.25
max,15.0,97.0


# 1. Grouping by a Single Column
 - Grouping by a single column allows us to split the dataset into groups based on one specific column.


In [5]:
# Group by the 'Grade' column
students_per_grade = df.groupby('Grade')
# Display the result
students_per_grade

# The result is a DataFrameGroupBy object (a lazy grouping), not a DataFrame

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x784870ffd690>
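The object printed above holds only the grouping; nothing is computed until an aggregation is applied to it. A minimal sketch of that split, apply, and combine sequence follows. It assumes a small stand-in DataFrame rebuilt from rows printed in this notebook rather than the original `Students.csv`:

```python
import pandas as pd

# Stand-in for Students.csv, rebuilt from rows printed in this notebook
df = pd.DataFrame({
    "Name": ["Student1", "Student2", "Student3", "Student4", "Student5", "Student6"],
    "Grade": ["Grade 8", "Grade 7", "Grade 9", "Grade 6", "Grade 8", "Grade 7"],
    "Score": [85, 92, 78, 95, 89, 88],
})

grouped = df.groupby("Grade")          # split: no computation happens yet
mean_scores = grouped["Score"].mean()  # apply + combine: one value per grade
print(mean_scores)
```

The result is a Series indexed by the group keys, which `groupby` sorts by default.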

## Inspecting Groups
 - You can inspect the groups created by the groupby method.

In [6]:
students_per_grade.groups

{'Grade 6': [3, 7, 11, 15, 19], 'Grade 7': [1, 5, 9, 13, 17], 'Grade 8': [0, 4, 8, 12, 16], 'Grade 9': [2, 6, 10, 14, 18]}
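The values in `.groups` are index labels, not row positions; the two only coincide here because the dataset uses the default `RangeIndex`. The sketch below makes the difference visible by assuming a hypothetical string index (`"a"` to `"d"`) that is not part of the students dataset:

```python
import pandas as pd

df = pd.DataFrame(
    {"Grade": ["Grade 8", "Grade 7", "Grade 8", "Grade 7"],
     "Score": [85, 92, 89, 88]},
    index=["a", "b", "c", "d"],  # hypothetical labels, for illustration only
)
grouped = df.groupby("Grade")

n_groups = grouped.ngroups                      # number of distinct groups
labels = list(grouped.groups["Grade 8"])        # index labels of the group
positions = list(grouped.indices["Grade 8"])    # integer row positions

print(n_groups, labels, positions)
```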

## Iterating Over Groups
 - You can iterate over the groups to access each group separately.

In [7]:
for name, group in students_per_grade:
    print(f"Group name: {name}")
    print(group)
    print(type(group))

Group name: Grade 6
         Name    Grade  Age  Gender  Score
3    Student4  Grade 6   12  Female     95
7    Student8  Grade 6   12  Female     94
11  Student12  Grade 6   12  Female     96
15  Student16  Grade 6   12  Female     93
19  Student20  Grade 6   12  Female     97
<class 'pandas.core.frame.DataFrame'>
Group name: Grade 7
         Name    Grade  Age  Gender  Score
1    Student2  Grade 7   13  Female     92
5    Student6  Grade 7   13  Female     88
9   Student10  Grade 7   13  Female     91
13  Student14  Grade 7   13  Female     90
17  Student18  Grade 7   13  Female     89
<class 'pandas.core.frame.DataFrame'>
Group name: Grade 8
         Name    Grade  Age Gender  Score
0    Student1  Grade 8   14   Male     85
4    Student5  Grade 8   14   Male     89
8    Student9  Grade 8   14   Male     87
12  Student13  Grade 8   14   Male     88
16  Student17  Grade 8   14   Male     86
<class 'pandas.core.frame.DataFrame'>
Group name: Grade 9
         Name    Grade  Age Gender  Sc

In [8]:
for name, group in students_per_grade:
    print(f"Group name: {name}")
    # print(group)

Group name: Grade 6
Group name: Grade 7
Group name: Grade 8
Group name: Grade 9
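Iteration is mostly useful when each group needs custom handling that a built-in aggregation does not cover. As a rough sketch, assuming a small stand-in DataFrame built from rows printed earlier, the loop below collects the lowest and highest score of each grade into a dictionary:

```python
import pandas as pd

df = pd.DataFrame({
    "Grade": ["Grade 8", "Grade 7", "Grade 9", "Grade 6", "Grade 8", "Grade 7"],
    "Score": [85, 92, 78, 95, 89, 88],
})

# Each iteration yields (group key, sub-DataFrame); keys arrive in sorted order
score_range = {}
for name, group in df.groupby("Grade"):
    score_range[name] = (group["Score"].min(), group["Score"].max())

print(score_range)
```

For a simple summary like this one, a direct aggregation is usually shorter; the loop is shown only to illustrate what each iteration provides.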


## Getting a Single Group

In [9]:
x = students_per_grade.get_group('Grade 9')
x

,Name,Grade,Age,Gender,Score
2,Student3,Grade 9,15,Male,78
6,Student7,Grade 9,15,Male,76
10,Student11,Grade 9,15,Male,75
14,Student15,Grade 9,15,Male,77
18,Student19,Grade 9,15,Male,74


In [10]:
x = students_per_grade.get_group('Grade 7')
x

,Name,Grade,Age,Gender,Score
1,Student2,Grade 7,13,Female,92
5,Student6,Grade 7,13,Female,88
9,Student10,Grade 7,13,Female,91
13,Student14,Grade 7,13,Female,90
17,Student18,Grade 7,13,Female,89
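`get_group` selects the same rows as a boolean filter on the grouping column. The following sketch compares the two approaches on a small stand-in DataFrame; the row labels here are the default 0 to 4, so they differ from the labels in the full dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Student1", "Student2", "Student3", "Student6", "Student7"],
    "Grade": ["Grade 8", "Grade 7", "Grade 9", "Grade 7", "Grade 9"],
    "Score": [85, 92, 78, 88, 76],
})

via_get_group = df.groupby("Grade").get_group("Grade 9")
via_mask = df[df["Grade"] == "Grade 9"]

# Both selections contain the same students, in the same order
print(via_get_group["Name"].tolist())
print(via_mask["Name"].tolist())
```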


# Example: Count of Students per Grade
 - Let's group the dataset by the Grade column and count the total number of students in each grade.


In [11]:
# Group by the 'Grade' column and count the number of students in each grade
students_per_grade = df.groupby('Grade').count()
# Display the result
students_per_grade

Grade,Name,Age,Gender,Score
Grade 6,5,5,5,5
Grade 7,5,5,5,5
Grade 8,5,5,5,5
Grade 9,5,5,5,5


In [12]:
df.groupby('Grade').size()

Grade,0
Grade 6,5
Grade 7,5
Grade 8,5
Grade 9,5
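`count()` and `size()` agree on this dataset only because it has no missing values: `count()` counts non-null entries per column, while `size()` counts rows. A small sketch of the difference, assuming a hypothetical missing score that does not exist in the students dataset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Grade": ["Grade 8", "Grade 8", "Grade 7", "Grade 7"],
    "Score": [85, np.nan, 92, 88],  # the NaN is hypothetical, added for illustration
})

sizes = df.groupby("Grade").size()             # rows per group, NaN rows included
counts = df.groupby("Grade")["Score"].count()  # non-null scores per group

print(sizes)
print(counts)
```

When the goal is the number of students per grade, `size()` is the safer choice because it does not depend on which column happens to contain gaps.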


In [13]:
# Group by the 'Grade' column and count the distinct values in each column per grade
students_per_grade = df.groupby('Grade')
# Display the result
students_per_grade.nunique()

Grade,Name,Age,Gender,Score
Grade 6,5,1,1,5
Grade 7,5,1,1,5
Grade 8,5,1,1,5
Grade 9,5,1,1,5


In [14]:
df['Grade'].value_counts()


Grade,count
Grade 8,5
Grade 7,5
Grade 9,5
Grade 6,5
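`value_counts()` and `groupby(...).size()` return the same counts but order them differently: `value_counts()` sorts by count in descending order, while `groupby` sorts by the group key. A brief sketch, using the grade pattern visible in the printed rows, that aligns the two by sorting the `value_counts()` result on its index:

```python
import pandas as pd

# Grade column rebuilt from the pattern visible in the printed rows
df = pd.DataFrame({"Grade": ["Grade 8", "Grade 7", "Grade 9", "Grade 6"] * 5})

by_value_counts = df["Grade"].value_counts().sort_index()
by_groupby = df.groupby("Grade").size()

print(by_value_counts.to_dict() == by_groupby.to_dict())
```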


## Find the Topper of Grade 9




In [15]:
df

,Name,Grade,Age,Gender,Score
0,Student1,Grade 8,14,Male,85
1,Student2,Grade 7,13,Female,92
2,Student3,Grade 9,15,Male,78
3,Student4,Grade 6,12,Female,95
4,Student5,Grade 8,14,Male,89
5,Student6,Grade 7,13,Female,88
6,Student7,Grade 9,15,Male,76
7,Student8,Grade 6,12,Female,94
8,Student9,Grade 8,14,Male,87
9,Student10,Grade 7,13,Female,91


In [16]:
# First, fetch all the students in Grade 9

dfgrade9 = df[df['Grade']=="Grade 9"]
dfgrade9

,Name,Grade,Age,Gender,Score
2,Student3,Grade 9,15,Male,78
6,Student7,Grade 9,15,Male,76
10,Student11,Grade 9,15,Male,75
14,Student15,Grade 9,15,Male,77
18,Student19,Grade 9,15,Male,74


In [17]:
# Select the name of the student(s) with the highest score in Grade 9
dfgrade9.loc[dfgrade9['Score'] == dfgrade9['Score'].max(), 'Name']

,Name
2,Student3


In [18]:
# Filter the DataFrame for 'Grade 9' first
grade_9_df = df[df['Grade'] == 'Grade 9']

# Then filter for the maximum score within that subset
result = grade_9_df[grade_9_df['Score'] == grade_9_df['Score'].max()]

print(result)

       Name    Grade  Age Gender  Score
2  Student3  Grade 9   15   Male     78


In [19]:
result = df[(df['Grade'] == 'Grade 9') & (df['Score'] == df[df['Grade'] == 'Grade 9']['Score'].max())]
result['Name']

,Name
2,Student3
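The filters above compare every score against the maximum. An alternative sketch uses `idxmax`, which returns the index label of the highest score directly. The Grade 9 rows are rebuilt here from the output printed in this notebook:

```python
import pandas as pd

# Grade 9 rows as printed in this notebook, with their original index labels
grade9 = pd.DataFrame(
    {"Name": ["Student3", "Student7", "Student11", "Student15", "Student19"],
     "Score": [78, 76, 75, 77, 74]},
    index=[2, 6, 10, 14, 18],
)

top_label = grade9["Score"].idxmax()    # index label of the highest score
topper = grade9.loc[top_label, "Name"]  # look up the name at that label
print(topper)  # Student3
```

Note that `idxmax` returns only the first label when several students share the top score, whereas the boolean filter keeps all of them.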


In [20]:
# Find the overall topper across all grades

In [21]:
df[df['Score']==df['Score'].max()]

,Name,Grade,Age,Gender,Score
19,Student20,Grade 6,12,Female,97
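The same result can be obtained with `nlargest`, sketched below. The 20 rows are rebuilt from the values printed across this notebook, which is an assumption about the CSV contents; `keep="all"` retains every row tied for first place:

```python
import pandas as pd

# Dataset rebuilt from the values printed in this notebook
grades = ["Grade 8", "Grade 7", "Grade 9", "Grade 6"] * 5
scores = [85, 92, 78, 95, 89, 88, 76, 94, 87, 91,
          75, 96, 88, 90, 77, 93, 86, 89, 74, 97]
df = pd.DataFrame({
    "Name": [f"Student{i + 1}" for i in range(20)],
    "Grade": grades,
    "Score": scores,
})

# One row with the highest score; keep="all" would also return any ties
top = df.nlargest(1, "Score", keep="all")
print(top)
```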


In [22]:
df

,Name,Grade,Age,Gender,Score
0,Student1,Grade 8,14,Male,85
1,Student2,Grade 7,13,Female,92
2,Student3,Grade 9,15,Male,78
3,Student4,Grade 6,12,Female,95
4,Student5,Grade 8,14,Male,89
5,Student6,Grade 7,13,Female,88
6,Student7,Grade 9,15,Male,76
7,Student8,Grade 6,12,Female,94
8,Student9,Grade 8,14,Male,87
9,Student10,Grade 7,13,Female,91


In [23]:
df['Grade'].unique()

array(['Grade 8', 'Grade 7', 'Grade 9', 'Grade 6'], dtype=object)

In [24]:
df['Grade'].nunique()

4

In [25]:
df['Grade'].value_counts()

Grade,count
Grade 8,5
Grade 7,5
Grade 9,5
Grade 6,5


In [26]:
len(df.groupby("Grade"))

4

In [27]:
df["Grade"].nunique()

4

In [28]:
# Find the topper in each grade

In [29]:
# We want the student with the maximum score in each grade

In [38]:
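# idxmax returns the index label of the highest-scoring row in each grade
top_rows = df.groupby("Grade")["Score"].idxmax()

# Select those rows and keep only the columns of interest
df.loc[top_rows, ["Name", "Grade", "Score"]]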

,Name,Grade,Score
19,Student20,Grade 6,97
1,Student2,Grade 7,92
4,Student5,Grade 8,89
2,Student3,Grade 9,78
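`idxmax` returns a single label per group, so a student tied for the top score would be dropped. One possible way to keep ties is `transform("max")`, which broadcasts each group's maximum back onto every row. The sketch below uses hypothetical names and a hypothetical tie in Grade 8; neither is part of the students dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],  # hypothetical names
    "Grade": ["Grade 8", "Grade 8", "Grade 7", "Grade 7", "Grade 7"],
    "Score": [89, 89, 92, 88, 90],      # hypothetical tie in Grade 8
})

# idxmax: one row label per grade (the first one when scores tie)
first_only = df.loc[df.groupby("Grade")["Score"].idxmax()]

# transform("max"): each row gets its group's maximum, so the
# comparison keeps every row that matches it, including ties
group_max = df.groupby("Grade")["Score"].transform("max")
all_toppers = df[df["Score"] == group_max]

print(first_only["Name"].tolist())
print(all_toppers["Name"].tolist())
```

The first result follows the sorted group keys, while the second keeps the original row order of the DataFrame.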
