# Classwork 01: Student Performance Analysis
Created by Golnaz Sahebi, Organized by Gemini, Autumn 2025
### Objective:
In this hands-on assignment, you will:
- Create a DataFrame using a provided dataset of student records.
- Use Pandas to manipulate and analyze the dataset.
- Apply numerical operations to calculate new metrics.
- Perform advanced data grouping and filtering operations.
### Instructions:
You are provided with a dataset of students with the following columns:
- Student Name: The name of the student.
- Major: The student's primary field of study.
- Grade_Math: The student's grade in the Math course.
- Grade_Science: The student's grade in the Science course.
- Attendance_Percentage: The student's attendance rate as a percentage.

#### 1. Create a DataFrame from the Provided Data.
- Hint: Use a dictionary to represent the student data and pass it to pd.DataFrame() to create the DataFrame.

In [1]:
import pandas as pd
import numpy as np

# Dictionary of student data
data = {
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology'],
    'Grade_Math': [85, 92, 78, 88, 90],
    'Grade_Science': [91, 85, 95, 82, 88],
    'Attendance_Percentage': [95, 88, 98, 91, 85]
}

# Create the DataFrame from the dictionary
df = pd.DataFrame(data)

# Display the DataFrame to verify its structure
print(df)

  Student Name             Major  Grade_Math  Grade_Science  \
0         Anna           Biology          85             91   
1        Brian  Computer Science          92             85   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   
4        Eliza           Biology          90             88   

   Attendance_Percentage  
0                     95  
1                     88  
2                     98  
3                     91  
4                     85  
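Beyond printing the whole frame, its structure can be checked by inspecting the shape, column names, and dtypes. The following is a minimal sketch that rebuilds the same data; the names `n_rows` and `n_cols` are illustrative and not part of the assignment.

```python
import pandas as pd

# Same student data as in the cell above
data = {
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology'],
    'Grade_Math': [85, 92, 78, 88, 90],
    'Grade_Science': [91, 85, 95, 82, 88],
    'Attendance_Percentage': [95, 88, 98, 91, 85]
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Column labels follow the order of the dictionary keys
print(list(df.columns))

# dtypes shows the type pandas inferred for each column
print(df.dtypes)
```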


#### 2. Add a New Column 'Average_Grade'.
- Create a new column Average_Grade that calculates the average of Grade_Math and Grade_Science for each student.

In [3]:
# Calculate the average grade
df['Average_Grade'] = (df['Grade_Math'] + df['Grade_Science']) / 2

# Print the updated DataFrame to check the new column
print(df)

  Student Name             Major  Grade_Math  Grade_Science  \
0         Anna           Biology          85             91   
1        Brian  Computer Science          92             85   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   
4        Eliza           Biology          90             88   

   Attendance_Percentage  Average_Grade  
0                     95           88.0  
1                     88           88.5  
2                     98           86.5  
3                     91           85.0  
4                     85           89.0  
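The same average can also be computed row-wise with `.mean(axis=1)`, which may scale better if more grade columns are added later. This is a sketch of an alternative, not the required solution; the list name `grade_columns` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Grade_Math': [85, 92, 78, 88, 90],
    'Grade_Science': [91, 85, 95, 82, 88]
})

# Columns to average; extend this list if more courses are added
grade_columns = ['Grade_Math', 'Grade_Science']

# axis=1 averages across the selected columns for each row
df['Average_Grade'] = df[grade_columns].mean(axis=1)
print(df)
```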


#### 3. Filter Students with High Attendance.
- Find all students who have an attendance rate of more than 90% and display their details.
- Hint: Use filtering with df[] to select rows where Attendance_Percentage is greater than 90.

In [4]:
# Filter rows where 'Attendance_Percentage' > 90
high_attendance = df[df['Attendance_Percentage'] > 90]

# Display the filtered DataFrame
print(high_attendance)

  Student Name             Major  Grade_Math  Grade_Science  \
0         Anna           Biology          85             91   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   

   Attendance_Percentage  Average_Grade  
0                     95           88.0  
2                     98           86.5  
3                     91           85.0  
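Boolean filters can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. The sketch below adds a second, purely illustrative condition on Grade_Math (the threshold 85 is an assumption, not part of the task) and also shows the equivalent `query()` form of the original filter.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Grade_Math': [85, 92, 78, 88, 90],
    'Attendance_Percentage': [95, 88, 98, 91, 85]
})

# Two conditions combined with &; parentheses are required around each one
attendance_and_math = df[(df['Attendance_Percentage'] > 90) & (df['Grade_Math'] >= 85)]
print(attendance_and_math)

# The original single-condition filter written with query()
via_query = df.query('Attendance_Percentage > 90')
print(via_query)
```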


#### 4. Find the Average Math Grade.
- Calculate and display the average Grade_Math of all students.
- Hint: Use the .mean() method on the Grade_Math column to calculate its average.

In [6]:
# Calculate the average math grade using the mean() function
average_grade_math = df['Grade_Math'].mean()
# Print the average math grade
print(average_grade_math)

86.6
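Calling `.mean()` on a selection of several columns returns one average per column. A minimal sketch, assuming both grade columns are of interest; the name `subject_means` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Grade_Math': [85, 92, 78, 88, 90],
    'Grade_Science': [91, 85, 95, 82, 88]
})

# mean() on a DataFrame returns a Series indexed by column name
subject_means = df[['Grade_Math', 'Grade_Science']].mean()
print(subject_means)
```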


#### 5. Sort Students by Average Grade.
- Sort the students in descending order based on their Average_Grade.
- Hint: The .sort_values() function sorts the DataFrame by the Average_Grade column in descending order.

In [8]:
# Sort the DataFrame by 'Average_Grade' in descending order
sorted_by_avg_grade = df.sort_values(by='Average_Grade', ascending=False)

# Display the sorted DataFrame
print(sorted_by_avg_grade)

  Student Name             Major  Grade_Math  Grade_Science  \
4        Eliza           Biology          90             88   
1        Brian  Computer Science          92             85   
0         Anna           Biology          85             91   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   

   Attendance_Percentage  Average_Grade  
4                     85           89.0  
1                     88           88.5  
0                     95           88.0  
2                     98           86.5  
3                     91           85.0  
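`sort_values()` also accepts a list of columns with a matching list of sort directions. The sketch below sorts by Major alphabetically and then by Average_Grade from highest to lowest within each major; this two-level ordering is an illustration and goes beyond what the task asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology'],
    'Average_Grade': [88.0, 88.5, 86.5, 85.0, 89.0]
})

# Major ascending, then Average_Grade descending within each major
sorted_by_major_then_grade = df.sort_values(
    by=['Major', 'Average_Grade'],
    ascending=[True, False]
).reset_index(drop=True)  # renumber rows 0..n-1 after sorting

print(sorted_by_major_then_grade)
```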


#### 6. Find Students in the 'Computer Science' Major.
- Display all the students who are in the 'Computer Science' major.
- Hint: Use a boolean filter to select the rows where the Major column equals 'Computer Science'.

In [9]:
# Filter rows where the 'Major' is 'Computer Science'
computer_science_major = df[df['Major'] == 'Computer Science']

# Display the filtered DataFrame
print(computer_science_major)

  Student Name             Major  Grade_Math  Grade_Science  \
1        Brian  Computer Science          92             85   
3        Derek  Computer Science          88             82   

   Attendance_Percentage  Average_Grade  
1                     88           88.5  
3                     91           85.0  
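To match more than one major at once, `.isin()` can replace a chain of `==` comparisons. A short sketch; the choice of majors in `selected_majors` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology']
})

# Hypothetical list of majors to keep
selected_majors = ['Computer Science', 'Physics']

# isin() is True for rows whose Major appears in the list
in_selected_majors = df[df['Major'].isin(selected_majors)]
print(in_selected_majors)
```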


#### 7. Calculate Total Number of Students.
- Calculate the total number of students in the dataset.
- Hint: The len() function can be used on a DataFrame to find the total number of rows (i.e., students).

In [10]:
# Calculate the total number of students
total_number_of_students = len(df)

# Print the total number of students
print(total_number_of_students)

5
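`len(df)` counts rows, which equals the number of students only if each student appears once. The sketch below shows two related checks: `df.shape[0]` as an equivalent row count and `.nunique()` to count distinct names. Treating names as unique identifiers is an assumption that holds for this small dataset.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology']
})

# Row count, two equivalent ways
rows_via_len = len(df)
rows_via_shape = df.shape[0]

# Number of distinct student names
distinct_names = df['Student Name'].nunique()

print(rows_via_len, rows_via_shape, distinct_names)
```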


#### 8. Find the Top-Performing Student.
- Identify the student with the highest Average_Grade and display their details.
- Hint: The .max() function finds the maximum value of the Average_Grade column, and we filter to find the row where the average grade is the highest.

In [11]:
# Find the row where 'Average_Grade' is the maximum
max_grade_student = df[df['Average_Grade'] == df['Average_Grade'].max()]

# Display the top student's details
print(max_grade_student)

  Student Name    Major  Grade_Math  Grade_Science  Attendance_Percentage  \
4        Eliza  Biology          90             88                     85   

   Average_Grade  
4           89.0  
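An alternative is `.idxmax()`, which returns the index label of the maximum value and can be passed to `.loc[]`. Note one difference: `.idxmax()` returns only the first occurrence, while the boolean filter above returns every student tied for the highest grade. A minimal sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Grade_Math': [85, 92, 78, 88, 90],
    'Grade_Science': [91, 85, 95, 82, 88]
})
df['Average_Grade'] = (df['Grade_Math'] + df['Grade_Science']) / 2

# Index label of the row with the highest Average_Grade
top_index = df['Average_Grade'].idxmax()

# Selecting a single label with .loc returns that row as a Series
top_student = df.loc[top_index]
print(top_student)
```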


#### 9. Group Students by Major and Find the Average Science Grade.
- Group the students by their Major and calculate the average Grade_Science in each major.
- Hint: The groupby() function groups the students by the Major column, and then the .mean() function calculates the average science grade within each major.

In [12]:
# Group by 'Major' and calculate the mean science grade for each group
avg_science_by_major = df.groupby('Major')['Grade_Science'].mean()

# Print the average science grade for each major
print(avg_science_by_major)

Major
Biology             89.5
Computer Science    83.5
Physics             95.0
Name: Grade_Science, dtype: float64
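Several statistics can be computed per group in one call by passing a list of function names to `.agg()`. The sketch below adds the minimum and maximum alongside the mean; these extra statistics are an illustration and not part of the task.

```python
import pandas as pd

df = pd.DataFrame({
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology'],
    'Grade_Science': [91, 85, 95, 82, 88]
})

# One row per major, one column per requested statistic
science_summary = df.groupby('Major')['Grade_Science'].agg(['mean', 'min', 'max'])
print(science_summary)
```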


#### 10. Group Students by Major and Count Them.
- Group the students by their Major and count how many students are in each major.
- Hint: groupby() is used to group students by Major, and the .size() function counts the number of entries in each group.

In [13]:
# Group by 'Major' and count the number of students in each group
size_by_major = df.groupby('Major').size()

# Print the student count for each major
print(size_by_major)

Major
Biology             2
Computer Science    2
Physics             1
dtype: int64
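For a simple count per category, `.value_counts()` on the column gives the same numbers as `groupby().size()`, ordered by count instead of by label. A short sketch; the `normalize=True` call, which returns proportions, is an optional extra.

```python
import pandas as pd

df = pd.DataFrame({
    'Major': ['Biology', 'Computer Science', 'Physics', 'Computer Science', 'Biology']
})

# Number of students per major, most frequent first
major_counts = df['Major'].value_counts()
print(major_counts)

# Share of students per major (fractions that sum to 1)
major_shares = df['Major'].value_counts(normalize=True)
print(major_shares)
```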


#### 11. Add a Column 'Status' Based on Performance (Without apply() or lambda).
- Add a new column 'Status' to the DataFrame. The status is determined as follows:
  - If the student's Average_Grade is 88 or higher, their status is 'Honor Roll'.
  - Otherwise, their status is 'Satisfactory'.
- Hint: Use the .loc[] method to filter and assign values based on the condition. This method is excellent for updating specific rows based on logic.
This task reinforces the use of conditional logic with the .loc[] method, a key tool for data manipulation in Pandas. It teaches how to update specific rows or columns based on conditions, without needing to use apply() or lambda.
Note: A more advanced solution for this kind of task is to use apply() with a lambda function. In that case, the lambda function is passed to apply() to create the new 'Status' column from the values in the 'Average_Grade' column of the DataFrame. You can search the web for more information on this approach.

In [14]:
# Create a new 'Status' column initialized with a default value
df['Status'] = 'Satisfactory'

# Update the status to 'Honor Roll' for students with an average grade of 88 or higher
df.loc[df['Average_Grade'] >= 88, 'Status'] = 'Honor Roll'

# Print the DataFrame to check the added 'Status' column
print(df)

  Student Name             Major  Grade_Math  Grade_Science  \
0         Anna           Biology          85             91   
1        Brian  Computer Science          92             85   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   
4        Eliza           Biology          90             88   

   Attendance_Percentage  Average_Grade        Status  
0                     95           88.0    Honor Roll  
1                     88           88.5    Honor Roll  
2                     98           86.5  Satisfactory  
3                     91           85.0  Satisfactory  
4                     85           89.0    Honor Roll  
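For a two-way choice like this one, NumPy's `np.where()` can build the whole column in a single vectorized step. This is a sketch of an alternative to the `.loc[]` solution above, not a replacement for it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Average_Grade': [88.0, 88.5, 86.5, 85.0, 89.0]
})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
df['Status'] = np.where(df['Average_Grade'] >= 88, 'Honor Roll', 'Satisfactory')
print(df)
```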


#### Optional: Add a Column 'Status' Based on Performance (Using apply() and lambda)
- We will calculate the 'Status' column where:
  - Students with an Average_Grade of 88 or higher get a status of 'Honor Roll'.
  - Students with an Average_Grade below 88 get a status of 'Satisfactory'.

In [15]:
# Create a new 'Status' column using apply and a lambda function.
# apply() on a single column passes one value (a grade) at a time, not a whole row.
df['Status'] = df['Average_Grade'].apply(lambda grade: 'Honor Roll' if grade >= 88 else 'Satisfactory')

# Print the DataFrame to check the added 'Status' column
print(df)

  Student Name             Major  Grade_Math  Grade_Science  \
0         Anna           Biology          85             91   
1        Brian  Computer Science          92             85   
2        Clara           Physics          78             95   
3        Derek  Computer Science          88             82   
4        Eliza           Biology          90             88   

   Attendance_Percentage  Average_Grade        Status  
0                     95           88.0    Honor Roll  
1                     88           88.5    Honor Roll  
2                     98           86.5  Satisfactory  
3                     91           85.0  Satisfactory  
4                     85           89.0    Honor Roll  
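Both approaches should produce identical labels. The sketch below builds the column each way as a standalone Series and compares them element by element; the names `status_loc`, `status_apply`, and `same_result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Anna', 'Brian', 'Clara', 'Derek', 'Eliza'],
    'Average_Grade': [88.0, 88.5, 86.5, 85.0, 89.0]
})

# Approach 1: default value, then conditional update with .loc[]
status_loc = pd.Series('Satisfactory', index=df.index)
status_loc.loc[df['Average_Grade'] >= 88] = 'Honor Roll'

# Approach 2: apply() with a lambda that receives one grade at a time
status_apply = df['Average_Grade'].apply(
    lambda grade: 'Honor Roll' if grade >= 88 else 'Satisfactory'
)

# True only if every row received the same label from both approaches
same_result = bool((status_loc == status_apply).all())
print(same_result)
```

On larger datasets the `.loc[]` version is usually faster because it avoids calling a Python function once per row.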


### Good Luck!