# Student Marks Analyzer

This project analyzes student performance using Python and Pandas.
It includes:
- Data cleaning
- Statistical analysis
- Visualizations
- Insights about top performers, subject difficulty, and correlations



     

# 1. Import Required Libraries

In [25]:
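import pandas as pd
import numpy as np
import seaborn as sns  # required for the sns.set call below

# Use seaborn's "whitegrid" style for any plots
sns.set(style="whitegrid")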


# 2. Load Dataset in Kaggle

In [26]:
df = pd.read_csv("/kaggle/input/student-marks-dataset/students marks analyzer.csv") 
df.head()


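| | Name | Class | Gender | Maths | Physics | Chemistry | English | Biology |
|---|---|---|---|---|---|---|---|---|
| 0 | Student_1 | 12 | Female | 61 | 97 | 41 | 65 | 74 |
| 1 | Student_2 | 12 | Female | 66 | 90 | 84 | 47 | 42 |
| 2 | Student_3 | 12 | Female | 70 | 92 | 85 | 86 | 83 |
| 3 | Student_4 | 12 | Female | 100 | 76 | 90 | 89 | 38 |
| 4 | Student_5 | 12 | Male | 83 | 41 | 86 | 90 | 84 |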


# 3. Data Cleaning and Calculations
Inspecting the data is an important first step. `df.info()` reports the dataset's shape, missing values, and data types, and `df.describe()` gives summary statistics.


In [27]:
df.info()
df.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Name       100 non-null    object
 1   Class      100 non-null    int64 
 2   Gender     100 non-null    object
 3   Maths      100 non-null    int64 
 4   Physics    100 non-null    int64 
 5   Chemistry  100 non-null    int64 
 6   English    100 non-null    int64 
 7   Biology    100 non-null    int64 
dtypes: int64(6), object(2)
memory usage: 6.4+ KB


| | Class | Maths | Physics | Chemistry | English | Biology |
|---|---|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 12.0 | 68.23 | 69.56 | 68.08 | 71.06 | 63.4 |
| std | 0.0 | 18.605883 | 19.169799 | 17.23438 | 19.978686 | 19.016739 |
| min | 12.0 | 35.0 | 35.0 | 36.0 | 35.0 | 35.0 |
| 25% | 12.0 | 52.75 | 55.0 | 55.75 | 52.0 | 46.75 |
| 50% | 12.0 | 66.5 | 68.5 | 67.5 | 75.0 | 60.0 |
| 75% | 12.0 | 84.25 | 86.0 | 83.0 | 88.0 | 79.25 |
| max | 12.0 | 100.0 | 99.0 | 100.0 | 99.0 | 99.0 |
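
The `info()` output reports 100 non-null values in every column, so nothing needs imputing here. For a less tidy dataset, the same checks can be made explicit. The following is a minimal sketch, assuming marks are scored out of 100; the `sample` frame reuses the five rows from the `df.head()` output, and the helper name `check_marks` is hypothetical.

```python
import pandas as pd

# Five rows copied from the df.head() output; stands in for the full dataset.
sample = pd.DataFrame({
    "Name": ["Student_1", "Student_2", "Student_3", "Student_4", "Student_5"],
    "Class": [12, 12, 12, 12, 12],
    "Gender": ["Female", "Female", "Female", "Female", "Male"],
    "Maths": [61, 66, 70, 100, 83],
    "Physics": [97, 90, 92, 76, 41],
    "Chemistry": [41, 84, 85, 90, 86],
    "English": [65, 47, 86, 89, 90],
    "Biology": [74, 42, 83, 38, 84],
})
subjects = ["Maths", "Physics", "Chemistry", "English", "Biology"]


def check_marks(frame, subject_cols, max_mark=100):
    """Return basic data-quality counts (hypothetical helper, not part of pandas)."""
    marks = frame[subject_cols]
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        # Assumption: valid marks lie between 0 and max_mark inclusive.
        "out_of_range": int(((marks < 0) | (marks > max_mark)).sum().sum()),
    }


report = check_marks(sample, subjects)
print(report)
```

A non-zero count in any field would point to rows that need attention before the totals and averages are computed.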


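Every column is complete (100 non-null values each), so no rows need to be dropped or imputed. Next, we add calculated fields: each student's total marks and percentage.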


In [28]:
# List of subjects
subjects = ["Maths", "Physics", "Chemistry", "English", "Biology"]

# Calculate total marks
df["Total"] = df[subjects].sum(axis=1)

# Calculate percentage
df["Percentage"] = df["Total"] / (len(subjects) * 100) * 100

# Show first few rows
df.head()


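| | Name | Class | Gender | Maths | Physics | Chemistry | English | Biology | Total | Percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Student_1 | 12 | Female | 61 | 97 | 41 | 65 | 74 | 338 | 67.6 |
| 1 | Student_2 | 12 | Female | 66 | 90 | 84 | 47 | 42 | 329 | 65.8 |
| 2 | Student_3 | 12 | Female | 70 | 92 | 85 | 86 | 83 | 416 | 83.2 |
| 3 | Student_4 | 12 | Female | 100 | 76 | 90 | 89 | 38 | 393 | 78.6 |
| 4 | Student_5 | 12 | Male | 83 | 41 | 86 | 90 | 84 | 384 | 76.8 |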
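
If every subject is marked out of 100 (an assumption, though consistent with the maxima reported by `describe()`), the percentage is simply the mean of the five subject marks. A quick sketch that checks this on the first two rows of the dataset:

```python
import pandas as pd

subjects = ["Maths", "Physics", "Chemistry", "English", "Biology"]

# First two rows of the dataset, copied from the df.head() output.
sample = pd.DataFrame(
    [[61, 97, 41, 65, 74], [66, 90, 84, 47, 42]],
    columns=subjects,
    index=["Student_1", "Student_2"],
)

# Same formulas as in the notebook cell.
total = sample[subjects].sum(axis=1)
percentage = total / (len(subjects) * 100) * 100

# With a maximum of 100 per subject, the row mean gives the same figure.
row_mean = sample[subjects].mean(axis=1)

print(total.tolist())
print(percentage.round(1).tolist())
print(row_mean.round(1).tolist())
```

If subjects ever carry different maximum marks, the row mean no longer matches, and the denominator should be the sum of the per-subject maxima.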


Subject-wise Average Marks 

In [29]:
df[subjects].mean()



Maths        68.23
Physics      69.56
Chemistry    68.08
English      71.06
Biology      63.40
dtype: float64
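
These averages can be ranked directly to find the subjects with the highest and lowest mean mark. A small sketch, using the averages printed above as input instead of the full dataset:

```python
import pandas as pd

# Subject averages copied from the df[subjects].mean() output.
averages = pd.Series(
    {"Maths": 68.23, "Physics": 69.56, "Chemistry": 68.08,
     "English": 71.06, "Biology": 63.40}
)

ranked = averages.sort_values(ascending=False)
highest = averages.idxmax()  # subject with the highest average
lowest = averages.idxmin()   # subject with the lowest average

print(ranked)
print(highest, lowest)
```

On the full dataset the equivalent calls would be `df[subjects].mean().idxmax()` and `df[subjects].mean().idxmin()`.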

Subject-wise Minimum Marks

In [30]:
df[subjects].min()


Maths        35
Physics      35
Chemistry    36
English      35
Biology      35
dtype: int64

Subject-wise Maximum Marks

In [31]:
df[subjects].max()

Maths        100
Physics       99
Chemistry    100
English       99
Biology       99
dtype: int64
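
The mean, minimum, and maximum can also be produced in a single call with `DataFrame.agg`. A sketch on the five rows from the `df.head()` output; on the full dataset the call would be `df[subjects].agg(["mean", "min", "max"])`.

```python
import pandas as pd

subjects = ["Maths", "Physics", "Chemistry", "English", "Biology"]

# Five rows copied from the df.head() output; stands in for the full dataset.
sample = pd.DataFrame(
    [
        [61, 97, 41, 65, 74],
        [66, 90, 84, 47, 42],
        [70, 92, 85, 86, 83],
        [100, 76, 90, 89, 38],
        [83, 41, 86, 90, 84],
    ],
    columns=subjects,
)

# One row per statistic, one column per subject.
summary = sample.agg(["mean", "min", "max"])
print(summary)
```

Keeping the three statistics in one table makes it easier to compare subjects side by side.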

Top 5 Students (by Percentage)

In [32]:
df.sort_values(by="Percentage", ascending=False).head(5)


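| | Name | Class | Gender | Maths | Physics | Chemistry | English | Biology | Total | Percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 78 | Student_79 | 12 | Female | 92 | 85 | 88 | 96 | 88 | 449 | 89.8 |
| 9 | Student_10 | 12 | Female | 89 | 95 | 99 | 56 | 96 | 435 | 87.0 |
| 16 | Student_17 | 12 | Male | 87 | 98 | 77 | 69 | 94 | 425 | 85.0 |
| 33 | Student_34 | 12 | Female | 75 | 80 | 72 | 98 | 94 | 419 | 83.8 |
| 2 | Student_3 | 12 | Female | 70 | 92 | 85 | 86 | 83 | 416 | 83.2 |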


Bottom 5 Students (by Percentage)

In [33]:
df.sort_values(by="Percentage", ascending=True).head(5)

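| | Name | Class | Gender | Maths | Physics | Chemistry | English | Biology | Total | Percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | Student_30 | 12 | Female | 39 | 44 | 45 | 43 | 45 | 216 | 43.2 |
| 43 | Student_44 | 12 | Female | 35 | 48 | 56 | 38 | 50 | 227 | 45.4 |
| 98 | Student_99 | 12 | Male | 42 | 48 | 52 | 58 | 42 | 242 | 48.4 |
| 34 | Student_35 | 12 | Male | 40 | 39 | 52 | 86 | 39 | 256 | 51.2 |
| 30 | Student_31 | 12 | Male | 41 | 38 | 58 | 73 | 48 | 258 | 51.6 |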
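
`DataFrame.nlargest` and `DataFrame.nsmallest` express the same top-N and bottom-N queries without sorting the whole frame. A sketch on the first five students, with `Percentage` values copied from the earlier `df.head()` output:

```python
import pandas as pd

# Names and percentages of the first five students in the dataset.
sample = pd.DataFrame({
    "Name": ["Student_1", "Student_2", "Student_3", "Student_4", "Student_5"],
    "Percentage": [67.6, 65.8, 83.2, 78.6, 76.8],
})

top2 = sample.nlargest(2, "Percentage")      # highest first
bottom2 = sample.nsmallest(2, "Percentage")  # lowest first

print(top2["Name"].tolist())
print(bottom2["Name"].tolist())
```

On the full dataset, `df.nlargest(5, "Percentage")` and `df.nsmallest(5, "Percentage")` should return the same students as the two `sort_values` calls, provided there are no ties at the cutoff.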


Correlation Between Subjects

In [34]:
df[subjects].corr()


| | Maths | Physics | Chemistry | English | Biology |
|---|---|---|---|---|---|
| Maths | 1.0 | 0.025831 | 0.115612 | -0.067211 | 0.255986 |
| Physics | 0.025831 | 1.0 | 0.048353 | -0.036511 | 0.089016 |
| Chemistry | 0.115612 | 0.048353 | 1.0 | -0.023571 | 0.063791 |
| English | -0.067211 | -0.036511 | -0.023571 | 1.0 | 0.169479 |
| Biology | 0.255986 | 0.089016 | 0.063791 | 0.169479 | 1.0 |
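
Most of these coefficients are close to zero, so it helps to pull out the strongest pair programmatically instead of reading the matrix by eye. A minimal sketch using the matrix printed above as input; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

subjects = ["Maths", "Physics", "Chemistry", "English", "Biology"]

# Correlation matrix copied from the df[subjects].corr() output.
corr = pd.DataFrame(
    [
        [1.0, 0.025831, 0.115612, -0.067211, 0.255986],
        [0.025831, 1.0, 0.048353, -0.036511, 0.089016],
        [0.115612, 0.048353, 1.0, -0.023571, 0.063791],
        [-0.067211, -0.036511, -0.023571, 1.0, 0.169479],
        [0.255986, 0.089016, 0.063791, 0.169479, 1.0],
    ],
    index=subjects,
    columns=subjects,
)

# Keep only the cells above the diagonal so each pair appears once
# and the self-correlations of 1.0 are excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Rank by absolute value so strong negative correlations would count too.
strongest_pair = pairs.abs().idxmax()
strongest_value = pairs.loc[strongest_pair]

print(strongest_pair, strongest_value)
```

For this matrix the strongest pair is Maths and Biology at roughly 0.26, which is still a weak relationship.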


# Summary & Insights

This Student Marks Analyzer project explores the performance of 100 students across 5 subjects: Maths, Physics, Chemistry, English, and Biology. Using Python and Pandas, we performed the following analyses:

1. **Average Marks**: Calculated the average marks in each subject to understand overall performance trends.
2. **Minimum and Maximum Marks**: Identified the highest and lowest scores in each subject.
3. **Top and Bottom Performers**: Listed the top 5 and bottom 5 students by percentage to highlight outstanding and struggling students.
4. **Correlation Analysis**: Examined how subjects correlate with each other to find patterns in student performance.

**Key Observations:**
- Students performed best in English, which has the highest average (71.06).
- Biology has the lowest average (63.40), suggesting it is comparatively difficult for most students.
- Some students, such as Student_79, score highly in every subject, while others may need additional support in specific subjects.
- Correlations between subjects are weak. The strongest is between Maths and Biology (about 0.26), while Maths and Physics are essentially uncorrelated (about 0.03), so a student's mark in one subject says little about their mark in another.

This project demonstrates basic data analysis with Python and Pandas and provides a framework for analyzing academic performance data. It can be extended with features such as pass/fail prediction, clustering, or visualizations.
