# Introduction: Student Marks Analysis

In this project, I analyze student performance using a dataset sourced from Kaggle. The dataset contains academic records of 250 students, including their marks in Science, English, History, and Maths.

## Objectives:

✔️ Calculate the total and average marks for each student.  
✔️ Identify the top-performing student across all sections.  
✔️ Find the top student in each section based on total marks.  
✔️ Compare student performance across different sections.  

This analysis will help understand overall student performance, identify trends, and highlight top achievers. 

# Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# About Dataset

In [None]:
df = pd.read_csv("/kaggle/input/student-scores/student_scores.csv")
df.head()

In [None]:
df.info()

## Dataset Overview

>* Rows: 250 (each row represents a student)
>* Columns: 9 (before adding total and average)
>  
## Categories:

> Student Info: id, Name, Gender, Age, Section   
> Subjects & Scores: Science, English, History, Maths

## Checking for missing values

In [None]:
df.isnull().sum()

> There are no missing values in the dataset.

## Checking for Duplicates

In [None]:
df.duplicated().sum()

> There are no duplicate rows in the dataset.
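`df.duplicated()` only flags rows that match an earlier row in every column, so a repeated student id with slightly different values would not be counted. A minimal sketch of a stricter check is below; the `sample` frame and its values are hypothetical stand-ins for the loaded `df`.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be the loaded df.
sample = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "Name": ["Ann", "Bob", "Bobby", "Cid"],
    "Maths": [50, 60, 65, 70],
})

# No row is an exact copy of another, so the full-row check finds nothing.
full_row_duplicates = int(sample.duplicated().sum())

# Restricting the check to the id column catches the repeated id 2.
repeated_ids = int(sample.duplicated(subset="id").sum())

print(full_row_duplicates, repeated_ids)
```

If both counts are zero on the real data, each row can safely be treated as one distinct student.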

# Descriptive Analysis

In [None]:
df.describe()

## Key Insights

1. Performance Trends

>* The average scores for subjects are around 50%.
>* History has the highest average (52.27), while English has the lowest (47.98).

2. Score Distribution

>* The standard deviation shows high variability in scores, meaning some students scored very high while others scored very low.
>* Some students have perfect scores (100%), while others have very low scores (1%) in each subject.

3. Subject Difficulty

>* Since English has the lowest average score (47.98), it might be the most challenging subject for students.
>* Science and Maths have slightly higher scores, indicating better overall performance in these subjects.

4. Age & Performance

>* Students are between 13 and 15 years old, with a median age of 14.
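The subject-level figures quoted above come from `df.describe()`. As a sketch, they can also be pulled into one small table sorted by mean, which makes the ranking explicit. The `sample` values are hypothetical; only the column names are taken from the dataset.

```python
import pandas as pd

subjects = ["Science", "English", "History", "Maths"]

# Hypothetical marks; in the notebook this would be the loaded df.
sample = pd.DataFrame({
    "Science": [40, 60, 80],
    "English": [30, 50, 40],
    "History": [70, 60, 80],
    "Maths": [50, 50, 50],
})

# One row per subject, ordered from highest to lowest mean.
summary = (
    sample[subjects]
    .agg(["mean", "std", "min", "max"])
    .T
    .sort_values("mean", ascending=False)
)

# Subject with the lowest mean, i.e. the candidate "hardest" subject.
hardest_by_mean = summary["mean"].idxmin()

print(summary)
print(hardest_by_mean)
```

A low mean alone does not prove a subject is harder; comparing the `std` column alongside it gives a fuller picture.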


# Add Total and Average Columns

In [None]:
df['total'] = df['Science'] + df['English'] + df['History'] + df['Maths']
df['average'] = df['total'] / 4
df.head()
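An alternative sketch derives the same two columns with row-wise aggregation over a list of subject columns, so the divisor never has to be hard-coded. The `subjects` list and the `sample` values are assumptions made for illustration.

```python
import pandas as pd

subjects = ["Science", "English", "History", "Maths"]

# Hypothetical marks for two students.
sample = pd.DataFrame({
    "Science": [90, 40],
    "English": [80, 50],
    "History": [70, 60],
    "Maths": [60, 70],
})

# axis=1 aggregates across the subject columns of each row.
sample["total"] = sample[subjects].sum(axis=1)
sample["average"] = sample[subjects].mean(axis=1)

print(sample)
```

One difference worth knowing: `sum(axis=1)` and `mean(axis=1)` skip missing values by default, whereas adding the columns with `+` would give `NaN`. Since this dataset has no missing values, both approaches give the same result here.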

# Data Analysis

In [None]:
df.columns

## Top Student Overall (Highest Total Marks)

In [None]:
top_student = df.loc[df['total'].idxmax()]
top_student

> **Dunn (ID 11, Section C)** achieved the highest total marks (361) with an impressive average of 90.25%. He scored a perfect 100 in Science and also performed well in English (93) and History (87).

## Top Scorer in Each Section

In [None]:
top_per_section = df.loc[df.groupby("Section")["total"].idxmax()]
top_per_section

### Insights:

>* **Dunn (Section C)** is the overall top scorer with 361 marks (90.25%).
>* **Patrizia (Section A)** is the best in Section A with 356 marks (89.00%).
>* **Ruddie (Section B)** leads Section B with 333 marks (83.25%).

## The Best Performer in Each Subject

In [None]:
# Best in science
best_science = df.loc[df['Science'].idxmax(), ['Name', 'Science','Section']]
best_science

> **Dunn** (Section C) is the best in Science, with 100 out of 100.

In [None]:
# best in english
best_english = df.loc[df['English'].idxmax(), ['Name', 'English','Section']]
best_english

> **Brandise** (Section A) is the best in English, with full marks.

In [None]:
# best in History
best_History = df.loc[df['History'].idxmax(), ['Name', 'History','Section']]
best_History

> **Drusi** (Section C) is the best in History, with full marks.

In [None]:
# best in maths
best_maths = df.loc[df['Maths'].idxmax(), ['Name', 'Maths','Section']]
best_maths

> **David** (Section B) is the best in Maths, with full marks.
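Note that `idxmax()` returns only the first row holding the maximum, so if several students share the top mark in a subject, only one of them is reported. A small sketch of listing every top scorer is below; the names and marks are hypothetical.

```python
import pandas as pd

# Hypothetical marks with a tie at the top.
sample = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Section": ["A", "B", "C"],
    "Maths": [100, 85, 100],
})

# idxmax() stops at the first occurrence of the maximum.
first_only = sample.loc[sample["Maths"].idxmax(), "Name"]

# A boolean mask keeps every row that reaches the maximum.
all_top = sample.loc[
    sample["Maths"] == sample["Maths"].max(),
    ["Name", "Section", "Maths"],
]

print(first_only)
print(all_top)
```

Running the mask version on each subject would show whether the four students named above are the only ones with full marks.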

## Visualization of Average Scores

In [None]:
plt.figure(figsize=(8, 5))
sns.histplot(df['average'], bins=10, kde=True, color="skyblue")

# Customize the chart
plt.title("Distribution of Student Average Scores", fontsize=14)
plt.xlabel("Average Score", fontsize=12)
plt.ylabel("Number of Students", fontsize=12)
plt.grid(axis='y', linestyle="--", alpha=0.7)
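# Save before showing: after plt.show() the figure is typically cleared,
# so a later savefig would write an empty image
plt.savefig('Distribution_of_student_average_scores.png')
plt.show()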

> The averages follow an approximately normal distribution: most students score around 50-60, with fewer students at the extremes.
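The shape read off the histogram can be backed with a couple of numbers. The sketch below computes the skewness of the averages and the share of students inside a central band; the `averages` values and the 40-60 band are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical averages; in the notebook this would be df['average'].
averages = pd.Series([30, 40, 50, 50, 60, 70], name="average")

# Skewness near 0 indicates a roughly symmetric distribution.
skewness = averages.skew()

# Fraction of students whose average lies in the central band (inclusive).
share_central = averages.between(40, 60).mean()

print(round(skewness, 3), round(share_central, 3))
```

Skewness close to zero is consistent with a bell-shaped histogram but does not by itself establish normality.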

# Conclusion for Student Marks Analysis

## Overall Top Scorer

The highest-scoring student is **Dunn (Section C)** with a total score of 361 and an average of 90.25%. Dunn performed exceptionally well across all subjects.

## Top Student in Each Section

* **Dunn (Section C)** – 361 marks (90.25%) → Overall Top Scorer  
* **Patrizia (Section A)** – 356 marks (89.00%) → Top in Section A  
* **Ruddie (Section B)** – 333 marks (83.25%) → Top in Section B  

## Top Scorer in Each Subject

| Subject | Top Scorer | Section | Score |
|---|---|---|---|
| Science | **Dunn** | C | 100/100 |
| English | **Brandise** | A | 100/100 |
| History | **Drusi** | C | 100/100 |
| Maths | **David** | B | 100/100 |

# Key Takeaways

* Dunn (Section C) is the overall top performer, leading in Science.
* Each section has strong students, with Sections A and C showing the closest top totals (356 and 361 marks).
* Four students achieved full marks in their respective subjects, proving excellence in Science, English, History, and Maths.