
# 🎓 Student Performance Data Analysis

**Objective:** Analyze student exam scores using Python to derive insights and answer specific questions through visualizations and statistics.


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:

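# Load the dataset (ensure 'student-mat.csv' is in the same folder).
# The original UCI file is semicolon-separated, so let pandas detect the delimiter.
df = pd.read_csv('student-mat.csv', sep=None, engine='python')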
df.head()



## 🔍 Data Exploration

We check for missing values, data types, and dataset shape.


In [None]:

# Check for missing values
print(df.isnull().sum())

# Column data types
print(df.dtypes)

# Shape of dataset
print("Dataset shape:", df.shape)
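The three checks above can also be collected into one table, which is easier to scan when the dataset has many columns. The following is a minimal sketch: the helper name `summarize_columns` and the toy data are hypothetical and only stand in for `df`.

```python
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "unique": frame.nunique(),  # NaN is not counted as a value
    })


# Hypothetical toy data standing in for student-mat.csv
toy = pd.DataFrame({
    "sex": ["F", "M", "F", None],
    "studytime": [2, 1, 3, 2],
    "G3": [10, 15, None, 8],
})

summary = summarize_columns(toy)
print(summary)
print("Dataset shape:", toy.shape)
```

On the real data, `summarize_columns(df)` would be the equivalent call.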



## 🧹 Data Cleaning

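We remove duplicate rows so that repeated records do not skew the statistics. No imputation is performed here; if the missing-value counts printed above are non-zero, the affected rows need to be dropped or filled before the analysis.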


In [None]:

# Remove duplicate entries
df = df.drop_duplicates()
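If missing values do turn up, one possible way to handle them is to drop the rows that lack a value in the columns the analysis depends on, and to report how many rows each step removed. This is a sketch under that assumption, not the notebook's own method; the function `clean`, its `required` parameter, and the toy data are hypothetical.

```python
import pandas as pd


def clean(frame: pd.DataFrame, required: list[str]) -> tuple[pd.DataFrame, dict]:
    """Drop duplicate rows, then rows missing any required column."""
    n_dupes = int(frame.duplicated().sum())
    deduped = frame.drop_duplicates()
    # Rows that lack a value in at least one required column
    n_missing = int(deduped[required].isnull().any(axis=1).sum())
    cleaned = deduped.dropna(subset=required).reset_index(drop=True)
    report = {"duplicates_removed": n_dupes, "rows_dropped_missing": n_missing}
    return cleaned, report


# Hypothetical toy data: one exact duplicate and one missing grade
toy = pd.DataFrame({
    "sex": ["F", "F", "M", "M"],
    "studytime": [2, 2, 1, 3],
    "G3": [10, 10, 15, None],
})

cleaned, report = clean(toy, required=["G3"])
print(report)
print(cleaned)
```

Dropping rows is the simplest choice; filling with a median or a category mode is an alternative when too many rows would be lost.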



## 📊 Data Analysis

We now answer the following:
1. What is the average score in math (G3)?
2. How many students scored above 15 in G3?
3. Is there a correlation between study time and G3?
4. Which gender has a higher average G3?


In [None]:

# 1. Average G3 score
average_g3 = df['G3'].mean()
print("Average G3 score:", average_g3)

# 2. Students scoring above 15 in G3
above_15 = (df['G3'] > 15).sum()
print("Students scoring above 15:", above_15)

# 3. Correlation between study time and G3
correlation = df['studytime'].corr(df['G3'])
print("Correlation between study time and G3:", correlation)

# 4. Average G3 by gender
gender_avg = df.groupby("sex")["G3"].mean()
print(gender_avg)
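In the UCI version of this dataset, `studytime` is an ordinal code (1 to 4) rather than a number of hours, so a rank-based correlation and the mean grade per study-time level can complement the Pearson coefficient. The sketch below assumes that coding and uses hypothetical toy values. Spearman's coefficient is computed as the Pearson correlation of the ranks, which avoids an extra dependency.

```python
import pandas as pd

# Hypothetical toy data; studytime is assumed to be an ordinal code 1-4
toy = pd.DataFrame({
    "studytime": [1, 1, 2, 2, 3, 4],
    "G3": [8, 10, 11, 9, 14, 16],
})

# Pearson: linear association between the raw values
pearson = toy["studytime"].corr(toy["G3"])

# Spearman: Pearson correlation of the (average) ranks
spearman = toy["studytime"].rank().corr(toy["G3"].rank())

# Mean final grade for each study-time level
by_level = toy.groupby("studytime")["G3"].mean()

print("Pearson:", round(pearson, 3))
print("Spearman:", round(spearman, 3))
print(by_level)
```

The per-level means show whether grades rise steadily across levels or only at one step, which a single coefficient cannot reveal.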



## 📈 Data Visualization

We create a histogram, a scatter plot, and a bar chart to visually interpret the results.


In [None]:

# 1. Histogram of final grades
plt.figure(figsize=(8,5))
plt.hist(df['G3'], bins=10, edgecolor='black')
plt.title('Distribution of Final Grades (G3)')
plt.xlabel('Final Grade')
plt.ylabel('Number of Students')
plt.show()

# 2. Scatter plot: study time vs G3
plt.figure(figsize=(8,5))
plt.scatter(df['studytime'], df['G3'], alpha=0.6)
plt.title('Study Time vs Final Grade')
plt.xlabel('Study Time (1: <2 h, 2: 2-5 h, 3: 5-10 h, 4: >10 h per week)')
plt.ylabel('Final Grade (G3)')
plt.grid(True)
plt.show()

# 3. Bar chart: average G3 by gender
gender_avg.plot(kind='bar', color=['skyblue', 'salmon'])
plt.title('Average Final Grade by Gender')
plt.xlabel('Gender')
plt.ylabel('Average Final Grade (G3)')
plt.xticks(rotation=0)
plt.show()
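Because grades are whole numbers, `bins=10` can place bin edges so that some bars cover two grade values and others three, which distorts the shape of the histogram. One option is to give every grade its own bin. This sketch assumes that G3 ranges from 0 to 20 and uses hypothetical grades; `edges` is an illustrative name.

```python
import numpy as np

# Hypothetical grades standing in for df['G3']
grades = np.array([0, 0, 8, 10, 10, 11, 15, 20])

# 22 edges at -0.5, 0.5, ..., 20.5 give 21 bins centered on the integers 0..20
edges = np.arange(-0.5, 21.5, 1.0)

counts, _ = np.histogram(grades, bins=edges)

# Print only the grades that actually occur
for grade, count in enumerate(counts):
    if count:
        print(f"grade {grade}: {count} student(s)")
```

The same edges can be passed to the plot, for example `plt.hist(df['G3'], bins=edges, edgecolor='black')`.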



## 📝 Conclusion

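- The mean final grade (G3) gives the overall performance level of the class.
- The count of students with G3 above 15 identifies the high scorers.
- Study time and G3 are positively correlated; because `studytime` is an ordinal code rather than a number of hours, this indicates a tendency, not a per-hour effect.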
- Female students have slightly higher average scores than males.
