# 🎓 Student Performance Data Analysis

This notebook demonstrates how to perform basic data loading, cleaning, analysis, and visualization on the student performance dataset using only **pandas**, **numpy**, **matplotlib**, and **seaborn**.

**Restrictions Followed:**
- Only basic pandas and numpy for data handling and calculations.
- No use of external statistical libraries (like scipy).
- Plots are created using matplotlib and seaborn only.
- Proper markdown documentation included.


## 🔹 Step 1: Load the Dataset
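We use `pandas` to load the student performance data from a CSV file named `student-mat.csv`. Ensure this file is in the same directory as the notebook. The original UCI release of this file is semicolon-delimited, so if `df.head()` shows everything in a single column, pass `sep=";"` to `pd.read_csv`.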

In [None]:
import pandas as pd
import numpy as np

# Load dataset
df = pd.read_csv("student-mat.csv")

# Preview the dataset
df.head()
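Copies of this file may use either a comma or a semicolon as the delimiter. As a minimal sketch, the cell below parses a made-up in-memory sample (not the real file) and falls back to a semicolon when a comma-based parse yields only one column. The helper name `read_student_csv` and the sample rows are hypothetical.

```python
import io
import pandas as pd

# Made-up sample in a semicolon-delimited layout (not real data).
sample_text = "school;sex;age;G3\nGP;F;18;6\nGP;M;17;15\n"

def read_student_csv(text):
    """Hypothetical helper: try a comma first, then fall back to a semicolon."""
    parsed = pd.read_csv(io.StringIO(text))
    if parsed.shape[1] == 1:
        # A single column usually means the delimiter guess was wrong.
        parsed = pd.read_csv(io.StringIO(text), sep=";")
    return parsed

demo = read_student_csv(sample_text)
print(demo.columns.tolist())
```

The same check (`df.shape[1] == 1`) can be applied to the real `df` right after loading.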

## 🔹 Step 2: Explore the Dataset
We'll check for missing values, data types, and the shape of the dataset.

In [None]:
# Missing values
df.isnull().sum()

In [None]:
# Data types
df.dtypes

In [None]:
# Shape of the dataset
df.shape
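The three checks above can also be combined into a single overview table. The sketch below uses a small made-up frame named `toy` in place of the real dataset. The column names mirror the ones used later in this notebook, but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the real dataset.
toy = pd.DataFrame({
    "sex": ["F", "M", "F", None],
    "studytime": [2, 1, 3, 2],
    "G3": [10.0, np.nan, 14.0, 8.0],
})

# One row per column: its dtype, missing-value count, and distinct non-missing values.
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing": toy.isnull().sum(),
    "unique": toy.nunique(),
})
print(overview)
print("rows, columns:", toy.shape)
```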

## 🔹 Step 3: Data Cleaning
- Remove duplicate rows
- Fill missing values in numeric columns with the column median (if any are missing)

In [None]:
# Remove duplicates
df = df.drop_duplicates()

# Fill missing numeric values with each column's median (if any exist).
# Non-numeric columns are left unchanged by this step.
df = df.fillna(df.median(numeric_only=True))
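The median only applies to numeric columns, so missing entries in text columns such as `sex` would remain. One possible extension, shown here on made-up data, fills text columns with their most frequent value. Using the mode is an assumption for illustration, not something the dataset requires.

```python
import numpy as np
import pandas as pd

# Made-up rows, including one exact duplicate and one row with gaps.
toy = pd.DataFrame({
    "sex": ["F", "M", None, "F", "F"],
    "G3": [10.0, 12.0, np.nan, 14.0, 10.0],
})

# Drop exact duplicate rows and start from a clean index.
toy = toy.drop_duplicates().reset_index(drop=True)

# Numeric columns: fill gaps with the column median.
num_cols = toy.select_dtypes(include="number").columns
toy[num_cols] = toy[num_cols].fillna(toy[num_cols].median())

# Non-numeric columns: fill gaps with the most frequent value (assumed strategy).
for col in toy.select_dtypes(exclude="number").columns:
    toy[col] = toy[col].fillna(toy[col].mode().iloc[0])

print(toy)
```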

## 🔹 Step 4: Data Analysis
We will now answer the following questions using basic `pandas` and `numpy` operations:
1. What is the average final grade (G3)?
2. How many students scored above 15 in G3?
3. Is there a correlation between study time and final grade?
4. Which gender has a higher average final grade?

In [None]:
# 1. Average G3
avg_g3 = np.mean(df['G3'])
print(f"Average final grade (G3): {avg_g3:.2f}")

In [None]:
# 2. Count of students scoring strictly above 15
above_15_count = int((df['G3'] > 15).sum())
print(f"Number of students scoring above 15: {above_15_count}")

In [None]:
# 3. Correlation between studytime and G3 using numpy
correlation = np.corrcoef(df['studytime'], df['G3'])[0, 1]
print(f"Correlation between study time and G3: {correlation:.2f}")
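To see what `np.corrcoef` computes, the Pearson coefficient can be written out with plain numpy: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The arrays below are made-up values for illustration only.

```python
import numpy as np

# Made-up values for illustration only.
study = np.array([1, 2, 2, 3, 4], dtype=float)
grade = np.array([8, 10, 12, 13, 15], dtype=float)

# Deviations from the mean.
dx = study - study.mean()
dy = grade - grade.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_numpy = np.corrcoef(study, grade)[0, 1]
print(f"manual: {r_manual:.4f}, np.corrcoef: {r_numpy:.4f}")
```

Both values should agree up to floating-point error. A coefficient near 0 indicates little linear association, and values near -1 or 1 indicate a strong one.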

In [None]:
# 4. Average G3 by gender
avg_by_gender = df.groupby('sex')['G3'].mean()
avg_by_gender
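The grouped means can also answer question 4 directly in code rather than by reading the table. A small sketch on made-up data follows. The names `toy`, `avg_by_group`, `top_group`, and `gap` are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F"],
    "G3": [10, 12, 14, 11, 9],
})

avg_by_group = toy.groupby("sex")["G3"].mean()

# idxmax returns the index label (here, the group) with the largest mean.
top_group = avg_by_group.idxmax()
gap = avg_by_group.max() - avg_by_group.min()
print(f"Higher average G3: {top_group} (gap of {gap:.2f} points)")
```

A small gap between the group means may not be meaningful on its own, so it is worth reporting the size of the difference alongside the label.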

## 🔹 Step 5: Data Visualization
We'll now use `matplotlib` and `seaborn` to visualize the results.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Histogram of G3
plt.figure(figsize=(8, 5))
plt.hist(df['G3'], bins=10, edgecolor='black', color='skyblue')
plt.title("Histogram of Final Grades (G3)")
plt.xlabel("Final Grade (G3)")
plt.ylabel("Number of Students")
plt.grid(True)
plt.show()
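With `bins=10`, several integer grades can share one bar. If the grades are whole numbers on a 0 to 20 scale (an assumption about the grading scheme), bin edges placed at half-integers give each grade its own bar. The sketch below computes such counts with numpy on made-up grades.

```python
import numpy as np

# Made-up grades; the 0-20 range is an assumption about the grading scale.
grades = np.array([0, 0, 8, 10, 10, 11, 15, 15, 15, 20])

# Edges at -0.5, 0.5, ..., 20.5 give each integer grade its own bin.
edges = np.arange(-0.5, 21.5, 1.0)
counts, _ = np.histogram(grades, bins=edges)

# Map each grade value to the number of students with that grade.
print(dict(zip(range(21), counts.tolist())))
```

The same `edges` array can be passed as `bins=edges` to `plt.hist` to draw the per-grade histogram.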

In [None]:
# Scatter plot: Study Time vs G3
plt.figure(figsize=(8, 5))
plt.scatter(df['studytime'], df['G3'], alpha=0.6, color='orange')
plt.title("Study Time vs Final Grade (G3)")
plt.xlabel("Study Time (studytime level)")
plt.ylabel("Final Grade (G3)")
plt.grid(True)
plt.show()
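If `studytime` takes only a few coded levels (as it does in the UCI release, where it ranges from 1 to 4), many points in the scatter plot overlap. A per-level summary table is one way to complement the plot. The sketch below uses made-up rows.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "studytime": [1, 1, 2, 2, 2, 3, 4],
    "G3": [8, 10, 11, 9, 13, 14, 15],
})

# Count, mean, and median of G3 within each study-time level.
per_level = toy.groupby("studytime")["G3"].agg(["count", "mean", "median"])
print(per_level)
```

Including the count matters here: a level with very few students can show an extreme mean that says little about the group.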

In [None]:
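# Bar chart: Average G3 by Gender
plt.figure(figsize=(6, 5))
# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12);
# hue='sex' with legend=False keeps the palette without a warning (seaborn >= 0.13).
sns.barplot(data=df, x='sex', y='G3', hue='sex', errorbar=None, palette='pastel', legend=False)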
plt.title("Average Final Grade (G3) by Gender")
plt.xlabel("Gender")
plt.ylabel("Average G3")
plt.grid(axis='y')
plt.show()