# 📊 Student Performance Analysis

An exploratory data analysis (EDA) to understand how gender, parental education, and test preparation influence student scores.

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import missingno as msno
import warnings

%matplotlib inline
warnings.filterwarnings("ignore")

## 2. Load Dataset

In [None]:
df = pd.read_csv('StudentsPerformance.csv')
print(df.shape)  # (rows, columns)
df.head()

## 3. Dataset Overview

In [None]:
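# Unique values for each categorical (non-numeric) feature
for col in df.select_dtypes(exclude='number').columns:
    print(f"{col}: {df[col].unique()}\n")

# Counts for the features analysed below; print each one, because a cell
# only displays the value of its last expression
for col in ['gender', 'test preparation course', 'parental level of education']:
    print(df[col].value_counts(), "\n")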

## 4. Descriptive Statistics

In [None]:
score_cols = ['math score', 'reading score', 'writing score']

# Mean scores per group; display() each table, because a cell only shows its last expression
for group_col in ['gender', 'test preparation course', 'parental level of education']:
    display(df.groupby(group_col)[score_cols].mean())

## 5. Derived Metrics: Average Score and Grade

In [None]:
df['average score'] = df[['math score', 'reading score', 'writing score']].mean(axis=1)

def assign_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return 'B'
    elif score >= 70: return 'C'
    elif score >= 60: return 'D'
    else: return 'F'

df['grade'] = df['average score'].apply(assign_grade)
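The same thresholds can also be expressed as bins. The following is a minimal sketch using `pd.cut`, not a replacement for `assign_grade`; the `averages` series is toy data chosen to sit on the grade boundaries, and the upper bin edge of 101 is an assumption so that a perfect 100 still falls in the `A` bin.

```python
import pandas as pd

# Toy averages on and around the grade boundaries (illustrative, not from the dataset).
averages = pd.Series([95.0, 90.0, 89.9, 80.0, 70.0, 60.0, 59.9, 0.0])

# right=False closes each bin on the left: [0, 60), [60, 70), ..., [90, 101),
# which matches the `>=` comparisons in assign_grade.
grades = pd.cut(
    averages,
    bins=[0, 60, 70, 80, 90, 101],
    labels=['F', 'D', 'C', 'B', 'A'],
    right=False,
)

print(grades.tolist())
```

One possible advantage of this form is that the result is an ordered categorical, so grades sort as F < D < C < B < A rather than alphabetically.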

## 6. Distributions and Correlations

In [None]:
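df[['math score', 'reading score', 'writing score']].hist(bins=15, figsize=(12, 5))
plt.tight_layout()
plt.show()

# Draw the heatmap on its own figure so it does not land on the histogram axes
plt.figure(figsize=(6, 5))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Between Scores')
plt.show()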

## 7. Do Male and Female Students Perform Equally Well?

In [None]:
subjects = ['math score', 'reading score', 'writing score']
for subject in subjects:
    sns.boxplot(x='gender', y=subject, data=df, palette='Set2')
    plt.title(f'{subject.title()} by Gender')
    plt.show()

**Insight:** Female students perform better in reading and writing, while male students score slightly higher in math.
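To put a number on the gap that the boxplots suggest, one option is to subtract the group means per subject. This is a sketch on a small hypothetical frame (`toy`), not on the real dataset; in the notebook, the same two lines would be applied to `df`.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from StudentsPerformance.csv.
toy = pd.DataFrame({
    'gender': ['female', 'female', 'male', 'male'],
    'math score': [60, 70, 68, 72],
    'reading score': [75, 85, 65, 71],
    'writing score': [78, 82, 60, 70],
})
subjects = ['math score', 'reading score', 'writing score']

# Mean score per gender, then female minus male for each subject.
means = toy.groupby('gender')[subjects].mean()
gap = means.loc['female'] - means.loc['male']

print(gap)
```

A positive value means the female group mean is higher for that subject, and a negative value means the male group mean is higher.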

## 8. Does Test Preparation Improve Scores?

In [None]:
for subject in subjects:
    sns.boxplot(x='test preparation course', y=subject, data=df, palette='pastel')
    plt.title(f'{subject.title()} vs Test Preparation Course')
    plt.show()

**Insight:** Students who completed the test preparation course scored higher across all subjects.
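A boxplot shows the direction of the difference but not its size relative to the spread. One common way to summarise that is Cohen's d (the mean difference divided by the pooled standard deviation). The sketch below assumes a hypothetical helper `cohens_d` and toy scores; neither is part of the original notebook.

```python
import numpy as np
import pandas as pd

def cohens_d(a, b):
    """Standardised mean difference between two samples, using the pooled SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy scores for illustration only; not values from the dataset.
toy = pd.DataFrame({
    'test preparation course': ['completed'] * 3 + ['none'] * 3,
    'math score': [70, 80, 90, 60, 70, 80],
})

completed = toy.loc[toy['test preparation course'] == 'completed', 'math score']
no_prep = toy.loc[toy['test preparation course'] == 'none', 'math score']

d = cohens_d(completed, no_prep)
print(round(d, 2))
```

In the notebook, the same call could be repeated for each entry in `subjects`, provided it runs before the course labels are renamed in the Altair section.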

## 9. Does Parental Education Influence Performance?

In [None]:
edu_order = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree"
]

for subject in subjects:
    sns.boxplot(x='parental level of education', y=subject, data=df,
                order=edu_order, palette='coolwarm')
    plt.xticks(rotation=45)
    plt.title(f'{subject.title()} vs Parental Education Level')
    plt.show()

**Insight:** Students with more highly educated parents, especially at the bachelor’s and master’s levels, perform better in reading and writing.
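To check this trend in a table rather than by eye, the education column can be turned into an ordered categorical so that grouped means come out in the same order as `edu_order`. This is a sketch on a hypothetical frame (`toy`) with made-up scores; the ordering of levels is the one assumed by `edu_order` above.

```python
import pandas as pd

edu_order = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree"
]

# Toy rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    'parental level of education': ["master's degree", 'high school', 'some college', 'high school'],
    'reading score': [80, 60, 70, 64],
})

# An ordered categorical makes groupby sort by education level, not alphabetically.
toy['parental level of education'] = pd.Categorical(
    toy['parental level of education'], categories=edu_order, ordered=True
)

# observed=True keeps only the levels that actually appear in the data.
by_edu = toy.groupby('parental level of education', observed=True)['reading score'].mean()
print(by_edu)
```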

## 10. Optional: Interactive Chart with Altair

In [None]:
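# replace() leaves already-renamed values alone, so re-running this cell
# does not turn the column into NaN the way map() would
df['test preparation course'] = df['test preparation course'].replace({
    'none': 'No Prep',
    'completed': 'Prep Completed'
})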

choose_selection = alt.selection_point(
    name='choose',
    fields=['grade'],
    bind=alt.binding_select(options=sorted(df['grade'].unique()), name='Select Grade')
)

alt.Chart(df).mark_arc().encode(
    theta=alt.Theta(aggregate="count", type="quantitative"),  # slice size = number of students
    color=alt.Color(field="test preparation course", type="nominal"),
    tooltip=[
        alt.Tooltip(field="test preparation course", type="nominal"),
        alt.Tooltip(field="average score", type="quantitative", aggregate='mean'),
    ]
).add_params(
    choose_selection
).transform_filter(
    choose_selection
).properties(
    title="Student Count by Test Preparation Course, Filtered by Grade"
)

## 📌 Summary of Findings

| Question | Conclusion |
|----------|------------|
| **Do female and male students perform equally well across subjects?** | No. Female students perform better in reading and writing, while male students score slightly higher in math. |
| **Does completing a test preparation course improve performance?** | Yes. Students who completed the course scored higher in all three subjects. |
| **How does parental education affect student performance?** | Higher parental education correlates with better student outcomes, especially in reading and writing. |