# Project Title: Student Performance Prediction Using Decision Tree Classifier

Author: Md. Maruf
Date: 26-12-2025

Project Overview:
This project predicts student performance from features such as Student_ID, Age, Gender, Class, Study_Hours_Per_Day, Attendance_Percentage, and other relevant attributes. The goal is to classify students into performance categories using a Decision Tree Classifier.

Dataset:
- Source: Dataset downloaded from Kaggle.
- Features: Student_ID, Age, Gender, Class, Study_Hours_Per_Day, Attendance_Percentage, Parental_Education, Internet_Access, Extracurricular_Activities, Math_Score, Science_Score, English_Score, Previous_Year_Score, Final_Percentage, Performance_Level, Pass_Fail.
- Target: Student performance category (Pass or Fail)
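
A minimal sketch of how the feature matrix and target could be separated, using the column names listed above. The sample rows and the category values ("Male", "Yes", "Pass", and so on) are made up for illustration. The choice to drop Final_Percentage and Performance_Level rests on the assumption that they are derived from the same final result as Pass_Fail and could leak the answer; the real dataset should be checked before relying on it.

```python
import pandas as pd

# Tiny made-up sample that reuses a few of the column names above (values are illustrative only).
df = pd.DataFrame({
    "Student_ID": [1, 2, 3, 4],
    "Age": [15, 16, 15, 17],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Study_Hours_Per_Day": [2.5, 4.0, 1.0, 3.5],
    "Attendance_Percentage": [88.0, 95.0, 60.0, 91.0],
    "Internet_Access": ["Yes", "Yes", "No", "Yes"],
    "Final_Percentage": [72.0, 85.0, 38.0, 79.0],
    "Performance_Level": ["Good", "Excellent", "Poor", "Good"],
    "Pass_Fail": ["Pass", "Pass", "Fail", "Pass"],
})

# Student_ID is an identifier rather than a predictor.
# Assumption: Final_Percentage and Performance_Level encode the final outcome,
# so they are dropped together with the target to avoid leakage.
drop_cols = ["Student_ID", "Final_Percentage", "Performance_Level", "Pass_Fail"]
X = df.drop(columns=drop_cols)

# Assumption: the target labels are the strings "Pass" and "Fail".
y = (df["Pass_Fail"] == "Pass").astype(int)

# One-hot encode the remaining categorical columns.
X = pd.get_dummies(X, drop_first=True)

print(X.columns.tolist())
print(y.tolist())
```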

Objectives:
1. Load and explore the student performance dataset.
2. Preprocess the data (handle missing values, encode categorical variables, normalize if needed).
3. Train a Decision Tree Classifier model.
4. Evaluate the model using accuracy, recall, F1-score, the classification report, the confusion matrix, and the ROC AUC score.
5. Visualize the decision tree for better understanding and interpretation.
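
The evaluation step in objective 4 can be sketched as follows. This example uses synthetic data from `make_classification` as a stand-in for the student dataset, and the `max_depth=4` setting is an arbitrary illustrative choice, not a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             confusion_matrix, classification_report,
                             roc_auc_score)

# Synthetic stand-in for the student data (assumption: binary target, 1 = Pass).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),  # uses probabilities, not hard labels
}
cm = confusion_matrix(y_test, y_pred)

print(metrics)
print(cm)
print(classification_report(y_test, y_pred))
```

Note that the ROC AUC score is computed from predicted probabilities, while the other metrics use the predicted class labels.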

Tools & Libraries:
- Python 3.x
- pandas, numpy
- scikit-learn
- matplotlib, seaborn (for visualization)

Expected Outcome:
- A trained Decision Tree model capable of predicting student performance.
- Insights into which features most influence student performance.
- Visual representation of the decision-making process of the tree.
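
As a rough illustration of the last two outcomes, the sketch below ranks features by importance and prints the tree's rules as text. It uses synthetic data with borrowed column names, so the ranking shown says nothing about the real dataset. `export_text` is used here as a lightweight alternative to `plot_tree`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Column names borrowed from the dataset description; the data itself is synthetic.
feature_names = ["Age", "Study_Hours_Per_Day",
                 "Attendance_Percentage", "Previous_Year_Score"]
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(tree.feature_importances_,
                        index=feature_names).sort_values(ascending=False)
print(importances)

# Plain-text rendering of the learned decision rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```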

Notes:
- Split the dataset into training and testing sets so that overfitting can be detected on held-out data.
- Hyperparameter tuning may be applied to improve model accuracy.
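
One possible way to combine both notes is a stratified split followed by a cross-validated grid search on the training portion only. The parameter grid, the `f1` scoring choice, and the synthetic data below are illustrative assumptions, not settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Illustrative grid: shallower trees and larger leaves usually overfit less.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)  # the test set is never seen during tuning

best = search.best_estimator_
train_acc = best.score(X_train, y_train)
test_acc = best.score(X_test, y_test)

print(search.best_params_)
print(f"train accuracy = {train_acc:.3f}, test accuracy = {test_acc:.3f}")
```

A large gap between training and test accuracy is a sign that the tree is overfitting.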


In [1]:
# Importing Necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score, recall_score, precision_score, roc_auc_score, auc, roc_curve

In [3]:
#Plots the ROC curve and calculates the AUC for the given actual and predicted probabilities
def plot_roc_curve(y_actual_valyues: np.ndarray, predicted_probability: np.ndarray) -> None:
    
    #Calculate FPR, TPR, Thresholds
    fpr, tpr, thresholds = roc_curve(y_actual_valyues, predicted_probability)

    #Calculate AUC(Area Under Curve)
    roc_auc = roc_auc_score(y_actual_valyues, predicted_probability)

    #plot roc curve 
    plt.figure(figsize=(8,8))
    plt.plot(fpr, tpr, color='black', lw=2, label=f'ROC Curve (AUC = {roc_auc:.3f})')
    plt.plot([0,1], [0,1], color='green', lw=2, linestyle='--', label='Random Guess')

    # Highlight the point at threshold 0.5, if roc_curve returned one (tolerant float comparison)
    threshold_index = np.where(np.isclose(thresholds, 0.5))[0]
    if len(threshold_index) > 0:
        plt.scatter(fpr[threshold_index], tpr[threshold_index], color='yellow', label='Threshold = 0.5')
        