Predicting student exam outcomes using machine learning and educational analytics.
The goal of this project is to build a regression model that predicts a student's final exam grade (G3) using behavioral, academic, and lifestyle features from the UCI Student Performance Dataset. Beyond prediction accuracy, this project focuses on understanding which factors most influence academic performance, and how data-driven insights can support early intervention for at-risk students.
- Source: UCI Machine Learning Repository
- File used: student-mat.csv
- Samples: 395
- Target variable: G3 (final grade, 0–20)
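One loading detail worth noting: the UCI student files are semicolon-delimited, so `sep=";"` is needed with `pandas.read_csv`. A minimal, self-contained sketch (a two-row inline sample stands in for the real `student-mat.csv`):

```python
import io
import pandas as pd

# Two-row stand-in for student-mat.csv: the real UCI file uses the same
# semicolon delimiter and includes the G1/G2/G3 grade columns.
sample = io.StringIO(
    "school;sex;age;studytime;Dalc;Walc;absences;G1;G2;G3\n"
    "GP;F;18;2;1;1;6;5;6;6\n"
    "GP;F;17;2;1;1;4;5;5;6\n"
)

# sep=";" is required; the default comma delimiter would parse one giant column
df = pd.read_csv(sample, sep=";")
print(df.shape)         # (2, 10)
print(df["G3"].mean())  # 6.0
```

For the full dataset the same call is `pd.read_csv("student-mat.csv", sep=";")`.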
The dataset includes categories such as:

- Academic: study time, past grades (G1, G2), absences
- Family/Social: family support, living situation
- Lifestyle: weekday/weekend alcohol use
- Demographic: age, parental education
- Data Cleaning & Exploration (EDA)
- Feature Engineering
  - avg_previous_grade = (G1 + G2) / 2
  - Weekalc = (Dalc + Walc) / 2
- One-Hot Encoding for categorical variables
- Feature Scaling with StandardScaler
- Train/Test Split (test_size=0.2, random_state=42)
- Regression Models
  - Baseline: Linear Regression
  - Improved: Polynomial Regression (degree = 2)
- Evaluation Metrics: MAE, RMSE, R² score
- Model Diagnostics
  - Correlation heatmap
  - Predictions vs Actual
  - Residual plots
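The two engineered features above are plain column arithmetic. A minimal sketch on a tiny synthetic frame (the real columns come from student-mat.csv):

```python
import pandas as pd

# Tiny stand-in frame with the grade and alcohol columns the project uses
df = pd.DataFrame({
    "G1": [10, 14, 6],
    "G2": [12, 15, 7],
    "Dalc": [1, 2, 4],  # workday alcohol use (scale 1-5)
    "Walc": [2, 3, 5],  # weekend alcohol use (scale 1-5)
})

# avg_previous_grade: mean of the two period grades, capturing the academic trend
df["avg_previous_grade"] = (df["G1"] + df["G2"]) / 2

# Weekalc: weekday and weekend alcohol use combined into one lifestyle indicator
df["Weekalc"] = (df["Dalc"] + df["Walc"]) / 2

print(df["avg_previous_grade"].tolist())  # [11.0, 14.5, 6.5]
print(df["Weekalc"].tolist())             # [1.5, 2.5, 4.5]
```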
- Inspected dataset structure with .info() and .describe()
- Checked distributions and missing values
- Identified key predictors through a correlation heatmap → avg_previous_grade strongly correlated with G3
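Numerically, the heatmap step reduces to ranking `df.corr()` against the target. A sketch with made-up values chosen so past grades track G3 closely, mirroring the finding above:

```python
import pandas as pd

# Synthetic stand-in values; the real numbers come from the UCI dataset
df = pd.DataFrame({
    "avg_previous_grade": [6.0, 9.5, 11.0, 14.5, 17.0],
    "Weekalc":            [4.5, 2.0, 3.5, 2.5, 1.5],
    "absences":           [10, 2, 8, 4, 6],
    "G3":                 [6, 10, 11, 15, 18],
})

# Correlation of every feature with the target, strongest (by magnitude) first
corr_with_target = (
    df.corr()["G3"].drop("G3").sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Sorting by absolute value puts the dominant predictor on top regardless of sign, which is how a negative lifestyle correlation and a positive academic one can be ranked together.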
Created two meaningful features based on domain intuition:

| Feature | Description |
|---|---|
| avg_previous_grade | Average of G1 and G2, capturing the academic trend |
| Weekalc | Combined alcohol use, a lifestyle pattern indicator |
- One-hot encoded categorical columns (schoolsup)
- Scaled numerical features using StandardScaler
- Split into training and testing sets before scaling
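The split-before-scaling ordering matters: fitting StandardScaler on the training split only keeps test-set statistics from leaking into preprocessing. A minimal sketch with synthetic data standing in for the real columns:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset: one numeric and one categorical column
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "avg_previous_grade": rng.uniform(0, 20, 100),
    "schoolsup": rng.choice(["yes", "no"], 100),
})

# One-hot encode the categorical column (drop_first avoids a redundant column)
df = pd.get_dummies(df, columns=["schoolsup"], drop_first=True)

# Split FIRST, then fit the scaler on the training portion only
train, test = train_test_split(df, test_size=0.2, random_state=42)
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)  # reuses training mean/std: no leakage

print(train_scaled.mean(axis=0).round(6))  # ~0 for each training column
```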
Trained two models:

- Baseline Linear Regression
- Polynomial Regression (degree=2), using PolynomialFeatures() to capture mild nonlinearity

Evaluated using MAE, RMSE, and R² (results are shown below).
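The baseline-vs-polynomial comparison can be sketched end to end on a small synthetic regression problem (the reported metrics come from the real dataset, not this toy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in: the target depends mildly nonlinearly on one feature
rng = np.random.default_rng(42)
X = rng.uniform(0, 20, size=(200, 1))
y = 0.8 * X[:, 0] + 0.01 * X[:, 0] ** 2 + rng.normal(0, 1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear": LinearRegression(),
    "poly2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    rmse = np.sqrt(mean_squared_error(y_te, pred))  # RMSE as sqrt of MSE
    print(f"{name}: MAE={mae:.3f} RMSE={rmse:.3f} R2={r2_score(y_te, pred):.3f}")
```

Wrapping PolynomialFeatures and LinearRegression in one pipeline keeps the degree-2 expansion tied to the model, so both candidates expose the same fit/predict interface for comparison.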
Generated three key plots:
- Correlation heatmap
- Actual vs Predicted scatterplots
- Residual plots for error analysis
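The actual-vs-predicted and residual plots can be sketched with matplotlib as below (Agg backend so it renders headless; `y_test`/`y_pred` here are placeholder arrays standing in for the real model outputs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder predictions standing in for the trained model's output
rng = np.random.default_rng(42)
y_test = rng.uniform(0, 20, 79)
y_pred = y_test + rng.normal(0, 2, 79)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Actual vs predicted: points near the diagonal mean accurate predictions
ax1.scatter(y_test, y_pred, alpha=0.6)
ax1.plot([0, 20], [0, 20], "r--", label="ideal")
ax1.set(xlabel="Actual G3", ylabel="Predicted G3", title="Actual vs Predicted")
ax1.legend()

# Residual plot: a patternless band around zero suggests no systematic bias
residuals = y_test - y_pred
ax2.scatter(y_pred, residuals, alpha=0.6)
ax2.axhline(0, color="r", linestyle="--")
ax2.set(xlabel="Predicted G3", ylabel="Residual", title="Residuals")

fig.savefig("diagnostics.png", dpi=120)
```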
avg_previous_grade was by far the strongest predictor of final exam performance.
Alcohol consumption (Weekalc) had a mild negative correlation.
Absences and study time showed weak linear relationships.
Interpretation: Past academic performance matters far more than lifestyle or support features.
| Model | MAE | RMSE | R² |
|---|---|---|---|
| Baseline Linear Regression | 1.6751 | 2.2543 | 0.7522 |
| Polynomial Regression (degree=2) | 1.6076 | 2.2319 | 0.7571 |
The polynomial model offered a small but consistent improvement across all metrics.
Both models tracked actual grades closely, with the polynomial model producing slightly tighter clustering around the ideal line.
Residuals were centered around zero with no strong pattern, indicating:
- No major bias
- Linearity assumptions hold reasonably well
- Mild heteroscedasticity at high grades (common in student datasets)
✔ Achieved an R² of ~0.76 on the polynomial model
✔ Predicted final grades within ~1.6 points MAE
✔ Engineered features noticeably improved performance
✔ Past grades dominate prediction strength
✔ Visualization confirmed the model’s reliability and interpretability
This project demonstrates the ability to:

- Work with real-world educational datasets
- Apply practical regression techniques
- Engineer meaningful features from domain knowledge
- Evaluate and compare multiple models
- Create insightful visualizations for interpretability
- Write clean, reproducible ML code
- Explain model behavior clearly and professionally