This project demonstrates how to predict students' final grades using Linear Regression, Ridge Regression, and Lasso Regression. The dataset contains academic and demographic data about students, and the models aim to provide insight into the factors that influence academic performance.
Predicting student performance can help educators identify at-risk students and intervene early to improve academic outcomes. This project uses linear regression models to:
- Analyze relationships between features (e.g., study time, attendance) and final grades.
- Predict student grades based on input data.
The dataset used in this project comes from the UCI Machine Learning Repository (Student Performance dataset). It includes attributes such as:
- Demographic data (age, gender, etc.)
- Academic performance (grades, study time, failures, etc.)
- Social factors (family support, extracurricular activities, etc.)
Key features used in the prediction model include:
- studytime: Weekly study time.
- failures: Number of past class failures.
- absences: Number of school absences.
- G1, G2: Grades from the first and second terms.
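As a rough sketch, the key features listed above could be pulled out of the CSV with pandas. The target column name `G3`, the semicolon separator, and the helper `load_key_features` are assumptions made for illustration (the project's own loading code is not shown here), and the two sample rows are made up purely to show the expected layout.

```python
import io

import pandas as pd

# Column names as listed above; G3 (final grade) is assumed to be the target.
KEY_FEATURES = ["studytime", "failures", "absences", "G1", "G2"]
TARGET = "G3"


def load_key_features(source, sep=";"):
    """Read a CSV (path or file-like) and return (X, y) for the key features."""
    df = pd.read_csv(source, sep=sep)
    missing = [c for c in KEY_FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return df[KEY_FEATURES], df[TARGET]


# Two made-up rows, only to illustrate the layout (not real records).
sample_csv = io.StringIO(
    "studytime;failures;absences;G1;G2;G3\n"
    "2;0;6;5;6;6\n"
    "3;0;4;15;14;15\n"
)
X, y = load_key_features(sample_csv)
print(X.shape, y.tolist())
```

In practice `source` would be the path to the downloaded dataset file rather than an in-memory buffer.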
The project employs three different models:
- Linear Regression
- Ridge Regression
- Lasso Regression
The workflow consists of the following steps:
- Data preprocessing: Cleaning and encoding categorical variables.
- Exploratory Data Analysis (EDA): Visualizing relationships between features.
- Model training: Fitting the models.
- Evaluation: Measuring performance using metrics like MAE, MSE, and R² score.
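The preprocessing, training, and evaluation steps can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the `schoolsup` column stands in for any categorical variable, and the `alpha` values, split ratio, and random seeds are assumed defaults rather than tuned settings. EDA is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "studytime": rng.integers(1, 5, size=n),
    "absences": rng.integers(0, 30, size=n),
    "G2": rng.integers(0, 21, size=n),
    "schoolsup": rng.choice(["yes", "no"], size=n),
})
df["G3"] = (
    0.9 * df["G2"] + 0.3 * df["studytime"] - 0.05 * df["absences"]
    + rng.normal(scale=1.0, size=n)
)

# Preprocessing: one-hot encode categorical columns as numeric indicators.
X = pd.get_dummies(df.drop(columns="G3"), drop_first=True, dtype=float)
y = df["G3"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative hyperparameters, not the project's settings.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
}

# Training and evaluation: fit each model, then score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, m in results.items():
    print(f"{name}: MAE={m['MAE']:.2f} MSE={m['MSE']:.2f} R2={m['R2']:.2f}")
```

Because the data here is generated, the printed scores say nothing about the real dataset; the sketch only shows how the three models share one split and one set of metrics.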
The models achieved the following results:
- Linear Regression
  - R² Score: 0.73
  - Mean Absolute Error (MAE): 1.62
  - Mean Squared Error (MSE): 5.59
- Ridge Regression
  - R² Score: 0.72
  - Mean Absolute Error (MAE): 1.65
  - Mean Squared Error (MSE): 5.65
- Lasso Regression
  - R² Score: 0.76
  - Mean Absolute Error (MAE): 1.42
  - Mean Squared Error (MSE): 4.69
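These results indicate that all three models predict final grades reasonably well from the given features: each explains between 72% and 76% of the variance in final grades, and Lasso Regression performs best on every metric (R² 0.76, MAE 1.42, MSE 4.69).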
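Squared errors are hard to read directly, so one way to interpret them is to take the square root, which gives the root mean squared error (RMSE) in grade points. The short calculation below uses only the squared-error values reported above; RMSE itself is not a metric the project reports, so treat it as a derived aid.

```python
import math

# Mean squared error values copied from the results listed above.
mse_by_model = {
    "Linear Regression": 5.59,
    "Ridge Regression": 5.65,
    "Lasso Regression": 4.69,
}

# RMSE = sqrt(MSE), expressed in grade points instead of squared grade points.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.2f}")
```

This works out to roughly 2.36, 2.38, and 2.17 grade points for Linear, Ridge, and Lasso Regression respectively, which is consistent with the ordering of the other metrics.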