A small machine-learning project that predicts a student's final grade (0-10) from a few study habits, and compares several regression algorithms to see which one does best.
Built as a first from-scratch ML project after the Coursera Machine Learning specialization. Style kept close to my Habit Tracker IU project: plain functions, string paths, inline comments, no type hints.
- Generates its own synthetic dataset (so the repo doesn't need external data).
- Trains 4 algorithms and prints a comparison table:
  - Linear Regression
  - KNN (k=5)
  - Decision Tree
  - Random Forest
- Saves the best model + scaler to `models/` with joblib.
- Predicts a new student's grade from the terminal.
- Produces a true-vs-predicted plot in `plots/`.
- Has `pytest` tests that check the pipeline end-to-end.
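The generate-and-compare steps could look roughly like this. The feature names, coefficients, and dataset size below are invented for illustration; the repo's `data.py`/`train.py` differ in the details, but the four models match the list above:

```python
# Sketch: build a synthetic student dataset, then train and compare 4 regressors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n = 200
study_hours = rng.uniform(0, 10, n)
sleep_hours = rng.uniform(4, 9, n)
attendance = rng.uniform(0.5, 1.0, n)
# Synthetic target: the grade depends mostly on study hours, plus noise,
# clipped to the 0-10 scale the README describes.
grade = np.clip(0.7 * study_hours + 0.3 * sleep_hours + 2 * attendance
                + rng.normal(0, 0.5, n), 0, 10)

X = np.column_stack([study_hours, sleep_hours, attendance])
X_train, X_test, y_train, y_test = train_test_split(X, grade, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "KNN (k=5)": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name:20s} MAE = {mae:.2f}")
```

The model with the lowest test error is the one that gets saved to `models/`.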
```
pip install -r requirements.txt
```

If the above doesn't work, try `pip3` instead.
The fastest way is to launch the menu:
```
python main.py
```
That gives you a CLI with these options:
I. Generate a new dataset
II. Explore the dataset (print head + stats)
III. Train and compare models
IV. Predict a student's final grade
V. Quit
A typical first run is: I -> II -> III -> IV.
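A dispatch for that menu could be sketched as follows. The option labels match the README; the handler bodies are placeholder stubs, not the repo's real code:

```python
# Illustrative sketch of a main.py-style menu loop with stub handlers.
def generate():  print("generating dataset...")           # option I
def explore():   print("printing head + summary stats...") # option II
def train():     print("training and comparing models...") # option III
def predict():   print("asking for a student's values...") # option IV

HANDLERS = {"I": generate, "II": explore, "III": train, "IV": predict}

def run_menu():
    while True:
        choice = input("Option (I-V): ").strip().upper()
        if choice == "V":          # V quits the loop
            break
        action = HANDLERS.get(choice)
        if action:
            action()
        else:
            print("Unknown option, try again.")
```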
You can also run each script on its own:
```
python train.py     # generates data if needed, trains everything, saves best
python predict.py   # asks for values in the terminal, prints a prediction
```
```
pytest -q
```
All tests use the real code path (no mocks), so running them also regenerates the dataset and trains the models. Safe to re-run any time.
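A self-contained sketch in the spirit of `test_pipeline.py`: train on synthetic data and check that a prediction lands on the 0-10 grade scale. The real tests exercise the repo's own functions; everything here (features, coefficients) is illustrative:

```python
# Sketch of an end-to-end-style pytest check on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

def test_prediction_is_on_grade_scale():
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 3))  # fake study-habit features
    # Fake target clipped to the 0-10 grade scale.
    y = np.clip(X @ np.array([0.6, 0.2, 0.2]) + rng.normal(0, 0.5, 100), 0, 10)
    model = LinearRegression().fit(X, y)
    pred = float(model.predict([[5.0, 7.0, 0.8]])[0])
    assert 0 <= pred <= 10  # predictions should stay on the grade scale
```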
```
student-performance-predictor/
    data.py            # generate / load / split / scale the dataset
    train.py           # build models, train, evaluate, save the best
    predict.py         # load saved model and predict on a new student
    main.py            # CLI menu
    test_pipeline.py   # pytest tests
    requirements.txt
    .gitignore
    data/              # the generated students.csv lives here
    models/            # best_model.joblib, scaler.joblib, comparison.csv
    plots/             # true_vs_predicted.png
```
- How to go from raw features -> scaled features -> trained model -> saved model.
- Why the scaler must be fit on the training data only and merely applied to the test data: fitting it on everything would leak test-set information into training (data leakage).
- Why we compare several algorithms: sometimes the simple Linear Regression wins, sometimes the Random Forest does; it depends on the data.
- The three regression metrics: MAE, RMSE, R^2.
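The leakage-safe scaling and the three metrics can be sketched together. The data here is made up; only the fit-on-train / transform-on-test pattern and the metric calls reflect what the pipeline does:

```python
# Sketch: scale without leakage, then compute MAE, RMSE and R^2 on the test set.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(150, 2))
y = 0.8 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.3, 150)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit on the training split only...
X_test_s = scaler.transform(X_test)        # ...then reuse it: no leakage

model = LinearRegression().fit(X_train_s, y_train)
pred = model.predict(X_test_s)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

RMSE is always at least as large as MAE, and R^2 close to 1 means the model explains most of the variance.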
Eric Ristol - 1st year Bachelor in Artificial Intelligence, UAB.