A small machine-learning project that predicts a student's final grade (0-10) from a few study habits, and compares several regression algorithms to see which one does best.
Built as a first from-scratch ML project after the Coursera Machine Learning specialization. Style kept close to my Habit Tracker IU project: plain functions, string paths, inline comments, no type hints.
- Generates its own synthetic dataset (so the repo doesn't need external data).
- Trains 4 algorithms and prints a comparison table:
  - Linear Regression
  - KNN (k=5)
  - Decision Tree
  - Random Forest
- Saves the best model + scaler to `models/` with joblib.
- Predicts a new student's grade from the terminal.
- Produces a true-vs-predicted plot in `plots/`.
- Has `pytest` tests that check the pipeline end-to-end.
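The generate-and-compare steps could look roughly like this. The feature names, coefficients, and dataset size below are invented for illustration; the repo's `data.py`/`train.py` differ in the details, but the four models match the list above:

```python
# Sketch: build a synthetic student dataset, then train and compare 4 regressors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n = 200
study_hours = rng.uniform(0, 10, n)
sleep_hours = rng.uniform(4, 9, n)
attendance = rng.uniform(0.5, 1.0, n)
# Synthetic target: the grade depends mostly on study hours, plus noise,
# clipped to the 0-10 scale the README describes.
grade = np.clip(0.7 * study_hours + 0.3 * sleep_hours + 2 * attendance
                + rng.normal(0, 0.5, n), 0, 10)

X = np.column_stack([study_hours, sleep_hours, attendance])
X_train, X_test, y_train, y_test = train_test_split(X, grade, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "KNN (k=5)": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name:20s} MAE = {mae:.2f}")
```

The model with the lowest test error is the one that gets saved to `models/`.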
```
pip install -r requirements.txt
```

If the above doesn't work, try `pip3` instead.
The fastest way is to launch the menu:
```
python main.py
```
That gives you a CLI with these options:
I. Generate a new dataset
II. Explore the dataset (print head + stats)
III. Train and compare models
IV. Predict a student's final grade
V. Quit
A typical first run is: I -> II -> III -> IV.
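A dispatch for that menu could be sketched as follows. The option labels match the README; the handler bodies are placeholder stubs, not the repo's real code:

```python
# Illustrative sketch of a main.py-style menu loop with stub handlers.
def generate():  print("generating dataset...")           # option I
def explore():   print("printing head + summary stats...") # option II
def train():     print("training and comparing models...") # option III
def predict():   print("asking for a student's values...") # option IV

HANDLERS = {"I": generate, "II": explore, "III": train, "IV": predict}

def run_menu():
    while True:
        choice = input("Option (I-V): ").strip().upper()
        if choice == "V":          # V quits the loop
            break
        action = HANDLERS.get(choice)
        if action:
            action()
        else:
            print("Unknown option, try again.")
```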
You can also run each script on its own:
```
python train.py     # generates data if needed, trains everything, saves best
python predict.py   # asks for values in the terminal, prints a prediction
```
```
pytest -q
```
All tests use the real code path (no mocks), so running them also regenerates the dataset and trains the models. Safe to re-run any time.
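A self-contained sketch in the spirit of `test_pipeline.py`: train on synthetic data and check that a prediction lands on the 0-10 grade scale. The real tests exercise the repo's own functions; everything here (features, coefficients) is illustrative:

```python
# Sketch of an end-to-end-style pytest check on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

def test_prediction_is_on_grade_scale():
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 3))  # fake study-habit features
    # Fake target clipped to the 0-10 grade scale.
    y = np.clip(X @ np.array([0.6, 0.2, 0.2]) + rng.normal(0, 0.5, 100), 0, 10)
    model = LinearRegression().fit(X, y)
    pred = float(model.predict([[5.0, 7.0, 0.8]])[0])
    assert 0 <= pred <= 10  # predictions should stay on the grade scale
```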
```
student-performance-predictor/
    data.py            # generate / load / split / scale the dataset
    train.py           # build models, train, evaluate, save the best
    predict.py         # load saved model and predict on a new student
    main.py            # CLI menu
    test_pipeline.py   # pytest tests
    requirements.txt
    .gitignore
    data/              # the generated students.csv lives here
    models/            # best_model.joblib, scaler.joblib, comparison.csv
    plots/             # true_vs_predicted.png
```
- How to go from raw features -> scaled features -> trained model -> saved model.
- Why the scaler must be fit on the training data only and merely applied to the test data: fitting it on everything would leak test-set information into training (data leakage).
- Why we compare several algorithms: sometimes the simple Linear Regression wins, sometimes the Random Forest does; it depends on the data.
- The three regression metrics: MAE, RMSE, R^2.
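The leakage-safe scaling and the three metrics can be sketched together. The data here is made up; only the fit-on-train / transform-on-test pattern and the metric calls reflect what the pipeline does:

```python
# Sketch: scale without leakage, then compute MAE, RMSE and R^2 on the test set.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(150, 2))
y = 0.8 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.3, 150)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit on the training split only...
X_test_s = scaler.transform(X_test)        # ...then reuse it: no leakage

model = LinearRegression().fit(X_train_s, y_train)
pred = model.predict(X_test_s)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

RMSE is always at least as large as MAE, and R^2 close to 1 means the model explains most of the variance.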
Eric Ristol - 1st year Bachelor in Artificial Intelligence, UAB.