Student Performance Predictor

A small machine-learning project that predicts a student's final grade (0-10) from a few study habits, and compares several regression algorithms to see which one does best.

Built as a first from-scratch ML project after the Coursera Machine Learning specialization. Style kept close to my Habit Tracker IU project: plain functions, string paths, inline comments, no type hints.

Features

  • Generates its own synthetic dataset (so the repo doesn't need external data).
  • Trains 4 algorithms and prints a comparison table:
    • Linear Regression
    • KNN (k=5)
    • Decision Tree
    • Random Forest
  • Saves the best model + scaler to models/ with joblib.
  • Predicts a new student's grade from the terminal.
  • Produces a true vs predicted plot in plots/.
  • Has pytest tests that check the pipeline end-to-end.
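The train-and-compare step could look roughly like this — a minimal sketch using the four scikit-learn estimators listed above on a tiny stand-in dataset (the feature formula and variable names are made up for illustration, not the repo's actual data.py):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

# stand-in for the generated dataset: 3 study-habit features -> grade 0-10
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = np.clip(0.6 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2]
            + rng.normal(0, 0.5, 200), 0, 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)  # fit on the training split only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "KNN (k=5)": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    results[name] = mean_absolute_error(y_test, model.predict(X_test_s))
    print(f"{name:20s} MAE={results[name]:.3f}")

best = min(results, key=results.get)  # lowest test MAE wins
# train.py would then persist the winner with something like:
# joblib.dump(models[best], "models/best_model.joblib")
```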

Required packages

pip install -r requirements.txt

If pip isn't found or points at a different Python, try pip3 instead.

How to run

The fastest way is to launch the menu:

python main.py

That gives you a CLI with these options:

I.   Generate a new dataset
II.  Explore the dataset (print head + stats)
III. Train and compare models
IV.  Predict a student's final grade
V.   Quit

A typical first run is: I -> II -> III -> IV.

You can also run each script on its own:

python train.py       # generates data if needed, trains everything, saves best
python predict.py     # asks for values in the terminal, prints a prediction
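The core of the predict step is probably just loading the saved artifacts and scaling the input before predicting. A hedged sketch, where the joblib paths match the Files section below but the three feature names are purely illustrative:

```python
import joblib
import numpy as np

def predict_grade(hours_studied, attendance_pct, sleep_hours,
                  model_path="models/best_model.joblib",
                  scaler_path="models/scaler.joblib"):
    # load the artifacts saved by train.py
    model = joblib.load(model_path)
    scaler = joblib.load(scaler_path)
    # scale the new student's features with the *training* scaler
    features = np.array([[hours_studied, attendance_pct, sleep_hours]])
    return float(model.predict(scaler.transform(features))[0])
```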

Running the tests

pytest -q

All tests use the real code path (no mocks), so running them also regenerates the dataset and trains the models. Safe to re-run any time.
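An end-to-end check in that spirit might look like the sketch below — self-contained rather than importing the repo's modules, with a synthetic dataset standing in for the generated one:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def test_predictions_stay_in_grade_range():
    # end-to-end: synthesize data, train, predict, check the 0-10 range
    rng = np.random.default_rng(1)
    X = rng.uniform(0, 10, (100, 3))
    y = np.clip(X.mean(axis=1), 0, 10)
    model = RandomForestRegressor(n_estimators=20, random_state=1).fit(X, y)
    preds = model.predict(X)
    # a forest averages training targets, so predictions stay in range
    assert preds.min() >= 0.0 and preds.max() <= 10.0
```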

Files

student-performance-predictor/
    data.py               # generate / load / split / scale the dataset
    train.py              # build models, train, evaluate, save the best
    predict.py            # load saved model and predict on a new student
    main.py               # CLI menu
    test_pipeline.py      # pytest tests
    requirements.txt
    .gitignore
    data/                 # the generated students.csv lives here
    models/               # best_model.joblib, scaler.joblib, comparison.csv
    plots/                # true_vs_predicted.png

What I learned

  • How to go from raw features -> scaled features -> trained model -> saved model.
  • Why the scaler must be fit on the training data only and then applied to the test data (fitting it on the test data is data leakage).
  • Why we compare several algorithms: sometimes plain Linear Regression wins, sometimes the Random Forest does; it depends on the data.
  • The three regression metrics: MAE, RMSE, R^2.
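The first and last points can be sketched in a few lines — fit-on-train-only scaling, plus the three metrics on toy true/predicted values (the numbers are invented, not project results):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = np.arange(40, dtype=float).reshape(20, 2)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# correct order: fit the scaler on the training split only, then
# transform both splits; fitting on all of X would leak test-set
# statistics (mean/std) into training
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# the three metrics on a toy true/predicted pair (every error is 0.5)
y_true = np.array([7.0, 5.5, 9.0, 4.0])
y_pred = np.array([6.5, 6.0, 8.5, 4.5])
mae = mean_absolute_error(y_true, y_pred)           # 0.5
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # 0.5
r2 = r2_score(y_true, y_pred)                       # close to 1 = good fit
```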

Author

Eric Ristol - 1st year Bachelor in Artificial Intelligence, UAB.
