JoshKeesee/Mini-GPT

Mini-GPT

A small conversational language model project with scripts for dataset preparation, hyperparameter analysis, training, CLI chat, and a Flask web chat UI.

Project Layout

  • analyze_dataset.py: computes recommended training hyperparameters.
  • prepare.py: builds corpus cache, tokenizer, train.bin, and meta.pkl.
  • train.py: trains the model and writes checkpoints.
  • chat.py: terminal chat using the latest model checkpoint.
  • app.py: Flask + Socket.IO web chat app.
  • artifacts/models/: saved model checkpoints and final weights.
  • artifacts/training_data/: generated tokenized training data and metadata.
  • artifacts/hyperparameters/: generated hyperparameter config.

Prerequisites

  • Python 3.10+
  • Optional but recommended: NVIDIA GPU + CUDA for faster training
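The Python version requirement above can be checked programmatically before running any of the scripts. A minimal sketch (the helper name is ours, not part of the repo):

```python
import sys

def check_python_version(minimum=(3, 10)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum

if __name__ == "__main__":
    if not check_python_version():
        raise SystemExit("Mini-GPT requires Python 3.10 or newer")
```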

Setup

  1. Create and activate a virtual environment (activation shown for Windows PowerShell).
python -m venv .venv
.\.venv\Scripts\Activate.ps1
  2. Install dependencies.
pip install torch numpy datasets flask flask-socketio sentencepiece

End-to-End Run

  1. Prepare the data and tokenizer.
python prepare.py --rebuild-cache
  2. Analyze the dataset and generate tuned hyperparameters.
python analyze_dataset.py --hours 18
  3. Train the model.
python train.py
  4. Chat in the terminal.
python chat.py
  5. Run the web app.
python app.py

Then open http://127.0.0.1:5000.

Artifact Paths

The scripts write to the following output paths:

  • Model checkpoints and weights: artifacts/models/
  • Tokenized training data and metadata: artifacts/training_data/
  • Hyperparameters JSON: artifacts/hyperparameters/hyperparameters.json
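The layout above can be expressed with pathlib, which is convenient when writing your own tooling around the artifacts. This helper is hypothetical and not part of the repo's code:

```python
from pathlib import Path

# Hypothetical helper mirroring the artifact layout listed above.
def artifact_paths(root: Path = Path("artifacts")) -> dict[str, Path]:
    """Return the expected output path for each pipeline stage."""
    return {
        "checkpoints": root / "models",
        "train_bin": root / "training_data" / "train.bin",
        "meta": root / "training_data" / "meta.pkl",
        "hyperparameters": root / "hyperparameters" / "hyperparameters.json",
    }
```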

Git Exclusions

This repository is configured to ignore:

  • tools/
  • data/personal/
  • takeout/
  • artifacts/models/
  • artifacts/training_data/
  • artifacts/hyperparameters/
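A .gitignore matching the list above might look like the following; this is a sketch, so check the repository's actual .gitignore for the authoritative rules:

```gitignore
tools/
data/personal/
takeout/
artifacts/models/
artifacts/training_data/
artifacts/hyperparameters/
```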

Notes

  • train.py expects artifacts/training_data/meta.pkl and artifacts/training_data/train.bin from prepare.py.
  • app.py expects artifacts/models/checkpoint_best.pt to exist.
  • You can override the initial checkpoint with the INIT_CKPT environment variable, e.g. INIT_CKPT=artifacts/models/checkpoint_best.pt.
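A minimal sketch of how the INIT_CKPT override could be resolved; the function name and default path handling here are illustrative, not the repo's actual code:

```python
import os
from pathlib import Path

def resolve_init_ckpt(default: str = "artifacts/models/checkpoint_best.pt") -> Path:
    """Resolve the starting checkpoint, honoring the INIT_CKPT env var."""
    return Path(os.environ.get("INIT_CKPT", default))
```

On PowerShell, set the variable before training with `$env:INIT_CKPT = "artifacts/models/checkpoint_best.pt"` and then run `python train.py`.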

About

Train, run, and configure your own local AI from scratch! The project includes a full frontend styled like the ChatGPT website, automatic hyperparameter tuning for your specific device and training-time budget, and full tokenization handling.
