This repository contains the minimal public code needed to generate the dataset, train the model locally on CPU, and benchmark saved checkpoints.
Included:
- training (is_even/train.py)
- data generation (is_even/data.py)
- benchmarking (is_even/benchmark.py)
- required support modules (constants.py, modeling.py, tokenizer.py, __init__.py)
Not included:
- model weights
- internal research tooling
- prediction CLI convenience code
Published weights and the model card live at https://hf.co/SnurfyAI/is-even.
Requirements:
- Python 3.12
- a local environment that can install torch and transformers
From the repository root:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install .
This installs two console commands:
- is-even-train
- is-even-benchmark
If you prefer not to install the console scripts, you can run the modules directly:
python -m is_even.train
python -m is_even.benchmark --model-dir artifacts/is-even
The default training command is:
is-even-train
What it does:
- creates data/is_even_chatml.jsonl if it does not already exist
- trains a CPU model on the generated train split
- evaluates on the validation split during training
- writes the trained model, tokenizer, and training_summary.json to artifacts/is-even
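After a training run, the summary file can be inspected from Python. The following is a minimal sketch, assuming only that training_summary.json is valid JSON inside the output directory; its fields are not documented here, so the code prints whatever it finds. The helper name load_training_summary is hypothetical and not part of the package.

```python
import json
from pathlib import Path


def load_training_summary(output_dir="artifacts/is-even"):
    """Return the parsed training summary, or None if the file is missing.

    Hypothetical helper: the summary schema is not assumed, only that it is JSON.
    """
    summary_path = Path(output_dir) / "training_summary.json"
    if not summary_path.exists():
        return None
    return json.loads(summary_path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    summary = load_training_summary()
    if summary is None:
        print("No training_summary.json found; run is-even-train first.")
    else:
        # Pretty-print the summary without relying on any particular keys.
        print(json.dumps(summary, indent=2, sort_keys=True))
```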
Useful overrides:
is-even-train --dataset-path data/is_even_chatml.jsonl --output-dir artifacts/is-even --epochs 60 --batch-size 64 --learning-rate 1e-3
The main knobs are:
- --epochs
- --batch-size
- --learning-rate
- --train-samples-per-last-digit
- --validation-samples-per-last-digit
- --layers
- --heads
- --embedding-size
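If you want to launch several runs with different overrides, the command line can be assembled from a dictionary. This is a sketch under stated assumptions: build_train_command and run_training are hypothetical helpers, and the only flags used are the ones listed above. It invokes the module form (python -m is_even.train) so it works without the console scripts.

```python
import subprocess
import sys


def build_train_command(overrides):
    """Turn {"batch_size": 64} into [..., "--batch-size", "64"].

    Hypothetical helper: keys use underscores and are mapped to the
    dash-separated flag names listed in this README.
    """
    command = [sys.executable, "-m", "is_even.train"]
    for name, value in overrides.items():
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


def run_training(overrides):
    """Start a training run and raise if it exits with a non-zero status."""
    subprocess.run(build_train_command(overrides), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_training(...) to actually train.
    command = build_train_command(
        {"epochs": 60, "batch_size": 64, "learning_rate": "1e-3"}
    )
    print(" ".join(command))
```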
Once you have a model directory, run:
is-even-benchmark --model-dir artifacts/is-even --examples 1000
The benchmark prints:
- total examples evaluated
- accuracy
- examples per second
- up to five incorrect predictions
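To make the reported numbers concrete, here is a minimal sketch of how such metrics could be computed. It is not the code in is_even/benchmark.py: run_benchmark, reference_is_even, and the deliberately flawed predictor are hypothetical stand-ins, and the ground truth is assumed to be a last-digit check on a decimal string.

```python
import time


def reference_is_even(number):
    """Assumed ground truth: a decimal string is even if its last digit is."""
    return number[-1] in "02468"


def run_benchmark(predict, numbers, max_errors=5):
    """Score predict() against the reference and time the whole loop."""
    correct = 0
    incorrect = []
    start = time.perf_counter()
    for number in numbers:
        if predict(number) == reference_is_even(number):
            correct += 1
        elif len(incorrect) < max_errors:
            # Keep only the first few mistakes for display.
            incorrect.append(number)
    elapsed = time.perf_counter() - start
    total = len(numbers)
    return {
        "examples": total,
        "accuracy": correct / total if total else 0.0,
        "examples_per_second": total / elapsed if elapsed > 0 else float("inf"),
        "incorrect": incorrect,
    }


def flawed_predict(number):
    """Stand-in for a model: wrong on every number ending in 8."""
    return reference_is_even(number) and not number.endswith("8")


if __name__ == "__main__":
    result = run_benchmark(flawed_predict, [str(n) for n in range(100)])
    print(result)
```

With the stand-in predictor, 10 of the 100 inputs are misclassified, so the accuracy is 0.9 and only the first five mistakes are kept.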
This repository does not include weights. To benchmark the published model, download the files from https://hf.co/SnurfyAI/is-even into a local directory and point --model-dir at that directory:
is-even-benchmark --model-dir path/to/downloaded-is-even-model --examples 1000
You can also import the package directly:
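from is_even.data import ensure_dataset, number_is_even
from is_even.tokenizer import IsEvenTokenizer

# Check the parity of a number given as a decimal string.
print(number_is_even("42"))

# Make sure the dataset file is in place, then report how many examples it holds.
summary = ensure_dataset("data/is_even_chatml.jsonl", IsEvenTokenizer())
print(summary.total_examples)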
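Once the dataset file exists, it can also be inspected without the package. The sketch below assumes only that data/is_even_chatml.jsonl holds one JSON value per line, as the .jsonl extension suggests; it makes no assumption about field names, and summarize_jsonl is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, sorted_top_level_keys) for a JSON Lines file."""
    count = 0
    keys = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record)
    return count, sorted(keys)


if __name__ == "__main__":
    dataset_path = Path("data/is_even_chatml.jsonl")
    if dataset_path.exists():
        count, keys = summarize_jsonl(dataset_path)
        print(f"{count} records, top-level keys: {keys}")
    else:
        print("Dataset not found; run is-even-train or ensure_dataset first.")
```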