Releases: DaoyuanLi2816/can-i-finetune-this
canifinetune 0.1.1
Docs-only patch release — first release published via PyPI trusted publishing.
- README masthead: SVG hero banner (also rendered on the PyPI project page), centered badges
- Architecture diagram now uses an absolute URL so it renders on PyPI
- No code changes since 0.1.0
Install: `pip install canifinetune`
v0.1.0 — first release
First release. Estimate, benchmark, and generate fine-tuning recipes for
LLMs on consumer GPUs.
What's included
- `canifinetune doctor` — environment + GPU summary.
- `canifinetune estimate` — pure-math feasibility with full memory
  breakdown (weights, trainable params, gradients, optimizer states,
  activations, CUDA / fragmentation, safety margin).
- `canifinetune recommend` — auto-search a feasible config for your card.
- `canifinetune bench` — real PyTorch + PEFT + bitsandbytes + TRL
  training step on your GPU. Captures peak reserved VRAM, tok/s, step time.
- `canifinetune calibrate` — closes the predict ↔ measure gap with a
  multiplicative correction fit to the activations term.
- `canifinetune recipe` — generates a ready-to-run `training.py` for the
  chosen config.
- `canifinetune report` / `canifinetune compare` — markdown tables across
  runs.
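
To make the `estimate` breakdown and the `calibrate` correction more concrete, here is a minimal sketch of the idea, not the package's actual code. The `MemoryBreakdown` class, its field names, and `fit_activation_scale` are hypothetical names chosen for illustration. The sketch assumes the predicted peak is the plain sum of the listed terms and that calibration fits a single scalar `k` by least squares so that `non_activation + k * activations` matches the measured peak across runs.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MemoryBreakdown:
    """Hypothetical container for the predicted terms, all in GB."""
    weights: float
    trainable_params: float
    gradients: float
    optimizer_states: float
    activations: float
    cuda_overhead: float   # CUDA context / fragmentation
    safety_margin: float

    def total(self) -> float:
        # Assumption: the predicted peak is the plain sum of every term.
        return sum(getattr(self, f.name) for f in fields(self))

    def non_activation(self) -> float:
        # Everything the calibration leaves untouched.
        return self.total() - self.activations


def fit_activation_scale(runs: list[tuple[MemoryBreakdown, float]]) -> float:
    """Least-squares k minimizing sum((non_activation + k*act - measured)^2).

    `runs` pairs each predicted breakdown with its measured peak in GB.
    """
    num = sum(b.activations * (m - b.non_activation()) for b, m in runs)
    den = sum(b.activations ** 2 for b, _ in runs)
    if den == 0:
        raise ValueError("need at least one run with a non-zero activations term")
    return num / den


def calibrated_total(b: MemoryBreakdown, k: float) -> float:
    """Predicted peak after scaling only the activations term by k."""
    return b.non_activation() + k * b.activations


# Illustrative numbers only; these are NOT measurements from the project.
small = MemoryBreakdown(1.0, 0.25, 0.25, 0.5, 1.0, 0.5, 0.5)
large = MemoryBreakdown(1.0, 0.25, 0.25, 0.5, 2.0, 0.5, 0.5)
k = fit_activation_scale([(small, 4.5), (large, 6.0)])
print(f"activation scale k = {k:.2f}")
```

Scaling only the activations term reflects the description above: the other terms follow from parameter counts and dtypes, while activation memory depends on implementation details that a closed-form estimate is more likely to miss.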
RTX 4080 baselines
Real measurements committed in `docs/rtx4080_baselines.md`:
| model | method | seq_len | measured peak | tok/sec |
|---|---|---|---|---|
| `sshleifer/tiny-gpt2` (smoke) | lora | 128 | 0.12 GB | 1735 |
| `Qwen/Qwen2.5-0.5B-Instruct` | qlora | 1024 | 3.30 GB | 1995 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 1024 | 4.36 GB | 1352 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 2048 | 7.10 GB | 1470 |
| `Qwen/Qwen2.5-3B-Instruct` | qlora | 1024 | 5.54 GB | 1158 |
Tested on
- Python 3.10 / 3.11 / 3.12 (CI matrix)
- Linux + Windows (WSL not required for the math layer; bench needs CUDA)