Releases · DaoyuanLi2816/can-i-finetune-this

First release. Estimate, benchmark, and generate fine-tuning recipes for
LLMs on consumer GPUs.

What's included

- `canifinetune doctor`: environment + GPU summary.
- `canifinetune estimate`: pure-math feasibility with full memory
  breakdown (weights, trainable params, gradients, optimizer states,
  activations, CUDA / fragmentation, safety margin).
- `canifinetune recommend`: auto-search a feasible config for your card.
- `canifinetune bench`: real PyTorch + PEFT + bitsandbytes + TRL
  training step on your GPU. Captures peak reserved VRAM, tok/s, step time.
- `canifinetune calibrate`: closes the predict ↔ measure gap with a
  multiplicative correction fit to the activations term.
- `canifinetune recipe`: generates a ready-to-run `training.py` for the
  chosen config.
- `canifinetune report` / `canifinetune compare`: markdown tables across
  runs.
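
To make the `estimate` breakdown and the `calibrate` correction more concrete, here is a minimal sketch of how the listed terms could be summed, and how a single multiplicative factor on the activations term could be fit by least squares. It is an illustration only, not the tool's actual formulas: the function names, the byte-per-parameter constants (bf16 adapters and gradients, two fp32 AdamW moments), the 0.5 GB CUDA overhead, and the 10% safety margin are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemoryBreakdown:
    """Per-term memory estimate in GB (hypothetical structure, 1 GB = 1e9 bytes)."""
    weights: float
    trainable_params: float
    gradients: float
    optimizer_states: float
    activations: float
    cuda_overhead: float
    safety_margin: float

    @property
    def total(self) -> float:
        return (self.weights + self.trainable_params + self.gradients
                + self.optimizer_states + self.activations
                + self.cuda_overhead + self.safety_margin)


def estimate_memory(base_params, trainable_params, weight_bits, activations_gb,
                    activation_scale=1.0, cuda_overhead_gb=0.5,
                    safety_fraction=0.10):
    """Sum the breakdown terms. All constants below are assumptions."""
    weights = base_params * weight_bits / 8 / 1e9   # frozen base weights
    trainable = trainable_params * 2 / 1e9          # assumed bf16 adapter weights
    gradients = trainable_params * 2 / 1e9          # assumed bf16 gradients
    optimizer = trainable_params * 8 / 1e9          # assumed two fp32 AdamW moments
    activations = activations_gb * activation_scale # calibrated activations term
    subtotal = (weights + trainable + gradients + optimizer
                + activations + cuda_overhead_gb)
    return MemoryBreakdown(weights, trainable, gradients, optimizer,
                           activations, cuda_overhead_gb,
                           subtotal * safety_fraction)


def fit_activation_scale(runs):
    """Least-squares scale s so that measured ~= fixed + s * predicted_activations.

    `runs` is an iterable of (measured_gb, fixed_gb, predicted_activations_gb),
    where fixed_gb is every non-activation term of the prediction.
    """
    runs = list(runs)
    numerator = sum(a * (m - f) for m, f, a in runs)
    denominator = sum(a * a for _, _, a in runs)
    if denominator == 0:
        raise ValueError("need at least one run with a nonzero activations term")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs: 1B-parameter base in 4-bit, 10M trainable adapter params.
    breakdown = estimate_memory(1e9, 1e7, weight_bits=4, activations_gb=1.0)
    print(f"estimated total: {breakdown.total:.3f} GB")
```

A fitted scale would then be passed back as `activation_scale` so that later estimates track what the GPU actually reserves. Fitting only the activations term keeps the weight and optimizer terms, which follow directly from parameter counts, untouched.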

Real measurements committed in `docs/rtx4080_baselines.md`:

| model | method | seq_len | measured peak | tok/sec |
|---|---|---|---|---|
| `sshleifer/tiny-gpt2` (smoke) | lora | 128 | 0.12 GB | 1735 |
| `Qwen/Qwen2.5-0.5B-Instruct` | qlora | 1024 | 3.30 GB | 1995 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 1024 | 4.36 GB | 1352 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 2048 | 7.10 GB | 1470 |
| `Qwen/Qwen2.5-3B-Instruct` | qlora | 1024 | 5.54 GB | 1158 |
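
The two `Qwen/Qwen2.5-1.5B-Instruct` rows differ only in `seq_len`, so they allow a rough two-point split of peak memory into a sequence-independent part and a per-token part. The sketch below does that arithmetic. It assumes memory grows linearly with sequence length and that both runs used the same batch size, neither of which is stated in the measurements; the helper name is hypothetical.

```python
def two_point_linear_fit(x1, y1, x2, y2):
    """Return (slope, intercept) of the line through (x1, y1) and (x2, y2)."""
    if x1 == x2:
        raise ValueError("the two points need different x values")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Measured peaks for Qwen/Qwen2.5-1.5B-Instruct (qlora) at seq_len 1024 and 2048.
gb_per_token, fixed_gb = two_point_linear_fit(1024, 4.36, 2048, 7.10)

if __name__ == "__main__":
    print(f"per-token cost: {gb_per_token * 1000:.2f} MB")
    print(f"sequence-independent part: {fixed_gb:.2f} GB")
```

Under those assumptions the split comes out to roughly 2.7 MB per token on top of about 1.6 GB that does not depend on sequence length. With only two points this is a rough reading of the table, not a validated model.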