Releases: DaoyuanLi2816/can-i-finetune-this

canifinetune 0.1.1

10 Jun 07:14
4c89527

Docs-only patch release — first release published via PyPI trusted publishing.

  • README masthead: SVG hero banner (also rendered on the PyPI project page), centered badges
  • Architecture diagram now uses an absolute URL so it renders on PyPI
  • No code changes since 0.1.0

Install: `pip install canifinetune`

v0.1.0 — first release

16 May 20:34

First release. Estimate, benchmark, and generate fine-tuning recipes for
LLMs on consumer GPUs.

What's included

  • `canifinetune doctor` — environment + GPU summary.
  • `canifinetune estimate` — pure-math feasibility with full memory
    breakdown (weights, trainable params, gradients, optimizer states,
    activations, CUDA / fragmentation, safety margin).
  • `canifinetune recommend` — auto-search a feasible config for your card.
  • `canifinetune bench` — real PyTorch + PEFT + bitsandbytes + TRL
    training step on your GPU. Captures peak reserved VRAM, tok/s, step time.
  • `canifinetune calibrate` — closes the predict ↔ measure gap with a
    multiplicative correction fit to the activations term.
  • `canifinetune recipe` — generates a ready-to-run `training.py` for the
    chosen config.
  • `canifinetune report` / `canifinetune compare` — markdown tables across
    runs.
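
To make the `estimate` breakdown and the `calibrate` correction concrete, here is a minimal sketch of how such a calculation could be structured. It is not taken from the canifinetune source: the byte counts per parameter, the activation heuristic, the fixed 0.5 GB CUDA overhead, the 10% safety margin, and all names (`MemoryBreakdown`, `estimate`, `fit_activation_scale`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, fields

GB = 1e9  # decimal gigabytes (assumption; the real tool may use GiB)


@dataclass
class MemoryBreakdown:
    """All values in bytes. Field names mirror the terms listed for `estimate`."""
    weights: float
    trainable_params: float
    gradients: float
    optimizer_states: float
    activations: float
    cuda_overhead: float
    safety_margin: float

    def total(self) -> float:
        return sum(getattr(self, f.name) for f in fields(self))


def estimate(n_params, n_trainable, batch, seq_len, hidden, layers,
             weight_bytes=0.5, act_scale=1.0) -> MemoryBreakdown:
    """Hypothetical pure-math estimate; every constant here is an assumption."""
    weights = n_params * weight_bytes   # 0.5 byte/param approximates a 4-bit base model
    trainable = n_trainable * 2         # adapter weights stored in bf16
    gradients = n_trainable * 2         # one bf16 gradient per trainable parameter
    optimizer = n_trainable * 8         # AdamW keeps two fp32 moments per parameter
    # Crude heuristic: 16 bf16 tensors of shape (batch, seq_len, hidden) per layer.
    activations = act_scale * batch * seq_len * hidden * layers * 16 * 2
    overhead = 0.5 * GB                 # CUDA context + fragmentation, fixed guess
    subtotal = weights + trainable + gradients + optimizer + activations + overhead
    margin = 0.10 * subtotal            # safety margin as a fraction of the subtotal
    return MemoryBreakdown(weights, trainable, gradients, optimizer,
                           activations, overhead, margin)


def fit_activation_scale(runs) -> float:
    """Least-squares fit of k in: measured = fixed + k * activations.

    `runs` is an iterable of (fixed_bytes, activation_bytes, measured_bytes),
    where fixed_bytes is every predicted term except activations.
    """
    num = den = 0.0
    for fixed, act, measured in runs:
        num += act * (measured - fixed)
        den += act * act
    if den == 0.0:
        raise ValueError("need at least one run with non-zero activations")
    return num / den
```

Under these assumptions, the fitted factor would be passed back as `act_scale`, so that only the activations term is rescaled while the other terms stay as computed.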

RTX 4080 baselines

Real measurements committed in `docs/rtx4080_baselines.md`:

| model | method | seq_len | measured peak | tok/sec |
|---|---|---|---|---|
| `sshleifer/tiny-gpt2` (smoke) | lora | 128 | 0.12 GB | 1735 |
| `Qwen/Qwen2.5-0.5B-Instruct` | qlora | 1024 | 3.30 GB | 1995 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 1024 | 4.36 GB | 1352 |
| `Qwen/Qwen2.5-1.5B-Instruct` | qlora | 2048 | 7.10 GB | 1470 |
| `Qwen/Qwen2.5-3B-Instruct` | qlora | 1024 | 5.54 GB | 1158 |

Tested on

  • Python 3.10 / 3.11 / 3.12 (CI matrix)
  • Linux + Windows (WSL not required for the math layer; bench needs CUDA)