loopz — Never lose loop progress again.

Add one decorator. Any Python loop auto-resumes from exactly where it crashed.

The Problem

You are processing 100,000 images. Your Colab session drops at 60,000.
You are training a model for 50 epochs. Your laptop dies at epoch 30.
You start over. Every single time.

loopz fixes this.

Install

pip install loopz

Quick Start

import loopz

@loopz.track("process_images", save_every=100)
def process(image_path):
    extract_and_save_features(image_path)

process(all_image_paths)   # 💥 crash at 60k?  run again → resumes at 60k ✅

That is the entire API for the common case.
One decorator. One argument. Done.

ML Training — Full State Save

loopz saves and restores your model weights, optimizer state, LR scheduler, GradScaler, and any accumulators living inside the loop — all automatically.

import loopz
import torch

model     = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

running_loss = [0.0]   # wrap in a list so loopz can restore it in-place
best_acc     = [0.0]

@loopz.track(
    "training",
    save_every  = 1,                                          # save every epoch
    state       = {"model": model, "optimizer": optimizer,
                   "scheduler": scheduler},
    loop_vars   = {"running_loss": running_loss,
                   "best_acc":     best_acc},
    notify      = print,                                      # or send a Telegram/webhook
)
def train(epoch):
    loss, acc = train_one_epoch(model, train_loader, optimizer, scheduler)
    running_loss[0] += loss
    best_acc[0]      = max(best_acc[0], acc)
    print(f"Epoch {epoch} | loss={loss:.4f} | acc={acc:.4f}")

train(range(50))
# 💥 crashes at epoch 12?  run the script again →
# 🔁 loopz: Resuming 'training' from 12/50 (24.0% done)
#    State     : ['model', 'optimizer', 'scheduler'] ✅
#    Loop vars : ['running_loss', 'best_acc'] ✅

What Gets Saved

Object	Supported
`torch.nn.Module`	✅
`torch.nn.DataParallel`	✅
`torch.nn.parallel.DistributedDataParallel`	✅
`torch.optim.Optimizer` (Adam, SGD, AdamW, …)	✅
`torch.optim.lr_scheduler.*`	✅
`torch.cuda.amp.GradScaler`	✅
`torch.Tensor`	✅
`numpy.ndarray`	✅
sklearn estimator	✅
Plain Python object (any picklable)	✅
Python / Numpy / PyTorch / CUDA random state	✅
Variables inside the loop (`running_loss`, `best_acc`, …)	✅

API Reference

`@loopz.track(...)`

@loopz.track(
    job_name   = "my_job",    # unique name — used for resume lookup
    save_every = 10,          # checkpoint every N iterations
    state      = {...},       # ML objects to save (optional)
    loop_vars  = {...},       # accumulators inside the loop (optional)
    notify     = callable,    # called on completion or crash (optional)
)
def process(item):
    ...

process(my_list)

`loopz.status()`

Print a summary of all incomplete (saved) jobs.

📋 loopz — 1 saved job(s):

  🔁 training
     Progress : 12/50 (24.0%)
     Saved at : 2026-03-22 14:30:00
     Crashed  : training crash at epoch 12

`loopz.reset("job_name")`

Delete all saved data for a job — it will start fresh next run.

`loopz.reset_all()`

Delete all saved data for every job.

How It Works

On every save_every-th iteration loopz atomically writes:
- your loop position (JSON)
- your ML object weights (.state)
- your loop variables (.vars)
- the full random seed state (Python + Numpy + PyTorch + CUDA)
On crash or KeyboardInterrupt, it saves one final checkpoint then re-raises the original exception so your stack trace is still visible.
On the next run, loopz detects the saved position, restores all state, and resumes the loop from exactly that index.
On clean completion, all saved files are deleted automatically.

Limitations (be honest)

Primitives as loop_vars — int, float, str cannot be mutated in-place in Python. Wrap them in a list: loss = [0.0] not loss = 0.0.
Distributed training (multi-node) — DDP on a single machine is supported; multi-node DDP across separate machines is not.
Custom C++ extensions — if your model uses custom CUDA ops with non-standard state, manual checkpointing is needed alongside loopz.
Non-picklable objects — if an object in state= cannot be pickled, loopz will print a warning and skip it.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
loopz		loopz
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo.png		demo.png
gif.mp4		gif.mp4
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

loopz — Never lose loop progress again.

The Problem

Install

Quick Start

ML Training — Full State Save

What Gets Saved

API Reference

`@loopz.track(...)`

`loopz.status()`

`loopz.reset("job_name")`

`loopz.reset_all()`

How It Works

Limitations (be honest)

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

loopz — Never lose loop progress again.

The Problem

Install

Quick Start

ML Training — Full State Save

What Gets Saved

API Reference

@loopz.track(...)

loopz.status()

loopz.reset("job_name")

loopz.reset_all()

How It Works

Limitations (be honest)

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`@loopz.track(...)`

`loopz.status()`

`loopz.reset("job_name")`

`loopz.reset_all()`

Packages