Pixi PyTorch Lightning ML Template


A lightweight, reproducible machine learning project template using Pixi, PyTorch Lightning, Aim experiment tracking, YAML configs, pytest smoke tests, and CPU/GPU environments.

This template is designed for research ML projects where you want a clean starting point with reproducible dependencies, structured training/evaluation scripts, local experiment tracking, and easy project reuse.

Reproducibility is handled through Pixi environments and the generated pixi.lock file. After dependencies are resolved once, the lock file records the exact package versions, so another machine can recreate the same environment instead of playing the traditional "works on my machine" academic sport. Pixi provides reproducible environments and one-command task execution, PyTorch provides the core deep learning framework, Lightning organizes training/evaluation code, and Aim tracks metrics, parameters, and figures locally through a web UI.

Documentation

Quick start

Clone the repository:

git clone https://github.com/CosmosRedshift7/ml-template.git
cd ml-template

Install the environment, train the example model, and start the Aim UI:

pixi install
pixi run train
pixi run aim-ui

Then open:

http://127.0.0.1:43800

Tip

In the Aim UI, open the ml-template experiment to view runs, metrics, hyperparameters, and tracked figures.

What you get

Feature                     Included
--------------------------  -----------------------------------------
Reproducible environment    Pixi + pixi.lock
Training framework          PyTorch Lightning
Multi-GPU training          Configurable through Lightning Trainer
Experiment tracking         Local Aim tracking
Configuration               YAML config in configs/default.yaml
Checkpointing               Lightning ModelCheckpoint
Evaluation                  Separate evaluate.py entry point
Plot tracking               Aim callback for plots
Tests                       Pytest smoke tests
Code quality                Ruff formatting and linting
Local cleanup               Pixi cleanup tasks

Why use this template?

Main benefits:

  • Reproducible environments with Pixi and pixi.lock.
  • Simple training loop using PyTorch Lightning.
  • Easy multi-GPU training through Lightning Trainer settings such as accelerator, devices, and strategy.
  • Local experiment tracking with Aim.
  • Config-driven experiments through configs/default.yaml.
  • Clean project structure separating data, model, loss, training, evaluation, callbacks, and utilities.
  • Local outputs kept out of git through the ignored local/ directory.
  • Ready-to-run example using a toy linear regression dataset.
  • Smoke tests included so you can quickly check that the template still works.
  • Useful Pixi tasks for training, evaluation, Aim UI, formatting, linting, testing, and cleanup.
  • Reusable callback pattern for logging figures during training and evaluation.

Structure

.
├── train.py
├── evaluate.py
├── callbacks.py
├── utils.py
├── pyproject.toml
├── pixi.lock
├── README.md
├── LICENSE
├── .gitignore
├── configs/
│   └── default.yaml
├── local/
├── model/
│   ├── __init__.py
│   ├── dataset.py
│   ├── loss.py
│   ├── model.py
│   └── pl_model.py
└── tests/
    └── test_smoke.py

Setup

Install Pixi first if you do not already have it.

Linux & macOS:

curl -fsSL https://pixi.sh/install.sh | sh

Windows:

Download the installer from the Pixi website (https://pixi.sh), or install from PowerShell:

powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"

Important

🔥 Restart your terminal or shell after installing Pixi.

This makes the pixi command available in your shell.

Then install the project environment:

pixi install

This creates a local Pixi environment using the dependencies specified in pyproject.toml and locked in pixi.lock.

Tip

Commit pixi.lock to make the environment reproducible across machines.

Note

The default environment uses CPU PyTorch. For CUDA-enabled training, see the GPU training section.

Managing Pixi environments

Activate the project environment in your terminal:

pixi shell

This lets you run commands such as python, pytest, or ruff directly inside the Pixi environment.

Tip

Use pixi shell when you want your terminal or editor to use the project environment interactively.

To rebuild the Pixi environment from the lock file:

rm -rf .pixi
pixi install

To fully resolve dependencies again and regenerate the lock file:

rm -rf .pixi pixi.lock
pixi install

Warning

Deleting .pixi/ removes the local environment. Deleting pixi.lock forces Pixi to resolve package versions again, which may produce a different environment.

Train

Run training with:

pixi run train

or manually:

pixi run python train.py --config configs/default.yaml

Training will:

  • load configuration from configs/default.yaml,
  • train a small fully connected model,
  • track metrics and hyperparameters with Aim,
  • save checkpoints under local/checkpoints/,
  • save predicted-vs-true fit plots under local/figures/.

Note

Training outputs are saved under local/, which is ignored by git.
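
For orientation, the flow inside train.py follows the usual config-driven Lightning pattern. The sketch below is illustrative only: config keys, import paths, and the build_datamodule/build_model helpers are assumptions, and the actual train.py in the template is the source of truth.

# Illustrative sketch of a config-driven Lightning training entry point.
# Names and config keys are assumptions; see the real train.py for details.
import argparse

import yaml
import lightning.pytorch as pl  # or `import pytorch_lightning as pl`
from aim.pytorch_lightning import AimLogger


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="configs/default.yaml")
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = yaml.safe_load(f)  # the YAML config drives the whole run

    pl.seed_everything(cfg["seed"])

    datamodule = build_datamodule(cfg)  # hypothetical helper wrapping model/dataset.py
    model = build_model(cfg)            # hypothetical helper wrapping model/pl_model.py

    logger = AimLogger(repo=cfg["aim"]["repo"], experiment=cfg["aim"]["experiment_name"])
    checkpoint_cb = pl.callbacks.ModelCheckpoint(
        dirpath="local/checkpoints", filename="best", monitor="val/loss", mode="min"
    )

    # Trainer settings (max_epochs, accelerator, devices, ...) come straight from the config.
    trainer = pl.Trainer(logger=logger, callbacks=[checkpoint_cb], **cfg["trainer"])
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()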

GPU training

PyTorch Lightning makes it easy to use the same training script on CPU, single-GPU, or multi-GPU machines.

This template defines separate Pixi environments for CPU and GPU usage:

cpu      # CPU PyTorch environment
gpu      # CUDA-enabled PyTorch environment
default  # uses the CPU environment by default
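
These environments are declared in pyproject.toml through Pixi features. The snippet below is only an illustrative sketch of such a setup: channels, package names, and versions are assumptions, and the template's own pyproject.toml is the source of truth.

# Illustrative sketch only -- channels and package names are assumptions.
[tool.pixi.project]
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64", "win-64"]

[tool.pixi.feature.cpu.dependencies]
pytorch-cpu = "*"

[tool.pixi.feature.gpu.dependencies]
pytorch-gpu = "*"

[tool.pixi.feature.gpu.system-requirements]
cuda = "12"

[tool.pixi.environments]
default = ["cpu"]
cpu = ["cpu"]
gpu = ["gpu"]

[tool.pixi.tasks]
train = "python train.py --config configs/default.yaml"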

The default environment uses CPU PyTorch, so normal training works with:

pixi run train

or explicitly:

pixi run -e cpu train

For CUDA-enabled PyTorch, install the GPU environment:

pixi install -e gpu

Check that PyTorch can see CUDA:

pixi run -e gpu python -c 'import torch; print(torch.cuda.is_available()); print(torch.version.cuda)'

Expected output should look similar to:

True
12.9

With the default trainer settings in configs/default.yaml,

trainer:
  max_epochs: 10
  accelerator: auto
  devices: auto

running the GPU environment will automatically use a GPU if one is available:

pixi run -e gpu train

For explicit GPU control, edit configs/default.yaml:

# single GPU
trainer:
  accelerator: gpu
  devices: 1

# two GPUs with distributed data parallel training
trainer:
  accelerator: gpu
  devices: 2
  strategy: ddp

# all available GPUs
trainer:
  accelerator: gpu
  devices: auto
  strategy: ddp

Important

GPU training requires NVIDIA GPUs, a compatible NVIDIA driver, and the CUDA-enabled Pixi environment. The CPU environment is kept as the default because it works on most machines.

Evaluate from a checkpoint

Evaluate the checkpoint specified in configs/default.yaml:

pixi run evaluate

By default, this evaluates:

local/checkpoints/best.ckpt

Important

Run pixi run train before pixi run evaluate, unless you already have a checkpoint at local/checkpoints/best.ckpt.

To evaluate a different checkpoint:

pixi run python evaluate.py --config configs/default.yaml --ckpt path/to/checkpoint.ckpt

Evaluation logs test metrics and tracks a predicted-vs-true fit plot in Aim.
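
Under the hood, checkpoint evaluation with Lightning boils down to loading the checkpoint and calling trainer.test. The sketch below is illustrative only; the LightningModule class name and the datamodule construction are assumptions (see the actual evaluate.py).

# Illustrative sketch -- class and helper names are assumptions; see evaluate.py.
import lightning.pytorch as pl

from model.pl_model import LitModel  # hypothetical name for the LightningModule

# Restore weights and hyperparameters saved by ModelCheckpoint during training.
model = LitModel.load_from_checkpoint("local/checkpoints/best.ckpt")

trainer = pl.Trainer(accelerator="auto", devices="auto")
trainer.test(model, datamodule=datamodule)  # datamodule built the same way as in train.py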

Open Aim UI

Start the local Aim UI:

pixi run aim-ui

Then open:

http://127.0.0.1:43800

In the Aim UI, open the ml-template experiment. You should see runs with tracked parameters, metrics such as train/loss, val/loss, and test/loss, and generated image sequences such as predicted-vs-true fit plots.
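
Metric names such as train/loss come from the usual Lightning self.log calls inside the LightningModule; roughly, and purely as an illustration:

# Inside the LightningModule (model/pl_model.py) -- illustrative only.
def training_step(self, batch, batch_idx):
    x, y = batch
    loss = mse_loss(self(x), y)
    # Everything logged here is forwarded to Aim under this metric name.
    self.log("train/loss", loss, on_epoch=True, prog_bar=True)
    return loss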

Configuration

The main configuration file is:

configs/default.yaml

It controls:

  • random seed,
  • dataset sizes,
  • input dimension,
  • batch size,
  • model dimensions,
  • optimizer settings,
  • trainer settings,
  • Aim repository path,
  • checkpoint path,
  • evaluation checkpoint path.
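
Taken together, the file has roughly the following shape. Key names and values below are illustrative; the actual configs/default.yaml is the source of truth.

# Illustrative shape only -- see configs/default.yaml for the real keys and values.
seed: 42
data:
  n_train: 1000
  n_val: 200
  n_test: 200
  input_dim: 8
  batch_size: 32
model:
  hidden_dims: [64, 64]
optimizer:
  lr: 1.0e-3
trainer:
  max_epochs: 10
  accelerator: auto
  devices: auto
aim:
  repo: local/aim
  experiment_name: ml-template
checkpoint:
  dirpath: local/checkpoints
evaluate:
  ckpt_path: local/checkpoints/best.ckpt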

Local outputs

Generated files are stored under local/, which is ignored by git.

Typical local outputs:

local/aim/
local/checkpoints/
local/figures/

This keeps the repository clean while allowing experiments, checkpoints, plots, and temporary files to stay available locally.

Cleaning local outputs

Clean only Aim runs and experiment metadata:

pixi run clean-runs

Clean only model checkpoints:

pixi run clean-checkpoints

Clean only generated figures:

pixi run clean-figures

Clean everything generated locally:

pixi run clean-all

The cleanup tasks remove these files/directories:

local/aim/
local/checkpoints/
local/figures/

Format, lint, and test

Use these checks before committing changes: fix applies automatic Ruff fixes where possible, format runs the Ruff formatter, lint checks for remaining style and import issues, and pytest runs the smoke tests.

pixi run fix
pixi run format
pixi run lint
pixi run pytest
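
The smoke tests typically just check that a tiny run wires up without errors. A minimal sketch of that idea, with class names and constructor arguments as assumptions:

# Illustrative smoke test -- names and constructor arguments are assumptions.
import lightning.pytorch as pl

from model.dataset import LinearRegressionData
from model.pl_model import LitModel  # hypothetical class name


def test_fast_dev_run():
    datamodule = LinearRegressionData()  # assumed to have small defaults
    model = LitModel()
    # fast_dev_run pushes a single batch through train/val to catch wiring errors quickly.
    trainer = pl.Trainer(fast_dev_run=True, logger=False)
    trainer.fit(model, datamodule=datamodule)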

Starting a new project from this template

For a new research project, use the GitHub "Use this template" button. This creates a fresh repository with the same files but without carrying over the template's commit history.

Alternatively, create a fresh local copy manually:

git clone https://github.com/CosmosRedshift7/ml-template.git new-project-name
cd new-project-name
rm -rf .git
git init
git add -A
git commit -m "Initial commit from ml-template"

After creating the new repository, update the project-specific files. At minimum, update the project metadata in pyproject.toml:

[project]
name = "new-project-name"
description = "Short description of the new project"

and update the Aim experiment name in configs/default.yaml:

aim:
  experiment_name: new-project-name

Tip

Fork this repository only if you want your new repository to remain visibly connected to ml-template or if you plan to contribute changes back to the template.

Extending the template

Tip

Start by replacing the data module and model, then update configs/default.yaml to match your project.

Common next steps:

  • Replace LinearRegressionData in model/dataset.py with your own data module.
  • Replace FCNet in model/model.py with your own neural network model.
  • Modify mse_loss in model/loss.py or add new loss functions.
  • Add more configuration files under configs/.
  • Add project-specific metrics, plots, callbacks, or Aim-tracked figures.
  • Modify AimPlotCallback in callbacks.py for custom image logging (see the sketch after this list).
  • Add real unit tests under tests/.
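
As an example of the callback pattern, a figure-logging callback along the lines of AimPlotCallback might look like the sketch below. This is illustrative only; the real callbacks.py is the reference.

# Illustrative figure-logging callback -- the real AimPlotCallback may differ.
import matplotlib.pyplot as plt
from aim import Image
from lightning.pytorch.callbacks import Callback


class ExamplePlotCallback(Callback):  # hypothetical name for a custom variant
    def on_validation_epoch_end(self, trainer, pl_module):
        fig, ax = plt.subplots()
        ax.plot([0, 1], [0, 1])  # replace with your own predictions or diagnostics
        ax.set_title("example figure")
        # trainer.logger.experiment is the underlying aim.Run when using AimLogger.
        trainer.logger.experiment.track(
            Image(fig), name="example_figure", step=trainer.current_epoch
        )
        plt.close(fig)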

Notes

  • Keep raw data, generated data, Aim runs, checkpoints, and figures under local/.
  • Commit pixi.lock for reproducible environments.
  • The default example trains a tiny fully connected model on a synthetic linear regression dataset.
  • The template uses local Aim tracking by default.

License

This project is licensed under the MIT License.