# Weights & Biases

## Introduction to Weights & Biases (W&B)

**Weights & Biases (W&B)** is a popular experiment-tracking and model-management platform used in machine learning workflows. It helps you monitor training runs, visualize metrics in real time, compare models, store hyperparameters, and keep your experiments reproducible.

W&B integrates with machine learning libraries such as **PyTorch, TensorFlow, Keras, and scikit-learn**, and is widely used in both research and industry. Its goal is to make machine learning experiments easier to track, analyze, and share.

### What W&B is used for

* **Experiment tracking:** log loss, accuracy, learning rate, gradients, and other metrics during training.
* **Hyperparameter management:** store and compare runs with different hyperparameter settings.
* **Model comparison:** analyze multiple experiments side-by-side in interactive dashboards.
* **Artifact management:** version datasets, models, and trained weights.
* **Visualization:** plot metrics, confusion matrices, images, and embeddings.
* **Collaboration:** share experiment dashboards with teammates or publish results.


## The resulting interactive W&B dashboard will look like:
![](https://i.imgur.com/z8TK2Et.png)

### In pseudocode, what we'll do is:

```python
# import the library
import wandb

# start a new experiment
wandb.init(project="new-sota-model")

# capture a dictionary of hyperparameters in the run's config
wandb.config.update({"learning_rate": 0.001, "epochs": 100, "batch_size": 128})

# set up model and data
model, dataloader = get_model(), get_data()

# optional: track gradients
wandb.watch(model)

for batch in dataloader:
  metrics = model.training_step()
  # log metrics inside your training loop to visualize model performance
  wandb.log(metrics)

# optional: save model at the end
model.to_onnx()
wandb.save("model.onnx")
```

# Import and Log In

In [1]:
import os
import random

import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.auto import tqdm

# Ensure deterministic behavior.
# Use a fixed integer seed: hash() of a str is salted per Python process
# (unless PYTHONHASHSEED is set), so seeds derived from it are not repeatable.
SEED = 42
torch.backends.cudnn.deterministic = True
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]
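
A note on seeding: Python salts the built-in `hash()` of a string per interpreter process unless `PYTHONHASHSEED` is set, so a seed derived from `hash("some text")` can differ from one session to the next. If string-derived seeds are appealing, one possible sketch uses a `hashlib` digest instead. `stable_seed` is a hypothetical helper name, not part of W&B or PyTorch.

```python
import hashlib
import random


def stable_seed(text: str) -> int:
    """Map a string to a 32-bit seed that is identical in every process."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 4 bytes so the seed stays within the 32-bit range NumPy accepts.
    return int.from_bytes(digest[:4], "big")


seed = stable_seed("setting random seeds")

# Seeding twice with the same value reproduces the same random sequence.
random.seed(seed)
first = [random.random() for _ in range(3)]
random.seed(seed)
second = [random.random() for _ in range(3)]
print(seed, first == second)
```

The same integer could then be passed to `np.random.seed`, `torch.manual_seed`, and `torch.cuda.manual_seed_all`.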

### Import W&B and Log In

In order to log data to our web service,
you'll need to log in.

If this is your first time using W&B,
you'll need to sign up for a free account at the link that appears.

In [None]:
# ! pip install wandb onnx -Uq

In [4]:
import wandb


In [5]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mmaria-szlasa[0m ([33mmaria-szlasa-university-of-wroclaw[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

# Define the Experiment and Pipeline

## Track metadata and hyperparameters with `wandb.init`

When setting up an experiment in code, the first step is to clearly define it:
which hyperparameters are we using, and what metadata describes this particular run?

A common practice is to store all of this information inside a **`config` dictionary** (or a similar structure). This makes it easy to access and reuse parameters throughout the training process.

In our example, we only allow a few hyperparameters to change and manually specify the rest. However, **anything in your model — layers, optimizer settings, data options — can be included in the `config`**.

We also attach some basic metadata, such as noting that this run uses the MNIST dataset and a convolutional neural network. Later, if we experiment with fully connected models within the same project, this metadata helps us keep different runs organized and easy to distinguish.

In [6]:
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN")

Now, let's define the overall pipeline,
which is pretty typical for model-training:

1. we first **make a model**, plus associated data and optimizer, then
2. we **train the model** accordingly and finally
3. **test** it to see how training went.

We'll implement these functions below.

In [7]:
def model_pipeline(hyperparameters):

    # tell wandb to get started
    with wandb.init(project="pytorch-demo", config=hyperparameters):
        # access all HPs through wandb.config, so logging matches execution!
        config = wandb.config

        # make the model, data, and optimization problem
        model, train_loader, test_loader, criterion, optimizer = make(config)
        print(model)

        # and use them to train the model
        train(model, train_loader, criterion, optimizer, config)

        # and test its final performance
        test(model, test_loader)

    return model

The only real change from a typical pipeline is that everything happens inside a `wandb.init` block.
Calling this function opens a communication channel between your script and the W&B servers.

When you pass a `config` dictionary into `wandb.init`, all of those settings are logged right away, so you always have a record of the hyperparameters your experiment used.

To make sure the parameters you log are exactly the ones your model relies on, it’s best to use the `wandb.config` version of your configuration object.
Take a look at the `make` function definition below for examples.

In [8]:
def make(config):
    # Make the data
    train, test = get_data(train=True), get_data(train=False)
    train_loader = make_loader(train, batch_size=config.batch_size)
    test_loader = make_loader(test, batch_size=config.batch_size)

    # Make the model
    model = ConvNet(config.kernels, config.classes).to(device)

    # Make the loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(), lr=config.learning_rate)

    return model, train_loader, test_loader, criterion, optimizer
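
`make` reads its settings by attribute (`config.batch_size`), which `wandb.config` supports but a plain `dict` does not. For a quick dry run outside a W&B run, one option is to wrap the dictionary in a `types.SimpleNamespace`. This is only a sketch: `describe` is a hypothetical stand-in for `make`, and the values are copied from the `config` dictionary defined earlier.

```python
from types import SimpleNamespace

config = dict(epochs=5, classes=10, kernels=[16, 32], batch_size=128, learning_rate=0.005)


def describe(cfg):
    # Hypothetical stand-in for make(): reads settings by attribute, as make() does.
    return f"batch_size={cfg.batch_size}, lr={cfg.learning_rate}, kernels={cfg.kernels}"


try:
    describe(config)  # a plain dict has no .batch_size attribute
    dict_works = True
except AttributeError:
    dict_works = False

# Wrapping the dict gives attribute access without starting a W&B run.
ns_config = SimpleNamespace(**config)
summary = describe(ns_config)
print(dict_works, summary)
```

Inside `model_pipeline` this wrapper is unnecessary, because `wandb.config` already provides attribute access.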

# Define the Data Loading and Model

Now we just have to define how the data will be loaded and how the model will be structured.

This step is crucial, but it’s exactly the same as it would be without using wandb, so we won’t spend extra time on it.

In [9]:
def get_data(slice=5, train=True):
    full_dataset = torchvision.datasets.MNIST(root=".",
                                              train=train,
                                              transform=transforms.ToTensor(),
                                              download=True)
    # equivalent to slicing with [::slice]
    sub_dataset = torch.utils.data.Subset(
        full_dataset, indices=range(0, len(full_dataset), slice))

    return sub_dataset


def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size,
                                         shuffle=True,
                                         pin_memory=True, num_workers=2)
    return loader
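
As a quick check on the subsampling arithmetic: taking every fifth index leaves 12,000 of MNIST's 60,000 training images and 2,000 of its 10,000 test images. The sketch below uses plain Python ranges rather than the real dataset; `subset_indices` is a hypothetical helper that mirrors the `indices=` argument passed to `Subset`.

```python
def subset_indices(n_items: int, step: int = 5) -> range:
    # Same index pattern that get_data passes to torch.utils.data.Subset.
    return range(0, n_items, step)


train_idx = subset_indices(60_000)  # MNIST training split size
test_idx = subset_indices(10_000)   # MNIST test split size

# The range matches ordinary slicing with [::step].
same_as_slice = list(train_idx) == list(range(60_000))[::5]
print(len(train_idx), len(test_idx), same_as_slice)
```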

Defining the model is usually the enjoyable part!

But since `wandb` doesn’t alter anything here, we’ll just use a standard ConvNet architecture.

Feel free to play with the design and run your own experiments — all your results will be recorded on [wandb.ai](https://wandb.ai)!

In [10]:
# Conventional and convolutional neural network

class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
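
The `7 * 7 * kernels[-1]` input size of the final linear layer follows from the standard output-size formula for convolution and pooling, `floor((n + 2p - k) / s) + 1`, applied to 28 x 28 MNIST images. A small pure-Python sketch of that arithmetic (`out_size` is a hypothetical helper, not a PyTorch function):

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    # Spatial size after a convolution or pooling layer (dilation 1).
    return (n + 2 * padding - kernel) // stride + 1


kernels = [16, 32]
size = 28  # MNIST images are 28 x 28
for _ in kernels:
    size = out_size(size, kernel=5, stride=1, padding=2)  # Conv2d keeps the size
    size = out_size(size, kernel=2, stride=2)             # MaxPool2d halves it

fc_in_features = size * size * kernels[-1]
print(size, fc_in_features)
```

With `kernels=[16, 32]` the spatial size goes 28, 14, 7, so the linear layer receives 7 * 7 * 32 = 1568 features.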

# Define Training Logic

Moving on in our `model_pipeline`, it's time to specify how we `train`.

Two `wandb` functions come into play here: `watch` and `log`.

### Track gradients with `wandb.watch` and everything else with `wandb.log`

`wandb.watch` will log the gradients and the parameters of your model,
every `log_freq` steps of training.

All you need to do is call it before you start training.

The rest of the training code remains the same:
we iterate over epochs and batches,
running forward and backward passes
and applying our `optimizer`.

In [None]:
def train(model, loader, criterion, optimizer, config):
    # Tell wandb to watch what the model gets up to: gradients, weights, and more!
    wandb.watch(model, criterion, log="all", log_freq=10)

    # Run training and track with wandb
    total_batches = len(loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
        for images, labels in loader:

            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct += len(images)
            batch_ct += 1

            # Report metrics every 25 batches (first at batch 24, since batch_ct is already incremented)
            if ((batch_ct + 1) % 25) == 0:
                train_log(loss, example_ct, epoch)


def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)

    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss

The only difference is in the logging code:
where previously you might have reported metrics by printing to the terminal,
now you pass the same information to `wandb.log`.

`wandb.log` expects a dictionary with strings as keys.
These strings identify the objects being logged, which make up the values.
You can also optionally log which `step` of training you're on.

In [12]:
def train_log(loss, example_ct, epoch):
    # Where the magic happens
    loss = float(loss)  # convert the 0-dim tensor to a plain Python number
    wandb.log({"epoch": epoch, "loss": loss}, step=example_ct)
    print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
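
To see which `step` values end up on the x-axis, the loop's bookkeeping can be replayed without a model. The sketch below assumes the setup used in this notebook: 12,000 training examples after subsampling, `batch_size=128`, 5 epochs, and a smaller final batch in each epoch (the `DataLoader` keeps it by default). `logged_steps` is a hypothetical helper.

```python
def logged_steps(n_examples=12_000, batch_size=128, epochs=5, every=25):
    """Replay the counters from train() and return the example counts that get logged."""
    steps, example_ct, batch_ct = [], 0, 0
    for _ in range(epochs):
        remaining = n_examples
        while remaining > 0:
            batch = min(batch_size, remaining)  # last batch of an epoch may be smaller
            remaining -= batch
            example_ct += batch
            batch_ct += 1
            # Same condition as in train(): fires when batch_ct is 24, 49, 74, ...
            if (batch_ct + 1) % every == 0:
                steps.append(example_ct)
    return steps


steps = logged_steps()
print(len(steps), steps[:4], steps[-1])
```

Under these assumptions the first logged step is 3072 (24 batches of 128) and the last is 57344, which lines up with the `Loss after ... examples` lines printed during training.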

# Define Testing Logic

Once the model is done training, we want to test it:
run it against some fresh data from production, perhaps,
or apply it to some hand-curated "hard examples".



#### Call `wandb.save`

This is also a great time to save the model's architecture
and final parameters to disk.
For maximum compatibility, we'll `export` our model in the
[Open Neural Network eXchange (ONNX) format](https://onnx.ai/).

Passing that filename to `wandb.save` ensures that the model parameters
are saved to W&B's servers.

In [13]:
def test(model, test_loader):
    model.eval()

    # Run the model on some test examples
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f"Accuracy of the model on the {total} " +
              f"test images: {correct / total:%}")

        wandb.log({"test_accuracy": correct / total})

    # Save the model in the exchangeable ONNX format
    torch.onnx.export(model, images, "model.onnx")
    wandb.save("model.onnx")
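
The accuracy bookkeeping in `test` amounts to taking the arg-max class for each example and counting matches. A dependency-free sketch with made-up logits (the numbers are illustrative, not model output) also shows how the `:%` format specifier renders the ratio:

```python
def accuracy(logits, labels):
    # Predicted class = index of the largest logit, like the indices from torch.max(outputs, 1).
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Illustrative 3-class logits for four examples (not real model output).
logits = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2], [0.0, 0.2, 3.1], [1.2, 0.9, 0.1]]
labels = [0, 1, 2, 1]

acc = accuracy(logits, labels)
print(f"{acc:%}")    # default precision of six decimals
print(f"{acc:.1%}")  # one decimal is usually easier to read
```

The default `:%` precision is why the test cell prints six decimal places; a specifier such as `:.2%` would shorten it.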

# Run training and watch your metrics live on wandb.ai

Now that we've defined the whole pipeline and slipped in
those few lines of W&B code,
we're ready to run our fully-tracked experiment.

We'll report a few links to you:
our documentation,
the Project page, which organizes all the runs in a project, and
the Run page, where this run's results will be stored.

Navigate to the Run page and check out these tabs:

1. **Charts**, where the model gradients, parameter values, and loss are logged throughout training
2. **System**, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar), and more
3. **Logs**, which has a copy of anything pushed to standard out during training
4. **Files**, where, once training is complete, you can click on `model.onnx` to view our network with the [Netron model viewer](https://github.com/lutzroeder/netron).

Once the run is finished
(i.e. the `with wandb.init` block is exited),
we'll also print a summary of the results in the cell output.

In [16]:
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)

ConvNet(
  (layer1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Linear(in_features=1568, out_features=10, bias=True)
)


  0%|          | 0/5 [00:00<?, ?it/s]

Loss after 03072 examples: 0.343
Loss after 06272 examples: 0.160
Loss after 09472 examples: 0.189
Loss after 12640 examples: 0.173
Loss after 15840 examples: 0.200
Loss after 19040 examples: 0.051
Loss after 22240 examples: 0.082
Loss after 25408 examples: 0.185
Loss after 28608 examples: 0.030
Loss after 31808 examples: 0.073
Loss after 35008 examples: 0.113
Loss after 38176 examples: 0.044
Loss after 41376 examples: 0.030
Loss after 44576 examples: 0.012
Loss after 47776 examples: 0.032
Loss after 50944 examples: 0.185
Loss after 54144 examples: 0.029
Loss after 57344 examples: 0.048
Accuracy of the model on the 2000 test images: 98.200000%
[torch.onnx] Obtain model graph for `ConvNet([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `ConvNet([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅


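Run history:

| Metric | History |
|---|---|
| epoch | ▁▁▁▃▃▃▃▅▅▅▅▆▆▆▆███ |
| loss | █▄▅▄▅▂▂▅▁▂▃▂▁▁▁▅▁▂ |
| test_accuracy | ▁ |

Run summary:

| Metric | Value |
|---|---|
| epoch | 4.0 |
| loss | 0.04767 |
| test_accuracy | 0.982 |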
