# 1. Weights & Biases (WandB)


### Hyperparameters

Use `config` to store run settings (they version well and show up in filters).

```python
import wandb

run = wandb.init(
    project="demo-proj",
    name="exp-resnet50-lr1e-3",
    config={
        "seed": 42,
        "model": "resnet50",
        "optimizer": "AdamW",
        "lr": 1e-3,
        "batch_size": 32,
        "epochs": 20,
        "weight_decay": 1e-2,
        "num_layers": 50,
        "dataset": "CIFAR10",
        "img_size": 224,
    },
)
cfg = wandb.config
```

---

### Metrics (changing over time)

Log scalars per **step** or **epoch**. Use hierarchical keys to keep dashboards tidy.

```python
global_step = 0
for epoch in range(cfg.epochs):
    # ... compute loss, acc ...
    train_loss, train_acc = 0.42, 0.91
    val_loss, val_acc = 0.38, 0.93

    wandb.log({
        "global_step": global_step,
        "train/loss": train_loss,
        "train/acc":  train_acc,
        "val/loss":   val_loss,
        "val/acc":    val_acc,
        "epoch":      epoch,
    }, step=global_step)

    global_step += 1
```
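If your training code keeps metrics in nested dictionaries, the hierarchical keys used above can be generated instead of typed by hand. A minimal sketch, assuming a helper named `flatten_metrics` (hypothetical, not part of the wandb API):

```python
def flatten_metrics(nested, parent_key="", sep="/"):
    """Flatten {"train": {"loss": 0.4}} into {"train/loss": 0.4}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key prefix
            flat.update(flatten_metrics(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


metrics = {
    "train": {"loss": 0.42, "acc": 0.91},
    "val": {"loss": 0.38, "acc": 0.93},
    "epoch": 0,
}
flat = flatten_metrics(metrics)
print(flat)
# `flat` can then be passed to wandb.log(flat, step=global_step)
```

Keeping the nesting in your own code and flattening only at the logging boundary means the dashboard grouping (`train/`, `val/`) follows the structure of the dictionary.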

---

### Model Gradients (optional)

Let W\&B watch the model to capture gradient and weight histograms every N steps (set by `log_freq`).

```python
# after you create your model:
# model = ...
wandb.watch(models=model, log="gradients", log_freq=100)  # or log="all"
```

If you want manual control, log specific grad histograms:

```python
for name, p in model.named_parameters():
    if p.grad is not None and name.endswith("weight"):
        wandb.log({f"grad/{name}": wandb.Histogram(p.grad.detach().cpu().numpy())}, step=global_step)
```

---

### Model Weights & Checkpoints (optional)

Save checkpoints locally **and** version them with **Artifacts**:

```python
import torch, os

ckpt_path = f"checkpoints/epoch{epoch:03d}_acc{val_acc:.3f}.pt"
os.makedirs("checkpoints", exist_ok=True)
torch.save({"epoch": epoch, "model": model.state_dict()}, ckpt_path)

artifact = wandb.Artifact(
    name=f"{wandb.run.project}-model",
    type="model",
    metadata={"epoch": epoch, "val_acc": val_acc, "model": cfg.model}
)
artifact.add_file(ckpt_path)
wandb.log_artifact(artifact)
```

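Later, in another run, you can **restore** a version by alias (`latest`) or by a pinned version such as `v3`:

```python
# A fully qualified artifact path is "<entity>/<project>/<artifact-name>:<alias>"
model_art = wandb.use_artifact(
    f"{wandb.run.entity}/{wandb.run.project}/{wandb.run.project}-model:latest"
)
model_dir = model_art.download()
# load from `model_dir/...pt`
```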
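The downloaded folder still has to be searched for the checkpoint file. A small sketch of that step, assuming the zero-padded `epochNNN_...pt` naming used earlier so that sorting by name matches sorting by epoch (`find_checkpoint` is a hypothetical helper, and the file names in the demo are made up):

```python
import tempfile
from pathlib import Path


def find_checkpoint(model_dir, pattern="*.pt"):
    """Return the checkpoint whose name sorts last in model_dir, or None."""
    candidates = sorted(Path(model_dir).glob(pattern))
    return candidates[-1] if candidates else None


# Demo on a throwaway directory standing in for `model_dir`
with tempfile.TemporaryDirectory() as tmp:
    for name in ("epoch000_acc0.800.pt", "epoch002_acc0.930.pt", "epoch001_acc0.900.pt"):
        (Path(tmp) / name).touch()
    newest_name = find_checkpoint(tmp).name
    print(newest_name)

# With a real download you would then do something like:
#   state = torch.load(find_checkpoint(model_dir), map_location="cpu")
#   model.load_state_dict(state["model"])
```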

---

### Artifacts (datasets, predictions, eval results)

Version non-model files: raw or processed datasets, eval JSON, etc.

```python
# 1) Log a dataset folder (e.g., the exact split you trained on)
ds_art = wandb.Artifact("cifar10-split-v1", type="dataset", metadata={"split_seed": 42})
ds_art.add_dir("data/cifar10_split")    # reproducible split
wandb.log_artifact(ds_art)

# 2) Log evaluation results as a structured file
import json
eval_payload = {"epoch": epoch, "val_acc": val_acc, "per_class": {"cat":0.95, "dog":0.91}}
os.makedirs("eval", exist_ok=True)
with open("eval/metrics.json", "w") as f:
    json.dump(eval_payload, f, indent=2)

eval_art = wandb.Artifact("eval-epoch-%03d" % epoch, type="evaluation")
eval_art.add_file("eval/metrics.json")
wandb.log_artifact(eval_art)
```

---

## Custom Visualizations

### Images (e.g., predictions vs ground truth)

```python
import numpy as np

# imgs: (B,H,W,3) uint8 or file paths; preds, labels: lists
samples = []
for i in range(8):
    img = np.random.randint(0, 255, size=(224,224,3), dtype=np.uint8)
    pred, label = "dog", "cat"
    samples.append(wandb.Image(img, caption=f"true={label} pred={pred}"))

wandb.log({"val/examples": samples}, step=global_step)
```

### Confusion Matrix

```python
import numpy as np

y_true = np.array([0,1,2,1,0,2,2,1,0])
y_pred = np.array([0,2,2,1,0,2,1,1,0])
class_names = ["cat", "dog", "car"]

cm_plot = wandb.plot.confusion_matrix(
    probs=None,
    y_true=y_true,
    preds=y_pred,
    class_names=class_names
)
wandb.log({"val/confusion_matrix": cm_plot}, step=global_step)
```
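To sanity-check what the plot should contain, the same counts can be computed by hand. A minimal pure-Python sketch using the labels from the example above (`confusion_counts` is an illustrative helper, not a W&B function; here rows are true classes and columns are predictions):

```python
def confusion_counts(y_true, y_pred, num_classes):
    """counts[i][j] = number of samples with true class i predicted as class j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


y_true = [0, 1, 2, 1, 0, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 0, 2, 1, 1, 0]

counts = confusion_counts(y_true, y_pred, num_classes=3)
# The diagonal holds the correct predictions
accuracy = sum(counts[i][i] for i in range(3)) / len(y_true)

for row in counts:
    print(row)
```

For these labels, seven of the nine samples lie on the diagonal, so the off-diagonal cells in the dashboard should add up to two.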

### Tables (inspectable rows with media)

```python
table = wandb.Table(columns=["id", "y_true", "y_pred", "confidence", "image"])
for i in range(5):
    img = np.random.randint(0, 255, (128,128,3), dtype=np.uint8)
    table.add_data(f"img_{i}", "cat", "dog", 0.63, wandb.Image(img))
wandb.log({"val/table_samples": table}, step=global_step)
```

### Videos (mp4 or numpy tensor)

```python
# From file (fps only applies when raw frames are encoded, so it is omitted here):
wandb.log({"demo/video": wandb.Video("samples/clip.mp4", format="mp4")}, step=global_step)

# Or from a numpy array shaped (T, C, H, W), uint8 (channels come before height/width):
vid = np.random.randint(0, 256, (60, 3, 128, 128), dtype=np.uint8)
wandb.log({"demo/sim_rollout": wandb.Video(vid, fps=10, format="mp4")}, step=global_step)
```

---

## Putting it together (tiny end-to-end sketch)

```python
import os

import wandb, torch, torch.nn as nn, torch.optim as optim

run = wandb.init(project="demo-proj", config={"lr":1e-3, "epochs":3, "batch_size":32})
cfg = run.config

model = nn.Sequential(nn.Flatten(), nn.Linear(224*224*3, 10))
opt = optim.AdamW(model.parameters(), lr=cfg.lr)
wandb.watch(model, log="gradients", log_freq=50)

global_step = 0
for epoch in range(cfg.epochs):
    # ... your dataloader here ...
    loss = torch.tensor(0.123)  # pretend
    acc  = 0.91

    wandb.log({"train/loss": loss.item(), "train/acc": acc, "epoch": epoch}, step=global_step)

    # save checkpoint + artifact
    os.makedirs("checkpoints", exist_ok=True)  # torch.save does not create missing folders
    ckpt = f"checkpoints/epoch{epoch:03d}.pt"
    torch.save({"model": model.state_dict(), "epoch": epoch}, ckpt)
    art = wandb.Artifact("demo-model", type="model", metadata={"epoch": epoch})
    art.add_file(ckpt)
    wandb.log_artifact(art)

    global_step += 1

run.finish()
```

---


## **1.1 How It Is Logged (Online Mode)**



```python
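import wandb

# wandb.require("core")
wandb.login()
project = "simulated-experiment"
config = {
    "lr": 0.001,
    "model": "CNN",
    "weight": True
}
# No `name` is passed, so W&B generates a run name automatically
with wandb.init(project=project, config=config) as run:
    epochs = 10
    for epoch in range(1, epochs + 1):  # epochs 1..10 inclusive
        # Simulated curves: loss decays toward 0, accuracy rises toward 1
        loss = 1 / epoch
        acc = max(0.0, 1 - 2 / (epoch * epoch))  # clamped: the raw formula is -1 at epoch 1

        run.log({"acc": acc, "loss": loss})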
```


#### **Log hyperparameters**
```python
import wandb

wandb.init(
    project="my-project",
    entity="behnamasadi",
    config={"learning_rate": 0.01, "epochs": 5, "batch_size": 64},
)
```

- The `entity` parameter is the username or team name that owns the project. It is useful when you want to organize projects under different teams or users within your wandb workspace.
If you omit `entity`, wandb falls back to your default entity, which is usually the personal account you are logged in with.

- The `config` parameter is used to track and version your hyperparameters, model configuration, and other experiment settings. It's a dictionary that gets stored with your run and can be used to:

1. Track experiment parameters (like learning rate, batch size, epochs)
2. Compare different configurations across runs
3. Version your experiments
4. Reproduce experiments later

In the **wandb** dashboard, open `project > my-project`, select your run, and look under `Files`: the stored configuration is in `config.yaml`.


You can access these config values during your run using:
```python
wandb.config.learning_rate  # returns 0.01
wandb.config.epochs         # returns 5
wandb.config.batch_size     # returns 64
```
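In practice the `config` dictionary is often built from command-line flags so that each run records exactly the values it was launched with. A minimal sketch using the standard `argparse` module; the flag names simply mirror the keys above and `build_config` is a hypothetical helper:

```python
import argparse


def build_config(argv=None):
    """Parse hyperparameters from the command line into a plain dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=64)
    args = parser.parse_args(argv)
    return vars(args)  # Namespace -> dict


# Passing a list instead of reading sys.argv keeps the example reproducible
config = build_config(["--learning_rate", "0.001"])
print(config)
# The dict can then be handed to wandb.init(project="my-project", config=config)
```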

--- 



#### **Log metrics**


```python
import random
import wandb

# Assumes wandb.init(...) has already been called
for epoch in range(10):
    train_loss = random.random()  # placeholder values
    val_loss = random.random()
    wandb.log({
        "train_loss": train_loss,
        "val_loss": val_loss,
        "epoch": epoch
    })
```
---


#### **Logging Model Gradients, Weights, and Checkpoints**


```python
import torch
import torch.nn as nn
import wandb

# Initialize wandb
wandb.init(project="simple-example", name="gradients-weights-checkpoints")

# Simple model
model = nn.Sequential(
    nn.Linear(2, 2),
    nn.ReLU(),
    nn.Linear(2, 1)
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Dummy data
x = torch.randn(10, 2)
y = torch.randn(10, 1)

# Training
for epoch in range(5):
    optimizer.zero_grad()
    preds = model(x)
    loss = loss_fn(preds, y)
    loss.backward()

    optimizer.step()

    # Log loss, gradients, and weights in one call so they share the same step
    log_data = {"loss": loss.item()}
    for name, param in model.named_parameters():
        log_data[f"gradients/{name}"] = wandb.Histogram(param.grad.detach().cpu().numpy())
        log_data[f"weights/{name}"] = wandb.Histogram(param.detach().cpu().numpy())
    wandb.log(log_data)

#  Save and log model checkpoint
torch.save(model.state_dict(), "model.pth")
wandb.save("model.pth")
```


---



#### **Logging an Artifact**


```python
# Create an artifact
artifact = wandb.Artifact('model', type='model')
artifact.add_file('model.pth')
wandb.log_artifact(artifact)
```


#### **Logging Custom Visualizations**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Dummy labels
y_true = np.random.randint(0, 3, size=(100,))
y_pred = np.random.randint(0, 3, size=(100,))

#  Confusion Matrix
cm = confusion_matrix(y_true, y_pred)

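fig, ax = plt.subplots(figsize=(5, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted")  # sklearn puts predictions on the columns
ax.set_ylabel("True")       # and true labels on the rows
wandb.log({"confusion_matrix": wandb.Image(fig)})
plt.close(fig)  # free the figure once it has been logged

# Log some sample images as one list so they show up in a single panel
images = []
for i in range(5):
    random_image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    images.append(wandb.Image(random_image, caption=f"Random {i}"))
wandb.log({"sample_images": images})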
```



##  **1.2 Offline Mode (local logging, sync later)**
You can **run Weights & Biases (WandB) locally** without sending data to the WandB cloud server (**"offline mode"** or **"local mode"**).

In this mode, wandb writes the run data to your local machine first, and you can choose to sync it to a server later.

```bash
export WANDB_MODE=offline
```

Or in Python:

```python
import wandb

wandb.init(mode="offline", project="my-project", config={
    "learning_rate": 0.01,
    "epochs": 5,
    "batch_size": 64
})
```
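If the mode should be selectable per launch, one option is to resolve it in a single place before calling `wandb.init`. The following is only a sketch: `resolve_wandb_mode` is a hypothetical helper, the precedence (explicit value, then `WANDB_MODE`, then `online`) is a design choice of this example, and it deliberately accepts only the three modes `online`, `offline`, and `disabled`.

```python
import os

VALID_MODES = ("online", "offline", "disabled")


def resolve_wandb_mode(cli_value=None, env=None):
    """Pick the logging mode: explicit value first, then WANDB_MODE, then 'online'."""
    env = os.environ if env is None else env
    mode = cli_value or env.get("WANDB_MODE") or "online"
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode!r}, expected one of {VALID_MODES}")
    return mode


# A dict stands in for os.environ so the example does not depend on your shell
mode = resolve_wandb_mode(env={"WANDB_MODE": "offline"})
print(mode)
# The result can then be passed on: wandb.init(mode=mode, project="my-project")
```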

This creates a local `wandb/` folder that holds the logs. Its location is determined as follows:

1. **Default Location**: By default, wandb creates a `wandb` directory in your current working directory. This is where it stores all the run data, including logs, configuration, and any files saved with the run.

2. **Environment Variables**: You can override the default location by setting the `WANDB_DIR` environment variable. This allows you to specify a custom directory for wandb to store its data.

3. **Configuration in `wandb.init()`**: When you initialize wandb with `wandb.init()`, you can specify the `dir` parameter to set a custom directory for the current run. This is useful if you want to change the location for a specific run without affecting others.

4. **Run ID**: Each run is assigned a unique ID, which becomes part of the name of its subdirectory within the `wandb` directory (offline runs are named `offline-run-<timestamp>-<run_id>`). This subdirectory contains all the data related to that specific run, including logs.
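
The lookup described in the list above can be mimicked with a few lines of standard-library Python, which is handy for finding runs that still need syncing. This is a sketch under assumptions: the precedence (`dir` argument, then `WANDB_DIR`, then the working directory) and the `wandb/` subfolder inside the chosen base directory reflect the description here rather than a guaranteed contract, and both helper names and the run folder names in the demo are made up.

```python
import os
import tempfile
from pathlib import Path


def wandb_root(init_dir=None, env=None):
    """Resolve the folder holding run data: `dir` argument, then WANDB_DIR, then cwd."""
    env = os.environ if env is None else env
    base = init_dir or env.get("WANDB_DIR") or os.getcwd()
    return Path(base) / "wandb"


def list_offline_runs(root):
    """Return the names of offline run folders under `root`, sorted by name."""
    return sorted(p.name for p in Path(root).glob("offline-run-*") if p.is_dir())


# Demo on a throwaway directory with made-up run folder names
with tempfile.TemporaryDirectory() as tmp:
    root = wandb_root(env={"WANDB_DIR": tmp})
    for name in ("offline-run-20240101_120000-abc12345", "run-20240102_090000-def67890"):
        (root / name).mkdir(parents=True)
    offline_runs = list_offline_runs(root)
    print(offline_runs)
```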



### Browse Data

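Offline runs become browsable in WandB's built-in UI once they have been synced to a server. That server can be a local one, so the data never has to leave your machine.

#### Option A: Sync and View in Browser (but still local)

1. Start the local W&B server (it runs as a Docker container, so Docker must be installed):

```bash
wandb server start
```

This launches a local server at:
```
http://localhost:8080
```

`wandb local` is an older alias for the same command:
```bash
wandb local
```

2. Check which server the CLI currently points to (`base_url`). If it is not the local one, log in to it first, for example with `wandb login --host=http://localhost:8080`:

```bash
wandb status
```

3. Upload the offline logs so that they become viewable runs:

```bash
wandb sync wandb/offline-run-*
```

4. When you are done, stop the local server:

```bash
wandb server stop
```

Later, if you want to sync the same runs to the cloud, point the CLI back at the cloud server and run the same command:

```bash
wandb sync wandb/offline-run-*
```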
---



# **2. WandB Configuration**

## 2.1. Settings

```bash
wandb status
```

This prints your current settings:

```bash
Current Settings
{
  "_extra_http_headers": null,
  "_proxies": null,
  "api_key": null,
  "base_url": "https://api.wandb.ai",
  "entity": null,
  "git_remote": "origin",
  "ignore_globs": [],
  "organization": null,
  "project": null,
  "root_dir": null,
  "section": "default"
}
```

---


WandB stores its configuration in a file called `settings`.  
It can be in two places:
- **Local project**: inside the `wandb` folder of your current project, `./wandb/settings`
- **Global user**: inside your home directory, `~/.config/wandb/settings`

A minimal file might look like this:

```
[default]
base_url = https://api.wandb.ai
```
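
Because the file uses INI syntax, it can be inspected with Python's standard `configparser`. The sketch below reads a global file and then a local one, letting the local values win; that precedence and the helper name `load_settings` are assumptions of this example, and the demo writes two temporary files instead of touching your real settings.

```python
import configparser
import tempfile
from pathlib import Path


def load_settings(paths, section="default"):
    """Read INI-style settings files in order; later files override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read([str(p) for p in paths])  # files that do not exist are skipped
    return dict(parser[section]) if parser.has_section(section) else {}


with tempfile.TemporaryDirectory() as tmp:
    global_file = Path(tmp) / "global_settings"
    local_file = Path(tmp) / "local_settings"
    global_file.write_text("[default]\nbase_url = https://api.wandb.ai\n")
    local_file.write_text("[default]\nproject = my-project\n")
    settings = load_settings([global_file, local_file])
    print(settings)
```

For the real files you would pass `Path.home() / ".config/wandb/settings"` followed by `Path("wandb/settings")`.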


---

## 2.2 Password and API key

WandB stores your API key in `~/.netrc`. The file should be readable and writable only by you (mode 600); if it is not, fix it with `chmod 600 ~/.netrc`. The entry looks like this:


```
machine api.wandb.ai
  login behnamasadi
  password <API-KEY>
```
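
To check what is actually stored there, Python's standard `netrc` module can parse the file. A small sketch, with hypothetical helper names and a temporary file with a dummy key standing in for the real `~/.netrc`:

```python
import netrc
import os
import stat
import tempfile


def read_api_key(netrc_path, host="api.wandb.ai"):
    """Return the password field stored for `host`, or None if there is no entry."""
    auth = netrc.netrc(netrc_path).authenticators(host)
    return auth[2] if auth else None  # tuple is (login, account, password)


def is_private(path):
    """True when group and others have no permission bits (for example mode 600)."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netrc")
    with open(path, "w") as f:
        f.write("machine api.wandb.ai\n  login user\n  password dummy-key\n")
    os.chmod(path, 0o600)
    key = read_api_key(path)
    private = is_private(path)
    print(key, private)
```

Against the real file, call `read_api_key(os.path.expanduser("~/.netrc"))` and avoid printing the result.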

To replace the stored credentials, run:

```bash
wandb login --relogin
```

Paste your **API Key** from [https://wandb.ai/settings](https://wandb.ai/settings) when prompted.

This creates a fresh, clean setup that points **only to the cloud**, and the new key is written to `~/.netrc`.


---


## **2.3. Remove any Docker containers/images/servers related to WandB**

#### 1. **Check all running Docker containers**

First, list any running containers:

```bash
docker ps
```

If you see containers like `wandb-local`, `wandb-server`, `wandb-postgres`, etc., they are still running.

Stop all WandB-related containers:

```bash
docker stop $(docker ps -q --filter "ancestor=wandb/local")
```
or more generally:

```bash
docker ps | grep wandb
docker stop <container_id>
```
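
The `grep` step can also be scripted. The sketch below only does the filtering, on text in the shape produced by `docker ps --format "{{.ID}} {{.Image}} {{.Names}}"`; the helper name is hypothetical and the sample lines are made up for illustration.

```python
def find_wandb_containers(ps_output):
    """Return IDs of containers whose image or name mentions 'wandb'.

    Expects one container per line in the form "<id> <image> <name>".
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        container_id, image, name = parts
        if "wandb" in image.lower() or "wandb" in name.lower():
            matches.append(container_id)
    return matches


# Made-up sample output; real text would come from something like
#   subprocess.run(["docker", "ps", "--format", "{{.ID}} {{.Image}} {{.Names}}"],
#                  capture_output=True, text=True, check=True).stdout
sample = """\
3f2a1b4c5d6e wandb/local wandb-local
9a8b7c6d5e4f postgres:15 app-db
1c2d3e4f5a6b redis:7 wandb-cache
"""
ids = find_wandb_containers(sample)
print(ids)
```

Each returned ID can then be passed to `docker stop`.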

---

#### 2. **Remove WandB containers**

List **all containers** (including stopped ones):

```bash
docker ps -a
```

If you see wandb-related ones (names like `wandb-local`, `wandb-server`),  
then remove them:

```bash
docker rm <container_id>
```
or, if you want to remove **all stopped containers** at once:

```bash
docker container prune
```
(Caution: this removes **all** stopped containers, not only the WandB ones.)

---

#### 3. **Remove WandB Docker images**

Now remove WandB docker images to free space.

List docker images:

```bash
docker images
```

Look for images named like:
- `wandb/local`
- `wandb/server`
- or anything with `wandb`

Then remove them:

```bash
docker rmi <image_id>
```

Or remove **all** unused images:

```bash
docker image prune -a
```
(Careful: this removes every image that is not used by at least one container.)

---

#### 4. **(Optional) Remove Docker volumes**

Sometimes WandB also creates **Docker volumes** (for database, storage).

List volumes:

```bash
docker volume ls
```

If you see wandb-related volumes (names like `wandb-db`, `wandb-storage`), remove them:

```bash
docker volume rm <volume_name>
```

Or prune all unused volumes:

```bash
docker volume prune
```

---

#### 5. **(Optional) Remove WandB server install files**

If you previously downloaded a WandB `docker-compose.yml` or setup folder for self-hosted server,  
manually delete it:

```bash
rm -rf /path/to/your/wandb-server-folder
```


---
