<h1 align="center">Weights & Biases (WandB)</h1>

<p align="center"><i>Complete Guide to ML Experiment Tracking & Visualization</i></p>

<p align="center">
  <img src="https://d15shllkswkct0.cloudfront.net/wp-content/blogs.dir/1/files/2021/10/1.png" alt="WandB">
</p>

## What is Weights & Biases (WandB)?

**Weights & Biases** is a machine learning platform that provides tools for experiment tracking, dataset versioning, and model management. It helps data scientists and ML engineers track, compare, and reproduce their machine learning experiments.

### Key Problems WandB Solves:

- **Experiment Tracking**: Keep track of hyperparameters, metrics, and model performance across multiple runs
- **Visualization**: Create interactive plots and dashboards to understand model behavior
- **Collaboration**: Share results and insights with team members
- **Reproducibility**: Ensure experiments can be reproduced and compared
- **Model Management**: Version control for models and datasets
- **Hyperparameter Optimization**: Systematic search for optimal hyperparameters

## Core Features

### 1. **Experiment Tracking**
- Log hyperparameters, metrics, and model outputs
- Track system metrics (GPU utilization, memory usage)
- Monitor training progress in real-time

### 2. **Visualization**
- Interactive plots and charts
- Custom dashboards
- Image, audio, and text logging
- Model performance comparison

### 3. **Hyperparameter Optimization**
- Sweeps for systematic hyperparameter search
- Bayesian optimization
- Early stopping strategies

### 4. **Artifacts**
- Dataset versioning
- Model versioning
- File storage and lineage tracking

### 5. **Reports**
- Share findings with interactive reports
- Combine visualizations with markdown
- Collaborative documentation

## Installation & Setup

In [None]:
# Install WandB
!pip install wandb

# For specific integrations
!pip install "wandb[media]"  # For media logging (images, audio, etc.); quoted so the brackets survive shell globbing

In [None]:
# Login to WandB (requires account at wandb.ai)
import wandb

# Login using API key
wandb.login()

# Or login with API key directly
# wandb.login(key="your_api_key_here")
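
Hard-coding an API key in a notebook makes it easy to leak. As a minimal sketch, assuming the key is stored in the `WANDB_API_KEY` environment variable (which `wandb` itself reads), the hypothetical helper `read_api_key` below checks whether a usable key is present before you log in. It is an illustration, not part of the library:

```python
import os


def read_api_key(env=None):
    """Return the W&B API key from the environment, or None if unset or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


# Demonstrate with explicit dictionaries so the result does not depend on your shell.
print(read_api_key({"WANDB_API_KEY": "  abc123  "}))
print(read_api_key({}))
```

With the real environment, `wandb.login(key=read_api_key())` passes the key when one is found and otherwise falls back to the usual interactive prompt.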

## Basic Usage

In [None]:
import wandb
import numpy as np
import random

# Initialize a new run
wandb.init(
    project="my-awesome-project",  # Project name
    name="experiment-1",           # Run name (optional)
    tags=["baseline", "v1"],       # Tags for organization
    config={                       # Hyperparameters
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 100,
        "model_type": "CNN"
    }
)

# Access config
config = wandb.config
print(f"Learning rate: {config.learning_rate}")

In [None]:
# Simulate training loop
for epoch in range(config.epochs):
    # Simulate training metrics
    loss = random.uniform(0.1, 2.0) * np.exp(-epoch * 0.1)
    accuracy = min(0.95, random.uniform(0.5, 1.0) * (1 - np.exp(-epoch * 0.1)))
    
    # Log metrics
    wandb.log({
        "epoch": epoch,
        "loss": loss,
        "accuracy": accuracy,
        "learning_rate": config.learning_rate * (0.95 ** epoch)  # Decay
    })
    
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss={loss:.4f}, Accuracy={accuracy:.4f}")

# Finish the run
wandb.finish()

## Advanced Logging

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Initialize run
run = wandb.init(project="advanced-logging")

# Log images
fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
y = np.sin(x)
ax.plot(x, y)
ax.set_title("Sample Plot")

wandb.log({"sample_plot": wandb.Image(fig)})
plt.close(fig)  # Close this specific figure to free memory

# Log tables
data = [[i, i**2, i**3] for i in range(10)]
table = wandb.Table(data=data, columns=["x", "x^2", "x^3"])
wandb.log({"sample_table": table})

# Log histograms
wandb.log({"histogram": wandb.Histogram(np.random.randn(1000))})

wandb.finish()
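
Results often arrive as a list of per-sample dictionaries rather than as ready-made rows. The sketch below shows one way to turn such records into the `columns` and `data` arguments used by `wandb.Table` above. The helper name `records_to_table_args` is hypothetical, and filling missing values with `None` is an assumption of this example:

```python
def records_to_table_args(records):
    """Convert a list of dicts into (columns, data) for a table.

    Columns follow first-seen key order; missing values become None.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    data = [[record.get(col) for col in columns] for record in records]
    return columns, data


columns, data = records_to_table_args([
    {"x": 1, "x^2": 1},
    {"x": 2, "x^2": 4, "x^3": 8},
])
print(columns)
print(data)
```

Inside an active run, the result can be logged with `wandb.log({"records": wandb.Table(columns=columns, data=data)})`.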

## Integration with Popular Frameworks

In [None]:
# PyTorch Integration
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize WandB
wandb.init(project="pytorch-integration")

# Define model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)
    
    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Watch the model (logs gradients and parameters)
wandb.watch(model, log="all")

# Training loop
for epoch in range(10):
    # Dummy data
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)
    
    # Forward pass
    outputs = model(x)
    loss = criterion(outputs, y)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Log metrics
    wandb.log({"loss": loss.item()})

wandb.finish()

In [None]:
# TensorFlow/Keras Integration
import tensorflow as tf
from wandb.integration.keras import WandbCallback  # wandb.keras is the legacy import path

# Initialize WandB
wandb.init(project="tensorflow-integration")

# Build model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Dummy data
X_train = np.random.randn(1000, 10)
y_train = np.random.randint(0, 2, (1000, 1))

# Train with WandB callback
model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    callbacks=[WandbCallback()]  # Automatically logs metrics
)

wandb.finish()

In [None]:
# Scikit-learn Integration
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Initialize WandB
wandb.init(project="sklearn-integration")

# Generate dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Make predictions
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Log results
wandb.log({
    "accuracy": accuracy,
    "n_estimators": 100,
    "feature_importance": wandb.Histogram(rf.feature_importances_)
})

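# Log classification report as a table (one row per class or average)
report = classification_report(y_test, y_pred, output_dict=True)
rows = [
    [label, m["precision"], m["recall"], m["f1-score"], m["support"]]
    for label, m in report.items()
    if isinstance(m, dict)  # Skip the scalar "accuracy" entry
]
report_table = wandb.Table(
    columns=["label", "precision", "recall", "f1-score", "support"], data=rows
)
wandb.log({"classification_report": report_table})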

wandb.finish()

## Hyperparameter Sweeps

In [None]:
# Define sweep configuration
sweep_config = {
    'method': 'bayes',  # or 'grid', 'random'
    'metric': {
        'name': 'val_accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'learning_rate': {
            'distribution': 'log_uniform_values',
            'min': 1e-5,
            'max': 1e-1
        },
        'batch_size': {
            'values': [16, 32, 64, 128]
        },
        'optimizer': {
            'values': ['adam', 'sgd', 'rmsprop']
        },
        'hidden_units': {
            'distribution': 'int_uniform',
            'min': 32,
            'max': 512
        }
    }
}

# Initialize sweep
sweep_id = wandb.sweep(sweep_config, project="hyperparameter-sweep")
print(f"Sweep ID: {sweep_id}")

In [None]:
# Training function for sweep
def train():
    # Initialize run
    run = wandb.init()
    config = wandb.config
    
    # Simulate training with hyperparameters
    for epoch in range(20):
        # Simulate metrics based on hyperparameters
        loss = random.uniform(0.1, 1.0) * np.exp(-epoch * config.learning_rate)
        val_accuracy = min(0.95, random.uniform(0.7, 0.95) * (1 - np.exp(-epoch * 0.1)))
        
        # Add some hyperparameter-dependent behavior
        if config.optimizer == 'adam':
            val_accuracy *= 1.05  # Adam tends to work better
        elif config.optimizer == 'sgd':
            val_accuracy *= 0.95
        
        wandb.log({
            'epoch': epoch,
            'loss': loss,
            'val_accuracy': val_accuracy
        })
    
    run.finish()

# Run sweep (in practice, run this in separate processes/machines)
# wandb.agent(sweep_id, train, count=5)  # Run 5 experiments
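
The feature list earlier mentions early stopping strategies, which a sweep configuration enables through an `early_terminate` entry. The sketch below adds Hyperband early termination to a configuration dictionary. The helper `with_hyperband` is hypothetical, and the `min_iter` and `eta` values are illustrative assumptions rather than recommendations:

```python
import copy


def with_hyperband(sweep_config, min_iter=3, eta=2):
    """Return a copy of sweep_config with Hyperband early termination added."""
    updated = copy.deepcopy(sweep_config)
    updated["early_terminate"] = {
        "type": "hyperband",
        "min_iter": min_iter,  # first iteration at which a run can be stopped
        "eta": eta,            # bracket multiplier between stopping checks
    }
    return updated


base_config = {
    "method": "bayes",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {"batch_size": {"values": [16, 32, 64, 128]}},
}
sweep_with_stopping = with_hyperband(base_config)
print(sweep_with_stopping["early_terminate"])
```

The returned dictionary can be passed to `wandb.sweep` in place of the original configuration. Copying rather than mutating keeps the base configuration reusable for sweeps without early stopping.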

## Artifacts: Dataset & Model Versioning

In [None]:
# Creating and logging datasets as artifacts
run = wandb.init(project="artifacts-demo")

# Create a sample dataset
data = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'target': np.random.randint(0, 2, 1000)
})

# Save dataset locally
data.to_csv('dataset.csv', index=False)

# Create artifact
dataset_artifact = wandb.Artifact(
    name="my-dataset",
    type="dataset",
    description="Sample dataset for ML experiments",
    metadata={"rows": len(data), "columns": len(data.columns)}
)

# Add file to artifact
dataset_artifact.add_file('dataset.csv')

# Log artifact
run.log_artifact(dataset_artifact)

wandb.finish()

In [None]:
# Using artifacts in another run
run = wandb.init(project="artifacts-demo")

# Download and use artifact
dataset_artifact = run.use_artifact('my-dataset:latest')
dataset_dir = dataset_artifact.download()

# Load the data
data = pd.read_csv(f'{dataset_dir}/dataset.csv')
print(f"Loaded dataset with {len(data)} rows")

# Train a model and save it as artifact
from sklearn.ensemble import RandomForestClassifier
import joblib

X = data[['feature1', 'feature2']]
y = data['target']

model = RandomForestClassifier()
model.fit(X, y)

# Save model
joblib.dump(model, 'model.pkl')

# Create model artifact
model_artifact = wandb.Artifact(
    name="my-model",
    type="model",
    description="Trained RandomForest model"
)
model_artifact.add_file('model.pkl')
run.log_artifact(model_artifact)

wandb.finish()

## Best Practices

### 1. **Project Organization**

```python
# Use descriptive project names
wandb.init(project="image-classification-resnet")

# Use meaningful run names
wandb.init(
    project="my-project",
    name=f"resnet50-lr{lr}-bs{batch_size}-{timestamp}"
)

# Use tags for filtering
wandb.init(
    project="my-project",
    tags=["baseline", "resnet", "augmented"]
)
```
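
The run-name pattern above relies on `lr`, `batch_size`, and `timestamp` being defined elsewhere. A minimal sketch of one way to build such names consistently follows. The helper `make_run_name` and the timestamp format are assumptions chosen for this example:

```python
from datetime import datetime


def make_run_name(arch, lr, batch_size, now=None):
    """Build a run name such as 'resnet50-lr0.001-bs32-20240115-093000'."""
    now = datetime.now() if now is None else now
    timestamp = now.strftime("%Y%m%d-%H%M%S")
    return f"{arch}-lr{lr:g}-bs{batch_size}-{timestamp}"


# A fixed datetime keeps the example output stable.
print(make_run_name("resnet50", 0.001, 32, now=datetime(2024, 1, 15, 9, 30, 0)))
# prints: resnet50-lr0.001-bs32-20240115-093000
```

The result can be passed as `name=` to `wandb.init`. Putting the timestamp last keeps runs with the same hyperparameters adjacent when sorted by name.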

### 2. **Configuration Management**

```python
# Define all hyperparameters in config
config = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100,
    "model_architecture": "resnet50",
    "optimizer": "adam",
    "data_augmentation": True
}

wandb.init(project="my-project", config=config)
```

### 3. **Consistent Logging**

```python
# Log at consistent intervals
if step % log_interval == 0:
    wandb.log({
        "step": step,
        "train_loss": train_loss,
        "train_accuracy": train_acc,
        "val_loss": val_loss,
        "val_accuracy": val_acc
    })

# Use meaningful metric names
wandb.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc
})
```
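
Typing the `train/` and `val/` prefixes by hand invites inconsistent names. As a small sketch, the hypothetical helper `prefix_metrics` namespaces a metrics dictionary so both splits use identical metric names:

```python
def prefix_metrics(metrics, prefix):
    """Return a new dict whose keys are namespaced as '<prefix>/<name>'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train_metrics = {"loss": 0.42, "accuracy": 0.88}
val_metrics = {"loss": 0.51, "accuracy": 0.84}

# Merge both splits into a single payload for one logging call.
payload = {
    **prefix_metrics(train_metrics, "train"),
    **prefix_metrics(val_metrics, "val"),
}
print(payload)
```

Inside a run, `wandb.log(payload)` records all four values at the same step.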

### 4. **Error Handling & Resource Management**

In [None]:
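# Proper error handling (train_epoch() stands in for your own training step)
# Initialize outside the try block so a failed init is not followed by wandb.log
run = wandb.init(project="my-project")
try:
    # Training code here
    for epoch in range(100):
        # Training logic
        loss = train_epoch()
        wandb.log({"loss": loss})

except KeyboardInterrupt:
    print("Training interrupted")
except Exception as e:
    print(f"Error occurred: {e}")
    run.summary["error"] = str(e)  # Record the failure on the run
    raise  # Re-raise so the failure is not silently swallowed
finally:
    # Always finish the run
    wandb.finish()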

# Or use context manager
with wandb.init(project="my-project") as run:
    # Training code here
    for epoch in range(100):
        loss = train_epoch()
        wandb.log({"loss": loss})
    # wandb.finish() is called automatically

### 5. **Performance Optimization**

```python
# Aggregate metrics locally and log once every N steps
# (compute_loss() stands in for your own training step)
loss_buffer = []
for step in range(1000):
    # Training step
    loss = compute_loss()

    loss_buffer.append(loss)

    # Log the mean of the last 10 steps in a single call
    if (step + 1) % 10 == 0:
        wandb.log({"loss": sum(loss_buffer) / len(loss_buffer)}, step=step)
        loss_buffer = []

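# Control system-metrics sampling (underscore-prefixed settings are private
# and version-dependent; check the Settings reference for your wandb release)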
wandb.init(
    project="my-project",
    settings=wandb.Settings(
        _stats_sample_rate_seconds=60,  # System stats every 60s
        _stats_samples_to_average=10    # Average over 10 samples
    )
)
```
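
Reducing the number of logging calls usually means averaging metrics over a window of steps. The sketch below is one possible way to do that. `MetricAverager` is a hypothetical class, and the logging function is injected so the example runs without a W&B run:

```python
class MetricAverager:
    """Accumulate scalar metrics and emit their mean every `every` updates."""

    def __init__(self, log_fn, every=10):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.log_fn = log_fn
        self.every = every
        self._sums = {}
        self._count = 0

    def update(self, metrics, step):
        """Add one step's metrics; flush automatically when the window is full."""
        for name, value in metrics.items():
            self._sums[name] = self._sums.get(name, 0.0) + value
        self._count += 1
        if self._count == self.every:
            self.flush(step)

    def flush(self, step):
        """Emit the mean of the buffered metrics, if any, and reset the buffer."""
        if self._count == 0:
            return
        averaged = {name: total / self._count for name, total in self._sums.items()}
        self.log_fn(averaged, step=step)
        self._sums, self._count = {}, 0


# Stand-in logger that records calls instead of sending them anywhere.
logged = []
averager = MetricAverager(lambda metrics, step: logged.append((step, metrics)), every=5)
for step in range(10):
    averager.update({"loss": float(step)}, step)
print(logged)
```

In a real run, pass `wandb.log` as `log_fn` and call `flush` once after the loop so a partially filled window is not lost.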

## Advanced Features

In [None]:
# Custom metrics and plots
run = wandb.init(project="advanced-features")

# ROC Curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Generate data and train model
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# wandb.plot.roc_curve expects per-class probabilities of shape (n_samples, n_classes)
y_probas = model.predict_proba(X_test)

# Log ROC curve
wandb.log({"roc": wandb.plot.roc_curve(y_test, y_probas, labels=["Class 0", "Class 1"])})

# Log confusion matrix
y_pred = model.predict(X_test)
wandb.log({"confusion_matrix": wandb.plot.confusion_matrix(
    y_true=y_test, 
    preds=y_pred,
    class_names=["Class 0", "Class 1"]
)})

wandb.finish()

In [None]:
# Model monitoring and alerts
run = wandb.init(project="model-monitoring")

# Send an alert (this fires immediately; in practice, guard it with a condition)
wandb.alert(
    title="High Loss Alert",
    text="Training loss exceeded threshold",
    level=wandb.AlertLevel.WARN
)

# Custom summary metrics
run.summary["best_accuracy"] = 0.95
run.summary["total_training_time"] = "2h 30m"
run.summary["final_model_size"] = "45MB"

wandb.finish()
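
Alerts are most useful when they fire only if a condition is met. The sketch below wraps that check in a hypothetical helper, `check_loss`. The alert function is injected so the example runs on its own, and the threshold value is an arbitrary assumption:

```python
def check_loss(loss, threshold, alert_fn):
    """Call alert_fn when loss exceeds threshold; return True if it fired."""
    if loss > threshold:
        alert_fn(
            title="High Loss Alert",
            text=f"Training loss {loss:.4f} exceeded threshold {threshold:.4f}",
        )
        return True
    return False


# Stand-in alert function that records what would have been sent.
sent = []
fired = check_loss(2.5, threshold=1.0, alert_fn=lambda **kwargs: sent.append(kwargs))
print(fired, sent)
```

Inside an active run, pass `wandb.alert` as `alert_fn`. Its `wait_duration` argument can help limit repeated alerts with the same title.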

## Team Collaboration

### 1. **Sharing Results**

```python
# Allow logging without a W&B account (anonymous mode; this does not make a project public)
wandb.init(
    project="my-project",
    settings=wandb.Settings(anonymous="allow")
)

# Add notes to runs
wandb.init(project="my-project", notes="Baseline experiment with ResNet50")

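# Create reports programmatically (the reports API has changed across wandb
# releases; check the documentation for your installed version)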
report = wandb.Api().create_report(
    project="my-project",
    title="Weekly Model Performance",
    description="Summary of this week's experiments"
)
```

### 2. **API Usage**

```python
# Query runs programmatically
api = wandb.Api()

# Get all runs from a project
runs = api.runs("username/project-name")

# Filter runs
runs = api.runs(
    "username/project-name",
    filters={"config.learning_rate": 0.001}
)

# Export data to DataFrame
summary_list = []
config_list = []
for run in runs:
    summary_list.append(run.summary._json_dict)
    config_list.append(run.config)

runs_df = pd.DataFrame({
    "summary": summary_list,
    "config": config_list
})
```
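
The DataFrame above stores whole dictionaries in its `summary` and `config` columns, which makes sorting and filtering awkward. As a sketch of an alternative, the hypothetical helper `flatten_run` merges each run into one flat row. Dropping keys that start with `_` is an assumption here, based on those keys holding internal bookkeeping values:

```python
def flatten_run(name, summary, config):
    """Merge one run's summary and config into a single flat row."""
    row = {"name": name}
    # Keys starting with "_" (for example _step) are treated as internal and skipped.
    row.update({f"summary.{k}": v for k, v in summary.items() if not k.startswith("_")})
    row.update({f"config.{k}": v for k, v in config.items() if not k.startswith("_")})
    return row


# Plain dictionaries stand in for the values read from each run object.
rows = [
    flatten_run("run-a", {"loss": 0.31, "_step": 99}, {"learning_rate": 0.001}),
    flatten_run("run-b", {"loss": 0.27, "_step": 99}, {"learning_rate": 0.01}),
]
print(rows)
```

`pd.DataFrame(rows)` then yields one column per metric and hyperparameter, so `sort_values("summary.loss")` works directly.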

## Common Use Cases

### 1. **Deep Learning Experiments**
- Track training loss, validation accuracy, learning curves
- Log model architecture and hyperparameters
- Monitor GPU utilization and training time
- Version control datasets and trained models

### 2. **Hyperparameter Optimization**
- Systematic search across parameter spaces
- Early stopping based on validation metrics
- Parallel execution across multiple machines
- Visualization of parameter importance

### 3. **Model Comparison**
- Compare different architectures side-by-side
- A/B testing of model variants
- Performance across different datasets
- Reproducibility and result sharing

### 4. **Production Monitoring**
- Model performance drift detection
- Data quality monitoring
- Real-time inference metrics
- Alert systems for anomalies

## WandB vs Alternatives

| Feature | WandB | MLflow | TensorBoard | Neptune |
|---------|-------|--------|-------------|----------|
| **Ease of Use** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Visualization** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Collaboration** | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| **Cloud/Hosted** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Self-Hosted** | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| **Hyperparameter Optimization** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐ |
| **Model Registry** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐ |
| **Free Tier** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |

## Troubleshooting & Tips

### Common Issues:

1. **Authentication Problems**
   ```bash
   # Re-login
   wandb login --relogin
   
   # Check login status
   wandb status
   ```

2. **Slow Logging**
   ```python
   # Reduce logging frequency
   if step % 100 == 0:  # Log every 100 steps instead of every step
       wandb.log(metrics)
   ```

3. **Large File Uploads**
   ```python
   # For large artifacts, use references instead
   artifact.add_reference("s3://bucket/large-file.pkl")
   ```

4. **Offline Mode**
   ```python
   # Work offline and sync later
   wandb.init(mode="offline")
   
   # Sync offline runs
   # wandb sync path/to/offline/run
   ```
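
Offline mode can also be switched on through environment variables, which is convenient on clusters where the training script should stay unchanged. A minimal sketch follows, using the `WANDB_MODE` and `WANDB_DIR` environment variables. The helper `enable_offline_mode` and the directory name are hypothetical:

```python
import os


def enable_offline_mode(env=None, log_dir=None):
    """Set the variables that make later wandb.init calls run offline."""
    env = os.environ if env is None else env
    env["WANDB_MODE"] = "offline"
    if log_dir is not None:
        env["WANDB_DIR"] = str(log_dir)  # where run files are written locally
    return env


# A plain dict is used here so the example does not modify your real environment.
settings = enable_offline_mode(env={}, log_dir="./wandb-offline")
print(settings)
```

Call the helper without `env` before `wandb.init` to apply the settings to the current process. The runs can be uploaded later with `wandb sync`.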

## Conclusion

WandB supports the machine learning workflow by providing:

- **Comprehensive Experiment Tracking**: Keep track of all experiments with detailed logging
- **Beautiful Visualizations**: Interactive charts and dashboards for better insights
- **Easy Collaboration**: Share results and collaborate with team members
- **Reproducibility**: Ensure experiments can be reproduced and compared
- **Integration**: Works seamlessly with popular ML frameworks
- **Hyperparameter Optimization**: Systematic search for optimal parameters
- **Model Management**: Version control for models and datasets

### Getting Started:
1. Create an account at [wandb.ai](https://wandb.ai)
2. Install wandb: `pip install wandb`
3. Login: `wandb login`
4. Start logging your experiments!

### Resources:
- [Official Documentation](https://docs.wandb.ai/)
- [Examples Gallery](https://wandb.ai/gallery)
- [Community Forum](https://community.wandb.ai/)
- [YouTube Tutorials](https://www.youtube.com/c/WeightsBiases)