# PyTorch on macOS Guide

This notebook provides a comprehensive guide for using PyTorch effectively on macOS, with a focus on:

1. Setting up PyTorch with hardware acceleration
2. Using Metal Performance Shaders (MPS) on Apple Silicon
3. Working with BERT models using PyTorch on Mac
4. Performance optimization techniques
5. Troubleshooting common issues

In [1]:
import torch
import platform

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Check for MPS (Metal Performance Shaders) on macOS with Apple Silicon
mps_available = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
print(f"MPS available: {mps_available}")
print(f"System: {platform.system()} {platform.machine()}")

# Determine the best available device
device = "cuda" if torch.cuda.is_available() else "mps" if mps_available else "cpu"
print(f"Using device: {device}")

if torch.cuda.is_available():
    print(f"CUDA Device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.8.0
CUDA available: False
MPS available: True
System: Darwin arm64
Using device: mps


# Using PyTorch on macOS

This notebook demonstrates how to effectively use PyTorch on macOS, including hardware acceleration with Metal Performance Shaders (MPS) on Apple Silicon devices (M1/M2/M3).

In [2]:
# Example: Using MPS for hardware acceleration on macOS

# Determine the best available device
device = "cuda" if torch.cuda.is_available() else "mps" if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Create a sample tensor and move it to the device
x = torch.rand(5, 3)
x = x.to(device)
print(f"Tensor created on {device}:\n{x}")

# Simple matrix multiplication example
y = torch.rand(3, 2).to(device)
z = torch.matmul(x, y)
print(f"Matrix multiplication result:\n{z}")

# Compare performance between CPU and device (if MPS/CUDA is available)
import time

# Function to measure performance
def measure_performance(device_type, size=1000):
    start = time.time()

    # Create matrices on the specified device
    matrix1 = torch.rand(size, size, device=device_type)
    matrix2 = torch.rand(size, size, device=device_type)

    # Matrix multiplication
    result = torch.matmul(matrix1, matrix2)

    # Force computation to complete (important for timing)
    result.mean().item()

    end = time.time()
    return end - start

# Only compare if we have hardware acceleration
if device != "cpu":
    cpu_time = measure_performance("cpu", size=1000)
    device_time = measure_performance(device, size=1000)
    speedup = cpu_time / device_time

    print(f"\nPerformance comparison for 1000x1000 matrix multiplication:")
    print(f"CPU time: {cpu_time:.4f} seconds")
    print(f"{device.upper()} time: {device_time:.4f} seconds")
    print(f"Speedup: {speedup:.2f}x")

Using device: mps
Tensor created on mps:
tensor([[0.3890, 0.0666, 0.9399],
        [0.2088, 0.2788, 0.2158],
        [0.7727, 0.8135, 0.2127],
        [0.8628, 0.5630, 0.9082],
        [0.1651, 0.7615, 0.7113]], device='mps:0')
Matrix multiplication result:
tensor([[1.1655, 0.1997],
        [0.5208, 0.0950],
        [1.1879, 0.2541],
        [1.7323, 0.3345],
        [1.3454, 0.1998]], device='mps:0')

Performance comparison for 1000x1000 matrix multiplication:
CPU time: 0.0040 seconds
MPS time: 0.0109 seconds
Speedup: 0.36x
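
The 0.36x "speedup" above most likely reflects measurement overhead rather than slow MPS matrix multiplication. The timed region includes allocating the inputs, the first MPS call pays one-time setup costs, and a single 1000x1000 product is small enough that launch overhead dominates. Below is a minimal sketch of a fairer benchmark. It allocates the inputs outside the timed region, runs a few warmup iterations, synchronizes the device before reading the clock, and averages over several runs. It assumes PyTorch 2.x, where `torch.mps.synchronize()` and `torch.cuda.synchronize()` exist. `synchronize` and `benchmark_matmul` are hypothetical helper names, not PyTorch APIs.

```python
import time

import torch


def synchronize(device: str) -> None:
    """Block until all queued work on `device` has finished."""
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "mps":
        torch.mps.synchronize()
    # CPU ops run synchronously, so there is nothing to wait for


def benchmark_matmul(device: str, size: int = 1000, warmup: int = 3, iters: int = 5) -> float:
    """Return mean seconds per (size x size) matmul on `device` (hypothetical helper)."""
    # Allocate inputs once, outside the timed region
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)

    # Warmup absorbs one-time costs such as kernel setup and allocator growth
    for _ in range(warmup):
        torch.matmul(a, b)
    synchronize(device)

    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    synchronize(device)  # GPU work is queued asynchronously; wait before stopping the clock
    return (time.perf_counter() - start) / iters


devices = ["cpu"]
if torch.cuda.is_available():
    devices.append("cuda")
if torch.backends.mps.is_available():
    devices.append("mps")

for size in (1000, 2000):
    timings = {d: benchmark_matmul(d, size=size) for d in devices}
    summary = ", ".join(f"{d}: {t * 1000:.2f} ms" for d, t in timings.items())
    print(f"{size}x{size} matmul -> {summary}")
```

Larger matrices usually favor the GPU more, because fixed per-call overhead becomes a smaller share of the total time. Running two sizes makes that trend visible on your own machine.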


In [None]:
import os
import pickle
import subprocess
import re
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torch.amp import autocast, GradScaler
from torch.optim import AdamW
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, get_scheduler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import zipfile
import requests
import json

In [None]:
import mlflow
import mlflow.pytorch
import mlflow.sklearn
from mlflow import MlflowClient
from mlflow.models.signature import ModelSignature, infer_signature
from mlflow.types import Schema, ColSpec, TensorSpec

from dotenv import load_dotenv

# Autolog scikit-learn runs (params, metrics, models)
mlflow.sklearn.autolog()

# Configure MLflow from MLFLOW_TRACKING_URI (e.g. set in a local .env file)
load_dotenv()
tracking_uri = os.getenv("MLFLOW_TRACKING_URI")
if tracking_uri:
    mlflow.set_tracking_uri(tracking_uri)
print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")

In [None]:
df_sample = pd.read_csv('../data/processed_sample.csv')
df_sample.info()

In [None]:
display(df_sample['target'].value_counts())

In [1]:
# Split data into training, validation and test sets
train_val_texts, test_texts, train_val_labels, test_labels = train_test_split(
    df_sample['processed_text'].values,
    df_sample['target'].values,
    test_size=0.2,
    random_state=42
)

train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_val_texts,
    train_val_labels,
    test_size=0.2,
    random_state=42
)

data = {
    'train': {'texts': train_texts, 'labels': train_labels},
    'val': {'texts': val_texts, 'labels': val_labels},
    'test': {'texts': test_texts, 'labels': test_labels}
}

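
The two calls above first hold out 20% of the rows for testing and then 20% of the remainder for validation. That gives roughly a 64% / 16% / 20% train/val/test split. If `target` is imbalanced (see the `value_counts()` output), passing `stratify=` keeps the class ratio similar in every split. The sketch below is minimal and runs on synthetic data standing in for `df_sample`. `split_three_way` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_three_way(texts, labels, test_size=0.2, val_size=0.2, seed=42):
    """Stratified train/val/test split; val_size is a fraction of the non-test rows."""
    train_val_x, test_x, train_val_y, test_y = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    train_x, val_x, train_y, val_y = train_test_split(
        train_val_x, train_val_y, test_size=val_size, random_state=seed, stratify=train_val_y
    )
    return {
        "train": {"texts": train_x, "labels": train_y},
        "val": {"texts": val_x, "labels": val_y},
        "test": {"texts": test_x, "labels": test_y},
    }


# Synthetic stand-in for df_sample: 1000 texts, 30% positive
texts = np.array([f"text {i}" for i in range(1000)])
labels = np.array([1] * 300 + [0] * 700)

splits = split_three_way(texts, labels)
for name, part in splits.items():
    print(f"{name}: {len(part['labels'])} rows, positive rate {part['labels'].mean():.2f}")
```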

In [None]:
# Tokenize text
def tokenize_texts(tokenizer, texts, max_length=128):
    return tokenizer(
        list(texts),
        padding='max_length',
        truncation=True,
        max_length=max_length,
        return_tensors='pt'
    )
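
`train_model` expects each batch to be a dict with `input_ids`, `attention_mask` and `labels` keys. The notebook does not yet define a `Dataset` that produces batches in that shape. The sketch below is minimal and assumes the encodings come from `tokenize_texts`, that is, `[N, max_length]` tensors under those two keys. `TextClassificationDataset` is a hypothetical name. The random encodings only stand in for real tokenizer output.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TextClassificationDataset(Dataset):
    """Wraps tokenizer output and labels so each item is a dict of tensors."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # e.g. the result of tokenize_texts(...)
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {
            "input_ids": self.encodings["input_ids"][idx],
            "attention_mask": self.encodings["attention_mask"][idx],
            "labels": self.labels[idx],
        }


# Fake encodings standing in for tokenize_texts(tokenizer, texts, max_length=128)
fake_encodings = {
    "input_ids": torch.randint(0, 30522, (10, 128)),
    "attention_mask": torch.ones(10, 128, dtype=torch.long),
}
dataset = TextClassificationDataset(fake_encodings, [0, 1] * 5)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# The default collate function stacks the per-item dicts into a dict of batched tensors
batch = next(iter(loader))
print({k: tuple(v.shape) for k, v in batch.items()})
```

With real data, the training set would be built as `TextClassificationDataset(tokenize_texts(tokenizer, data['train']['texts']), data['train']['labels'])`, and likewise for `val` and `test`.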

In [None]:
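# training
import math

def train_model(model, train_loader, val_loader, num_epochs=3, gradient_accumulation_steps=4):
    """Trains BERT model and returns history.

    Uses the global `device` selected at the top of the notebook.

    Args:
        model: Hugging Face sequence-classification model, already moved to `device`.
        train_loader (DataLoader): Yields dicts with 'input_ids', 'attention_mask' and 'labels'.
        val_loader (DataLoader): Same batch format as train_loader; evaluated after each epoch.
        num_epochs (int, optional): Number of passes over the training data. Defaults to 3.
        gradient_accumulation_steps (int, optional): Batches to accumulate before each
            optimizer step. Defaults to 4.

    Returns:
        dict: Per-epoch 'train_loss', 'val_loss', 'train_accuracy' and 'val_accuracy'.
    """
    # Autocast runs on CUDA and (in recent PyTorch releases) MPS. Loss scaling is only
    # enabled on CUDA; on MPS we train without it, so if the loss turns NaN or stalls,
    # set use_amp = False.
    use_amp = device in ("cuda", "mps")
    scaler = GradScaler("cuda", enabled=(device == "cuda"))
    optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)

    # One optimizer step per `gradient_accumulation_steps` batches (plus a final partial step)
    steps_per_epoch = math.ceil(len(train_loader) / gradient_accumulation_steps)
    scheduler = get_scheduler(
        "linear",
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=steps_per_epoch * num_epochs
    )

    history = {'train_loss': [], 'val_loss': [], 'train_accuracy': [], 'val_accuracy': []}

    for epoch in range(num_epochs):
        model.train()
        total_train_loss, total_train_correct, total_train_seen = 0.0, 0, 0
        train_progress_bar = tqdm(train_loader, desc=f"Training Epoch {epoch+1}/{num_epochs}")
        optimizer.zero_grad()

        for batch_idx, batch in enumerate(train_progress_bar):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            with autocast(device_type=device, enabled=use_amp):
                outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
                # Divide so the accumulated gradient matches one large batch
                loss = outputs.loss / gradient_accumulation_steps

            scaler.scale(loss).backward()

            preds = torch.argmax(outputs.logits, dim=1)
            total_train_correct += (preds == labels).sum().item()
            total_train_seen += labels.size(0)
            total_train_loss += loss.item() * gradient_accumulation_steps

            # Update weights every `gradient_accumulation_steps` batches and on the last batch
            is_last_batch = (batch_idx + 1) == len(train_loader)
            if (batch_idx + 1) % gradient_accumulation_steps == 0 or is_last_batch:
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad()
                scheduler.step()

            train_progress_bar.set_postfix({'loss': loss.item() * gradient_accumulation_steps})

        # Validation pass: eval mode, no gradients
        model.eval()
        total_val_loss, total_val_correct, total_val_seen = 0.0, 0, 0
        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)
                outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
                total_val_loss += outputs.loss.item()
                total_val_correct += (outputs.logits.argmax(dim=1) == labels).sum().item()
                total_val_seen += labels.size(0)

        history['train_loss'].append(total_train_loss / len(train_loader))
        history['train_accuracy'].append(total_train_correct / total_train_seen)
        history['val_loss'].append(total_val_loss / len(val_loader))
        history['val_accuracy'].append(total_val_correct / total_val_seen)
        print(f"Epoch {epoch+1}: train loss {history['train_loss'][-1]:.4f}, "
              f"val loss {history['val_loss'][-1]:.4f}, val acc {history['val_accuracy'][-1]:.4f}")

    return history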
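
After training, the held-out `test` split can be scored with the scikit-learn metrics imported earlier. Below is a minimal sketch. `predict` is a hypothetical helper that assumes batches shaped like those consumed by `train_model`. It also assumes the model output exposes `.logits`, as Hugging Face classifiers do. `StandInClassifier` exists only so the snippet runs without downloading BERT.

```python
from types import SimpleNamespace

import torch
from sklearn.metrics import accuracy_score, f1_score
from torch.utils.data import DataLoader


def predict(model, loader, device):
    """Return (predicted labels, true labels) as NumPy arrays (hypothetical helper)."""
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for batch in loader:
            outputs = model(
                batch["input_ids"].to(device),
                attention_mask=batch["attention_mask"].to(device),
            )
            # One device-to-CPU copy per batch keeps MPS/CUDA transfers to a minimum
            all_preds.append(outputs.logits.argmax(dim=1).cpu())
            all_labels.append(batch["labels"])
    return torch.cat(all_preds).numpy(), torch.cat(all_labels).numpy()


class StandInClassifier(torch.nn.Module):
    """Illustration only: mimics a Hugging Face classifier by returning an object with `.logits`."""

    def __init__(self, vocab_size=100, num_labels=2):
        super().__init__()
        self.bag = torch.nn.EmbeddingBag(vocab_size, num_labels)  # mean of token embeddings

    def forward(self, input_ids, attention_mask=None):
        return SimpleNamespace(logits=self.bag(input_ids))


torch.manual_seed(0)
ids = torch.randint(0, 100, (20, 16))
items = [
    {"input_ids": ids[i], "attention_mask": torch.ones(16, dtype=torch.long), "labels": torch.tensor(i % 2)}
    for i in range(20)
]
test_loader = DataLoader(items, batch_size=8)

preds, true = predict(StandInClassifier(), test_loader, "cpu")
print(f"accuracy={accuracy_score(true, preds):.3f}, f1={f1_score(true, preds, zero_division=0):.3f}")
```

With the fine-tuned model, pass the real model, a loader built from `data['test']`, and the notebook's `device` instead of the stand-ins.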

# Using PyTorch with BERT on macOS

When working with BERT and other transformer models on macOS, here are some best practices:

1. **Use hardware acceleration with MPS** on Apple Silicon Macs (M1/M2/M3)
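2. **Manage batch sizes** carefully: on Apple Silicon the GPU shares unified memory with macOS and your other apps
3. **Use mixed precision** (`torch.autocast`) where the backend supports it, to reduce memory use and often improve speed
4. **Monitor memory usage**, since transformer models can be memory-intensive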
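
Points 1, 3 and 4 can be combined in a few lines. Below is a minimal sketch, assuming a recent PyTorch release; autocast on MPS is only available in newer versions, and the notebook runs 2.8.0. A small `torch.nn.Linear` stands in for a BERT model, so the snippet runs on any machine. `memory_report` is a hypothetical helper built on `torch.mps.current_allocated_memory()` and `torch.mps.driver_allocated_memory()`.

```python
import torch

# Same device selection as earlier in the notebook
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"


def memory_report(device: str) -> str:
    """Summarize accelerator memory use (hypothetical helper)."""
    if device == "mps":
        allocated = torch.mps.current_allocated_memory() / 2**20  # bytes held by live tensors
        driver = torch.mps.driver_allocated_memory() / 2**20  # total allocated by the Metal driver
        return f"MPS allocated: {allocated:.1f} MiB (driver: {driver:.1f} MiB)"
    if device == "cuda":
        return f"CUDA allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB"
    return "CPU: check process memory in Activity Monitor"


toy_model = torch.nn.Linear(768, 2).to(device)  # stand-in for a BERT classifier head
toy_batch = torch.randn(32, 768, device=device)  # 32 pooled 768-d feature vectors

# float16 is the usual autocast dtype on GPUs; CPU autocast uses bfloat16
amp_dtype = torch.bfloat16 if device == "cpu" else torch.float16
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    logits = toy_model(toy_batch)

print(f"Logits dtype under autocast: {logits.dtype}")
print(memory_report(device))
```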

In [None]:
# Example: Using BERT with MPS on macOS
import torch
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "mps" if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Load a pre-trained BERT model and tokenizer
try:
    # model_name = "finiteautomata/bertweet-base-sentiment-analysis"
    model_name = "bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(model_name)
    # The 2-label classification head on top of bert-base-uncased is randomly initialized,
    # so predictions are not meaningful until the model is fine-tuned (e.g. with train_model).
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Move model to device (MPS/CUDA/CPU) and switch to inference mode
    model = model.to(device)
    model.eval()
    print(f"Model loaded successfully and moved to {device}")

    documents = df_sample['text']
    labels = df_sample['target']

    # Tokenize and prepare inputs
    def tokenize_data(documents):
        return tokenizer(
            documents.tolist(),
            max_length=128, padding=True, truncation=True, return_tensors='pt'
        )
    tokens = tokenize_data(documents)

    # Split ids, masks and labels in a single call so their rows stay aligned
    (train_input_ids, val_input_ids,
     train_attention_masks, val_attention_masks,
     train_labels, val_labels) = train_test_split(
        tokens['input_ids'].numpy(),
        tokens['attention_mask'].numpy(),
        labels.to_numpy(),
        test_size=0.2,
        random_state=42
    )

    # Run inference on a few example texts
    example_texts = documents.iloc[:5].tolist()
    inputs = tokenize_data(documents.iloc[:5])

    # Move inputs to device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Inference
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Move predictions back to CPU for processing
    predictions = predictions.cpu().numpy()

    # Display results (assumes target 1 = positive, 0 = negative)
    for text, pred in zip(example_texts, predictions):
        sentiment = "Positive" if pred.argmax() == 1 else "Negative"
        print(f"Text: '{text}'")
        print(f"Sentiment: {sentiment} (Confidence: {pred.max():.4f})")
        print()

except Exception as e:
    print(f"Error loading or using model: {str(e)}")
    print("If you're having issues with the model, try installing additional dependencies:")
    print("!pip install 'transformers[torch]' datasets")

# PyTorch on macOS: Troubleshooting and Optimization

## Common Issues and Solutions

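1. **Installation Issues**
   - Use an environment manager such as Conda or uv to keep PyTorch isolated from the system Python
   - Install PyTorch with: `pip install torch torchvision torchaudio`
   - On Apple Silicon, make sure your Python interpreter is a native arm64 build, not an x86_64 build running under Rosetta

2. **MPS-specific Issues**
   - MPS requires macOS 12.3 or later
   - Some operations aren't implemented for MPS yet; setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` before importing `torch` runs them on the CPU instead (more slowly)
   - If you see "MPS backend out of memory" errors, reduce the batch size

3. **Performance Optimization**
   - Use mixed precision with `torch.autocast(device_type="mps", dtype=torch.float16)`, the MPS counterpart of `torch.cuda.amp` (requires a recent PyTorch release)
   - Adjust batch sizes based on available memory
   - Avoid unnecessary data transfers between the CPU and the MPS device (e.g. calling `.item()` or `.cpu()` inside tight loops)
   - Try `torch.compile()` (PyTorch 2.0+), but benchmark it: compiler support for MPS is less mature than for CUDA

4. **Memory Management**
   - Monitor memory with Activity Monitor or `torch.mps.current_allocated_memory()`
   - Call `torch.mps.empty_cache()`, the MPS counterpart of `torch.cuda.empty_cache()`, to release cached memory
   - Use lower precision when possible (FP16 instead of FP32)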
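
The fallback and memory tips above fit in a few lines. Below is a minimal sketch. `PYTORCH_ENABLE_MPS_FALLBACK` is the environment variable PyTorch reads for the CPU fallback. It should be set before `torch` is first imported, so in a notebook you need to restart the kernel after changing it. `release_accelerator_memory` is a hypothetical helper name. `torch.mps.empty_cache()` only releases cached memory that no live tensor still references, so drop references first.

```python
import gc
import os

# Should be set before torch is first imported; in a notebook, restart the kernel after changing it
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch  # noqa: E402  (imported after setting the environment variable on purpose)


def release_accelerator_memory() -> None:
    """Collect garbage, then release unoccupied cached allocator memory (hypothetical helper)."""
    gc.collect()  # frees tensors kept alive only by unreachable Python objects
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
    elif torch.cuda.is_available():
        torch.cuda.empty_cache()


device = "mps" if torch.backends.mps.is_available() else "cpu"
activations = torch.randn(1024, 1024, device=device)
del activations  # drop the last reference so the allocator can actually release the memory
release_accelerator_memory()
print(f"PYTORCH_ENABLE_MPS_FALLBACK={os.environ['PYTORCH_ENABLE_MPS_FALLBACK']}")
```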

## Installing the Right PyTorch Version

```bash
# For pip
pip install torch torchvision torchaudio

# For Conda (the official `pytorch` channel no longer receives new releases; use conda-forge)
conda install -c conda-forge pytorch torchvision torchaudio
```
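
After installing, a quick check can confirm that you are running a native arm64 interpreter and that the installed wheel was built with MPS support. Below is a minimal sketch using `platform.machine()`, `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()`. `check_install` is a hypothetical helper name.

```python
import platform
import sys

import torch


def check_install() -> dict:
    """Collect basic facts about the Python/PyTorch setup (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # 'arm64' for a native Apple Silicon interpreter
        "torch": torch.__version__,
        "mps_built": torch.backends.mps.is_built(),  # wheel compiled with MPS support
        "mps_available": torch.backends.mps.is_available(),  # usable on this machine and OS
    }


info = check_install()
for key, value in info.items():
    print(f"{key:>14}: {value}")

if platform.system() == "Darwin" and info["machine"] == "x86_64":
    print("On an Apple Silicon Mac, x86_64 means Python is running under Rosetta; "
          "install an arm64 Python to use MPS.")
```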

## PyTorch MPS References

- [PyTorch MPS Documentation](https://pytorch.org/docs/stable/notes/mps.html)
- [Apple Developer MPS Documentation](https://developer.apple.com/metal/pytorch/)
- [PyTorch Forums - MPS Discussions](https://discuss.pytorch.org/)