# ML Advisor: Detecting Inefficiencies in Machine Learning Code

This notebook demonstrates how to use the ML Advisor tool to detect and fix common inefficiencies in machine learning code.

In [None]:
# Load the ML Advisor extension
%load_ext neural_scope.advanced_analysis.ml_advisor.jupyter_extension

## Demonstration of Common ML Inefficiencies

Let's look at some common inefficiencies in ML code and how the advisor can detect and fix them.

In [None]:
%%ml_advisor

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from torch.utils.data import Dataset, DataLoader

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)
        
    def forward(self, x):
        return self.fc(x)

# Create synthetic data
data = torch.randn(100, 10)
targets = torch.randn(100, 1)

# Non-vectorized training loop with inefficiencies
def train_inefficient(model, data, targets, epochs=5):
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    for epoch in range(epochs):
        # Rebuild the dataset and DataLoader every epoch (inefficient); train_loader is never even iterated
        dataset = torch.utils.data.TensorDataset(data, targets)
        train_loader = DataLoader(dataset, batch_size=32)
        
        # Process individual samples instead of batches (inefficient)
        for i in range(len(data)):
            # Non-vectorized operation
            output = model(data[i:i+1])
            loss = criterion(output, targets[i:i+1])
            
            # Call backward() but never optimizer.step(): the weights are never updated
            loss.backward()
            
            # Unnecessary per-sample .item() call and print (one host copy and one output line per sample)
            print(f"Epoch {epoch}, Sample {i}, Loss: {loss.item()}")
            
# Create model and train
model = SimpleModel()
train_inefficient(model, data, targets)
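
The missing `optimizer.step()` is the one issue in the cell above that affects correctness rather than speed. The following is a minimal sketch, separate from the notebook's own cells and independent of the advisor, that checks the effect on a stand-alone `nn.Linear` layer. The variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x = torch.randn(8, 10)
y = torch.randn(8, 1)

weights_before = model.weight.detach().clone()

# First backward pass: gradients are populated, weights are untouched.
criterion(model(x), y).backward()
grad_after_one = model.weight.grad.detach().clone()

# Second backward pass with no step() and no zero_grad():
# the new gradient is added on top of the old one.
criterion(model(x), y).backward()
grad_after_two = model.weight.grad.detach().clone()

weights_unchanged = torch.equal(model.weight.detach(), weights_before)
grads_doubled = torch.allclose(grad_after_two, 2 * grad_after_one)
print(weights_unchanged, grads_doubled)
```

Both checks should hold: without `optimizer.step()` the parameters stay at their initial values, and without `optimizer.zero_grad()` the gradients keep summing across backward passes.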

## Efficient Implementation

Here is the same training loop rewritten to build the DataLoader once, train on mini-batches, complete every optimization step, and report the loss once per epoch:

In [None]:
%%ml_advisor

# Efficient implementation
def train_efficient(model, data, targets, epochs=5):
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    # Create DataLoader outside the loop
    dataset = torch.utils.data.TensorDataset(data, targets)
    train_loader = DataLoader(dataset, batch_size=32)
    
    for epoch in range(epochs):
        epoch_loss = 0.0
        
        # Process batches of data
        for batch_data, batch_targets in train_loader:
            # Vectorized operation on batches
            outputs = model(batch_data)
            loss = criterion(outputs, batch_targets)
            
            # Proper optimization steps
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Accumulate a detached copy so the running total does not keep autograd history
            epoch_loss += loss.detach()
            
        # Convert to a Python float and print only once per epoch
        print(f"Epoch {epoch}, Loss: {float(epoch_loss) / len(train_loader):.4f}")
        
# Create model and train efficiently
model = SimpleModel()
train_efficient(model, data, targets)
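
One detail worth checking when a loss is summed across batches is whether the running total keeps autograd history. The following is a minimal sketch, assuming a stand-alone `nn.Linear` layer and illustrative variable names, that compares adding the raw loss tensor with adding `loss.detach()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x = torch.randn(32, 10)
y = torch.randn(32, 1)

running_attached = 0.0
running_detached = 0.0
for _ in range(3):
    loss = criterion(model(x), y)
    running_attached += loss           # total still references the autograd graph
    running_detached += loss.detach()  # plain value with no autograd history

attached_requires_grad = running_attached.requires_grad
detached_requires_grad = running_detached.requires_grad
same_value = torch.allclose(running_attached.detach(), running_detached)
print(attached_requires_grad, detached_requires_grad, same_value)
```

The two totals hold the same number, but only the detached one is free of graph references, which is why a detached total (or a single `.item()` call per epoch) is the usual choice for logging.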

## Conclusion

The ML Advisor extension can help identify common inefficiencies in machine learning code, including:

1. Non-vectorized operations in training loops (one sample at a time instead of a batch)
2. Recreation of DataLoaders inside training loops
3. Incomplete optimization steps (a missing `optimizer.step()` after `backward()`)
4. Unnecessary `.item()` calls

Fixing items 1, 2, and 4 improves performance. Item 3 is a correctness issue: a model whose optimizer never steps does not learn, no matter how fast the loop runs.
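
As a quick sanity check on items 1 and 2, the sketch below confirms that a single `DataLoader` can be iterated once per epoch without being rebuilt, and that one batched forward pass gives the same outputs as a loop over individual samples. It runs outside the advisor, and the names used are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
model = nn.Linear(10, 1)
data = torch.randn(100, 10)
targets = torch.randn(100, 1)

# Built once, then iterated again on every epoch.
loader = DataLoader(TensorDataset(data, targets), batch_size=32)
batches_per_epoch = [sum(1 for _ in loader) for _ in range(3)]

with torch.no_grad():
    # One forward call per sample versus a single call on the whole tensor.
    per_sample = torch.cat([model(data[i:i + 1]) for i in range(len(data))])
    batched = model(data)

outputs_match = torch.allclose(per_sample, batched, atol=1e-6)
print(batches_per_epoch, outputs_match)
```

With 100 samples and a batch size of 32, each pass over the loader yields four batches. The forward outputs agree, although per-sample and mini-batch training still follow different update schedules once `optimizer.step()` is involved.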