<div class="markdown-google-sans">
  <h2>Machine Learning Hardware Course</h2>
</div>

<div class="markdown-google-sans">
  <h2>Lab 4b: HW benchmarking</h2>
</div>


Run the code below twice: once on a CPU and once on a GPU. Before switching hardware and re-running the code, record your results in this notebook so that you do not lose them.
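One way to keep both runs together is to merge each measurement into a small JSON file. This is only a sketch: `save_latency` and the file name in the usage note are hypothetical and not part of the lab. It also assumes the file lives somewhere that survives a hardware change, since hosted notebooks may discard local files when the runtime type is switched.

```python
import json
from pathlib import Path


def save_latency(path, device, model, latency_ms):
    """Merge one measurement into a JSON file and return the full record."""
    path = Path(path)
    # Reuse earlier measurements if the file already exists.
    results = json.loads(path.read_text()) if path.exists() else {}
    # Group results by device name, then by model name.
    results.setdefault(device, {})[model] = latency_ms
    path.write_text(json.dumps(results, indent=2))
    return results
```

For example, `save_latency("latency_results.json", dev_name, "AlexNet", alexnet_inference_latency)` could be called after each profiling cell.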

In [1]:
from transformers import ViTImageProcessor, ViTForImageClassification  # ViTImageProcessor replaces the deprecated ViTFeatureExtractor
import torch
import torchvision.transforms as T
from PIL import Image
import requests
from torchvision import datasets, models
from tqdm import tqdm
import time

# Load an image from COCO dataset
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Define ImageNet-style preprocessing (used for both AlexNet and ResNet152)
transform = T.Compose([
    T.Resize((224, 224)),  # Resize to the 224x224 input size both models expect
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalization for ImageNet
])

# Apply preprocessing
image = transform(image).unsqueeze(0)  # Add batch dimension
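
To make the `T.Normalize` step concrete: for each channel it computes (x - mean) / std on values that `T.ToTensor()` has already scaled to [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single pixel; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
# Per-channel ImageNet statistics, the same values passed to T.Normalize above.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply per-channel (x - mean) / std to one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


# A pixel equal to the ImageNet mean maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))  # [0.0, 0.0, 0.0]
```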

In [2]:
# Select the device: prefer MPS, then CUDA, and fall back to CPU
device, dev_name = (torch.device("mps"), "mps") if torch.backends.mps.is_available() else \
         (torch.device("cuda"), "cuda") if torch.cuda.is_available() else (torch.device("cpu"), "cpu")

In [3]:
def profile_workload(model, device, dev_name, image, iterations=100):
    # type(model).__name__ is always defined, so no exception handling is needed
    model_name = type(model).__name__
    print(f"profiling {model_name} on {dev_name}...")
    model.to(device)
    model.eval()  # Inference mode: disables dropout and uses BatchNorm running statistics
    image = image.to(device)

    # Run inference
    if dev_name=="cpu":
        start_time = time.time()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              output = model(image)
        elapsed_time = time.time()-start_time
        latency = elapsed_time/iterations*1000
    elif dev_name=="cuda":
        torch.cuda.synchronize()  # Ensure any pending tasks are done
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              output = model(image)
        end.record()
        torch.cuda.synchronize()  # Wait for all kernels to finish
        latency = start.elapsed_time(end)/iterations
    elif dev_name=="mps":
        torch.mps.synchronize()  # Ensure all pending tasks are complete before starting
        start_time = time.time()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              output = model(image)
        torch.mps.synchronize()  # Wait for all queued work to finish before stopping the timer
        elapsed_time = time.time()-start_time
        latency = elapsed_time/iterations*1000
    # Get predicted class
    predicted_class = output.argmax(dim=1).item()
    # print(f"Predicted Class: {predicted_class}")
    return latency
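
The loops above average over every iteration, including the first ones, which often carry one-time costs such as memory allocation. A common refinement is to discard a few warm-up runs and report a median alongside the mean. The sketch below does this for any callable using only the standard library. `time_callable`, `warmup`, and `sync` are illustrative names, not part of the lab. On a GPU, `sync` would need to be a function such as `torch.cuda.synchronize` so that asynchronous work is included in each sample.

```python
import statistics
import time


def time_callable(fn, iterations=100, warmup=10, sync=None):
    """Return (mean_ms, median_ms) for fn(), ignoring the warm-up calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        if sync is not None:
            sync()  # Make sure earlier work is not billed to this sample
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # Wait for asynchronous work started by fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples), statistics.median(samples)
```

With the objects defined in this notebook, it could be called as `time_callable(lambda: model(image))` inside a `torch.no_grad()` block.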

In [4]:
def profile_workload_on_ViT(device, dev_name, iterations=100):
    # Load an image from COCO dataset
    url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
    image = Image.open(requests.get(url, stream=True).raw)

    # Use ViTImageProcessor instead of the deprecated ViTFeatureExtractor
    processor = ViTImageProcessor.from_pretrained('google/vit-large-patch16-224')
    ViT_large = ViTForImageClassification.from_pretrained('google/vit-large-patch16-224')
    # Apply the image processor directly to the raw image
    inputs = processor(images=image, return_tensors="pt")

    ViT_large.to(device)
    inputs = inputs.to(device)

    # Run inference
    if dev_name=="cpu":
        start_time = time.time()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              outputs = ViT_large(**inputs)
        elapsed_time = time.time()-start_time
        latency = elapsed_time/iterations*1000
    elif dev_name=="cuda":
        torch.cuda.synchronize()  # Ensure any pending tasks are done
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              outputs = ViT_large(**inputs)
        end.record()
        torch.cuda.synchronize()  # Wait for all kernels to finish
        latency = start.elapsed_time(end)/iterations
    elif dev_name=="mps":
        torch.mps.synchronize()  # Ensure all pending tasks are complete before starting
        start_time = time.time()
        for _ in tqdm(range(iterations), desc ='profiling latency is in progress...'):
            with torch.no_grad():
              outputs = ViT_large(**inputs)
        torch.mps.synchronize()  # Wait for all queued work to finish before stopping the timer
        elapsed_time = time.time()-start_time
        latency = elapsed_time/iterations*1000
    # Return the average latency per inference in milliseconds
    return latency

In [5]:
alexnet = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)
alexnet_inference_latency = profile_workload(alexnet, device, dev_name, image, iterations=100)
print(f"\n\nAlexNet inference latency: {alexnet_inference_latency:.2f} ms")

Downloading: "https://github.com/pytorch/vision/zipball/v0.10.0" to /root/.cache/torch/hub/v0.10.0.zip
Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100%|██████████| 233M/233M [00:01<00:00, 135MB/s]


profiling AlexNet on cpu...


profiling latency is in progress...: 100%|██████████| 100/100 [00:08<00:00, 11.79it/s]



AlexNet inference latency: 84.87 ms





In [6]:
resnet152 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)
resnet152_inference_latency = profile_workload(resnet152, device, dev_name, image, iterations=100)
print(f"\n\nResNet152 inference latency: {resnet152_inference_latency:.2f} ms")

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0
Downloading: "https://download.pytorch.org/models/resnet152-394f9c45.pth" to /root/.cache/torch/hub/checkpoints/resnet152-394f9c45.pth
100%|██████████| 230M/230M [00:02<00:00, 112MB/s]


profiling ResNet on cpu...


profiling latency is in progress...: 100%|██████████| 100/100 [01:08<00:00,  1.47it/s]



ResNet152 inference latency: 680.85 ms





In [7]:
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image
import requests

ViTLarge_inference_latency = profile_workload_on_ViT(device, dev_name, iterations=100)
print(f"\n\nViTLarge inference latency: {ViTLarge_inference_latency:.2f} ms")

profiling latency is in progress...:   0%|          | 0/100 [00:00<?, ?it/s]
profiling latency is in progress...:   1%|          | 1/100 [00:05<08:42,  5.28s/it]
profiling latency is in progress...:   2%|▏         | 2/100 [00:07<06:03,  3.71s/it]
profiling latency is in progress...:   3%|▎         | 3/100 [00:13<07:11,  4.45s/it]
profiling latency is in progress...:   4%|▍         | 4/100 [00:16<06:05,  3.81s/it]
profiling latency is in progress...:   5%|▌         | 5/100 [00:18<05:14,  3.31s/it]
profiling latency is in progress...:   6%|▌         | 6/100 [00:20<04:41,  3.00s/it]
profiling latency is in progress...:   7%|▋         | 7/100 [00:24<04:43,  3.04s/it]
profiling latency is in progress...:   8%|▊         | 8/100 [00:26<04:29,  2.93s/it]
profiling latency is in progress...:   9%|▉         | 9/100 [00:29<04:11,  2.76s/it]
profiling latency is in progress...:  10%|█         | 10/100 [00:31<03:57,  2.64s/it]



ViTLarge inference latency: 2796.66 ms





## Record the profiled latency for each configuration:
- CPU AlexNet latency: ## ms
- CPU ResNet152 latency: ## ms
- CPU ViT-Large latency: ## ms
- GPU AlexNet latency: ## ms
- GPU ResNet152 latency: ## ms
- GPU ViT-Large latency: ## ms
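
Two derived numbers can make the comparison easier: throughput (1000 / latency in ms, which gives images per second at batch size 1) and speedup (the ratio of two latencies). The sketch below is illustrative only. The helper names are not part of the lab, and the only measured values it uses are the CPU latencies from the sample output above.

```python
def throughput(latency_ms):
    """Images per second at batch size 1 for a latency given in ms."""
    return 1000.0 / latency_ms


def speedup(baseline_ms, other_ms):
    """How many times faster the `other` run is than the `baseline` run."""
    return baseline_ms / other_ms


# CPU latencies taken from the sample output in this notebook.
cpu_ms = {"AlexNet": 84.87, "ResNet152": 680.85, "ViT-Large": 2796.66}
for name, ms in cpu_ms.items():
    print(f"{name}: {throughput(ms):.2f} images/s")
```

For the AlexNet CPU run this gives about 11.8 images/s, in line with the rate reported by the progress bar. Once the GPU latencies are recorded, `speedup(cpu_latency, gpu_latency)` gives the GPU's advantage for each model.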

Compare your results for the different DNN models you profiled on the different hardware:

Why did you get different latencies for each DNN model?

Why did you get different latencies for different hardware?