# Debugging

In [14]:
import torch

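# With anomaly detection on, a backward function that produces NaN raises an error,
# and the traceback points to the forward operation that created it.
torch.autograd.set_detect_anomaly(True)  # Enable once before training; it slows execution, so use it only while debugging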

<torch.autograd.anomaly_mode.set_detect_anomaly at 0x7f637ac74cd0>
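To see what anomaly detection reports, here is a minimal sketch that deliberately produces a NaN in the backward pass. The input value and the multiply-by-zero trick are assumptions chosen only to trigger the error; they are not part of the original notebook. It uses the `torch.autograd.detect_anomaly()` context manager so the setting stays local to the block.

```python
import torch

# Hypothetical input: sqrt has an infinite derivative at 0, and the upstream
# gradient of 0 turns that into 0 / 0 = NaN during the backward pass.
x = torch.tensor([0.0], requires_grad=True)

caught = None
with torch.autograd.detect_anomaly():
    y = (torch.sqrt(x) * 0.0).sum()
    try:
        y.backward()
    except RuntimeError as err:
        # Anomaly mode raises as soon as a backward function returns NaN.
        caught = err

print(type(caught).__name__)
print(caught)
```

The error message names the backward function that returned NaN, and the accompanying warning shows the traceback of the forward call that created it.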

In [None]:
# Debugging Gradients with Hooks
# Hooks let you inspect the gradients of each layer. You can attach a hook to a tensor like this:
import torch

def print_grad(grad):
    print(grad)

x = torch.tensor(1.0, requires_grad=True)
y = x ** 2
y.register_hook(print_grad)  # Fires when the gradient with respect to y is computed
y.backward()  # y is the output itself, so the hook prints dy/dy = 1
# To inspect a model's layer during training, attach the hook to its weights:
# model.layer_name.weight.register_hook(print_grad)

tensor(1.)
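Printing full gradient tensors gets noisy for real models. One possible variation, sketched below, records only the gradient norm of every parameter. The two-layer `nn.Sequential` model, the `make_hook` helper, and the `grad_norms` dictionary are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model used only for illustration.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

grad_norms = {}

def make_hook(name):
    # The hook returns None, which leaves the gradient unchanged.
    def hook(grad):
        grad_norms[name] = grad.norm().item()
    return hook

# register_hook returns a handle that can remove the hook later.
handles = [p.register_hook(make_hook(n)) for n, p in model.named_parameters()]

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

for name, norm in grad_norms.items():
    print(f"{name}: {norm:.4f}")

# Remove the hooks once you are done inspecting.
for h in handles:
    h.remove()
```

Norms that are very large, or exactly zero for many steps, are usually the first sign of exploding or vanishing gradients.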


In [86]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import clip_grad_norm_
# Define a simple model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 2)

    def forward(self, x):
        return self.fc1(x)

# Function to print gradients
def print_grad(grad):
    print("Gradient:", grad)

# Initialize model, optimizer, and loss
model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

# Attach hook to the first layer's weights
model.fc1.weight.register_hook(print_grad)

# Sample input and target
x = torch.tensor([[1.0, 2.0]], requires_grad=True)
y_true = torch.tensor([[0.0, 1.0]])

# Forward pass
y_pred = model(x)

# Compute loss
loss = criterion(y_pred, y_true)

# Backward pass
optimizer.zero_grad()
loss.backward()  # The hook prints the raw (unclipped) gradient here

# Clip after backward() and before step(), once the gradients exist
clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

Gradient: tensor([[-0.6871, -1.3741],
        [-0.5729, -1.1458]])
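To check that clipping actually takes effect, the sketch below compares the total gradient norm before and after `clip_grad_norm_`, which returns the norm measured before clipping. The `100.0` loss scaling is an assumption added only to make the gradients large enough to be clipped.

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)
model = nn.Linear(2, 2)
x = torch.tensor([[1.0, 2.0]])
y_true = torch.tensor([[0.0, 1.0]])

# Scale the loss up (illustrative only) so the gradient norm is large.
loss = 100.0 * nn.functional.mse_loss(model(x), y_true)
loss.backward()

max_norm = 1.0
# Returns the total norm of all gradients as measured before clipping.
norm_before = clip_grad_norm_(model.parameters(), max_norm=max_norm)
# Recompute the total norm from the (now clipped) gradients.
norm_after = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))

print(f"before: {norm_before.item():.4f}, after: {norm_after.item():.4f}")
```

Logging the returned norm during training is a cheap way to see how often clipping is active.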


In [85]:
import torch
import torch.profiler

# Request CUDA activity only when a GPU is available
activities = [torch.profiler.ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(torch.profiler.ProfilerActivity.CUDA)

with torch.profiler.profile(activities=activities, record_shapes=True) as prof:
    for i in range(10):
        model(x)  # Run some model inference

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))

----------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::addmm        38.97%     339.741us        60.59%     528.161us      52.816us            10  
               aten::t        16.24%     141.595us        31.12%     271.310us      27.131us            10  
           aten::copy_        10.63%      92.683us        10.63%      92.683us       9.268us            10  
       aten::transpose         9.87%      86.043us        14.88%     129.715us      12.971us            10  
          aten::linear         8.29%      72.246us       100.00%     871.717us      87.172us            10  
          aten::expand         8.17%      71.231us        10.27%      89.546us       8.955us            10  
      aten::as_stri
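The operator names in the table can be hard to map back to your own code. A possible approach, sketched here, is to wrap a region in `torch.profiler.record_function` so it appears as its own row. The label `"forward_pass"` and the stand-in `nn.Linear` model are assumptions made so the block runs on its own.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Stand-in model and input so the example is self-contained.
model = nn.Linear(2, 2)
x = torch.tensor([[1.0, 2.0]])

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(10):
            # Hypothetical label; it shows up as a row in the table.
            with record_function("forward_pass"):
                model(x)

table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```

Sorting by `cpu_time_total` puts the labelled region near the top, since its total includes every operator it wraps.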

# Quiz. Check Later for clarifications

Questions: 
* 1. What is an encoder in an autoencoder?
* 2. What loss function does an autoencoder optimize for?
* 3. How do autoencoders help in grouping similar images?
* 4. When is a convolutional autoencoder useful?
* 5. Why do we get non-intuitive images if we randomly sample from vector space of embeddings obtained from vanilla/convolutional autoencoders?
* 6. What are the loss functions that VAEs optimize for?
* 7. How do VAEs overcome the limitation of vanilla/convolutional autoencoders to generate new images?
* 8. During an adversarial attack, why do we modify the input image pixels and not the weight values?
* 9. In a neural style transfer, what are the losses that we optimize for?
* 10. Why do we consider the activation of different layers and not the original image when calculating style and content loss?
* 11. Why do we consider gram matrix loss and not the difference between images when calculating style loss?
* 12. Why do we warp images while building a model to generate deep fakes?

Answers
* 1. The encoder takes an image as input and compresses it into a lower-dimensional representation, which it passes to the decoder; the decoder then tries to reconstruct the image. The space between the encoder and the decoder is called the latent space.
* 2. An autoencoder optimizes a reconstruction loss between the input and the reconstructed output. MSELoss is a common choice because pixel values are continuous.
* 3. Autoencoders help group similar images by transforming them into lower-dimensional embeddings that keep the most relevant information, so similar images end up with similar embeddings.
* 4. A convolutional autoencoder is useful when the model needs to extract useful features from an image so that we can perform image manipulation.
* 5. 
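* 6. VAEs optimize two losses together: a reconstruction loss (such as MSELoss) and a KL divergence loss that keeps the distribution of the embeddings close to a standard normal distribution.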
* 7. 
* 8. We modify the input image pixels because the attack targets the input: we cannot change the weights because we do not control the target model. Our only option is to modify the pixels, keeping the changes small enough that they are not obvious to the naked eye.
* 9. In neural style transfer, the losses that we optimize are content loss and style loss. The style loss is computed from the Gram matrices of the layer activations.
* 10. We consider the Gram matrix because it is the product of a feature matrix and its own transpose, and it makes style loss easy to measure. Without it, it would be difficult to come up with another way to measure style.
* 11. 
* 12. 
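Two of the losses mentioned in the answers can be sketched in a few lines. This is an illustrative version under stated assumptions, not the reference implementation from any course or book: the function names, the `reduction="sum"` choice, and the Gram matrix normalization by `c * h * w` are all assumptions.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, target, mu, log_var):
    # Reconstruction term plus KL divergence between N(mu, sigma^2) and N(0, 1).
    recon_loss = F.mse_loss(recon, target, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + kl

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from one layer.
    b, c, h, w = features.shape
    flat = features.reshape(b, c, h * w)
    # Channel-by-channel correlations; the normalization is one common convention.
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(generated_feats, style_feats):
    # Compare feature correlations rather than the raw activations.
    return F.mse_loss(gram_matrix(generated_feats), gram_matrix(style_feats))

# Usage with random tensors standing in for real images and activations.
img = torch.rand(4, 784)
mu, log_var = torch.zeros(4, 16), torch.zeros(4, 16)
print(vae_loss(img, img, mu, log_var).item())

feats = torch.randn(2, 8, 5, 5)
print(gram_matrix(feats).shape)
print(style_loss(feats, feats).item())
```

Because the Gram matrix sums over spatial positions, it discards where a feature occurs and keeps only which features occur together, which is why it is used for style rather than content.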