
# Debugging and Troubleshooting


### **Introduction**

* Even with well-structured code, issues can arise during the development of deep learning models.
* Debugging and troubleshooting are essential skills for identifying and resolving errors, improving model stability, and optimizing performance.
* In this section, we will explore common errors and warnings in PyTorch, effective debugging techniques, strategies for handling numerical stability issues like NaNs and infinities, and tools for profiling and optimizing PyTorch code.
* By mastering these skills, you can ensure your models run smoothly and efficiently in both development and production environments.


## **Understanding Common PyTorch Errors and Warnings**

### **Common Errors**

* **Shape Mismatch Errors:** These occur when operations are performed on tensors with incompatible shapes. Common examples include adding tensors whose shapes cannot be broadcast together, or defining a layer whose expected input size does not match the tensor it receives.
* **Example:** `RuntimeError: The size of tensor a (10) must match the size of tensor b (12) at non-singleton dimension 0`
* **Solution:** Check the shapes of the tensors involved using `tensor.shape` and ensure they are compatible for the intended operation.


* **Type Errors:** PyTorch operations are sensitive to tensor data types. Elementwise arithmetic promotes mixed types automatically, but layers, matrix multiplications, and loss functions expect specific dtypes, so passing an int64 tensor where float32 is expected raises an error.
* **Example:** `RuntimeError: Expected object of scalar type Float but got scalar type Long for argument #2 'weight'`
* **Solution:** Inspect each tensor's type with `tensor.dtype` and convert where needed using `tensor.float()`, `tensor.long()`, or `tensor.to(other.dtype)`.


* **CUDA Errors:** These errors are related to GPU usage and occur when operations are performed on tensors that are not on the same device or when there is insufficient GPU memory.
* **Example:** `RuntimeError: CUDA out of memory. Tried to allocate 1.00 GiB (GPU 0; 11.17 GiB total capacity; 8.51 GiB already allocated)`
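* **Solution:** For out-of-memory errors, reduce the batch size, delete tensors you no longer need with `del`, wrap inference in `torch.no_grad()` so activations are not kept for backpropagation, or move some tensors to the CPU using `tensor.cpu()`. For device mismatches, move the model and every input to the same device with `.to(device)`.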
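
As a rough illustration of the checks described above, the sketch below reproduces a shape mismatch and a dtype mismatch on small made-up tensors, prints the diagnostic information, and then applies one possible fix for each. The tensor names, sizes, and the `Linear(4, 2)` layer are assumptions chosen for the example; the right fix in real code depends on what the operation is meant to compute.

```python
import torch

shape_error = None
dtype_error = None

# --- Shape mismatch: sizes 10 and 12 cannot be broadcast together ---
a = torch.ones(10)
b = torch.ones(12)
try:
    a + b
except RuntimeError as err:
    shape_error = str(err)
    print(f"a.shape={tuple(a.shape)}, b.shape={tuple(b.shape)} -> {shape_error}")

# One possible fix (depends on intent): slice b down to the size of a
fixed_sum = a + b[:10]

# --- Dtype mismatch: a float32 layer fed an int64 input ---
layer = torch.nn.Linear(4, 2)            # weights are float32 by default
x_long = torch.arange(4).reshape(1, 4)   # integer arange yields int64
try:
    layer(x_long)
except RuntimeError as err:
    dtype_error = str(err)
    print(f"x.dtype={x_long.dtype}, weight.dtype={layer.weight.dtype} -> {dtype_error}")

# Fix: convert the input to the layer's dtype before the call
out = layer(x_long.to(layer.weight.dtype))

# --- Device mismatch: keep the model and its inputs on one device ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = layer.to(device)
out_on_device = layer(x_long.float().to(device))
```

The exact wording of the error messages varies between PyTorch versions, so it is usually more reliable to print `shape`, `dtype`, and `device` than to match on the message text.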



### **Common Warnings**

* **UserWarnings:** PyTorch often issues warnings when it detects potentially problematic operations, such as using deprecated features or inefficient methods.
* **Example:** `UserWarning: Using a target size (torch.Size([10, 1])) that is different to the input size (torch.Size([10])) is deprecated.`
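* **Solution:** Read the warning text, which usually names the fix. In this case, reshape the target to match the input (for example with `target.view_as(input)`), because mismatched shapes are broadcast against each other and can silently produce a wrong loss.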


* **DeprecationWarnings:** These warnings inform you that a particular feature or function will be removed in a future version of PyTorch.
* **Solution:** Replace deprecated features with their modern equivalents as suggested in the warning message.
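
To make the target-size warning above concrete, here is a minimal sketch that triggers it with `F.mse_loss`, records it using Python's standard `warnings` module, and then fixes the shapes. The tensors are invented for the example, and promoting warnings to errors is a general Python debugging trick rather than anything PyTorch-specific.

```python
import warnings
import torch
import torch.nn.functional as F

prediction = torch.arange(10.0)             # shape [10]
target = torch.arange(10.0).unsqueeze(1)    # shape [10, 1], same values

# Record warnings instead of printing them so they can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # [10] against [10, 1] broadcasts to [10, 10], so the loss is not zero
    mismatched_loss = F.mse_loss(prediction, target)

for w in caught:
    print(f"{w.category.__name__}: {w.message}")

# Fix: give the target the same shape as the prediction
matched_loss = F.mse_loss(prediction, target.view_as(prediction))
print(f"mismatched: {mismatched_loss.item():.2f}, matched: {matched_loss.item():.2f}")

# While debugging, warnings can be promoted to exceptions to get a traceback
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        F.mse_loss(prediction, target)
    except UserWarning:
        raised = True
```

Although prediction and target hold identical values, the mismatched call reports a non-zero loss, which is exactly the kind of silent error the warning is meant to flag.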




## **Debugging Techniques: Printing Tensors, Using the Python Debugger (pdb)**

### **Printing Tensors**

* **Overview:** One of the simplest yet most effective debugging techniques is printing the values and shapes of tensors at various points in your code. This helps verify that operations are producing the expected results.
* **Example:**
```python
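import torch

tensor = torch.randn(2, 3)  # example tensor; substitute your own
print(f"Tensor shape: {tensor.shape}")
print(f"Tensor dtype: {tensor.dtype}, device: {tensor.device}")
print(f"Tensor values: {tensor}")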
```
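
When the tensors of interest live inside a model, one way to print shapes at various points without editing `forward` is a forward hook. The sketch below is only an illustration: the three-layer `Sequential` model and the `make_hook` helper are hypothetical names introduced here, while `register_forward_hook` is the standard `nn.Module` method.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))  # toy model

shapes = []  # (layer name, output shape) pairs collected during the forward pass

def make_hook(name):
    # Hypothetical helper: builds a hook that remembers the layer's name
    def hook(module, inputs, output):
        shapes.append((name, tuple(output.shape)))
        print(f"{name:<4} {module.__class__.__name__:<8} -> {tuple(output.shape)}")
    return hook

# Register a hook on each direct child and keep the handles for later removal
handles = [child.register_forward_hook(make_hook(name))
           for name, child in model.named_children()]

model(torch.randn(32, 8))

for handle in handles:
    handle.remove()  # detach the hooks once debugging is done
```

Removing the hooks afterwards matters because they otherwise keep firing on every forward pass and slow down training.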




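* **Inspecting Gradients:** You can also print the gradients of tensors after the backward pass to ensure they are being computed correctly. `.grad` is populated only for leaf tensors created with `requires_grad=True` (such as model parameters) and is `None` until `backward()` has run.
* **Example:** `print(f"Gradients: {tensor.grad}")`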
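
A small worked example may help here. The sketch below inspects gradients on a hand-made tensor and then on the parameters of a toy `Linear` layer; all values and names are assumptions made for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf tensor
h = x ** 2            # intermediate (non-leaf) tensor
h.retain_grad()       # ask autograd to keep h.grad as well
loss = h.sum()

print(x.grad)         # None: backward has not run yet
loss.backward()
print(x.grad)         # d(loss)/dx = 2 * x
print(h.grad)         # d(loss)/dh = 1 for every element

# For a model, loop over the named parameters and report gradient norms
model = torch.nn.Linear(3, 1)
model(x.detach()).sum().backward()
for name, param in model.named_parameters():
    print(f"{name}: grad norm = {param.grad.norm().item():.4f}")
```

A gradient norm that is exactly zero, or a `.grad` that is still `None` after `backward()`, usually means the parameter is not connected to the loss.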



### **Using the Python Debugger (pdb)**

* **Overview:** PyTorch code in eager mode runs as ordinary Python, so it can be debugged with Python's built-in `pdb` debugger, which lets you set breakpoints, step through code, and inspect variables.
* **Setting a Breakpoint:** Insert `import pdb; pdb.set_trace()` (or the built-in `breakpoint()` on Python 3.7+) at the point where you want to start debugging. Execution pauses there, allowing you to inspect the environment.
* **Example:**
```python
import pdb

def forward_pass(x):
    pdb.set_trace()  # execution pauses here; inspect x with `p x`
    y = x + 2
    return y
```




* **Common Commands:**
* **n (next):** Execute the next line of code.
* **c (continue):** Continue execution until the next breakpoint.
* **q (quit):** Exit the debugger.
* **p variable_name:** Print the value of a variable.



## **Handling Numerical Stability Issues: NaNs, Infinities**

### **Common Causes of NaNs and Infinities**

* **Exploding Gradients:** Gradients that grow exponentially during backpropagation can lead to NaNs or infinities in the model's parameters.
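* **Solution:** Use gradient clipping to limit the size of the gradients. Call it after `loss.backward()` and before `optimizer.step()` so the clipped gradients are the ones applied.
`torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`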


* **Division by Zero:** Certain operations, like division or logarithms, can produce NaNs or infinities when the input is zero or negative.
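* **Solution:** Add a small epsilon value to the denominator to prevent division by zero (this assumes the denominator is non-negative), and clamp the input of a logarithm so it stays strictly positive, for example `torch.log(x.clamp(min=1e-8))`.
`result = x / (y + 1e-8)`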


* **Overflow in Exponentials:** Exponential functions can quickly grow to very large values, leading to overflow.
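* **Solution:** Use the `torch.clamp()` function to limit the range of inputs before exponentiating. The bounds below are illustrative; choose them from the range your model actually needs.
`x = torch.clamp(x, min=-10, max=10)`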
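
The three fixes above can be sketched together as follows. This is a minimal illustration on invented tensors, not a recipe: the epsilon of `1e-8`, the clamp bounds, the deliberately large input scale, and the toy `Linear` model are all assumptions made so that the effect of each fix is visible.

```python
import torch

eps = 1e-8  # illustrative value; the right epsilon depends on dtype and scale

# Division: a non-negative denominator plus eps can never be exactly zero
x = torch.tensor([1.0, 2.0])
y = torch.tensor([0.0, 4.0])
safe_ratio = x / (y + eps)

# Logarithm: clamp the input away from zero (and from negative values)
probs = torch.tensor([0.0, 0.5])
safe_log = torch.log(probs.clamp(min=eps))

# Exponential: exp(100) overflows float32, exp(10) does not
z = torch.tensor([100.0, 1.0])
naive_exp = torch.exp(z)                          # first element is inf
safe_exp = torch.exp(torch.clamp(z, min=-10, max=10))

# Gradient clipping sits between backward() and optimizer.step()
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(16, 4) * 1e3                 # large inputs give large gradients
targets = torch.randn(16, 1)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(param.grad.pow(2).sum() for param in model.parameters()))
optimizer.step()
print(f"grad norm before clipping: {norm_before.item():.2f}, after: {norm_after.item():.2f}")
```

`clip_grad_norm_` returns the total norm measured before clipping, which makes it a convenient value to log when watching for exploding gradients.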



### **Detecting and Handling NaNs and Infinities**

* **NaN Detection:** Use the `torch.isnan()` function to detect NaNs in tensors.
* **Example:** `if torch.isnan(tensor).any(): print("NaN detected!")`


* **Infinity Detection:** Similarly, `torch.isinf()` can be used to detect infinite values.
* **Example:** `if torch.isinf(tensor).any(): print("Infinity detected!")`


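* **Debugging NaNs:** If NaNs or infinities are detected, backtrack through your operations to find where they first appear, then modify that operation to make it numerically stable. For NaNs that arise during the backward pass, `torch.autograd.set_detect_anomaly(True)` makes autograd raise an error at the operation that produced them, at the cost of slower execution.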
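
The backtracking idea can be sketched with a small helper that checks intermediate results in order and reports the first one containing a non-finite value. `report_non_finite` and the `steps` dictionary are hypothetical names introduced for this example; `torch.autograd.detect_anomaly` is PyTorch's context-manager form of anomaly detection.

```python
import torch

def report_non_finite(name, tensor):
    """Return True and print a summary if the tensor holds any NaN or inf."""
    n_nan = torch.isnan(tensor).sum().item()
    n_inf = torch.isinf(tensor).sum().item()
    if n_nan or n_inf:
        print(f"{name}: {n_nan} NaN, {n_inf} inf out of {tensor.numel()} values")
        return True
    return False

# Check intermediate results in order; the first hit marks where things go wrong
x = torch.tensor([1.0, 0.0, -1.0])
steps = {"x": x, "log(x)": torch.log(x), "log(x) * 0": torch.log(x) * 0}
first_bad = next((name for name, t in steps.items() if report_non_finite(name, t)), None)

# Anomaly detection makes backward() raise at the operation that produced a NaN.
# It slows execution noticeably, so it is meant for debugging runs only.
w = torch.tensor([0.0], requires_grad=True)
anomaly_message = None
with torch.autograd.detect_anomaly():
    try:
        # The gradient of sqrt at 0 is 0 / (2 * sqrt(0)), which is NaN
        (torch.sqrt(w) * 0).sum().backward()
    except RuntimeError as err:
        anomaly_message = str(err)
print(anomaly_message)
```

Here the forward check points at `log(x)`, since the logarithm of zero is negative infinity and the logarithm of a negative number is NaN.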


## **Profiling and Optimizing PyTorch Code**

### **Profiling with torch.profiler**

* **Overview:** PyTorch's `torch.profiler` module provides tools for profiling model performance, identifying bottlenecks, and understanding how different parts of your code execute.
* **Example:**
```python
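import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder model and input; substitute your own
model = torch.nn.Linear(128, 64)
input_tensor = torch.randn(32, 128)

# Record CUDA activity only when a GPU is present
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("model_inference"):
        model(input_tensor)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))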
```




* **Insights:** Profiling can reveal which operations are taking the most time, whether your model is effectively utilizing the GPU, and where you might have inefficiencies in your code.
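
Beyond the printed table, the aggregated events can also be inspected programmatically, for example to list the most expensive operations. The sketch below assumes a CPU-only run and a toy model; it reads the `key`, `count`, and `cpu_time_total` attributes of the entries returned by `key_averages()`.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())  # toy model
batch = torch.randn(64, 256)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("model_inference"):
        model(batch)

# key_averages() aggregates events by name; sort to find the costliest entries
events = sorted(prof.key_averages(), key=lambda e: e.cpu_time_total, reverse=True)
for event in events[:5]:
    print(f"{event.key:<30} calls={event.count:<4} cpu_total={event.cpu_time_total:.0f}us")

event_names = [event.key for event in events]
```

For a timeline view, `prof.export_chrome_trace("trace.json")` writes a file that can be opened in a Chrome-style trace viewer.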

### **Code Optimization Techniques**

* **Batching:** Process data in batches rather than individually to take full advantage of GPU parallelism.
* **Example:** Instead of processing one image at a time, use a batch of 32 images to maximize GPU utilization.


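* **Mixed Precision Training:** Run eligible operations in 16-bit floating point while the weights and precision-sensitive operations stay in float32, which reduces memory usage and speeds up computation on GPUs with hardware support. Calling `model.half()` converts the entire model to float16, which is pure half precision rather than mixed precision and can destabilize training; `torch.autocast` selects the precision per operation instead.
* **Example:**
```python
import torch

# Weights stay float32; autocast runs eligible ops (e.g. matmul) in float16
with torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(input_tensor)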
```
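
A fuller mixed precision training step typically wraps the forward pass in `torch.autocast` and pairs it with a gradient scaler, which protects small float16 gradients from underflowing to zero. The following is a sketch under stated assumptions: a toy `Linear` model, random data, and a fallback to bfloat16 autocast with the scaler disabled when no GPU is available. Recent PyTorch releases also expose the scaler as `torch.amp.GradScaler`.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is the usual choice on GPUs; CPU autocast is typically used with bfloat16
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(32, 8).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(16, 32, device=device)
targets = torch.randn(16, 8, device=device)

# The scaler is only needed for float16 on GPU; it is disabled on CPU here
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

optimizer.zero_grad()
with torch.autocast(device_type=device, dtype=amp_dtype):
    outputs = model(inputs)  # the matrix multiply runs in the low-precision dtype
    loss = torch.nn.functional.mse_loss(outputs.float(), targets)

scaler.scale(loss).backward()   # scales the loss (returns it unchanged when disabled)
scaler.step(optimizer)          # unscales gradients, then calls optimizer.step()
scaler.update()

print(outputs.dtype, model.weight.dtype)  # low-precision activations, float32 weights
```

The printed dtypes show the "mixed" part: activations come out in the 16-bit type while the stored weights remain float32.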




* **Avoiding Python Loops:** Replace Python loops with PyTorch operations whenever possible to leverage the power of vectorization and GPU acceleration.
* **Example:** Instead of looping through a list of tensors to add them, combine them with `torch.stack(tensors)` and call `.sum(dim=0)` on the result.
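
Batching and vectorization can be illustrated side by side. In the sketch below, a toy `Linear` layer stands in for a real network and the 32 random samples are invented for the example; the point is only that the batched and vectorized versions give the same results as the loops with far fewer operations.

```python
import torch

model = torch.nn.Linear(16, 4)      # stand-in for a real network
samples = [torch.randn(16) for _ in range(32)]

with torch.no_grad():
    # One forward call per sample: 32 small operations
    one_by_one = torch.stack([model(s) for s in samples])
    # One forward call for the whole batch: a single [32, 16] matrix multiply
    batch = torch.stack(samples)     # shape [32, 16]
    batched = model(batch)

# Summing with a Python loop versus one vectorized reduction
loop_total = torch.zeros(16)
for s in samples:
    loop_total += s
vectorized_total = batch.sum(dim=0)

print(torch.allclose(one_by_one, batched, atol=1e-4))
print(torch.allclose(loop_total, vectorized_total, atol=1e-4))
```

The comparison uses `torch.allclose` rather than exact equality because floating-point results can differ slightly when operations are reordered.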



