# **Steps For Training A Model On GPU**

# **1. Check For Availability Of GPU**

Before training, check whether a CUDA-capable GPU is available. If one is, select it; otherwise fall back to the CPU.

```python
import torch

# Select device: use the GPU if available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Print which device is being used
print(f"Using device: {device}")
```

# **2. Move The Model To GPU**

Move your model to the selected device (`cuda` for GPU or `cpu`) so that its parameters, and the computations on them, live on that device.

```python
# Instantiate your model (MyModel is a placeholder for your own nn.Module subclass)
model = MyModel()

# Move the model to the selected device (GPU or CPU)
model = model.to(device)
```
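To confirm that the move worked, you can inspect the device of the model's parameters. The following is a minimal sketch: the `MyModel` class defined here is a hypothetical two-layer network used only for illustration, and its layer sizes are assumptions, not part of the original steps.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


class MyModel(nn.Module):
    """Hypothetical small classifier, used only for illustration."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 16),
            nn.ReLU(),
            nn.Linear(16, 2),
        )

    def forward(self, x):
        return self.net(x)


# .to(device) moves all parameters and buffers of the module
model = MyModel().to(device)

# Every parameter should now report the selected device
param_device = next(model.parameters()).device
print(f"Model parameters live on: {param_device}")
```

For an `nn.Module`, `.to(device)` moves the parameters in place and returns the same module, so `model.to(device)` without reassignment also works. Tensors behave differently: `tensor.to(device)` returns a new tensor that must be assigned.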

# **3. Modify The Training Loop By Moving Data To GPU**

Move each batch of data (features and labels) to the selected device before processing it, so that the model and its inputs are on the same device. PyTorch raises an error if they are not.

```python
for batch_features, batch_labels in train_loader:
    # Move batch data to the selected device before processing
    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)
    # ...existing code for forward, backward, optimizer steps...
```
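For reference, one complete epoch with the device transfer in place might look like the sketch below. The linear model, the random dataset, the loss function, and the SGD settings are all assumptions chosen to keep the example self-contained; substitute your own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-ins for the real model and dataset
torch.manual_seed(0)
model = nn.Linear(10, 2).to(device)
train_dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()  # enable training-mode behavior (dropout, batch norm)
epoch_loss = 0.0
for batch_features, batch_labels in train_loader:
    # Move batch data to the same device as the model
    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

    optimizer.zero_grad()                    # clear gradients from the previous step
    outputs = model(batch_features)          # forward pass
    loss = criterion(outputs, batch_labels)  # compute the loss
    loss.backward()                          # backward pass
    optimizer.step()                         # update the parameters

    epoch_loss += loss.item()

print(f"Mean training loss: {epoch_loss / len(train_loader):.4f}")
```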

# **4. Modify The Evaluation Loop By Moving Data To GPU**

Similarly, move the test data to the selected device during evaluation. Wrap the loop in `torch.no_grad()` so that no gradients are tracked, which saves memory and computation.

```python
# Switch layers such as dropout and batch norm to evaluation behavior
model.eval()

with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        # Move batch data to the selected device before evaluation
        batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)
        # ...existing code for evaluation...
```
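As an illustration, the evaluation loop could compute classification accuracy as sketched here. The linear model and the random test set are hypothetical placeholders, and accuracy is only one possible metric.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-ins for the trained model and the test dataset
model = nn.Linear(10, 2).to(device)
test_dataset = TensorDataset(torch.randn(40, 10), torch.randint(0, 2, (40,)))
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

model.eval()  # evaluation-mode behavior for dropout and batch norm
correct, total = 0, 0
with torch.no_grad():  # no computation graph is recorded inside this block
    for batch_features, batch_labels in test_loader:
        batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

        outputs = model(batch_features)
        predictions = outputs.argmax(dim=1)  # index of the highest score per sample

        correct += (predictions == batch_labels).sum().item()
        total += batch_labels.size(0)

accuracy = correct / total
print(f"Test accuracy: {accuracy:.2%}")
```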

# **5. Optimize The GPU Usage**

To make the best use of GPU resources, apply the following optimizations:

### **A. Use Larger Batch Sizes**

Larger batch sizes can make better use of the GPU's parallelism and reduce computation time per epoch, provided the batch fits in GPU memory.
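One way to see the effect is to count the iterations per epoch: a `DataLoader` yields `ceil(N / batch_size)` batches, so a larger batch size means fewer, larger steps. The sketch below assumes a hypothetical dataset of 1,000 random samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset with 1,000 samples of 10 features each
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

small_batches = len(DataLoader(dataset, batch_size=32))
large_batches = len(DataLoader(dataset, batch_size=128))

print(f"batch_size=32  -> {small_batches} iterations per epoch")   # 32
print(f"batch_size=128 -> {large_batches} iterations per epoch")   # 8
```

Fewer iterations do not guarantee a faster epoch on every setup, and a larger batch size can change training dynamics, so the learning rate may need retuning.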

### **B. Enable DataLoader Pinning**

Set `pin_memory=True` in `DataLoader` so that batches are placed in pinned (page-locked) host memory, which speeds up data transfer from CPU to GPU.

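```python
from torch.utils.data import DataLoader

# Training loader: shuffle each epoch, keep batches in pinned memory
train_loader = DataLoader(
    train_dataset,
    batch_size=128,
    shuffle=True,
    pin_memory=True
)

# Test loader: no shuffling needed for evaluation
test_loader = DataLoader(
    test_dataset,
    batch_size=128,
    shuffle=False,
    pin_memory=True
)
```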
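Pinned memory pairs naturally with asynchronous copies: passing `non_blocking=True` to `.to()` lets the host-to-GPU transfer overlap with other work when the source tensor is pinned. The sketch below assumes a hypothetical random dataset and enables pinning only when CUDA is present, so that it also runs on CPU-only machines.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

# Hypothetical dataset used only for illustration
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))

# Pinning only helps when copying to a GPU, so enable it conditionally
loader = DataLoader(dataset, batch_size=128, shuffle=False, pin_memory=use_cuda)

for batch_features, batch_labels in loader:
    # With pinned source memory the copy can run asynchronously;
    # on CPU the flag has no effect
    batch_features = batch_features.to(device, non_blocking=True)
    batch_labels = batch_labels.to(device, non_blocking=True)

print(f"Last batch is on: {batch_features.device}")
```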

---