<a href="https://colab.research.google.com/github/OreamnosAR/Deep-Learning/blob/main/w8_gpu_debugging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Running our code on the GPU brings very tangible speed-ups, but combining CPU and GPU also introduces some further complications. In other words, new and exciting error messages! 😏

In this notebook, you will be presented with various simple training setups, all of which fail to run for different reasons. In each case, the task is to determine the cause of the failure and fix it.

First we need to make sure that you remembered to turn on the GPU:

In [2]:
import torch

if torch.cuda.is_available():
  print('Congrats, you have a gpu :)')
else:
  print('Oops, please go to Edit -> Notebook settings and give yourself some hardware acceleration')

Congrats, you have a gpu :)
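The check above only reports whether a GPU is visible. A common pattern, sketched here as an assumption rather than as part of the original exercise, is to choose the device once and let the rest of the code refer to that variable, so the same notebook also runs on a CPU-only machine:

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors are created on the CPU by default and must be moved explicitly.
x = torch.rand(4, 100)
x = x.to(device)
print(x.device.type)
```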


In our example today, Alice has a small, working bit of PyTorch code.
However, yesterday was Alice's birthday, and Bob gave her a shiny new GPU. So, today, Alice wants to upgrade her code to take advantage of her new hardware.

This is what Alice's code looked like before she started upgrading:

In [3]:
#gpu-free code:
import torch.nn as nn

Xset=torch.utils.data.TensorDataset(torch.rand(1000000,100),torch.rand(1000000,1))
trainLoader=torch.utils.data.DataLoader(Xset,batch_size=32)

model=nn.Sequential(nn.Linear(100,5),nn.ReLU(),
                    nn.Linear(5,1))

lossF=nn.functional.huber_loss
optimizer=torch.optim.Adam(model.parameters())

for xbatch,ybatch in trainLoader:
  pred=model(xbatch)
  loss=lossF(ybatch,pred)

  model.zero_grad()
  loss.backward()
  optimizer.step()

This runs fine.
Now Alice creates a GPU device and moves the calculations there:

In [4]:
device=torch.device('cuda')

model.to(device)
optimizer=torch.optim.Adam(model.parameters())

nEpochs=10
for xbatch,ybatch in trainLoader:
  xbatch = xbatch.to(device)
  ybatch = ybatch.to(device)
  pred=model(xbatch)
  loss=lossF(ybatch,pred)

  model.zero_grad()
  loss.backward()
  optimizer.step()

Task 1:
Get the code above to run on the GPU.
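When a training step fails with a device-mismatch error, it can help to print where the model's parameters and the batch tensors actually live. The helper below, `report_devices`, is a hypothetical name introduced for illustration and is not part of the original notebook:

```python
import torch
import torch.nn as nn

def report_devices(model, *tensors):
    """Return the device types of the model's parameters and of each tensor."""
    param_devices = {p.device.type for p in model.parameters()}
    tensor_devices = [t.device.type for t in tensors]
    return param_devices, tensor_devices

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Sequential(nn.Linear(100, 5), nn.ReLU(), nn.Linear(5, 1)).to(device)

xbatch = torch.rand(32, 100)   # batches from a DataLoader start out on the CPU
print(report_devices(model, xbatch))

xbatch = xbatch.to(device)     # Tensor.to is not in-place, so reassign the result
print(report_devices(model, xbatch))
```

Note the asymmetry: `model.to(device)` moves the module's parameters in place, while `tensor.to(device)` returns a tensor that has to be assigned back to a variable.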

Having solved the first hurdle, Alice decides to further upgrade the model with an extra offset:

In [5]:
class aliceModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1=nn.Linear(100,5)
    self.layer2=nn.Linear(5,1)

    self.extraOffset=torch.rand((1,5),requires_grad=True)


  def forward(self,x):
    #modify the code to explicitly move self.extraOffset to the device within the forward method:
    extra_offset = self.extraOffset.to(x.device)
    x=self.layer1(x)+extra_offset
    #x=self.layer1(x)+self.extraOffset
    x=nn.functional.relu(x)
    x=self.layer2(x)
    x=nn.functional.relu(x)

    return x

#test:
net=aliceModel()
net(torch.rand((10,100)));

However, when she now runs her training loop:

In [6]:
device=torch.device('cuda')
model=aliceModel()
model = model.to(device)

optimizer=torch.optim.Adam(model.parameters())

for xbatch,ybatch in trainLoader:
  xbatch=xbatch.to(device)
  ybatch=ybatch.to(device)

  pred=model(xbatch)
  loss=lossF(ybatch,pred)

  model.zero_grad()
  loss.backward()
  optimizer.step()

Task 2:
Solve this new GPU issue.
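Moving `self.extraOffset` inside `forward` gets the code running, but a plain tensor stored as an attribute is not registered with the module: `model.to(device)` ignores it, and it does not appear in `model.parameters()`, so the optimizer never updates it. One possible alternative, sketched below under the assumption that the offset is meant to be trainable, is to wrap it in `nn.Parameter`. The class name `OffsetModel` is hypothetical:

```python
import torch
import torch.nn as nn

class OffsetModel(nn.Module):  # hypothetical name, mirrors aliceModel
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(100, 5)
        self.layer2 = nn.Linear(5, 1)
        # nn.Parameter registers the tensor with the module, so it follows
        # model.to(device) and is returned by model.parameters().
        self.extraOffset = nn.Parameter(torch.rand(1, 5))

    def forward(self, x):
        x = nn.functional.relu(self.layer1(x) + self.extraOffset)
        return nn.functional.relu(self.layer2(x))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = OffsetModel().to(device)

names = [name for name, _ in model.named_parameters()]
print(names)
print(model.extraOffset.device)
```

If the offset should instead stay fixed during training, `self.register_buffer('extraOffset', torch.rand(1, 5))` also makes it follow the module between devices without exposing it to the optimizer.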

Now feeling like quite the expert on PyTorch and devices, Alice wishes to speed up her training by increasing the batch size. To make sure she doesn't run out of memory, she first checks how much memory the GPU has, and sets the batch size accordingly:

In [5]:
class aliceModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1=nn.Linear(100,5)
    self.layer2=nn.Linear(5,1)

  def forward(self,x):
    x=self.layer1(x)
    x=nn.functional.relu(x)
    x=self.layer2(x)
    x=nn.functional.relu(x)

    return x

#test:
device=torch.device('cuda')
net=aliceModel()
net(torch.rand((10,100)));


from math import floor

t = torch.cuda.get_device_properties(device).total_memory
print('total memory: ',t/1e6,'mb')

batchSize=floor(t/Xset[0][0].element_size()/2) #this is less than half of the space on the GPU
print('batch size: ',batchSize)



total memory:  15835.660288 mb
batch size:  1979457536
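The cell above reads `total_memory`, which is the capacity of the card rather than what is currently free. As a minimal sketch, `torch.cuda.mem_get_info` reports both numbers; the wrapper `free_gpu_bytes` is a hypothetical helper that returns `None` on machines without CUDA:

```python
import torch

def free_gpu_bytes(device_index=0):
    """Return the free memory in bytes on the given GPU, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return free_bytes

free = free_gpu_bytes()
if free is None:
    print('no CUDA device available')
else:
    print('free memory: ', free / 1e6, 'mb')
```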


There is an error in the code above: the batch size is computed from a single element of the dataset, not from the size of the whole dataset. This is fixed in the code below:

In [8]:
Xset[0][0].element_size()
t/4

3958915072.0

In [13]:
len(Xset)

1000000

In [20]:
batchSize=floor(t/(Xset[0][0].element_size()*len(Xset)))
print('batch size: ',batchSize)

batch size:  3958
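For reference, `element_size()` returns the number of bytes of one scalar in the tensor (4 for `float32`), not the size of a whole sample. A rough way to estimate how many samples fit in a given memory budget is to multiply by the number of scalars per sample. This is only a sketch: the helper `samples_that_fit` is hypothetical, and it counts raw input data only, ignoring activations, gradients, and optimizer state.

```python
import torch

def samples_that_fit(x_sample, y_sample, budget_bytes):
    """Number of (x, y) samples whose raw tensor data fits in budget_bytes."""
    bytes_per_sample = (x_sample.element_size() * x_sample.nelement()
                        + y_sample.element_size() * y_sample.nelement())
    return budget_bytes // bytes_per_sample

x_sample = torch.rand(100)  # same shape as one input row of Xset
y_sample = torch.rand(1)    # same shape as one target of Xset

# One sample takes 4 * 100 + 4 * 1 = 404 bytes as float32.
print(samples_that_fit(x_sample, y_sample, budget_bytes=1_000_000))
```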


In [21]:
trainLoader=torch.utils.data.DataLoader(Xset,batch_size=batchSize)

model=aliceModel()
model.to(device)
optimizer=torch.optim.Adam(model.parameters())

for xbatch,ybatch in trainLoader:
  xbatch=xbatch.to(device)
  ybatch=ybatch.to(device)

  pred=model(xbatch)
  loss=lossF(ybatch,pred)

  model.zero_grad()
  loss.backward()
  optimizer.step()
