Below we fit a logistic regression model step by step for the XOR problem.
Fill in the code wherever `_FILL_` is specified. For this model, $x_1$ and $x_2$ are each either 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. Notice that $y$ is True (1) if $x_1 = 1$ and $x_2 = 0$, or if $x_1 = 0$ and $x_2 = 1$; $y$ is zero otherwise.
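
As a quick sanity check, the formula can be evaluated on all four inputs and compared with the XOR truth table. This is only an illustrative sketch; the helper name `xor_formula` is an assumption and not part of the exercise.

```python
def xor_formula(x1, x2):
    # y = x1 + x2 - 2*x1*x2, the formula given above
    return x1 + x2 - 2 * x1 * x2

# Enumerate the four binary inputs and record the formula's output
table = [(x1, x2, xor_formula(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
for x1, x2, y in table:
    print(x1, x2, y)
```

The third column should agree with Python's bitwise XOR, `x1 ^ x2`, on every row.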

In [1]:
import torch.nn as nn
import torch
# Don't fill this in
_FILL_ = '_FILL_'

In [3]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [7]:
# Define each parameter as a tensor of shape (1,) that requires a gradient, so autograd tracks it
alpha = torch.randn(1, requires_grad=True)
beta_1 = torch.randn(1, requires_grad=True)
beta_2 = torch.randn(1, requires_grad=True)

In [15]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Have z be beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2*x[0] + beta_1*x[1] + alpha

    # Push z through the sigmoid to get p(y=1)
    a = torch.sigmoid(z)

    # Write the loss manually between y and a
    loss = -y*torch.log(a) - (1-y)*torch.log(1-a)

    # Get the loss gradients; the gradients with respect to alpha, beta_1, beta_2
    loss.backward()
    

    # Manually update the parameters
    # The updates are wrapped in torch.no_grad() because the parameters have requires_grad=True, but the update step itself should not be tracked by autograd
    with torch.no_grad():
        # Do an update for each parameter
        alpha -= lr * alpha.grad
        beta_1 -= lr * beta_1.grad
        beta_2 -= lr * beta_2.grad

        # Manually zero the gradients after updating weights
        alpha.grad.zero_()
        beta_1.grad.zero_()
        beta_2.grad.zero_()

  # Manually get the accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2*x[0] + beta_1*x[1] + alpha

      # Get a
      a = torch.sigmoid(z)

      # Get the loss
      loss += -y*torch.log(a) - (1-y)*torch.log(1-a)

      # Get the prediction given a
      y_pred.append(1 if a > 0.5 else 0)

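    # Get the current accuracy over 4 points; make this a (4, 1) tensor so it lines up with y_data
    # (a (4,) tensor compared against a (4, 1) tensor would broadcast to (4, 4))
    y_pred = torch.tensor(y_pred).reshape(-1, 1)

    accuracy = (y_pred == y_data).float().mean()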
    loss = loss / 4

    # Print the accuracy and the loss
    # You want the item inside the single-element tensor
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy.item()))

Epoch: 0
Loss: 0.7696713805198669 Accuracy: 0.5
Epoch: 1
Loss: 0.7687812447547913 Accuracy: 0.5
Epoch: 2
Loss: 0.7679104208946228 Accuracy: 0.5
Epoch: 3
Loss: 0.7670583724975586 Accuracy: 0.5
Epoch: 4
Loss: 0.7662245631217957 Accuracy: 0.5
Epoch: 5
Loss: 0.7654088735580444 Accuracy: 0.5
Epoch: 6
Loss: 0.7646105289459229 Accuracy: 0.5
Epoch: 7
Loss: 0.7638294100761414 Accuracy: 0.5
Epoch: 8
Loss: 0.763064980506897 Accuracy: 0.5
Epoch: 9
Loss: 0.7623169422149658 Accuracy: 0.5
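
A shape detail worth keeping in mind when computing accuracy this way: comparing a prediction tensor of shape (4,) with a label tensor of shape (4, 1) does not compare point by point. Broadcasting produces a (4, 4) grid of pairwise comparisons, and with two 0 labels and two 1 labels that grid averages to 0.5 whatever the predictions are. The sketch below uses made-up predictions (`y_hat` is hypothetical) to show the difference.

```python
import torch

y_true = torch.tensor([[0.], [1.], [1.], [0.]])   # shape (4, 1), like y_data
y_hat = torch.tensor([0, 1, 0, 0])                # shape (4,), hypothetical predictions

# (4,) against (4, 1) broadcasts to a (4, 4) grid of pairwise comparisons
broadcast_cmp = (y_hat == y_true)

# Matching the shapes first gives one comparison per data point
elementwise_cmp = (y_hat.reshape(-1, 1) == y_true)

print(broadcast_cmp.shape, broadcast_cmp.float().mean().item())
print(elementwise_cmp.shape, elementwise_cmp.float().mean().item())
```

Here three of the four hypothetical predictions are right, so only the elementwise comparison reports 0.75.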


Exercise 1: Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.



In [24]:
x = torch.randn(2, 3).unsqueeze(dim=0)
x

tensor([[[-0.5240, -0.7444,  0.5727],
         [-0.3527, -0.6371, -0.1981]]])

Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [25]:
x = x.squeeze()
x

tensor([[-0.5240, -0.7444,  0.5727],
        [-0.3527, -0.6371, -0.1981]])
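
To make the effect of the `dim` argument concrete, here is a small sketch that only inspects shapes. The variable names are illustrative and the tensor is filled with zeros because the values do not matter.

```python
import torch

t = torch.zeros(2, 3)

front = t.unsqueeze(dim=0)    # new axis placed first
middle = t.unsqueeze(dim=1)   # new axis placed between the two existing ones
print(front.shape, middle.shape)

# squeeze() with no argument drops every size-1 axis; squeeze(dim) targets one axis
both_ends = torch.zeros(1, 3, 1)
print(front.squeeze(0).shape)
print(both_ends.squeeze().shape, both_ends.squeeze(0).shape)
```

Passing an explicit `dim` to `squeeze` is the safer habit when a tensor might legitimately have other axes of size 1.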

Exercise 3: Create a random tensor of shape 5x3 in the interval [3, 7)



In [26]:
3 + torch.rand(5, 3) * 4

tensor([[3.3034, 4.3474, 6.9959],
        [3.1301, 4.1922, 4.2445],
        [3.4870, 6.1628, 4.1539],
        [4.0046, 3.6545, 4.2316],
        [6.5788, 6.3570, 6.7518]])
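
The same idea generalizes: `torch.rand` samples from [0, 1), so scaling by the interval width and shifting by the lower bound maps the samples to [low, high). A minimal sketch, assuming a hypothetical helper named `uniform`:

```python
import torch

def uniform(shape, low, high):
    # Hypothetical helper: rescale U[0, 1) samples to the interval [low, high)
    return low + (high - low) * torch.rand(shape)

sample = uniform((5, 3), 3.0, 7.0)
print(sample.shape)
print(sample.min().item(), sample.max().item())
```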

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [27]:
# torch.randn samples directly from a standard normal (mean=0, std=1)
torch.randn(3, 3)

tensor([[-0.4396,  0.7305,  1.2345],
        [ 2.0732, -0.7741, -1.0739],
        [-0.6972, -1.8511, -0.8051]])

Exercise 5: Retrieve the indices of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [28]:
x = torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(x)

tensor([[0],
        [1],
        [2],
        [4]])
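
`torch.nonzero` returns one row per nonzero element, which is why the result above has shape (4, 1). When flat indices are more convenient, for example for indexing back into the tensor, the `as_tuple=True` option returns one 1D index tensor per dimension. A short sketch:

```python
import torch

x = torch.Tensor([1, 1, 1, 0, 1])

idx_2d = torch.nonzero(x)                     # shape (4, 1): one row per nonzero element
(idx_1d,) = torch.nonzero(x, as_tuple=True)   # shape (4,): usable directly as an index

print(idx_1d.tolist())
print(x[idx_1d])
```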

Exercise 6: Create a random tensor of size (3,1) and then horizontally stack 4 copies together.



In [29]:
torch.rand(3,1).expand(3,4)

tensor([[0.7343, 0.7343, 0.7343, 0.7343],
        [0.1371, 0.1371, 0.1371, 0.1371],
        [0.5051, 0.5051, 0.5051, 0.5051]])
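
Note that `expand` returns a view whose columns all share the same memory, rather than four independent copies. If real copies are needed (for instance, to modify one column in place), `repeat` or `torch.hstack` produce them. The sketch below compares the three; the variable names are illustrative.

```python
import torch

col = torch.rand(3, 1)

view = col.expand(3, 4)             # no data copied; every column aliases col
copies = col.repeat(1, 4)           # materializes 4 real copies side by side
stacked = torch.hstack([col] * 4)   # explicit horizontal stack, same values

print(torch.equal(view, copies), torch.equal(copies, stacked))
print(view.stride(), copies.stride())
```

A stride of 0 along the second axis is the sign that the expanded tensor reuses one column of storage.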

Exercise 7: Return the batch matrix-matrix product of two 3-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).



In [30]:
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
torch.bmm(a, b)

tensor([[[0.8256, 0.8774, 0.9409, 1.5316],
         [0.4824, 0.5852, 0.4013, 1.0560],
         [0.9193, 1.1430, 1.2075, 1.6282],
         [1.0752, 1.3022, 1.2029, 1.9151]],

        [[1.0178, 1.1421, 0.8326, 1.3754],
         [0.5466, 0.8347, 0.3251, 1.1396],
         [0.3506, 0.4528, 0.2895, 0.5627],
         [1.2862, 1.8336, 1.3080, 1.9938]],

        [[1.8646, 1.5553, 1.4375, 1.2376],
         [1.9361, 1.8066, 1.7534, 1.3161],
         [2.1781, 2.0065, 1.8016, 1.0613],
         [1.7194, 1.3998, 1.3294, 1.2011]]])

Exercise 8: Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).



In [32]:
a = torch.rand(3,4,5)
b = torch.rand(5,4)

torch.bmm(a, b.unsqueeze(dim=0).expand(a.size(0),*b.size()))

tensor([[[1.7260, 1.3707, 1.2195, 1.5173],
         [1.7599, 1.8178, 2.1290, 2.0782],
         [1.7229, 1.5337, 2.1185, 1.9211],
         [1.1690, 1.3057, 1.5000, 1.4344]],

        [[1.2274, 0.6852, 1.4323, 1.1770],
         [0.7564, 1.0317, 1.1052, 1.0394],
         [1.0998, 1.3043, 1.1831, 1.4294],
         [1.5357, 1.1266, 1.7714, 1.5891]],

        [[1.2535, 1.3614, 1.6904, 1.6466],
         [1.5170, 1.6165, 1.9225, 1.9392],
         [0.7325, 1.1157, 1.2611, 1.2070],
         [0.9363, 1.3898, 1.2894, 1.3576]]])
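
An alternative worth knowing: `torch.bmm` requires both arguments to be 3D with the same batch size, which is why `b` is expanded above, whereas `torch.matmul` broadcasts a 2D right-hand operand across the batch dimension on its own. The sketch below checks that the two routes agree on fresh random inputs.

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Route 1: give b an explicit batch dimension, then use bmm
via_bmm = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

# Route 2: matmul broadcasts b across the batch dimension
via_matmul = torch.matmul(a, b)

print(via_matmul.shape)
print(torch.allclose(via_bmm, via_matmul, atol=1e-5))
```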

Exercise 9: Create a 1x1 random tensor and get the value inside this tensor as a plain Python scalar, not a tensor.

In [36]:
torch.rand(1, 1).item()

0.6220725178718567

Exercise 10: Create a 2x1 tensor and have it require a gradient. Have $x$, this tensor, hold [-2, 1]. Set $y=x_1^2 + x_2^2$ and get the gradient of y with respect to $x_1$ and then $x_2$.

In [38]:

x = torch.tensor([-2.0,1.0],requires_grad=True)
y = x[0]**2+x[1]**2
y.backward()

x.grad

tensor([-4.,  2.])
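
The result can be checked by hand. Since $y = x_1^2 + x_2^2$, each partial derivative is twice the corresponding input, evaluated at $x = (-2, 1)$:

$$\frac{\partial y}{\partial x_1} = 2x_1 = 2(-2) = -4, \qquad \frac{\partial y}{\partial x_2} = 2x_2 = 2(1) = 2$$

This matches the two entries of `x.grad`.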

Exercise 11: Check if CUDA is available (it should be if you choose the GPU in the Runtime settings for Colab). If it is, move $x$ above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [41]:
if torch.cuda.is_available():
    x = x.cuda()
    print("x is now on:", x.device)
    
    y = torch.tensor([1,1]).to('cpu')
    print("y is on:", y.device)
    
    try:
        z = x + y
    except Exception as e:
        print("Error:", e)
        
    # Solution: Move one of the tensors to the same device 
    y = y.cuda()
    z = x + y
    print(z)
else:
    print('cuda is not available!')



cuda is not available!
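
A common way to avoid device mismatches altogether is to choose the device once and create every tensor on it. The sketch below is one such pattern, not the only one; it runs on either a GPU or a CPU-only machine.

```python
import torch

# Pick the GPU when present, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([-2.0, 1.0], device=device)
y = torch.ones_like(x)   # inherits x's shape, dtype, and device

z = x + y                # both operands live on the same device
print(z.device, z.cpu().tolist())
```

Creating `y` with `torch.ones_like(x)` sidesteps the mismatch because the new tensor is allocated wherever `x` already lives.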
