Below we fit a logistic regression model to the XOR problem, step by step.
Fill in the code wherever `_FILL_` is specified. In this problem, $x_1$ and $x_2$ are each either 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. Notice that $y$ is 1 (True) if $x_1 = 1$ and $x_2 = 0$, or if $x_1 = 0$ and $x_2 = 1$; $y$ is 0 otherwise.
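
As a quick check of the formula above, the following minimal sketch enumerates all four binary inputs in plain Python. The helper name `xor_formula` is only illustrative and is not part of the exercise.

```python
from itertools import product

def xor_formula(x1, x2):
    # y = x1 + x2 - 2*x1*x2, as defined in the text
    return x1 + x2 - 2 * x1 * x2

# Evaluate the formula on every combination of x1, x2 in {0, 1}
truth_table = [(x1, x2, xor_formula(x1, x2)) for x1, x2 in product([0, 1], repeat=2)]

for x1, x2, y in truth_table:
    print(x1, x2, y)
```

The last column matches Python's built-in XOR (`x1 ^ x2`) on every row, which is the same labelling used for `y_data` in the cells that follow.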

In [466]:
import torch.nn as nn
import torch
# Don't fill this in
_FILL_ = '_FILL_'

In [467]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [468]:
# Define each parameter as a single-element tensor that requires a gradient, so autograd tracks it
alpha = torch.randn(1, requires_grad=True)
beta_1 = torch.randn(1, requires_grad=True)
beta_2 = torch.randn(1, requires_grad=True)

In [469]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Have z be beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2*x[0] + beta_1*x[1] + alpha

    # Push z through a sigmoid to get p(y=1)
    a = torch.sigmoid(z)

    # Write the loss manually between y and a
    loss = -y*torch.log(a) - (1-y)*torch.log(1-a)

    # Get the loss gradients; the gradients with respect to alpha, beta_1, beta_2
    loss.backward()
    

    # Manually update the parameters
    # The update is wrapped in torch.no_grad() because the parameters have requires_grad=True, but the update itself should not be tracked by autograd
    with torch.no_grad():
        # Do an update for each parameter
        alpha -= lr * alpha.grad
        beta_1 -= lr * beta_1.grad
        beta_2 -= lr * beta_2.grad

        # Manually zero the gradients after updating weights
        alpha.grad.zero_()
        beta_1.grad.zero_()
        beta_2.grad.zero_()

  # Manually get the accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2*x[0] + beta_1*x[1] + alpha

      # Get a
      a = torch.sigmoid(z)

      # Get the loss
      loss += -y*torch.log(a) - (1-y)*torch.log(1-a)

      # Get the prediction given a
      y_pred.append(1 if a > 0.5 else 0)

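    # Get the current accuracy over 4 points; make this a tensor
    # Reshape to (4, 1) to match y_data; comparing shapes (4,) and (4, 1) would broadcast to (4, 4)
    y_pred = torch.tensor(y_pred).view(-1, 1)
    accuracy = (y_pred == y_data).float().mean()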
    loss = loss / 4

    # Print the accuracy and the loss
    # Use .item() to pull the Python number out of each single-element tensor
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy.item()))

Epoch: 0
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 1
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 2
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 3
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 4
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 5
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 6
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 7
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 8
Loss: 1.0555728673934937 Accuracy: 0.5
Epoch: 9
Loss: 1.0555728673934937 Accuracy: 0.5
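
A single logistic unit draws one linear decision boundary, and no straight line separates the two XOR classes, so this model can classify at most 3 of the 4 points correctly. The sketch below checks this by brute force in plain Python. The helper `accuracy` and the parameter grid (values from -5 to 5 in steps of 0.5) are assumptions chosen for illustration, not part of the exercise.

```python
import math
from itertools import product

# The four XOR points: ((x1, x2), y)
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(alpha, beta_1, beta_2):
    """Fraction of XOR points classified correctly by one logistic unit."""
    correct = 0
    for (x1, x2), y in points:
        z = beta_2 * x1 + beta_1 * x2 + alpha   # same linear form as the training loop
        a = 1.0 / (1.0 + math.exp(-z))          # sigmoid
        pred = 1 if a > 0.5 else 0
        correct += int(pred == y)
    return correct / len(points)

# Coarse grid over all three parameters
grid = [i / 2 for i in range(-10, 11)]
best = max(accuracy(a, b1, b2) for a, b1, b2 in product(grid, repeat=3))
print(best)  # 0.75
```

For instance, `alpha = -0.5` with `beta_1 = beta_2 = 1` gets the first three points right and misses `[1, 1]`. Fitting all four points needs a nonlinear feature or a hidden layer.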


Exercise 1: Create a 2D tensor, then insert a dimension of size 1 at axis 0.



In [470]:
# dim=0 inserts the new axis at position 0, giving shape (1, 2, 3)
x = torch.randn(2, 3).unsqueeze(dim=0)
x

tensor([[[ 1.6353, -2.6256, -1.3068],
         [ 1.9800, -0.4244,  0.4171]]])

Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [471]:
x = x.squeeze()
x

tensor([[ 1.6353, -2.6256, -1.3068],
        [ 1.9800, -0.4244,  0.4171]])

Exercise 3: Create a random tensor of shape 5x3 with values in the interval [3, 7).



In [472]:
3 + torch.rand(5, 3) * 4

tensor([[5.3841, 3.0914, 4.4542],
        [5.2814, 6.7705, 3.2416],
        [3.9344, 5.9424, 6.3185],
        [4.0282, 4.2624, 3.8962],
        [6.5506, 4.9585, 4.4203]])

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [473]:
torch.randn(3, 3)

tensor([[ 1.5047,  1.3762,  2.6666],
        [ 1.7881, -1.2995, -0.0080],
        [-1.4231, -0.6612, -0.0824]])

Exercise 5: Retrieve the indices of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [474]:
x=torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(x)

tensor([[0],
        [1],
        [2],
        [4]])

Exercise 6: Create a random tensor of size (3,1) and then horizontally stack 4 copies together.



In [475]:
x = torch.rand(3, 1)
torch.hstack([x] * 4)

tensor([[0.9251, 0.9251, 0.9251, 0.9251],
        [0.1810, 0.1810, 0.1810, 0.1810],
        [0.3388, 0.3388, 0.3388, 0.3388]])

Exercise 7: Return the batch matrix-matrix product of two 3-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).



In [476]:
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
torch.bmm(a, b)

tensor([[[1.0655, 1.7171, 0.9702, 0.9596],
         [1.5602, 1.9471, 1.8162, 1.8162],
         [1.1823, 1.6536, 1.0865, 1.1224],
         [1.7981, 1.4506, 1.9199, 1.9248]],

        [[0.4634, 0.9505, 0.9773, 1.2442],
         [0.6340, 1.1177, 1.4698, 1.3047],
         [0.5367, 1.0467, 1.2788, 1.2913],
         [0.4221, 0.8204, 1.0139, 0.9002]],

        [[0.9583, 1.9250, 2.2394, 2.9252],
         [0.8005, 0.8672, 0.9208, 1.5445],
         [0.2235, 0.3607, 0.7303, 0.8319],
         [0.5710, 1.0672, 1.3429, 2.1370]]])

Exercise 8: Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).



In [477]:
a = torch.rand(3,4,5)
b = torch.rand(5,4)

torch.bmm(a, b.unsqueeze(dim=0).expand(a.size(0),*b.size()))

tensor([[[1.6207, 1.5306, 1.9985, 1.9615],
         [1.0929, 1.0774, 1.5525, 1.4917],
         [1.2108, 1.0419, 1.1231, 1.2275],
         [1.9008, 1.6964, 2.3377, 2.1382]],

        [[1.2788, 1.5037, 1.7737, 1.4721],
         [1.3784, 1.4381, 2.0852, 1.8324],
         [1.8352, 1.4423, 2.0438, 1.9088],
         [0.7715, 0.5862, 0.6616, 0.7260]],

        [[0.8325, 0.9908, 1.5355, 1.2928],
         [1.2626, 1.1311, 1.5568, 1.5689],
         [1.5654, 1.3978, 1.6659, 1.7784],
         [1.5666, 1.5923, 1.8398, 1.8419]]])
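
The explicit `unsqueeze` and `expand` above is needed because `torch.bmm` requires both operands to be 3D with the same batch size. As an alternative sketch, `torch.matmul` broadcasts the 2D operand across the batch dimension on its own. The variable names `explicit` and `broadcast` are only illustrative.

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Explicit batching: expand b to shape (3, 5, 4), then use bmm
explicit = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

# torch.matmul broadcasts b over the leading batch dimension of a
broadcast = torch.matmul(a, b)

print(explicit.shape, broadcast.shape)
print(torch.allclose(explicit, broadcast, atol=1e-6))
```

Both routes should give a result of shape (3, 4, 4); the `bmm` form makes the batching explicit, while `matmul` is shorter.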

Exercise 9: Create a 1x1 random tensor and extract the value inside it as a Python scalar, not a tensor.

In [478]:
torch.rand(1, 1).item()

0.7447531819343567

Exercise 10: Create a 2x1 tensor and have it require a gradient. Have $x$, this tensor, hold [-2, 1]. Set $y=x_1^2 + x_2^2$ and get the gradient of y with respect to $x_1$ and then $x_2$.

In [479]:

x = torch.tensor([-2.0,1.0],requires_grad=True)
y = x[0]**2+x[1]**2
y.backward()

x.grad

tensor([-4.,  2.])
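
The autograd result agrees with the analytic gradient: since $y = x_1^2 + x_2^2$, the partial derivatives are $2x_1 = -4$ and $2x_2 = 2$ at $x = [-2, 1]$. As a sanity check that does not rely on autograd, here is a minimal central-difference sketch in plain Python. The helper names `f` and `numerical_grad` and the step size `h` are assumptions for illustration.

```python
def f(x1, x2):
    # Same function as in the exercise
    return x1 ** 2 + x2 ** 2

def numerical_grad(x1, x2, h=1e-5):
    # Central differences approximate each partial derivative
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d1, d2

g1, g2 = numerical_grad(-2.0, 1.0)
print(g1, g2)  # approximately -4 and 2
```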

Exercise 11: Check if CUDA is available (it should be if you choose the GPU in the Colab Runtime settings). If it is, move $x$ above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [480]:
if torch.cuda.is_available():
    x = x.to('cuda')
    print("x is now on:", x.device)
    
    y = torch.tensor([1,1]).to('cpu')
    print("y is on:", y.device)
    
    try:
        z = x + y
    except Exception as e:
        print("Error:", e)
        
    # Solution: Move one of the tensors to the same device 
    y = y.to('cuda')
    z = x + y
    print(z)
else:
    print('cuda is not available!')



cuda is not available!
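
One common way to avoid the device mismatch in the first place is to pick a device once and create every tensor on it. The sketch below runs with or without a GPU. The variable name `device` is a convention rather than anything the exercise requires.

```python
import torch

# Fall back to the CPU when no GPU is present
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([-2.0, 1.0], device=device)
y = torch.ones_like(x)  # ones_like inherits x's device and dtype

# Both operands live on the same device, so the addition succeeds
z = x + y
print(z.device, z)
```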
