Below we fit a logistic regression model, step by step, to the XOR problem.
Fill in the code wherever `_FILL_` appears. In this problem, $x_1$ and $x_2$ each take the value 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. Notice that $y$ is True (1) if $x_1 = 1$ and $x_2 = 0$, or if $x_1 = 0$ and $x_2 = 1$; $y$ is zero otherwise.
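
As a quick sanity check on the formula above, the following sketch (plain Python, with illustrative variable names) enumerates the four possible inputs and compares the polynomial with Python's bitwise XOR operator.

```python
from itertools import product

# Enumerate all four binary input pairs and compare the polynomial
# x1 + x2 - 2*x1*x2 with the bitwise XOR of the two inputs.
rows = []
for x1, x2 in product([0, 1], repeat=2):
    y = x1 + x2 - 2 * x1 * x2
    assert y == (x1 ^ x2)
    rows.append((x1, x2, y))

print(rows)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```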

In [1]:
import torch.nn as nn
import torch
# Don't fill this in
_FILL_ = '_FILL_'

In [2]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [3]:
# Define each parameter as a one-element tensor that requires a gradient so autograd tracks it
alpha = torch.randn(1, requires_grad=True)
beta_1 = torch.randn(1, requires_grad=True)
beta_2 = torch.randn(1, requires_grad=True)

In [4]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Have z be beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2*x[0] + beta_1*x[1] + alpha

    # Push z through a sigmoid to get p(y=1)
    a = torch.sigmoid(z)

    # Write the loss manually between y and a
    loss = -y*torch.log(a) - (1-y)*torch.log(1-a)

    # Get the loss gradients; the gradients with respect to alpha, beta_1, beta_2
    loss.backward()
    

    # Manually update the parameters
    # The updates are wrapped in no_grad() because the parameters have requires_grad=True, but the update step itself should not be tracked by autograd
    with torch.no_grad():
        # Do an update for each parameter
        alpha -= lr * alpha.grad
        beta_1 -= lr * beta_1.grad
        beta_2 -= lr * beta_2.grad

        # Manually zero the gradients after updating weights
        alpha.grad.zero_()
        beta_1.grad.zero_()
        beta_2.grad.zero_()

  # Manually get the accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2*x[0] + beta_1*x[1] + alpha

      # Get a
      a = torch.sigmoid(z)

      # Get the loss
      loss += -y*torch.log(a) - (1-y)*torch.log(1-a)

      # Get the prediction given a
      y_pred.append(1 if a > 0.5 else 0)

    # Get the current accuracy over 4 points; make this a tensor
    y_pred = torch.tensor(y_pred)

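    # Flatten y_data from (4, 1) to (4,) so the comparison is element-wise instead of broadcasting to (4, 4)
    accuracy = (y_pred == y_data.squeeze(1)).float().mean()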
    loss = loss / 4

    # Print the accuracy and the loss
    # Use .item() to pull the Python number out of each single-element tensor
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy.item()))

Epoch: 0
Loss: 0.7643516063690186 Accuracy: 0.5
Epoch: 1
Loss: 0.7639364004135132 Accuracy: 0.5
Epoch: 2
Loss: 0.7635273933410645 Accuracy: 0.5
Epoch: 3
Loss: 0.7631244659423828 Accuracy: 0.5
Epoch: 4
Loss: 0.7627273797988892 Accuracy: 0.5
Epoch: 5
Loss: 0.7623360753059387 Accuracy: 0.5
Epoch: 6
Loss: 0.7619503736495972 Accuracy: 0.5
Epoch: 7
Loss: 0.7615700960159302 Accuracy: 0.5
Epoch: 8
Loss: 0.7611953020095825 Accuracy: 0.5
Epoch: 9
Loss: 0.7608257532119751 Accuracy: 0.5
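
One thing to watch when computing accuracy by hand is tensor shape. The sketch below uses hypothetical predictions (`y_pred` here is made up for illustration, not taken from the run above) to show that comparing a `(4,)` tensor with a `(4, 1)` tensor broadcasts to a `(4, 4)` grid, which quietly changes the mean.

```python
import torch

y_data = torch.tensor([[0.], [1.], [1.], [0.]])  # shape (4, 1), like the labels above
y_pred = torch.tensor([0, 1, 1, 1])              # shape (4,), hypothetical predictions

# (4,) compared with (4, 1) broadcasts to every pairing of prediction and label.
grid = (y_pred == y_data)
print(grid.shape)                    # torch.Size([4, 4])
print(grid.float().mean().item())    # 0.5

# Flattening the labels gives the intended element-wise comparison.
matches = (y_pred == y_data.squeeze(1))
accuracy = matches.float().mean().item()
print(matches.shape)                 # torch.Size([4])
print(accuracy)                      # 0.75
```

With two positive and two negative labels, the broadcast version averages to 0.5 no matter what the predictions are, so it is worth checking shapes before trusting the number.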


Exercise 1: Create a 2D tensor and then insert a dimension of size 1 at axis 0.



In [5]:
x = torch.randn(2, 3).unsqueeze(dim=0)
x

tensor([[[0.6691, 0.6028, 0.9825],
         [1.0983, 0.6009, 0.6884]]])

Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [6]:
x = x.squeeze()
x

tensor([[0.6691, 0.6028, 0.9825],
        [1.0983, 0.6009, 0.6884]])
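
To make the shape changes in Exercises 1 and 2 explicit, here is a small sketch that uses zero tensors as stand-ins for the random ones. It also contrasts `squeeze()` with `squeeze(dim)`, since the two behave differently when a tensor has several size-1 axes.

```python
import torch

x = torch.zeros(2, 3)

a = x.unsqueeze(0)   # new axis at position 0
b = x[None]          # indexing with None inserts the same axis
print(a.shape, b.shape)      # torch.Size([1, 2, 3]) torch.Size([1, 2, 3])

# squeeze(dim) removes only the named axis (and only if its size is 1),
# whereas squeeze() with no argument removes every size-1 axis.
c = torch.zeros(1, 2, 1, 3)
print(c.squeeze(0).shape)    # torch.Size([2, 1, 3])
print(c.squeeze().shape)     # torch.Size([2, 3])
```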

Exercise 3: Create a random tensor of shape 5x3 in the interval [3, 7)



In [7]:
3 + torch.rand(5, 3) * 4

tensor([[5.1349, 6.5635, 3.3707],
        [5.0017, 6.0169, 5.4093],
        [4.3567, 6.4479, 3.1536],
        [6.1523, 5.5926, 3.8739],
        [5.4450, 5.9129, 3.8618]])
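
The scale-and-shift trick generalizes to any interval. The sketch below (the names `low` and `high` are illustrative) shows it next to the in-place `uniform_` method, which is another way to draw from the same range.

```python
import torch

low, high = 3.0, 7.0

# rand() is uniform on [0, 1), so scaling by the width and shifting by the
# lower bound gives values in [low, high), up to floating-point rounding.
u1 = low + (high - low) * torch.rand(5, 3)

# In-place alternative: fill an uninitialized tensor directly.
u2 = torch.empty(5, 3).uniform_(low, high)

print(u1.shape, u2.shape)
```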

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [8]:
torch.empty(3, 3).normal_(mean=0, std=1)

tensor([[ 0.9072, -0.3443, -1.6035],
        [-1.5514, -0.6184,  0.6269],
        [-0.7896,  0.1800,  0.6960]])
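
For comparison, here is a sketch of two other common ways to sample from a normal distribution. The seed, the sample size, and the non-standard mean and standard deviation are arbitrary choices made for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so repeated runs match

z = torch.randn(3, 3)                               # standard normal: mean 0, std 1
w = torch.normal(mean=2.0, std=0.5, size=(3, 3))    # general mean and std

# With many samples, the empirical statistics land close to the targets.
s = torch.randn(100_000)
print(s.mean().item(), s.std().item())
```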

Exercise 5: Retrieve the indices of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [9]:
x=torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(x)

tensor([[0],
        [1],
        [2],
        [4]])
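
`torch.nonzero` returns one row per nonzero element, which is why the result above has shape `(4, 1)`. When a flat index tensor is more convenient, for example to index back into the original tensor, `as_tuple=True` is one option, sketched here.

```python
import torch

x = torch.tensor([1., 1., 1., 0., 1.])

idx = torch.nonzero(x)                       # shape (4, 1): one row per nonzero element
flat = torch.nonzero(x, as_tuple=True)[0]    # shape (4,): plain 1-D index tensor

print(idx.squeeze(1).tolist())   # [0, 1, 2, 4]
print(flat.tolist())             # [0, 1, 2, 4]
print(x[flat])                   # tensor([1., 1., 1., 1.])
```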

Exercise 6: Create a random tensor of size (3,1) and then horizontally stack 4 copies together.



In [10]:
torch.rand(3,1).expand(3,4)

tensor([[0.6997, 0.6997, 0.6997, 0.6997],
        [0.9978, 0.9978, 0.9978, 0.9978],
        [0.7231, 0.7231, 0.7231, 0.7231]])
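
Note that `expand` returns a broadcasted view rather than real copies. If independent copies are needed (for instance, before writing into the result in place), `repeat` or `torch.cat` are possible alternatives. The sketch below compares the three; the variable names are illustrative.

```python
import torch

col = torch.rand(3, 1)

view = col.expand(3, 4)                  # broadcasted view, no new memory
copy = col.repeat(1, 4)                  # real copies, tiled along axis 1
stacked = torch.cat([col] * 4, dim=1)    # explicit horizontal concatenation

print(torch.equal(view, copy), torch.equal(copy, stacked))  # True True

# A stride of 0 along axis 1 shows that the view reuses the same column.
print(view.stride(), copy.stride())      # (1, 0) (4, 1)
```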

Exercise 7: Return the batch matrix-matrix product of two 3-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).



In [11]:
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
torch.bmm(a, b)

tensor([[[1.1177, 1.2343, 0.8260, 0.6389],
         [1.4307, 1.7397, 0.9692, 0.6695],
         [1.2812, 1.5368, 0.7245, 0.8129],
         [2.0858, 2.5200, 1.2252, 1.5841]],

        [[1.5380, 0.8778, 1.3582, 1.4259],
         [0.8942, 1.1413, 1.8700, 1.0598],
         [0.6426, 0.6504, 0.9762, 0.7668],
         [0.9441, 0.2584, 0.5938, 0.8525]],

        [[0.8570, 0.2576, 0.2282, 0.7122],
         [1.2919, 1.0194, 1.1824, 1.6904],
         [1.0640, 0.8416, 0.6785, 1.1602],
         [1.2399, 0.7481, 1.2319, 1.6526]]])

Exercise 8: Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).



In [12]:
a = torch.rand(3,4,5)
b = torch.rand(5,4)

torch.bmm(a, b.unsqueeze(dim=0).expand(a.size(0),*b.size()))

tensor([[[1.3584, 0.8566, 0.5523, 0.7423],
         [0.6357, 0.7033, 0.4480, 0.5575],
         [1.1052, 1.1546, 0.7353, 1.2712],
         [1.1788, 0.8494, 0.6418, 0.9413]],

        [[0.5842, 0.4991, 0.2322, 0.4098],
         [1.4426, 1.0526, 0.8833, 1.0285],
         [1.4320, 1.3883, 0.4764, 0.9220],
         [2.0314, 1.6980, 0.7990, 0.9259]],

        [[1.5100, 0.9447, 0.7619, 0.9589],
         [1.3434, 0.8458, 0.7323, 0.7511],
         [1.5395, 1.2664, 1.0572, 1.0265],
         [1.0347, 0.8481, 0.4898, 0.9296]]])
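
`torch.bmm` requires both arguments to be 3D, which is why `b` is expanded by hand above. As a sketch of an alternative, `torch.matmul` (the `@` operator) broadcasts a 2D matrix across the batch axis and should give the same result up to floating-point tolerance.

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Explicit version: give b a batch axis and expand it to match a.
via_bmm = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), -1, -1))

# Broadcasting version: matmul applies b to every matrix in the batch.
via_matmul = a @ b

print(via_matmul.shape)                                   # torch.Size([3, 4, 4])
print(torch.allclose(via_bmm, via_matmul, atol=1e-6))     # True
```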

Exercise 9: Create a 1x1 random tensor and extract the value inside it as a Python scalar rather than a tensor.

In [13]:
torch.rand(1, 1).item()

0.985990583896637
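
`.item()` works on any tensor that holds exactly one element, whatever its shape. The short sketch below confirms that the result is a plain Python `float`, and shows `.tolist()` as the counterpart for tensors with more than one element.

```python
import torch

t = torch.rand(1, 1)       # one element, shape (1, 1)
value = t.item()           # plain Python float, no tensor wrapper
print(type(value))         # <class 'float'>

many = torch.rand(2, 2)
nested = many.tolist()     # nested Python lists of floats
print(type(nested), len(nested))   # <class 'list'> 2
```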

Exercise 10: Create a two-element tensor $x$ that requires a gradient and holds [-2, 1]. Set $y=x_1^2 + x_2^2$ and compute the gradient of $y$ with respect to $x_1$ and $x_2$.

In [14]:

x = torch.tensor([-2.0, 1.0], requires_grad=True)
y = x[0]**2 + x[1]**2
y.backward()

x.grad

tensor([-4.,  2.])
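
The result can be checked against the analytic gradient: since $y = x_1^2 + x_2^2$, each partial derivative is $\partial y / \partial x_i = 2x_i$, which gives $(-4, 2)$ at $x = (-2, 1)$. A minimal sketch of that check, writing the same function as a sum:

```python
import torch

x = torch.tensor([-2.0, 1.0], requires_grad=True)
y = (x ** 2).sum()           # same function as x[0]**2 + x[1]**2
y.backward()

expected = 2 * x.detach()    # analytic gradient: dy/dx_i = 2 * x_i
print(x.grad)                          # tensor([-4.,  2.])
print(torch.equal(x.grad, expected))   # True
```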

Exercise 11: Check if CUDA is available (it should be if you choose the GPU in the Colab Runtime settings). If it is, move $x$ above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [15]:
if torch.cuda.is_available():
    x = x.to('cuda')
    print("x is now on:", x.device)
    
    y = torch.ones(x.shape, device='cpu')  # same shape as x, created on the CPU
    print("y is on:", y.device)
    
    try:
        z = x + y
    except Exception as e:
        print("Error:", e)
        
    # Solution: Move one of the tensors to the same device 
    y = y.to('cuda')
    z = x + y
    print(z)
else:
    print('cuda is not available!')



x is now on: cuda:0
y is on: cpu
Error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
tensor([-1.,  2.], device='cuda:0', grad_fn=<AddBackward0>)
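
A common way to avoid this class of error is to pick the device once and create every tensor on it. The following is a sketch of that pattern, not the only way to do it; the variable names are illustrative, and it falls back to the CPU when CUDA is unavailable.

```python
import torch

# Choose the device once, then create every tensor on it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([-2.0, 1.0], device=device)
y = torch.ones_like(x)     # inherits dtype and device from x

z = x + y                  # both operands live on the same device
print(z.device == x.device)    # True
print(z.cpu().tolist())        # [-1.0, 2.0]
```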
