Below we will try to fit a logistic regression model step by step for the XOR problem.
Fill in the code wherever `_FILL_` is specified. For this model, $x_1$ and $x_2$ each take the value 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. Notice that $y$ is True (1) if $x_1 = 1$ and $x_2 = 0$, or if $x_1 = 0$ and $x_2 = 1$; $y$ is zero otherwise.
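
The claim that $y = x_1 + x_2 - 2x_1x_2$ reproduces XOR can be checked by enumerating the four possible inputs. This is only a small sanity check; the names `inputs`, `formula_values`, and `xor_values` are illustrative and not part of the exercise.

```python
# Enumerate all four binary inputs and compare the formula with Python's XOR operator.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Value of x1 + x2 - 2*x1*x2 for each input pair
formula_values = [x1 + x2 - 2 * x1 * x2 for x1, x2 in inputs]

# Bitwise XOR of the same pairs
xor_values = [x1 ^ x2 for x1, x2 in inputs]

print(formula_values)
print(xor_values)
```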

In [1]:
import torch.nn as nn
import numpy as np
import torch
# Don't fill this in
_FILL_ = '_FILL_'

In [2]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [3]:
# Define each parameter as a 1x1 tensor that requires a gradient for tracking.
# torch.Tensor(1, 1) returns uninitialized memory, so initialize explicitly with zeros.
alpha = nn.Parameter(torch.zeros(1, 1), requires_grad=True)
beta_1 = nn.Parameter(torch.zeros(1, 1), requires_grad=True)
beta_2 = nn.Parameter(torch.zeros(1, 1), requires_grad=True)

In [4]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Have z be beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2*x[0] + beta_1*x[1] + alpha

    # Push z through a nn.Sigmoid layer to get the p(y=1)
    a = nn.Sigmoid()(z)

    # Write the loss manually between y and a
    loss = -y*torch.log(a)-(1-y)*torch.log(1-a)

    # Get the loss gradients; the gradients with respect to alpha, beta_1, beta_2
    loss.backward()

    # Read the gradients computed by backward()
    grad_alpha = alpha.grad
    grad_beta_1 = beta_1.grad
    grad_beta_2 = beta_2.grad
    # The updates below are wrapped in torch.no_grad() because the parameters have requires_grad=True, but the update step itself should not be tracked by autograd
    with torch.no_grad():
        # Do an update for each parameter
        alpha -= lr*grad_alpha
        beta_1 -= lr*grad_beta_1
        beta_2 -= lr*grad_beta_2

        # Manually zero the gradients after updating weights
        alpha.grad.zero_()
        beta_1.grad.zero_()
        beta_2.grad.zero_()

  # Manually get the accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2*x[0] + beta_1*x[1] + alpha

      # Get a
      a = nn.Sigmoid()(z)

      # Get the loss
      loss += -y*torch.log(a)-(1-y)*torch.log(1-a)

      # Get the prediction given a
      prediction = (a > 0.5)
      y_pred.append(prediction)

    # Get the current accuracy over 4 points; make this a tensor
    y_pred = torch.cat(y_pred).view(4, 1)

    accuracy = ((y_pred == y_data).sum().item())/4
    loss = loss / 4

    # Print the accuracy and the loss
    # Use .item() to extract the value from the 1x1 loss tensor
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy))

Epoch: 0
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 1
Loss: 0.6931471228599548 Accuracy: 0.5
Epoch: 2
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 3
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 4
Loss: 0.6931472420692444 Accuracy: 0.5
Epoch: 5
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 6
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 7
Loss: 0.6931471824645996 Accuracy: 0.5
Epoch: 8
Loss: 0.6931472420692444 Accuracy: 0.5
Epoch: 9
Loss: 0.6931472420692444 Accuracy: 0.5
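
The loss printed above stays near 0.6931, which is $\ln 2$: the cross-entropy obtained when the model predicts 0.5 for every point. The sketch below, which assumes all parameters are exactly zero (an assumption made for illustration; the `bias` and `weights` names are not from the notebook), suggests why training does not move away from that value: the full-batch gradient of the mean loss is zero there, and since the logistic regression loss is convex, that point is already the best a single linear boundary can do on XOR.

```python
import math
import torch

x_data = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_data = torch.tensor([[0.], [1.], [1.], [0.]])

# Assumption for this sketch: every parameter starts at exactly zero.
bias = torch.zeros(1, requires_grad=True)
weights = torch.zeros(2, 1, requires_grad=True)

# Full-batch logistic regression forward pass
z = x_data @ weights + bias
a = torch.sigmoid(z)

# Mean binary cross-entropy over the four XOR points
loss = (-y_data * torch.log(a) - (1 - y_data) * torch.log(1 - a)).mean()
loss.backward()

print(loss.item(), math.log(2))   # the two values agree
print(bias.grad, weights.grad)    # both gradients are zero
```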


Exercise 1: Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.



In [5]:
# Create a 2D tensor
tensor_2d = torch.randn(2, 3)

# Add a dimension of size 1 inserted at the 0th axis
tensor_3d = tensor_2d.unsqueeze(0)
print(tensor_3d)

tensor([[[-0.7529,  0.1514,  2.0254],
         [ 0.9597, -0.1363, -1.5390]]])


Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [6]:
tensor_3d.squeeze(0)

tensor([[-0.7529,  0.1514,  2.0254],
        [ 0.9597, -0.1363, -1.5390]])
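
Printing shapes makes the effect of `unsqueeze` and `squeeze` easier to see than printing values. The sketch below repeats Exercises 1 and 2 on a fresh random tensor and also shows indexing with `None` as an equivalent way to insert the axis; the names `same_via_indexing` and `restored` are illustrative.

```python
import torch

tensor_2d = torch.randn(2, 3)

# Insert a size-1 axis at position 0
tensor_3d = tensor_2d.unsqueeze(0)
print(tensor_3d.shape)            # torch.Size([1, 2, 3])

# Indexing with None inserts the same axis
same_via_indexing = tensor_2d[None, :, :]
print(same_via_indexing.shape)    # torch.Size([1, 2, 3])

# Remove the size-1 axis again
restored = tensor_3d.squeeze(0)
print(restored.shape)             # torch.Size([2, 3])
```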

Exercise 3: Create a random tensor of shape 5x3 in the interval [3, 7)



In [7]:
# torch.rand samples uniformly from [0, 1); scaling by 4 and shifting by 3 covers [3, 7).
# torch.randn samples from a normal distribution, so its values are not bounded to the interval.
torch.rand(5, 3)*4+3
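
A uniform sample on an interval can be built from `torch.rand`, which draws from [0, 1), by scaling with the interval width and shifting by the lower bound. The sketch below is one way to do this and to check the bounds; `low`, `high`, and the variable names are illustrative, and `uniform_` is shown as an in-place alternative.

```python
import torch

low, high = 3.0, 7.0

# Scale [0, 1) by the interval width, then shift by the lower bound
uniform_sample = torch.rand(5, 3) * (high - low) + low

# Alternative: fill an empty tensor in place with uniform values
uniform_inplace = torch.empty(5, 3).uniform_(low, high)

# Every value should fall inside the requested interval
print(uniform_sample.min().item(), uniform_sample.max().item())
print(uniform_inplace.min().item(), uniform_inplace.max().item())
```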

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [8]:
torch.randn(5,3)

tensor([[-0.7708, -0.1990, -0.1236],
        [ 0.6805,  1.2815, -0.5577],
        [ 0.2453,  0.1268, -1.6074],
        [-0.8201, -0.3856, -0.2435],
        [-0.3635, -0.7848,  0.5128]])

Exercise 5: Retrieve the indexes of all the non-zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [9]:
tensor_tmp = torch.Tensor([1, 1, 1, 0, 1])

# Get the indexes of non-zero elements
non_zero_indexes = tensor_tmp.nonzero()
print(non_zero_indexes)

tensor([[0],
        [1],
        [2],
        [4]])
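
The result above is a 4x1 matrix with one row per non-zero element. If a flat vector of indexes is more convenient, `torch.nonzero` accepts `as_tuple=True`, which returns one 1D index tensor per dimension. A short sketch, with illustrative variable names:

```python
import torch

tensor_tmp = torch.tensor([1., 1., 1., 0., 1.])

# Default form: one row per non-zero element, one column per dimension
index_matrix = torch.nonzero(tensor_tmp)
print(index_matrix.shape)    # torch.Size([4, 1])

# Tuple form: one 1D index tensor per dimension (only one dimension here)
(index_vector,) = torch.nonzero(tensor_tmp, as_tuple=True)
print(index_vector)

# The tuple form can be used directly for indexing
print(tensor_tmp[index_vector])
```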


Exercise 6: Create a random tensor of size (3,1) and then horizontally stack 4 copies together.



In [10]:
tensor_tmp1=torch.rand(3,1)
torch.cat([tensor_tmp1]*4,dim=1)

tensor([[0.4392, 0.4392, 0.4392, 0.4392],
        [0.1415, 0.1415, 0.1415, 0.1415],
        [0.1523, 0.1523, 0.1523, 0.1523]])
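
Concatenating along `dim=1` is one of several ways to stack copies side by side. The sketch below compares it with `torch.hstack` and `Tensor.repeat`, which should give the same 3x4 result; the variable names are illustrative.

```python
import torch

column = torch.rand(3, 1)

# Concatenate four copies along the column axis
stacked_cat = torch.cat([column] * 4, dim=1)

# hstack concatenates along dim=1 for 2D inputs
stacked_hstack = torch.hstack([column] * 4)

# repeat tiles the tensor: 1 time along rows, 4 times along columns
stacked_repeat = column.repeat(1, 4)

print(stacked_cat.shape)    # torch.Size([3, 4])
```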

Exercise 7: Return the batch matrix-matrix product of two 3-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).



In [11]:
# Create two random matrices
a = torch.rand(3, 4, 5)
b = torch.rand(3, 5, 4)

# Calculate the batch matrix-matrix product
torch.bmm(a, b)

tensor([[[1.4059, 1.6204, 2.0652, 1.3883],
         [1.2535, 1.1014, 1.6485, 1.1453],
         [0.9741, 1.0626, 1.4171, 0.9004],
         [1.1459, 0.7250, 1.2905, 0.7139]],

        [[1.2585, 1.4272, 2.1518, 1.2795],
         [1.2566, 1.6002, 1.6887, 1.1969],
         [1.3057, 1.7112, 1.7848, 1.2747],
         [1.2043, 1.4618, 1.7711, 1.1125]],

        [[1.5032, 1.8352, 1.6611, 1.2697],
         [1.2045, 1.4156, 1.1663, 0.9869],
         [1.4525, 1.6554, 2.0159, 1.2713],
         [1.3805, 1.5274, 1.7990, 1.0454]]])

Exercise 8: Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).



In [12]:
# Create two random matrices
a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# torch.matmul broadcasts the 2D matrix b across the batch dimension of a
torch.matmul(a, b)

tensor([[[0.7508, 0.7015, 0.5671, 1.0457],
         [1.4711, 1.8979, 1.5328, 2.2001],
         [1.1700, 1.3335, 0.7247, 1.6213],
         [1.2827, 1.4309, 0.9070, 1.5566]],

        [[0.9822, 1.3576, 1.0460, 1.5155],
         [0.4780, 0.8131, 1.2121, 1.2729],
         [0.5999, 0.8415, 1.2803, 1.7439],
         [1.0176, 1.3580, 1.3349, 1.5591]],

        [[1.2904, 1.4802, 0.7767, 1.4424],
         [1.2067, 1.3716, 1.3407, 1.7021],
         [0.7770, 1.0073, 0.4465, 0.9254],
         [0.5888, 0.8920, 1.3792, 1.5051]]])
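
`torch.bmm` requires both inputs to be 3D with matching batch sizes, whereas `torch.matmul` broadcasts a 2D matrix across the batch. The sketch below checks that expanding `b` to a batch by hand and calling `torch.bmm` gives the same result as the broadcasted `torch.matmul`; `b_batched` and the product names are illustrative.

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# matmul broadcasts b over the batch dimension of a
broadcast_product = torch.matmul(a, b)

# Equivalent: give b an explicit batch dimension of size 3, then use bmm
b_batched = b.unsqueeze(0).expand(3, -1, -1)
bmm_product = torch.bmm(a, b_batched)

print(broadcast_product.shape)    # torch.Size([3, 4, 4])
```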

Exercise 9: Create a 1x1 random tensor and extract the value inside it as a plain Python scalar, not a tensor.

In [13]:
tensor_tmp2 = torch.rand(1,1)
scalar = tensor_tmp2.item()
print(scalar)

0.7896783351898193


Exercise 10: Create a 2x1 tensor that requires a gradient. Have this tensor, $x$, hold [-2, 1]. Set $y=x_1^2 + x_2^2$ and get the gradient of $y$ with respect to $x_1$ and then $x_2$.

In [14]:
x=torch.tensor([[-2.0],[1.0]],requires_grad=True)
y=x[0]**2+x[1]**2
y.backward()
print("gradient of y with x1:",x.grad[0].item())
print("gradient of y with x2:",x.grad[1].item())

gradient of y with x1: -4.0
gradient of y with x2: 2.0
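
These values agree with the analytic gradient: since $y = x_1^2 + x_2^2$, we have $\partial y / \partial x_i = 2x_i$, which gives $-4$ and $2$ at $x = [-2, 1]$. A small sketch comparing autograd with the closed form (`analytic_grad` is an illustrative name):

```python
import torch

x = torch.tensor([[-2.0], [1.0]], requires_grad=True)

# y = x_1^2 + x_2^2, written as a sum over all elements
y = (x ** 2).sum()
y.backward()

# Closed-form gradient: dy/dx_i = 2 * x_i
analytic_grad = 2 * x.detach()

print(x.grad)
print(analytic_grad)
```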


Exercise 11: Check if CUDA is available (it should be if you choose the GPU in the Runtime settings for Colab). If it is, move $x$ above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [15]:
# Check if cuda is available
device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.is_available():
    print("CUDA is available!")

x = x.to(device)
y = torch.randn(2, 1)
y = y.cpu()
y + x

CUDA is available!


RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

If the two tensors are not on the same device (both on the GPU or both on the CPU), PyTorch raises this error. To fix it, I need to move one of the tensors so that both are on the same device, for example with `y = y.to(device)` before adding.
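
A minimal sketch of the fix, written so that it also runs on a machine without a GPU: the second tensor is moved to whatever device the first one lives on before the addition. The name `result` is illustrative.

```python
import torch

# Use the GPU when present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([[-2.0], [1.0]]).to(device)

# New tensors are created on the CPU by default
y = torch.randn(2, 1)

# Move y to the same device as x before combining them
y = y.to(x.device)
result = x + y

print(result.device)
```

Using `x.device` rather than a hard-coded `"cuda"` string keeps the two tensors together regardless of where `x` ended up.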