Below, we fit a logistic regression model step by step to the XOR problem.
Fill in the code wherever `_FILL_` appears. In this problem, $x_1$ and $x_2$ each take the value 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. This is 1 (True) if $x_1 = 1$ and $x_2 = 0$, or if $x_1 = 0$ and $x_2 = 1$; otherwise $y$ is 0.
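As a quick sanity check (a small sketch, not part of the original exercise), the formula $y = x_1 + x_2 - 2x_1x_2$ can be compared against Python's bitwise XOR operator `^` on all four inputs:

```python
# Compare y = x1 + x2 - 2*x1*x2 with bitwise XOR for every 0/1 input pair
rows = []
for x1 in (0, 1):
    for x2 in (0, 1):
        y = x1 + x2 - 2 * x1 * x2
        assert y == x1 ^ x2  # the formula reproduces XOR exactly
        rows.append((x1, x2, y))

for x1, x2, y in rows:
    print(x1, x2, y)
```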

In [1]:
import torch.nn as nn
import torch

In [2]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [3]:
# Define each parameter as a one-element tensor that requires a gradient, so autograd tracks it
alpha = torch.rand(1, requires_grad = True)
beta_1 = torch.rand(1, requires_grad = True)
beta_2 = torch.rand(1, requires_grad = True)

In [4]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Have z be beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2 * x[0] + beta_1 * x[1] + alpha

    # Push z through a nn.Sigmoid layer to get the p(y=1)
    a = nn.Sigmoid()(z)

    # Write the loss manually between y and a
    loss = -y * torch.log(a) - (1 - y) * torch.log(1 - a)

    # Get the loss gradients; the gradients with respect to alpha, beta_1, beta_2
    loss.backward()

    # Manually update the parameters with gradient descent
    # The updates are wrapped in torch.no_grad() because the parameters have requires_grad=True, but the update itself should not be tracked by autograd
    with torch.no_grad():
        # Do an update for each parameter
        alpha -= lr * alpha.grad
        beta_1 -= lr * beta_1.grad
        beta_2 -= lr * beta_2.grad

        # Manually zero the gradients after updating weights
        alpha.grad = None
        beta_1.grad = None
        beta_2.grad = None

  # Manually get the accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2 * x[0] + beta_1 * x[1] + alpha

      # Get a
      a = nn.Sigmoid()(z)

      # Get the loss
      loss += (-y * torch.log(a) - (1 - y) * torch.log(1 - a)) / 4

      # Get the prediction given a
      y_pred.append(1 if a.item() > 0.5 else 0)

    # Get the current accuracy over 4 points
    y_pred = torch.tensor(y_pred)

    accuracy = (y_pred == y_data.squeeze()).sum() / 4

    # Print the accuracy and the loss
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy.item()))

Epoch: 0
Loss: 0.7430764436721802 Accuracy: 0.5
Epoch: 1
Loss: 0.7417216300964355 Accuracy: 0.5
Epoch: 2
Loss: 0.7404038906097412 Accuracy: 0.5
Epoch: 3
Loss: 0.7391220331192017 Accuracy: 0.5
Epoch: 4
Loss: 0.7378755211830139 Accuracy: 0.5
Epoch: 5
Loss: 0.736663281917572 Accuracy: 0.5
Epoch: 6
Loss: 0.7354844808578491 Accuracy: 0.5
Epoch: 7
Loss: 0.7343384027481079 Accuracy: 0.5
Epoch: 8
Loss: 0.7332241535186768 Accuracy: 0.5
Epoch: 9
Loss: 0.7321408987045288 Accuracy: 0.5
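The accuracy stays at 0.5 because XOR is not linearly separable: no line $\alpha + \beta_1 x_1 + \beta_2 x_2 = 0$ puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other, so logistic regression on $x_1, x_2$ alone can classify at most 3 of the 4 points correctly. The formula for $y$ hints at a fix: add the product $x_1x_2$ as a third feature. The sketch below does this with full-batch gradient descent; the feature map, zero initialization, learning rate of 1.0, 5000 steps, and the use of `binary_cross_entropy_with_logits` in place of the manual loss are all choices made for this illustration, not part of the exercise.

```python
import torch
import torch.nn.functional as F

# XOR inputs and targets (targets flattened to shape (4,))
xor_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
xor_y = torch.tensor([0., 1., 1., 0.])

# Assumed feature map [x1, x2, x1*x2]: the product term makes XOR linearly separable
features = torch.cat([xor_x, xor_x[:, :1] * xor_x[:, 1:]], dim=1)  # shape (4, 3)

w = torch.zeros(3, requires_grad=True)  # one weight per feature
b = torch.zeros(1, requires_grad=True)  # intercept, playing the role of alpha
lr = 1.0

for step in range(5000):
    z = features @ w + b  # logits for all 4 points at once (full batch)
    # Same binary cross-entropy as the manual loss above, averaged over the batch,
    # but computed directly from logits for numerical stability
    loss = F.binary_cross_entropy_with_logits(z, xor_y)
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad = None
        b.grad = None

with torch.no_grad():
    a = torch.sigmoid(features @ w + b)  # p(y=1) for each point
    y_pred = (a > 0.5).float()
    accuracy = (y_pred == xor_y).float().mean().item()

print(f'Loss: {loss.item():.4f} Accuracy: {accuracy}')
```

With the extra feature, a separating rule exists (for example $z = -1 + 2x_1 + 2x_2 - 4x_1x_2$ is negative at (0, 0) and (1, 1) and positive at the other two points), so gradient descent can drive the training loss down and reach full accuracy.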


In [5]:
import math
dtype = torch.float

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # loss is a scalar (0-dimensional) Tensor;
    # loss.item() gets the Python number held in it.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

99 1729.8634033203125
199 1205.185791015625
299 841.1913452148438
399 588.433349609375
499 412.75860595703125
599 290.5504455566406
699 205.4640655517578
799 146.17465209960938
899 104.82827758789062
999 75.97254180908203
1099 55.819454193115234
1199 41.734344482421875
1299 31.883529663085938
1399 24.989667892456055
1499 20.162181854248047
1599 16.779708862304688
1699 14.408369064331055
1799 12.745028495788574
1899 11.577713966369629
1999 10.758106231689453
Result: y = 0.04442664608359337 + 0.8437238931655884 x + -0.007664335425943136 x^2 + -0.09147883951663971 x^3
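Because the loss is a sum of squared errors and the model is linear in $a, b, c, d$, the optimum that gradient descent is moving toward can also be computed directly by linear least squares. A sketch using `torch.linalg.lstsq` (an addition, not part of the original cell; switching to float64 is an assumption made for numerical accuracy):

```python
import math
import torch

xs = torch.linspace(-math.pi, math.pi, 2000, dtype=torch.float64)
ys = torch.sin(xs)

# Design matrix with columns 1, x, x^2, x^3 (one row per sample)
A = torch.stack([torch.ones_like(xs), xs, xs ** 2, xs ** 3], dim=1)  # shape (2000, 4)

# Solve min ||A @ coef - ys||^2; the right-hand side is passed as a (2000, 1) matrix
coef = torch.linalg.lstsq(A, ys.unsqueeze(1)).solution.squeeze(1)
a_ls, b_ls, c_ls, d_ls = coef.tolist()
print(f'Least squares: y = {a_ls} + {b_ls} x + {c_ls} x^2 + {d_ls} x^3')
```

Since $\sin x$ is odd and the grid is symmetric about 0, the even-power coefficients $a$ and $c$ come out essentially zero. The gradient-descent coefficients printed above would be expected to approach these values with more iterations.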


Exercise 1: Create a 2D tensor and then insert a dimension of size 1 at axis 0.



In [6]:
x = torch.rand(2, 3)
x = x.unsqueeze(0)  # unsqueeze returns a new tensor; reassign so the next exercise sees shape (1, 2, 3)
x

tensor([[[0.0977, 0.6405, 0.2197],
         [0.0076, 0.6612, 0.6541]]])
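`unsqueeze` is not the only way to add an axis. A short sketch of two equivalent alternatives (a separate tensor `m` is used so the `x` above is left untouched):

```python
import torch

m = torch.rand(2, 3)

# Three equivalent ways to insert a size-1 axis at position 0
u1 = m.unsqueeze(0)
u2 = m[None]             # indexing with None inserts a new axis
u3 = m.reshape(1, 2, 3)  # explicit target shape

print(u1.shape, u2.shape, u3.shape)

# A negative index counts from the end: unsqueeze(-1) appends the new axis
print(m.unsqueeze(-1).shape)
```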

Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [7]:
x.squeeze(0)  # remove only axis 0: (1, 2, 3) -> (2, 3)

tensor([[0.0977, 0.6405, 0.2197],
        [0.0076, 0.6612, 0.6541]])
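Calling `squeeze()` without an argument removes every axis of size 1, which can drop more than intended; passing the axis removes only that one, and only if its size is 1. A small sketch on an illustrative shape:

```python
import torch

t = torch.zeros(1, 2, 1, 3)

print(t.squeeze().shape)   # every size-1 axis removed
print(t.squeeze(0).shape)  # only axis 0 removed
print(t.squeeze(1).shape)  # axis 1 has size 2, so nothing is removed
```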

Exercise 3: Create a random tensor of shape 5x3 with values drawn uniformly from the interval [3, 7).



In [8]:
x = torch.rand(5, 3) * 4 + 3  # rand is uniform on [0, 1); scale by 7 - 3 = 4, then shift by 3
x

tensor([[4.4375, 4.0747, 3.0686],
        [6.2739, 4.4818, 4.2337],
        [5.4632, 4.4997, 4.3507],
        [6.2013, 6.9733, 4.5900],
        [4.2644, 3.3225, 5.2225]])
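The scale-and-shift trick generalizes to any interval $[\text{low}, \text{high})$ as `rand * (high - low) + low`. PyTorch also offers an in-place sampler, `Tensor.uniform_`, which fills an existing tensor. A sketch of both (the names `low` and `high` are illustrative):

```python
import torch

low, high = 3.0, 7.0

# Scale-and-shift: rand is uniform on [0, 1), so this maps it to [low, high)
u_scaled = torch.rand(5, 3) * (high - low) + low

# In-place alternative: fill an uninitialized tensor with samples from U(low, high)
u_inplace = torch.empty(5, 3).uniform_(low, high)

print(u_scaled.shape, u_inplace.shape)
```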

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [9]:
x = torch.randn(5, 3)
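`torch.randn` samples from the standard normal distribution. `torch.normal(mean, std, size=...)` is a more explicit alternative that also allows other means and standard deviations. The sketch below uses it, then checks a large sample's empirical mean and standard deviation; the seed and sample size are arbitrary choices:

```python
import torch

torch.manual_seed(0)  # fixed seed so the check below is reproducible

# Explicit mean and std; equivalent in distribution to torch.randn(5, 3)
samples = torch.normal(0.0, 1.0, size=(5, 3))

# Sanity check on a large sample: empirical mean near 0, std near 1
big = torch.randn(100_000)
print(big.mean().item(), big.std().item())
```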

Exercise 5: Retrieve the indices of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [10]:
x = torch.Tensor([1, 1, 1, 0, 1])
x.nonzero()

tensor([[0],
        [1],
        [2],
        [4]])
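`nonzero()` returns an (N, ndim) tensor of coordinates, which is why each index above sits in its own row. For a 1-D tensor a flat list of indices is often more convenient; a sketch of two ways to get one:

```python
import torch

t = torch.Tensor([1, 1, 1, 0, 1])

# as_tuple=True returns one 1-D index tensor per dimension
idx = torch.nonzero(t, as_tuple=True)[0]
print(idx)

# torch.where with only a condition behaves like nonzero(as_tuple=True)
print(torch.where(t != 0)[0])
```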

Exercise 6: Create a random tensor of size (3, 1) and then horizontally stack 4 copies together.



In [11]:
x = torch.rand(3, 1)
x.repeat(1, 4)


tensor([[0.1503, 0.1503, 0.1503, 0.1503],
        [0.7086, 0.7086, 0.7086, 0.7086],
        [0.3414, 0.3414, 0.3414, 0.3414]])
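`repeat` copies the data. Concatenating copies along dimension 1 gives the same result, and `expand` produces the same values as a view that shares the original memory. A sketch comparing the three:

```python
import torch

col = torch.rand(3, 1)

r = col.repeat(1, 4)            # copies the data into a new (3, 4) tensor
h = torch.cat([col] * 4, dim=1)  # concatenates 4 copies along dim 1 (horizontal stack)
e = col.expand(3, 4)            # (3, 4) view sharing col's memory; no copy

print(torch.equal(r, h), torch.equal(r, e))
```

Because several elements of an expanded view point at the same memory location, it is safest to treat `e` as read-only and `clone()` it before writing to it.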

Exercise 7: Return the batch matrix-matrix product of two 3D tensors (a = torch.rand(3, 4, 5), b = torch.rand(3, 5, 4)).



In [12]:
x = torch.rand(3, 4, 5)
y = torch.rand(3, 5, 4)
torch.bmm(x, y)

tensor([[[1.6394, 0.6833, 1.5837, 1.5976],
         [1.1472, 0.3890, 1.0009, 1.2655],
         [1.0815, 0.5244, 1.0628, 1.4311],
         [1.9040, 1.2040, 1.4985, 1.4086]],

        [[0.9391, 1.2059, 1.3412, 0.8154],
         [1.3120, 1.4383, 1.8123, 1.1125],
         [0.5450, 1.4864, 1.3260, 0.6719],
         [0.6573, 1.7938, 1.7714, 1.0374]],

        [[0.9832, 0.5582, 1.1056, 1.0532],
         [1.9757, 1.1756, 2.0718, 1.7928],
         [1.2413, 0.8302, 1.0994, 0.5718],
         [1.7323, 0.9673, 1.6841, 1.2645]]])
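`torch.bmm` multiplies matching entries of the two batches, so `out[i] = x[i] @ y[i]`. A sketch that checks this against an explicit loop and against the equivalent `einsum` contraction:

```python
import torch

x = torch.rand(3, 4, 5)
y = torch.rand(3, 5, 4)

out = torch.bmm(x, y)  # shape (3, 4, 4)

# Explicit per-batch matrix products
manual = torch.stack([x[i] @ y[i] for i in range(x.size(0))])

# The same contraction with einsum (b = batch, i/j/k = matrix axes)
via_einsum = torch.einsum('bij,bjk->bik', x, y)

print(out.shape, torch.allclose(out, manual), torch.allclose(out, via_einsum))
```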

Exercise 8: Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (a = torch.rand(3, 4, 5), b = torch.rand(5, 4)).



In [13]:
x = torch.rand(3, 4, 5)
y = torch.rand(5, 4)
torch.bmm(x, y.unsqueeze(0).expand(x.size(0), *y.size()))  # give y a batch axis and expand it (no copy) to (3, 5, 4)


tensor([[[0.3375, 0.2907, 0.3181, 0.1812],
         [1.1075, 1.0032, 1.0962, 1.0600],
         [1.4107, 1.2923, 1.7551, 1.4518],
         [1.1264, 0.7016, 1.4619, 0.9141]],

        [[1.1713, 0.7899, 1.4845, 0.5677],
         [1.5807, 1.1571, 1.4543, 1.1874],
         [1.3340, 0.9715, 1.3254, 0.9939],
         [1.3030, 0.6757, 1.9461, 1.0246]],

        [[0.9775, 0.8165, 1.1316, 0.9601],
         [0.9570, 0.9399, 0.8771, 0.8693],
         [0.9590, 0.6913, 0.9538, 0.7193],
         [1.1810, 0.8096, 1.4295, 1.1040]]])
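`torch.bmm` requires both inputs to be 3D, hence the `unsqueeze`/`expand` step. `torch.matmul` (the `@` operator) broadcasts a 2D right operand across the batch automatically. A sketch confirming both give the same result:

```python
import torch

x = torch.rand(3, 4, 5)
y = torch.rand(5, 4)

# bmm: add a batch axis to y and expand it to match x's batch size
out_bmm = torch.bmm(x, y.unsqueeze(0).expand(x.size(0), *y.size()))

# matmul broadcasts the 2-D y across the batch dimension
out_matmul = x @ y

print(out_matmul.shape, torch.allclose(out_bmm, out_matmul))
```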

Exercise 9: Create a 1x1 random tensor and get the value inside it as a Python scalar, not a tensor.

In [14]:
x = torch.rand(1, 1)
x.item()

0.31901270151138306
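`item()` works on any tensor with exactly one element and returns a plain Python number whose type follows the tensor's dtype. For tensors with more elements it raises an error, and `tolist()` is the usual way to get Python values out. A short sketch:

```python
import torch

x = torch.rand(1, 1)
v = x.item()                 # one-element float tensor -> Python float
print(type(v))

n = torch.tensor([[7]]).item()  # one-element integer tensor -> Python int
print(type(n), n)

# For multi-element tensors, tolist() returns nested Python lists
print(torch.rand(2, 2).tolist())
```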

Exercise 10: Create a 2x1 tensor $x$ that requires a gradient and holds [-2, 1]. Set $y = x_1^2 + x_2^2$ and get the gradient of $y$ with respect to $x_1$ and $x_2$.

In [15]:
x = torch.tensor([[-2.0], [1.0]], requires_grad=True)  # the values the exercise asks for
y = x[0] ** 2 + x[1] ** 2
y.backward()
x.grad

tensor([[-4.],
        [ 2.]])
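Analytically, $\partial y/\partial x_1 = 2x_1$ and $\partial y/\partial x_2 = 2x_2$, which gives $-4$ and $2$ at $x = [-2, 1]$. A sketch that checks this with `torch.autograd.grad`, which returns the gradients directly instead of accumulating them into `.grad`:

```python
import torch

x = torch.tensor([[-2.0], [1.0]], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2 as a scalar

(grad,) = torch.autograd.grad(y, x)  # one gradient per input, returned as a tuple
print(grad.flatten().tolist())

# Individual components dy/dx1 and dy/dx2
dy_dx1, dy_dx2 = grad[0].item(), grad[1].item()
print(dy_dx1, dy_dx2)
```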

Exercise 11: Check whether CUDA is available (it should be if you select a GPU in Colab's Runtime settings). If it is, move $x$ above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [16]:
x = torch.randn(2, 1, requires_grad=True)
if torch.cuda.is_available():
    x = x.cuda()
y = torch.randn(2, 1)
y = y.cpu()
y + x


RuntimeError: ignored
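The error arises because both operands of an operation must live on the same device. One way to fix it, sketched below, is to move one tensor onto the other's device before adding. Selecting the device once and using `x.device` instead of a hard-coded `'cuda'` keeps the cell working on machines without a GPU; this device-selection pattern is a common convention, not the original author's solution.

```python
import torch

# Pick the GPU if present, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(2, 1, requires_grad=True, device=device)
y = torch.randn(2, 1)  # created on the CPU by default

# Move y to wherever x lives so both operands share a device
z = y.to(x.device) + x
print(z.device, z.shape)
```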