In [1]:
import torch

# 1

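Here "rank" means the number of dimensions (axes) of a tensor, not the linear-algebra rank of a matrix:

- rank 1 tensor: a tensor with one dimension, a.k.a. a vector
- rank 2 tensor: a tensor with two dimensions, a.k.a. a matrix
- rank 3 tensor: a tensor with three dimensions, which can be seen as a vector of matrices or, equivalently, a matrix of vectors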
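
A minimal sketch of the "vector of matrices" view: stacking two rank 2 tensors with `torch.stack` gives a rank 3 tensor, and indexing its first dimension gives the matrices back. The names `m1`, `m2` and `stacked` are illustrative only.

```python
import torch

# Two 2x3 matrices (rank 2 tensors)
m1 = torch.ones(2, 3)
m2 = torch.zeros(2, 3)

# Stacking them along a new first dimension gives a rank 3 tensor:
# a "vector" of two 2x3 matrices
stacked = torch.stack([m1, m2])
print(stacked.shape)  # torch.Size([2, 2, 3])
print(stacked.ndim)   # 3

# Indexing the first dimension recovers a rank 2 tensor (a matrix)
print(stacked[0].ndim)              # 2
print(torch.equal(stacked[1], m2))  # True

# Indexing two dimensions gives a rank 1 tensor (a vector)
print(stacked[0, 1].shape)  # torch.Size([3])
```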

In [2]:
rank1_tensor = torch.rand(10)

In [3]:
rank2_tensor = torch.rand(8, 14)

In [4]:
rank3_tensor = torch.rand(5, 6, 7)

Notice that each shape matches the arguments passed to `torch.rand`:

In [6]:
rank1_tensor.shape, rank2_tensor.shape, rank3_tensor.shape

(torch.Size([10]), torch.Size([8, 14]), torch.Size([5, 6, 7]))
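
The rank can also be read directly from a tensor: `.ndim` is the number of dimensions (the length of `.shape`), and `.numel()` is the total number of elements, i.e. the product of the shape entries. A short check on freshly created tensors of the same shapes:

```python
import torch

rank1_tensor = torch.rand(10)
rank2_tensor = torch.rand(8, 14)
rank3_tensor = torch.rand(5, 6, 7)

for t in (rank1_tensor, rank2_tensor, rank3_tensor):
    # rank == number of dimensions == len(shape); numel() == product of the shape
    print(t.ndim, len(t.shape), t.numel())
# prints:
# 1 1 10
# 2 2 112
# 3 3 210
```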

# 2

Define the tensor `x` and use `requires_grad_()` because we want to compute gradients w.r.t. `x`:

In [6]:
x = torch.Tensor([-10, 10, 8]).requires_grad_()

Compute the sigmoid function:

In [7]:
sigmoid = 1 / (1+torch.exp(-x))
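
PyTorch also ships a built-in, differentiable `torch.sigmoid`. As a quick sanity check (a sketch, not part of the original exercise), the hand-written formula should agree with it up to floating-point rounding:

```python
import torch

x = torch.tensor([-10.0, 10.0, 8.0], requires_grad=True)

# Hand-written sigmoid, as in the cell above
manual = 1 / (1 + torch.exp(-x))

# PyTorch's built-in sigmoid
builtin = torch.sigmoid(x)

print(torch.allclose(manual, builtin))  # True
```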

Add up the outputs:

In [8]:
# s = sigmoid(x[0])+sigmoid(x[1])+sigmoid(x[2])
s = sigmoid.sum()

`s.backward()` computes the derivatives of `s` w.r.t. the tensor `x`, i.e. three partial derivatives: one w.r.t. `x[0]`, one w.r.t. `x[1]` and one w.r.t. `x[2]`.
Since `s` equals `sigmoid(x[0]) + sigmoid(x[1]) + sigmoid(x[2])`, only the second term depends on `x[1]`, so the derivative w.r.t. `x[1]` is just the derivative of `sigmoid(x[1])`; the first and third terms contribute zero. The same holds for `x[0]` and `x[2]`, so even though we used `sum()`, we get the correct per-element derivatives of the sigmoid:
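
The same argument as a short derivation, with $\sigma(x) = 1/(1+e^{-x})$ as in the code and $\delta_{ij}$ the Kronecker delta:

$$
s = \sum_{j=0}^{2} \sigma(x_j), \qquad
\frac{\partial s}{\partial x_i}
= \sum_{j=0}^{2} \frac{\partial \sigma(x_j)}{\partial x_i}
= \sum_{j=0}^{2} \sigma'(x_j)\,\delta_{ij}
= \sigma'(x_i)
= \frac{e^{-x_i}}{\left(1+e^{-x_i}\right)^2}
= \sigma(x_i)\bigl(1-\sigma(x_i)\bigr)
$$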

In [9]:
s.backward()

We can find the computed derivatives in `x.grad`:

In [11]:
x.grad

tensor([4.5396e-05, 4.5396e-05, 3.3524e-04])
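
These values can be checked against the closed-form derivative of the sigmoid, $\sigma'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr)$. The sketch below uses `float64` (an assumption of this check, not of the original cell) because in `float32`, `1 - sigmoid(10)` suffers from cancellation and the closed form would then differ from autograd by more than `allclose`'s default tolerance. Note also that `-10` and `10` get the same derivative: the sigmoid's derivative is symmetric around 0.

```python
import torch

# float64 avoids catastrophic cancellation in 1 - sigmoid(10)
x = torch.tensor([-10.0, 10.0, 8.0], dtype=torch.float64, requires_grad=True)
sigmoid = 1 / (1 + torch.exp(-x))
sigmoid.sum().backward()

# Closed-form derivative: sigma'(x) = sigma(x) * (1 - sigma(x)).
# detach() so this comparison is not recorded in the autograd graph.
sig = sigmoid.detach()
analytic = sig * (1 - sig)

print(torch.allclose(x.grad, analytic))  # True
```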

# 3

Define the matrices:

In [13]:
A = torch.Tensor([[1, 2, -3], [4, 5, 10]])
B = torch.Tensor([[10, 11, -2], [13, -3, 8]])

The Hadamard product is the element-wise product (each entry of `A` times the corresponding entry of `B`), implemented by the `*` operator:

In [14]:
A*B

tensor([[ 10.,  22.,   6.],
        [ 52., -15.,  80.]])
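
To make "element-wise" concrete, a sketch comparing `*` with its function form `torch.mul` and with an explicit double loop over the entries (the loop is for illustration only; the vectorized form is what one would normally use):

```python
import torch

A = torch.tensor([[1.0, 2.0, -3.0], [4.0, 5.0, 10.0]])
B = torch.tensor([[10.0, 11.0, -2.0], [13.0, -3.0, 8.0]])

# torch.mul is the function form of `*`
H = torch.mul(A, B)

# Explicit loop: entry (i, j) of the result is A[i, j] * B[i, j]
H_loop = torch.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        H_loop[i, j] = A[i, j] * B[i, j]

print(torch.equal(H, A * B), torch.equal(H, H_loop))  # True True
```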

Adding the scalar `a = 10` to `B` adds it to every element (PyTorch broadcasts the scalar to the shape of `B`):

In [15]:
a = 10
B + a

tensor([[20., 21.,  8.],
        [23.,  7., 18.]])
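
Broadcasting here means the scalar behaves as if it were a tensor of the same shape as `B` filled with `a`. A minimal check using `torch.full_like`:

```python
import torch

B = torch.tensor([[10.0, 11.0, -2.0], [13.0, -3.0, 8.0]])
a = 10

# Same result as adding a 2x3 tensor whose entries all equal a
print(torch.equal(B + a, B + torch.full_like(B, a)))  # True
```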

The transpose, available as `.T`, swaps rows and columns:

In [16]:
A.T, B.T

(tensor([[ 1.,  4.],
         [ 2.,  5.],
         [-3., 10.]]),
 tensor([[10., 13.],
         [11., -3.],
         [-2.,  8.]]))
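
A short sketch of what the transpose does to shapes and indices: a 2x3 matrix becomes 3x2, entry `(i, j)` of `A.T` is entry `(j, i)` of `A`, and transposing twice returns the original matrix.

```python
import torch

A = torch.tensor([[1.0, 2.0, -3.0], [4.0, 5.0, 10.0]])

At = A.T
print(At.shape)  # torch.Size([3, 2]): rows and columns are swapped

# Entry (i, j) of the transpose is entry (j, i) of the original
print(At[2, 1].item() == A[1, 2].item())  # True

# Transposing twice gives back the original matrix
print(torch.equal(A.T.T, A))  # True
```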

# 4

In [17]:
A = torch.Tensor([[1, 2, 3], [4, 5, 6]])
B = torch.Tensor([[10, 11], [-2, 0], [1, -3]])

In [18]:
A, B

(tensor([[1., 2., 3.],
         [4., 5., 6.]]),
 tensor([[10., 11.],
         [-2.,  0.],
         [ 1., -3.]]))

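Here, multiplying means matrix multiplication, implemented by the `@` operator (equivalent to `torch.matmul`). It requires the number of columns of `A` to equal the number of rows of `B` (here 3), and a 2x3 times 3x2 product gives a 2x2 matrix: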

In [19]:
A @ B

tensor([[ 9.,  2.],
        [36., 26.]])
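
Entry `(i, j)` of `A @ B` is the dot product of row `i` of `A` with column `j` of `B`. The sketch below rebuilds the product entry by entry with `torch.dot` and compares it with `@` and `torch.matmul`; exact equality is safe here because all entries are small integers.

```python
import torch

A = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B = torch.tensor([[10.0, 11.0], [-2.0, 0.0], [1.0, -3.0]])

C = A @ B  # (2, 3) @ (3, 2) -> (2, 2)

# Entry (i, j) is the dot product of row i of A with column j of B
C_loop = torch.empty(A.shape[0], B.shape[1])
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C_loop[i, j] = torch.dot(A[i, :], B[:, j])

print(torch.equal(C, C_loop), torch.equal(C, torch.matmul(A, B)))  # True True
```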

The Frobenius norm is the square root of the sum of the squares of all the entries of a matrix; `.norm()` computes it by default:

In [21]:
A.norm(), B.norm()

(tensor(9.5394), tensor(15.3297))
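
For `A`, the sum of squares is 1 + 4 + 9 + 16 + 25 + 36 = 91, and the square root of 91 is about 9.5394. The sketch below computes this by hand and compares it with `.norm()` and with `torch.linalg.matrix_norm`, whose default `ord` is `"fro"`:

```python
import torch

A = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Frobenius norm: square root of the sum of the squares of all entries
manual = torch.sqrt((A ** 2).sum())  # sqrt(91)

# torch.linalg.matrix_norm uses ord="fro" by default
library = torch.linalg.matrix_norm(A)

print(torch.allclose(manual, A.norm()), torch.allclose(manual, library))  # True True
```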