# Arrays with Numpy and PyTorch

This course has so far been (mostly) content to use sparse representations of text data. Next week, that is going to change: we will start looking at ways to transform text into dense vectors.

Today's lecture is going to look at Numpy and PyTorch's arrays (or, in PyTorch language, "tensors"). This is getting ahead of ourselves a bit, since we haven't introduced neural methods yet, but it will be helpful to see some of the "building blocks" in code before we introduce the "ideas" behind neural models themselves.

In [39]:
import numpy as np
import torch

## Dot product and matmul in Numpy

Let's look at a one-dimensional array `a`:

In [40]:
a = np.arange(10)

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

and a one-dimensional array `b`:

In [41]:
b = np.tile(np.array([0, 1]), 5)

b

array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

If we take the dot product of these two arrays, what are we doing?

In [42]:
a_dot_b = np.dot(a, b)
a_dot_b.item()

25

In [43]:
def dot(a, b):
    assert a.shape == b.shape and len(a.shape) == 1
    total = 0
    element_prod = []
    for i in range(a.shape[0]):
        prod = (a[i]*b[i]).item()
        element_prod.append(prod)
        total += prod
    return total, element_prod

a_dot_b_loop, elements = dot(a, b)

assert a_dot_b == a_dot_b_loop

elements

[0, 1, 0, 3, 0, 5, 0, 7, 0, 9]

In [44]:
print(a)
print(b)

[0 1 2 3 4 5 6 7 8 9]
[0 1 0 1 0 1 0 1 0 1]


The dot product of `a` and `b` is the sum of the products of each element in `a` with the corresponding element in `b`.

Writing this as a loop makes that explicit: we iterate over the corresponding elements of the two vectors, multiply each pair together, and accumulate the total. The list `element_prod` (returned as `elements` above) holds these elementwise products before they are summed.
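The same computation can be written without a Python loop. The sketch below (reusing the `a` and `b` from above) uses NumPy's elementwise `*` to get the values that `element_prod` collected, then sums them, and checks the result against both `np.dot` and the `@` operator.

```python
import numpy as np

a = np.arange(10)
b = np.tile(np.array([0, 1]), 5)

# Elementwise product: the same values the loop collected in element_prod
elementwise = a * b

# Summing the elementwise products gives the dot product
a_dot_b_vectorized = elementwise.sum()

assert a_dot_b_vectorized == np.dot(a, b) == a @ b
print(elementwise.tolist())  # [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]
print(a_dot_b_vectorized)    # 25
```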

What about two-dimensional arrays?

In [45]:
A = np.repeat(a[None, :], 3, axis=0)
print(A.shape)
A

(3, 10)


array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [46]:
B = np.repeat(b[None, :], 3, axis=0)
print(B.shape)
B

(3, 10)


array([[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])

Here, we've repeated vector `a` and `b` three times on the row (0th) axis to build `A` and `B`, respectively.

Can we multiply these matrices together in the same way as we did with dot product above?

In [47]:
try:
    np.dot(A, B)
except ValueError: 
    print("Oops.")

Oops.


Why doesn't this work?

Numpy's `dot` function performs matrix multiplication (or "matmul") when it is given two 2-d arrays.

In matrix multiplication, each entry of the result is the dot product of a row of A with a column of B. This means that, for matrix multiplication to work, the number of columns in A has to be the same as the number of rows in B. Here both `A` and `B` have shape (3, 10), so the 10 columns of `A` don't line up with the 3 rows of `B`.

![](images/matrix-multiplication.gif)

[(Image source)](https://notesbylex.com/)
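The compatibility rule can be written as a tiny shape check. `matmul_output_shape` below is a hypothetical helper for illustration only (NumPy performs this check internally): an (n, d) matrix times a (d, m) matrix gives an (n, m) matrix, and mismatched inner dimensions are an error.

```python
import numpy as np

def matmul_output_shape(shape_a, shape_b):
    """Hypothetical helper: shape of a 2-d matmul, or ValueError if incompatible."""
    (n, d_a), (d_b, m) = shape_a, shape_b
    if d_a != d_b:
        raise ValueError(f"inner dimensions differ: {d_a} != {d_b}")
    return (n, m)

print(matmul_output_shape((3, 10), (10, 3)))   # (3, 3)
print(matmul_output_shape((10, 3), (3, 10)))   # (10, 10)

# The same shapes as A and B above: 10 columns vs. 3 rows, so this fails
try:
    matmul_output_shape((3, 10), (3, 10))
except ValueError as err:
    print("Incompatible:", err)

# Agrees with the shape NumPy actually produces
assert matmul_output_shape((3, 10), (10, 3)) == (np.ones((3, 10)) @ np.ones((10, 3))).shape
```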

Knowing this, what can we do to make these two matrices "compatible" with one another?

In [48]:
B.T

array([[0, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 1, 1]])

In [49]:
B.T.shape

(10, 3)

In [50]:
A.shape[1] == B.T.shape[0]

True

As the shape shows, `.T` transposes the axes of `B`, so `B.T` has 10 rows to match the 10 columns of `A`. Now the multiplication works:

In [51]:
AB = A @ B.T
AB 

array([[25, 25, 25],
       [25, 25, 25],
       [25, 25, 25]])

The `@` operator is matmul, the same as calling `np.dot` on two 2-d arrays or using `np.matmul`. The new matrix `AB` has 3 rows and 3 columns: entry `AB[i, j]` is the dot product of row `i` of `A` with column `j` of `B.T`.
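To confirm the entry-by-entry description, the sketch below rebuilds `A` and `B` as above, checks that `@`, `np.dot`, and `np.matmul` agree for these 2-d arrays, and compares every entry of `AB` with an explicit dot product.

```python
import numpy as np

a = np.arange(10)
b = np.tile(np.array([0, 1]), 5)
A = np.repeat(a[None, :], 3, axis=0)
B = np.repeat(b[None, :], 3, axis=0)

AB = A @ B.T

# The three spellings of 2-d matrix multiplication agree
assert np.array_equal(AB, np.dot(A, B.T))
assert np.array_equal(AB, np.matmul(A, B.T))

# Entry (i, j) is the dot product of row i of A and column j of B.T
for i in range(AB.shape[0]):
    for j in range(AB.shape[1]):
        assert AB[i, j] == np.dot(A[i, :], B.T[:, j])
```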

In [52]:
AB.shape

(3, 3)

But!

In [53]:
B.T @ A

array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27]])

Matrix multiplication is not commutative. `A @ B.T` gives us a 3x3 array, but `B.T @ A` is 10x10. In the latter, we take dot products of the rows of `B.T` with the columns of `A`. That is not the same as what we were doing before!
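Non-commutativity is not only about shapes. Even when both products exist and have the same shape, they can differ. In this small sketch (the matrices `X` and `Y` are made up for illustration), `Y` swaps two things: multiplying on the right swaps the columns of `X`, while multiplying on the left swaps its rows.

```python
import numpy as np

# Two small square matrices: both products exist and have the same shape
X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[0, 1],
              [1, 0]])

XY = X @ Y  # swaps the columns of X
YX = Y @ X  # swaps the rows of X

print(XY.tolist())  # [[2, 1], [4, 3]]
print(YX.tolist())  # [[3, 4], [1, 2]]
assert not np.array_equal(XY, YX)
```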

But why do we care about this?


Let's implement matmul using loops and our `dot` function:

In [54]:
def matmul2d(a, b):
    assert len(a.shape) == len(b.shape) == 2
    a_n, a_d = a.shape
    b_n, b_d = b.shape
    assert a_d == b_n

    mat = np.zeros((a_n, b_d))
    for i in range(a_n):
        for j in range(b_d):
            ai_bj, _ = dot(a[i, :], b[:, j])
            mat[i, j] = ai_bj
    
    return mat

assert np.allclose(matmul2d(A, B.T), A @ B.T)
assert np.allclose(matmul2d(B.T, A), B.T @ A)


This is a computationally expensive operation (there's a whole [article on the time complexity of matrix multiplication on Wikipedia](https://en.wikipedia.org/wiki/Computational_complexity_of_matrix_multiplication), if you're interested). Our simple implementation uses three nested loops (two in `matmul2d`, one in `dot`).

In practical terms, Numpy's matmul is very fast compared to Python loops, since it is both optimized and implemented in compiled code (C) rather than run by the Python interpreter. So if you want to get, for example, the pairwise cosine similarity of a bunch of vectors, you can do it quickly and without explicit loops:
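One way to see the cost of the three nested loops is to count how many scalar multiplications they perform. `matmul_multiplication_count` is a hypothetical helper for illustration: for an (n, d) matrix times a (d, m) matrix, the loops run n * m * d times, so for square n x n matrices the work grows with the cube of n.

```python
def matmul_multiplication_count(a_shape, b_shape):
    """Hypothetical helper: scalar multiplications done by the naive triple loop.

    Two loops in matmul2d (rows of a, columns of b) and one loop in dot
    (the shared inner dimension).
    """
    a_n, a_d = a_shape
    b_n, b_d = b_shape
    assert a_d == b_n
    return a_n * b_d * a_d

print(matmul_multiplication_count((3, 10), (10, 3)))            # 90
print(matmul_multiplication_count((10, 3), (3, 10)))            # 300
print(matmul_multiplication_count((1000, 1000), (1000, 1000)))  # 1000000000
```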

In [55]:
fake_counts = np.random.randint(30, size=(3, 10))
fake_counts

array([[28, 20,  4, 15, 24,  1, 14,  9, 18, 12],
       [24,  2, 11,  5,  5, 19, 23,  0, 14,  4],
       [24, 21, 18, 25,  4, 22, 17, 28,  5,  3]])

In [60]:
l2_fake_counts = np.sqrt(
    np.sum(fake_counts**2, axis=1),
)[:, None] # What does this do?

print(l2_fake_counts.shape)

normed_fake_counts = fake_counts / l2_fake_counts

pairwise_nfc = normed_fake_counts @ normed_fake_counts.T

pairwise_nfc


(3, 1)


array([[1.        , 0.70562835, 0.72552703],
       [0.70562835, 1.        , 0.71975844],
       [0.72552703, 0.71975844, 1.        ]])

In [59]:
l2_fake_counts = np.sqrt(
    np.sum(fake_counts**2, axis=1),
)[:, None]
l2_fake_counts

array([[52.41183073],
       [43.0464865 ],
       [59.77457654]])
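As a sanity check on the pairwise result, the sketch below copies the `fake_counts` values printed above (so it does not depend on the random seed), computes the row norms with `np.linalg.norm(..., keepdims=True)` as an alternative to indexing with `[:, None]`, and compares one entry of the matrix with the cosine formula applied to a single pair of rows.

```python
import numpy as np

# The count matrix printed above, copied so this block is self-contained
fake_counts = np.array([
    [28, 20,  4, 15, 24,  1, 14,  9, 18, 12],
    [24,  2, 11,  5,  5, 19, 23,  0, 14,  4],
    [24, 21, 18, 25,  4, 22, 17, 28,  5,  3],
])

# keepdims=True keeps the reduced axis, giving shape (3, 1)
norms = np.linalg.norm(fake_counts, axis=1, keepdims=True)
assert norms.shape == (3, 1)

normed = fake_counts / norms  # (3, 10) / (3, 1) broadcasts across columns
pairwise = normed @ normed.T

# Spot-check one entry against cos(u, v) = u . v / (|u| |v|)
u, v = fake_counts[0], fake_counts[1]
cos_01 = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
assert np.isclose(pairwise[0, 1], cos_01)

# Every vector has cosine similarity 1 with itself
assert np.allclose(np.diag(pairwise), 1.0)
```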

We'll talk more about how matrix multiplication fits into neural networks later on, but consider these examples:

- Permuting the columns or rows of a matrix by multiplying it with a permutation matrix

In [61]:
C = np.arange(15).reshape((5, 3))
C

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [62]:
P = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0]
])

In [63]:
C @ P # Swap 2nd and 3rd columns

array([[ 0,  2,  1],
       [ 3,  5,  4],
       [ 6,  8,  7],
       [ 9, 11, 10],
       [12, 14, 13]])

In [64]:
C.T

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])

In [65]:
P @ C.T # Swap second and third rows

array([[ 0,  3,  6,  9, 12],
       [ 2,  5,  8, 11, 14],
       [ 1,  4,  7, 10, 13]])
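A permutation matrix can also be built from a desired index order by selecting columns or rows of the identity matrix. In this sketch the order `[2, 0, 1]` is arbitrary, chosen so that (unlike the swap above) the permutation is not its own inverse. The results are checked against NumPy's fancy indexing.

```python
import numpy as np

C = np.arange(15).reshape((5, 3))
order = [2, 0, 1]  # arbitrary new order, chosen for illustration

# Column j of np.eye(3)[:, order] is the unit vector e_{order[j]},
# so column j of C @ P_cols is column order[j] of C
P_cols = np.eye(3, dtype=int)[:, order]
assert np.array_equal(C @ P_cols, C[:, order])

# Row i of np.eye(3)[order] is e_{order[i]}, so left-multiplying reorders rows
P_rows = np.eye(3, dtype=int)[order]
assert np.array_equal(P_rows @ C.T, C.T[order])

# The row version is the transpose of the column version
assert np.array_equal(P_rows, P_cols.T)
```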

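- Multiplying by the identity matrix `np.eye(3)`, which has ones on the diagonal and zeros elsewhere, leaves a matrix unchanged: it is the permutation that keeps every column where it is. (The result is a float array because `np.eye` returns floats.)

In [68]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [66]:
C @ np.eye(3)

array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.],
       [12., 13., 14.]])

- Imagine you have learned (or just made up) some coefficients from a classification or regression model that you want to apply to a matrix of shape (samples, features):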

In [69]:
fake_counts

array([[28, 20,  4, 15, 24,  1, 14,  9, 18, 12],
       [24,  2, 11,  5,  5, 19, 23,  0, 14,  4],
       [24, 21, 18, 25,  4, 22, 17, 28,  5,  3]])

In [70]:
coefficients = np.array([
    [.1, .1, .3, 0.0, .2, .1, .1, .1, 0.0, 0.0]
])

bias = .05

predictions = fake_counts @ coefficients.T + bias

predictions

array([[13.25],
       [11.15],
       [17.45]])
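Each of these predictions is a single dot product plus the bias. The sketch below copies `fake_counts` from the output above, checks each row against an explicit `np.dot`, and shows that using the 1-d coefficient vector directly gives a flat array of shape (3,) instead of a (3, 1) column.

```python
import numpy as np

# Values copied from the fake_counts output above
fake_counts = np.array([
    [28, 20,  4, 15, 24,  1, 14,  9, 18, 12],
    [24,  2, 11,  5,  5, 19, 23,  0, 14,  4],
    [24, 21, 18, 25,  4, 22, 17, 28,  5,  3],
])
coefficients = np.array([[.1, .1, .3, 0.0, .2, .1, .1, .1, 0.0, 0.0]])
bias = .05

predictions = fake_counts @ coefficients.T + bias  # shape (3, 1)

# Each prediction is one dot product: sample features . coefficients + bias
for i, row in enumerate(fake_counts):
    assert np.isclose(predictions[i, 0], np.dot(row, coefficients[0]) + bias)

# A 1-d coefficient vector gives a 1-d result of shape (3,)
predictions_1d = fake_counts @ coefficients[0] + bias
print(predictions_1d.shape)  # (3,)
assert np.allclose(predictions_1d, predictions[:, 0])
```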

- Or, what if you have *many* regression/classification models that you want to apply to your samples at the same time?

In [73]:
coefficients2d = np.random.randn(5, fake_counts.shape[1])

bias1d = np.random.randn(1, 5)

coefficients2d.shape

(5, 10)

In [72]:
pred_2d = fake_counts @ coefficients2d.T + bias1d
print("Input shape: ", fake_counts.shape)
print("Output shape: ", pred_2d.shape)
pred_2d

Input shape:  (3, 10)
Output shape:  (3, 5)


array([[-2.40187170e+01,  2.76790507e+01,  8.45469068e+00,
         4.48748455e+01, -8.68258241e+00],
       [-1.97325653e+01, -1.28759558e+01, -3.57315127e-02,
        -8.99755510e+00, -4.87305601e+00],
       [-8.76838795e+01,  1.42565908e+01,  5.90181244e+00,
        -4.07572262e+01,  5.59406733e+00]])

Think about what's going on above. What do we have in the five columns of `pred_2d`? 

This is what is going on in a "fully connected" neural network layer: we have a weight matrix (`coefficients2d` here) that we multiply with some input, and then add a bias. It is "fully connected" because each feature in our input is "connected" to each feature in our output: in the example above, each of the five values in a row of `pred_2d` depends on all ten features of the corresponding row of `fake_counts`. This is like running a bunch of different linear regression models at the same time. More on this soon!
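To make the "many models at once" reading concrete, this sketch uses seeded random data in place of `fake_counts`, `coefficients2d`, and `bias1d` (the names `X`, `W`, and `b` are our own) and checks that column `k` of the output equals running model `k` on its own.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the check is reproducible
X = rng.integers(0, 30, size=(3, 10))  # (samples, features), like fake_counts
W = rng.standard_normal((5, 10))       # one row of coefficients per model
b = rng.standard_normal(5)             # one bias per model

pred = X @ W.T + b                     # (3, 10) @ (10, 5) -> (3, 5)

# Column k holds the predictions of model k on every sample
for k in range(W.shape[0]):
    assert np.allclose(pred[:, k], X @ W[k] + b[k])
```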

## Introducing `torch`

PyTorch is a powerful machine learning library. Because most people who use `torch` are already familiar with `numpy`, `torch` implements many of the same functions and methods.

The `torch.tensor` is directly analogous to the `np.array`:

In [74]:
torch.arange(10)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [75]:
torch.allclose(torch.tensor(np.arange(10)), torch.arange(10))

True

`Tensor`s, like numpy `array`s, have a `shape` attribute, but in PyTorch it is just an alias for the `size()` method, which can be confusing.

In [76]:
a = torch.tensor(a)
a.shape

torch.Size([10])

In [77]:
a.size()

torch.Size([10])
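A short sketch of a few related details: `.shape` and `.size()` both accept an index for a single dimension, `torch.tensor` copies the NumPy data while `torch.from_numpy` shares memory with the original array, and `.numpy()` converts a CPU tensor back.

```python
import numpy as np
import torch

arr = np.arange(10)

t_copy = torch.tensor(arr)        # copies the data
t_shared = torch.from_numpy(arr)  # shares memory with arr

# .shape and .size() return the same torch.Size
assert t_copy.shape == t_copy.size() == torch.Size([10])
assert t_copy.size(0) == t_copy.shape[0] == 10

arr[0] = 100
print(t_copy[0].item())    # 0   (independent copy)
print(t_shared[0].item())  # 100 (same underlying buffer)

# .numpy() converts a CPU tensor back to a NumPy array
assert isinstance(t_copy.numpy(), np.ndarray)
```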

Indexing with `None` and matrix multiplication with `@` work just as they do in Numpy:

In [78]:
space1d = torch.linspace(0, 1, 5)
space2d = space1d[:, None] @ space1d[None, :]
space2d

tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0625, 0.1250, 0.1875, 0.2500],
        [0.0000, 0.1250, 0.2500, 0.3750, 0.5000],
        [0.0000, 0.1875, 0.3750, 0.5625, 0.7500],
        [0.0000, 0.2500, 0.5000, 0.7500, 1.0000]])
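Multiplying a (5, 1) column by a (1, 5) row like this is an outer product. This sketch shows that `[:, None]` is equivalent to `.unsqueeze(1)` and that `torch.outer` computes the same matrix directly.

```python
import torch

space1d = torch.linspace(0, 1, 5)

# [:, None] and .unsqueeze(1) both add a length-1 axis: (5,) -> (5, 1)
col = space1d[:, None]
assert torch.equal(col, space1d.unsqueeze(1))

# (5, 1) @ (1, 5) is an outer product, which torch.outer computes directly
space2d = col @ space1d[None, :]
assert torch.allclose(space2d, torch.outer(space1d, space1d))
print(space2d.shape)  # torch.Size([5, 5])
```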

As in Numpy, you can get a boolean array like so:

In [79]:
space2d > .2

tensor([[False, False, False, False, False],
        [False, False, False, False,  True],
        [False, False,  True,  True,  True],
        [False, False,  True,  True,  True],
        [False,  True,  True,  True,  True]])

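So if we want to set all elements at or below some threshold to 0.0, we can convert the boolean mask to floats and multiply it with `space2d` elementwise. Note that this uses `*`, not `@`: matrix multiplication with the mask would mix values from different positions instead of zeroing them out.

In [80]:
mask = (space2d > .2).type(torch.float32)
space2d * mask

tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.2500],
        [0.0000, 0.0000, 0.2500, 0.3750, 0.5000],
        [0.0000, 0.0000, 0.3750, 0.5625, 0.7500],
        [0.0000, 0.2500, 0.5000, 0.7500, 1.0000]])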
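For zeroing out entries, the mask has to be applied elementwise (with `*`), since `@` performs matrix multiplication. PyTorch also has built-ins for this. The sketch below compares elementwise multiplication with `torch.where` and `Tensor.masked_fill`; the threshold of 0.2 matches the example above.

```python
import torch

space1d = torch.linspace(0, 1, 5)
space2d = torch.outer(space1d, space1d)
threshold = .2

# Elementwise multiplication by a float mask zeroes out small entries
masked_mul = space2d * (space2d > threshold).type(torch.float32)

# torch.where takes space2d where the condition holds, zeros elsewhere
masked_where = torch.where(space2d > threshold, space2d, torch.zeros_like(space2d))

# masked_fill writes 0.0 wherever the boolean mask is True
masked_fill = space2d.masked_fill(space2d <= threshold, 0.0)

assert torch.equal(masked_mul, masked_where)
assert torch.equal(masked_mul, masked_fill)
```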

There are two key attributes of a `Tensor` that are not shared with numpy `array`s:
- `requires_grad` tells PyTorch whether to record operations on the tensor for automatic differentiation ("autograd"). We will briefly discuss this when we go over neural networks.
- `device` indicates which device on your machine the tensor is stored on. 
    - If you want to use your CPU, it should be `"cpu"`. 
    - If you have a Mac with Apple Silicon, you can specify `"mps"`. 
    - If you have an Nvidia graphics card, you can use `"cuda"`. 
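A minimal preview of both attributes, ahead of the fuller treatment with neural networks: a tensor created without a `device` argument lives on the CPU by default, and setting `requires_grad=True` lets autograd compute gradients. Here the function `sum(x ** 2)` is an arbitrary example whose gradient is `2 * x`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x.device)  # cpu (the default device)

# Because requires_grad=True, autograd records the operations below
y = (x ** 2).sum()
y.backward()  # computes dy/dx and stores it in x.grad

# The derivative of sum(x_i ** 2) is 2 * x
assert torch.allclose(x.grad, 2 * x.detach())
print(x.grad)  # tensor([2., 4., 6.])
```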

To check whether you have a device besides `"cpu"` available, you can run:

In [81]:
torch.backends.mps.is_available()

False

In [82]:
torch.cuda.is_available()

True
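These checks are often combined into a single device choice at the top of a script. `pick_device` below is a hypothetical helper, and its preference order (CUDA, then MPS, then CPU) is an assumption rather than anything PyTorch prescribes. Passing `device=` when creating a tensor places it there directly.

```python
import torch

def pick_device():
    """Hypothetical helper: first available accelerator, else the CPU.

    The preference order (CUDA, then MPS, then CPU) is an assumption.
    """
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
t = torch.arange(3, dtype=torch.float32, device=device)
print(t.device)
```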

To move a `Tensor` (or other PyTorch object) from one device to another, you can use the `to()` method:

In [83]:
a = a.type(torch.float32).to("cuda")
a

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], device='cuda:0')

When performing any computation with multiple tensors, they all must be on the same device. This takes a while to get used to, and is a common source of errors when starting out with PyTorch.

In [84]:
b = torch.tensor(b, dtype=torch.float32).to("cpu")

try:
    a[:, None] @ b[None, :]
except RuntimeError:
    print("Tensors aren't on the same device!")

Tensors aren't on the same device!


In [85]:
b = b.to("cuda")

a[:, None] @ b[None, :]

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 1., 0., 1., 0., 1., 0., 1.],
        [0., 2., 0., 2., 0., 2., 0., 2., 0., 2.],
        [0., 3., 0., 3., 0., 3., 0., 3., 0., 3.],
        [0., 4., 0., 4., 0., 4., 0., 4., 0., 4.],
        [0., 5., 0., 5., 0., 5., 0., 5., 0., 5.],
        [0., 6., 0., 6., 0., 6., 0., 6., 0., 6.],
        [0., 7., 0., 7., 0., 7., 0., 7., 0., 7.],
        [0., 8., 0., 8., 0., 8., 0., 8., 0., 8.],
        [0., 9., 0., 9., 0., 9., 0., 9., 0., 9.]], device='cuda:0')
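A common way to avoid hard-coding device names is to move one tensor to wherever the other already lives with `.to(other.device)`. The sketch below runs whether or not a GPU is present: if CUDA is available, `a` is moved there and `b` follows it; otherwise both stay on the CPU and `.to()` returns `b` unchanged.

```python
import torch

a = torch.arange(10, dtype=torch.float32)
b = torch.tile(torch.tensor([0.0, 1.0]), (5,))

if torch.cuda.is_available():
    a = a.to("cuda")

# Move b to whatever device a is on (a no-op if it is already there)
b = b.to(a.device)

outer = a[:, None] @ b[None, :]
assert outer.device == a.device
assert outer.shape == (10, 10)
```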