2.3. Linear Algebra

By now, we can load datasets into tensors and manipulate these tensors with basic mathematical operations. To start building sophisticated models, we will also need a few tools from linear algebra. This section offers a gentle introduction to the most essential concepts, starting from scalar arithmetic and ramping up to matrix multiplication.

In [86]:
import tensorflow as tf

In [87]:
x = tf.constant(3.0)
y = tf.constant(2.0)

x + y, x * y, x / y, x**y

(<tf.Tensor: shape=(), dtype=float32, numpy=5.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=1.5>,
 <tf.Tensor: shape=(), dtype=float32, numpy=9.0>)

In [88]:
x = tf.range(3)
x

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>

In [89]:
x[2]

<tf.Tensor: shape=(), dtype=int32, numpy=2>

In [90]:
len(x)

3

In [91]:
x.shape

TensorShape([3])

In [92]:
A = tf.reshape(tf.range(6), (3, 2))
A

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[0, 1],
       [2, 3],
       [4, 5]], dtype=int32)>

In [93]:
tf.transpose(A)

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[0, 2, 4],
       [1, 3, 5]], dtype=int32)>

In [94]:
A = tf.constant([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == tf.transpose(A)

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>

In [95]:
tf.reshape(tf.range(24), (2, 3, 4))

<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]], dtype=int32)>

In [96]:
A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))
B = A  # No cloning of A to B by allocating new memory
A, A + B

(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
 array([[0., 1., 2.],
        [3., 4., 5.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 3), dtype=float32, numpy=
 array([[ 0.,  2.,  4.],
        [ 6.,  8., 10.]], dtype=float32)>)

In [97]:
x = tf.range(3, dtype=tf.float32)
x, tf.reduce_sum(x)

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=3.0>)

2.3.13. Exercises

1. Prove that the transpose of the transpose of a matrix is the matrix itself: (**A**<sup>T</sup>)<sup>T</sup> = **A**

In [98]:
A = tf.reshape(tf.range(9, dtype=tf.float32), (3, 3))
A

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]], dtype=float32)>

In [99]:
tf.transpose(A)

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[0., 3., 6.],
       [1., 4., 7.],
       [2., 5., 8.]], dtype=float32)>

In [100]:
tf.transpose(A) == A

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])>

In [101]:
tf.transpose(tf.transpose(A)) == A

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>
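
The cells above check the identity on one example, which is not yet a proof. A general argument can be sketched entrywise, writing $a_{ij}$ for the entry of **A** in row $i$ and column $j$ (this notation is an assumption of the sketch, not something defined earlier in the notebook). Transposition swaps the two indices, so applying it twice restores them:

$$\left((\mathbf{A}^\top)^\top\right)_{ij} = (\mathbf{A}^\top)_{ji} = a_{ij}$$

Since this holds for every pair $(i, j)$, the two matrices agree entry by entry.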

2. Given two matrices **A** and **B**, show that sum and transposition commute: **A**<sup>T</sup> + **B**<sup>T</sup> = (**A** + **B**)<sup>T</sup>

In [102]:
B = A*2
B

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0.,  2.,  4.],
       [ 6.,  8., 10.],
       [12., 14., 16.]], dtype=float32)>

In [103]:
tf.transpose(B) + tf.transpose(A)

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0.,  9., 18.],
       [ 3., 12., 21.],
       [ 6., 15., 24.]], dtype=float32)>

In [104]:
tf.transpose(A + B)

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0.,  9., 18.],
       [ 3., 12., 21.],
       [ 6., 15., 24.]], dtype=float32)>

In [105]:
tf.transpose(A + B) == tf.transpose(A) + tf.transpose(B)

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>
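
The numerical check above covers one pair of matrices. For the general case, a short entrywise sketch works, assuming **A** and **B** have the same shape and writing $a_{ij}$ and $b_{ij}$ for their entries (notation introduced here for illustration):

$$\left((\mathbf{A} + \mathbf{B})^\top\right)_{ij} = (\mathbf{A} + \mathbf{B})_{ji} = a_{ji} + b_{ji} = (\mathbf{A}^\top)_{ij} + (\mathbf{B}^\top)_{ij}$$

The right-hand side is the $(i, j)$ entry of **A**<sup>T</sup> + **B**<sup>T</sup>, so the two sides are equal.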

3. Given any square matrix **A**, is **A** + **A**<sup>T</sup> always symmetric? Can you prove the result by using only the results of the previous two exercises?

In [106]:
A, tf.transpose(A)

(<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
 array([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]], dtype=float32)>,
 <tf.Tensor: shape=(3, 3), dtype=float32, numpy=
 array([[0., 3., 6.],
        [1., 4., 7.],
        [2., 5., 8.]], dtype=float32)>)

In [107]:
A + tf.transpose(A)

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0.,  4.,  8.],
       [ 4.,  8., 12.],
       [ 8., 12., 16.]], dtype=float32)>

In [108]:
tf.transpose(A + tf.transpose(A)) == (A + tf.transpose(A))

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>

In [109]:
tf.transpose(A + tf.transpose(A)) == (tf.transpose(A) + tf.transpose(tf.transpose(A)))

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>
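
Both checks return all `True` for this particular **A**. A general argument that uses only the statements of Exercises 1 and 2 can be sketched as:

$$(\mathbf{A} + \mathbf{A}^\top)^\top = \mathbf{A}^\top + (\mathbf{A}^\top)^\top = \mathbf{A}^\top + \mathbf{A} = \mathbf{A} + \mathbf{A}^\top$$

The first equality is Exercise 2, the second is Exercise 1, and the third is commutativity of matrix addition. So **A** + **A**<sup>T</sup> equals its own transpose, meaning it is symmetric for any square **A**.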

4. We defined the tensor X of shape (2, 3, 4) in this section. What is the output of len(X)? Write your answer without implementing any code, then check your answer using code.

In [110]:
X = tf.reshape(tf.range(24), (2, 3, 4))
X

<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]], dtype=int32)>

In [111]:
len(X)

2

5. For a tensor X of arbitrary shape, does len(X) always correspond to the length of a certain axis of X? What is that axis?

A: Yes. For any tensor with at least one axis, len(X) returns the length of the first axis (axis 0), i.e., X.shape[0].

6. Run A / A.sum(axis=1) and see what happens. Can you analyze the results?

In [112]:
tf.reduce_sum(A, axis=1), A / tf.reduce_sum(A, axis=1)

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 3., 12., 21.], dtype=float32)>,
 <tf.Tensor: shape=(3, 3), dtype=float32, numpy=
 array([[0.        , 0.08333334, 0.0952381 ],
        [1.        , 0.33333334, 0.23809524],
        [2.        , 0.5833333 , 0.3809524 ]], dtype=float32)>)
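
One way to read the result above: `tf.reduce_sum(A, axis=1)` has shape `(3,)`, so broadcasting lines it up with the last axis of `A`, and column `j` ends up divided by the sum of row `j`. The expression only runs without a shape error because `A` happens to be square here. If the goal is to normalize each row so that it sums to 1 (an assumption about the intent of the exercise), a minimal sketch keeps the reduced axis with `keepdims=True`:

```python
import tensorflow as tf

# Same 3x3 matrix as in the exercise cells.
A = tf.reshape(tf.range(9, dtype=tf.float32), (3, 3))

# keepdims=True preserves the reduced axis, giving shape (3, 1) instead of (3,).
row_sums = tf.reduce_sum(A, axis=1, keepdims=True)

# Broadcasting now divides every element of row i by the sum of row i.
normalized = A / row_sums

# Each row of the normalized matrix should sum to (approximately) 1.
check = tf.reduce_sum(normalized, axis=1)
print(row_sums.shape, check.numpy())
```

With the `(3, 1)` shape, the divisor is stretched across columns rather than down rows, which is what row normalization needs.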

7. When traveling between two points in downtown Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?

A: No, you cannot travel diagonally through the blocks, so the distance is the sum of the absolute differences of the two coordinates.
- In downtown Manhattan, streets form a rectilinear grid, so one travels from point $(x_1, y_1)$ to $(x_2, y_2)$ along a path composed only of horizontal and vertical segments.
  - In that case, the shortest distance to travel is $|x_1 - x_2|+|y_1 - y_2|$.
- More generally, in an $n$-dimensional space, the [Manhattan distance](https://en.wikipedia.org/wiki/Taxicab_geometry) between two points $p$ and $q$ is defined as $d_T(p, q) := \displaystyle\sum_{i=1}^n |p_i - q_i|$.
  - This is the $\ell_1$ norm of $p - q$, and it is used in signal processing, machine learning models, etc.

In [113]:
manhattan = tf.ones((10, 10))
manhattan  # assuming all blocks are evenly spaced, starting in the top-left corner

<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)>
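
The distance formula from the answer above can be tried on concrete coordinates. The sketch below is plain Python. The function name `manhattan_distance` and the two intersections are made up for illustration, and blocks are assumed to be unit-sized as in the grid above:

```python
import math


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical intersections given as (avenue, street) pairs.
start = (2, 3)
end = (7, 11)

blocks = manhattan_distance(start, end)  # |2 - 7| + |3 - 11| = 5 + 8 = 13
straight_line = math.hypot(end[0] - start[0], end[1] - start[1])  # sqrt(89)

print(blocks, round(straight_line, 2))
```

The straight-line (Euclidean) distance is shorter, but it would require cutting diagonally through the blocks, which the street grid does not allow.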

8. Consider a tensor of shape (2, 3, 4). What are the shapes of the summation outputs along axes 0, 1, and 2?

In [114]:
X = tf.reshape(tf.range(24), (2, 3, 4))
X, tf.reduce_sum(X, axis=0), tf.reduce_sum(X, axis=1), tf.reduce_sum(X, axis=2)

(<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
 array([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],
 
        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]], dtype=int32)>,
 <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
 array([[12, 14, 16, 18],
        [20, 22, 24, 26],
        [28, 30, 32, 34]], dtype=int32)>,
 <tf.Tensor: shape=(2, 4), dtype=int32, numpy=
 array([[12, 15, 18, 21],
        [48, 51, 54, 57]], dtype=int32)>,
 <tf.Tensor: shape=(2, 3), dtype=int32, numpy=
 array([[ 6, 22, 38],
        [54, 70, 86]], dtype=int32)>)

9. Feed a tensor with three or more axes to the linalg.norm function and observe its output. What does this function compute for tensors of arbitrary shape?

In [115]:
X = tf.reshape(tf.range(24, dtype=tf.float32), (2, 3, 4))
X, tf.linalg.norm(X)

(<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
 array([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],
 
        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=65.75712585449219>)

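A: With its default arguments (ord='euclidean', axis=None), tf.linalg.norm() treats the tensor as one flat vector and computes its 2-norm, i.e., the square root of the sum of the squared elements. For a matrix (2D tensor) this is the Frobenius norm, and for tensors with more axes it is the same computation over all elements. The previous result is therefore the length of the 24-dimensional vector obtained by flattening X. Computing it by hand gives the same value: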

In [116]:
tf.sqrt(tf.reduce_sum(tf.square(X)))

<tf.Tensor: shape=(), dtype=float32, numpy=65.75712585449219>

10. Consider three large matrices, say **A** $\in \mathbb{R}^{2^{10}\times2^{16}}$, **B** $\in \mathbb{R}^{2^{16}\times2^{5}}$ and **C** $\in \mathbb{R}^{2^{5}\times2^{14}}$, initialized with Gaussian random variables. You want to compute the product **ABC**. Is there any difference in memory footprint and speed, depending on whether you compute (**AB**)**C** or **A**(**BC**)? Why?
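
Allocating these matrices is expensive, so a rough answer can be estimated from the shapes alone. The sketch below assumes the textbook algorithm, where multiplying an $m \times k$ matrix by a $k \times n$ matrix takes $m \cdot k \cdot n$ multiply-adds and produces $m \cdot n$ entries. The helper `matmul_cost` is hypothetical, and real libraries may behave differently:

```python
def matmul_cost(m, k, n):
    """Return (multiply-adds, output elements) for an (m x k) @ (k x n) product."""
    return m * k * n, m * n


# Shapes from the exercise: A is 2^10 x 2^16, B is 2^16 x 2^5, C is 2^5 x 2^14.
a_rows, a_cols = 2**10, 2**16
b_cols = 2**5
c_cols = 2**14

# Order 1: (AB)C. The intermediate AB has shape 2^10 x 2^5.
ab_ops, ab_size = matmul_cost(a_rows, a_cols, b_cols)
ab_c_ops, _ = matmul_cost(a_rows, b_cols, c_cols)
left_total = ab_ops + ab_c_ops

# Order 2: A(BC). The intermediate BC has shape 2^16 x 2^14.
bc_ops, bc_size = matmul_cost(a_cols, b_cols, c_cols)
a_bc_ops, _ = matmul_cost(a_rows, a_cols, c_cols)
right_total = bc_ops + a_bc_ops

print("(AB)C multiply-adds:", left_total, "intermediate elements:", ab_size)
print("A(BC) multiply-adds:", right_total, "intermediate elements:", bc_size)
print("ratio of work:", right_total / left_total)
```

Under these assumptions, (**AB**)**C** needs $2^{31} + 2^{29}$ multiply-adds and a $2^{15}$-element intermediate, while **A**(**BC**) needs $2^{40} + 2^{35}$ multiply-adds and a $2^{30}$-element intermediate (4 GiB in float32). The first order is roughly 420 times cheaper in arithmetic and far lighter in memory, because it contracts the large shared dimension $2^{16}$ against the small dimension $2^{5}$ first.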