# About
- Date: 28/09/2022
- Pages: 76

## Exercise 1
By the definition of the transpose, $A_{i, j} = (A^T)_{j, i}$ and $(A^T)_{j, i} = ((A^T)^T)_{i, j}$ for every $(i, j)$.

Hence $A_{i,j} = ((A^T)^T)_{i, j}$ for every $(i, j)$, which means $A = (A^T)^T$.

## Exercise 2
For every $(i, j)$ we have: $$\left(A^T + B^T\right)_{i, j} = \left(A^T\right)_{i, j} + \left(B^T\right)_{i, j} = A_{j, i} + B_{j,i} = \left(A + B\right)_{j, i} = \left(\left(A+B\right)^T\right)_{i, j}$$
$\Rightarrow A^T + B^T = \left(A+B\right)^T$

## Exercise 3
$$\left(A + A^T\right)^T = A^T + \left(A^T\right)^T = A + A^T$$
The first equality uses Exercise 2 and the second uses Exercise 1. So $A + A^T$ equals its own transpose, i.e. it is symmetric.
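The identities from Exercises 1 to 3 can also be checked numerically. The sketch below is only a sanity check, not a proof, and it makes two assumptions that are not in the original notebook: it uses plain NumPy instead of `mxnet.np`, and it picks small random square matrices (Exercise 3 needs a square $A$).

```python
import numpy as np

# Assumption: plain NumPy stands in for mxnet.np; the matrices are arbitrary.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

# Exercise 1: transposing twice gives back the original matrix.
double_transpose_ok = np.array_equal(A.T.T, A)

# Exercise 2: the sum of the transposes equals the transpose of the sum.
sum_transpose_ok = np.array_equal(A.T + B.T, (A + B).T)

# Exercise 3: A + A^T equals its own transpose, i.e. it is symmetric.
S = A + A.T
symmetric_ok = np.array_equal(S, S.T)

print(double_transpose_ok, sum_transpose_ok, symmetric_ok)
```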

## Exercise 4
For `X` of shape `(2, 3, 4)`, I expect `len(X)` to return `2`.

Let's test it:

In [212]:
from mxnet import np

X = np.arange(24).reshape(2, 3, 4)
len(X), len(X[0]), len(X[0][0]), X[1][2][3]

(2, 3, 4, array([23.]))

Correct. This is the same idea as nested `vector`s in C++, where the size of the outermost `vector` is the length of the first axis.

## Exercise 5
`len(X)` corresponds to the length of the first axis (axis 0).
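A quick way to convince yourself of this is to try several shapes. This is a minimal sketch that assumes plain NumPy behaves like `mxnet.np` here; the shapes are arbitrary examples.

```python
import numpy as np

# Arbitrary example shapes with one to four axes.
shapes = [(5,), (2, 3), (2, 3, 4), (7, 1, 1, 2)]

# len(X) should equal the size of axis 0, whatever the other axes are.
lengths = [len(np.zeros(shape)) for shape in shapes]

for shape, length in zip(shapes, lengths):
    print(shape, length)
```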

## Exercise 6

In [213]:
A = np.arange(24).reshape(4, 6)
print(A.sum(axis=1).shape)
# A / A.sum(axis=1)  # raises an error: shapes (4, 6) and (4,) cannot be broadcast together

(4,)


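The division fails because `A` has shape `(4, 6)` while `A.sum(axis=1)` has shape `(4,)`. Broadcasting aligns shapes starting from the last axis, so it compares 6 with 4 and cannot match them.
Keeping the summed axis gives shape `(4, 1)`, which broadcasts across each row. The correct code is: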

In [214]:
A / A.sum(axis=1, keepdims=True)

array([[0.        , 0.06666667, 0.13333334, 0.2       , 0.26666668,
        0.33333334],
       [0.11764706, 0.13725491, 0.15686275, 0.1764706 , 0.19607843,
        0.21568628],
       [0.13793103, 0.14942528, 0.16091955, 0.1724138 , 0.18390805,
        0.1954023 ],
       [0.14634146, 0.15447155, 0.16260162, 0.17073171, 0.17886178,
        0.18699187]])
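The shape mismatch can be made explicit with a small experiment. The sketch below assumes plain NumPy rather than `mxnet.np` (in NumPy the failure is a `ValueError`; the exception type under MXNet may differ) and checks that every row of the normalized matrix sums to 1.

```python
import numpy as np

A = np.arange(24, dtype=float).reshape(4, 6)

row_sums_1d = A.sum(axis=1)                 # shape (4,)
row_sums_2d = A.sum(axis=1, keepdims=True)  # shape (4, 1)

try:
    A / row_sums_1d  # (4, 6) vs (4,): trailing axes 6 and 4 do not match
    failed = False
except ValueError:
    failed = True

# (4, 6) vs (4, 1): the size-1 axis is stretched across the 6 columns.
normalized = A / row_sums_2d

print(row_sums_1d.shape, row_sums_2d.shape, failed)
print(normalized.sum(axis=1))
```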

## Exercise 7
I don't really understand the question though.

## Exercise 8
- `axis=0`: `(3, 4)`
- `axis=1`: `(2, 4)`
- `axis=2`: `(2, 3)`

In [215]:
X.sum(axis=0).shape, X.sum(axis=1).shape, X.sum(axis=2).shape

((3, 4), (2, 4), (2, 3))
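The pattern behind these three shapes is that summing along an axis removes that axis and leaves the others unchanged. A minimal sketch of that rule, assuming plain NumPy in place of `mxnet.np`:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)

result_shapes = []
for axis in range(X.ndim):
    # Expected shape: the original shape with the summed axis dropped.
    expected = X.shape[:axis] + X.shape[axis + 1:]
    actual = X.sum(axis=axis).shape
    assert actual == expected
    result_shapes.append(actual)

print(result_shapes)
```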

## Exercise 9

In [216]:
B = np.arange(120).reshape(1, 2, 3, 4, 5)
np.linalg.norm(B)

array(754.20154)

This should be the same as $\sqrt{ \sum_{i=0}^{119}\left(i^2\right) }$:

In [217]:
np.sqrt(int((B**2).sum()))

754.2015645701088

So the norm is just the square root of the sum of the squares of all elements in the tensor, regardless of how many axes it has.
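The same value can be cross-checked in three ways: the library norm, the manual sum of squares, and the closed form $\sum_{i=0}^{n} i^2 = n(n+1)(2n+1)/6$ with $n = 119$. This sketch assumes plain NumPy instead of `mxnet.np`; with no `ord` or `axis` argument, NumPy's `np.linalg.norm` computes the 2-norm of the flattened array.

```python
import numpy as np

B = np.arange(120, dtype=float).reshape(1, 2, 3, 4, 5)

# Library norm over the whole tensor.
norm_library = np.linalg.norm(B)

# Manual version: square every element, sum, take the square root.
norm_manual = np.sqrt((B ** 2).sum())

# Closed form for the sum of squares 0^2 + 1^2 + ... + n^2.
n = 119
sum_of_squares = n * (n + 1) * (2 * n + 1) // 6
norm_closed = np.sqrt(sum_of_squares)

print(norm_library, norm_manual, norm_closed)
```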

## Exercise 10
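- $\left(AB\right)C$: $2^{10}\times 2^{16} \times 2^5 + 2^{10}\times 2^{5}\times 2^{16} = 2^{31} + 2^{31} = 2^{32}$ multiplications.
- $A\left(BC\right)$: $2^{16}\times 2^{5}\times 2^{16} + 2^{10}\times 2^{16}\times 2^{16} = 2^{37}+2^{42}$ multiplications. This is roughly $2^{10}$ times more expensive, because $BC$ is a $2^{16}\times 2^{16}$ intermediate matrix, whereas $AB$ is only $2^{10}\times 2^{5}$.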
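The two counts can be reproduced with a few lines of arithmetic. The helper `matmul_cost` below is a hypothetical name introduced for illustration; it assumes the naive algorithm, where an $m \times k$ by $k \times n$ product costs $m k n$ scalar multiplications, and uses the matrix shapes from Exercise 11.

```python
def matmul_cost(m, k, n):
    """Scalar multiplications for an (m x k) @ (k x n) product (naive algorithm)."""
    return m * k * n

# Shapes of A, B, C as used in Exercise 11.
a_rows, a_cols = 2**10, 2**16
b_rows, b_cols = 2**16, 2**5
c_rows, c_cols = 2**5, 2**16

# (AB)C: first AB, then the (a_rows x b_cols) result times C.
cost_ab_c = matmul_cost(a_rows, a_cols, b_cols) + matmul_cost(a_rows, b_cols, c_cols)

# A(BC): first BC, then A times the (b_rows x c_cols) result.
cost_a_bc = matmul_cost(b_rows, b_cols, c_cols) + matmul_cost(a_rows, a_cols, c_cols)

print(cost_ab_c, cost_a_bc, cost_a_bc // cost_ab_c)
```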

## Exercise 11

In [218]:
import time

A = np.random.normal(size=(2**10, 2**16))
B = np.random.normal(size=(2**16, 2**5))
C = np.random.normal(size=(2**5, 2**16))
start_time = time.time()
np.dot(A, B)
print("--- AB takes:            %s seconds ---" % (time.time() - start_time))
start_time = time.time()
np.dot(A, C.T)
print("--- AC^T takes:          %s seconds ---" % (time.time() - start_time))
C = B.T
start_time = time.time()
np.dot(A, C.T)
print("--- AC (C=B^T) takes:    %s seconds ---" % (time.time() - start_time))

--- AB takes:            0.00017571449279785156 seconds ---
--- AC^T takes:          0.0004589557647705078 seconds ---
--- AC (C=B^T) takes:    0.00021696090698242188 seconds ---


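In this run, `AC^T` with `C = B^T` took about half as long as `AC^T` with an independently created `C`, and about as long as `AB`. My guess is that with `C = B.T`, `C.T` is just `B` again, so no extra transpose work is needed; I have not verified this. These numbers should also be read with caution: all three timings are around $10^{-4}$ seconds for roughly $2^{31}$ multiply-adds, and MXNet executes operations asynchronously, so without an explicit synchronization step they probably measure little more than the time to enqueue each operation.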
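A more careful version of this experiment is sketched below. It makes several assumptions that are not in the original notebook: it uses plain NumPy (which computes eagerly) instead of `mxnet.np`, it shrinks the matrices so it runs quickly, and `time_call` is a hypothetical helper. Under MXNet, one would additionally need to synchronize before stopping the clock, for example with `npx.waitall()`.

```python
import time
import numpy as np

def time_call(fn, repeats=3):
    """Return the best wall-clock time of fn() over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Assumption: smaller shapes than in the exercise, to keep the run short.
rng = np.random.default_rng(0)
A = rng.normal(size=(2**7, 2**12))
B = rng.normal(size=(2**12, 2**5))
C = rng.normal(size=(2**5, 2**12))
C_view = B.T  # in NumPy this is a view of B: no data is copied

t_ab = time_call(lambda: A @ B)
t_act = time_call(lambda: A @ C.T)
t_view = time_call(lambda: A @ C_view.T)

print(f"AB: {t_ab:.6f}s  AC^T: {t_act:.6f}s  AC^T (C=B^T): {t_view:.6f}s")
print("C_view shares memory with B:", np.shares_memory(B, C_view))
```

Taking the best of a few repeats reduces noise from warm-up and other processes. The timings depend on the machine and the BLAS backend, so no particular ordering should be expected from this sketch.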

## Exercise 12

In [219]:
A = np.full((100, 200), 1)
B = np.full((100, 200), 2)
C = np.full((100, 200), 3)
D = np.stack((A, B, C), axis=-1)
D[:, :, 1]

array([[2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       ...,
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.]])

In [220]:
B

array([[2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       ...,
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 2., 2., ..., 2., 2., 2.]])
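Comparing the two printed arrays by eye only covers the corners that are displayed. A programmatic check is sketched below, assuming plain NumPy in place of `mxnet.np`; it also reports the shape of the stacked tensor.

```python
import numpy as np

A = np.full((100, 200), 1.0)
B = np.full((100, 200), 2.0)
C = np.full((100, 200), 3.0)

# Stack along a new last axis, so A, B, C sit at indices 0, 1, 2 of axis 2.
D = np.stack((A, B, C), axis=-1)

# Slicing index 1 of the last axis should give back B exactly.
recovered = D[:, :, 1]

print(D.shape, np.array_equal(recovered, B))
```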