# Cheatsheet 3. Matrices: Matrix Multiplication

This code and text is derived from Mike X Cohen's course on linear algebra. For more information, see https://www.udemy.com/linear-algebra-theory-and-implementation/?couponCode=LINALGPX7  

However: 
- Formulas in this particular cheatsheet were added by me to enhance the material.   
- Code for manual calculation of all 4 perspectives was written by me. 
- The covariance section was added by me.

# Table of contents
- [Notation](#notation)  
- [Matrix-matrix Multiplication: 4 perspectives](#matmat)
    - [Element perspective](#element)
    - [Layer Perspective](#layer)
    - [Column Perspective](#column)
    - [Row Perspective](#row)
- [Multiplication with Diagonal Matrices](#diag)
- [Order of Operations](#order)
- [Matrix-Vector Multiplication](#matvec)  
- [Covariance Matrix](#covariance)
- [Hadamard Multiplication](#hadamard)  
- [Frobenius Dot Product](#frobenius)
- [Matrix Division](#division)

<a id = "notation"></a>
# Notation

**Some terminology:**
$A \cdot B$ can be read as: 
- A left-multiplies B  
- B right-multiplies A 

**Condition:** The number of columns in the first matrix should be equal to the number of rows in the second matrix. E.g., if we have $\mathbf{A} \in R^{M \times N}$ and $\mathbf{B} \in R^{K \times L}$, then we can only multiply A by B if $N=K$, and the result is an $M \times L$ matrix. We can also say that the **inner dimensions** should match. 
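
The inner-dimension rule can be checked directly in NumPy. This is only a small sketch: the shapes are arbitrary values chosen for illustration, and `mismatch_raised` is a helper name introduced here.

```python
import numpy as np

A = np.ones((2, 3))   # M x N = 2 x 3
B = np.ones((3, 4))   # K x L = 3 x 4, so N == K and the product exists

C = A @ B             # the result takes the outer dimensions
print(C.shape)        # (2, 4)

# Swapping the order gives (3 x 4)(2 x 3): the inner dimensions 4 and 2 differ,
# so NumPy refuses to multiply
try:
    B @ A
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```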

**Notation:**  
In these notebooks I tried to enhance the course material by writing everything in a formal way in addition to examples. There is no standard notation for a matrix column, so I use one that I saw in several articles about machine learning:   
- $A_{i,:}$ - i-th row 
- $A_{:,i}$ - i-th column  

Other notation I've seen is $A_{i*}$ or $A_{i,*}$ for the row and $A_{*i}$ or $A_{*,i}$ for the column.
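
This notation maps almost one-to-one onto NumPy slicing. A small sketch (the variable names `row_i` and `col_i` are just illustrative); keep in mind that the formulas count from 1 while NumPy counts from 0:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])

i = 1                # zero-based index, i.e. the 2nd row / column in math notation
row_i = A[i, :]      # corresponds to A_{i,:}
col_i = A[:, i]      # corresponds to A_{:,i}

print(row_i)         # [9 6 1]
print(col_i)         # [2 6 3]
```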


**Other conventions (unless specified otherwise):** 
- M - number of rows of matrix A
- N - number of columns of matrix B
- K - number of columns of matrix A, number of rows of matrix B

<a id = "matmat"></a>
# Matrix-matrix Multiplication: 4 perspectives

Our test matrices: 
$$\mathbf{A} = {\begin{bmatrix}
4&2&-7\\
9&6&1\\
7&3&7\\
\end{bmatrix}}, \mathbf{B} = {\begin{bmatrix}
1&2&3\\
4&7&2\\
3&5&8\\
\end{bmatrix}}$$

In [82]:
import numpy as np
A = np.array([[4,2,-7],[9,6,1],[7,3,7]])
B = np.array([[1,2,3],[4,7,2],[3,5,8]])

In [83]:
C = np.dot(A,B)
C

array([[ -9, -13, -40],
       [ 36,  65,  47],
       [ 40,  70,  83]])

<a id='element'></a>
## Element perspective

$$c_{ij} = \sum_{k=1}^K a_{ik} b_{kj} = \mathbf{A_{i,:}} \cdot \mathbf{B_{:,j}}$$
Each element $c_{ij}$ is calculated by multiplying row $A_{i,:}$ by column $B_{:,j}$

$${\begin{bmatrix}
0&1\\
2&3\\
\end{bmatrix}} {\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}} = {\begin{bmatrix}
0a+1c&0b+1d\\
2a+3c&2b+3d\\
\end{bmatrix}}$$

Example:  
Let's calculate the elements manually, one dot product at a time. For instance, $c_{3,2}$ should be equal to 70 ($7 \cdot 2 + 3 \cdot 7 + 7 \cdot 5$): 

In [84]:
C_elemwise = np.zeros((A.shape[0],B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C_elemwise[i,j] = np.dot(A[i,:],B[:,j])

In [85]:
# check the result
C_elemwise-C

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
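
To look at a single element rather than the whole matrix, the sum from the formula can be written out directly. A minimal sketch, with `c_32` as an illustrative name:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

# c_{3,2} in 1-based math notation is element [2, 1] in zero-based NumPy indexing
c_32 = sum(A[2, k] * B[k, 1] for k in range(A.shape[1]))
print(c_32)                        # 7*2 + 3*7 + 7*5 = 70
print(c_32 == np.dot(A, B)[2, 1])  # True
```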

<a id="layer"></a>
## Layer Perspective 

We can compute the matrix product layer by layer, where each layer has rank 1. Unlike in the element perspective, we multiply column $\mathbf{A}_{:,i}$ by row $\mathbf{B}_{i,:}$, which is an outer product. So essentially our resulting matrix $\mathbf{C}$ for $\mathbf{A} \cdot \mathbf{B}$ is a sum of all outer products:
$$\mathbf{C} = \sum_{i=1}^K \mathbf{A}_{:,i} \mathbf{B}_{i,:}$$

$${\begin{bmatrix}
0&1\\
2&3\\
\end{bmatrix}} {\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}} = {\begin{bmatrix}
0a&0b\\
2a&2b\\
\end{bmatrix}}+  {\begin{bmatrix}
1c&1d\\
3c&3d\\
\end{bmatrix}} = {\begin{bmatrix}
0a+1c&0b+1d\\
2a+3c&2b+3d\\
\end{bmatrix}}$$

In code: 

In [86]:
C_layered = np.zeros((A.shape[0],B.shape[1]))
# for each column of A
for i in range(A.shape[1]):
    # compute outer product
    C_layered += np.outer(A[:,i],B[i,:])

In [87]:
# check that matrices are the same
C_layered-C

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
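
The claim that every layer has rank 1 can also be checked numerically. A sketch using `np.linalg.matrix_rank` on the same test matrices (`layers` and `ranks` are names introduced here):

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

# one outer product (layer) per column of A / row of B
layers = [np.outer(A[:, i], B[i, :]) for i in range(A.shape[1])]
ranks = [np.linalg.matrix_rank(layer) for layer in layers]
print(ranks)                                     # every layer has rank 1

# summing the layers recovers the full product
C_from_layers = sum(layers)
print(np.array_equal(C_from_layers, A @ B))      # True
```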

<a id="column"></a>
## Column Perspective 

We can also define matrix multiplication as a linear weighted **combination** of the **columns of the left matrix**, where the **weights (or scalars to combine these columns) come from the right matrix**.  

This perspective is very useful in machine learning and statistics. The columns of the left matrix are often used to store regressors, while the right matrix stores coefficients. 

To obtain the n-th column of the resulting matrix C we need to take the n-th column of matrix B and use its elements as scalar multipliers for the corresponding columns of matrix A. So the n-th column of the resulting matrix $\mathbf{C}$ for $\mathbf{A} \cdot \mathbf{B}$ is the combination of the columns of the first matrix $\mathbf{A}$ scaled by the n-th column of the second matrix $\mathbf{B}$.  
$$\mathbf{C}_{:,n} = \sum_{i=1}^K b_{i,n} \mathbf{A}_{:,i}$$

$${\begin{bmatrix}
0&1\\
2&3\\
\end{bmatrix}} {\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}} = 
{\begin{bmatrix}
a{\begin{bmatrix}
0\\
2\\
\end{bmatrix}} + c{\begin{bmatrix}
1\\
3\\
\end{bmatrix}}& b{\begin{bmatrix}
0\\
2\\
\end{bmatrix}} + d{\begin{bmatrix}
1\\
3\\
\end{bmatrix}}\\
\end{bmatrix}}= {\begin{bmatrix}
0a+1c&0b+1d\\
2a+3c&2b+3d\\
\end{bmatrix}}$$

In code:

In [88]:
C_column_perspective = []
for j in range(B.shape[1]):
    colj = B[:,j]
    Cj = np.zeros(A.shape[0])
    for i in range(B.shape[0]):
        Cj += colj[i]*A[:,i]    
    C_column_perspective.append(Cj)
# in the end we need to transpose, since we've been appending columns to a list which is interpreted as rows
C_column_perspective = np.transpose(C_column_perspective)

In [89]:
#check that the results are identical: 
C - C_column_perspective

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
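
The same idea in compact form: a single column of the product is the left matrix times the matching column of the right matrix. A sketch with an arbitrarily chosen column index `n`:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

n = 1                                  # zero-based column index, chosen arbitrarily
col_n = A @ B[:, n]                    # matrix times the n-th column of B

# the same thing written as a weighted sum of the columns of A
col_n_manual = sum(B[i, n] * A[:, i] for i in range(A.shape[1]))

print(col_n)                           # [-13  65  70]
print(np.array_equal(col_n, col_n_manual))
```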

<a id="row"></a>
## Row Perspective 

In this case we scale the rows of the second matrix by the elements of a row of the first matrix. The n-th row of the resulting matrix $\mathbf{C}$ for $\mathbf{A} \cdot \mathbf{B}$ is then calculated as: 
$$\mathbf{C}_{n,:} = \sum_{i=1}^K a_{n,i} \mathbf{B}_{i,:}$$

$${\begin{bmatrix}
0&1\\
2&3\\
\end{bmatrix}} {\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}} = 
{\begin{bmatrix}
0 {\begin{bmatrix}
a&b\\
\end{bmatrix}}+1 {\begin{bmatrix}
c&d\\
\end{bmatrix}}\\
2 {\begin{bmatrix}
a&b\\
\end{bmatrix}}+3 {\begin{bmatrix}
c&d\\
\end{bmatrix}}\\
\end{bmatrix}}
= {\begin{bmatrix}
0a+1c&0b+1d\\
2a+3c&2b+3d\\
\end{bmatrix}}$$

In [90]:
C_row_perspective = []
for j in range(A.shape[0]):
    rowj = A[j,:]
    Cj = np.zeros(B.shape[1])
    for i in range(A.shape[1]):
        Cj+=rowj[i]*B[i,:]
    C_row_perspective.append(Cj)

In [91]:
# check the result
C_row_perspective-C

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
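
Analogously to the column case, a single row of the product is the matching row of the left matrix times the whole right matrix. A sketch with an arbitrarily chosen row index `n`:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

n = 0                                  # zero-based row index, chosen arbitrarily
row_n = A[n, :] @ B                    # n-th row of A times the matrix B

# the same thing written as a weighted sum of the rows of B
row_n_manual = sum(A[n, i] * B[i, :] for i in range(B.shape[0]))

print(row_n)                           # [ -9 -13 -40]
print(np.array_equal(row_n, row_n_manual))
```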

<a id="diag"></a>
# Multiplication with Diagonal Matrices

When we multiply by a diagonal matrix, we're essentially scaling the corresponding columns by the numbers on the diagonal. So the n-th column of the resulting matrix $\mathbf{C}$ for $\mathbf{A} \cdot \mathbf{D}$, where $\mathbf{D}$ is the diagonal matrix: 

$$\mathbf{C}_{:,n} = d_{n,n} \mathbf{A}_{:,n}$$

$${\begin{bmatrix}
1&2&3\\
4&5&6\\
7&8&9\\
\end{bmatrix}}{\begin{bmatrix}
a&0&0\\
0&b&0\\
0&0&c\\
\end{bmatrix}}={\begin{bmatrix}
a1&b2&c3\\
a4&b5&c6\\
a7&b8&c9\\
\end{bmatrix}}$$

It can be illustrated with the column perspective: 

$${\begin{bmatrix}
1&2&3\\
4&5&6\\
7&8&9\\
\end{bmatrix}}{\begin{bmatrix}
a&0&0\\
0&b&0\\
0&0&c\\
\end{bmatrix}}={\begin{bmatrix}
a{\begin{bmatrix}
1\\
4\\
7\\
\end{bmatrix}} + \vec{0} + \vec{0}& \vec{0} + b{\begin{bmatrix}
2\\
5\\
8\\
\end{bmatrix}} +\vec{0}&
\vec{0} + \vec{0} + c{\begin{bmatrix}
3\\
6\\
9\\
\end{bmatrix}}\\
\end{bmatrix}}={\begin{bmatrix}
a{\begin{bmatrix}
1\\
4\\
7\\
\end{bmatrix}}& b{\begin{bmatrix}
2\\
5\\
8\\
\end{bmatrix}}&c{\begin{bmatrix}
3\\
6\\
9\\
\end{bmatrix}}\\
\end{bmatrix}}={\begin{bmatrix}
a1&b2&c3\\
a4&b5&c6\\
a7&b8&c9\\
\end{bmatrix}}$$

If we change the order of multiplication (i.e. left-multiply by the diagonal matrix), then we will be scaling the rows rather than columns. For the resulting matrix $\mathbf{C}$ for $\mathbf{D} \cdot \mathbf{A}$, where $\mathbf{D}$ is the diagonal matrix, it will look like this:  

$$C_{n,:} = d_{n,n} A_{n,:}$$

$${\begin{bmatrix}
a&0&0\\
0&b&0\\
0&0&c\\
\end{bmatrix}}{\begin{bmatrix}
1&2&3\\
4&5&6\\
7&8&9\\
\end{bmatrix}}={\begin{bmatrix}
a1&a2&a3\\
b4&b5&b6\\
c7&c8&c9\\
\end{bmatrix}}$$

It can be illustrated with the row perspective: 

$${\begin{bmatrix}
a&0&0\\
0&b&0\\
0&0&c\\
\end{bmatrix}}{\begin{bmatrix}
1&2&3\\
4&5&6\\
7&8&9\\
\end{bmatrix}}={\begin{bmatrix}
a{\begin{bmatrix}
1&2&3\\
\end{bmatrix}} + \vec{0} + \vec{0}\\ 
\vec{0} + b{\begin{bmatrix}
4&5&6\\
\end{bmatrix}} +\vec{0}\\
\vec{0} + \vec{0} + c{\begin{bmatrix}
7&8&9\\
\end{bmatrix}}\\
\end{bmatrix}}={\begin{bmatrix}
a{\begin{bmatrix}
1&2&3\\
\end{bmatrix}}\\ 
b{\begin{bmatrix}
4&5&6\\
\end{bmatrix}}\\
c{\begin{bmatrix}
7&8&9\\
\end{bmatrix}}\\
\end{bmatrix}}={\begin{bmatrix}
a1&a2&a3\\
b4&b5&b6\\
c7&c8&c9\\
\end{bmatrix}}$$
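
In code, both cases can be sketched as follows. The diagonal values `10, 100, 1000` are arbitrary stand-ins for $a, b, c$, and the broadcasting variants are an alternative I added for comparison, not part of the course material:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)     # [[1,2,3],[4,5,6],[7,8,9]]
d = np.array([10, 100, 1000])          # stand-ins for a, b, c
D = np.diag(d)

right = A @ D                          # right-multiplying scales the columns of A
left = D @ A                           # left-multiplying scales the rows of A

# Broadcasting gives the same results without building D explicitly
right_bc = A * d                       # d[j] multiplies column j
left_bc = A * d[:, None]               # d[i] multiplies row i

print(right)
print(left)
```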

<a id="order"></a>
# Order of operations

One of the most important rules for matrices: 
$$(\mathbf{L}\mathbf{I}\mathbf{V}\mathbf{E})^T = \mathbf{E}^T\mathbf{V}^T\mathbf{I}^T\mathbf{L}^T$$

2D Proof: 
$$A = \begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}, 
B = \begin{bmatrix}
e&f\\
g&h\\
\end{bmatrix}$$

$$\Big(\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}\begin{bmatrix}
e&f\\
g&h\\
\end{bmatrix}\Big)^T = 
\Big(\begin{bmatrix}
ae+bg&af + bh\\
ce+dg&cf+dh\\
\end{bmatrix}\Big)^T=
\begin{bmatrix}
ae+bg&ce+dg\\
af+bh&cf+dh\\
\end{bmatrix}$$

OR: 
$$
\begin{bmatrix}
e&g\\
f&h\\
\end{bmatrix}
\begin{bmatrix}
a&c\\
b&d\\
\end{bmatrix}=
\begin{bmatrix}
ae+bg&ce+dg\\
af+bh&cf+dh\\
\end{bmatrix}
$$
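
The rule can also be checked numerically for four matrices. A sketch with random matrices; the shapes and the seed are arbitrary choices, picked only so that the product is defined:

```python
import numpy as np

rng = np.random.default_rng(0)         # fixed seed, arbitrary
L = rng.standard_normal((2, 3))
I = rng.standard_normal((3, 4))        # here I is just a matrix name, not the identity
V = rng.standard_normal((4, 5))
E = rng.standard_normal((5, 2))

lhs = (L @ I @ V @ E).T                # transpose of the product
rhs = E.T @ V.T @ I.T @ L.T            # product of transposes in reversed order

print(np.allclose(lhs, rhs))           # True
```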

<a id="matvec"></a>
# Matrix-vector Multiplication

$\mathbf{A} \omega = v$ - produces a column vector ($\mathbf{A}$ is $m \times n$, $\omega$ is $n \times 1$, the result is $m \times 1$).  
$\omega^T \mathbf{A} = v$ - produces a row vector ($\mathbf{A}$ is $m \times n$, $\omega$ is $m \times 1$, the result is $1 \times n$).

## Special case: symmetric matrix
For a symmetric matrix it doesn't matter if we pre-multiply or post-multiply. The results will be identical (except for the type of vector, of course - i.e. column or row).  

Let $\mathbf{S}$ be a symmetric matrix:

$$\mathbf{S} \omega = v$$

Transpose both sides: 

$$(\mathbf{S}\omega)^T = v^T$$

$$\omega^T \mathbf{S}^T = v^T$$

And since $\mathbf{S}$ is symmetric, we know that $\mathbf{S}^T = \mathbf{S}$, so: 

$$\omega^T \mathbf{S} = v^T$$

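We can say that $\mathbf{S} \omega$ produces weighted combinations of the columns of $\mathbf{S}$, while $\omega^T \mathbf{S}$ produces weighted combinations of the rows of $\mathbf{S}$. Since $\mathbf{S}$ is symmetric, its rows and columns hold the same values, so both give the same numbers: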

$$\mathbf{S} \omega = \begin{bmatrix}
a&b\\
b&c\\
\end{bmatrix}
\begin{bmatrix}
1\\
2\\
\end{bmatrix} = {\begin{bmatrix}
1{\begin{bmatrix}
a\\
b\\
\end{bmatrix}} + 2{\begin{bmatrix}
b\\
c\\
\end{bmatrix}}\\
\end{bmatrix}} = \begin{bmatrix}
a+2b\\
b+2c\\
\end{bmatrix}$$

$$\omega^T \mathbf{S}  = \begin{bmatrix}
1&2\\
\end{bmatrix} \begin{bmatrix}
a&b\\
b&c\\
\end{bmatrix} = {\begin{bmatrix}
1{\begin{bmatrix}
a&b\\
\end{bmatrix}} + 2{\begin{bmatrix}
b&c\\
\end{bmatrix}}\\
\end{bmatrix}} = \begin{bmatrix}
a+2b&b+2c\\
\end{bmatrix}$$
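
A quick numeric check of this property. The entries of `S` and `N` are example values I picked, and note that a 1-D NumPy array carries no row/column distinction, so only the values are compared:

```python
import numpy as np

S = np.array([[1, 2], [2, 5]])         # symmetric: example values for a, b, c
w = np.array([1, 2])

Sw = S @ w                             # combination of the columns of S
wS = w @ S                             # combination of the rows of S
print(Sw, wS)                          # [ 5 12] [ 5 12]

# For a non-symmetric matrix the two results generally differ
N = np.array([[1, 2], [3, 5]])
print(N @ w, w @ N)                    # [ 5 13] [ 7 12]
```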

# 2D Transformation matrices

When we left-multiply a vector by a matrix, we're transforming it. Normally, this changes both its length and its orientation. Example: 

$$\begin{bmatrix}
2&-1\\
1&3\\
\end{bmatrix}\begin{bmatrix}
2\\
3\\ 
\end{bmatrix} =\begin{bmatrix}
1\\
11\\
\end{bmatrix}$$

If we want to simply rotate the vector by an angle $\theta$ (counterclockwise) without changing its magnitude, we use the following transformation matrix: 

$$\begin{bmatrix}
\cos(\theta)&-\sin(\theta)\\
\sin(\theta)&\cos(\theta)\\
\end{bmatrix}$$

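**Orthogonal matrix** - a square matrix whose columns are mutually orthogonal (the dot product of any two different columns is 0) and each have unit length, so that $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$. The rotation matrix above is an example. 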
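
A small sketch of the rotation matrix in NumPy, checking that the length of the vector is preserved and that $\mathbf{R}^T \mathbf{R} = \mathbf{I}$. The angle and the vector are arbitrary example values:

```python
import numpy as np

theta = np.pi / 6                      # 30 degrees, arbitrary
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([2.0, 3.0])
v_rot = R @ v                          # rotated vector

# the rotation leaves the length unchanged
same_length = np.isclose(np.linalg.norm(v_rot), np.linalg.norm(v))
# the columns of R are orthogonal and have unit length
is_orthogonal = np.allclose(R.T @ R, np.eye(2))

print(same_length, is_orthogonal)      # True True
```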

## Special case: matrix that stretches but doesn't rotate

$$\begin{bmatrix}
2&1\\
2&3\\
\end{bmatrix}\begin{bmatrix}
1\\
2\\ 
\end{bmatrix} =\begin{bmatrix}
4\\
8\\
\end{bmatrix}$$

The resulting vector is just 4 times the initial vector: $\begin{bmatrix}4\\8\end{bmatrix} = 4 \begin{bmatrix}1\\2\end{bmatrix}$. 

This is **not** a property of that particular vector and this is **not** a property of that particular matrix. It's their combination that is special. The vector $\begin{bmatrix}1\\2\end{bmatrix}$ is called an **eigenvector** of this matrix, and 4 is the corresponding **eigenvalue**.

### Fundamental Eigenvalue Equation 

$$\mathbf{A} v = \lambda v$$
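
The equation can be verified for the example above, and `np.linalg.eig` can be used to find all eigenpairs. This is only a sketch; note that NumPy returns eigenvectors scaled to unit length, so they are multiples of $\begin{bmatrix}1\\2\end{bmatrix}$ rather than that exact vector:

```python
import numpy as np

A = np.array([[2.0, 1.0], [2.0, 3.0]])
v = np.array([1.0, 2.0])
lam = 4.0

print(np.allclose(A @ v, lam * v))     # True: A v = lambda v

# eigvals[k] belongs to the column eigvecs[:, k]
eigvals, eigvecs = np.linalg.eig(A)
for k in range(len(eigvals)):
    print(eigvals[k], np.allclose(A @ eigvecs[:, k], eigvals[k] * eigvecs[:, k]))
```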

# Additive (0) and Multiplicative (I) Matrix Identities

Multiplicative identity: 

$$\mathbf{A} \mathbf{I} = \mathbf{I} \mathbf{A} = \mathbf{A}$$ 

$$\mathbf{A} + \mathbf{I} \neq \mathbf{A}$$

Additive identity: 

$$\mathbf{A} \mathbf{0} = \mathbf{0} \mathbf{A} \neq \mathbf{A}$$ 

$$\mathbf{A} + \mathbf{0} = \mathbf{A}$$

## Additive and Multiplicative Symmetric Matrices

We can generate a symmetric matrix using addition or multiplication. 
With addition we can do it as follows for a square matrix (division by 2 is optional): 

$$(\mathbf{A}+\mathbf{A}^T)/2$$ 


$$\mathbf{A}+\mathbf{A}^T = 
\begin{bmatrix}
a&b&c\\
d&e&f\\
g&h&i\\
\end{bmatrix}+\begin{bmatrix}
a&d&g\\
b&e&h\\
c&f&i\\
\end{bmatrix} = 
\begin{bmatrix}
2a&b+d&c+g\\
d+b&2e&f+h\\
g+c&h+f&2i\\
\end{bmatrix}$$

With multiplication we can obtain a symmetric matrix in the following ways:

$$S = A^T A$$

$$S = A A^T$$

However, in general $A^T A \neq A A^T$ (for a non-square $A$ the two products do not even have the same size).

**Proof** that $A^T A$ is symmetric:

$(A^T A)^T = A^T A^{TT} = A^T A \implies A^T A$ is symmetric by definition.

The same can be shown for $A A^T: (A A^T)^T = A^{TT} A^T = A A^T \implies A A^T$ is symmetric by definition.
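
Both constructions can be sketched in NumPy. The matrices are random and the shapes are arbitrary; the non-square `R` is used to show that the multiplicative method works for any shape, with the two products differing in size:

```python
import numpy as np

rng = np.random.default_rng(1)         # fixed seed, arbitrary

# additive method: needs a square matrix
A = rng.standard_normal((3, 3))
S_add = (A + A.T) / 2

# multiplicative method: works for any shape
R = rng.standard_normal((4, 2))
S_small = R.T @ R                      # 2 x 2
S_large = R @ R.T                      # 4 x 4

for S in (S_add, S_small, S_large):
    print(S.shape, np.allclose(S, S.T))
```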

<a id="covariance"></a>
# Covariance Matrix 
These operations are actually the basis for calculating the covariance matrix, which in its general form looks like this: 

$$
S = \begin{bmatrix}
s_{1,1}^2&s_{1,2}&s_{1,3}&\cdots&s_{1,p}\\
s_{2,1}&s_{2,2}^2&s_{2,3}&\cdots&s_{2,p}\\
s_{3,1}&s_{3,2}&s_{3,3}^2&\cdots&s_{3,p}\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
s_{p,1}&s_{p,2}&s_{p,3}&\cdots&s_{p,p}^2\\
\end{bmatrix}
$$

Where each element is calculated as follows:  

$$s_{j,j}^2 = \frac{1}{n-1} \sum_{i=1}^n (x_{i,j}-\bar{x}_j)^2$$

$$s_{j,k} = \frac{1}{n-1} \sum_{i=1}^n (x_{i,j}-\bar{x}_j)(x_{i,k}-\bar{x}_k)$$

With:
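- $s_{j,j}^2$ - variance of the j-th variable  
- $s_{j,k}$ - covariance between the j-th and the k-th variables  
- $\bar{x}_j$ - mean of the j-th column  
- $n$ - number of observations (rows)

**Note:** we divide by $n-1$ rather than $n$ because of Bessel's correction.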


Practical example and code:  
Let's say we have 5 observations and 3 features. This gives us a $5 \times 3$ matrix (observations in rows and features in columns, because this is how such data is usually stored). We're looking for the covariance between the variables, which in this case are our features. So the resulting covariance matrix should be $3 \times 3$, as we have 3 such variables. This means that the transposed matrix has to come first in the product. 

Also, to be able to produce the above matrix, we want to calculate the mean for each variable (feature) and subtract this mean from each value of this feature - this will give us the matrix of deviations from the mean. Then we just have to multiply the transpose of the resulting matrix by the resulting matrix and divide by $n-1$. 

In [155]:
# sample matrix 
G = np.array([[2.2,4.1,9.1], [3.3,5.1,8.8], [1.8, 4.6, 8.0], [2.5, 4.7, 9.3], [2.0, 4.6, 9.0]])
G

array([[2.2, 4.1, 9.1],
       [3.3, 5.1, 8.8],
       [1.8, 4.6, 8. ],
       [2.5, 4.7, 9.3],
       [2. , 4.6, 9. ]])

In [157]:
# subtract the column means: 
M = G - G.mean(axis=0)
M

array([[-0.16, -0.52,  0.26],
       [ 0.94,  0.48, -0.04],
       [-0.56, -0.02, -0.84],
       [ 0.14,  0.08,  0.46],
       [-0.36, -0.02,  0.16]])

In [173]:
# multiply and divide by n-1
covG = np.dot(np.transpose(M),M)/(G.shape[0]-1)
covG

array([[ 0.343 ,  0.141 ,  0.0995],
       [ 0.141 ,  0.127 , -0.026 ],
       [ 0.0995, -0.026 ,  0.253 ]])

In [176]:
# check the result, using the built-in numpy function:
covG - np.cov(np.transpose(G))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
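
As a side note, the transpose in the check above can be avoided: `np.cov` treats rows as variables by default, and `rowvar=False` tells it that the variables are in the columns. A sketch with the same sample matrix (`cov_manual` and `cov_builtin` are names introduced here):

```python
import numpy as np

G = np.array([[2.2, 4.1, 9.1], [3.3, 5.1, 8.8], [1.8, 4.6, 8.0],
              [2.5, 4.7, 9.3], [2.0, 4.6, 9.0]])

M = G - G.mean(axis=0)                      # deviations from the column means
cov_manual = M.T @ M / (G.shape[0] - 1)     # divide by n-1 (Bessel's correction)

cov_builtin = np.cov(G, rowvar=False)       # columns are the variables

print(np.allclose(cov_manual, cov_builtin)) # True
```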

<a id="hadamard"></a>
# Hadamard (element-wise) Multiplication

$$\begin{bmatrix}
1&2&3\\
4&5&6\\
7&8&9\\
\end{bmatrix} 
\odot
\begin{bmatrix}
a&b&c\\
d&e&f\\
g&h&i\\
\end{bmatrix} = \begin{bmatrix}
1a&2b&3c\\
4d&5e&6f\\
7g&8h&9i\\
\end{bmatrix}$$ 

Hadamard multiplication is associative and commutative: 
$$\mathbf{A} \odot \mathbf{B} = \mathbf{B} \odot \mathbf{A}$$
$$ (\mathbf{A} \odot \mathbf{B}) \odot \mathbf{C} = \mathbf{A} \odot (\mathbf{B} \odot \mathbf{C})$$
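
In NumPy, Hadamard multiplication is the `*` operator (or `np.multiply`). A short sketch checking both properties on the test matrices and contrasting them with the ordinary matrix product, which is not commutative; the third matrix `C3` is an arbitrary example:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])
C3 = np.array([[1, 0, 2], [0, 3, 0], [5, 0, 1]])   # arbitrary third matrix

commutative = np.array_equal(A * B, B * A)
associative = np.array_equal((A * B) * C3, A * (B * C3))
print(commutative, associative)                    # True True

# the standard matrix product does not commute for these matrices
print(np.array_equal(A @ B, B @ A))                # False
```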

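**Special case: multiplication of two symmetric matrices**

Let's say we have two symmetric matrices, $S_1$ and $S_2$. Generally, 
$S_1 \cdot S_2$ is not symmetric: since $(S_1 S_2)^T = S_2 S_1$, the product is symmetric exactly when the two matrices commute. However, there is a special case for (and only for!) $2 \times 2$ matrices that is easy to spot: the product is symmetric if each matrix has a constant diagonal, i.e. $$(S_1)_{1,1} = (S_1)_{2,2}, \quad (S_2)_{1,1} = (S_2)_{2,2}$$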

**Special case for diagonal matrices**  

$$D_1 \cdot D_2 = D_1 \odot D_2$$

Python example:

In [185]:
Z = np.diag([1,2,3,17])
F = np.diag([4,5,6,21])

In [188]:
np.dot(Z,F)-Z*F

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

<a id="frobenius"></a>
# Frobenius Dot Product

There are 3 ways to calculate Frobenius dot product: 

### Method 1

1. Hadamard multiplication 
2. Sum all elements

$$<A,B>_F = \sum_{i=1}^m \sum_{j=1}^n a_{i,j} b_{i,j}$$

In Python: 

In [204]:
sum(sum(A*B))

159

### Method 2  

1. Vectorize (concatenate column by column into one vector)  
2. Compute vector dot product  

$$<A,B>_F = (\text{vec}(A))^T(\text{vec}(B))$$  

In Python: 

In [205]:
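# order='F' stacks the matrix column by column, matching vec(); any order works as long as it is the same for both
np.dot(A.flatten(order='F'),B.flatten(order='F'))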

159

### Method 3 

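(this form is the most convenient one in derivations; note, however, that forming the full product $A^T B$ also computes the off-diagonal elements, which the trace then discards, so it does more arithmetic than Methods 1 and 2)

1. Multiply $A^T$ by $B$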
2. Take the trace 

$$\text{tr}(A^T B)$$  

In Python:

In [209]:
np.dot(np.transpose(A),B).trace()

159

Numpy way: 

In [229]:
np.tensordot(A,B).tolist()

159

Frobenius dot product is used to calculate the Frobenius norm: 

$$\text{norm}(A) = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{i,j}^2} = \sqrt{<A,A>_F} = \sqrt{\text{tr}(A^T A)}$$

In Numpy it can be done simply as: 

In [231]:
np.linalg.norm(A)

17.146428199482248

The **Frobenius norm** of the difference, $\text{norm}(A - B)$, is often used as a measure of distance between matrices. 
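
A sketch of both uses on the test matrices: the norm via the trace formula compared against `np.linalg.norm`, and the distance between `A` and `B` as the norm of their difference (`fro_A` and `dist` are names introduced here):

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

# norm from the trace formula vs. the built-in function
fro_A = np.sqrt(np.trace(A.T @ A))
print(np.isclose(fro_A, np.linalg.norm(A, 'fro')))   # True

# distance between A and B: the squared differences sum to 157
dist = np.linalg.norm(A - B, 'fro')
print(dist)
```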

<a id="division"></a>
# Matrix Division

Matrix division as $\frac{\mathbf{A}}{\mathbf{B}}$ doesn't exist. We can only talk about inverse matrices: the analogue of division is multiplication by the inverse matrix, $\mathbf{A}\mathbf{B}^{-1}$ or $\mathbf{B}^{-1}\mathbf{A}$, which requires $\mathbf{B}$ to be invertible. 
Also, we can perform element-wise division (Hadamard division): 
$$\begin{bmatrix}
1&3&5\\
2&1&5\\
3&2&0\\
\end{bmatrix} \oslash
\begin{bmatrix}
2&3&1\\
1&3&2\\
3&4&5\\
\end{bmatrix}=\begin{bmatrix}
0.5&1&5\\
2&0.33&2.5\\
1&0.5&0\\
\end{bmatrix}$$

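In Python (here `A` and `B` are the test matrices defined at the top, not the matrices from the formula above):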

In [234]:
A/B

array([[ 4.        ,  1.        , -2.33333333],
       [ 2.25      ,  0.85714286,  0.5       ],
       [ 2.33333333,  0.6       ,  0.875     ]])
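
For comparison, "division" in the sense of multiplying by the inverse can be sketched as follows. This assumes `B` is invertible (its determinant here is -9); the names `X_right` and `X_left` are introduced only for this example. The side matters, and neither result equals the element-wise `A/B`:

```python
import numpy as np

A = np.array([[4, 2, -7], [9, 6, 1], [7, 3, 7]])
B = np.array([[1, 2, 3], [4, 7, 2], [3, 5, 8]])

B_inv = np.linalg.inv(B)

X_right = A @ B_inv        # solves X B = A
X_left = B_inv @ A         # solves B X = A

print(np.allclose(X_right @ B, A))     # True
print(np.allclose(B @ X_left, A))      # True
print(np.allclose(X_right, X_left))    # False: the side matters

# np.linalg.solve finds the left version without forming the inverse explicitly
print(np.allclose(np.linalg.solve(B, A), X_left))
```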