## Vector multiplication 

In [2]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.dot(a, b)
print(c)

32


## Matrix multiplication 

In [3]:
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])
c = np.dot(a, b)
print(c)

[[19 22]
 [43 50]]


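When multiplying two matrices, their shapes must be compatible: the number of columns of the first matrix must equal the number of rows of the second. An $(m \times n)$ matrix times an $(n \times p)$ matrix gives an $(m \times p)$ matrix.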
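As a small illustration of this shape check, the sketch below multiplies one compatible pair and one incompatible pair. The arrays and the `mismatch_raised` flag are made up for this example.

```python
import numpy as np

a = np.ones((2, 3))   # 2 x 3
b = np.ones((3, 4))   # 3 x 4
c = np.ones((2, 4))   # 2 x 4

# Inner dimensions match (3 == 3), so the product has shape (2, 4).
ab = np.dot(a, b)
print(ab.shape)

# Inner dimensions differ (3 != 2), so NumPy raises a ValueError.
try:
    np.dot(a, c)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```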

## Backpropagation of `Matrix Multiplication`

### $\mathbf{y} = \mathbf{xW}$

- $\mathbf{x} : 1 \times D$
- $\mathbf{W} : D \times H$
- $\mathbf{y} : 1 \times H$

Let's assume we apply some further computation to $\mathbf{y}$ that reduces it to a scalar $L$.

## `x`

## $\frac{\partial L}{\partial x_{i}} = \sum\limits_{\scriptsize{j}} \frac{\partial L}{\partial y_{j}} \frac{\partial y_{j}}{\partial x_{i}}$

When we expand $y_{j}$:

### $y_{j} = x_{1}W_{1j} + x_{2}W_{2j} + \cdot\cdot\cdot + x_{i}W_{ij} + \cdot\cdot\cdot + x_{D}W_{Dj}$
## $\therefore \frac{\partial y_{j}}{\partial x_{i}} = W_{ij}$

Substituting this into the chain rule above gives

## $\frac{\partial L}{\partial x_{i}} = \sum\limits_{\scriptsize{j}} \frac{\partial L}{\partial y_{j}} \frac{\partial y_{j}}{\partial x_{i}} = \sum\limits_{\scriptsize{j}} \frac{\partial L}{\partial y_{j}} W_{ij}$

----

### Let's summarize

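## $\frac{\partial L}{\partial x_{i}} = \sum\limits_{\scriptsize{j}} \frac{\partial L}{\partial y_{j}} W_{ij} = \frac{\partial L}{\partial \mathbf{y}} \cdot W_{i}$
- Here $W_{i}$ is the $i$-th row of $\mathbf{W}$, so this is a dot product of two length-$H$ vectors.

## $\frac{\partial L}{\partial \mathbf{x}} = \frac{\partial L}{\partial \mathbf{y}} \cdot W^{T}$
- $(1 \times D) = (1 \times H)(H \times D)$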
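One way to sanity-check the formula for $\frac{\partial L}{\partial \mathbf{x}}$ is to compare it against central finite differences. The sketch below assumes a hypothetical loss $L = \sum_j y_j^2$, chosen only so that $\frac{\partial L}{\partial \mathbf{y}} = 2\mathbf{y}$ is easy to write down. The sizes and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 3, 4
x = rng.standard_normal((1, D))
W = rng.standard_normal((D, H))

def loss(x, W):
    # Hypothetical scalar loss used only for this check: L = sum(y^2).
    y = x @ W
    return np.sum(y ** 2)

# Analytic gradient: dL/dy = 2y, then dL/dx = dL/dy . W^T.
gy = 2 * (x @ W)          # shape (1, H)
gx_analytic = gy @ W.T    # shape (1, D)

# Central finite differences, one coordinate of x at a time.
eps = 1e-6
gx_numeric = np.zeros_like(x)
for i in range(D):
    step = np.zeros_like(x)
    step[0, i] = eps
    gx_numeric[0, i] = (loss(x + step, W) - loss(x - step, W)) / (2 * eps)

print(np.allclose(gx_analytic, gx_numeric, atol=1e-5))
```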

<br><br>

## `W`

## $\frac{\partial L}{\partial W_{ij}} = \sum\limits_{\scriptsize{k}} \frac{\partial L}{\partial y_{k}} \frac{\partial y_{k}}{\partial W_{ij}}$

When we expand $y_{k}$:

### $y_{k} = x_{1}W_{1k} + x_{2}W_{2k} + \cdot\cdot\cdot + x_{i}W_{ik} + \cdot\cdot\cdot + x_{D}W_{Dk}$
## $\therefore \frac{\partial y_{k}}{\partial W_{ij}} = \mathbb{I}(k = j)x_{i}$

### $\mathbb{I}$ is the `indicator function`: it equals 1 when $k = j$ and 0 otherwise
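The effect of the indicator can be seen numerically: changing a single entry $W_{ij}$ only moves the component $y_{j}$, and it moves it by $x_{i}$. The sketch below uses made-up values and zero-based indices; because $\mathbf{y}$ is linear in $\mathbf{W}$, a unit bump gives the derivative exactly.

```python
import numpy as np

D, H = 3, 4
x = np.array([[1.0, 2.0, 3.0]])   # 1 x D
W = np.zeros((D, H))

i, j = 1, 2                        # zero-based indices, picked arbitrarily
W_bumped = W.copy()
W_bumped[i, j] += 1.0              # unit change in W_ij

# y is linear in W, so this difference equals dy_k/dW_ij for every k.
dy = x @ W_bumped - x @ W
print(dy)
```

Only column `j` of `dy` is nonzero, and its value is `x[0, i]`.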

---

Substituting this into the chain rule above gives

## $\frac{\partial L}{\partial W_{ij}} = \sum\limits_{\scriptsize{k}} \frac{\partial L}{\partial y_{k}} \frac{\partial y_{k}}{\partial W_{ij}} = \sum\limits_{\scriptsize{k}} \frac{\partial L}{\partial y_{k}} \mathbb{I}(k = j)x_{i}$

----

### Let's summarize

## $\frac{\partial L}{\partial W_{ij}} = x_{i} \sum\limits_{\scriptsize{k}} \mathbb{I}(k = j) \frac{\partial L}{\partial y_{k}}$
- $(1 \times 1) = (1 \times 1)(1 \times 1)$

## $\frac{\partial L}{\partial W_{j}} = \mathbf{x}^{\scriptsize{T}} \sum\limits_{\scriptsize{k}} \mathbb{I}(k = j) \frac{\partial L}{\partial y_{k}}$
- Here $W_{j}$ is the $j$-th column of $\mathbf{W}$.
- $(D \times 1) = (D \times 1)(1 \times 1)$

## $\downarrow$ 

## $\frac{\partial L}{\partial W} = \mathbf{x}^{\scriptsize{T}}
\bigg[
\sum\limits_{\scriptsize{k}} \mathbb{I}(k = 1) \frac{\partial L}{\partial y_{k}} \quad
\sum\limits_{\scriptsize{k}} \mathbb{I}(k = 2) \frac{\partial L}{\partial y_{k}} \quad
\cdot\cdot\cdot \quad
\sum\limits_{\scriptsize{k}} \mathbb{I}(k = H) \frac{\partial L}{\partial y_{k}}
\bigg]
$
- $(D \times H) = (D \times 1)(1 \times H)$

## $\frac{\partial L}{\partial W} = \mathbf{x}^{\scriptsize{T}}
\bigg[
\frac{\partial L}{\partial y_{1}} \quad
\frac{\partial L}{\partial y_{2}} \quad
\cdot\cdot\cdot \quad
\frac{\partial L}{\partial y_{H}}
\bigg]
$
- $(D \times H) = (D \times 1)(1 \times H)$

## $\frac{\partial L}{\partial \mathbf{W}} = \mathbf{x}^{\scriptsize{T}} \cdot \frac{\partial L}{\partial \mathbf{y}}$
- $(D \times H) = (D \times 1)(1 \times H)$
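The outer-product form of $\frac{\partial L}{\partial \mathbf{W}}$ can be checked the same way, entry by entry. As before, the loss $L = \sum_j y_j^2$ is a hypothetical choice made only for this check, and the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 3, 4
x = rng.standard_normal((1, D))
W = rng.standard_normal((D, H))

def loss(x, W):
    # Hypothetical scalar loss used only for this check: L = sum(y^2).
    return np.sum((x @ W) ** 2)

gy = 2 * (x @ W)         # dL/dy, shape (1, H)
gW_analytic = x.T @ gy   # (D, 1)(1, H) -> (D, H)

# Central finite differences, one entry of W at a time.
eps = 1e-6
gW_numeric = np.zeros_like(W)
for i in range(D):
    for j in range(H):
        step = np.zeros_like(W)
        step[i, j] = eps
        gW_numeric[i, j] = (loss(x, W + step) - loss(x, W - step)) / (2 * eps)

print(gW_analytic.shape)
print(np.allclose(gW_analytic, gW_numeric, atol=1e-5))
```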

## When `x` dimension is `N` $\times$ `D`

### $\mathbf{y} = \mathbf{xW}$

- $\mathbf{x} : N \times D$
- $\mathbf{W} : D \times H$
- $\mathbf{y} : N \times H$


## $\frac{\partial L}{\partial \mathbf{x}} = \frac{\partial L}{\partial \mathbf{y}} \cdot W^{T}$
- $(N \times D) = (N \times H)(H \times D)$

## $\frac{\partial L}{\partial \mathbf{W}} = \mathbf{x}^{\scriptsize{T}} \cdot \frac{\partial L}{\partial \mathbf{y}}$
- $(D \times H) = (D \times N)(N \times H)$
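The batched formulas can be related back to the single-sample case: each row of $\frac{\partial L}{\partial \mathbf{x}}$ is the $1 \times D$ result for that sample, and $\frac{\partial L}{\partial \mathbf{W}}$ is the sum of the per-sample outer products. The sketch below checks this with a random stand-in for $\frac{\partial L}{\partial \mathbf{y}}$. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, H = 5, 3, 4
x = rng.standard_normal((N, D))
W = rng.standard_normal((D, H))
gy = rng.standard_normal((N, H))   # stand-in for dL/dy

gx = gy @ W.T    # (N, H)(H, D) -> (N, D)
gW = x.T @ gy    # (D, N)(N, H) -> (D, H)

# Row n of gx is the single-sample formula applied to row n of gy.
gx_rows = np.stack([gy[n] @ W.T for n in range(N)])
# gW is the sum over samples of the single-sample outer products.
gW_sum = sum(np.outer(x[n], gy[n]) for n in range(N))

print(gx.shape, gW.shape)
print(np.allclose(gx, gx_rows), np.allclose(gW, gW_sum))
```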

## Implement `Matrix Multiplication`

In [4]:
from dezero import Function

class MatMul(Function):
    def forward(self, x, W):
        # y = xW
        y = x.dot(W)
        return y

    def backward(self, gy):
        x, W = self.inputs
        # dL/dx = dL/dy . W^T
        gx = matmul(gy, W.T)
        # dL/dW = x^T . dL/dy
        gW = matmul(x.T, gy)
        return gx, gW

def matmul(x, W):
    return MatMul()(x, W)

In [1]:
import numpy as np
from dezero import Variable
import dezero.functions as F

x = Variable(np.random.randn(2, 3))
W = Variable(np.random.randn(3, 4))
y = F.matmul(x, W)
y.backward()

print(x.grad.shape)
print(W.grad.shape)

(2, 3)
(3, 4)
