<a href="https://colab.research.google.com/github/arkincognito/ML-from-scrap/blob/master/Computational_graph.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## This notebook is based on an assignment from DS School's Deep Learning Course.

In [6]:
import numpy as np

A computational graph lets us compute partial derivatives node by node with the chain rule, without writing out the partial derivative of the whole expression explicitly.

In this notebook, I'll write computational graph code for

* Nodes (Log, Square, Trigonometric Functions)

and

* Loss Functions (Mean Squared Error, Cross Entropy)

## Computational Graph

A computational graph represents a calculation process: nodes represent operations and edges represent their inputs and outputs. It makes the chain rule visual and allows derivatives to be calculated efficiently.

<img src="http://drive.google.com/uc?export=view&id=14x4zQpEEatgMb1W0BY47lXKjM5haZq1x" width="600">


> **Forward Propagation**
>-  **Forward propagation** goes through the neural network from the input nodes to the output. It is represented as blue edges in the graph above.
  

> **Back Propagation**
>-  **Back propagation** is the process of calculating gradients by applying the chain rule throughout the network. The term 'back' comes from propagating the loss backward from the end of the network. It is represented as red edges in the graph above.
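
As a small worked illustration of the chain rule on a graph, consider a hypothetical function $L = (xy)^2$ with one intermediate node $u = xy$ (this function is an assumption chosen for illustration, not the one drawn in the figure):

$$
\begin{aligned}
\text{forward:}\quad & u = xy, \qquad L = u^2 \\
\text{backward:}\quad & \frac{\partial L}{\partial u} = 2u, \qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial u}\cdot\frac{\partial u}{\partial x} = 2u \cdot y, \qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial u}\cdot\frac{\partial u}{\partial y} = 2u \cdot x
\end{aligned}
$$

With $x = 10$ and $y = 3$, the forward pass gives $u = 30$ and $L = 900$, and the backward pass gives $\frac{\partial L}{\partial x} = 60 \cdot 3 = 180$ and $\frac{\partial L}{\partial y} = 60 \cdot 10 = 600$.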

# 1. Multiply Node

The Multiply node computes $z=f(x,y)=x\times y$.

Then the gradient of $z$ can be written as  

$$
\nabla z = \left ( \frac{\partial z}{\partial x},\frac{\partial z}{\partial y}  \right ) = \left ( \frac{\partial (xy)}{\partial x},\frac{\partial (xy)}{\partial y}  \right ) = (y,x)
$$

<img src="http://drive.google.com/uc?export=view&id=1pQ9HFmr9_31daD8YIhO72IQtr5Kvj2GP" width="600">


> **Forward Propagation**
>-  The Multiply node outputs the product of its two input values.
  

> **Back Propagation**
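>-  The Multiply node passes back, along each input edge, the gradient from the next node multiplied by the value of the other input. For example, the edge toward x receives the incoming gradient multiplied by y. The class below returns only these local factors $(y, x)$; multiplying by the incoming gradient is left to the caller.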

In [7]:
# Define Multiply class and forward, back propagation methods.
class Multiply:
    # f(x,y) = xy
    def forward(self, x, y):
        self.x = x
        self.y = y
        return self.x * self.y
    # df/dx = y, df/dy = x
    def backward(self):
        dx = self.y
        dy = self.x
        return dx, dy

Let's check if the class works properly.

If the input values are 10 and 3, then the return should be:
* Forward: $10 \times 3 = 30$
* Backward: $3, 10$

In [8]:
multiply = Multiply()
forward2 = multiply.forward(10,3)
forward2

30

In [9]:
multiply.backward()

(3, 10)

We can see that the Node works fine.
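
One way to double-check an analytic gradient like this is to compare it with a finite-difference estimate. The sketch below is only an illustration: `numerical_gradient` and its `eps` parameter are hypothetical names introduced here, and the result is an approximation, so it should be close to $(3, 10)$ rather than exactly equal.

```python
def numerical_gradient(f, args, eps=1e-6):
    """Approximate df/d(arg_i) for each argument with central differences."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(*plus) - f(*minus)) / (2 * eps))
    return tuple(grads)

# z = x * y at (10, 3); the analytic gradient is (y, x) = (3, 10)
approx = numerical_gradient(lambda x, y: x * y, (10.0, 3.0))
print(approx)
```

The same helper can be pointed at any of the scalar functions in this notebook by swapping the lambda.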

## 1-1 Multiply in General
More generally, multiplication can be written as $z = f(x_0, x_1, \dots , x_{n-1}) = \prod_{i=0}^{n-1} x_i$.

Then the gradient of $z$ is:

$$ \nabla z = \left ( \frac{\partial z}{\partial x_0},\frac{\partial z}{\partial x_1}, ... \frac{\partial z}{\partial x_{n-1}} \right ) = \left ( \frac{\partial (\prod_{i=0}^{n-1} x_i)}{\partial x_0},\frac{\partial (\prod_{i=0}^{n-1} x_i)}{\partial x_1}, ... ,\frac{\partial (\prod_{i=0}^{n-1} x_i)}{\partial x_{n-1}}  \right )$$

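Thus, each partial derivative is the product of all the other inputs, which equals the full product divided by $x_j$ whenever $x_j \neq 0$:

$$ \frac{\partial z}{\partial x_j} = \prod_{i \neq j} x_i = \frac{\prod_{i=0}^{n-1} x_i} {x_j} $$

Note that the input x is now a list of numbers.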

In [10]:
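class MultiplyAll:
    # f(x_0, ..., x_{n-1}) = product of all x_i
    def forward(self, array):
        self.x = np.array(array)
        # np.prod replaces np.product, which was removed in NumPy 2.0
        return np.prod(self.x)
    # df/dx_j = (product of all x_i) / x_j, valid only when no x_j is zero
    def backward(self):
        return np.prod(self.x) / self.x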

In [11]:
multimult = MultiplyAll()
print(multimult.forward([5.0,2.5,3.0]))
print(multimult.backward())

37.5
[ 7.5 15.  12.5]
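
Dividing the full product by $x_j$ breaks down when an input is zero. A minimal sketch of an alternative, assuming we compute each partial derivative directly as the product of the other inputs, is shown below; `product_gradient` is a hypothetical helper, not part of the original assignment.

```python
import numpy as np

def product_gradient(values):
    """Gradient of prod(values): entry j is the product of all other entries."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # prefix[j] = x[0] * ... * x[j-1], suffix[j] = x[j+1] * ... * x[n-1]
    prefix = np.ones(n)
    suffix = np.ones(n)
    for j in range(1, n):
        prefix[j] = prefix[j - 1] * x[j - 1]
    for j in range(n - 2, -1, -1):
        suffix[j] = suffix[j + 1] * x[j + 1]
    return prefix * suffix

print(product_gradient([5.0, 2.5, 3.0]))   # same input as the cell above
print(product_gradient([5.0, 0.0, 3.0]))   # an input containing a zero
```

For the first input this should agree with the output above; for the second, only the partial derivative with respect to the zero entry is non-zero.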


# 2. Log Function Node

If $z=f(x)=log(x)$ (the natural logarithm, as computed by `np.log`), then the gradient of $z$ is


$$
\nabla z = \left ( \frac{\partial z}{\partial x}\right ) = \left ( \frac{\partial (log(x))}{\partial x}\right ) = \frac{1}{x}
$$

<img src="http://drive.google.com/uc?export=view&id=1YKou4QvGFwjWV_zzk0H8cMS0vGT7lFMB" width="600">

> **Forward Propagation**
>-  Log node returns the log value of the input.

> **Back Propagation**
>-  Log node returns the reciprocal of the input, $\frac{1}{x}$.

In [12]:
# Define Log class and its forward, back propagation methods.
class Log:
    
    # f(x) = log(x)
    def forward(self, x):
        self.x = x
        return np.log(self.x)
        
    # df/dx = 1/x
    def backward(self):
        return 1.0/self.x

Let's create a Log node object, set **x=2**, and check that the forward and backward propagation values are returned correctly as **0.693** and **0.5**.

In [13]:
log = Log()
x = 2
log.forward(x), log.backward()

(0.6931471805599453, 0.5)

# 3. Square Node

If $z=f(x)=x^2$, then the gradient of $z$ is


$$
\nabla z = \left ( \frac{\partial z}{\partial x}\right ) = \left ( \frac{\partial x^2}{\partial x}\right ) = 2x
$$

<img src="http://drive.google.com/uc?export=view&id=1C67JqOdnW4dxbLBVHLxF8Hvrr2NoTth3" width="600">


> **Forward Propagation**
>-  Square node returns the square value of the input.  

> **Back Propagation**
>-  Square node returns the value of $2x$.

In [14]:
# Define Square class and its forward, back propagation methods.

class Square:
    
    # f(x) = x^2
    def forward(self, x):
        self.x = x
        return self.x ** 2
    # df/dx = 2x
    def backward(self):
        return 2 * self.x

In [15]:
square = Square()
x = 5
square.forward(x), square.backward()

(25, 10)

# 4. Trigonometric Functions (sin, cos, tan)

## 4-1. Sin Node
If $z=f(x)=sin(x)$, then the gradient of $z$ is


$$
\nabla z = \left ( \frac{\partial z}{\partial x}\right ) = \left ( \frac{\partial sin(x)}{\partial x}\right ) = cos(x)
$$



<img src="http://drive.google.com/uc?export=view&id=1zdwScJP5hbMTtJETFVo1P9F8ZACoU-fc" width="600">

> **Forward Propagation**
>-  Sin node returns the sine value of the input.
  

> **Back Propagation**
>-  Sin node returns the cosine value of the input.

In [16]:
# Define Sin class and its forward, back propagation methods.
class Sin:
    
    # f(x) = sin(x)
    def forward(self, x):
        self.x = x
        return np.sin(self.x)

    # df/dx = cos(x); use the stored input, not the global x
    def backward(self):
        return np.cos(self.x)

In [17]:
sin = Sin()
x = np.pi / 3
sin.forward(x), sin.backward()

(0.8660254037844386, 0.5000000000000001)

## 4-2. Cos Node
If $z=f(x)=cos(x)$, then the gradient of $z$ is


$$
\nabla z = \left ( \frac{\partial z}{\partial x}\right ) = \left ( \frac{\partial cos(x)}{\partial x}\right ) = -sin(x)
$$

<img src="http://drive.google.com/uc?export=view&id=11AEEdSWOmBjGQhzW2-BDboXFjCvd-hi8" width="600">


> **Forward Propagation**
>-  Cos node returns the cosine value of the input.
  

> **Back Propagation**
>-  Cos node returns the negative sine value of the input, $-sin(x)$.

In [18]:
class Cos:
    # f(x) = cos(x)
    def forward(self, x):
        self.x = x
        return np.cos(self.x)
    
    # df/dx = -sin(x); use the stored input, not the global x
    def backward(self):
        return -np.sin(self.x)

In [19]:
cos = Cos()
x = np.pi/3
cos.forward(x), cos.backward()

(0.5000000000000001, -0.8660254037844386)

## 4-3. Tan Node
If $z=f(x)=tan(x)$, then the gradient of $z$ is


$$
\nabla z = \left ( \frac{\partial z}{\partial x}\right ) = \left ( \frac{\partial tan(x)}{\partial x}\right ) = \frac{1}{ cos(x)^2}
$$

<img src="http://drive.google.com/uc?export=view&id=16Q1LDQ4L2uY8dkqKgMRZP8UILcKtG10L" width="600">


> **Forward Propagation**
>- Tan node returns the tangent value of the input.
  

> **Back Propagation**
>- Tan node returns $\frac{1}{ cos(x)^2}$.

In [20]:
class Tan:
    
    # f(x) = tan(x)
    def forward(self, x):
        self.x = x
        return np.tan(self.x)
    
    # df/dx = 1 / cos(x)^2; use the stored input, not the global x
    def backward(self):
        return 1 / (np.cos(self.x)**2)

In [21]:
tan = Tan()
x = np.pi/3
tan.forward(x), tan.backward()

(1.7320508075688767, 3.9999999999999982)
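
The single nodes above become useful once they are chained. The sketch below composes a Sin node and a Square node to differentiate $f(x) = sin(x)^2$. The two classes are re-declared only so the block runs on its own, and the composed function is an assumption chosen for illustration.

```python
import numpy as np

class Sin:
    def forward(self, x):
        self.x = x
        return np.sin(self.x)
    def backward(self):
        return np.cos(self.x)

class Square:
    def forward(self, x):
        self.x = x
        return self.x ** 2
    def backward(self):
        return 2 * self.x

sin_node, square_node = Sin(), Square()
x0 = np.pi / 3

# Forward pass: x -> sin(x) -> sin(x)^2
s = sin_node.forward(x0)
z = square_node.forward(s)

# Backward pass: multiply local gradients from the output back to the input
dz_ds = square_node.backward()        # 2 * sin(x)
dz_dx = dz_ds * sin_node.backward()   # 2 * sin(x) * cos(x)

print(z, dz_dx)
print(np.isclose(dz_dx, np.sin(2 * x0)))  # identity: 2 sin(x) cos(x) = sin(2x)
```

Each node only knows its own local derivative; the chain rule is applied by the code that multiplies them together.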

# 5. Loss Function
Let's define the loss functions through a computational graph and get the partial derivatives of the loss functions.

## 5-1. MSE Loss Function Node

The MSE (Mean Squared Error) loss function for a single data point can be written as follows.

$$
MSE=\frac{1}{2} (\hat{y}-y)^2
$$


Here $y$ is the label and $\hat{y}$ is the predicted value. With $n$ data points, where $y^{(i)}$ is the label of the $i$-th data point and $\hat{y}^{(i)}=\sigma(w^Tx^{(i)}+b)$ is its predicted value, the MSE cost function can be written as


$$
MSE(Cost)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2} (\hat{y}^{(i)}-y^{(i)})^2
$$

The cost function is just the average of the loss function over the data, so I'll only define the loss function.
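
For completeness, here is a minimal sketch of how the cost could be obtained by averaging per-example losses with NumPy. The function names and the sample values are made up for illustration and are not part of the original assignment.

```python
import numpy as np

def mse_cost(y_predict, y_actual):
    """Average of the per-example losses 1/2 * (y_hat - y)^2."""
    y_predict = np.asarray(y_predict, dtype=float)
    y_actual = np.asarray(y_actual, dtype=float)
    losses = 0.5 * (y_predict - y_actual) ** 2
    return losses.mean()

def mse_cost_gradient(y_predict, y_actual):
    """d(cost)/d(y_hat_i) = (y_hat_i - y_i) / n."""
    y_predict = np.asarray(y_predict, dtype=float)
    y_actual = np.asarray(y_actual, dtype=float)
    return (y_predict - y_actual) / y_predict.size

y_hat = [1.0, 2.0, 3.0]   # made-up predictions
y = [4.0, 2.0, 1.0]       # made-up labels
print(mse_cost(y_hat, y))
print(mse_cost_gradient(y_hat, y))
```

Averaging only scales each per-example gradient by $\frac{1}{n}$, which is why defining the single-example loss node is enough.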

The computational graph of the MSE loss function above is shown below.

<img src="http://drive.google.com/uc?export=view&id=1dqfw6Tpeo6CkNsns8IMU3Bw1ItkAUNZ6" width="600">


To find the optimal weights $w$ and bias $b$ with the Gradient Descent algorithm, we need the partial derivatives of the loss with respect to $w$ and $b$. Thus, our MSE node should return the partial derivative of the loss with respect to $\hat{y}$, $\frac{\partial L}{\partial \hat{y}}$.

The MSE node can be represented by the following diagram.

<img src="http://drive.google.com/uc?export=view&id=16dbnhIahsvpVh1eAF4uavMA8mZIMkWhF" width="600">


To define MSE Node, we'll first define Add Node.
Forward will return the sum of two inputs, and backward will return 1 to each path.

In [22]:
class Add:
    def forward(self, x, y):
        self.x = x
        self.y = y
        return x + y
    def backward(self):
        return 1.0, 1.0

Let's define MSE Node.

When $\hat{y}$ and $y$ are given as inputs, forward propagation will return $ MSE=\frac{1}{2} (\hat{y}-y)^2 $ and backward propagation will return  $\frac{\partial L}{\partial \hat{y}} $.

In [23]:
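class MSE:
    def __init__(self):
        # Create the nodes per instance so separate MSE objects do not share cached inputs.
        self.mult1 = Multiply()
        self.mult2 = Multiply()
        self.add = Add()
        self.sq = Square()

    # L = 1/2 * (y_predict - y_actual)^2
    def forward(self, y_predict, y_actual):
        self.y_predict = y_predict
        self.y_actual = y_actual
        fw1 = self.mult1.forward(self.y_actual, -1)   # -y
        fw2 = self.add.forward(fw1, self.y_predict)   # y_hat - y
        fw3 = self.sq.forward(fw2)                    # (y_hat - y)^2
        fw4 = self.mult2.forward(fw3, 1/2)            # 1/2 * (y_hat - y)^2
        return fw4
    
    # dL/dy_predict, chained from the last node back to the y_predict input
    def backward(self):
        bw1 = self.mult2.backward()[0]                # 1/2
        bw2 = self.sq.backward() * bw1                # 2 * (y_hat - y) * 1/2
        bw3 = self.add.backward()[1] * bw2            # y_predict is the Add node's second input
        return bw3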

Create an MSE node object, set **y_predict = 1, y_actual = 4**, and check that the forward and backward values are **4.5** and **-3.0**.

In [24]:
mse = MSE()
mse.forward(1,4), mse.backward()

(4.5, -3.0)
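
These values can also be checked by hand by walking the graph with $\hat{y} = 1$ and $y = 4$:

$$
\begin{aligned}
\text{forward:}\quad & -y = -4, \qquad \hat{y} - y = -3, \qquad (\hat{y}-y)^2 = 9, \qquad L = \tfrac{1}{2}\cdot 9 = 4.5 \\
\text{backward:}\quad & \frac{\partial L}{\partial \hat{y}} = \tfrac{1}{2} \cdot 2(\hat{y}-y) \cdot 1 = \hat{y} - y = -3
\end{aligned}
$$

The three factors in the backward line are the local gradients of the last Multiply node, the Square node, and the Add node, in that order.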

## 5-2. Cross Entropy Loss Function Node

Cross entropy node can be represented as: 

<img src="http://drive.google.com/uc?export=view&id=15wLtFHEr884R75PZrl7TLTUj2izJkWM-" width="600">


### Cross Entropy
If the total number of classes (labels) is $C$, $y_{c}$ is the label value for class $c$, and $\hat{y}_{c}=\sigma(w^Tx_{c}+b)$ is the predicted value of $y_{c}$,

then: 

$$
\text{Cross Entropy Loss} = -\sum_{c=1}^{C}  y_{c} \times log(\hat{y}_{c})
$$

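### Binary Classification
For a binary classification problem, $C = 2$. Writing the two label values as $y$ and $1-y$, and the two predicted values as $\hat{y}$ and $1-\hat{y}$: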

$$ \text{Cross Entropy Loss(binary)} = - y \times log(\hat{y}) - (1-{y}) \times log(1-\hat{y})$$
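
As a rough sketch, the binary formula and its derivative with respect to $\hat{y}$ could be written directly with NumPy as below. The function names and the sample values are assumptions for illustration, and the code assumes $0 < \hat{y} < 1$ so that both logarithms are defined.

```python
import numpy as np

def binary_cross_entropy(y_predict, y_actual):
    """Loss = -y*log(y_hat) - (1-y)*log(1-y_hat), for 0 < y_hat < 1."""
    return -y_actual * np.log(y_predict) - (1 - y_actual) * np.log(1 - y_predict)

def binary_cross_entropy_gradient(y_predict, y_actual):
    """dLoss/dy_hat = -y/y_hat + (1-y)/(1-y_hat)."""
    return -y_actual / y_predict + (1 - y_actual) / (1 - y_predict)

# Made-up example: predicted probability 0.8 for a positive label
loss = binary_cross_entropy(0.8, 1)
grad = binary_cross_entropy_gradient(0.8, 1)
print(loss, grad)
```

For a positive label only the first term contributes, so the loss reduces to $-log(\hat{y})$ and the gradient to $-\frac{1}{\hat{y}}$.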

### 3 Class Classification
For 3-class classification, $C = 3$, thus:

$$
\text{Cross Entropy Loss} = -\sum_{c=1}^{3}  y_{c} \times log(\hat{y}_{c})
$$

The 3-class Cross Entropy node can be represented as the computational graph below.

<img src="http://drive.google.com/uc?export=view&id=1HFLacOYnnN5ihTN9pm1_LEL1CB-1JNny" width="600">


As before, let's start by defining Add function for multiple inputs.

In [25]:
class AddAll:
    # f(x_0, ..., x_{n-1}) = sum of all x_i
    def forward(self, array):
        self.array = np.array(array)
        return self.array.sum()
    
    # df/dx_i = 1 for every input
    def backward(self):
        return np.ones(self.array.shape[0], dtype=int)

In [26]:
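class CE:
    def __init__(self):
        # Create the nodes per instance so separate CE objects do not share cached inputs.
        self.log = Log()
        self.mult = Multiply()
        self.addAll = AddAll()
        self.lastMult = Multiply()

    # Loss = -sum_c y_c * log(y_hat_c)
    def forward(self, y_predict, y_actual):
        self.y_predict = np.array(y_predict)
        self.y_actual = np.array(y_actual)
        log_y_predict = self.log.forward(self.y_predict)                   # log(y_hat_c)
        log_yp_times_y = self.mult.forward(log_y_predict, self.y_actual)   # y_c * log(y_hat_c)
        added = self.addAll.forward(log_yp_times_y)                        # sum over classes
        multiplied = self.lastMult.forward(added, -1)                      # negate the sum
        return multiplied
    
    # dLoss/dy_hat_c = -y_c / y_hat_c
    def backward(self):
        bw1 = self.lastMult.backward()[0]      # -1
        bw2 = bw1 * self.addAll.backward()     # -1 for every class
        bw3 = bw2 * self.mult.backward()[0]    # times y_c (gradient w.r.t. the log input)
        bw4 = bw3 * self.log.backward()        # times 1 / y_hat_c
        return bw4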

Build a CE node object, set **y_predict_ce = [2, 3, 4], y_ce = [0, 1, 2]**, and check that the forward and backward values are **-3.8712 and (0, -0.33, -0.5)**. These inputs are not valid probabilities or labels; they are only used to check the arithmetic.

In [27]:
ce = CE()
y_predict_ce = [2, 3, 4]
y_ce = [0, 1, 2]
print(f'Cross Entropy: {ce.forward(y_predict_ce, y_ce)}\nBackward Propagation: {ce.backward()}')

Cross Entropy: -3.8712010109078907
Backward Propagation: [ 0.         -0.33333333 -0.5       ]
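
The node-by-node result can be compared with the closed form $\frac{\partial L}{\partial \hat{y}_c} = -\frac{y_c}{\hat{y}_c}$. The sketch below is a hypothetical vectorized version, not the assignment's solution. It first reproduces the test values above and then uses a made-up probability vector with a one-hot label, which is closer to how the loss is used in practice.

```python
import numpy as np

def cross_entropy(y_predict, y_actual):
    """Loss = -sum_c y_c * log(y_hat_c)."""
    return -np.sum(y_actual * np.log(y_predict))

def cross_entropy_gradient(y_predict, y_actual):
    """dLoss/dy_hat_c = -y_c / y_hat_c."""
    return -y_actual / y_predict

# Same test values as the cell above
check_loss = cross_entropy(np.array([2, 3, 4]), np.array([0, 1, 2]))
check_grad = cross_entropy_gradient(np.array([2, 3, 4]), np.array([0, 1, 2]))
print(check_loss, check_grad)

# Made-up probabilities that sum to 1, with a one-hot label for the second class
y_hat = np.array([0.2, 0.7, 0.1])
y = np.array([0.0, 1.0, 0.0])
print(cross_entropy(y_hat, y))
print(cross_entropy_gradient(y_hat, y))
```

With a one-hot label, only the predicted probability of the true class affects the loss and its gradient.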
