In [1]:
from icecream import ic
from abc import ABC, abstractmethod
import numpy as np
from numpy import linalg as LA
import graphviz

# Modules
- To keep the project organized, the supporting components are moved into separate modules, which the main code in this notebook imports.

### Atomic Operations and Expression Tree

- Each atomic operation is implemented as a separate class.  
- Mathematical functions can be built by chaining these atomic operations together.  
- These functions are internally represented using an *expression tree*.  
  - In the expression tree:  
    - Each atomic operation is a node, more specifically an instance of $\texttt{Expr\_node()}$.  
    - Each variable in the function is also represented as a node, more specifically an instance of $\texttt{Expr\_end\_node()}$. 
- The atomic operation classes all inherit from the same abstract class, which defines their structure:
  - Every operation has a `forward` and a `backward` function, where the `backward` function is the derivative of the `forward` function.
  - These classes are callable. Calling an operation object on another value automatically builds the *expression tree*.
- To calculate the `forward pass`, the expression tree is traversed using the `forward` function.
- The `backward pass` requires both the `forward` and `backward` functions to compute derivatives, applying the **chain rule** during traversal.
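
The design described above can be sketched as follows. This is a minimal illustration, not the code in `forward_backward_functions_and_nodes`: the names `SketchOp`, `SketchNode`, `SketchLeaf`, `SketchSin`, `SketchMultiply` and `evaluate` are hypothetical, and only the attribute names `forward_func`, `backward_func`, `childs` and `value` are taken from how this notebook uses its nodes.

```python
from abc import ABC, abstractmethod
import numpy as np


class SketchLeaf:
    """Hypothetical stand-in for an end node: holds a value and a gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.grad_value = np.float64(0)


class SketchNode:
    """Hypothetical stand-in for an inner node: one operation plus its children."""
    def __init__(self, op, childs):
        self.forward_func = op.forward
        self.backward_func = op.backward
        self.childs = childs


class SketchOp(ABC):
    """Abstract base class: every operation supplies forward and backward."""
    @abstractmethod
    def forward(self, *args): ...

    @abstractmethod
    def backward(self, *args): ...

    def __call__(self, *childs):
        # Calling an operation computes nothing; it only grows the tree.
        return SketchNode(self, list(childs))


class SketchSin(SketchOp):
    def forward(self, a):
        return np.sin(a)

    def backward(self, a):
        return np.cos(a)  # d/da sin(a)


class SketchMultiply(SketchOp):
    def forward(self, a, b):
        return a * b

    def backward(self, a, b):
        return b, a  # partial derivatives w.r.t. a and w.r.t. b


def evaluate(node):
    """Recursive forward pass over the sketch tree."""
    if isinstance(node, SketchLeaf):
        return node.value
    return node.forward_func(*(evaluate(c) for c in node.childs))


x = SketchLeaf(0.5)
y = SketchLeaf(2.0)
tree = SketchSin()(SketchMultiply()(x, y))  # represents sin(x * y)
result = evaluate(tree)
```

Building `tree` performs no arithmetic; the values are only combined once `evaluate` walks the tree.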
  
### Visualize the Expression Tree
- The $\texttt{print\_graph}$ module provides everything needed to visualize the expression tree using the **Graphviz** package.

In [2]:
from forward_backward_functions_and_nodes import * 
from print_graph import print_graph

#  Details on our Forward and Backward Propagation Algorithms
Both the forward and the backward pass are recursive functions.
The forward pass computes the value of the specified node, that is, the function evaluated at the values stored in the end nodes. If the node is not an end node, the function calls itself on the child nodes until it reaches one. \
The backward pass computes the outer derivative at the current node by evaluating the node's backward function at the forward-pass values of all child nodes, and then calls itself recursively on each child node. In each recursion step, the incoming derivative is multiplied by the local one. The recursion continues until it reaches an end node, where the accumulated product is added to the gradient stored in that end node (`grad_value`).
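
To make the accumulation of derivative factors concrete, the following sketch runs both passes on $x \cdot x$ for scalar values only. It is an illustration under simplifying assumptions, not the implementation used in this notebook: nodes are plain dictionaries, and the helper names `leaf`, `mul`, `fwd` and `bwd` are made up for this example.

```python
def leaf(value):
    # End node: stores a value and an accumulated gradient.
    return {"kind": "leaf", "value": value, "grad": 0.0}


def mul(a, b):
    # Inner node for a * b; the partial derivatives are (b, a).
    return {
        "kind": "op",
        "childs": [a, b],
        "forward": lambda u, v: u * v,
        "backward": lambda u, v: (v, u),
    }


def fwd(node):
    if node["kind"] == "leaf":
        return node["value"]
    return node["forward"](*(fwd(c) for c in node["childs"]))


def bwd(node, upstream=1.0):
    if node["kind"] == "leaf":
        # Contributions from every path to this leaf add up.
        node["grad"] += upstream
        return
    child_values = [fwd(c) for c in node["childs"]]
    local = node["backward"](*child_values)
    for child, d in zip(node["childs"], local):
        bwd(child, upstream * d)  # chain rule: multiply along the path


x = leaf(3.0)
square = mul(x, x)      # x appears twice, so two paths lead to x
value = fwd(square)     # 3.0 * 3.0
bwd(square)
gradient = x["grad"]    # 3.0 + 3.0, i.e. d(x^2)/dx at x = 3
```

Because `x` is reached along two paths, the end node receives two contributions, which is why the gradient is accumulated with `+=` rather than overwritten.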


## Backpropagation for Vectors and Matrices

In practice, especially in large neural networks, there are many parameters to manage. Grouping variables into matrices and vectors is essential: it lets the backpropagation algorithm handle them efficiently and helps the user keep an overview.

**Our algorithm is capable of handling multidimensional values, such as vectors and matrices.** According to the **multidimensional chain rule**, the derivative of a composition of functions is the matrix product of their Jacobian matrices. This task is **not trivial**, as it **requires defining the Jacobian matrix for matrix-valued arguments**.

Our initial approach was to generalize the Jacobian matrix and represent it as third-order tensors, in order to preserve the 2D structure of each matrix within the Jacobian. However, this approach became confusing, as each multiplication required transposing or flattening the matrices to obtain the correct result.

As a result, we decided to treat every component (or entry) of each vector and matrix that the function depends on as an independent parameter. We then used the standard definition of the Jacobian matrix: each row corresponds to one component of the function's output (codomain), and each column corresponds to one parameter (in the domain), so that a row contains the partial derivatives of one output component with respect to all parameters.
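
As an illustration of this flattened Jacobian, the sketch below builds it for $\vec y = w \cdot \vec x$ with respect to the entries of $w$ and checks it numerically. It assumes row-major flattening of the matrix (NumPy's default order); the variable names are chosen for this example only and do not come from the notebook's modules.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
W = rng.random((m, n))
x = rng.random(n)

# Analytic Jacobian of y = W @ x w.r.t. the row-major flattened entries of W.
# Entry (k, i*n + j) is dy_k / dW_ij = delta_ki * x_j.
J_analytic = np.kron(np.eye(m), x[None, :])      # shape (m, m*n)

# Numerical Jacobian: perturb one entry of W at a time.
eps = 1e-6
J_numeric = np.zeros((m, m * n))
for p in range(m * n):
    dW = np.zeros(m * n)
    dW[p] = eps
    J_numeric[:, p] = ((W + dW.reshape(m, n)) @ x - W @ x) / eps

jacobians_agree = np.allclose(J_analytic, J_numeric, atol=1e-6)
```

Each of the `m` rows belongs to one output component, and each of the `m*n` columns belongs to one entry of `W`.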

Each child node receives the columns of the Jacobian that correspond to the components of the parameter it represents (vector, matrix, or scalar). The multiplication then follows the usual matrix multiplication process. Since there is one column for each component, the result has to be reshaped at the end node before further use. If the codomain of the entire chained function is one-dimensional, the result can be reshaped directly to the shape of the end node parameter. If not, there are as many rows as there are dimensions in the codomain, and each row has to be reshaped to the shape of the end node parameter.

In the case of artificial neural networks, the entire function mentioned above is the loss function chained with the network's function. Since the loss is scalar-valued, backpropagation yields, for each end node parameter, a Jacobian matrix with one row and as many columns as the parameter has entries. This row can easily be reshaped, as in the following pseudocode: `gradient.reshape(parameter.shape)`
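
The reshaping step can be sketched with plain NumPy. The example assumes a linear layer $\vec y = w \cdot \vec x$, a sum-of-squares loss, and row-major flattening; the target vector `t` and all variable names are hypothetical and not taken from the notebook's modules.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((3, 2))
x = rng.random(2)
t = rng.random(3)                                # hypothetical target vector

y = W @ x
dloss_dy = 2 * (y - t)                           # shape (3,)
dy_dW = np.kron(np.eye(3), x[None, :])           # shape (3, 6), one column per entry of W

# Scalar codomain: a single row that is reshaped to the parameter's shape.
flat_grad = dloss_dy @ dy_dW                     # shape (6,)
grad_W = flat_grad.reshape(W.shape)              # shape (3, 2)
expected = np.outer(dloss_dy, x)                 # closed form for comparison

# Vector codomain: every row of the Jacobian is reshaped separately.
per_output = dy_dW.reshape(3, *W.shape)          # per_output[k] = d y_k / d W
```

Here `per_output[k]` is zero everywhere except in row `k`, which holds `x`.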

In [3]:
def forward(node):
    return node.forward_func(*(forward(child) for child in node.childs)) if type(node) is not Expr_end_node else node.value 
    
# def backward(node, value = np.float64(1)):
#     if type(node) is not Expr_end_node:
#         child_values = [forward(child) for child in node.childs] # computes the argument of the outer derivative. In other words: it computes g of f'(g)
#         if len(node.childs) == 1:
#             new_value = node.backward_func(*child_values) # computes the outer derivative f'(g)
#             if value.ndim == 0 or new_value.ndim == 0:
#                 backward(node.childs[0], value * new_value)
#             else: 
#                 backward(node.childs[0], new_value.T @ value) # @ is matrix product
#         else:
#             for child, new_value in zip(node.childs, node.backward_func(*child_values), strict=True):
#                 if value.ndim == 0 or new_value.ndim == 0:
#                     backward(child, value * new_value)
#                 else: 
#                     backward(child, new_value.T @ value)                 
#     else:
#         node.grad_value += value


def backward(node, value = np.float64(1)):
    if type(node) is not Expr_end_node:
        child_values = [forward(child) for child in node.childs]
        if len(node.childs) == 1:
            # product of inner derivatives
            # value = derivative of the parent node (evaluated at the forward values of all the parent's children)
            # new_value = derivative of the current node (evaluated at the forward values of all its children)
            new_value = node.backward_func(*child_values)
            if value.ndim == 0 or new_value.ndim == 0:
                backward(node.childs[0], value * new_value)
            else: 
                backward(node.childs[0], value @ new_value) # @ is matrix product
            
        else:
            for child, new_value in zip(node.childs, node.backward_func(*child_values), strict=True):
                if value.ndim == 0 or new_value.ndim == 0:
                    backward(child, value * new_value)
                else: 
                    backward(child, value @ new_value ) # @ is matrix product                  
    else: 
        node.grad_value += value

# Example Functions

The following cells are structured as follows: 
- Each cell contains one example function. \
For each function, we define the operators, parameters, and the function itself. The operators are instances of their respective operator classes. \
Example: 
| mathematical operator | operator class  |
| --- | --- |
| addition (+) | Add() |
| multiplication ($\cdot$) | Multiply()|
| sin() | Sin() | 

- This function will then be depicted as a node tree via the $\texttt{print\_graph}$ function.
- Finally, we perform the forward and backward propagation. The results of both passes are compared with the analytically computed function value and derivative(s).
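
Where no analytical derivative is at hand, a numerical estimate can serve as an additional cross-check. The sketch below is not part of the notebook's modules: `central_difference` is a hypothetical helper, and the test function is $\log(a \cdot b) \cdot \sin(b)$ written with plain NumPy at arbitrarily chosen points.

```python
import numpy as np


def central_difference(f, args, index, eps=1e-6):
    """Estimate the partial derivative of f w.r.t. args[index]."""
    lo = list(args)
    hi = list(args)
    lo[index] -= eps
    hi[index] += eps
    return (f(*hi) - f(*lo)) / (2 * eps)


def f(a, b):
    return np.log(a * b) * np.sin(b)


a, b = 0.7, 0.4                                   # arbitrary evaluation point
numeric_da = central_difference(f, (a, b), 0)
numeric_db = central_difference(f, (a, b), 1)

# Analytical derivatives for comparison
analytic_da = np.sin(b) / a
analytic_db = np.sin(b) / b + np.log(a * b) * np.cos(b)

derivatives_match = (
    np.isclose(numeric_da, analytic_da, atol=1e-6)
    and np.isclose(numeric_db, analytic_db, atol=1e-6)
)
```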

## Parameters and Mathematical Operators

In [4]:
x1 = Expr_end_node(np.random.rand(1))
x2 = Expr_end_node(np.random.rand(1))
x = Expr_end_node(np.random.rand(1))
w = Expr_end_node(np.random.rand(3,3))
xv = Expr_end_node(np.random.rand(3))
b = Expr_end_node(np.random.rand(3))

add = Add()
add_2 = Add_scalar(2)
multiply = Multiply()
multiply_3 = Multiply_scalar(3)
multiply_4 = Multiply_scalar(4)
sin = Sin()
log = Log()
tanh = Tanh()
#vecadd = Vector_vector_sum() 
#matmul = Matrix_vector_product()

## Function 1: $f(x_1,x_2) = \log(x_1 \cdot x_2) \cdot \sin(x_2) $

In [5]:
# defining the function
func1 = multiply(log(multiply(x1, x2)), sin(x2))

# graphical depiction
graph1 = graphviz.Digraph('graph1', comment='test') 
graph1.attr(rankdir="LR")
print_graph(func1, graph1)
graph1.render(directory='graph_out/tt', view=True)


# analytical function and its derivative(s)
mfunc1 = np.log(x1.value * x2.value) * np.sin(x2.value) # analytical function
mdfunc1dx1 = np.sin(x2.value) / x1.value # analytical derivative w.r.t. x1 
mdfunc1dx2 = np.sin(x2.value) / x2.value + np.log(x1.value * x2.value) * np.cos(x2.value) # analytical derivative w.r.t. x2

# comparison to analytical value
# comparison 1: forward propagation
ic(forward(func1)) # value of the function via forward propagation
ic(np.log(x1.value * x2.value) * np.sin(x2.value)) # value of the analytical function

# comparison 2: backward propagation
# because the same parameters are reused across several functions, reset their gradients to zero before running the backward pass
x1.grad_value=0
x2.grad_value=0
backward(func1) # performing the derivative via backward propagation
ic(x1.grad_value) # value of the derivative w.r.t. x1 via backward propagation
ic(np.sin(x2.value) / x1.value) # value of the analytical derivative w.r.t. x1 
ic(x2.grad_value) # value of the derivative w.r.t. x2 via backward propagation
ic(np.sin(x2.value) / x2.value + np.log(x1.value * x2.value) * np.cos(x2.value)) # value of the analytical derivative w.r.t. x2


# Result
print("function 1")
print("Calculus values of x1 derivative and x2 derivative:  ", mdfunc1dx1, mdfunc1dx2 )
print("Compared to derivatives optained with own algorithm: ", x1.grad_value, x2.grad_value)

ic| forward(func1): array([-0.47822975])
ic| np.log(x1.value * x2.value) * np.sin(x2.value): array([-0.47822975])
ic| x1.grad_value: array([0.45698546])
ic| np.sin(x2.value) / x1.value: array([0.45698546])
ic| x2.grad_value: array([-0.48324304])
ic| np.sin(x2.value) / x2.value + np.log(x1.value * x2.value) * np.cos(x2.value): array([-0.48324304])


function 1
Calculus values of x1 derivative and x2 derivative:   [0.45698546] [-0.48324304]
Compared to derivatives optained with own algorithm:  [0.45698546] [-0.48324304]


## Function 2: $g(x_1, x_2) = x_1 \cdot x_2 (x_1 + x_2) $ 

In [6]:
# defining the function
func2 = multiply(x1, multiply(x2, add(x1, x2)))

# graphical depiction
graph2 = graphviz.Digraph('graph2', comment='test') 
graph2.attr(rankdir="LR")
print_graph(func2, graph2)
graph2.render(directory='graph_out/tt', view=True)

# analytical function and its derivative(s)
mfunc2 = x1.value * x2.value * (x1.value + x2.value)
mdfunc2dx1 = x2.value * (x1.value + x2.value) + x1.value * x2.value
mdfunc2dx2 = x1.value * (x1.value + x2.value) + x1.value * x2.value

# comparison
ic(forward(func2))
ic(x1.value * x2.value * (x1.value + x2.value))

x1.grad_value=0
x2.grad_value=0
backward(func2)
ic(x1.grad_value)
ic(mdfunc2dx1)
ic(x2.grad_value)
ic(mdfunc2dx2)

# result
print("Function 2")
print("Calculus values of x1 derivative and x2 derivative:  ", mdfunc2dx1, mdfunc2dx2 )
print("Compared to derivatives optained with own algorithm: ", x1.grad_value, x2.grad_value)

ic| forward(func2): array([0.2123992])
ic| x1.value * x2.value * (x1.value + x2.value): array([0.2123992])
ic| x1.grad_value: array([0.52691469])
ic| mdfunc2dx1: array([0.52691469])
ic| x2.grad_value: array([0.88769867])
ic| mdfunc2dx2: array([0.88769867])


Function 2
Calculus values of x1 derivative and x2 derivative:   [0.52691469] [0.88769867]
Compared to derivatives optained with own algorithm:  [0.52691469] [0.88769867]


## Function 3: $h(x) = 3x^2 + 4x + 2$

In [7]:
# defining the function
func3 = add_2( add( multiply_3(multiply(x,x)) , multiply_4(x) ))

# graphical depiction
graph3 = graphviz.Digraph('graph3', comment='test') 
graph3.attr(rankdir="LR")
print_graph(func3, graph3)
graph3.render(directory='graph_out/tt', view=False)

# comparison
mfunc3 = 3*x.value**2 + 4 * x.value + 2
mdfunc3dx = 6 * x.value + 4

ic(forward(func3))
ic(3*x.value**2 + 4 * x.value + 2)

x.grad_value = 0
backward(func3)
ic(mdfunc3dx)
ic(x.grad_value)


# Result
print("Function 3")
print("Calculus values of x derivative:                     ", mdfunc3dx)
print("Compared to derivatives optained with own algorithm: ", x.grad_value)

ic| forward(func3): array([2.59169226])
ic| 3*x.value**2 + 4 * x.value + 2: array([2.59169226])
ic| mdfunc3dx: array([4.80627789])
ic| x.grad_value: array([4.80627789])


Function 3
Calculus values of x derivative:                      [4.80627789]
Compared to derivatives optained with own algorithm:  [4.80627789]


## Function 4: Neuron$(\vec x, w, \vec b) = \tanh(w\cdot \vec x + \vec b)$

### One-dimensional Variables: Neuron$(x, w, b) = \tanh(w\cdot x + b)$

In [8]:
xs_value = np.random.rand(1)
ws_value = np.random.rand(1)
bs_value = np.random.rand(1)
func4s_target_value = np.random.rand(1)

ws = Expr_end_node(ws_value)
xs = Expr_end_node(xs_value)
bs = Expr_end_node(bs_value)
func4s_target = Expr_end_node(func4s_target_value)

tanh = Tanh()
add = Add()
multiply = Multiply()

# defining the function
func4s = tanh(add(multiply(ws,xs) , bs)) # = function 4

# prediction value for loss function
func4s_predict = forward(func4s)

# graphical depiction
graph4s = graphviz.Digraph('graph4s', comment='test') 
graph4s.attr(rankdir="LR")
print_graph(func4s, graph4s)
graph4s.render(directory='graph_out/tt', view=True)


ws.grad_value = np.float64(0)
xs.grad_value = np.float64(0)
bs.grad_value = np.float64(0)
backward(func4s)

#grad_ws = loss_backwards(func4s_predict, func4s_target.value)*ws.grad_value
#grad_xs = loss_backwards(func4s_predict, func4s_target.value)*xs.grad_value
#grad_bs = loss_backwards(func4s_predict, func4s_target.value)*bs.grad_value
#ic(grad_ws)
#ic(grad_xs)
#ic(grad_bs)


# analytical function and its derivative(s)
dfunc4dw = 1 / np.cosh(ws.value * xs.value + bs.value)**2 * xs.value
dfunc4dx = 1 / np.cosh(ws.value * xs.value + bs.value)**2 * ws.value
dfunc4db = 1 / np.cosh(ws.value * xs.value + bs.value)**2

# Result
print("Function 4 (scalar)")
print("Derivatives optained with own algorithm:        \n", "ws :\n",
      ws.grad_value, "\n xs: \n",
      xs.grad_value, "\n bs: \n", 
      bs.grad_value)
print("analytical results:        \n", "ws :\n",
      dfunc4dw, "\n xs: \n",
      dfunc4dx, "\n bs: \n", 
      dfunc4db)

Function 4 (scalar)
Derivatives optained with own algorithm:        
 ws :
 [0.327417] 
 xs: 
 [0.13154934] 
 bs: 
 0.4124164233403246
analytical results:        
 ws :
 [0.327417] 
 xs: 
 [0.13154934] 
 bs: 
 [0.41241642]


### Extending to Multiple Dimensions: $( x \to \vec{x} )$
For the neuron activation function with multidimensional input, we compare the derivatives with those computed by $\texttt{torch}$ (see the section Comparison to $\texttt{torch}$ further below). \
The backpropagation includes the loss function as the outermost function, chained with the function of the neuron layer or neural network ($\vec y$), so that we have
$$
 \frac{\partial \text{Loss}(\vec y)}{\partial \vartheta_{ij}} = \frac{\partial \text{Loss}}{\partial \vec y}  \frac{\partial \vec y}{\partial \vartheta_{ij}}
$$
with $\vec y = \text{Neuron}(\vec x, w, \vec b)$ and $\vartheta_{ij}$ being either an entry $w_{ij}$ of the weight matrix or an entry of the bias vector $\vec b$.
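
For the $\tanh$ layer, the two factors can be written out component-wise. The following is a worked sketch that assumes a sum-of-squares loss $\text{Loss}(\vec y) = \sum_k (y_k - t_k)^2$ with a target vector $\vec t$ (the symbol $\vec t$ is introduced here only for illustration):

$$
\frac{\partial \text{Loss}}{\partial y_k} = 2\,(y_k - t_k), \qquad
\frac{\partial y_k}{\partial w_{ij}} = (1 - y_k^2)\,\delta_{ki}\,x_j, \qquad
\frac{\partial y_k}{\partial b_i} = (1 - y_k^2)\,\delta_{ki}
$$

Summing over $k$ then gives

$$
\frac{\partial \text{Loss}}{\partial w_{ij}} = 2\,(y_i - t_i)\,(1 - y_i^2)\,x_j, \qquad
\frac{\partial \text{Loss}}{\partial b_i} = 2\,(y_i - t_i)\,(1 - y_i^2)
$$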

In [9]:
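# loss function (sum of squared errors); expects torch tensors, since it uses Tensor.pow
def loss_function(y_pred, y_target):
    return (y_pred - y_target).pow(2).sum()

# (d loss)/(d y_pred) = 2*(y_pred - y_target); works for NumPy arrays and torch tensors
def loss_backwards(y_pred, y_target):
    return 2*(y_pred - y_target)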

### Multidimensional Values: Neuron$(\vec x, w, \vec b) = \tanh(w\cdot \vec x + \vec b)$ (a whole layer of neurons in one go)

In [10]:
# multidimensional values
w_value = np.random.rand(3,2)
xv_value = np.random.rand(2)
b_value = np.random.rand(3)
func4_target_value = np.random.rand(3)

w = Expr_end_node(w_value)
xv = Expr_end_node(xv_value)
b = Expr_end_node(b_value)
func4_target = Expr_end_node(func4_target_value)

tanh = Tanh()
vecadd = Vector_vector_sum()
matmul = Matrix_vector_product()

# defining the function
activation = Matrix_w_x_b()
func4 = tanh(activation(w,xv,b)) # = function 4

# prediction value for loss function
func4_predict = forward(func4)

# graphical depiction
graph4 = graphviz.Digraph('graph4', comment='test') 
graph4.attr(rankdir="LR")
print_graph(func4, graph4)
graph4.render(directory='graph_out/tt', view=True)


w.grad_value = np.float64(0)
xv.grad_value = np.float64(0)
b.grad_value = np.float64(0)
backward(func4)

grad_w = loss_backwards(func4_predict, func4_target.value)@w.grad_value
grad_xv = loss_backwards(func4_predict, func4_target.value)@xv.grad_value
grad_b = loss_backwards(func4_predict, func4_target.value)@b.grad_value

# reshaping the derivatives to the shape of the respective parameter
grad_w_cs = grad_w.reshape(w.value.shape)
grad_xv_cs = grad_xv.reshape(xv.value.shape)
grad_b_cs = grad_b.reshape(b.value.shape)

ic(grad_w_cs)
ic(grad_xv_cs)
ic(grad_b_cs)


# Result
print("Function 4")
print("Derivatives optained with own algorithm: :        \n", "w :\n",
      grad_w_cs, "\n xv: \n",
      grad_xv_cs, "\n b: \n", 
      grad_b_cs)
print("For comparison of torches results see section Comparison to torch!")

ic| grad_w_cs: array([[ 0.19794248,  0.26151961],
                      [ 0.24653337,  0.32571739],
                      [-0.18034925, -0.23827561]])
ic| grad_xv_cs: array([0.20577088, 0.36978172])
ic| grad_b_cs: array([ 0.43476933,  0.54149643, -0.39612681])


Function 4
Derivatives optained with own algorithm: :        
 w :
 [[ 0.19794248  0.26151961]
 [ 0.24653337  0.32571739]
 [-0.18034925 -0.23827561]] 
 xv: 
 [0.20577088 0.36978172] 
 b: 
 [ 0.43476933  0.54149643 -0.39612681]
For comparison of torches results see section Comparison to torch!


### Function 4 Extension: Multiple Layers

To demonstrate that our algorithm can handle deep neural network functions, we implement a simple **fully connected feed-forward network with 3 layers (input, hidden, and output)**.

In [11]:
w2_value = np.random.rand(3,3)
b2_value = np.random.rand(3)
func4_2_target_value = np.random.rand(3)

w2 = Expr_end_node(w2_value)
b2 = Expr_end_node(b2_value)
func4_2_target = Expr_end_node(func4_2_target_value)

# second neuron
func4_2 = tanh(activation(w2, func4, b2))

# graphical depiction
graph4_2 = graphviz.Digraph('graph4.2', comment='test') 
graph4_2.attr(rankdir="LR")
print_graph(func4_2, graph4_2)
graph4_2.render(directory='graph_out/tt', view=True)

w2.grad_value = np.float64(0)
b2.grad_value = np.float64(0)
w.grad_value = np.float64(0)
b.grad_value = np.float64(0)
backward(func4_2)

# prediction value for the loss function (func4_2 is already defined above)
func4_2_predict = forward(func4_2)

grad_w2 = loss_backwards(func4_2_predict, func4_2_target.value)@w2.grad_value
grad_b2 = loss_backwards(func4_2_predict, func4_2_target.value)@b2.grad_value

# reshaping the derivatives to the shape of the respective parameter
grad_w2_rs = grad_w2.reshape(w2.value.shape)
grad_b2_rs = grad_b2.reshape(b2.value.shape)

ic(grad_w2_rs)
ic(grad_b2_rs)
print("For comparison of torches results see section Comparison to torch!")

ic| grad_w2_rs: array([[0.13627391, 0.09196742, 0.10082281],
                       [0.34560476, 0.23323891, 0.25569708],
                       [0.17022174, 0.11487785, 0.12593924]])
ic| grad_b2_rs: array([0.16504418, 0.41856913, 0.2061591 ])


For comparison of torches results see section Comparison to torch!


### Comparison to $\texttt{torch}$

In [12]:
import torch

### One-Dimensional and Single-Layer Comparison

In [13]:
# Same code as above, but the random parameters need to be in the same cell for comparison
# Everything inside the lined box is a repetition, skip to torch code
# -------------------------------------------------------------------------------------------
# actual values
xs_value = np.random.rand(1)
ws_value = np.random.rand(1)
bs_value = np.random.rand(1)
func4s_target_value = np.random.rand(1)

ws = Expr_end_node(ws_value)
xs = Expr_end_node(xs_value)
bs = Expr_end_node(bs_value)
func4s_target = Expr_end_node(func4s_target_value)

tanh = Tanh()
add = Add()
multiply = Multiply()

# defining the function
func4s = tanh(add(multiply(ws,xs) , bs)) # = function 4

# prediction value for loss function
func4s_predict = forward(func4s)

ws.grad_value = np.float64(0)
xs.grad_value = np.float64(0)
bs.grad_value = np.float64(0)
backward(func4s)

grad_ws = loss_backwards(func4s_predict, func4s_target.value)*ws.grad_value
grad_xs = loss_backwards(func4s_predict, func4s_target.value)*xs.grad_value
grad_bs = loss_backwards(func4s_predict, func4s_target.value)*bs.grad_value

# -------------------------------------------------------------------------------------------
# here starts the torch code
def forward_function(x, w, b):
     return torch.tanh(w * x + b)

# torch expression for parameters
xs_t = torch.tensor(xs_value, requires_grad=True)
ws_t = torch.tensor(ws_value, requires_grad=True)
bs_t = torch.tensor(bs_value, requires_grad=True)


func4s_target_t = torch.tensor(func4s_target_value)

func4s_predict_t = forward_function(xs_t, ws_t, bs_t,)

loss_t = loss_function(func4s_predict_t, func4s_target_t)

loss_t.backward()

ic(grad_ws)
ic(grad_bs)
ic(ws_t.grad)
ic(bs_t.grad)


# Result
print("Function 4 (scalar)")
print("torch gradient values for w and b: \n",
      "w: \n", ws_t.grad, "\n",
      "b: \n", bs_t.grad, "\n")
print("compared to gradient results of own algorithm: \n", 
      "w :\n", grad_ws, "\n",
      "b: \n", grad_bs)

ic| grad_ws: array([0.30121278])
ic| grad_bs: array([0.42685234])
ic| ws_t.grad: tensor([0.3012], dtype=torch.float64)
ic| bs_t.grad: tensor([0.4269], dtype=torch.float64)


Function 4 (scalar)
torch gradient values for w and b: 
 w: 
 tensor([0.3012], dtype=torch.float64) 
 b: 
 tensor([0.4269], dtype=torch.float64) 

compared to gradient results of own algorithm: 
 w :
 [0.30121278] 
 b: 
 [0.42685234]


In [14]:
# Same code as above, but the random parameters need to be in the same cell for comparison
# Everything inside the lined box is a repetition, skip to torch code
# -------------------------------------------------------------------------------------------

# multidimensional values
w_value = np.random.rand(3,2)
xv_value = np.random.rand(2)
b_value = np.random.rand(3)
func4_target_value = np.random.rand(3)

w = Expr_end_node(w_value)
xv = Expr_end_node(xv_value)
b = Expr_end_node(b_value)
func4_target = Expr_end_node(func4_target_value)

tanh = Tanh()
vecadd = Vector_vector_sum()
matmul = Matrix_vector_product()

# defining the function
activation = Matrix_w_x_b()
func4 = tanh(activation(w,xv,b)) # = function 4

# prediction value for loss function
func4_predict = forward(func4)

w.grad_value = np.float64(0)
xv.grad_value = np.float64(0)
b.grad_value = np.float64(0)
backward(func4)

grad_w = loss_backwards(func4_predict, func4_target.value)@w.grad_value
grad_xv = loss_backwards(func4_predict, func4_target.value)@xv.grad_value
grad_b = loss_backwards(func4_predict, func4_target.value)@b.grad_value

# reshaping the derivatives to the shape of the respective parameter
grad_w_rs = grad_w.reshape(w.value.shape)
grad_xv_rs = grad_xv.reshape(xv.value.shape)
grad_b_rs = grad_b.reshape(b.value.shape)


# -------------------------------------------------------------------------------------------
# here starts the torch code

def forward_function(x, w, b):
     return torch.tanh(w @ x + b)

# torch expression for parameters
xv_t = torch.tensor(xv_value, requires_grad=True)
w_t = torch.tensor(w_value, requires_grad=True)
b_t = torch.tensor(b_value, requires_grad=True)


func4_target_t = torch.tensor(func4_target_value)

func4_predict_t = forward_function(xv_t, w_t, b_t)
loss_t = loss_function(func4_predict_t, func4_target_t)

loss_t.backward()

ic(grad_w)
ic(grad_b)
ic(w_t.grad)
ic(b_t.grad)

print("Function 4")
print("\n Input: \n",
      "w: \n", w_value, "\n",
      "b: \n", b_value , "\n")

print("gradient results of own algorithm: \n",
      "w: \n", grad_w_rs, "\n",
      "b: \n", grad_b_rs, "\n")
print("compared to torch gradient values: \n", 
      "w :\n", w_t.grad, "\n",
      "b: \n", b_t.grad)


ic| grad_w: array([-0.00054707, -0.00171534,  0.03409697,  0.10691214, -0.13023268,
                   -0.40834868])
ic| grad_b: array([-0.0023075 ,  0.14381925, -0.54931464])
ic| w_t.grad: tensor([[-0.0005, -0.0017],
                      [ 0.0341,  0.1069],
                      [-0.1302, -0.4083]], dtype=torch.float64)
ic| b_t.grad: tensor([-0.0023,  0.1438, -0.5493], dtype=torch.float64)


Function 4

 Input: 
 w: 
 [[0.90602038 0.27251732]
 [0.47833065 0.50656291]
 [0.86829964 0.3025354 ]] 
 b: 
 [0.82966095 0.02442584 0.2357855 ] 

gradient results of own algorithm: 
 w: 
 [[-0.00054707 -0.00171534]
 [ 0.03409697  0.10691214]
 [-0.13023268 -0.40834868]] 
 b: 
 [-0.0023075   0.14381925 -0.54931464] 

compared to torch gradient values: 
 w :
 tensor([[-0.0005, -0.0017],
        [ 0.0341,  0.1069],
        [-0.1302, -0.4083]], dtype=torch.float64) 
 b: 
 tensor([-0.0023,  0.1438, -0.5493], dtype=torch.float64)


### Multidimensional and Multilayer Comparison

In [15]:
# Same code as above, but the random parameters need to be in the same cell for comparison
# Everything inside the lined box is a repetition, skip to torch code
# -------------------------------------------------------------------------------------------
# actual values
w_value = np.random.rand(3,2)
xv_value = np.random.rand(2)
b_value = np.random.rand(3)

w2_value = np.random.rand(3,3)
b2_value = np.random.rand(3)
func4_2_target_value = np.random.rand(3)

# node expression
w = Expr_end_node(w_value)
xv = Expr_end_node(xv_value)
b = Expr_end_node(b_value)

w2 = Expr_end_node(w2_value)
b2 = Expr_end_node(b2_value)


func4_2_target = Expr_end_node(func4_2_target_value)


tanh = Tanh()
vecadd = Vector_vector_sum()
matmul = Matrix_vector_product()
activation = Matrix_w_x_b()

func4 = tanh(activation(w,xv,b))
func4_2 = tanh(activation(w2, func4, b2))
func4_2_predict = forward(func4_2)

w.grad_value = np.float64(0)
xv.grad_value = np.float64(0)
b.grad_value = np.float64(0)
w2.grad_value = np.float64(0)
b2.grad_value = np.float64(0)
backward(func4_2)

grad_w = loss_backwards(func4_2_predict, func4_2_target.value)@w.grad_value
grad_b = loss_backwards(func4_2_predict, func4_2_target.value)@b.grad_value

grad_w2 = loss_backwards(func4_2_predict, func4_2_target.value)@w2.grad_value
grad_b2 = loss_backwards(func4_2_predict, func4_2_target.value)@b2.grad_value

# reshaping the derivatives to the shape of the respective parameter
grad_w_rs = grad_w.reshape(w.value.shape)
grad_b_rs = grad_b.reshape(b.value.shape)
grad_w2_rs = grad_w2.reshape(w2.value.shape)
grad_b2_rs = grad_b2.reshape(b2.value.shape)

# -------------------------------------------------------------------------------------------
# here starts the torch code
def forward_function(x, w1, b1, w2, b2):
     return torch.tanh(w2 @ torch.tanh(w1 @ x + b1) + b2)

# torch expression for parameters
xv_t = torch.tensor(xv_value, requires_grad=True)
w_t = torch.tensor(w_value, requires_grad=True)
b_t = torch.tensor(b_value, requires_grad=True)

w2_t = torch.tensor(w2_value, requires_grad=True)
b2_t = torch.tensor(b2_value, requires_grad=True)


func4_2_target_t = torch.tensor(func4_2_target_value)

func4_2_predict_t = forward_function(xv_t, w_t, b_t, w2_t, b2_t)
loss_t = loss_function(func4_2_predict_t, func4_2_target_t)

loss_t.backward()

ic(grad_w2_rs)
ic(grad_b2_rs)
ic(w2_t.grad)
ic(b2_t.grad)

print("Function 4")
print("\n Input layer: \n",
      "w: \n", w_value, "\n",
      "b: \n", b_value )
print("\n Second layer:")
print("torch values for w and b: \n",
      "w: \n", w_t.grad, "\n",
      "b: \n", b_t.grad, "\n")
print("compared to gradient results of own algorithm: \n", 
      "w :\n", grad_w_rs, "\n",
      "b: \n", grad_b_rs)
print("\n Third layer:")
print("torch values for w and b: \n",
      "w: \n", w2_t.grad, "\n",
      "b: \n", b2_t.grad, "\n")
print("compared to manual derivatives through chain rule: \n", 
      "w :\n", grad_w2_rs, "\n",
      "b: \n", grad_b2_rs)

ic| grad_w2_rs: array([[0.16946913, 0.24740022, 0.24647518],
                       [0.03214457, 0.04692638, 0.04675093],
                       [0.20031027, 0.29242379, 0.29133041]])
ic| grad_b2_rs: array([0.28919791, 0.05485449, 0.34182811])
ic| w2_t.grad: tensor([[0.1695, 0.2474, 0.2465],
                       [0.0321, 0.0469, 0.0468],
                       [0.2003, 0.2924, 0.2913]], dtype=torch.float64)
ic| b2_t.grad: tensor([0.2892, 0.0549, 0.3418], dtype=torch.float64)


Function 4

 Input layer: 
 w: 
 [[0.11067609 0.74409225]
 [0.95022696 0.63816965]
 [0.19433155 0.84210456]] 
 b: 
 [0.2384311  0.39252483 0.73286122]

 Second layer:
torch values for w and b: 
 w: 
 tensor([[0.1443, 0.1188],
        [0.0266, 0.0219],
        [0.0405, 0.0333]], dtype=torch.float64) 
 b: 
 tensor([0.2410, 0.0445, 0.0676], dtype=torch.float64) 

compared to gradient results of own algorithm: 
 w :
 [[0.14432145 0.11880729]
 [0.02663425 0.02192566]
 [0.04048723 0.03332961]] 
 b: 
 [0.2409894  0.04447414 0.06760599]

 Third layer:
torch values for w and b: 
 w: 
 tensor([[0.1695, 0.2474, 0.2465],
        [0.0321, 0.0469, 0.0468],
        [0.2003, 0.2924, 0.2913]], dtype=torch.float64) 
 b: 
 tensor([0.2892, 0.0549, 0.3418], dtype=torch.float64) 

compared to manual derivatives through chain rule: 
 w :
 [[0.16946913 0.24740022 0.24647518]
 [0.03214457 0.04692638 0.04675093]
 [0.20031027 0.29242379 0.29133041]] 
 b: 
 [0.28919791 0.05485449 0.34182811]
