# Automatic Differentiation

Neural networks can be constructed in various ways, and any of them can be represented as a directed graph $\mathcal{G}$. However, as these graphs get more sophisticated, deriving gradients by hand for gradient descent-based optimization techniques becomes more challenging and error-prone. Therefore, we need a way to calculate gradients automatically through these graphs.

We can calculate the Jacobian of a function with respect to network parameters. There are two ways of calculating Jacobians, namely forward-mode and reverse-mode differentiation. Although they give the same result, one of them can be more efficient depending on the structure of the function that we differentiate. For example, if a function has fewer outputs than inputs, reverse-mode is more efficient. For more details, please read [this article on automatic differentiation](https://www.wikiwand.com/en/Automatic_differentiation).

Since, in Deep Learning, we would like to differentiate loss functions, which have scalar outputs, we prefer to use reverse-mode differentiation. **Backpropagation** is a special case of reverse-mode differentiation.
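To make the efficiency argument concrete, here is a minimal NumPy sketch under stated assumptions: the function `f`, the matrix `W`, and the sizes are made up for illustration and are not part of the homework code. It computes the same gradient of a scalar-valued function twice: forward mode needs one pass per input, while reverse mode needs a single pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 5, 4
W = rng.normal(size=(n_hidden, n_inputs))   # illustrative weights
x = rng.normal(size=n_inputs)


def f(x):
    """Scalar-valued function f: R^5 -> R (chosen only for this sketch)."""
    return np.tanh(W @ x).sum()


# Local Jacobians of the two stages: h = tanh(W x) and f = sum(h).
h = np.tanh(W @ x)
J_h = (1.0 - h ** 2)[:, None] * W           # shape (n_hidden, n_inputs)
J_sum = np.ones((1, n_hidden))              # shape (1, n_hidden)

# Forward mode: push one input direction at a time (n_inputs passes in total).
forward_grad = np.array([(J_sum @ (J_h @ e)).item() for e in np.eye(n_inputs)])

# Reverse mode: pull the single output sensitivity back in one pass.
v = np.ones(1)
reverse_grad = (v @ J_sum) @ J_h
```

Both results agree; the difference is only in how many passes are needed, which is why a single scalar output favors reverse mode.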

## Autograd

In this homework, we will implement a NumPy-based automatic differentiation library called ```autograd```. Before moving into the implementation, make sure that you read [chapter 6](https://www.deeplearningbook.org/contents/mlp.html) of the Deep Learning book.

First, we start with building a graph. Second, we differentiate a node in the graph with respect to all its inputs.

### Array

In ```autograd```, we call our basic data structure ```Array```. An array works similarly to a NumPy ndarray, but it is differentiable. In order to run reverse-mode differentiation, array objects keep track of the computational graph to which they belong. Example graphs are shown in the figure below. In the singleton graph, array $Z$ contains the graph $\mathcal{G}$, which has 3 nodes: $X$, $Y$, and $Z$.

<img src="comp-graphs.png" alt="drawing" width="1000"/>

> Figure: Blue circles denote leaf nodes, while the red circle denotes the root node. Gray nodes represent intermediate nodes. Note that non-leaf nodes are the outputs of some operations.

In order to understand how array objects are built, let's observe the following usage:


In [36]:
%load_ext autoreload
%autoreload 2

import numpy as np
from autograd.array import Array
from autograd.operations import SimpleAdd

# Construction of Leaf nodes
data = np.random.rand(4, 5)
first_array = Array(value=data)
second_array = Array(value=np.ones((4, 5)))

# Creation of a non-leaf node
simple_add_op = SimpleAdd(first_array, second_array)
result_array = Array(simple_add_op(), simple_add_op)
# Note that, we will not explicitly call operations! This is just for demonstration!

assert first_array.operation is None
assert second_array.operation is None
assert result_array.operation is not None


result_array, second_array



(Autograd array, Operation: SimpleAdd: [[1.31601182 1.06189737 1.5497095  1.04361216 1.34210663]
  [1.0057547  1.90576253 1.2104615  1.19405544 1.80914202]
  [1.73363448 1.71204617 1.66306854 1.40599005 1.1679411 ]
  [1.87475525 1.9481673  1.93464611 1.09157235 1.01255101]],
 Autograd array: [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]])

An array can be constructed in two different ways:
- By passing only the ```value``` parameter as a NumPy array
- By additionally passing the ```operation``` that created the array value

If we create an array using the former method, it becomes a leaf node in the computation graph that contains it. Otherwise, it becomes an intermediate node or the root node. For example, in the figure above, the simple graph has 4 nodes. Among them, $X$ and $Y$ are leaf nodes (created without the ```operation``` parameter), $Z$ is the root node, and $T$ is an intermediate node. Both $T$ and $Z$ are created as outputs of addition operations, and hence they are not leaf nodes. Non-leaf nodes contain the operation that resulted in their creation.
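The leaf versus non-leaf distinction can be sketched with a small stand-in class. The `Node` class and the `add` helper below are hypothetical and only mimic the idea; the real ```Array``` class in the homework may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass(frozen=True, eq=False)
class Node:
    """Hypothetical stand-in for Array: a value plus a record of how it was made."""
    value: np.ndarray
    operands: Tuple["Node", ...] = ()   # empty for leaf nodes
    op_name: Optional[str] = None       # None for leaf nodes

    @property
    def is_leaf(self) -> bool:
        return self.op_name is None


def add(a: Node, b: Node) -> Node:
    # A non-leaf node remembers the operands that produced it.
    return Node(a.value + b.value, operands=(a, b), op_name="add")


# The simple graph: X and Y are leaves, T is intermediate, Z is the root.
x = Node(np.array([1.0, 4.0]))
y = Node(np.array([2.0, 3.0]))
t = add(x, y)
z = add(t, y)
```

Freezing the dataclass mirrors the immutability of array objects, and `eq=False` keeps identity-based hashing so that nodes can serve as dictionary keys.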

> Note: Both the Array object and its value (a NumPy array) are made immutable in order to leave less room for errors in gradient calculations.

### Operation

Automatic differentiation requires both the forward computation and the derivative of every differentiable operation. Therefore, we define new NumPy-based operations that contain both. Please take a look at the definition of Operations in the ```operations.py``` script. Each operation implements two methods:

- ```__call__``` method (forward computation)
- ```vjp``` method (derivative computation)

#### Forward computation

We implement the forward computation in the ```__call__``` method. This is trivial since we can directly use NumPy operations.

#### Derivative computation

In automatic differentiation, we can calculate the Jacobian of a function. However, in Deep Learning, we want to compute gradients of a function with scalar output $g: \mathcal{R}^n \rightarrow \mathcal{R}^1$ with respect to learnable parameters. Hence, instead of computing costly Jacobians, we use the vector-Jacobian product ```vjp``` to calculate gradients.
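As a small illustration of why the full Jacobian is unnecessary, consider an elementwise operation. The `Square` class below is a hypothetical example written for this note, not one of the homework's operations; it only follows the same `__call__` / `vjp` split.

```python
import numpy as np


class Square:
    """Hypothetical operation: elementwise y = x ** 2."""

    def __init__(self, x: np.ndarray):
        self.x = x

    def __call__(self) -> np.ndarray:
        # Forward computation is a plain NumPy expression.
        return self.x ** 2

    def vjp(self, upstream: np.ndarray) -> np.ndarray:
        # The Jacobian is diag(2x), so v^T J reduces to an elementwise product.
        return upstream * 2.0 * self.x


x = np.array([1.0, -2.0, 3.0])
op = Square(x)
v = np.array([0.5, 1.0, -1.0])

# Reference: build the n-by-n Jacobian explicitly and multiply.
full_jacobian = np.diag(2.0 * x)
reference = v @ full_jacobian
```

The `vjp` method touches only n numbers, while the explicit Jacobian stores n squared entries, most of them zero.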

Suppose that we have a composition of functions $g(x) = f_n\circ \dots \circ f_1(x)$ where $f_i: \mathcal{R}^n \rightarrow \mathcal{R}^n, \forall i \in \{1, \dots,n-1\}$ and $f_n: \mathcal{R}^n \rightarrow \mathcal{R}^1$. We calculate the gradient of $g(x)$ as:


\begin{align}
    \mathcal{J}_x g(x) &= (\mathcal{J} f_n)(\mathcal{J} f_{n-1}) \dots (\mathcal{J} f_1(x))\\
    &= (\nabla^T f_n) (\mathcal{J} f_{n-1}) \dots (\mathcal{J} f_1(x))\\
    &= \underbrace{(\nabla^T (f_n \circ f_{n-1}))}_{(\nabla^T f_n) (\mathcal{J} f_{n-1}):=\text{vjp}} \dots (\mathcal{J} f_1(x))\\
    &= \underbrace{(\nabla^T (f_n \circ f_{n-1} \circ f_{n-2}))}_{\text{vector}^T} \underbrace{(\mathcal{J} f_{n-3}(x))\dots (\mathcal{J} f_1(x))}_{\text{Jacobian}}\\
    &= \nabla^T_x g(x)
\end{align}

Following the above steps, we can calculate vector-Jacobian products (vjp) at every step in the computational graph without ever computing and storing the full Jacobian.
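The chain of products can be checked numerically. In the sketch below, random matrices stand in for the Jacobians of $f_1, f_2, f_3$ and a random vector stands in for $\nabla f_4$; these are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Stand-ins for the Jacobians of f_1, f_2, f_3 (each R^4 -> R^4) ...
jacobians = [rng.normal(size=(dim, dim)) for _ in range(3)]
# ... and for the gradient of the final scalar-valued function f_4.
grad_last = rng.normal(size=dim)

# Reverse mode: start from the row vector and move through the chain
# from the last function to the first. Every step is a vjp.
v = grad_last
for J in reversed(jacobians):
    v = v @ J          # vector in, vector out; no matrix-matrix product

# Reference: multiply the Jacobians first, then apply the vector.
full = jacobians[2] @ jacobians[1] @ jacobians[0]
reference = grad_last @ full
```

Each loop iteration costs one vector-matrix product, whereas the reference path needs matrix-matrix products and an intermediate `dim` by `dim` matrix.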

### Computational Graph

Reverse-mode differentiation requires the computational graph $\mathcal{G}_Z$ of a root node $Z$ in order to compute the derivative of $Z$ with respect to the leaf nodes in the graph. Therefore, every non-leaf array stores the operation object that created it, and the operation in turn holds the list of arrays (operands) it was applied to. We call this list $\mathcal{C}_Z$, the children of $Z$. For example, in the simple graph, $\mathcal{C}_Z$ contains $T$ and $Y$. We can traverse the computational graph $\mathcal{G}_Z$ by visiting the arrays in $\mathcal{C}_Z$, then recursively visiting their children, and so on.

We differentiate a node in a computational graph by reversing the directed edges and traversing the reversed graph starting from the root node (the one that we want to differentiate). At every step, we calculate ```vjp```s and pass them to the next node(s) in the reversed graph. An example reversed graph is shown below.

<img src="comp-graph-backprop.png" alt="drawing" width="1000"/>

> In the reversed graph, the numbers on the reversed directed edges denote the order in which we calculate the vjp operations.

Notice that, in the reversed graph above, we must complete the vjp calculations of the nodes that come before $T$ ($K$, $L$, and $Z$) before we can evaluate the vjp of node $T$. The order in which we compute vjps must satisfy the following rule:
- If a node has parents in the reversed graph, it must come after all of its parents

This order matters because it lets every node receive its complete accumulated gradient before its own vjp is evaluated, so each vjp is computed exactly once.

Luckily, the ordering we just described is exactly what [topological sorting](https://www.wikiwand.com/en/Topological_sorting#:~:text=In%20computer%20science%2C%20a%20topological,before%20v%20in%20the%20ordering) produces. Please read the link to learn more about it.
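One common way to obtain such an ordering is a depth-first search followed by reversing the post-order. The sketch below works on a plain dictionary rather than on ```Array``` objects; the node names and the `topological_order` function are illustrative assumptions, with a graph in which $T$ feeds $K$, $L$, and $Z$.

```python
from typing import Dict, List

# operands[node] lists the inputs of the operation that produced node.
operands: Dict[str, List[str]] = {
    "Z": ["T", "L"],
    "L": ["T", "K"],
    "K": ["T", "X"],
    "T": ["X", "Y"],
    "X": [],
    "Y": [],
}


def topological_order(root: str) -> List[str]:
    """Return nodes so that every node appears after all nodes that consume it."""
    visited = set()
    postorder: List[str] = []

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        for child in operands[node]:
            visit(child)
        postorder.append(node)   # appended only after all of its operands

    visit(root)
    return postorder[::-1]       # reversed post-order puts the root first


order = topological_order("Z")
```

The `visited` set is what keeps a shared node such as `T` from being expanded more than once, even though three different nodes consume it.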





In [37]:
import numpy as np
from autograd.array import Array
from autograd.operations import SimpleAdd


def add(first_array: Array, second_array: Array) -> Array:
    operation = SimpleAdd(first_operand=first_array, second_operand=second_array) # We use this operation just for demonstration!
    return Array(value=operation(), operation=operation)


arr_x = Array(np.array([1, 4]), is_parameter=True) # Leaf nodes
arr_y = Array(np.array([2, 3]), is_parameter=True) # Leaf nodes

arr_t = add(arr_x, arr_y) # Intermediate Nodes
arr_k = add(arr_t, arr_x) # Intermediate Nodes
arr_l = add(arr_t, arr_k) # Intermediate Nodes
arr_z = add(arr_t, arr_l) # Root Node

def simple_traverse(array: Array) -> None:
    print(array)
    if array.operation is not None:
        for arr in array.operation.operands:
            simple_traverse(arr)

simple_traverse(arr_z)
arr_z

Autograd array, Operation: SimpleAdd: [10 25]
Autograd array, Operation: SimpleAdd: [3 7]
Autograd array: [1 4]
Autograd array: [2 3]
Autograd array, Operation: SimpleAdd: [ 7 18]
Autograd array, Operation: SimpleAdd: [3 7]
Autograd array: [1 4]
Autograd array: [2 3]
Autograd array, Operation: SimpleAdd: [ 4 11]
Autograd array, Operation: SimpleAdd: [3 7]
Autograd array: [1 4]
Autograd array: [2 3]
Autograd array: [1 4]


Autograd array, Operation: SimpleAdd: [10 25]

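We form the computational graph by tracing the operations that we use to create ```Array```s. An example traversal is given above. Note that this simple recursive traversal visits shared nodes more than once: $T$ is printed three times.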

Now, we want to calculate the derivatives $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$. First, we need a topological ordering so that we can calculate the ```vjp```s of the nodes in the proper order.

> Complete ```topological_sort``` in the ```__init__.py``` script. 

In [38]:
from autograd import topological_sort

order = topological_sort(arr_z)
for ordered_array, array in zip(order[:-2], (arr_z, arr_l, arr_k, arr_t)):
    assert ordered_array.hash_code == array.hash_code, f"Mismatch between ordered array {ordered_array} and {array}"


We can calculate the gradient using the topologically ordered nodes. At every node, we first accumulate the gradients arriving from all of its parents in the reversed graph. Then we feed the accumulated gradient to the ```vjp``` method of the node's operation to obtain the gradients with respect to its children. For example, in the above figure, in order to calculate the vjp of node $T$, we need to accumulate the gradients flowing through $T \leftarrow K$, $T \leftarrow L$, and $T \leftarrow Z$ first.

> Complete ```grad``` in the ```__init__.py``` script.

Note that ```grad``` returns gradients only for parameter Arrays (Array objects with ```is_parameter=True```).
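The accumulation step can be sketched on a dictionary-based graph in which every non-leaf node is the sum of its two operands. The names `operands`, `order`, and `add_vjp` are assumptions for this illustration and do not reflect the actual signature of ```grad```.

```python
from collections import defaultdict

import numpy as np

# Every non-leaf node is the sum of its two operands:
# T = X + Y, K = T + X, L = T + K, Z = T + L.
operands = {"Z": ["T", "L"], "L": ["T", "K"], "K": ["T", "X"], "T": ["X", "Y"]}
order = ["Z", "L", "K", "T", "X", "Y"]      # a valid topological order, root first


def add_vjp(upstream):
    # d(a + b)/da and d(a + b)/db are both the identity.
    return upstream, upstream


incoming = defaultdict(list)
incoming["Z"].append(np.ones(2))            # seed gradient dZ/dZ
grads = {}
for node in order:
    # All consumers of this node were processed earlier, so the list is complete.
    grads[node] = sum(incoming[node])
    if node in operands:                    # non-leaf: push gradients to operands
        for child, g in zip(operands[node], add_vjp(grads[node])):
            incoming[child].append(g)
```

Expanding the sums by hand gives $Z = 4X + 3Y$, so the accumulated gradients for `X` and `Y` are 4 and 3 in every component.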

In [39]:
from autograd import grad

gradients = grad(arr_z, np.ones(2))
assert np.allclose(gradients[arr_x], 4.0), "Gradient mismatch"
assert np.allclose(gradients[arr_y], 3.0), "Gradient mismatch"

assert arr_z not in gradients.keys()
assert arr_l not in gradients.keys()
assert arr_k not in gradients.keys()
assert arr_t not in gradients.keys()



GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: SimpleAdd: [10 25]: [array([1., 1.])]})


With that completed, we can fill the remaining operations and complete the Array class. Later we will use it to build Fully Connected Neural Networks and more.

### More Operations

In order to use addition, subtraction, matrix multiplication, etc., we need to overload the corresponding operators in the Array class. For details, please read [Python's data model](https://docs.python.org/3/reference/datamodel.html). We have already implemented all of the operators in the Array class, but they rely on Operations that you need to complete before the operators can be used.

```Python
def __add__(self, other: "Array") -> "Array":
    op = Add(self, self.to_array(other))
    return Array(op(), operation=op)
```

In the above code snippet, you can see how we overload the ```+``` operator of Array objects. This method is called when two arrays, let's say ```A``` and ```B```, are used as ```A + B```.

#### Broadcastable Operations

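The ```SimpleAdd``` operation that we used to test ```grad``` does not automatically broadcast its inputs. Therefore, we need basic arithmetic operations that broadcast their inputs and handle the corresponding derivatives: the gradient of a broadcasted operand must be summed over the broadcasted axes so that it has the operand's original shape.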
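The shape bookkeeping can be sketched in plain NumPy. The helper name `unbroadcast` is an assumption made for this illustration; the homework's ```BroadcastedOperation``` may organize the same logic differently.

```python
import numpy as np


def unbroadcast(grad: np.ndarray, shape: tuple) -> np.ndarray:
    """Sum grad over the axes that broadcasting added or stretched."""
    # Axes that broadcasting prepended to the operand.
    extra = grad.ndim - len(shape)
    if extra > 0:
        grad = grad.sum(axis=tuple(range(extra)))
    # Axes that had size 1 in the operand and were stretched.
    stretched = tuple(
        i for i, size in enumerate(shape) if size == 1 and grad.shape[i] != 1
    )
    if stretched:
        grad = grad.sum(axis=stretched, keepdims=True)
    return grad


# Shapes (4, 3, 1) and (5,) broadcast to (4, 3, 5) under addition.
x = np.full((4, 3, 1), 3.0)
y = np.full((5,), 2.0)
upstream = np.ones((4, 3, 5))

grad_x = unbroadcast(upstream, x.shape)   # summed over the stretched last axis
grad_y = unbroadcast(upstream, y.shape)   # summed over the two prepended axes
```

For addition the local derivative is the identity, so the operand gradients are just the upstream gradient reduced back to each operand's shape: 5 copies are summed for `x` and 12 for `y`.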

> Complete ```BroadcastedOperation``` in the ```operations.py``` script.

> Complete ```Add``` in the ```operations.py``` script.

> Complete ```Subtract``` in the ```operations.py``` script.

> Complete ```Multiply``` in the ```operations.py``` script.

> Complete ```Divide``` in the ```operations.py``` script.

> Complete ```Maximum``` in the ```operations.py``` script.

> Complete ```Minimum``` in the ```operations.py``` script.


Let's test ```Add``` operation.

In [40]:
import numpy as np
from autograd.array import Array

x_arr = Array(np.ones((4, 3, 1), dtype=np.float32) * 3)
y_arr = Array(np.ones((5,), dtype=np.float32) * 2)

z_arr = x_arr + y_arr
print("x_arr", x_arr.value.shape)
print("y_arr", y_arr.value.shape)

assert np.allclose(z_arr.value,  np.ones((4, 3, 5), dtype=np.float32) * 5), "Forward error in Add"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value) * 5), "Derivative error in first argument of Add"
assert np.allclose(grad_y, np.ones_like(y_arr.value) * 12), "Derivative error in second argument of Add"


x_arr (4, 3, 1)
y_arr (5,)


Test ```Subtract``` operation.

In [41]:
z_arr = x_arr - y_arr
assert np.allclose(z_arr.value,  np.ones((4, 3, 5), dtype=np.float32)), "Forward error in Subtract"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value) * 5), "Derivative error in first argument of Subtract"
assert np.allclose(grad_y, np.ones_like(y_arr.value) * -12), "Derivative error in second argument of Subtract"

Test ```Multiply``` operation.


In [42]:
z_arr = x_arr * y_arr
assert np.allclose(z_arr.value,  np.ones((4, 3, 5), dtype=np.float32) * 6), "Forward error in Multiply"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value) * 10), "Derivative error in first argument of Multiply"
assert np.allclose(grad_y, np.ones_like(y_arr.value) * 36), "Derivative error in second argument of Multiply"

Test ```Divide``` operation.

In [43]:
z_arr = x_arr / y_arr
assert np.allclose(z_arr.value,  np.ones((4, 3, 5), dtype=np.float32) * 1.5), "Forward error in Divide"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value) * 2.5), "Derivative error in first argument of Divide"
assert np.allclose(grad_y, np.ones_like(y_arr.value) * -9.0), "Derivative error in second argument of Divide"

Test ```Maximum``` operation.

In [44]:
x_arr = Array(np.array([[1], [2], [5]], dtype=np.float32))
y_arr = Array(np.array([3, 4, -1], dtype=np.float32))

z_arr = x_arr.maximum(y_arr)
assert np.allclose(z_arr.value, np.array([[3., 4., 1.],
                                           [3., 4., 2.],
                                           [5., 5., 5.]], dtype=np.float32)), "Forward error in Maximum"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.array([[1.],
                                     [1.],
                                     [3.]], dtype=np.float32)), "Derivative error in first argument of Maximum"
assert np.allclose(grad_y, np.array([2., 2., 0.], dtype=np.float32)
                   ), "Derivative error in second argument of Maximum"


#### Reduce Operations

Reduce operations apply a function along an axis, and the result has fewer elements along that axis than the input: the reduced axis is either removed or, with ```keepdims=True```, kept with size 1. Let's complete the Reduce operations.
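For sum and mean, the derivative simply spreads the upstream gradient back over the reduced axis. The functions `sum_vjp` and `mean_vjp` below are hypothetical helpers for a single integer axis, written in plain NumPy; they are a sketch of the idea rather than the homework's implementation.

```python
import numpy as np


def sum_vjp(upstream, input_shape, axis, keepdims=False):
    # Restore the reduced axis (size 1) if the forward pass dropped it ...
    if not keepdims:
        upstream = np.expand_dims(upstream, axis)
    # ... then repeat the gradient along that axis.
    return np.broadcast_to(upstream, input_shape)


def mean_vjp(upstream, input_shape, axis, keepdims=False):
    # Mean is sum divided by the number of reduced elements.
    return sum_vjp(upstream, input_shape, axis, keepdims) / input_shape[axis]


x = np.arange(18, dtype=np.float32).reshape(2, 3, 3)
g_sum = sum_vjp(np.ones((2, 3), dtype=np.float32), x.shape, axis=1)
g_mean = mean_vjp(np.ones((2, 1, 3), dtype=np.float32), x.shape, axis=1, keepdims=True)
```

Max differs from these two because only the positions that attain the maximum receive gradient; all other positions get zero.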



> Complete ```ReduceOperations``` in the ```operations.py``` script.

> Complete ```Sum``` in the ```operations.py``` script.

> Complete ```Mean``` in the ```operations.py``` script.

> Complete ```Max``` in the ```operations.py``` script.

Test ```Sum``` operation.

In [45]:
import numpy as np
from autograd.array import Array

x_arr = Array(np.arange(18, dtype=np.float32).reshape((2, 3, 3)))
z_arr = x_arr.sum(axis=1)
assert np.allclose(z_arr.value,  np.array([[9., 12., 15.],
                                           [36., 39., 42.]], dtype=np.float32)), "Forward error in Sum"
grad_x = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value)), "Derivative error in first argument of Sum"


Test ```Mean``` operation.

In [46]:
z_arr = x_arr.mean(axis=1, keepdims=True)
assert np.allclose(z_arr.value,  np.array([[[3., 4., 5.]],
                                           [[12., 13., 14.]]], dtype=np.float32)), "Forward error in Mean"
grad_x = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.ones_like(x_arr.value)/3), "Derivative error in first argument of Mean"


Test ```Max``` operation.

In [47]:
z_arr = x_arr.max(axis=1)
assert np.allclose(z_arr.value,  np.array([[6., 7., 8.],
                                           [15., 16., 17.]], dtype=np.float32)), "Forward error in Max"
print("XXXX", x_arr)
grad_x = z_arr.operation.vjp(np.ones_like(z_arr.value))
true_grad = np.zeros_like(x_arr.value)
true_grad[:, 2, :] = 1.0
assert np.allclose(grad_x, true_grad), "Derivative error in first argument of Max"


XXXX Autograd array: [[[ 0.  1.  2.]
  [ 3.  4.  5.]
  [ 6.  7.  8.]]

 [[ 9. 10. 11.]
  [12. 13. 14.]
  [15. 16. 17.]]]


#### Dot product

Now, we need to implement the matrix multiplication operation.
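For two-dimensional inputs, the vjp of $Z = XY$ with upstream gradient $G$ is $G Y^T$ for the first operand and $X^T G$ for the second. The function `matmul_vjp` below is a plain NumPy sketch of that formula for the 2-D case only; the name is an assumption and batched inputs are not handled.

```python
import numpy as np


def matmul_vjp(x, y, upstream):
    """vjp of z = x @ y for 2-D x and y (sketch; no batching or broadcasting)."""
    grad_x = upstream @ y.T   # same shape as x
    grad_y = x.T @ upstream   # same shape as y
    return grad_x, grad_y


x = np.arange(6, dtype=np.float32).reshape(2, 3)
y = np.arange(6, 18, dtype=np.float32).reshape(3, 4)
upstream = np.ones((2, 4), dtype=np.float32)
grad_x, grad_y = matmul_vjp(x, y, upstream)
```

With an all-ones upstream gradient, `grad_x` contains the row sums of `y` and `grad_y` contains the column sums of `x`, which is a quick sanity check on the shapes and transposes.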

> Complete ```Matmul``` in the ```operations.py``` script.

Test ```Matmul``` operation.

In [48]:
import numpy as np
from autograd.array import Array

x_arr = Array(np.arange(6, dtype=np.float32).reshape((2, 3)))
y_arr = Array(np.arange(6, 18, dtype=np.float32).reshape((3, 4)))
z_arr = x_arr @ y_arr
assert np.allclose(z_arr.value,  np.array([[38.,  41.,  44.,  47.],
                                           [128., 140., 152., 164.]], dtype=np.float32)), "Forward error in Matmul"
grad_x, grad_y = z_arr.operation.vjp(np.ones_like(z_arr.value))
assert np.allclose(grad_x, np.array([[30., 46., 62.],
                                     [30., 46., 62.]], dtype=np.float32)), "Derivative error in first argument of Matmul"
assert np.allclose(grad_y, np.array([[3., 3., 3., 3.],
                                     [5., 5., 5., 5.],
                                     [7., 7., 7., 7.]], dtype=np.float32)), "Derivative error in second argument of Matmul"


#### Miscellaneous

Finally, we need to complete some basic operations.

> Complete ```Tanh``` Operation in ```operations.py``` (See the Sigmoid Operation)

> Complete ```Exp``` Operation in ```operations.py```

> Complete ```Pow``` Operation in ```operations.py```

> Complete ```Log``` Operation in ```operations.py```

> Complete ```Onehot``` Operation in ```operations.py```

Please take a look at the ```Array``` class and how it uses these operations to better understand the structure of autograd.
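Elementwise operations such as Tanh, Exp, and Log share one pattern: the vjp is the upstream gradient multiplied by the pointwise derivative. A useful habit is to compare an analytic vjp against central finite differences. The helpers `tanh_vjp` and `numerical_vjp` below are assumptions written for this check and are not part of the homework code.

```python
import numpy as np


def tanh_vjp(x, upstream):
    # d tanh(x) / dx = 1 - tanh(x)^2, applied elementwise.
    return upstream * (1.0 - np.tanh(x) ** 2)


def numerical_vjp(fn, x, upstream, eps=1e-6):
    """Central differences, one input entry at a time (slow; for checking only)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = np.sum(upstream * (fn(x + step) - fn(x - step))) / (2 * eps)
    return grad


x = np.linspace(-2, 2, 6).reshape(2, 3)   # float64 keeps the check accurate
upstream = np.ones_like(x)
analytic = tanh_vjp(x, upstream)
numeric = numerical_vjp(np.tanh, x, upstream)
```

The same `numerical_vjp` helper can be pointed at `np.exp` or `np.log` to check the other elementwise operations.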

Let's test ```Tanh``` Operation.

In [49]:
import numpy as np
from autograd.array import Array
from autograd import grad

x_arr = Array(np.linspace(-2, 2, 6).astype(np.float32).reshape((2, 3)), is_parameter=True)
y_arr = x_arr.tanh()  # Using Array's function (see Array's tanh function)

assert np.allclose(y_arr.value, np.array([[-0.9640276,  -0.83365464, -0.37994897],
                                          [0.37994897,  0.83365464,  0.9640276]], dtype=np.float32))
dx = grad(y_arr, gradient_vector=np.ones_like(y_arr.value))[x_arr]
assert np.allclose(dx, np.array([[0.07065082, 0.30501992, 0.85563874],
                                 [0.85563874, 0.30501992, 0.07065082]], dtype=np.float32))


GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Tanh: [[-0.9640276  -0.83365464 -0.37994897]
 [ 0.37994897  0.83365464  0.9640276 ]]: [array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)]})


Test ```Exp``` Operation

In [50]:
x_arr = Array(np.linspace(-2, 2, 6).astype(np.float32).reshape((2, 3)), is_parameter=True)
y_arr = x_arr.exp()

assert np.allclose(y_arr.value, grad(y_arr, np.ones_like(y_arr.value))[x_arr])
assert np.allclose(y_arr.value, np.array([[0.13533528, 0.3011942 , 0.67032003],
                                    [1.4918246 , 3.3201172 , 7.3890557 ]], dtype=np.float32))

GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Exp: [[0.13533528 0.3011942  0.67032003]
 [1.4918246  3.3201172  7.3890557 ]]: [array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)]})


Test ```Pow``` Operation

In [51]:
x_arr = Array(np.linspace(-2, 2, 6).astype(np.float32).reshape((2, 3)), is_parameter=True)
y_arr = x_arr ** 2

assert np.allclose(y_arr.value, np.array([[4.0000, 1.4400, 0.1600],
                                          [0.1600, 1.4400, 4.0000]], dtype=np.float32))
assert np.allclose(grad(y_arr, np.ones_like(y_arr.value))[x_arr], np.array([[-4.0000, -2.4000, -0.8000],
                                                                  [0.8000,  2.4000,  4.0000]], dtype=np.float32))


GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Power: [[4.         1.44       0.16000001]
 [0.16000001 1.44       4.        ]]: [array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)]})


Test ```Log``` Operation

In [52]:
x_arr = Array(np.linspace(0.1, 2, 6, dtype=np.float32).reshape(2, 3), is_parameter=True)
y_arr = x_arr.log()

assert np.allclose(y_arr.value, np.array([[-2.3025851, -0.7339692, -0.15082288],
                                          [0.21511139,  0.48242617,  0.6931472]], dtype=np.float32))


assert np.allclose(grad(y_arr, np.ones_like(y_arr.value))[x_arr], np.array([[10.0000,  2.0833,  1.1628],
                                                                             [0.8065,  0.6173,  0.5000]], dtype=np.float32), atol=1e-4)


Test ```Onehot``` Operation

In [53]:
x_arr = Array(np.array([1, 2]), is_parameter=True)
y_arr = x_arr.onehot(3)

assert np.allclose(y_arr.value, np.array([[0, 1, 0],[0, 0, 1]], dtype=np.float32))
assert np.allclose(grad(y_arr, np.arange(6).reshape(2, 3).astype(np.float32))[x_arr], np.array([1, 5.]))

GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Onehot: [[0. 1. 0.]
 [0. 0. 1.]]: [array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)]})


## Functions using Array Operations

We have implemented the basic operations needed for building sophisticated neural network layers and functions. For example, we can implement the ```relu``` function:

```Python
def relu(array: Array) -> Array:
    return array.maximum(0.0)
```

Since we have already implemented the ```maximum``` operation, we do not need to deal with the gradient of the ```relu``` function. We can use the ```relu``` function to build neural networks, and the ```vjp``` of ```maximum``` will be called automatically during gradient calculations. Similarly, we can implement other functions using the ```Array``` operations that we already defined.
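To see why no extra derivative code is needed, here is a plain NumPy sketch of how a vjp of maximum already yields the ReLU gradient. The function `maximum_vjp` is a hypothetical same-shape version, and crediting ties to the first argument is an assumption of this sketch, not necessarily the homework's convention.

```python
import numpy as np


def maximum_vjp(a, b, upstream):
    """vjp of np.maximum for same-shaped inputs; ties go to a in this sketch."""
    a_wins = a >= b
    return upstream * a_wins, upstream * ~a_wins


x = np.array([-2.0, -1.0, 0.5, 3.0])
zeros = np.zeros_like(x)

relu_out = np.maximum(x, zeros)                       # forward pass of relu
grad_x, _ = maximum_vjp(x, zeros, np.ones_like(x))    # gradient with respect to x
```

The gradient is 1 where the input is positive and 0 where it is negative, which is exactly the ReLU derivative away from zero.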

> Complete ```leaky_relu``` in ```functions.py``` using ```Array``` methods

In [54]:
import numpy as np
from autograd.functions import leaky_relu
from autograd.array import Array
from autograd import grad

x_arr = Array(np.array([-2, -1, .10, 1, 2], dtype=np.float32), is_parameter=True)
y_arr = leaky_relu(x_arr, negative_slope=0.1)
assert np.allclose(grad(y_arr.sum(axis=0))[x_arr], np.array([0.1, 0.1, 1, 1, 1], dtype=np.float32))


GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Sum: [-0.2 -0.1  0.1  1.   2. ]: [array(1.)]})


> Complete ```nll_with_logits_loss``` in ```functions.py``` using ```Array``` methods

In [76]:
from autograd.functions import nll_with_logits_loss

logits = Array(np.array([[0.3, -0.4, 0.1, 1.3]], dtype=np.float32), is_parameter=True)
label = Array(np.array([3], dtype=np.int64))


loss = nll_with_logits_loss(logits, label).mean()

assert np.allclose(loss.value, np.array([0.616135], dtype=np.float32)), "Error in evaluation"

gradients = grad(loss)  # compute once and reuse
assert np.allclose(gradients[logits], np.array([[ 0.19866505,  0.09865415,  0.1626532 , -0.4599724 ]])), "Error in gradient"



GRADIENTS defaultdict(<class 'list'>, {Autograd array, Operation: Mean: 0.6161350011825562: [array(1.)]})


Now we can continue with neural networks using all the functions we have implemented so far.