## Introduction

In mathematics, tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps. Geometric vectors, often used in physics and engineering applications, and scalars themselves are also tensors [Reference: Wiki].

TensorFlow is an open source library for numerical computation, specializing in machine learning applications. [Refer to this paper for in-depth detail: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf]

A TensorFlow computation is described by a directed graph, which is composed of a set of nodes.
The graph represents a dataflow computation, with extensions for allowing some kinds of nodes to maintain and update persistent state and for branching and looping control structures within the graph.

In a TensorFlow graph, each node has zero or more inputs and zero or more outputs, and represents the instantiation of an operation.

Values that flow along normal edges in the graph (from outputs to inputs) are tensors: arrays of arbitrary dimensionality whose underlying element type is specified or inferred at graph-construction time.

<img src="assets/add.png" style="height: 50%;width: 50%">
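
To make "arbitrary dimensionality" concrete, here is a minimal sketch that uses NumPy arrays as stand-ins for tensors. This is an illustration only, not TensorFlow's own tensor type, and the variable names are hypothetical.

```python
import numpy as np

# NumPy arrays stand in for tensors here (an assumption for illustration).
scalar = np.array(3.0)                 # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])     # rank 1: a 1-D array
matrix = np.array([[1, 2], [3, 4]])    # rank 2: integer element type is inferred

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the number of dimensions, dtype is the element type
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```

The element type is inferred from the literal values when the array is created, which loosely mirrors how the element type of a tensor is fixed when the graph is built.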


Special edges, called control dependencies, can also exist in the graph: no data flows along such edges, but they indicate that the source node for the control dependence must finish executing before the destination node for the control dependence starts executing. In the miniature framework built below, an execution order that respects the dependencies between nodes is obtained with a *topological sort*.

To define your network, you need to define the order of operations for your nodes. Because the input to some nodes depends on the outputs of others, you need to flatten the graph so that all the input dependencies of each node are resolved before its calculation runs. This technique is called a topological sort.

In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge $u \rightarrow v$ from vertex $u$ to vertex $v$, $u$ comes before $v$ in the ordering.

<img src="assets/topological_sort.png" style="height: 50%;width: 50%">
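
The definition above can be sketched with Kahn's algorithm on a plain dictionary of edges. The helper name `kahn_sort` and the small graph are hypothetical and independent of the `Node` classes defined later; this is a sketch under those assumptions, not the implementation TensorFlow uses.

```python
from collections import deque


def kahn_sort(edges):
    """Return a topological order for a DAG given as {node: [successors]}."""
    # Count the incoming edges of every node.
    indegree = {n: 0 for n in edges}
    for successors in edges.values():
        for m in successors:
            indegree[m] = indegree.get(m, 0) + 1
    # Nodes without incoming edges have no unresolved dependencies.
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        # "Remove" the edges leaving n; a successor becomes ready
        # once all of its incoming edges have been removed.
        for m in edges.get(n, []):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, no topological order exists")
    return order


# Hypothetical graph: f depends on x and y, g depends on f and x.
edges = {"x": ["f", "g"], "y": ["f"], "f": ["g"], "g": []}
order = kahn_sort(edges)
print(order)  # ['x', 'y', 'f', 'g']
```

Every edge $u \rightarrow v$ in `edges` has `u` before `v` in the returned list, which is exactly the property stated in the definition.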

In [16]:
# Let's first define the basis on which TensorFlow is built
# Let's do a simple Add operation

class Node(object):
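    def __init__(self, inbound_nodes=None):
        # Nodes from which this Node receives values
        # (default to a fresh list rather than a shared mutable default)
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []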
        # Nodes to which this Node passes values
        self.outbound_nodes = []
        # A calculated value
        self.value = None
        # Add this node as an outbound node on its inputs.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

    # These will be implemented in a subclass.
    def forward(self):
        """
        Forward propagation.

        Compute the output value based on `inbound_nodes` and
        store the result in self.value.
        """
        raise NotImplementedError

# Input is a subclass that inherits from the parent class (Node)
class Input(Node):
    def __init__(self):
        # an Input node has no inbound nodes,
        # so no need to pass anything to the Node instantiator
        Node.__init__(self)

    # NOTE: Input node is the only node that may
    # receive its value as an argument to forward().
    #
    # All other node implementations should calculate their
    # values from the value of previous nodes, using
    # self.inbound_nodes
    #
    # Example:
    # val0 = self.inbound_nodes[0].value
    def forward(self, value=None):
        if value is not None:
            self.value = value

# This is Add subclass
class Add(Node):
    def __init__(self, x, y):
        # You could access `x` and `y` in forward with
        # self.inbound_nodes[0] (`x`) and self.inbound_nodes[1] (`y`)
        Node.__init__(self, [x, y])

    def forward(self):
        """
        Set the value of this node (`self.value`) to the sum of its inbound_nodes.
        """
        self.value = self.inbound_nodes[0].value + self.inbound_nodes[1].value

class Multiply(Node):
    def __init__(self, x, y):
        Node.__init__(self, [x, y])  # Initialize the Multiply node with its inbound nodes
    
    def forward(self):
        self.value = self.inbound_nodes[0].value * self.inbound_nodes[1].value

def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value is the respective value fed to that node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_pass(output_node, sorted_nodes):
    """
    Performs a forward pass through a list of sorted nodes.

    Arguments:

        `output_node`: A node in the graph, should be the output node (have no outgoing edges).
        `sorted_nodes`: A topologically sorted list of nodes.

    Returns the output Node's value
    """

    for n in sorted_nodes:
        n.forward()

    return output_node.value

x, y = Input(), Input()  # Input nodes do not have any inbound nodes

# Creating an Add node registers it as an outbound node of its inbound nodes (x and y)
f = Add(x, y)

g = Multiply(f, x)
# f receives values from x and y and passes its value on to g
print(f.inbound_nodes, f.outbound_nodes)

feed_dict = {x: 10, y: 5}  # here x and y are input nodes

sorted_nodes = topological_sort(feed_dict)
# To evaluate only the sum, pass `f` instead of `g`
output = forward_pass(g, sorted_nodes)

# NOTE: because topological_sort sets the values for the `Input` nodes we could also access
# the value for x with x.value (same goes for y).
print("({} + {})*{} = {} (according to flow)".format(feed_dict[x], feed_dict[y], feed_dict[x], output))






[<__main__.Input object at 0x000001DDCE17F160>, <__main__.Input object at 0x000001DDCE17F358>] [<__main__.Multiply object at 0x000001DDCE17F240>]
(10 + 5)*10 = 150 (according to flow)


<img src="linear_combination.png" style="height: 50%;width: 50%">

In a TensorFlow world, this looks like:
<img src="assets/linear_tensor.png" style="height: 50%;width: 50%">

In [41]:
import numpy as np
class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])
    
    def forward(self):
        # Convert the inbound values to numpy arrays if they are not already
        for n in self.inbound_nodes:
            if not isinstance(n.value, np.ndarray):
                n.value = np.array(n.value)
        # You can also write assertions to make sure the dimensions match
        X, W, b = (n.value for n in self.inbound_nodes)
        self.value = np.dot(X, W) + b
        

In [60]:
# Let's see if we have implemented the Linear node correctly
# First define the input objects
X = Input()
W = Input()
b = Input()

# Now define the Linear node
Y = Linear(X, W, b)

# Now feed these nodes some input values
feed_dict = {X: np.array([[2, 3], [1, 5], [6, 7]]), W: np.array([[2, 1], [3, 4]]),  b: np.array([[0.5], [1.0], [2.0]])}

sorted_nodes = topological_sort(feed_dict)
output = forward_pass(Y, sorted_nodes)

print (output)

[[ 13.5  14.5]
 [ 18.   22. ]
 [ 35.   36. ]]
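
The output above can be cross-checked with plain NumPy. The sketch below repeats the same arithmetic outside the graph, using hypothetical `_check` names so that the notebook's `X`, `W`, and `b` nodes are untouched. It also makes visible that a bias of shape `(3, 1)` is broadcast across the columns of the product.

```python
import numpy as np

X_check = np.array([[2, 3], [1, 5], [6, 7]])    # shape (3, 2)
W_check = np.array([[2, 1], [3, 4]])            # shape (2, 2)
b_check = np.array([[0.5], [1.0], [2.0]])       # shape (3, 1)

XW = X_check @ W_check        # matrix product, shape (3, 2)
Y_check = XW + b_check        # each row's bias is added to both columns
print(Y_check)
```

The first row, for example, is $[2\cdot2+3\cdot3,\ 2\cdot1+3\cdot4] + 0.5 = [13.5,\ 14.5]$, matching the output of the `Linear` node.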


In [61]:
# Now let's build a sigmoid node
class Sigmoid(Node):
    def __init__(self, inputs):
        Node.__init__(self, [inputs]) # inbound nodes to the node object should be iterable
    
    def forward(self):
        # print (self.inbound_nodes[0].value)
        self.value = 1./(1 + np.exp(-self.inbound_nodes[0].value))

In [62]:
Y_sigmoid = Sigmoid(Y)
# Now feed these nodes some input values
feed_dict = {X: np.array([[2, 3], [1, 5], [6, 7]]), W: np.array([[2, 1], [3, 4]]),  b: np.array([[0.5], [1.0], [2.0]])}
sorted_nodes = topological_sort(feed_dict)
output = forward_pass(Y_sigmoid, sorted_nodes)
print (output)

[[ 0.99999863  0.9999995 ]
 [ 0.99999998  1.        ]
 [ 1.          1.        ]]


In [70]:
# Now let's implement the cost function, which is the mean-square-error
class MSE(Node):
    def __init__(self, y_true, y_hat):
        Node.__init__(self, [y_true, y_hat])
    
    def forward(self):
        y_true, y_hat = (n.value for n in self.inbound_nodes)
        # Sum of squared errors divided by the number of examples
        self.value = np.sum((y_true - y_hat)**2) / float(y_true.shape[0])

In [71]:
# Now test the cost function

y_true = Input() # This is another input node to the MSE
y_hat = Input()

cost = MSE(y_true, y_hat)
y_ = np.array([1, 2, 3])
a_ = np.array([4.5, 5, 10])

feed_dict = {y_true: y_, y_hat: a_}
graph = topological_sort(feed_dict)
# forward pass
forward_pass(cost, graph)


23.416666666666668
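
One pitfall with the subtraction inside an MSE node is NumPy broadcasting when the two arrays have different shapes. The sketch below, with hypothetical variable names, shows how a `(3,)` array and a `(3, 1)` array silently produce a `(3, 3)` result, and how reshaping both to column vectors restores the elementwise difference.

```python
import numpy as np

y_flat = np.array([1.0, 2.0, 3.0])            # shape (3,)
a_col = np.array([[4.5], [5.0], [10.0]])      # shape (3, 1)

# Mixed shapes broadcast to a (3, 3) matrix of all pairwise differences.
wrong = y_flat - a_col

# Reshaping both to column vectors gives the intended elementwise difference.
diff = y_flat.reshape(-1, 1) - a_col.reshape(-1, 1)
mse = np.mean(diff ** 2)

print(wrong.shape, diff.shape, mse)
```

With matching shapes the result agrees with the value printed above, since $(3.5^2 + 3^2 + 7^2)/3 = 70.25/3$.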

### Forward Propagation in neural network

<img src="assets/forward_propagation_nn.png" style="height: 75%;width: 75%">

### Back propagation in Neural Networks

From the above figure we see that a change in $L_2$ produces a change in $C$. This relationship is called the **gradient** of $C$ with respect to $L_2$. Mathematically it is defined as:
$$\frac{\partial C}{\partial L_2}$$

To update a set of weights with gradient descent, we need the gradient of the cost with respect to those weights. Let's see how we can use this framework to find the gradient for the weights in the second layer, $W_2$. We want to calculate the gradient of $C$ with respect to $W_2$:
$$\frac{\partial C}{\partial W_2}$$

The mean squared error, which is the cost function, is given by:
$$C = \frac{1}{2m} \sum_x (y_{true} - \hat{y})^{2}$$
$$C = \frac{1}{2m} \sum_x (y_{true} - L_2)^{2}$$

where $L_2$ is the output of layer 2. So the gradient for the $L_2$ node is:

$$\frac{\partial C}{\partial L_2} = \frac{\partial}{\partial L_2} \left[\frac{1}{2m}\sum_x (y_{true}-L_2)^2\right]$$

$$ = \frac{-1}{m}\left[\sum_x (y_{true}-L_2)\right]$$

Note that the `MSE` node in the code uses the $\frac{1}{m}$ convention (`np.mean`), so its gradients differ from this derivation by a constant factor of 2.

Now let's look at the change in $L_2$ due to a change in $W_2$:
$$\frac{\partial L_2}{\partial W_2} = \frac{\partial}{\partial W_2}[s_1\times W_2+b_2]$$
$$ = s_1 $$
with size $(m, n_1)$.

Thus by chain-rule:

$$\frac{\partial C}{\partial W_2} = \frac{\partial C}{\partial L_2}\times\frac{\partial L_2}{\partial W_2}$$

Thus the gradient of the cost with respect to the weights of the second layer is:

$$\delta W_2 = -s_1\times \frac{1}{m} \sum_x (y_{true}-L_2) $$
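
The chain-rule result can be sanity-checked numerically. The sketch below assumes the $\frac{1}{2m}$ cost from the derivation and $L_2 = s_1 W_2 + b_2$, writes the gradient in matrix form as $-\frac{1}{m} s_1^{T}(y_{true} - L_2)$ (the matrix product carries out the sum over examples), and compares it against central finite differences. All names with a `_demo` suffix and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m_demo, n1_demo = 4, 3
s1_demo = rng.normal(size=(m_demo, n1_demo))   # activations of layer 1, shape (m, n_1)
W2_demo = rng.normal(size=(n1_demo, 1))        # weights of layer 2, shape (n_1, 1)
b2_demo = np.zeros(1)
y_demo = rng.normal(size=(m_demo, 1))          # targets, shape (m, 1)


def cost_demo(W):
    """C = 1/(2m) * sum of squared errors, as in the derivation."""
    L2 = s1_demo @ W + b2_demo
    return np.sum((y_demo - L2) ** 2) / (2 * m_demo)


# Analytic gradient from the chain rule.
L2_demo = s1_demo @ W2_demo + b2_demo
grad_analytic = -(s1_demo.T @ (y_demo - L2_demo)) / m_demo

# Central finite differences, one weight at a time.
eps = 1e-6
grad_numeric = np.zeros_like(W2_demo)
for i in range(n1_demo):
    step = np.zeros_like(W2_demo)
    step[i, 0] = eps
    grad_numeric[i, 0] = (cost_demo(W2_demo + step) - cost_demo(W2_demo - step)) / (2 * eps)

# The difference should be close to zero.
print(np.max(np.abs(grad_analytic - grad_numeric)))
```

A check of this kind is a common way to catch sign or transpose mistakes before wiring a `backward` method into the graph.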


<img src="assets/back_propagation_1.png.jpg" style="height: 75%;width: 75%">

<img src="assets/back_propagation_2.jpg" style="height: 75%;width: 75%">


In code this is given by:

```python
# Initialize a partial for each of the inbound_nodes.
self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
# Cycle through the outputs. The gradient will change depending
# on each output, so the gradients are summed over all outputs.
for n in self.outbound_nodes:
    # Get the partial of the cost with respect to this node.
    grad_cost = n.gradients[self]
    # Set the partial of the loss with respect to this node's inputs.
    self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
    # Set the partial of the loss with respect to this node's weights.
    self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
    # Set the partial of the loss with respect to this node's bias.
    self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)
```

In [72]:
import numpy as np


class Node(object):
    """
    Base class for nodes in the network.

    Arguments:

        `inbound_nodes`: A list of nodes with edges into this node.
    """
    def __init__(self, inbound_nodes=None):
        """
        Node's constructor (runs when the object is instantiated). Sets
        properties that all nodes need.
        """
        # A list of nodes with edges into this node.
        # (default to a fresh list rather than a shared mutable default)
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        # The eventual value of this node. Set by running
        # the forward() method.
        self.value = None
        # A list of nodes that this node outputs to.
        self.outbound_nodes = []
        # Keys are the inputs to this node and their values are
        # the partials of the cost with respect to that input.
        self.gradients = {}
        # Sets this node as an outbound node for all of
        # this node's inputs.
        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)

    def forward(self):
        """
        Every node that uses this class as a base class will
        need to define its own `forward` method.
        """
        raise NotImplementedError

    def backward(self):
        """
        Every node that uses this class as a base class will
        need to define its own `backward` method.
        """
        raise NotImplementedError


class Input(Node):
    """
    A generic input into the network.
    """
    def __init__(self):
        # The base class constructor has to run to set all
        # the properties here.
        #
        # The most important property on an Input is value.
        # self.value is set during `topological_sort` later.
        Node.__init__(self)

    def forward(self):
        # Do nothing because nothing is calculated.
        pass

    def backward(self):
        # An Input node has no inputs so the gradient (derivative)
        # is zero.
        # The key, `self`, is reference to this object.
        self.gradients = {self: 0}
        # Weights and bias may be inputs, so you need to sum
        # the gradient from output gradients.
        for n in self.outbound_nodes:
            self.gradients[self] += n.gradients[self]

class Linear(Node):
    """
    Represents a node that performs a linear transform.
    """
    def __init__(self, X, W, b):
        # The base class (Node) constructor. Weights and bias
        # are treated like inbound nodes.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Performs the math behind a linear transform.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        """
        Calculates the gradient based on the output values.
        """
        # Initialize a partial for each of the inbound_nodes.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        # Cycle through the outputs. The gradient will change depending
        # on each output, so the gradients are summed over all outputs.
        for n in self.outbound_nodes:
            # Get the partial of the cost with respect to this node.
            grad_cost = n.gradients[self]
            # Set the partial of the loss with respect to this node's inputs.
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # Set the partial of the loss with respect to this node's weights.
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # Set the partial of the loss with respect to this node's bias.
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)


class Sigmoid(Node):
    """
    Represents a node that performs the sigmoid activation function.
    """
    def __init__(self, node):
        # The base class constructor.
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used with `backward` as well.

        `x`: A numpy array-like object.
        """
        return 1. / (1. + np.exp(-x))

    def forward(self):
        """
        Perform the sigmoid function and set the value.
        """
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)

    def backward(self):
        """
        Calculates the gradient using the derivative of
        the sigmoid function.
        """
        # Initialize the gradients to 0.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        # Sum the partial with respect to the input over all the outputs.
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            sigmoid = self.value
            self.gradients[self.inbound_nodes[0]] += sigmoid * (1 - sigmoid) * grad_cost


class MSE(Node):
    def __init__(self, y, a):
        """
        The mean squared error cost function.
        Should be used as the last node for a network.
        """
        # Call the base class' constructor.
        Node.__init__(self, [y, a])

    def forward(self):
        """
        Calculates the mean squared error.
        """
        # NOTE: We reshape these to avoid possible matrix/vector broadcast
        # errors.
        #
        # For example, if we subtract an array of shape (3,) from an array of shape
        # (3,1) we get an array of shape(3,3) as the result when we want
        # an array of shape (3,1) instead.
        #
        # Making both arrays (3,1) ensures the result is (3,1) and does
        # an elementwise subtraction as expected.
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)

        self.m = self.inbound_nodes[0].value.shape[0]
        # Save the computed output for backward.
        self.diff = y - a
        self.value = np.mean(self.diff**2)

    def backward(self):
        """
        Calculates the gradient of the cost.
        """
        self.gradients[self.inbound_nodes[0]] = (2 / self.m) * self.diff
        self.gradients[self.inbound_nodes[1]] = (-2 / self.m) * self.diff


def topological_sort(feed_dict):
    """
    Sort the nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` Node and the value is the respective value fed to that Node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_and_backward(graph):
    """
    Performs a forward pass and a backward pass through a list of sorted Nodes.

    Arguments:

        `graph`: The result of calling `topological_sort`.
    """
    # Forward pass
    for n in graph:
        n.forward()

    # Backward pass
    # see: https://docs.python.org/2.3/whatsnew/section-slices.html
    for n in graph[::-1]:
        n.backward()


def sgd_update(trainables, learning_rate=1e-2):
    """
    Updates the value of each trainable with SGD.

    Arguments:

        `trainables`: A list of `Input` Nodes representing weights/biases.
        `learning_rate`: The learning rate.
    """
    # Gradient descent step: move each trainable against its gradient.
    # The value of a trainable is read and assigned through its `value` attribute.
    for t in trainables:
        t.value = t.value - learning_rate * t.gradients[t]



X, W, b = Input(), Input(), Input()
y = Input()
f = Linear(X, W, b)
a = Sigmoid(f)
cost = MSE(y, a)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2.], [3.]])
b_ = np.array([-3.])
y_ = np.array([1, 2])

feed_dict = {
    X: X_,
    y: y_,
    W: W_,
    b: b_,
}

graph = topological_sort(feed_dict)
forward_and_backward(graph)
# return the gradients for each Input
gradients = [t.gradients[t] for t in [X, y, W, b]]

"""
Expected output

[array([[ -3.34017280e-05,  -5.01025919e-05],
       [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
       [ 1.9999833]]), array([[  5.01028709e-05],
       [  1.00205742e-04]]), array([ -5.01028709e-05])]
"""
print(gradients)




[array([[ -3.34017280e-05,  -5.01025919e-05],
       [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
       [ 1.9999833]]), array([[  5.01028709e-05],
       [  1.00205742e-04]]), array([ -5.01028709e-05])]
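
The update rule inside `sgd_update` can be seen in isolation on a toy problem. The sketch below applies the same step, `value = value - learning_rate * gradient`, to a hypothetical one-dimensional quadratic $f(w) = (w - 3)^2$. It uses the exact gradient rather than a sampled batch, so strictly speaking it is plain gradient descent.

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_demo = 0.0
learning_rate_demo = 0.1

for step in range(100):
    grad = 2.0 * (w_demo - 3.0)
    # Same form as the update in sgd_update: move against the gradient.
    w_demo = w_demo - learning_rate_demo * grad

print(w_demo)  # approaches the minimizer w = 3
```

Each step shrinks the distance to the minimizer by a factor of $1 - 2 \cdot 0.1 = 0.8$, which is why a smaller learning rate converges more slowly and one that is too large can diverge.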


In [75]:
# Now load the boston data from sklearn
"""
Check out the new network architecture and dataset!

Notice that the weights and biases are
generated randomly.

No need to change anything, but feel free to tweak
to test your network, play around with the epochs, batch size, etc!
"""

import numpy as np
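# NOTE: `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston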
from sklearn.utils import shuffle, resample

# Load data
data = load_boston()
X_ = data['data']
y_ = data['target']

# Normalize data
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)

n_features = X_.shape[1]
n_hidden = 10
W1_ = np.random.randn(n_features, n_hidden)
b1_ = np.zeros(n_hidden)
W2_ = np.random.randn(n_hidden, 1)
b2_ = np.zeros(1)

# Neural network
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

feed_dict = {
    X: X_,
    y: y_,
    W1: W1_,
    b1: b1_,
    W2: W2_,
    b2: b2_
}

epochs = 1000
# Total number of examples
m = X_.shape[0]
batch_size = 11
steps_per_epoch = m // batch_size

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2]

print("Total number of examples = {}".format(m))

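# Step 4: repeat steps 1-3 for every batch of every epoch
for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Step 1: randomly sample a batch of examples
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)

        # Reset value of X and y Inputs
        X.value = X_batch
        y.value = y_batch

        # Step 2: run the forward and backward passes
        forward_and_backward(graph)

        # Step 3: update the trainables with SGD
        sgd_update(trainables)

        # graph[-1] is the MSE cost node, the last node in topological order
        loss += graph[-1].value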

    print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))


Total number of examples = 506
Epoch: 1, Loss: 120.109
Epoch: 2, Loss: 33.018
Epoch: 3, Loss: 27.542
Epoch: 4, Loss: 23.355
Epoch: 5, Loss: 24.365
Epoch: 6, Loss: 15.663
Epoch: 7, Loss: 16.942
Epoch: 8, Loss: 14.217
Epoch: 9, Loss: 15.675
Epoch: 10, Loss: 18.921
Epoch: 11, Loss: 11.797
Epoch: 12, Loss: 16.114
Epoch: 13, Loss: 14.660
Epoch: 14, Loss: 16.879
Epoch: 15, Loss: 15.137
Epoch: 16, Loss: 14.133
Epoch: 17, Loss: 14.237
Epoch: 18, Loss: 14.512
Epoch: 19, Loss: 12.001
Epoch: 20, Loss: 9.809
Epoch: 21, Loss: 8.268
Epoch: 22, Loss: 11.764
Epoch: 23, Loss: 9.718
Epoch: 24, Loss: 9.471
Epoch: 25, Loss: 8.988
Epoch: 26, Loss: 9.592
Epoch: 27, Loss: 8.738
Epoch: 28, Loss: 8.382
Epoch: 29, Loss: 9.516
Epoch: 30, Loss: 9.224
Epoch: 31, Loss: 9.124
Epoch: 32, Loss: 8.508
Epoch: 33, Loss: 6.893
Epoch: 34, Loss: 8.669
Epoch: 35, Loss: 6.753
Epoch: 36, Loss: 8.717
Epoch: 37, Loss: 7.540
Epoch: 38, Loss: 8.093
Epoch: 39, Loss: 7.137
Epoch: 40, Loss: 6.420
Epoch: 41, Loss: 8.423
Epoch: 42, Los

Epoch: 351, Loss: 3.910
Epoch: 352, Loss: 4.412
Epoch: 353, Loss: 5.476
Epoch: 354, Loss: 4.244
Epoch: 355, Loss: 4.594
Epoch: 356, Loss: 4.009
Epoch: 357, Loss: 4.930
Epoch: 358, Loss: 4.735
Epoch: 359, Loss: 4.445
Epoch: 360, Loss: 5.347
Epoch: 361, Loss: 3.660
Epoch: 362, Loss: 4.690
Epoch: 363, Loss: 4.667
Epoch: 364, Loss: 4.738
Epoch: 365, Loss: 5.244
Epoch: 366, Loss: 4.898
Epoch: 367, Loss: 4.959
Epoch: 368, Loss: 4.967
Epoch: 369, Loss: 4.923
Epoch: 370, Loss: 5.399
Epoch: 371, Loss: 4.284
Epoch: 372, Loss: 5.348
Epoch: 373, Loss: 5.420
Epoch: 374, Loss: 4.455
Epoch: 375, Loss: 4.487
Epoch: 376, Loss: 4.556
Epoch: 377, Loss: 4.046
Epoch: 378, Loss: 4.871
Epoch: 379, Loss: 4.717
Epoch: 380, Loss: 4.412
Epoch: 381, Loss: 4.865
Epoch: 382, Loss: 4.204
Epoch: 383, Loss: 4.753
Epoch: 384, Loss: 4.674
Epoch: 385, Loss: 3.974
Epoch: 386, Loss: 4.238
Epoch: 387, Loss: 4.329
Epoch: 388, Loss: 5.247
Epoch: 389, Loss: 4.856
Epoch: 390, Loss: 4.900
Epoch: 391, Loss: 4.534
Epoch: 392, Loss

Epoch: 711, Loss: 3.987
Epoch: 712, Loss: 3.783
Epoch: 713, Loss: 4.506
Epoch: 714, Loss: 3.638
Epoch: 715, Loss: 4.334
Epoch: 716, Loss: 4.261
Epoch: 717, Loss: 4.142
Epoch: 718, Loss: 4.321
Epoch: 719, Loss: 4.314
Epoch: 720, Loss: 4.105
Epoch: 721, Loss: 4.195
Epoch: 722, Loss: 4.302
Epoch: 723, Loss: 3.723
Epoch: 724, Loss: 4.265
Epoch: 725, Loss: 4.203
Epoch: 726, Loss: 3.686
Epoch: 727, Loss: 3.616
Epoch: 728, Loss: 3.596
Epoch: 729, Loss: 4.185
Epoch: 730, Loss: 3.889
Epoch: 731, Loss: 4.260
Epoch: 732, Loss: 3.545
Epoch: 733, Loss: 4.456
Epoch: 734, Loss: 3.395
Epoch: 735, Loss: 4.525
Epoch: 736, Loss: 4.037
Epoch: 737, Loss: 3.985
Epoch: 738, Loss: 4.075
Epoch: 739, Loss: 3.656
Epoch: 740, Loss: 4.770
Epoch: 741, Loss: 4.617
Epoch: 742, Loss: 4.360
Epoch: 743, Loss: 4.337
Epoch: 744, Loss: 3.840
Epoch: 745, Loss: 4.180
Epoch: 746, Loss: 3.998
Epoch: 747, Loss: 4.425
Epoch: 748, Loss: 4.196
Epoch: 749, Loss: 4.272
Epoch: 750, Loss: 4.433
Epoch: 751, Loss: 4.177
Epoch: 752, Loss