# Notebook 3: Writing your own TensorFlow-like library, MiniFlow

Before we start using TensorFlow, Keras, or any other library, it is best to learn how such a library works conceptually. To do that, we will write our own library, MiniFlow, which works similarly to TensorFlow. Why is this important? Before relying on these abstractions, it helps to understand what happens under the hood: the forward pass, backpropagation, derivatives, and the chain rule.

An intellectually curious mind should know these nuts and bolts, which is why we will write our first neural network from scratch.

## MiniFlow Architecture:

Let's consider how to implement a neural network's computation graph in MiniFlow. We'll use a Python class to represent a generic node.

We know that each node might receive input from multiple other nodes. We also know that each node creates a single output, which will likely be passed to other nodes. Let's add two lists: one to store references to the inbound nodes, and the other to store references to the outbound nodes.

In [2]:
class Node(object):
    def __init__(self, inbound_nodes=None):
        # Nodes from which this node receives values.
        # None is used as the default because a mutable default list ([])
        # would be shared by every Node created without arguments.
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        # Nodes to which this node passes values.
        self.outbound_nodes = []
        # Register this node as an outbound node of each inbound node.
        for inbound_node in self.inbound_nodes:
            inbound_node.outbound_nodes.append(self)
        # The value calculated by this node.
        self.value = None

    def forward(self):
        '''
        Forward propagation.

        Compute the output value based on `inbound_nodes` and
        store the result in self.value.
        '''
        raise NotImplementedError

*While `Node` defines the basic set of properties that every node holds, only specialized subclasses of `Node` will end up in the final graph. So, let's build our first subclass, `Input`, which simply holds a value.*

In [3]:
class Input(Node):
    def __init__(self):
        # An Input node has no inbound nodes,
        # so there is nothing to pass to the Node constructor.
        Node.__init__(self)

    def forward(self, value=None):
        '''
        An Input node has no inbound nodes, so this forward method
        takes the value as an argument and stores it in self.value.
        The forward method of any other node instead reads the value
        of each of its inbound nodes, calculates the result, and
        stores it in self.value.
        '''
        # Overwrite the value if one was passed in.
        if value is not None:
            self.value = value
        return self.value

*So far we have written the `Input` node, which takes no input from other nodes. Instead, it holds an input to the neural network, and its value can be set either directly through `Node.value` or by passing a value to its `forward` method.*

*A neural network also contains nodes that take values from such input nodes or from hidden nodes, perform the actual calculation, store the result, and pass it on to the rest of the network.*

*Let's implement an `Add` node, which does exactly this.*

In [4]:
class Add(Node):
    def __init__(self, inbound_nodes=None):
        # Avoid a mutable default argument; fall back to a fresh list.
        Node.__init__(self, inbound_nodes if inbound_nodes is not None else [])

    def forward(self):
        """
        Note: this method has no value parameter, because it takes its
        values from inbound_nodes and performs the calculation on them.
        """
        self.value = 0
        for inbound_node in self.inbound_nodes:
            self.value += inbound_node.value

        return self.value

## Forward Propagation:
As in TensorFlow, we have to create the computation graph first; it is evaluated only when we pass values to it and ask for a result. The library then walks the computation graph, runs the computation, and gives us the output. Similarly, we will implement two methods in this library: `topological_sort()` and `forward_pass()`.

To define the network we need to define the order of operations on the nodes. Because the input to some nodes depends on the output of others, we need to flatten the computation graph so that every node whose output is needed by the current node is evaluated before it.

To resolve this we will implement Kahn's algorithm, which sorts the nodes in the order in which they must be calculated. The input of `topological_sort()` is `feed_dict`, a Python dict, and the output is a sorted list of nodes.

`forward_pass()` then takes this `sorted_nodes` list, runs a forward pass on each node, and returns the value of the final `output_node`, which is the output of the network.
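As a rough, standalone illustration of Kahn's algorithm mentioned above, the sketch below sorts a small graph given as a plain dictionary of edges. The function name `kahn_sort` and the `edges` dictionary are hypothetical names used only for this example; they are not part of MiniFlow.

```python
from collections import deque


def kahn_sort(edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm).

    `edges` maps each node to the list of nodes it points to.
    """
    # Count the incoming edges of every node.
    in_degree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1

    # Start with the nodes that depend on nothing.
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "Remove" the outgoing edges of the node that was just placed.
        for target in edges.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order


# x and y feed an add node, whose result feeds a sigmoid node.
edges = {"x": ["add"], "y": ["add"], "add": ["sigmoid"], "sigmoid": []}
order = kahn_sort(edges)
print(order)
```

Every node appears after all of the nodes it depends on, which is exactly the property `forward_pass()` relies on.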

In [5]:
def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value is the respective value fed to that node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L

In [6]:
def forward_pass(output_node, sorted_nodes):
    """
    Performs a forward pass through a list of sorted nodes.

    Arguments:

        `output_node`: A node in the graph, should be the output node (have no outgoing edges).
        `sorted_nodes`: A topologically sorted list of nodes.

    Returns the output Node's value
    """

    for n in sorted_nodes:
        n.forward()

    return output_node.value

In [7]:
"""
This script builds and runs a graph with miniflow.

There is no need to change anything to solve this quiz!

However, feel free to play with the network! Can you also
build a network that solves the equation below?

(x + y) + y
"""

x, y, z = Input(), Input(), Input()

f = Add([x, y, z])

feed_dict = {x: 10, y: 5, z:8}

sorted_nodes = topological_sort(feed_dict)
output = forward_pass(f, sorted_nodes)

# NOTE: because topological_sort sets the values for the `Input` nodes, we could also
# access the value for x with x.value (same goes for y and z).
print("{} + {} + {} = {} (according to miniflow)".format(feed_dict[x], feed_dict[y], feed_dict[z], output))

10 + 5 + 8 = 23 (according to miniflow)
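
The docstring above asks for a network that computes (x + y) + y. One possible answer is to chain two `Add` nodes, as sketched below. To keep the block runnable on its own, it restates trimmed-down versions of `Node`, `Input`, and `Add`, and calls `forward` by hand in dependency order rather than going through `topological_sort`. Treat it as an illustration under those assumptions, not as the notebook's reference solution.

```python
class Node(object):
    def __init__(self, inbound_nodes=None):
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        self.outbound_nodes = []
        for inbound_node in self.inbound_nodes:
            inbound_node.outbound_nodes.append(self)
        self.value = None


class Input(Node):
    def forward(self, value=None):
        if value is not None:
            self.value = value
        return self.value


class Add(Node):
    def forward(self):
        self.value = sum(n.value for n in self.inbound_nodes)
        return self.value


x, y = Input(), Input()
inner = Add([x, y])        # x + y
outer = Add([inner, y])    # (x + y) + y

# Evaluate in dependency order: inputs first, then inner, then outer.
x.forward(10)
y.forward(5)
inner.forward()
result = outer.forward()
print(result)  # (10 + 5) + 5 = 20
```

Note that `y` feeds two different nodes here, so it ends up with two entries in its `outbound_nodes` list.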


Congratulations on building your first feed-forward neural network!
The next step is to compare the output value `y'` with the true value `y`, calculate the error term, and use backpropagation to adjust the weights and improve the model.


## Learning and loss

So far we have implemented the forward pass of a neural network, which outputs y'. The real learning starts after this: the error term is calculated and backpropagation happens, in which the network's weights and biases are adjusted using the chain rule. The weights and biases are updated by a simple formula: `$W \leftarrow W - \alpha \frac{\partial E}{\partial W}$`, where `$\alpha$` is the learning rate and `$E$` is the error term.

To understand the learning that happens in the network, let's implement a `Linear` node, which takes weights, a bias, and inputs (x) and outputs `$\sum_i (W_i X_i) + bias$`.

In [8]:
class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])

        # NOTE: The weights and bias properties here are not
        # numbers, but rather references to other nodes.
        # The weight and bias values are stored within the
        # respective nodes.

    def forward(self):
        """
        Set self.value to the value of the linear function output.
        
        """
        inputs = self.inbound_nodes[0].value
        weights = self.inbound_nodes[1].value
        bias = self.inbound_nodes[2].value
        self.value = bias
        
        print("inbound_nodes[0].value: "+ str(inputs))
        print("inbound_nodes[1].value: "+ str(weights))
        print("inbound_nodes[2].value: "+ str(bias))
        for x, w in zip(inputs, weights):
            self.value += x * w


In [9]:
inputs, weights, bias = Input(), Input(), Input()

f = Linear(inputs, weights, bias)

feed_dict = {
    inputs: [6, 14, 3],
    weights: [0.5, 0.25, 1.4],
    bias: 2
}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

print(output) # should be 12.7 with this example

inbound_nodes[0].value: [6, 14, 3]
inbound_nodes[1].value: [0.5, 0.25, 1.4]
inbound_nodes[2].value: 2
12.7
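
For reference, the expected value of 12.7 follows directly from the feed values above:

```latex
\sum_i W_i X_i + bias = (0.5)(6) + (0.25)(14) + (1.4)(3) + 2 = 3 + 3.5 + 4.2 + 2 = 12.7
```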


In [25]:
"""
Modify Linear#forward so that it linearly transforms
input matrices, weights matrices and a bias vector to
an output.
"""
import numpy as np


class Linear(Node):
    def __init__(self, X, W, b):
        # Notice the ordering of the input nodes passed to the
        # Node constructor.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Set the value of this node to the linear transform output.
        """
        inputs = self.inbound_nodes[0].value
        weights = self.inbound_nodes[1].value
        bias = self.inbound_nodes[2].value

        # np.dot multiplies the input matrix by the weight matrix;
        # the bias vector is then broadcast across every row.
        self.value = np.dot(inputs, weights) + bias


In [26]:
"""
The setup is similar to the previous `Linear` node you wrote,
except you're now using NumPy arrays instead of Python lists.

Update the Linear class in miniflow.py to work with
numpy vectors (arrays) and matrices.

Test your code here!
"""

import numpy as np

X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2., -3], [2., -3]])
b_ = np.array([-3., -5])

feed_dict = {X: X_, W: W_, b: b_}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

"""
Output should be:
[[-9., 4.],
[-9., 4.]]
"""
print(output)

[[-9.  4.]
 [-9.  4.]]
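
To see where the `[-9., 4.]` rows come from, the short sketch below repeats the calculation with plain NumPy, outside of MiniFlow. It assumes the same `X_`, `W_`, and `b_` values as the cell above; the intermediate name `XW` is used only for illustration.

```python
import numpy as np

X_ = np.array([[-1., -2.], [-1., -2.]])
W_ = np.array([[2., -3.], [2., -3.]])
b_ = np.array([-3., -5.])

# Matrix product: each row of X_ is combined with each column of W_.
XW = np.dot(X_, W_)
print(XW)

# b_ has shape (2,), so NumPy broadcasting adds it to every row of XW.
output = XW + b_
print(output)
```

Each row of `XW` is `[-6., 9.]`, and adding the bias `[-3., -5.]` to it gives `[-9., 4.]`.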


## Activation function: Sigmoid function
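
The sigmoid function squashes any real input into the range (0, 1). For reference, its definition and its derivative are given below; both are standard results, stated here because the `_sigmoid` helper in the next cell is meant to be reused in the backward pass.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
```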


In [35]:
class Sigmoid(Node):
    """
    Applies the sigmoid activation to the value of its inbound node.
    """
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used later with `backward` as well.

        `x`: A numpy array-like object.

        Return the result of the sigmoid function.
        """
        return 1. / (1 + np.exp(-x))

    def forward(self):
        """
        Set the value of this node to the result of the
        sigmoid function, `_sigmoid`.
        """
        inputs = self.inbound_nodes[0].value
        self.value = self._sigmoid(inputs)


In [36]:
"""
This network feeds the output of a linear transform
to the sigmoid function.

Finish implementing the Sigmoid class in miniflow.py!

Feel free to play around with this network, too!
"""

import numpy as np

X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)
g = Sigmoid(f)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2., -3], [2., -3]])
b_ = np.array([-3., -5])

feed_dict = {X: X_, W: W_, b: b_}

graph = topological_sort(feed_dict)
output = forward_pass(g, graph)

"""
Output should be:
[[  1.23394576e-04   9.82013790e-01]
 [  1.23394576e-04   9.82013790e-01]]
"""
print(output)

[[1.23394576e-04 9.82013790e-01]
 [1.23394576e-04 9.82013790e-01]]
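
As a quick sanity check on the numbers above, this standalone sketch applies the sigmoid formula directly to the linear output `[-9, 4]` computed earlier. The helper name `sigmoid` is used only for this example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


linear_output = np.array([[-9., 4.], [-9., 4.]])
activated = sigmoid(linear_output)
print(activated)

# Sigmoid outputs always lie strictly between 0 and 1.
assert np.all((activated > 0) & (activated < 1))
```

A strongly negative input such as -9 maps to a value close to 0, while a positive input such as 4 maps to a value close to 1.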


## Error term or Cost Function
### MSE
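
With m examples, true values y_i, and network outputs a_i (notation assumed here to match the `y` and `a` arguments of the class below), the mean squared error is conventionally written as:

```latex
MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - a_i)^2
```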

In [46]:
class MSE(Node):
    def __init__(self, y, a):
        """
        The mean squared error cost function.
        Should be used as the last node for a network.
        """
        # Call the base class' constructor.
        Node.__init__(self, [y, a])

    def forward(self):
        """
        Calculates the mean squared error.
        """
        # NOTE: We reshape these to avoid possible matrix/vector broadcast
        # errors.
        #
        # For example, if we subtract an array of shape (3,) from an array of shape
        # (3,1) we get an array of shape (3,3) as the result when we want
        # an array of shape (3,1) instead.
        #
        # Making both arrays (3,1) ensures the result is (3,1) and does
        # an elementwise subtraction as expected.
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        # Mean of the squared differences; equivalent to
        # np.sum(np.square(y - a)) / y.shape[0].
        self.value = np.mean(np.square(y - a))


In [48]:
import numpy as np

y, a = Input(), Input()
cost = MSE(y, a)

y_ = np.array([1, 2, 3])
a_ = np.array([4.5, 5, 10])

feed_dict = {y: y_, a: a_}
graph = topological_sort(feed_dict)
# forward pass
forward_pass(cost, graph)

"""
Expected output

23.4166666667
"""
print(cost.value)

23.416666666666668
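
The reshape in `MSE.forward` guards against a subtle broadcasting pitfall. The standalone sketch below reuses the `y_` and `a_` values from the cell above to show the shapes involved and to reproduce the expected cost by hand; the names `mismatched` and `diff` are used only for illustration.

```python
import numpy as np

y_ = np.array([1, 2, 3])
a_ = np.array([4.5, 5, 10])

# Mismatched shapes: (3,) minus (3, 1) broadcasts to a (3, 3) matrix.
mismatched = y_ - a_.reshape(-1, 1)
print(mismatched.shape)

# Matching shapes: (3, 1) minus (3, 1) stays (3, 1), element by element.
diff = y_.reshape(-1, 1) - a_.reshape(-1, 1)
print(diff.shape)

# Squared differences are 12.25, 9 and 49; their mean is 70.25 / 3.
cost = np.mean(np.square(diff))
print(cost)
```

Taking the mean of the accidental (3, 3) matrix would average nine squared differences instead of three, which is why both arrays are reshaped first.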


## Backpropagation: a way to calculate the change in the error cost due to each weight and bias
### Gradient Descent: a way to adjust the weights and biases

Great! We've successfully calculated a full forward pass and found the cost. Next we need to start a backwards pass, which starts with backpropagation. Backpropagation is the process by which the network runs error values backwards.

During this process, the network calculates the way in which the weights need to change (also called the gradient) to reduce the overall error of the network. Changing the weights usually occurs through a technique called gradient descent.

Making sense of the purpose of backpropagation comes more easily after you work through the intended outcome. I'll come back to backpropagation in a bit, but first, I want to dive deeper into gradient descent.

![alt text](images/SGD.png "Point in 3-d surface")

Imagine a point on a surface in three-dimensional space. In real life, a ball sitting on the slope of a valley makes a nice analogy. In this case, the height of the point represents the difference between the current output of the network and the correct output given the current parameter values (which is why you need data with known outputs). Each dimension of the plane represents another parameter of the network. A network with m parameters would be a hypersurface of m dimensions.

(Imagining more than three dimensions is tricky. The good news is that the ball and valley example describes the behavior of gradient descent well; the only difference between the three-dimensional and n-dimensional situations is the number of parameters in the calculations.)

In the ideal situation, the ball rests at the bottom of the valley, indicating the minimum difference between the output of the network and the known correct output.

The learning process starts with random weights and biases. In the ball analogy, the ball starts at a random point near the valley.

Gradient descent works by first calculating the slope of the surface at the current point, which means calculating the partial derivatives of the loss with respect to all of the weights. This set of partial derivatives is called the gradient. It then uses the gradient to modify the weights so that the next forward pass through the network moves the output lower on the hypersurface. Physically, this is the same as measuring the slope of the valley at the location of the ball and then moving the ball a small amount downhill, that is, opposite to the direction in which the slope rises. Over time, many small movements make it possible to find the bottom of the valley.
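
In symbols, using C for the cost, w_1, ..., w_m for the parameters, and α for the learning rate (notation assumed here for illustration), the gradient and one gradient descent step can be written as:

```latex
\nabla C = \left( \frac{\partial C}{\partial w_1}, \dots, \frac{\partial C}{\partial w_m} \right), \qquad w \leftarrow w - \alpha \, \nabla C
```

The minus sign is what moves the ball downhill: the gradient points in the direction of steepest increase, so the update steps the opposite way.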

In [50]:
def gradient_descent_update(x, gradx, learning_rate):
    """
    Performs a gradient descent update.
    """
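    # TODO: Implement gradient descent.
    # As written, this placeholder returns x unchanged, so the cost
    # printed by the loop below never decreases.

    # Return the new value for x
    return x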
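
One possible way to fill in the update is to step against the gradient, scaled by the learning rate. The sketch below is a minimal, standalone version applied to the quadratic `f(x) = x**2 + 5`; the starting point of 10 and the learning rate of 0.1 are arbitrary choices for illustration, not values taken from the notebook.

```python
def gradient_descent_update(x, gradx, learning_rate):
    """Move x a small step in the direction opposite to the gradient."""
    return x - learning_rate * gradx


def f(x):
    """Quadratic cost with its minimum of 5 at x = 0."""
    return x**2 + 5


def df(x):
    """Derivative of f with respect to x."""
    return 2 * x


x = 10.0
learning_rate = 0.1
for epoch in range(100):
    x = gradient_descent_update(x, df(x), learning_rate)

print("x = {:.6f}, cost = {:.6f}".format(x, f(x)))
```

For this particular function each update multiplies x by `1 - 2 * learning_rate`. With a much smaller learning rate such as 0.001, every step shrinks x by only 0.2%, so 100 epochs are not enough to get close to the minimum.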

In [53]:
"""
Given any starting point `x`, gradient descent
should be able to find the value of `x` that minimizes
the cost function `f` defined below.
"""
import random


def f(x):
    """
    Quadratic function.

    It's easy to see the minimum value of the function
    is 5 when x = 0.
    """
    return x**2 + 5


def df(x):
    """
    Derivative of `f` with respect to `x`.
    """
    return 2*x


# Random number between 0 and 10,000. Feel free to set x to whatever you like.
x = random.randint(0, 10000)
# TODO: Set the learning rate
learning_rate = 0.001
epochs = 100

for i in range(epochs+1):
    cost = f(x)
    gradx = df(x)
    print("EPOCH {}: Cost = {:.3f}, gradx = {:.3f}".format(i, cost, gradx))
    x = gradient_descent_update(x, gradx, learning_rate)

EPOCH 0: Cost = 29909966.000, gradx = 10938.000
EPOCH 1: Cost = 29909966.000, gradx = 10938.000
EPOCH 2: Cost = 29909966.000, gradx = 10938.000
EPOCH 3: Cost = 29909966.000, gradx = 10938.000
EPOCH 4: Cost = 29909966.000, gradx = 10938.000
EPOCH 5: Cost = 29909966.000, gradx = 10938.000
EPOCH 6: Cost = 29909966.000, gradx = 10938.000
EPOCH 7: Cost = 29909966.000, gradx = 10938.000
EPOCH 8: Cost = 29909966.000, gradx = 10938.000
EPOCH 9: Cost = 29909966.000, gradx = 10938.000
EPOCH 10: Cost = 29909966.000, gradx = 10938.000
EPOCH 11: Cost = 29909966.000, gradx = 10938.000
EPOCH 12: Cost = 29909966.000, gradx = 10938.000
EPOCH 13: Cost = 29909966.000, gradx = 10938.000
EPOCH 14: Cost = 29909966.000, gradx = 10938.000
EPOCH 15: Cost = 29909966.000, gradx = 10938.000
EPOCH 16: Cost = 29909966.000, gradx = 10938.000
EPOCH 17: Cost = 29909966.000, gradx = 10938.000
EPOCH 18: Cost = 29909966.000, gradx = 10938.000
EPOCH 19: Cost = 29909966.000, gradx = 10938.000
EPOCH 20: Cost = 29909966.000, gradx = 10938.000
EPOCH 21: Cost = 29909966.000, gradx = 10938.000
EPOCH 22: Cost = 299