# MiniFlow

In this lesson, I'll build MiniFlow, a small module that implements a simple neural network in Python using NumPy. The structure of this code was created by instructors from Udacity's Deep Learning Foundation, while parts of the implementation were written by me as exercises for the Nanodegree.


## MiniFlow Architecture

A Python class will be used to represent a generic node. Each node receives input from other nodes and produces a single output, which will usually be passed on to other nodes.

In [51]:
class Node(object):
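    def __init__(self, inbound_nodes=None):
        # Node(s) from which this node receives values.
        # A None default avoids sharing one mutable list between instances.
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []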
        
        # Node(s) to which this node passes values
        self.outbound_nodes = []
        
        # For each inbound node, add this node as an outbound node of that node
        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)
            
        # Initializing the value that will be passed to other nodes as None
        self.value = None
         
    def forward(self):
        """
        Forward propagation.
        
        Compute this node's output from the values of its inbound_nodes and
        store the result in self.value. Each call handles a single node; a full
        forward pass calls forward() on every node of the graph in order.
        """
        raise NotImplementedError

The Node class only defines the base set of properties that every node holds. Only specialized subclasses of Node will end up in the graph.

In [52]:
class Input(Node):
    def __init__(self):
        # Since the input is the first node in the graph, it has no inbound_nodes
        # so, when initializing the Node class, there's no need to pass in any other nodes
        Node.__init__(self)
        
    def forward(self, value=None):
        """
        This is the only node where the value may be passed in as an argument to the
        forward method, since it does not have to perform any operation on values from inbound nodes
        """
        # Remember that self.value was already initialized in Node.__init__(self)
        # Overwrite the value if one is passed in
        if value is not None:
            self.value = value
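
To make the wiring concrete, here is a minimal sketch of how constructing a node registers it with its inputs. `SketchNode` and its `name` attribute are hypothetical stand-ins introduced only for this illustration; they mirror the bookkeeping in `Node.__init__` and are not part of MiniFlow itself.

```python
# Hypothetical stand-in for Node, trimmed down to the wiring logic only.
class SketchNode:
    def __init__(self, name, inbound_nodes=None):
        self.name = name  # label used only for printing in this sketch
        self.inbound_nodes = list(inbound_nodes or [])
        self.outbound_nodes = []
        # Register this node as a consumer of each of its inputs
        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)


a = SketchNode("a")
b = SketchNode("b")
total = SketchNode("total", [a, b])

inbound_names = [n.name for n in total.inbound_nodes]
outbound_of_a = [n.name for n in a.outbound_nodes]

print(inbound_names)  # ['a', 'b']
print(outbound_of_a)  # ['total']
```

Because every node records both directions of each edge, the graph can later be walked starting from the input nodes alone.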
        

## The Add Subclass

The Add subclass actually performs a calculation, addition.

In [53]:
class Add(Node):
    """
    This class will take a list of nodes and add the values stored in them together
    """
    def __init__(self, *inputs):
        """
        We'll call the Node (parent) init function with the arguments given
        to the Add class initialization. We'll pass them as a list, since the parent
        node takes a list as the argument for inbound_nodes
        """
        Node.__init__(self, list(inputs))
        
    def forward(self):
        """
        This method will add the values stored in inbound_nodes
        """
        summ = 0
        for node in self.inbound_nodes:
            summ += node.value
        self.value = summ

Since the input of a node depends on the output of others, the operations must run in a valid order. To arrange the nodes so that every operation has its inputs ready, we'll sort the nodes before applying the forward pass. The topological_sort() function, defined further below, does this using Kahn's Algorithm and returns a sorted list of nodes in which all of the calculations can run in series. The forward_pass() function then simply runs the nodes in that order.

In [None]:
def forward_pass(output_node, sorted_nodes):
    """
    Performs a forward pass through a list of sorted nodes.
    
    Arguments:
    'output_node': The output of the graph (no outgoing edges).
    'sorted_nodes': a topologically sorted list of nodes.
    
    Returns the output node's value
    """
    for n in sorted_nodes:
        n.forward()
        
    return output_node.value



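The topological_sort() function is defined below. It first builds the sets of incoming and outgoing edges, starting from the Input nodes, and then repeatedly moves nodes with no remaining incoming edges into the sorted list.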

In [None]:
def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value is the respective value fed to that node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L
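
The same idea can be seen in isolation on a plain dictionary. The sketch below is a simplified version of Kahn's Algorithm that tracks in-degree counts instead of edge sets; `kahn_sort` and `toy_graph` are hypothetical names used only for this illustration.

```python
from collections import deque


def kahn_sort(edges):
    """Return a topological order; `edges` maps a node to the nodes it feeds."""
    # Count the incoming edges of every node
    in_degree = {n: 0 for n in edges}
    for targets in edges.values():
        for t in targets:
            in_degree[t] = in_degree.get(t, 0) + 1

    # Start from the nodes that depend on nothing
    ready = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        # "Remove" the outgoing edges of n; a target becomes ready
        # once all of its inputs have been placed in the order
        for t in edges.get(n, []):
            in_degree[t] -= 1
            if in_degree[t] == 0:
                ready.append(t)

    if len(order) != len(in_degree):
        raise ValueError("graph has a cycle")
    return order


toy_graph = {
    "x": ["linear"],
    "w": ["linear"],
    "b": ["linear"],
    "linear": ["sigmoid"],
    "sigmoid": [],
}
order = kahn_sort(toy_graph)
print(order)
```

Any order in which `linear` comes after `x`, `w`, and `b`, and `sigmoid` comes after `linear`, is a valid result.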

## Forward Propagation

We'll now use the structures created above to perform a forward pass through our network.

In [None]:
x, y, z = Input(), Input(), Input()

f = Add(x, y, z)

feed_dict = {x: 4, y: 5, z: 10}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

print("{} + {} + {} = {} (according to miniflow)".format(x.value, y.value, z.value, output))

## Learning and Loss

So far, this neural network can only perform a forward pass through its nodes. The final objective is for the network to improve the accuracy of its output over time, which requires trainable parameters, something an Add node does not have. So before we dive into backpropagation, a more complex node will be implemented: the Linear node, which computes a linear combination of a list of inputs and a list of weights, plus a bias.

In [None]:
class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])
        # Linear is instantiated with three inbound nodes, so
        # self.inbound_nodes is a list of three nodes whose values are:
        # a list of inputs, a list of weights, and a bias
        
    def forward(self):
        linear_combination = 0
        
        for i in range(len(self.inbound_nodes[0].value)):
            linear_combination += self.inbound_nodes[0].value[i] * self.inbound_nodes[1].value[i]
        linear_combination += self.inbound_nodes[2].value
        
        self.value = linear_combination

        # Iterating over plain Python lists is computationally slow. The lists
        # could instead be converted to NumPy arrays, and the linear
        # combination obtained with a dot product

In [54]:
inputs, weights, bias = Input(), Input(), Input()

f = Linear(inputs, weights, bias)

feed_dict = {inputs:[6, 14, 3],
            weights:[0.5, 0.25, 1.4],
            bias: 2}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

print(output)

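
As a quick check outside of MiniFlow, the explicit loop in `Linear.forward` and a NumPy dot product should give the same result for this data point. This is only a sketch using plain arrays with the values from the example above; `loop_result` and `dot_result` are names introduced here for illustration.

```python
import numpy as np

inputs = np.array([6, 14, 3])
weights = np.array([0.5, 0.25, 1.4])
bias = 2

# Explicit loop, mirroring the list-based Linear.forward
loop_result = bias
for x_i, w_i in zip(inputs, weights):
    loop_result += x_i * w_i

# Same computation as a single dot product
dot_result = np.dot(inputs, weights) + bias

print(np.isclose(loop_result, dot_result))
```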

The previous example used only one data point as input. In practice, it's common to feed in multiple data points in each forward pass, because the linear combinations can be processed in parallel, resulting in performance gains. The number of data points (examples) per pass is called the batch size, and common batch sizes are 32, 64, 128, 256, and 512.

So now the previous Linear node will be adapted to perform a linear transformation of the input matrix, which will be of size m (number of data points) by n (number of features of each example).

In [None]:
import numpy as np

class Linear(Node):
    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])
        
    def forward(self):
        """
        This method will now perform a matrix multiplication between the features matrix
        and the weights matrix, and later add the bias array
        """
        inputs = self.inbound_nodes[0].value # m x n numpy matrix
        weights = self.inbound_nodes[1].value # n x k numpy matrix
        bias = self.inbound_nodes[2].value # vector of size k
        
        linear_transform = np.dot(inputs, weights)
        linear_transform += bias
        
        self.value = linear_transform
        
        
        

In [None]:
X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2., -3], [2., -3]])
b_ = np.array([-3., -5])

feed_dict = {X: X_, W:W_, b:b_}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

print(output)
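
To see the shapes involved, the same computation can be reproduced with plain NumPy, outside of the graph. This is a sketch only; `Z` is a name introduced here for the result.

```python
import numpy as np

X_ = np.array([[-1., -2.], [-1., -2.]])  # m = 2 examples, n = 2 features
W_ = np.array([[2., -3.], [2., -3.]])    # n = 2 features, k = 2 outputs
b_ = np.array([-3., -5.])                # k = 2 biases

# (m x n) dot (n x k) gives (m x k); b_ is broadcast across the m rows
Z = np.dot(X_, W_) + b_

print(Z.shape)  # (2, 2)
print(Z)
```

Each row of `Z` is the linear transform of one example, so both rows are equal here because both examples are equal: `[-9., 4.]`.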

"Linear transforms are great for simply shifting values, but neural networks often require a more nuanced transform. For instance, one of the original designs for an artificial neuron, the perceptron, exhibits binary output behavior. Perceptrons compare a weighted input to a threshold. When the weighted input exceeds the threshold, the perceptron is activated and outputs 1, otherwise it outputs 0.

Activation, the idea of binary output behavior, generally makes sense for classification problems. For example, if you ask the network to hypothesize if a handwritten image is a '9', you're effectively asking for a binary output - yes, this is a '9', or no, this is not a '9'. A step function is the starkest form of a binary output, which is great, but step functions are not continuous and not differentiable, which is very bad. Differentiation is what makes gradient descent possible." 

## Sigmoid Function

As quoted above from the Deep Learning Foundation course lesson on MiniFlow, we need a differentiable function for the activation, so that the weights can be updated using the gradient of the function.

A Sigmoid node must be created: one that takes a Linear node as input, computes the sigmoid of the input's value, and stores the result in its own value.

In [None]:
class Sigmoid(Node):
    """
    Applies the sigmoid function element-wise to the value of its inbound node.
    """
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used later with `backward` as well.

        `x`: A numpy array-like object.

        Return the result of the sigmoid function.
        """
        return 1. / (1. + np.exp(-x))

    def forward(self):
        """
        Set the value of this node to the result of the
        sigmoid function, `_sigmoid`.
        """
        self.value = self._sigmoid(self.inbound_nodes[0].value)
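
The contrast between a step function and the sigmoid can be sketched numerically. The standalone `sigmoid` and `step` functions below are illustrative helpers, not part of MiniFlow. The sketch also checks the well-known identity that the derivative of the sigmoid is `sigmoid(x) * (1 - sigmoid(x))` against a finite-difference estimate.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def step(x, threshold=0.0):
    # Perceptron-style activation: 1 above the threshold, 0 otherwise
    return (x > threshold).astype(float)


x = np.array([-5.0, 0.0, 5.0])
print(step(x))     # [0. 0. 1.]
print(sigmoid(x))  # smooth values strictly between 0 and 1

# Analytic derivative versus a central finite difference
eps = 1e-5
analytic = sigmoid(x) * (1.0 - sigmoid(x))
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The step function jumps at the threshold, while the sigmoid changes smoothly and has a usable gradient everywhere.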

## Cost

There are still some structures needed to make a neural network learn from the data it is given. To measure how good the network's predictions are, we'll use the Mean Squared Error function, which evaluates how far the predictions are from the correct labels in the training set.
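
Written out, with m the number of examples, y the correct labels, and a the network's predictions (the notation here is chosen to match the argument names of the MSE node that follows), the cost is:

$$
\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - a_i \right)^2
$$

For example, with y = (1, 2, 3) and a = (4.5, 5, 10), the squared errors are 12.25, 9, and 49, so:

$$
\mathrm{MSE} = \frac{12.25 + 9 + 49}{3} = \frac{70.25}{3} \approx 23.4167
$$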

In [None]:
class MSE(Node):
    def __init__(self, y, a):
        Node.__init__(self, [y, a])
        
    def forward(self):
        """
        Since the arrays might not have the same shape, both are reshaped to
        column vectors so that the subtraction is element-wise and does not
        broadcast into a larger matrix"""
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        error = y-a
        squared_sum = np.square(error).sum()
        self.value = squared_sum/y.shape[0]
        
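# Note: this redefines forward_pass to take only the sorted graph, replacing
# the earlier two-argument version. The result is read afterwards from the
# node of interest (e.g. cost.value)
def forward_pass(graph):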
    for n in graph:
        n.forward()

In [None]:
y, a = Input(), Input()
cost = MSE(y, a)

y_ = np.array([1, 2, 3])
a_ = np.array([4.5, 5, 10])

feed_dict = {y: y_, a: a_}
graph = topological_sort(feed_dict)
# forward pass
forward_pass(graph)

# Expected output: 23.4166666667
print(cost.value)