# Homework 3: Neural Networks
## 1. Feed Forward Neural Network

This section implements a simple feed-forward neural network:

* Input layer $x$ with two features, a hidden layer with four nodes, and an output layer with two nodes
* ReLU activation on the hidden and output layers: $f(z) = \max\{z, 0\}$, $f(u) = \max\{u, 0\}$
* Softmax over the output layer to turn its activations into probabilities

![Schema of the feed-forward neural network](https://courses.edx.org/assets/courseware/v1/a9b004b0d6dbb3aea99347118218126d/asset-v1:MITx+6.86x+1T2021+type@asset+block/images_hw4_p1.png)
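
The forward pass described in the list above can be written compactly. This is a sketch in notation introduced here, not taken from the assignment: $W_0$ and $V_0$ stand for the bias terms of the hidden and output layers, and $f$ is applied elementwise.

$$
\begin{aligned}
z &= W x + W_0, \\
u &= V f(z) + V_0, \\
o_k &= \frac{e^{f(u_k)}}{\sum_{j} e^{f(u_j)}}, \quad k = 1, 2 .
\end{aligned}
$$

Note that the code in this section reuses the names `z` and `u` for the values after ReLU has been applied.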

In [1]:
import numpy as np

In [14]:
# ReLU function (return max{x, 0}, works for scalars and vectors)
def relu(X):
    return np.maximum(X, 0)

# softmax function (input and output is array of shape (n,))
def softmax(X):
    softmax_X = np.exp(X)
    return softmax_X / np.sum(softmax_X)
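
The `softmax` above exponentiates its input directly, which can overflow for large entries. A common remedy, shown here as a minimal sketch rather than as part of the assignment, is to subtract the maximum before exponentiating; this leaves the result mathematically unchanged. The name `softmax_stable` is hypothetical.

```python
import numpy as np

def softmax_stable(x):
    """Softmax of a 1-D array, shifted by max(x) to avoid overflow in exp."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)      # largest entry becomes 0, so exp() <= 1
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# same input as the output layer in this section
print(softmax_stable(np.array([15.0, 0.0])))
# an input for which a direct np.exp would overflow
print(softmax_stable(np.array([1000.0, 0.0])))
```

For moderate inputs such as `[15, 0]`, both versions agree up to floating-point precision.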

In [2]:
# weight matrices; the last column of each matrix holds the bias terms
W = np.array([[1,0,-1],[0,1,-1],[-1,0,-1],[0,-1,-1]])
V = np.array([[1,1,1,1,0],[-1,-1,-1,-1,2]])
W

array([[ 1,  0, -1],
       [ 0,  1, -1],
       [-1,  0, -1],
       [ 0, -1, -1]])

In [3]:
# define input data
x = np.array([3,14])
x.shape

(2,)

In [9]:
# hidden layer: weighted sum of the inputs plus bias (last column of W)
z = np.empty(4)
for i in range(W.shape[0]):
    z[i] = np.dot(W[i,0:2], x) + W[i,2]

z = relu(z)
z

array([ 2., 13.,  0.,  0.])

In [10]:
# output layer: weighted sum of the hidden activations plus bias (last column of V)
u = np.matmul(V[:,0:4], z)
u = u + V[:, 4]
u = relu(u)
u

array([15.,  0.])

In [15]:
# calculate the softmax function of the output-layer
o = softmax(u)
o

array([9.99999694e-01, 3.05902227e-07])
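
The three steps above (hidden layer, output layer, softmax) can also be collected into one vectorized function. The following is a sketch, assuming as in the cells above that the last column of `W` and `V` holds the bias; the function name `forward` is hypothetical.

```python
import numpy as np

def forward(x, W, V):
    """Forward pass of the two-layer network; returns (z, u, o) after ReLU/softmax."""
    # hidden layer: all columns but the last are weights, the last is the bias
    z = np.maximum(W[:, :-1] @ x + W[:, -1], 0)
    # output layer, same layout
    u = np.maximum(V[:, :-1] @ z + V[:, -1], 0)
    # softmax, shifted by max(u) for numerical stability
    exps = np.exp(u - np.max(u))
    return z, u, exps / np.sum(exps)

W = np.array([[1, 0, -1], [0, 1, -1], [-1, 0, -1], [0, -1, -1]])
V = np.array([[1, 1, 1, 1, 0], [-1, -1, -1, -1, 2]])

z, u, o = forward(np.array([3, 14]), W, V)
print(z, u, o)
```

For the input `[3, 14]` this gives the same hidden activations `[2, 13, 0, 0]` and output activations `[15, 0]` as the step-by-step cells, here with integer dtype because no float array is preallocated.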

In [22]:
# task: 'Output of Neural Network'
# apply ReLU once to all candidate outputs, then softmax row by row
new_u = relu(np.array([[1,1],[0,2],[3,-1]]))
for i in range(len(new_u)):
    new_o = softmax(new_u[i])
    print("o_1: ", new_o[0])

o_1:  0.5
o_1:  0.11920292202211755
o_1:  0.9525741268224333


## 2. LSTM (Long Short-Term Memory)

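The following code runs an LSTM over an input sequence of digits from {0, 1} and records the hidden state $h_t$ at every position. With the given parameters, $h_t$ describes the run of consecutive ones ending at that position:
 * 1: odd number of consecutive ones
 * -1: even number of consecutive ones
 * 0: the input at that position is zero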

![](https://courses.edx.org/assets/courseware/v1/8e7c0fe7c60323338b556ae490d116cf/asset-v1:MITx+6.86x+1T2021+type@asset+block/images_hw4_p2.png)
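
For reference, the update equations implemented by the code in this section are, for scalar states and with $c_0 = h_0 = 0$:

$$
\begin{aligned}
f_t &= \sigma\left(W^{f,h} h_{t-1} + W^{f,x} x_t + b_f\right) \\
i_t &= \sigma\left(W^{i,h} h_{t-1} + W^{i,x} x_t + b_i\right) \\
o_t &= \sigma\left(W^{o,h} h_{t-1} + W^{o,x} x_t + b_o\right) \\
c_t &= f_t \, c_{t-1} + i_t \tanh\left(W^{c,h} h_{t-1} + W^{c,x} x_t + b_c\right) \\
h_t &= o_t \tanh(c_t)
\end{aligned}
$$

The superscript notation is chosen here to mirror the variable names in the code (`W_f_h`, `W_f_x`, and so on). In this task, $\sigma$ and $\tanh$ are replaced by simplified versions that saturate: inputs $\geq 1$ map to $1$, and inputs $\leq -1$ map to $0$ for $\sigma$ and to $-1$ for $\tanh$.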

In [68]:
# definition of helper functions for the LSTM problem

# sigmoid function, returns values between 0 and 1
def sigmoid(X):
    return 1 / (1 + np.exp(-X))

# simplified sigmoid for this task: 1 for inputs >= 1, 0 for inputs <= -1,
# regular sigmoid otherwise; expects a 2-D array and modifies it in place
def sigmoid_simplified(X):
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            if X[i,j] >= 1:
                X[i,j] = 1
            elif X[i,j] <= -1:
                X[i,j] = 0
            else:
                X[i,j] = sigmoid(X[i,j])
    return X

def sigmoid_simplified_scalar(X):
    if X >= 1:
        return 1
    elif X <= -1:
        return 0
    else:
        return sigmoid(X)
                
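# analogous simplified tanh: 1 for inputs >= 1, -1 for inputs <= -1,
# regular tanh otherwise; expects a 2-D array and modifies it in place
def tanh_simplified(X):
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):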
            if X[i,j] >= 1:
                X[i,j] = 1
            elif X[i,j] <= -1:
                X[i,j] = -1
            else:
                X[i,j] = np.tanh(X[i,j])
    return X

def tanh_simplified_scalar(X):
    if X >= 1:
        return 1
    elif X <= -1:
        return -1
    else:
        return np.tanh(X)
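
The array versions above loop over every element and overwrite their argument. As an alternative, here is a sketch of vectorized variants built on `np.where` that accept scalars or arrays of any shape and leave the input untouched. The names `sigmoid_clipped` and `tanh_clipped` are hypothetical and not part of the assignment.

```python
import numpy as np

def sigmoid_clipped(x):
    """1 for x >= 1, 0 for x <= -1, regular sigmoid in between."""
    x = np.asarray(x, dtype=float)
    # evaluate the sigmoid only on the clipped range to avoid overflow in exp
    inner = 1.0 / (1.0 + np.exp(-np.clip(x, -1.0, 1.0)))
    return np.where(x >= 1, 1.0, np.where(x <= -1, 0.0, inner))

def tanh_clipped(x):
    """1 for x >= 1, -1 for x <= -1, regular tanh in between."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 1, 1.0, np.where(x <= -1, -1.0, np.tanh(x)))

grid = np.array([-100.0, -1.0, 0.0, 0.5, 1.0, 100.0])
print(sigmoid_clipped(grid))
print(tanh_clipped(grid))
print(grid)  # the input array is not modified
```

Both functions return NumPy arrays, also for scalar input (a 0-d array in that case).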

In [69]:
# parameters of the LSTM; for simplicity, all weights and biases are scalars
W_f_h = 0
W_f_x = 0
W_i_h = 0
W_i_x = 100
W_o_h = 0
W_o_x = 100
W_c_h = -100
W_c_x = 50
b_f = -100
b_i = 100
b_o = 0
b_c = 0

In [70]:
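# test of functions
# note: both helpers modify their argument in place, so tanh_simplified
# below receives the output of sigmoid_simplified, not the original grid
test = np.arange(-1.1,1.1, 0.1).reshape((11,2))
print("sigmoid simplified: ", sigmoid_simplified(test))
print("tanh simplified: ", tanh_simplified(test))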

sigmoid simplified:  [[0.         0.        ]
 [0.2890505  0.31002552]
 [0.33181223 0.35434369]
 [0.37754067 0.40131234]
 [0.42555748 0.450166  ]
 [0.47502081 0.5       ]
 [0.52497919 0.549834  ]
 [0.57444252 0.59868766]
 [0.62245933 0.64565631]
 [0.66818777 0.68997448]
 [0.7109495  1.        ]]
tanh simplified:  [[0.         0.        ]
 [0.28126066 0.30046031]
 [0.3201482  0.34022211]
 [0.36056978 0.38107129]
 [0.40160196 0.42203545]
 [0.4422471  0.46211716]
 [0.48153381 0.50039579]
 [0.51861441 0.53611508]
 [0.55283804 0.56873856]
 [0.58378654 0.59796561]
 [0.6112719  1.        ]]


In [74]:
# run the LSTM on the given parameters and initial conditions
# TODO: generalize the elementwise multiplication and the sigmoid/tanh
#       functions to work with matrices and vectors

# input data
x_sequence_1 = np.array([0,0,1,1,1,0])
x_sequence_2 = np.array([1,1,0,1,1])
x_sequence = x_sequence_2 

# initialization
f_t = 0
i_t = 0
o_t = 0
c_t = 0 # initialization with c_{t-1} = 0
h_t = 0 # initialization with h_{t-1} = 0
h_t_list = np.empty(len(x_sequence))
for i in range(len(x_sequence)):
    x_t = x_sequence[i]
    f_t = sigmoid_simplified_scalar(W_f_h * h_t + W_f_x * x_t + b_f)
    i_t = sigmoid_simplified_scalar(W_i_h * h_t + W_i_x * x_t + b_i)
    o_t = sigmoid_simplified_scalar(W_o_h * h_t + W_o_x * x_t + b_o)
    c_t = f_t * c_t + i_t * tanh_simplified_scalar(W_c_h * h_t + W_c_x * x_t + b_c)
    h_t = o_t * tanh_simplified_scalar(c_t)
    
    # round h_t to a value in {-1, 0, 1} in every time step;
    # only values in [-0.5, 0.5] need rounding, and they go to 0
    if -0.5 <= h_t <= 0.5:
        h_t = 0
    
    h_t_list[i] = h_t
h_t_list

array([ 1., -1.,  0.,  1., -1.])
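
To try both input sequences without editing the cell, the loop can be wrapped in a function. The following is a self-contained sketch using the same scalar parameters and the same rounding rule as above; the names `run_lstm`, `PARAMS`, `sigmoid_s`, and `tanh_s` are hypothetical.

```python
import numpy as np

def sigmoid_s(a):
    """Simplified sigmoid: 1 for a >= 1, 0 for a <= -1, sigmoid otherwise."""
    if a >= 1:
        return 1.0
    if a <= -1:
        return 0.0
    return 1.0 / (1.0 + np.exp(-a))

def tanh_s(a):
    """Simplified tanh: 1 for a >= 1, -1 for a <= -1, tanh otherwise."""
    if a >= 1:
        return 1.0
    if a <= -1:
        return -1.0
    return float(np.tanh(a))

# same scalar parameters as in the parameter cell of this section
PARAMS = dict(W_f_h=0, W_f_x=0, b_f=-100,
              W_i_h=0, W_i_x=100, b_i=100,
              W_o_h=0, W_o_x=100, b_o=0,
              W_c_h=-100, W_c_x=50, b_c=0)

def run_lstm(x_sequence, p=PARAMS):
    """Return the rounded hidden state h_t for every element of x_sequence."""
    c_t, h_t = 0.0, 0.0  # c_{t-1} = h_{t-1} = 0 at the start
    states = []
    for x_t in x_sequence:
        f_t = sigmoid_s(p["W_f_h"] * h_t + p["W_f_x"] * x_t + p["b_f"])
        i_t = sigmoid_s(p["W_i_h"] * h_t + p["W_i_x"] * x_t + p["b_i"])
        o_t = sigmoid_s(p["W_o_h"] * h_t + p["W_o_x"] * x_t + p["b_o"])
        c_t = f_t * c_t + i_t * tanh_s(p["W_c_h"] * h_t + p["W_c_x"] * x_t + p["b_c"])
        h_t = o_t * tanh_s(c_t)
        if -0.5 <= h_t <= 0.5:  # same rounding rule as in the cell above
            h_t = 0.0
        states.append(h_t)
    return np.array(states)

print(run_lstm([0, 0, 1, 1, 1, 0]))
print(run_lstm([1, 1, 0, 1, 1]))
```

For `[1, 1, 0, 1, 1]` this reproduces the hidden states `[1, -1, 0, 1, -1]` shown above; for `[0, 0, 1, 1, 1, 0]` it yields `[0, 0, 1, -1, 1, 0]`.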