In [2]:
import numpy as np

# Multi layer perceptron

- Implementation of a multilayer perceptron 
- Consisting of multiple fully connected layers with biases each followed (potentially) by an activation function
- This implementation is based on the Python package numpy. 

The MLP is defined by providing the sizes of the layers (including input and output layers) and the activation functions that are applied after each layer. I used vanilla stochastic gradient descent with mini-batches for weight & bias updates.

You can find implementations of activation functions below. They can be used in the forward pass but also provide the derivative if `deriv=True` is provided. See more details in the comment in a).

### a) forward

A function `forward` that takes an input array that should then be passed through the entire network. All intermediate layer results are stored in `y`. Thereby also store the input (raw input) and the output layer.
The forward pass goes as follows

<div style="background-color:rgba(0, 0, 0, 0.0470588); padding:10px 0;font-family:monospace;">
$y_0 = x$ <br>
for $k=0..$ do<br>
&emsp;    $z_{k} =  W_k y_{k} + b_k$ <br>
&emsp;    $y_{k+1} = g_{k}(z_{k})$
</div>


Where $W$ is the weight matrix, $b$ is the bias vector and $g_{k}$ the activation function at layer $k$. 

The variable $z_k$ is not stored but is here for illustration. In general one potentially would have to store it too but we do not need it for the activation functions sigmoid and relu. They have the special property that we can write those  as $g'(x) = G(g(x))$ for some function $G$.

### b) back propagation

The backpropagation is computed as:


<div style="background-color:rgba(0, 0, 0, 0.0470588); padding:10px 0;font-family:monospace;">
$h \leftarrow \frac{\partial L}{\partial y}$<br>
for $k = l, l-1,... 1$ do:<br>
&emsp; $h \leftarrow \frac{\partial L}{\partial z_k} = h \odot g' _{k-1}(z_{k-1}) = h \odot G_{k-1}(y_k)$<br>
&emsp; $\frac{\partial L}{\partial W_{k-1}} = y_{k-1} \otimes h$       &emsp;# use +=<br>
&emsp; $\frac{\partial L}{\partial b_{k-1}} = h$     &emsp;&emsp; # use += <br>
&emsp; $h \leftarrow \frac{\partial L}{\partial y_k} = h \cdot W_{k-1}$
</div>


Where $\odot$ is the element wise product, $\otimes$ is the outer product and $\cdot$ the dot product. It assumes that a forward step has previously happened and has set the values of `results` accordingly. To keep it simple, this function is intended for one instance only but we want to employ it in a mini batch scenario.

### c) tanh activation (1)

The tanh activation function is implemented in a way such that they can also be used in your MLP. See e.g. https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions


### d) update (4)

A function `update` updates the weights and biases for all layers based on a mini-batch (given as lists of input arrays `xs` and desired targets `ts`), loss function `loss_func` and and learning rate $\mu$. It therefore first zeros previous gradients (implement the `zero_grad` function), then does a forward and backwards pass for each training instance and finally updates the weights and biases using the learning rate. 

This is a simplified process. In pratice, all of the instances in the mini-batch would be passed forward and backwards simultaneously, we do not do it here.

### e) Evaluation functions (3)

The functions `calc_accuracy` and `calc_loss` calculate the classification accuracy and the *average* loss over the provided examples `xs` and `ts`.


### f) Preparing real data (2)
A function `top_n_words(sentences, n)` returns a list of the most frequently used words for a corpus. You can assume that the function is tested such that there is no tie.

A function `to_hot_encoding(sentences, top_words)` encodes the document as a numpy array of type float. It has the same length as `top_words` and indicates the presence or abscence of that particular word by 1 = present, 0 = absent. This is a bag-of-words model.

The sentences are provided as a list of list of tokens. 

### g) fiddle with hyper parameters (2)

Hyperparameters are found
```
my_seed = 1
my_sizes = (128, ...,2)
my_activations = (..., sigmoid)
my_epochs = 1
my_chunks = 1
```
such that you achieve >90% accuracy on the training set and >75% accuracy on the test set using the provided training loop. Available activation functions were limited to `relu`, `identity`, `tanh` and `sigmoid`. The training time was limited to a minute.

In [3]:
from typing import List

In [4]:
class MLP:
    def __init__(self, sizes, activations):
        assert len(activations) == (len(sizes)-1)
        self.activation_functions = tuple(activations)
        
        # init weights, biases and temporary results
        self.weights = tuple(((np.random.random_sample((sizes[i], sizes[i+1]))-0.5) for i in range(len(sizes)-1)))
        self.biases = tuple((np.random.random_sample(sizes[i+1])-0.5) for i in range(len(sizes)-1))
        self.y = tuple(np.empty(sizes[i]) for i in range(len(sizes)))
        
        # init gradients
        self.gradients = tuple((np.zeros(arr.shape) for arr in self.weights))
        self.biases_gradients = tuple((np.zeros(arr.shape) for arr in self.biases))


    def forward(self, input_arr):
        # input_array is a numpy array,
        # this function returns nothing
        
        # initialize
        k = 0 
        self.y[k][:] = input_arr
        layer = input_arr
        
        for activation in self.activation_functions:
            layer = activation(self.weights[k].T.dot(layer) + self.biases[k])
            k += 1
            self.y[k][:] = layer
        
        # pass


    def back_prop(self, t, loss_func):
        # t : target label == desired form of the final label
        # loss_func: a loss function see squared loss below
        #pass

        k = len(self.activation_functions) 
        hah = loss_func(self.y[k], t, True) 
        
        # calculate derivatives for activation functions
        for activation in reversed(self.activation_functions):
            hah = np.multiply(hah, activation(self.y[k], deriv=True))
            self.gradients[k-1][:] += np.outer(self.y[k-1], hah)
            self.biases_gradients[k-1][:] += hah
            hah = hah.dot(self.weights[k-1].T)
            k -= 1

    def zero_grad(self):
        #pass
        #self.gradients = tuple((np.zeros(w.shape) for w in self.weights))
        #self.biases_gradients = tuple((np.zeros(w.shape) for w in self.biases))
        
        for i in range(len(self.gradients)):
            self.gradients[i][:] = np.zeros(self.weights[i].shape)
        
        for i in range(len(self.biases_gradients)):
            self.biases_gradients[i][:] = np.zeros(self.biases[i].shape)
        '''
        self.gradients[:][:] = tuple((np.zeros(w.shape) for w in self.weights))
        '''
        
    def update(self, xs, ts, loss_func, mu):
        # xs: list of numpy arrays (inputs)
        # ts: list of numpy arrays (desired output)
        # loss_func: function, a loss function see squared loss below
        # mu: float, learning rate
        
        assert len(ts) == len(xs)
        
        self.zero_grad()

        for idx in range(len(xs)):
            self.forward(xs[idx])
            self.back_prop(ts[idx], loss_func)
            self.weights -= np.multiply(mu, self.gradients)
            self.biases -= np.multiply(mu, self.biases_gradients)
        
        
    def calc_accuracy(self, xs, ts):
        # xs: list of numpy arrays (inputs)
        # ts: list of numpy arrays (desired output)
        same = 0
        diff = 0
        
        for idx in range(len(xs)):
            self.forward(xs[idx])
            if np.argmax(self.y[-1]) == np.argmax(ts[idx]):
                same += 1
            else:
                diff +=1
            
        acc = round(same / (same+diff), 3)
        return acc
    
        #return 0.0
    
    def calc_loss(self, xs, ts, loss_func):
        # xs: list of numpy arrays (inputs)
        # ts: list of numpy arrays (desired output)
        # loss_func: function, a loss function see squared loss below
        loss = 0
        for idx in range(len(xs)):
            self.forward(xs[idx])
            loss += loss_func(self.y[-1], ts[idx])
        return loss/len(xs)

In [5]:
# activation functions
# if deriv=True they act like the function capital G as explained above
#   that is they assume that the input is not x but f(x)
#   take inspiration from the other activation functions

def identity(x, deriv=False):
    if deriv == True:
        return 1
    return x


def relu(x, deriv=False):
    if deriv == True:
        out = np.zeros(x.shape)
        out[x>0]=x[x>0]
        return out
    return np.maximum(x, 0)


def sigmoid(x, deriv=False):
    if deriv == True:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

### c) Examples tanh

In [6]:
# Take inspiration from the other activation functions implemented above

def tanh(x, deriv=False):
    if deriv == True:
        return 1 - np.power(x, 2)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
    #return 1

In [7]:
print(tanh(np.array([1, 2, 3])))
print(tanh(np.array([1, 2, 3]), deriv=True))
print(tanh(np.array([2, 5, 11]), deriv=True))

#[0.76159416 0.96402758 0.99505475]
#[ 0 -3 -8]
#[  -3  -24 -120]

[0.76159416 0.96402758 0.99505475]
[ 0 -3 -8]
[  -3  -24 -120]


In [8]:
# loss function
def squared_loss(x, t, deriv=False):
    if deriv:
        return x - t
    return 0.5*np.sum((np.square(x-t)))

### a) Examples forward

In [9]:
in1 = np.array([0,0,1,0,0], dtype=float)
np.random.seed(1)

activs = [relu, relu, sigmoid]
myNN = MLP((5,4,3,2), activs)
y0, y1, y2, y3 = myNN.y

myNN.forward(in1)
print(myNN.y[0])
print(myNN.y[1])
print(myNN.y[2])
print(myNN.y[-1])

# check that we have done everything in place
assert myNN.y[0] is y0
assert myNN.y[1] is y1
assert myNN.y[2] is y2
assert myNN.y[-1] is y3

#[0. 0. 1. 0. 0.]
#[0.         0.28896105 0.4080556  0.43338515]
#[0.         0.03587933 0.        ]
#[0.48869641 0.5991624 ]

[0. 0. 1. 0. 0.]
[0.         0.28896105 0.4080556  0.43338515]
[0.         0.03587933 0.        ]
[0.48869641 0.5991624 ]


In [10]:
in2 = np.array([1,0,1,0,0], dtype=float)
myNN.forward(in2)

# check that we have done everything in place
assert myNN.y[0] is y0
assert myNN.y[1] is y1
assert myNN.y[2] is y2
assert myNN.y[-1] is y3

print(myNN.y[0])
print(myNN.y[1])
print(myNN.y[2])
print(myNN.y[-1])
#[1. 0. 1. 0. 0.]
#[0.         0.50928554 0.         0.23571773]
#[0.         0.38629211 0.        ]
#[0.50550331 0.58354196]



[1. 0. 1. 0. 0.]
[0.         0.50928554 0.         0.23571773]
[0.         0.38629211 0.        ]
[0.50550331 0.58354196]


### b) Examples backward

In [11]:
myNN.zero_grad()
myNN.forward(in1)
w0, w1, w2 = myNN.gradients
b0, b1, b2 = myNN.biases_gradients
myNN.back_prop(np.array([1,0]), squared_loss)

# untouched
print(myNN.weights)
print(myNN.biases)

# changed
print(myNN.gradients)
print(myNN.biases_gradients)

# check that we have done everything in place
assert w0 is myNN.gradients[0]
assert w1 is myNN.gradients[1]
assert w2 is myNN.gradients[2]
assert b0 is myNN.biases_gradients[0]
assert b1 is myNN.biases_gradients[1]
assert b2 is myNN.biases_gradients[2]

# (array([[-0.082978  ,  0.22032449, -0.49988563, -0.19766743],
#       [-0.35324411, -0.40766141, -0.31373979, -0.15443927],
#       [-0.10323253,  0.03881673, -0.08080549,  0.1852195 ],
#       [-0.29554775,  0.37811744, -0.47261241,  0.17046751],
#       [-0.0826952 ,  0.05868983, -0.35961306, -0.30189851]]),
#        array([[ 0.30074457,  0.46826158, -0.18657582],
#       [ 0.19232262,  0.37638915,  0.39460666],
#       [-0.41495579, -0.46094522, -0.33016958],
#       [ 0.3781425 , -0.40165317, -0.07889237]]),
#        array([[ 0.45788953,  0.03316528],
#       [ 0.19187711, -0.18448437],
#       [ 0.18650093,  0.33462567]]))
#(array([-0.48171172,  0.25014431,  0.48886109,  0.24816565]), array([-0.21955601,  0.28927933, -0.39677399]), array([-0.05210647,  0.4085955 ]))
#(array([[ 0.        ,  0.        ,  0.        ,  0.        ],
#       [ 0.        ,  0.        ,  0.        ,  0.        ],
#       [ 0.        , -0.00019926,  0.00034459,  0.00031891],
#       [ 0.        ,  0.        ,  0.        ,  0.        ],
#       [ 0.        ,  0.        ,  0.        ,  0.        ]]), array([[ 0.        ,  0.        ,  0.        ],
#       [ 0.        , -0.00052939,  0.        ],
#       [ 0.        , -0.00074758,  0.        ],
#       [ 0.        , -0.00079398,  0.        ]]), array([[ 0.        ,  0.        ],
#       [-0.00458396,  0.005163  ],
#       [ 0.        ,  0.        ]]))
#(array([ 0.        , -0.00019926,  0.00034459,  0.00031891]), array([ 0.        , -0.00183205,  0.        ]), array([-0.12776057,  0.14389893]))


(array([[-0.082978  ,  0.22032449, -0.49988563, -0.19766743],
       [-0.35324411, -0.40766141, -0.31373979, -0.15443927],
       [-0.10323253,  0.03881673, -0.08080549,  0.1852195 ],
       [-0.29554775,  0.37811744, -0.47261241,  0.17046751],
       [-0.0826952 ,  0.05868983, -0.35961306, -0.30189851]]), array([[ 0.30074457,  0.46826158, -0.18657582],
       [ 0.19232262,  0.37638915,  0.39460666],
       [-0.41495579, -0.46094522, -0.33016958],
       [ 0.3781425 , -0.40165317, -0.07889237]]), array([[ 0.45788953,  0.03316528],
       [ 0.19187711, -0.18448437],
       [ 0.18650093,  0.33462567]]))
(array([-0.48171172,  0.25014431,  0.48886109,  0.24816565]), array([-0.21955601,  0.28927933, -0.39677399]), array([-0.05210647,  0.4085955 ]))
(array([[ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        , -0.00019926,  0.00034459,  0.00031891],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
 

In [12]:
myNN.zero_grad()
print(myNN.gradients)
print(myNN.biases_gradients)
print('---0---')
# check that we have done everything in place
print(w0)
#print(np.shape(w0))
print('------')
#print(np.shape(myNN.gradients[0]))
print(myNN.gradients[0])
print('---1---')
# check that we have done everything in place
print(w1)
#print(np.shape(w1))
print('------')
#print(np.shape(myNN.gradients[1]))
print(myNN.gradients[1])
print('--2----')
# check that we have done everything in place
print(w2)
#print(np.shape(w2))
print('------')
#print(np.shape(myNN.gradients[2]))
print(myNN.gradients[2])
assert w0 is myNN.gradients[0]
assert w1 is myNN.gradients[1]
assert w2 is myNN.gradients[2]
assert b0 is myNN.biases_gradients[0]
assert b1 is myNN.biases_gradients[1]
assert b2 is myNN.biases_gradients[2]

(array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]]), array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]]), array([[0., 0.],
       [0., 0.],
       [0., 0.]]))
(array([0., 0., 0., 0.]), array([0., 0., 0.]), array([0., 0.]))
---0---
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
------
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
---1---
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
------
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
--2----
[[0. 0.]
 [0. 0.]
 [0. 0.]]
------
[[0. 0.]
 [0. 0.]
 [0. 0.]]



### b) Examples backward with two training instances

In [13]:
myNN.zero_grad()

# accumulates gradients over two examples
myNN.forward(in1)
myNN.back_prop(np.array([1,0]), squared_loss)
myNN.forward(in2)
myNN.back_prop(np.array([1,1]), squared_loss)


# untouched
print(myNN.weights)
print(myNN.biases)

# changed
print(myNN.gradients)
print(myNN.biases_gradients)

#(array([[-0.082978  ,  0.22032449, -0.49988563, -0.19766743],
#       [-0.35324411, -0.40766141, -0.31373979, -0.15443927],
#       [-0.10323253,  0.03881673, -0.08080549,  0.1852195 ],
#       [-0.29554775,  0.37811744, -0.47261241,  0.17046751],
#       [-0.0826952 ,  0.05868983, -0.35961306, -0.30189851]]), array([[ 0.30074457,  0.46826158, -0.18657582],
#       [ 0.19232262,  0.37638915,  0.39460666],
#       [-0.41495579, -0.46094522, -0.33016958],
#       [ 0.3781425 , -0.40165317, -0.07889237]]), array([[ 0.45788953,  0.03316528],
#       [ 0.19187711, -0.18448437],
#       [ 0.18650093,  0.33462567]]))
#(array([-0.48171172,  0.25014431,  0.48886109,  0.24816565]), array([-0.21955601,  0.28927933, -0.39677399]), array([-0.05210647,  0.4085955 ]))

# changed
#(array([[ 0.        , -0.00037368,  0.        ,  0.00018456],
#       [ 0.        ,  0.        ,  0.        ,  0.        ],
#       [ 0.        , -0.00057294,  0.00034459,  0.00050347],
#       [ 0.        ,  0.        ,  0.        ,  0.        ],
#       [ 0.        ,  0.        ,  0.        ,  0.        ]]), array([[ 0.        ,  0.        ,  0.        ],
#       [ 0.        , -0.0015222 ,  0.        ],
#       [ 0.        , -0.00074758,  0.        ],
#       [ 0.        , -0.0012535 ,  0.        ]]), array([[ 0.        ,  0.        ],
#       [-0.05233322, -0.03393283],
#       [ 0.        ,  0.        ]]))
#(array([ 0.        , -0.00057294,  0.00034459,  0.00050347]),
# array([ 0.        , -0.00378147,  0.        ]),
# array([-0.25136976,  0.04269099]))


(array([[-0.082978  ,  0.22032449, -0.49988563, -0.19766743],
       [-0.35324411, -0.40766141, -0.31373979, -0.15443927],
       [-0.10323253,  0.03881673, -0.08080549,  0.1852195 ],
       [-0.29554775,  0.37811744, -0.47261241,  0.17046751],
       [-0.0826952 ,  0.05868983, -0.35961306, -0.30189851]]), array([[ 0.30074457,  0.46826158, -0.18657582],
       [ 0.19232262,  0.37638915,  0.39460666],
       [-0.41495579, -0.46094522, -0.33016958],
       [ 0.3781425 , -0.40165317, -0.07889237]]), array([[ 0.45788953,  0.03316528],
       [ 0.19187711, -0.18448437],
       [ 0.18650093,  0.33462567]]))
(array([-0.48171172,  0.25014431,  0.48886109,  0.24816565]), array([-0.21955601,  0.28927933, -0.39677399]), array([-0.05210647,  0.4085955 ]))
(array([[ 0.        , -0.00037368,  0.        ,  0.00018456],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        , -0.00057294,  0.00034459,  0.00050347],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
 

### d) Examples for update

In [14]:
# a function that generates some dummy training dataset
def generate_training(n):
    xs=[]
    ts=[]
    for _ in range(n):
        x = np.random.random_sample(10)>0.5
        t = (x[1] and x[2]) or x[3]
        xs.append(x)
        ts.append(np.array([t,~t]))
        
    return xs, ts
    return np.array(xs), np.array(ts)

In [15]:
np.random.seed(1)
xs, ts = generate_training(1000)

In [16]:
np.random.seed(1)
myNN = MLP((10, 8, 4, 2), activs)
print("#", myNN.calc_loss(xs,ts, squared_loss), myNN.calc_accuracy(xs, ts))

for x_chunk, t_chunk in zip(np.array_split(xs,100), np.array_split(ts,100)):
    myNN.update(x_chunk, t_chunk, squared_loss, 0.1)
    
    # evaluate NN on entire dataset
    print("#", myNN.calc_loss(xs, ts, squared_loss), myNN.calc_accuracy(xs, ts))

# 0.24257539789489851 0.634
# 0.23680018014797793 0.634
# 0.23486961983067542 0.634
# 0.23763274357305297 0.634
# 0.23294888559644564 0.634
# 0.23122098354943119 0.634
# 0.2304775784731769 0.634


  self.weights -= np.multiply(mu, self.gradients)
  self.biases -= np.multiply(mu, self.biases_gradients)


# 0.22330318362069906 0.634
# 0.21893212674055548 0.634
# 0.22047076752731185 0.634
# 0.20693636742514407 0.634
# 0.2552210296963756 0.634
# 0.21531657934819837 0.634
# 0.20623799899411183 0.634
# 0.19881888295769318 0.634
# 0.1762144479903395 0.645
# 0.22747002449625175 0.634
# 0.17239514867314046 0.637
# 0.19650407945536824 0.634
# 0.14672626964664534 0.701
# 0.2563586858259369 0.634
# 0.2577118518965085 0.481
# 0.13791970821936927 0.788
# 0.10604787657737749 0.898
# 0.1347146952911298 0.776
# 0.09514207936165966 0.878
# 0.1155671086863483 0.838
# 0.3188538244925079 0.639
# 0.18326460835463557 0.718
# 0.09064728376910568 0.9
# 0.058834700162359595 0.961
# 0.2696931580311026 0.681
# 0.28060698402327394 0.513
# 0.1841940999426835 0.712
# 0.07886446244029571 0.916
# 0.07334228771389784 0.907
# 0.07513961354583581 0.918
# 0.06602938077034148 0.923
# 0.10503067478055103 0.88
# 0.08853239512915334 0.885
# 0.07333191690686433 0.907
# 0.06291351946098912 0.931
# 0.24959051696936257 0.643
# 0

In [17]:
# output from previous cell
# 0.24257539789489851 0.634
# 0.2411544143560101 0.634
# 0.23984965535059738 0.634
# 0.23882087470347274 0.634
# 0.23843002829715368 0.634
# 0.2382914663918927 0.634
# 0.23764527725780307 0.634
# 0.23666737755900943 0.634
# 0.23620884986400037 0.634
# 0.23650340311293885 0.634
# 0.23616300372820928 0.634
# 0.23502874505528465 0.634
# 0.23503090772220395 0.634
# 0.23429557276477497 0.634
# 0.23407573863293182 0.634
# 0.2339776485620543 0.634
# 0.23348187532872453 0.634
# 0.23290833656298032 0.634
# 0.23273709472388385 0.634
# 0.2325444855821718 0.634
# 0.23267973317482651 0.634
# 0.23200642490995707 0.634
# 0.23170660058816217 0.634
# 0.2314284286110461 0.634
# 0.2310843786009857 0.634
# 0.2308711195328345 0.634
# 0.23047087788541018 0.634
# 0.23078563724882506 0.634
# 0.23010159343696798 0.634
# 0.22980399458119968 0.634
# 0.2297798677979091 0.634
# 0.2293841765404658 0.634
# 0.2291655512063339 0.634
# 0.22870691228278373 0.634
# 0.22817010393724937 0.634
# 0.22771610716266258 0.634
# 0.22735622344120002 0.634
# 0.2268242701172851 0.634
# 0.22650100063083706 0.634
# 0.22550104232382934 0.634
# 0.22509370708150808 0.634
# 0.22467802344347212 0.634
# 0.2242107007448516 0.634
# 0.2233116482328667 0.634
# 0.22265497659346642 0.634
# 0.22178494678736085 0.634
# 0.2213379424960512 0.634
# 0.21990383266003793 0.634
# 0.21838496173099078 0.634
# 0.21753192292293697 0.634
# 0.21669299576563952 0.634
# 0.21663406404375596 0.634
# 0.2140600877981478 0.634
# 0.21325662957467928 0.634
# 0.21282023805393604 0.634
# 0.21274278318993375 0.634
# 0.21016631292648913 0.634
# 0.20932144343716885 0.634
# 0.21282663190076195 0.634
# 0.21085248695856348 0.634
# 0.2091929895648754 0.634
# 0.2040259930808181 0.634
# 0.20830483920311055 0.634
# 0.20110187393197654 0.634
# 0.19921054775576258 0.634
# 0.19614748720352942 0.634
# 0.19579048879561475 0.634
# 0.19344528532212424 0.634
# 0.20045251332451106 0.634
# 0.18825747694914757 0.634
# 0.18576924889914842 0.634
# 0.18411287775658522 0.634
# 0.18302615584253373 0.634
# 0.18454763429228072 0.634
# 0.17626642470559317 0.635
# 0.17469065098247072 0.679
# 0.1719378298327115 0.661
# 0.17605269372506277 0.637
# 0.19701745527854836 0.634
# 0.1608919682268122 0.716
# 0.16342926706599922 0.651
# 0.19137490242816632 0.938
# 0.16993445388782424 0.662
# 0.17335348387678517 0.65
# 0.15162184456043243 0.682
# 0.14235256482453107 0.807
# 0.1512034857104882 0.69
# 0.15666428550872563 0.681
# 0.2567495441627176 0.634
# 0.15486582586192096 0.95
# 0.19636555320933272 0.652
# 0.18016546272626016 0.84
# 0.128967559491965 0.742
# 0.13573770765700863 0.862
# 0.09399810441538305 0.921
# 0.09851463017361554 0.903
# 0.11583907110986205 0.943
# 0.32636879914274497 0.634
# 0.2195528529634276 0.681
# 0.2232071483693024 0.676
# 0.08403512410215158 0.976

In [18]:
import pickle
X=None
with open('news_sports_train_X.p', 'rb') as f:
    X=pickle.load(f)
    
X_test=None
with open('news_sports_test_X.p', 'rb') as f:
    X_test=pickle.load(f)

In [19]:
y=None
with open('news_sports_train_y.p', 'rb') as f:
    y=pickle.load(f)
    
y_test=None
with open('news_sports_test_y.p', 'rb') as f:
    y_test=pickle.load(f)

In [20]:
# prepare targets
def get_ts(labels):
    ts = []
    for label in labels:
        l = bool(label)
        ts.append(np.array([not l, l], dtype=bool))
    return ts

In [21]:
ts = get_ts(y)
ts_test = get_ts(y_test)

### f) Implement top_n_words

In [22]:
from collections import Counter
from itertools import chain

def top_n_words(sentences : List[List[str]], n:int) -> List[str]:
    return [x[0] for x in Counter(list(chain(*sentences))).most_common(n)]

In [23]:
top_n_words(X,10)
# [',', '--', 'the', '>', '.', ':', ')', '(', 'to', 'a']

[',', '--', 'the', '>', '.', ':', ')', '(', 'to', 'a']

In [24]:
top_n_words(X_test,20)
# [',',
# 'the',
# '>',
# '.',
# '--',
# ':',
# ')',
# 'to',
# '(',
# 'a',
# 'in',
# 'i',
# 'of',
# 'and',
# '@',
# 'is',
# '0',
# 'that',
# '!',
# '?']

[',',
 'the',
 '>',
 '.',
 '--',
 ':',
 ')',
 'to',
 '(',
 'a',
 'in',
 'i',
 'of',
 'and',
 '@',
 'is',
 '0',
 'that',
 '!',
 '?']

### f) Implement to_hot_encoding

In [25]:
def to_hot_encoding(sentences : List[List[str]], top_words : List[str]):
    r = np.zeros((len(sentences), len(top_words)))
    for i, sentence in enumerate(sentences):
        s = set(sentence)
        r[i] = np.array([x in s for x in top_words]).astype(float)
    
    return r

In [26]:
to_hot_encoding(X[:2], top_n_words(X,10))
#[array([1., 1., 1., 1., 1., 1., 1., 1., 1., 0.]),
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])]

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [27]:
# Get one hot encoding for the entire corpus
top_words = top_n_words(X, 128)
arrs = to_hot_encoding(X, top_words)

In [28]:
arrs_test = to_hot_encoding(X_test, top_words)

### g) Set your parameters here

In [29]:
show_output=True
my_seed = 1
my_sizes = (128,100,80,40,2)
my_activations = (relu, tanh, tanh, sigmoid)
my_epochs = 25
my_chunks = 200
'''
4
# train 0.0070419240887775155 0.994
# test  0.19830166383479556 0.763
'''

'\n4\n# train 0.0070419240887775155 0.994\n# test  0.19830166383479556 0.763\n'

In [30]:
np.random.seed(my_seed)
myNN = MLP(my_sizes, my_activations)
n_chunks = my_chunks
n_epochs = my_epochs
for i in range(n_epochs):
    for x_chunk, t_chunk in zip(np.array_split(arrs,n_chunks), np.array_split(ts,n_chunks)):
        myNN.update(x_chunk, t_chunk, squared_loss, 0.01)
        # evaluate NN on entire dataset
    if show_output:
        print(i)
        print("# train", myNN.calc_loss(arrs,ts, squared_loss), myNN.calc_accuracy(arrs, ts))
        print("# test ", myNN.calc_loss(arrs_test,ts_test, squared_loss), myNN.calc_accuracy(arrs_test, ts_test))

  self.weights -= np.multiply(mu, self.gradients)
  self.biases -= np.multiply(mu, self.biases_gradients)


0
# train 0.2075044493435954 0.707
# test  0.2317226905414242 0.639
1
# train 0.16270284141888722 0.778
# test  0.20813140550729978 0.695
2
# train 0.13111274852408664 0.834
# test  0.18138178856984397 0.731
3
# train 0.11681423906934035 0.85
# test  0.17225020122105342 0.765
4
# train 0.09765720420192221 0.88
# test  0.1663707397525231 0.751
5
# train 0.09012170811409718 0.878
# test  0.1645646221292025 0.769
6
# train 0.07344157271096431 0.91
# test  0.1637376684213452 0.765
7
# train 0.09900187057633103 0.867
# test  0.18587248719818816 0.73
8
# train 0.09421574313192527 0.866
# test  0.18914477584956801 0.722
9
# train 0.05777019420078386 0.93
# test  0.18380299479589274 0.755
10
# train 0.053698305309607695 0.932
# test  0.1741964383206386 0.766
11
# train 0.03682026431808623 0.962
# test  0.18419359307920113 0.763
12
# train 0.03988836328748935 0.954
# test  0.1865682714108159 0.753
13
# train 0.04601021694961058 0.941
# test  0.19325684818307776 0.758
14
# train 0.02558140426203