### General Instructions

Do not change the file name, method name or any variable name in your submission file. 

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and student number below.

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Also, make sure your notebook runs without errors before submitting: there should be no `AssertionError`, `NotImplementedError`, or "test(s) failed" feedback.

`NotImplementedError`: there is a code cell/task you have not yet implemented.

`AssertionError`: your implementation is failing some tests.

Note that your assignment will be checked with additional test cases after submission, so make sure you follow the instructions given.

DO NOT EDIT ANY CELL/NOTEBOOK METADATA.




In [None]:
NAME = ""
STUDENT_NUMBER = ""

---

# <center>  COMS4054A/COMS7066A </center>
# <center> Natural Language Processing/Technology (NLP) 2022 </center>
## <center> Lab Session 8 </center>
### <center> 13th October, 2022 </center>

### Sentiment classification with Neural Networks



### Objectives

- Implement a 2-layer neural network.
- Use units with non-linear activation functions (tanh and sigmoid)
- Compute the cross entropy loss
- Implement forward propagation
- Implement backward propagation


### Dataset
We are using the NLTK Twitter dataset.

Credit: This lab was adapted from a course by Andrew Ng.

<a id='top'></a>

### Task Outline (Total 30 marks)

Note: The marks (breakdown) are attached to the respective assertions.

[Task 1](#task1) 
    
[Task 2](#task2) 
    
[Task 3](#task3)     

[Task 4](#task4) 

[Task 5](#task5) 

[Task 6](#task6) 

[Task 7](#task7) 

[Task 8](#task8) 

[Task 9](#task9) 



In [None]:
# Package imports
# Run this cell
import nltk                                # Python library for NLP
from nltk.corpus import twitter_samples    # sample Twitter dataset from NLTK


import copy

import re 
import string
import numpy as np

from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer


from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import  precision_recall_fscore_support, f1_score, precision_score, recall_score, accuracy_score


In [None]:
# Run this cell: downloads the sample Twitter dataset and the English stopword list
nltk.download('twitter_samples')
nltk.download('stopwords')

In [None]:
# Run this cell 

# Get the set of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

#### Preprocessing 
Read through, understand and run the following code cells to preprocess the tweets.

In [None]:
# Run this code to clean up tweets
def clean_tweets(tweet):
    """
    tweet (string)
    return clean_tweet (string)"""
    
    
    # remove  "RT"
    clean_tweet = re.sub(r'^RT[\s]+', '', tweet) #^ starts with RT, followed by one or more spaces

    # remove hyperlinks
    clean_tweet = re.sub(r'https?://[^\s\n\r]+', '', clean_tweet)

    # remove hashtags
    # only removing the hash # sign from the word
    clean_tweet = re.sub(r'#', '', clean_tweet)
    
    return clean_tweet
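To see what `clean_tweets` does, the sketch below applies its three regular expressions, one at a time, to a made-up tweet (hypothetical input, not taken from the dataset). Note that removing the URL leaves the spaces around it in place; the tokenizer in the next cell discards that extra whitespace.

```python
import re

tweet = "RT @user: Great day! https://t.co/abc #happy"  # hypothetical tweet

step1 = re.sub(r'^RT[\s]+', '', tweet)              # drop the leading "RT "
step2 = re.sub(r'https?://[^\s\n\r]+', '', step1)   # drop the URL (surrounding spaces stay)
step3 = re.sub(r'#', '', step2)                     # keep the hashtag word, drop "#"

print(step1)  # @user: Great day! https://t.co/abc #happy
print(step2)  # @user: Great day!  #happy
print(step3)  # @user: Great day!  happy
```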
    
    
   

In [None]:
# Run this cell 

def preprocess_pipeline(tweet):
    """ tweet(string)
    return string: the lowercased, tokenized, stopword-free and stemmed tweet, joined by spaces
    """
    stopwords_english = stopwords.words('english')

    tokenizer = TweetTokenizer(preserve_case=False)
    stemmer = PorterStemmer()
    tweet_tokens = tokenizer.tokenize(tweet)

    # Drop stopwords, stem the remaining tokens and join them back into a single string
    tweets_clean = " ".join([stemmer.stem(word) for word in tweet_tokens if (word not in stopwords_english)])
    return tweets_clean



In [None]:
# Run this cell
all_positive_tweets_clean = []
all_negative_tweets_clean = []

for tw in all_positive_tweets:
    all_positive_tweets_clean.append(preprocess_pipeline(clean_tweets(tw)))
    
for tv in all_negative_tweets:
    all_negative_tweets_clean.append(preprocess_pipeline(clean_tweets(tv)))




In [None]:
# RUN THIS CELL
train_pos = all_positive_tweets_clean[:1000]
train_neg = all_negative_tweets_clean[:1000]

test_pos = all_positive_tweets_clean[1000:]
test_neg = all_negative_tweets_clean[1000:]


train_x = train_pos + train_neg
test_x = test_pos + test_neg

# putting 1 as label for the positive tweets, and 0 as label for negative tweets.

train_y = np.concatenate((np.ones((len(train_pos),1),dtype=int),np.zeros((len(train_neg),1),dtype=int))).T
test_y = np.concatenate((np.ones((len(test_pos),1),dtype=int), np.zeros((len(test_neg),1), dtype=int))).T

#### Feature Extraction
Run the next two code cells to extract the bag of words features.

In [None]:
# Run this cell
count_vec = CountVectorizer()
train_features = np.array(count_vec.fit_transform(train_x).todense().T)

In [None]:
# Run this cell
test_features = np.array(count_vec.transform(test_x).todense().T)


##### `CountVectorizer` returns one row per tweet, so the matrices are transposed to store one training example per column.
The number of features is the number of nodes in the input layer.
Let:
- n_x -- the size of the input layer (the number of features per example)
- n_h -- the size of the hidden layer
- n_y -- the size of the output layer
- m -- the number of training examples
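A toy illustration of these shapes, using two hypothetical preprocessed tweets: `CountVectorizer` returns one row per document, and the transpose gives the (n_x, m) layout used in this lab. The sketch uses `.toarray()` instead of `np.array(... .todense())`; both produce a dense NumPy array.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical preprocessed tweets, so m = 2
docs = ["good day", "bad day"]

vec = CountVectorizer()
counts = vec.fit_transform(docs)   # sparse matrix, shape (m, n_x) = (2, 3): one ROW per tweet
X = counts.toarray().T             # dense array, shape (n_x, m) = (3, 2): one COLUMN per tweet

# vec.vocabulary_ maps each word to its row in X: bad -> 0, day -> 1, good -> 2
print(X)
# [[0 1]
#  [1 1]
#  [1 0]]
```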

In [None]:
# Run this cell 

shape_train_x = train_features.shape
shape_train_y = train_y.shape
m = train_features.shape[1]


print ('The shape of X is: ' + str(shape_train_x))
print ('The shape of Y is: ' + str(shape_train_y))
print ('Using %d training examples!' % (m))

## Neural Network model

**General overview**

The main steps to build a Neural Network: (Andrew Ng; deeplearning.ai)
1. Define the neural network structure (number of input units, number of hidden units, etc.).
1. Initialize the model's parameters.
1. Loop:
    - Implement forward propagation
    - Compute the loss
    - Implement backward propagation to get the gradients
    - Update the parameters (gradient descent)
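A minimal sketch of this loop, assuming a simpler model than the one in this lab: logistic regression (a single sigmoid unit, no hidden layer) trained on four hypothetical toy examples. It only illustrates the forward, loss, backward, update pattern; the two-layer network you build in Tasks 1 to 7 follows the same steps with more parameters.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Hypothetical toy data: n_x = 2 features, m = 4 examples stored as columns
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([[0, 0, 1, 1]])
n_x, m = X.shape

# Initialize parameters: small random weights, zero bias
rng = np.random.default_rng(0)
W = rng.standard_normal((1, n_x)) * 0.01
b = np.zeros((1, 1))
learning_rate = 0.5

losses = []
for _ in range(500):
    # Forward propagation
    A = sigmoid(W @ X + b)                                   # shape (1, m)
    # Cross-entropy loss, averaged over the m examples
    loss = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    losses.append(loss)
    # Backward propagation for a sigmoid output with cross-entropy loss
    dZ = A - Y                                               # shape (1, m)
    dW = dZ @ X.T / m                                        # shape (1, n_x)
    db = np.sum(dZ, axis=1, keepdims=True) / m               # shape (1, 1)
    # Gradient descent update
    W = W - learning_rate * dW
    b = b - learning_rate * db

print(f"loss at start: {losses[0]:.4f}, loss at end: {losses[-1]:.4f}")
```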



**The model**:


<img src="nn.png" style="width:600px;height:300px;">


**Our task:**


- Build a 2-layer network with one hidden layer of 4 hidden units (neurons).
- We'll use `tanh` activation function at the hidden layer and the `sigmoid` activation function at the output layer.


**Some notations to help you**


A superscript in round brackets denotes a training example.

A superscript in square brackets denotes a layer number.

$z$ - weighted sum of the inputs to a layer

$a$ - output (activation) of a layer

$x^{(i)}$ - the *i*th training example

$z^{[1] (i)}$ - the weighted sum of features for the *i*th example, in the first layer

$W^{[1]}$ - the weight matrix for layer 1

$a^{[1] (i)}$ - the output of layer 1 for the *i*th training example

$\hat{y}^{(i)}$ - the final output (from the output layer) for the *i*th example


**Mathematically**:

For one example $x^{(i)}$:
$$z^{[1] (i)} =  W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ 
$$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$
$$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$
$$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{5}$$

Given the predictions on all the examples, you can also compute the cost $J$ as follows: 
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{6}$$
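A worked single-example pass through equations (1) to (5), using small hand-picked weights. The values are hypothetical, and the sketch uses $n_h = 2$ rather than the 4 hidden units required for the task so the arithmetic stays readable.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Hypothetical toy sizes: n_x = 2 input features, n_h = 2 hidden units, n_y = 1 output
x = np.array([[1.0], [2.0]])      # one example, shape (n_x, 1)
W1 = np.array([[0.5, -0.5],
               [0.1, 0.2]])       # shape (n_h, n_x)
b1 = np.zeros((2, 1))             # shape (n_h, 1)
W2 = np.array([[1.0, 1.0]])       # shape (n_y, n_h)
b2 = np.zeros((1, 1))             # shape (n_y, 1)

z1 = W1 @ x + b1                  # (1): [[0.5*1 - 0.5*2], [0.1*1 + 0.2*2]] = [[-0.5], [0.5]]
a1 = np.tanh(z1)                  # (2): approximately [[-0.462], [0.462]]
z2 = W2 @ a1 + b2                 # (3): tanh is odd, so the two terms cancel and z2 is 0
a2 = sigmoid(z2)                  # (4): sigmoid(0) = 0.5
y_pred = int(a2[0, 0] > 0.5)      # (5): 0.5 is not > 0.5, so the prediction is 0
```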


<a id='task1'></a>

[Back to top](#top) 
### Task 1: Define the neural network structure ####


Define three variables:

- n_x: the size of the input layer
- n_h: the size of the hidden layer (**set this to 4**)
- n_y: the size of the output layer

**Instructions**:

- Use the shapes of X (`train_features`) and Y (`train_y`) to find n_x and n_y.
- Hard-code the hidden layer size to 4.
- Implement this in the function `layer_sizes`.


In [None]:
def layer_sizes(train_features, train_y):
    """
    Arguments:
    train_features -- input dataset of shape (input size/features, number of examples)
    train_y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return (n_x, n_h, n_y)

In [None]:
(n_x, n_h, n_y) = layer_sizes(train_features, train_y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))

In [None]:
# Run this cell to test your code
# 2 marks

assert n_x == 5057
print("Test passed")

In [None]:
# Run this cell to test your code
# 2 marks

assert n_h == 4
print("Test passed")

In [None]:
# Run this cell to test your code
# 2 marks

assert n_y == 1
print("Test passed")

<a id='task2'></a>

[Back to top](#top) 
### Task 2:  Initialize the model's parameters ####


- Implement the function `initialize_parameters()`.

- You will initialize the weight matrices with random values.
    - Use: `np.random.randn(a,b) * 0.01` to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros. 
    - Use: `np.zeros((a,b))` to initialize a matrix of shape (a,b) with zeros.

In [None]:
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """    
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

#### The sigmoid function

In [None]:
# Run this cell
# You should have implemented this function in Lab 5
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Returns:
    sz -- sigmoid(z), with the same shape as z
    """
    sz = 1/(1+np.exp(-z))
    return sz

<a id='task3'></a>

[Back to top](#top) 
### Task 3 - forward_propagation

Implement `forward_propagation()` using the following equations:

$$Z^{[1]} =  W^{[1]} X + b^{[1]}\tag{1}$$ 
$$A^{[1]} = \tanh(Z^{[1]})\tag{2}$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}\tag{3}$$
$$\hat{Y} = A^{[2]} = \sigma(Z^{[2]})\tag{4}$$


**Instructions**:


- Use the function `sigmoid()`, defined in the previous code cell.
- Use the function `np.tanh()`, which is part of NumPy.
- Implement using these steps:
    1. Retrieve each parameter from the dictionary "parameters" (the output of `initialize_parameters()`) using `parameters[".."]`.
    2. Implement forward propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of predictions for all the examples in the training set).
- The values needed in backpropagation are stored in "cache", which is passed as an input to the backpropagation function.

In [None]:
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    # (≈ 4 lines of code)
    # Z1 = ...
    # A1 = ...
    # Z2 = ...
    # A2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [None]:
# Run this cell to test your code
# 3 marks

_test_params = initialize_parameters(*layer_sizes(train_features, train_y))
a2_ret, cache = forward_propagation(train_features, _test_params) # checks that it does not crash
# checks the shape.
N = train_features.shape[1]
assert cache['A2'].shape == (1, N)
assert cache['A1'].shape == (n_h, N)
assert cache['Z2'].shape == (1, N)
assert cache['Z1'].shape == (n_h, N)
print("Test passed")

In [None]:
# Run this cell to test your code
# 3 marks

# Checks that the forward pass is correct.
temp_features = np.ones((train_features.shape[0], 1))
new_test_params = {}
for k, v in _test_params.items():
    new_test_params[k] = np.ones_like(v)
_, ans_simple = forward_propagation(temp_features, new_test_params)
ans_correct =  {'Z1': np.array([[5058.],
         [5058.],
         [5058.],
         [5058.]]),
  'A1': np.array([[1.],
         [1.],
         [1.],
         [1.]]),
  'Z2': np.array([[5.]]),
  'A2': np.array([[0.99330715]])}
for k, v in ans_simple.items():
    assert np.allclose(v, ans_correct[k], atol=1e-3), f"Item {k} is incorrect"
print("Test passed")

<a id='task4'></a>
[Back to top](#top) 
### Task 4 - Compute the Cost

Now that you've computed $A^{[2]}$ (in the Python variable "`A2`"), which contains $a^{[2](i)}$ for all examples, you can compute the cost function as follows:

$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large{)} \small\tag{13}$$



Implement `compute_cost()` to compute the value of the cost $J$.


**Notes**: 

- You can use either `np.multiply()` followed by `np.sum()`, or `np.dot()` directly.
- If you use `np.multiply` followed by `np.sum`, the result is a scalar (`float`), whereas if you use `np.dot`, the result is a 2D NumPy array.
- You can use `np.squeeze()` to remove redundant dimensions (a single value stored in a 2D array is reduced to a zero-dimensional array).
- You can also cast the array to a `float` using `float()`.
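A small illustration of these notes on hypothetical values ($m = 2$, not taken from the dataset): both approaches give the same cost, but the `np.dot` version is a $(1, 1)$ array until it is squeezed.

```python
import numpy as np

# Hypothetical predictions and labels, each of shape (1, m) with m = 2
A2 = np.array([[0.8, 0.4]])
Y = np.array([[1, 0]])
m = Y.shape[1]

# Option 1: element-wise products, then a sum -> a NumPy scalar
logprobs = np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2))
cost_sum = -np.sum(logprobs) / m

# Option 2: matrix products -> a (1, 1) array
cost_dot = -(np.dot(Y, np.log(A2).T) + np.dot(1 - Y, np.log(1 - A2).T)) / m
print(cost_dot.shape)  # (1, 1)

# np.squeeze drops the redundant dimensions; float() gives a plain Python float
cost_dot = float(np.squeeze(cost_dot))
print(cost_sum, cost_dot)  # both equal -(log 0.8 + log 0.6) / 2
```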

In [None]:
def compute_cost(A2, Y):
    """
    Computes the cross-entropy cost given in equation (13)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost given equation (13)
    
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    # (≈ 2 lines of code)
    # logprobs = ...
    # cost = ...
    
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    cost = float(np.squeeze(cost))  # makes sure cost is the dimension we expect. 
    
    return cost

<a id='task5'></a>

[Back to top](#top) 
### Task 5 - Implement Backpropagation

Using the cache computed during forward propagation, you can now implement backward propagation.


Implement the function `backward_propagation()`.

**Instructions**:
Figure 1 summarizes the six backpropagation equations. Use the vectorized equations on the right for your implementation.

*(Source: deeplearning.ai; Andrew Ng)*

<img src="grad_summary.png" style="width:600px;height:300px;">
<caption><center><font color='purple'><b>Figure 1</b>: Backpropagation. Use the six equations on the right.</font></center></caption>

<!--
$\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J} }{ \partial W_2 } = \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } a^{[1] (i) T} $

$\frac{\partial \mathcal{J} }{ \partial b_2 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)}}}$

$\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} } =  W_2^T \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } * ( 1 - a^{[1] (i) 2}) $

$\frac{\partial \mathcal{J} }{ \partial W_1 } = \frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} }  X^T $

$\frac{\partial \mathcal{J} _i }{ \partial b_1 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)}}}$

- Note that $*$ denotes elementwise multiplication.
- The notation you will use is common in deep learning coding:
    - dW1 = $\frac{\partial \mathcal{J} }{ \partial W_1 }$
    - db1 = $\frac{\partial \mathcal{J} }{ \partial b_1 }$
    - dW2 = $\frac{\partial \mathcal{J} }{ \partial W_2 }$
    - db2 = $\frac{\partial \mathcal{J} }{ \partial b_2 }$
    
!-->

- Tips:
    - To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(.)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1-a^2$. So you can compute 
    $g^{[1]'}(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
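The tip above can be checked numerically. The sketch below (an illustration on arbitrary sample points, not part of the graded code) compares `1 - np.power(a, 2)` with a central-difference estimate of the derivative of `np.tanh`.

```python
import numpy as np

# Sample points at which to compare the two derivative estimates
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
a = np.tanh(z)

# Derivative from the tip: g'(z) = 1 - a**2, where a = tanh(z)
analytic = 1 - np.power(a, 2)

# Central-difference estimate: (tanh(z + h) - tanh(z - h)) / (2h)
h = 1e-5
numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # close to zero
```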

In [None]:
def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # First, retrieve W1 and W2 from the dictionary "parameters".
    #(≈ 2 lines of code)
    # W1 = ...
    # W2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
        
    # Retrieve also A1 and A2 from dictionary "cache".
    #(≈ 2 lines of code)
    # A1 = ...
    # A2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    #(≈ 6 lines of code, corresponding to the 6 equations in Figure 1 above)
    # dZ2 = ...
    # dW2 = ...
    # db2 = ...
    # dZ1 = ...
    # dW1 = ...
    # db1 = ...
    
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [None]:
# Gradient check: compare your analytic gradients with a forward-difference estimate,
# (cost(w + h) - cost(w)) / h, for every weight. Note: this takes a while to run.

_test_params = initialize_parameters(*layer_sizes(train_features, train_y))

small_x = train_features[:, :25]
small_y = train_y[:, :25]
    
_, cache = forward_propagation(small_x, _test_params)
ans = backward_propagation(_test_params, cache, small_x, small_y)
    

for k, v in ans.items():
    print("Checking", k)
    h = 0.000001
    weight_key = k[1:] # remove the 'd' prefix
    actual_weight = _test_params[weight_key]
    _new_params = copy.deepcopy(_test_params)
    for i in range(actual_weight.shape[0]):
        for j in range(actual_weight.shape[1]):
            
            _new_params[weight_key][i, j] += h
            top = compute_cost(forward_propagation(small_x, _new_params)[0], small_y) - compute_cost(forward_propagation(small_x, _test_params)[0], small_y)
            assert np.allclose(top / h, v[i, j])
            _new_params[weight_key][i, j] -= h
    print(k, "Passed")
print("Test Passed")

<a id='task6'></a>
[Back to top](#top) 

### Task 6 -  Update Parameters 


Implement the gradient descent update rule, using (dW1, db1, dW2, db2) to update (W1, b1, W2, b2).

**General gradient descent rule**: $\theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.

<img src="sgd.gif" style="width:400px;height:400px;"> <img src="sgd_bad.gif" style="width:400px;height:400px;">
<caption><center><font color='purple'><b>Figure 2</b>: The gradient descent algorithm with a good learning rate (converging) and a bad learning rate (diverging). Images courtesy of Adam Harley.</font></center></caption>
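To see the behaviour in Figure 2 on a small scale, the sketch below applies the update rule to the one-parameter cost $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$ (a hypothetical example, not the lab's cost). Each step maps $\theta - 3$ to $(1 - 2\alpha)(\theta - 3)$: with $\alpha = 0.1$ the distance to the minimum shrinks by a factor of 0.8, while with $\alpha = 1.1$ it is multiplied by $-1.2$, so the iterates overshoot and diverge.

```python
def gradient_descent(theta, learning_rate, num_steps):
    """Minimize J(theta) = (theta - 3)**2 with plain gradient descent."""
    for _ in range(num_steps):
        grad = 2 * (theta - 3)                # dJ/dtheta
        theta = theta - learning_rate * grad  # theta = theta - alpha * dJ/dtheta
    return theta


good = gradient_descent(theta=0.0, learning_rate=0.1, num_steps=100)  # converges towards 3
bad = gradient_descent(theta=0.0, learning_rate=1.1, num_steps=100)   # overshoots and diverges
print(good, bad)
```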

**Hint**

- Use `copy.deepcopy(...)` when copying lists or dictionaries that are passed as parameters to functions. It avoids input parameters being modified within the function. In some scenarios, this could be inefficient, but it is required for grading purposes.
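The hint matters whenever a parameter is updated in place (for example with `-=`). The sketch below uses two hypothetical helper functions, `step_without_copy` and `step_with_copy`, to show that an in-place update on an uncopied array also changes the caller's dictionary, while updating a `copy.deepcopy` of the array leaves it untouched.

```python
import copy

import numpy as np


def step_without_copy(parameters):
    # Hypothetical helper: W is the SAME array object as the caller's
    W = parameters["W"]
    W -= 0.1  # in-place update, so the caller's array changes too
    return {"W": W}


def step_with_copy(parameters):
    # Hypothetical helper: W is an independent copy
    W = copy.deepcopy(parameters["W"])
    W -= 0.1  # only the copy changes
    return {"W": W}


params_a = {"W": np.zeros((2, 2))}
updated_a = step_with_copy(params_a)     # params_a["W"] is still all zeros

params_b = {"W": np.zeros((2, 2))}
updated_b = step_without_copy(params_b)  # params_b["W"] is now all -0.1
```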


In [None]:
def update_parameters(parameters, grads, learning_rate = 1.2):
    """
    Updates parameters using the gradient descent update rule given above
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve a copy of each parameter from the dictionary "parameters". Use copy.deepcopy(...) for W1 and W2
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Retrieve each gradient from the dictionary "grads"
    #(≈ 4 lines of code)
    # dW1 = ...
    # db1 = ...
    # dW2 = ...
    # db2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Update rule for each parameter
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [None]:
# Checks simple update
_test_params = {
    "W1": np.zeros((4, 1)),
    "b1": np.zeros((4, 5057)),
    "W2": np.zeros((1, 4)),
    "b2": np.zeros((1, 1))
}

_test_grads = {
    "dW1": np.ones((4, 1)),
    "db1": np.ones((4, 5057)),
    "dW2": np.ones((1, 4)),
    "db2": np.ones((1, 1))
}
ans = update_parameters(_test_params, _test_grads, learning_rate=0.1)
for k, v in ans.items():
    assert np.allclose(v, -0.1)
print("Test Passed")

<a id='task7'></a>
[Back to top](#top) 
### Task 7 - Integration

Build your neural network model in `nn_model()`, integrating the functions you implemented above.

**Instructions**: The neural network model has to use the previous functions in the right order.

In [None]:
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 100 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters
    #(≈ 1 line of code)
    # parameters = ...
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Loop (gradient descent)
    loss_start = None
    loss_end = None
    for i in range(0, num_iterations):
         
        #(≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        # A2, cache = ...
        
        # Cost function. Inputs: "A2, Y". Outputs: "cost".
        # cost = ...
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        # grads = ...
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        # parameters = ...
        
        # YOUR CODE HERE
        raise NotImplementedError()
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
        loss_start = cost if loss_start is None else loss_start
        loss_end = cost
    assert loss_start >= loss_end, f"Test failed! The loss at the start {loss_start} must be larger than the loss at the end of training {loss_end}"
    print("Test Passed")
    return parameters

<a id='task8'></a>
[Back to top](#top) 
### Task 8 - Test the Model (Predict)


Predict with your model by building `predict()`.
Use forward propagation to predict results.

$$y_{prediction} = \begin{cases} 1 & \text{if } a^{[2]} > 0.5 \\ 0 & \text{otherwise } (a^{[2]} \le 0.5) \end{cases}$$
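A short illustration of this thresholding rule on hypothetical activations. Comparing with `>` and converting the boolean array with `.astype(int)` is one way to do it; note that an activation of exactly 0.5 is classified as 0.

```python
import numpy as np

# Hypothetical output-layer activations for five examples, shape (1, m)
A2 = np.array([[0.91, 0.50, 0.49, 0.73, 0.02]])

# Strictly greater than 0.5 -> class 1; 0.5 itself falls into the "otherwise" branch
predictions = (A2 > 0.5).astype(int)
print(predictions)  # [[1 0 0 1 0]]
```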
    

In [None]:

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model 
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    #(≈ 2 lines of code)
    # A2, cache = ...
    # predictions = ...
    
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return predictions

In [None]:
#Run this cell
parameters =  nn_model(train_features, train_y, n_h, num_iterations = 1000, print_cost=True)

predictions = predict(parameters, test_features)



<a id='task9'></a>

[Back to top](#top) 
### Task 9 - Evaluation
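Implement the function `evaluate()` to calculate and return the accuracy, F1 score, precision and recall, in that order. You can use the metric functions imported at the top of the notebook (`accuracy_score`, `f1_score`, `precision_score`, `recall_score`).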
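As a worked check of the metric definitions, the sketch below computes the four metrics by hand from confusion-matrix counts for the small example used in the first test cell after `evaluate()` (the arrays `_test_preds` and `_test_correct`). It reproduces the expected values (0.625, 0.727, 0.667, 0.8) in the order accuracy, F1, precision, recall; this is only an arithmetic illustration, not a required implementation.

```python
import numpy as np

preds = np.array([1, 0, 1, 0, 1, 1, 1, 1])   # same values as _test_preds
truth = np.array([0, 1, 1, 0, 1, 1, 1, 0])   # same values as _test_correct

tp = np.sum((preds == 1) & (truth == 1))  # true positives:  4
fp = np.sum((preds == 1) & (truth == 0))  # false positives: 2
fn = np.sum((preds == 0) & (truth == 1))  # false negatives: 1
tn = np.sum((preds == 0) & (truth == 0))  # true negatives:  1

accuracy = (tp + tn) / len(truth)                   # 5/8   = 0.625
precision = tp / (tp + fp)                          # 4/6  ~= 0.667
recall = tp / (tp + fn)                             # 4/5   = 0.8
f1 = 2 * precision * recall / (precision + recall)  # 16/22 ~= 0.727
```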

In [None]:
def evaluate(predictions,test_y):
    predictions = np.reshape(predictions, newshape=(-1))
    test_y = np.reshape(test_y, newshape=(-1))
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return accuracy, f1, precision, recall

In [None]:
_test_preds   = np.array([1, 0, 1, 0, 1, 1, 1, 1])
_test_correct = np.array([0, 1, 1, 0, 1, 1, 1, 0])
assert np.allclose(np.round(evaluate(_test_preds, _test_correct), 3), (0.625, 0.727, 0.667, 0.8))
print("Test Passed")

In [None]:
accuracy = evaluate(predictions,test_y)[0]
assert accuracy >= 0.60
print("Test Passed")

In [None]:
f1 = evaluate(predictions,test_y)[1]
assert f1 >= 0.60
print("Test Passed")

In [None]:
precision = evaluate(predictions,test_y)[2]
assert precision >= 0.60
print("Test Passed")

In [None]:
recall = evaluate(predictions,test_y)[3]
assert recall >= 0.60
print("Test Passed")

##### Note that training the neural network on the entire dataset (10,000 tweets) may take some time, depending on your computing power.
Try training on a larger portion of the dataset with more iterations; this should give better results.
You can run your code on Google Colab if you are short on compute.