<a href="https://colab.research.google.com/github/MohamedElsayed002/DeepLearning_Study/blob/master/RNN17.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recurrent Neural Networks

A recurrent neural network (RNN) is a type of artificial neural network that takes sequential or time series data as input. It is typically used for ordinal or temporal problems such as language translation, speech recognition, and time series forecasting.

In this lab, we will understand the fundamental building blocks of an RNN. We will then train a simple binary text classifier on top of an existing pre-trained module that embeds sentences.

## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="#Importing-Required-Libraries">Importing Required Libraries</a></li>
            <li><a href="#Defining-Helper-Functions">Defining Helper Functions</a></li>
        </ol>
    </li>
    <li>
        <a href="#RNN-Fundamentals">RNN Fundamentals</a>
        <ol>
            <li><a href="#Vanilla-Recurrent-Neural-Network"> Vanilla Recurrent Neural Network</a></li>
            <li><a href="#Unrolling-in-time-of-a-RNN">Unrolling in time of a RNN</a></li>
            <li><a href="#Training-an-RNN">Training an RNN</a></li>
        </ol>
    </li>
    <li><a href="#Types-of-RNNs">Types of RNNs</a></li>
    <li><a href="#Pre-trained-RNNs">Pre-trained RNNs</a></li>
</ol>

In [None]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import skillsnetwork
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import reuters
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding, Masking, LSTM, GRU, Conv1D, Dropout
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.losses import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
# use the Keras bundled with TensorFlow throughout, instead of mixing it with standalone keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import Tokenizer

from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split



# Helper Functions

In [None]:
# function to compute the accuracy, precision, recall and F1 score of a model's predictions.
def calculate_results(y_true, y_pred):
    model_accuracy = accuracy_score(y_true, y_pred)
    model_precision, model_recall, model_f1,_ = precision_recall_fscore_support(y_true, y_pred,average="weighted")
    model_results = {"accuracy":model_accuracy,
                     "precision":model_precision,
                     "recall" :model_recall,
                     "f1":model_f1}
    return model_results
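
To make the `average="weighted"` option more concrete, here is a minimal, dependency-free sketch of what a support-weighted average amounts to. The function `weighted_scores` and the toy labels are hypothetical and only illustrate the arithmetic; in the lab itself the scikit-learn functions above do the work.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (illustrative only)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# toy labels, made up for this example
scores = weighted_scores([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
print(scores)
```

Note that with this weighting the recall always equals the accuracy, because each class's recall is weighted by exactly the share of samples that belong to it.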

# RNN Fundamentals

RNNs fall into the category of neural networks that maintain some kind of `state`, which lets them process sequential data of arbitrary length. By doing so, they overcome certain limitations of classical neural networks: classical NNs only accept fixed-length vectors as input and produce fixed-length vectors as output, whereas RNNs operate over sequences of vectors. Classical NNs also aren't built to consider the sequential nature of some data, while RNNs work with sequential data such as language, video frames, time series, and so on.

The RNN layer uses a for-loop to iterate over the time steps of a sequence and maintains an internal state that encodes information about all time steps observed so far. The Keras RNN API has built-in `keras.layers.RNN` and `keras.layers.LSTM` layers that make it easy to quickly build RNN models.

### Vanilla Recurrent Neural Network


RNNs use these two simple formulas:

$$ \mathbf s_t = \tanh(U \mathbf x_t + W \mathbf s_{t-1}) $$

$$ \mathbf y_t = V \mathbf s_t $$

The following plot shows the hyperbolic tangent function, `tanh`:

<img src="https://github.com/DataScienceUB/DeepLearningMaster2019/blob/master/images/TanhReal.gif?raw=1" alt="" style="width: 300px;">

#### Terminology:
* $\mathbf s_t$: current network state, or hidden state
* $\mathbf s_{t-1}$: previous hidden state
* $\mathbf x_t$: current input
* $U, V, W$: matrices that are the parameters of the RNN
* $\mathbf y_t$: output at time $t$

These equations say that the current network state, or hidden state, is a function of the previous hidden state and the current input, and that the output at time $t$ is a linear function of the current hidden state.
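
The two formulas can be sketched in a few lines of plain Python. This is a minimal illustration, not the Keras implementation: the helper names `matvec` and `rnn_step`, the matrix sizes, and all numeric values are assumptions chosen for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, s_prev, U, W, V):
    """One recurrent step: s_t = tanh(U x_t + W s_prev), y_t = V s_t."""
    pre_activation = [a + b for a, b in zip(matvec(U, x_t), matvec(W, s_prev))]
    s_t = [math.tanh(p) for p in pre_activation]
    y_t = matvec(V, s_t)
    return s_t, y_t

# Illustrative parameters: 2-d input, 2-d hidden state, 1-d output
U = [[0.5, -0.3], [0.1, 0.8]]
W = [[0.2, 0.0], [0.0, 0.2]]
V = [[1.0, -1.0]]

s0 = [0.0, 0.0]                      # initial hidden state
s1, y1 = rnn_step([1.0, 2.0], s0, U, W, V)
print(s1, y1)
```

Because `tanh` squashes its input, every component of the hidden state stays strictly between -1 and 1, whatever the input values are.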

### Unrolling in time of a RNN

Given an input sequence, we apply the RNN formulas recurrently until all input elements have been processed. The $U,V,W$ parameters are shared across all recurrent steps. This implies that at each time step the output is a function of the current input and all inputs from previous time steps. The network therefore has a form of memory that encodes information about the time steps it has seen so far.

Some important observations:
- The initial values for $U,V,W$, as well as for the hidden state $\mathbf s$, must be provided when training an RNN.
- The hidden state acts as the memory of the network. It can capture information about the previous steps and embeds a representation of the sequence.
- We can look at the network's output at every stage or just the final stage.
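
The observations above can be illustrated with a small unrolling loop in plain Python. This is a sketch under assumed values: `rnn_forward`, the parameter matrices, and the two toy sequences are hypothetical and exist only to show parameter sharing and the memory effect.

```python
import math

def rnn_forward(xs, U, W, V, s0):
    """Unroll a vanilla RNN over xs, a list of input vectors."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    s = s0
    states, outputs = [], []
    for x_t in xs:  # the same U, W, V are reused at every time step
        pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, s))]
        s = [math.tanh(p) for p in pre]
        states.append(s)
        outputs.append(matvec(V, s))
    return states, outputs

U = [[0.5, -0.3], [0.1, 0.8]]
W = [[0.4, 0.1], [-0.2, 0.3]]
V = [[1.0, -1.0]]
s0 = [0.0, 0.0]  # initial hidden state

seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[-1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # differs only in the first step

_, out_a = rnn_forward(seq_a, U, W, V, s0)
_, out_b = rnn_forward(seq_b, U, W, V, s0)

all_outputs = out_a      # one output per time step
last_output = out_a[-1]  # or only the output of the final step
print(last_output, out_b[-1])
```

The two sequences share their last two inputs, yet their final outputs differ: the hidden state carries information about the first input forward through every later step.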

### Training an RNN

An RNN unrolled in time has a layer for each time step, and its weights are shared across time. It is trained using backpropagation through time (BPTT), which involves the following steps:
- The training set is made of several input sequences $\{\mathbf{X}_i \}$ and their corresponding outcomes. Each element of a sequence, $\mathbf{x}_j \in \mathbf{X}_i$, is itself an $n$-dimensional vector.
- We use a loss function to measure how well the network's output fits the expected outcome (the ground truth).
- We apply an optimization method such as stochastic gradient descent or Adam to minimize the loss function.
- After the forward pass, gradients of the loss function are propagated backwards through the unrolled network.
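
The steps above can be sketched for the smallest possible case: a scalar RNN with parameters `u`, `w`, `v` and a squared-error loss on the final output only. Everything here (the function names, the loss choice, the numbers, the learning rate) is an assumption made for illustration; Keras computes these gradients automatically.

```python
import math

def forward(xs, u, w, v, s0=0.0):
    """Scalar RNN: s_t = tanh(u*x_t + w*s_{t-1}); returns all states and the final output."""
    states = [s0]
    for x in xs:
        states.append(math.tanh(u * x + w * states[-1]))
    return states, v * states[-1]

def loss_fn(xs, target, u, w, v):
    _, y = forward(xs, u, w, v)
    return 0.5 * (y - target) ** 2

def bptt(xs, target, u, w, v):
    """Gradients of the loss with respect to u, w, v via backpropagation through time."""
    states, y = forward(xs, u, w, v)
    dy = y - target
    dv = dy * states[-1]
    ds = dy * v                           # gradient flowing into the last hidden state
    du = dw = 0.0
    for t in range(len(xs), 0, -1):       # walk backwards through the unrolled steps
        da = ds * (1.0 - states[t] ** 2)  # derivative of tanh
        du += da * xs[t - 1]              # shared weights accumulate over all steps
        dw += da * states[t - 1]
        ds = da * w                       # pass the gradient to the previous state
    return du, dw, dv

xs, target = [0.5, -1.0, 0.25], 0.3
u, w, v = 0.8, -0.5, 1.2
du, dw, dv = bptt(xs, target, u, w, v)

# sanity check against a central finite difference
eps = 1e-6
num_dw = (loss_fn(xs, target, u, w + eps, v) - loss_fn(xs, target, u, w - eps, v)) / (2 * eps)

# one plain gradient-descent update
lr = 0.1
loss_before = loss_fn(xs, target, u, w, v)
u, w, v = u - lr * du, w - lr * dw, v - lr * dv
loss_after = loss_fn(xs, target, u, w, v)
print(loss_before, loss_after)
```

Because `u` and `w` are shared across time, their gradients are sums of contributions from every time step, which is what distinguishes BPTT from backpropagation in a feed-forward network.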


## Types of RNNs

It is not always necessary to predict the output, $y_t$, at every time step. Different RNN architectures can be used to solve different kinds of problems.

|Type|Input|Output|Example problem
|-|-|-|-
|*many-to-many*|An input sequence|An output sequence|Part-of-speech (POS) tagging
|*many-to-one*|An input sequence|Value of output sequence for last timestep|Text classification: positive tweet or negative?
|*one-to-many*|Single value of input sequence|An output sequence|Image captioning: given an input image, predict a sequence of words

In this section, we will experiment with existing RNNs using the NLP disaster dataset. The dataset contains a `test.csv` and a `train.csv`, each of which has the following information:

* The text of a tweet
* A keyword from that tweet (although this may be blank!)
* The location the tweet was sent from (may also be blank)

Our task is to predict whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.



In [None]:
await skillsnetwork.prepare("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML311-Coursera/labs/Module4/L1/nlp_disaster.zip")


Saved to '.'


Now we will read in the training dataset. Here we use `frac=1` so that all rows in the training dataset are returned in a random order. We also set a random state to ensure reproducibility of results.

In [None]:
train_df = pd.read_csv('train.csv')
# shuffle the dataset: frac=1 returns every row, random_state makes the order reproducible
train_df_shuffled = train_df.sample(frac=1, random_state=42)

In [None]:
train_df_shuffled.head()

| |id|keyword|location|text|target
|-|-|-|-|-|-
|2644|3796|destruction| |So you have a new weapon that can cause un-ima...|1
|2227|3185|deluge| |The f$&amp;@ing things I do for #GISHWHES Just...|0
|5448|7769|police|UK|DT @georgegalloway: RT @Galloway4Mayor: ÛÏThe...|1
|132|191|aftershock| |Aftershock back to school kick off was great. ...|0
|6845|9810|trauma|Montgomery County, MD|in response to trauma Children of Addicts deve...|0


We will use 90% of the labelled dataset for training and 10% of it for testing purposes.

In [None]:
# split the data into 90% training and 10% testing
X_train, X_test, y_train, y_test = train_test_split(train_df_shuffled["text"].to_numpy(),
                                                    train_df_shuffled["target"].to_numpy(),
                                                    test_size = 0.1,
                                                    random_state=42)
X_train.shape, y_train.shape

((6851,), (6851,))

In [None]:
X_train[0:5]

array(['@mogacola @zamtriossu i screamed after hitting tweet',
       'Imagine getting flattened by Kurt Zouma',
       '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
       "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
       'Somehow find you and I collide http://t.co/Ee8RpOahPk'],
      dtype=object)

`TextVectorization` is a preprocessing layer that maps text features to integer sequences. We specify `lower_and_strip_punctuation` as the standardization method, so the text is lowercased and all punctuation is removed. Next, we split on whitespace and pass `None` to `ngrams` so that no n-grams are created.

In [None]:
text_vectorizer = TextVectorization(max_tokens=None,
                                    # remove punctuation and make letters lowercase
                                    standardize="lower_and_strip_punctuation",
                                    # split tokens on whitespace
                                    split="whitespace",
                                    # don't group anything, every token stands alone
                                    ngrams=None,
                                    output_mode="int",
                                    # length of each sentence == length of the largest sentence
                                    output_sequence_length=None
                                    )
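
To see roughly what these settings do to a tweet, here is a dependency-free sketch that mimics the same pipeline: lowercase, strip punctuation, split on whitespace, and map each token to an integer. It only approximates the Keras layer, which learns its vocabulary when `adapt` is called on it; the helper names and the sample sentences are made up for this example. As in Keras, index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.

```python
import string
from collections import Counter

def standardize(text):
    """Lowercase the text and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def build_vocab(texts, max_tokens=None):
    """Map tokens to integers, most frequent first; 0 = padding, 1 = unknown."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    limit = None if max_tokens is None else max_tokens - 2
    vocab = ["", "[UNK]"] + [tok for tok, _ in counts.most_common(limit)]
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, index):
    """Turn a sentence into a list of integer token ids."""
    return [index.get(tok, 1) for tok in standardize(text).split()]

# made-up sample sentences, not taken from the dataset
corpus = ["Forest fire near La Ronge!", "Fire, fire everywhere..."]
index = build_vocab(corpus)

print(vectorize("Fire in the forest?", index))
```

Words that never appeared in the corpus ("in", "the") all collapse to the unknown index 1, which is why the size of the vocabulary matters for downstream models.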


In [None]:
# define hyperparameters

# number of words in the vocabulary
max_vocab_length = 10000
# average tweet length, in words
max_length = 15

Below we define an `Embedding` layer with a vocabulary of 10000 words, a vector space of 128 dimensions in which the words will be embedded, and input documents that have 15 words each.

In [None]:
embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             input_length=max_length)
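
Conceptually, an embedding layer is a trainable lookup table with one row per token id. The sketch below shows that idea in plain Python with tiny stand-in sizes (10 tokens, 4 dimensions, length 5) instead of 10000, 128 and 15. The names `table`, `embed` and `pad_or_truncate` are hypothetical, and the random initial values are only placeholders for weights that would be learned during training.

```python
import random

random.seed(0)
vocab_size, embed_dim, max_length = 10, 4, 5  # tiny stand-ins for 10000, 128, 15

# one row of embed_dim numbers per token id
table = [[random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
         for _ in range(vocab_size)]

def pad_or_truncate(token_ids, length, pad_id=0):
    """Force a sequence of token ids to a fixed length."""
    return (token_ids + [pad_id] * length)[:length]

def embed(token_ids):
    """Look up the vector for every token id."""
    return [table[i] for i in token_ids]

ids = pad_or_truncate([2, 7, 3], max_length)
vectors = embed(ids)
print(ids, len(vectors), len(vectors[0]))
```

A sentence of `max_length` token ids therefore becomes a `max_length` x `embed_dim` matrix, which is the sequence of vectors an RNN layer would consume.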

The `hub.KerasLayer` wraps a SavedModel (or a legacy TF1 Hub format) as a Keras layer. The `universal-sentence-encoder` is an encoder of greater-than-word-length text trained on a variety of data. It can be used for text classification, semantic similarity, clustering, and other natural language tasks.

We can train a simple binary text classifier on top of any TF-Hub module that can embed sentences. The Universal Sentence Encoder was partially trained with custom text classification tasks in mind. These kinds of classifiers can be trained to perform a wide variety of classification tasks, often with a very small amount of labeled examples.

More on this can be found in the TensorFlow Hub [documentation](https://tfhub.dev/google/universal-sentence-encoder/4?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML311Coursera747-2022-01-01).

In [None]:
encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                               input_shape=[],
                               dtype = tf.string,
                               trainable=False,
                               name="pretrained")

The `encoder_layer` takes variable-length English text as input and outputs a 512-dimensional vector.

We will add a dense layer with a single unit to create a simple binary text classifier on top of the pre-trained TF-Hub module. Next, we will compile the model and fit it for 20 epochs.

In [None]:
model = tf.keras.Sequential([
    encoder_layer,
    layers.Dense(1, activation="sigmoid")
], name="model_pretrained")

model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

model.fit(x=X_train,
          y=y_train,
          epochs=20,
          validation_data=(X_test, y_test))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x799a2c8c6bc0>
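
The classification head trained above is small enough to write out by hand. The following sketch shows what a single-unit dense layer with a sigmoid activation computes on one sentence embedding, and how the binary cross-entropy loss scores it. The 4-dimensional feature vector stands in for the 512-dimensional encoder output, and the weights, bias and the 0.5 decision threshold are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid(features, weights, bias):
    """Single-unit dense layer followed by a sigmoid: returns a probability."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# stand-in for a sentence embedding (the real encoder outputs 512 values)
features = [1.0, 0.0, -1.0, 2.0]
weights = [0.5, 0.1, -0.25, 0.3]  # illustrative, not trained values
bias = -0.2

p = dense_sigmoid(features, weights, bias)  # probability of "real disaster"
label = int(p >= 0.5)                       # 1 = disaster, 0 = not
print(p, label, binary_crossentropy(1, p), binary_crossentropy(0, p))
```

The loss is small when the predicted probability agrees with the true label and grows quickly when it does not, which is the signal the optimizer uses to adjust the weights of the dense layer while the encoder stays frozen.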