<a href="https://colab.research.google.com/github/zerotodeeplearning/ztdl-masterclasses/blob/master/solutions_do_not_open/Sentiment_Classification_with_Recurrent_Neural_Networks_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learn with us: www.zerotodeeplearning.com

Copyright © 2021: Zero to Deep Learning ® Catalit LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sentiment Classification with Recurrent Neural Networks

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import gzip
import os

from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

Data loading and preparation are the same as in the [Word Embeddings class](https://github.com/zerotodeeplearning/ztdl-masterclasses#word-embeddings).

In [None]:
url = "https://raw.githubusercontent.com/zerotodeeplearning/ztdl-masterclasses/master/data/"

In [None]:
pos_path = tf.keras.utils.get_file(
    'rotten_tomatoes_positive_reviews.txt',
    url + 'rotten_tomatoes_positive_reviews.txt.gz',
    extract=True)
neg_path = tf.keras.utils.get_file(
    'rotten_tomatoes_negative_reviews.txt',
    url + 'rotten_tomatoes_negative_reviews.txt.gz',
    extract=True)

# The downloaded files are read with gzip, one review per line,
# and each line is decoded from bytes to text.
with gzip.open(pos_path) as fin:
  pos_rev = [r.decode('utf-8') for r in fin.readlines()]

with gzip.open(neg_path) as fin:
  neg_rev = [r.decode('utf-8') for r in fin.readlines()]

docs = np.array(pos_rev + neg_rev)
y = np.array([1]*len(pos_rev) + [0]*len(neg_rev))

docs_train, docs_test, y_train, y_test = train_test_split(docs, y, test_size=0.15, random_state=0)

In [None]:
max_features = 20000

In [None]:
tokenizer = Tokenizer(
    num_words=max_features,
    filters='!"#$%&()*+,-./:;<=>?@[\\]^_`\'{|}~\t\n',
    lower=True,
    split=" ",
    char_level=False,
    oov_token=None,
    document_count=0,
)

tokenizer.fit_on_texts(docs_train)
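
To make the role of `num_words` concrete, here is a minimal pure-Python sketch of the idea behind the tokenizer: words are ranked by frequency, indexed starting from 1, and only indices below `num_words` are kept when text is converted to sequences. The helpers `build_word_index` and `to_sequence` are hypothetical names, and the sketch is a simplification (it lowercases and splits on whitespace only, with no punctuation filtering), not the Keras implementation.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order.
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: i + 1 for i, (word, _) in enumerate(ranked)}

def to_sequence(text, word_index, num_words):
    """Map words to indices, dropping unknown words and indices >= num_words."""
    seq = []
    for word in text.lower().split():
        idx = word_index.get(word)
        if idx is not None and idx < num_words:
            seq.append(idx)
    return seq

# Toy corpus, made up for illustration only.
toy_docs = ["a great great movie", "a dull movie"]
word_index = build_word_index(toy_docs)

# "unseen" is out of vocabulary and "movie" has index 3, so both are dropped.
toy_seq = to_sequence("a great unseen movie", word_index, num_words=3)
```

Because index 0 is reserved for padding, a limit of `num_words` keeps at most `num_words - 1` distinct words, and with `oov_token=None` unknown words are silently skipped.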

In [None]:
seq_train = tokenizer.texts_to_sequences(docs_train)
seq_test = tokenizer.texts_to_sequences(docs_test)

In [None]:
maxlen = 58

X_train = pad_sequences(seq_train, maxlen=maxlen)
X_test = pad_sequences(seq_test, maxlen=maxlen)
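
With its default arguments, `pad_sequences` pads short sequences with zeros on the left and truncates long ones from the left, so the last `maxlen` tokens are kept. The following pure-Python sketch mimics that default behavior on toy data; `pad_pre` is a hypothetical helper, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic the pad_sequences defaults: padding='pre', truncating='pre'.

    Assumes maxlen > 0.
    """
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded

# One sequence shorter than maxlen, one longer (toy values).
toy = pad_pre([[5, 3], [7, 1, 4, 9, 2]], maxlen=4)
```

Here `toy` contains two rows of length 4: the short sequence is left-padded with zeros and the long one loses its first token. Padding on the left keeps the real tokens at the end of each row, closest to the final recurrent state that feeds the dense layers.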

### Exercise 1

Let's build a model that leverages recurrent layers to classify sentiment.

- Define a new `Sequential` model that uses `LSTM` or `GRU` layers after the `Embedding` layer
- Start with the simplest model possible and gradually increase the complexity
- Train the model and compare its performance with that of the models developed in the [Word Embeddings class](https://github.com/zerotodeeplearning/ztdl-masterclasses#word-embeddings).

Your code will look like:

```python
model = Sequential([
  Embedding(# YOUR CODE HERE
  # YOUR CODE HERE
])
```

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Dropout, LSTM, GRU

In [None]:
embedding_dim = 16

model = Sequential([
  Embedding(max_features,
            embedding_dim,
            input_length=maxlen),
  LSTM(32),
  Dense(24, activation='relu'),
  Dense(1, activation='sigmoid')
])

model.summary()
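
The parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the standard Keras layouts: an embedding matrix of shape `(vocabulary, embedding_dim)`, four LSTM gates that each have input weights, recurrent weights and a bias, and a weight matrix plus bias for each dense layer. The helper function names are hypothetical.

```python
# Hyperparameters copied from the notebook so the block runs on its own.
max_features = 20000
embedding_dim = 16
lstm_units = 32

def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary index.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

counts = {
    "embedding": embedding_params(max_features, embedding_dim),
    "lstm": lstm_params(embedding_dim, lstm_units),
    "dense_24": dense_params(lstm_units, 24),
    "dense_1": dense_params(24, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>10}: {n:,}")
print(f"{'total':>10}: {total:,}")
```

Under these assumptions almost all of the parameters sit in the embedding matrix, which is worth keeping in mind when increasing model complexity in the exercise.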

In [None]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

h = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=4,
    validation_split=0.1)
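
`validation_split=0.1` holds out the last 10% of the training arrays for validation. Keras takes this slice before any shuffling, so the validation set is always the tail of `X_train`. A small sketch of the index arithmetic follows, using a hypothetical sample count rather than the real dataset size; `split_tail` is an illustrative helper, not a Keras function.

```python
import math

def split_tail(n_samples, validation_split):
    """Return (train_indices, val_indices) for a hold-out taken from the tail."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return range(0, split_at), range(split_at, n_samples)

# n_samples=1000 is a made-up figure for illustration.
train_idx, val_idx = split_tail(n_samples=1000, validation_split=0.1)
print(len(train_idx), len(val_idx), val_idx[0])
```

Taking the tail is safe here because `train_test_split` already shuffled the documents. With data sorted by label, a tail slice would contain only one class.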

In [None]:
pd.DataFrame(h.history).plot();