<a href="https://colab.research.google.com/github/KCL-Health-NLP/nlp_examples/blob/master/ann/cnn_text_classifier_with_keras-answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A simple CNN for text classification - answers

***This is the "answers" version of this notebook. It differs from the "question" version in that the network's parameters and layers have been chosen to give reasonable performance on IMDb.***

Based on an [example from the Keras team](https://keras.io/examples/nlp/text_classification_from_scratch/).

The original has been changed as follows:

* More text cell explanations and code comments
* Inspection of intermediate text pre-processing steps
* Network visualisation
* Plotting of training loss and accuracy
* Separated out parameters into a single cell
* Some renaming of variables
* Small differences in use of imports

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at

[http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.



## Introduction

This notebook uses a popular neural network API, [Keras](https://keras.io/), to build a simple CNN classifier, and runs it over movie reviews from [IMDb - the Internet Movie Database](https://www.imdb.com/).

The notebook walks through building a CNN text classifier from scratch. There is not a lot of coding for you to do. However, once you have run through the notebook, you should alter the network, trying different parameters and layers, and seeing what effect this has on performance. See the suggestions at the end.

The reviews are also available as a pre-prepared dataset that can be downloaded through Keras, and in their original format from [here](http://ai.stanford.edu/~amaas/data/sentiment/).

The dataset is constructed from very polarised reviews, and has been used in text classification evaluations for several years.

Here's an example positive review:

> I went to an advance screening of this movie thinking I was about to embark on 120 minutes of cheezy lines, mindless plot, and the kind of nauseous acting that made "The Postman" one of the most malignant displays of cinematic blundering of our time. But I was shocked. Shocked to find a film starring Costner that appealed to the soul of the audience. Shocked that Ashton Kutcher could act in such a serious role. Shocked that a film starring both actually engaged and captured my own emotions. Not since 'Robin Hood' have I seen this Costner: full of depth and complex emotion. Kutcher seems to have tweaked the serious acting he played with in "Butterfly Effect". These two actors came into this film with a serious, focused attitude that shone through in what I thought was one of the best films I've seen this year. No, its not an Oscar worthy movie. It's not an epic, or a profound social commentary film. Rather, its a story about a simple topic, illuminated in a way that brings that audience to a higher level of empathy than thought possible. That's what I think good film-making is and I for one am throughly impressed by this work. Bravo!

And here's a negative review example:

> It hurt to watch this movie, it really did... I wanted to like it, even going in. Shot obviously for very little cash, I looked past and told myself to appreciate the inspiration. Unfortunately, although I did appreciate the film on that level, the acting and editing was terrible, and the last 25-30 minutes were severe thumb-twiddling territory. A 95 minute film should not drag. The ratings for this one are good so far, but I fear that the friends and family might have had a say in that one. What was with those transitions? Dear Mr. Editor, did you just purchase your first copy of Adobe Premiere and make it your main goal to use all the goofy transitions that come with that silly program? Anyway... some better actors, a little more passion, and some more appealing editing and this makes a decent movie.




## TensorFlow

This practical uses [TensorFlow](https://www.tensorflow.org/), a very widely used machine learning library which is popular for deep learning models.

TensorFlow uses a class of object called [*Tensors*](https://www.tensorflow.org/guide/tensor), which are based on the mathematical concept of a tensor.

For our purposes, a tensor is a multidimensional array. A tensor with zero dimensions (a rank-0 tensor) is a scalar, a rank-1 tensor is a vector (a one-dimensional array), and a rank-2 tensor is a 2D matrix. Tensors can also have higher ranks.
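TensorFlow tensors and NumPy arrays convert easily into each other (later in this notebook we call `.numpy()` on tensors), and NumPy uses the same notion of rank, exposed as the `ndim` attribute. The short sketch below uses NumPy rather than TensorFlow so that it runs anywhere; the example shapes are illustrative choices, with the rank-3 one chosen to match `batch_size`, `sequence_length` and `embedding_dim` from the Parameters cell below.

```python
import numpy as np

# NumPy arrays use the same idea of rank as tensors: the number of axes (ndim)
scalar = np.array(3.0)                  # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])      # rank 1: a one-dimensional array
matrix = np.array([[1, 2], [3, 4]])     # rank 2: a 2D matrix
batch = np.zeros((32, 500, 128))        # rank 3: e.g. 32 texts x 500 tokens x 128 embedding dims

for name, arr in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("batch", batch)]:
    print(f"{name:7} rank={arr.ndim} shape={arr.shape}")
```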

In TensorFlow, the Tensor class holds data as a tensor, and has various methods to manipulate it. Importantly:
* Tensors, together with another class of objects called *Operations*, are used to build computational graphs over large matrices. These graphs can represent neural networks, and TensorFlow can compute gradients through them, which is what backpropagation requires.
* Tensors are designed to be processed efficiently on GPUs.

Generally we don't use the TensorFlow library directly, instead using higher-level APIs that hide some of its complexity.

## Using with GPUs

TensorFlow code generally runs much faster on GPUs. To select a GPU runtime in Colab:

* Select the *Runtime* menu
* Select the *Change runtime type* submenu
* In the dialog that appears, under *Hardware accelerator* select *GPU*
* Your existing runtime will disconnect, and you will be allocated and connected to a new GPU runtime.

We will also improve execution time through the way in which we fetch and cache data, in one of the steps below.

The original code comments say: "This example demonstrates the use of Convolution1D for text classification. It gets to 0.89 test accuracy after 2 epochs. Speed:
* 90s/epoch on Intel i5 2.4Ghz CPU.
* 10s/epoch on Tesla K40 GPU."

These figures might be out of date by the time you read them!

## Packages

First, the imports. You will need Keras, the default high-level API for TensorFlow, which is itself one of the most widely used neural network libraries.

**Note if running locally:** for the visualisation to work, you will need pydot and graphviz installed, e.g.

```
sudo apt-get install graphviz
pip3 install pydot
```

In [None]:
# Basics
import tensorflow as tf
import numpy as np

# Keras package to handle directories of text
from tensorflow.keras.utils import text_dataset_from_directory

# Model layers - we need these!
from tensorflow.keras import layers

# We use these next two when pre-processing strings
import string
import re

# For displaying models
from tensorflow.keras.utils import plot_model
from tensorflow.keras.utils import model_to_dot
import matplotlib.pyplot as plt
from IPython.display import SVG

## Parameters

Now let's set up some parameters, such as number of features, embedding dimensions, batch size, epochs etc.

In [None]:
# How many documents in a batch?
batch_size = 32

# Maximum or padded length (in tokens) of a text sequence
sequence_length = 500

# Maximum number of features in our text vector space.
# i.e. how many different tokens in our vocabulary
max_features = 20000

# Dimensions in text embedding
embedding_dim = 128

# Fraction of the embedding layer's outputs randomly
# set to zero at each training step
embedding_dropout = 0.2

# Number of output filters in the convolution
filters = 128

# Length of the convolution window
kernel_size = 7

# Fraction of the dense layer's outputs randomly
# set to zero at each training step
dense_dropout = 0.5

# Number of units in the dense hidden layer
hidden_units = 250

# Number of training epochs
epochs = 2

# Prediction threshold, above which an output probability
# will indicate class 1.
pred_threshold = 0.5
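The two dropout proportions above control Keras `Dropout` layers. During training (for example inside `model.fit`), a Dropout layer sets each input to zero with probability `rate` and scales the surviving values by 1 / (1 - rate), so the expected total is unchanged; at inference time it passes its inputs through untouched. Here is a NumPy sketch of the training-time behaviour; the `dropout` function is hypothetical and is not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, rng):
    """Training-time ("inverted") dropout sketch. Assumes 0 <= rate < 1."""
    keep = rng.random(x.shape) >= rate             # each element kept with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)   # rescale survivors, zero the rest

rng = np.random.default_rng(0)
activations = np.ones((4, 8))
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)   # a mix of 0.0 and 2.0 values
```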

## Get the text
First we retrieve the IMDB dataset from Stanford University.

* ```curl``` is a unix command line utility which can retrieve data from a web server.
* The data is in ```tar.gz``` format
  * ```gz``` - compressed with the gzip utility
  * ```tar``` - an archive format
* The ```tar``` command line utility will decompress the archive and extract it into its original directory structure.



In [None]:
!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -xf aclImdb_v1.tar.gz
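If `curl` or `tar` are not available (for example on some Windows setups), the same two steps can be done with Python's standard library. This is a sketch; the function names are hypothetical. The `filter="data"` argument, which blocks unsafe archive members, only exists in newer Python versions, so the code checks for it first.

```python
import tarfile
import urllib.request

IMDB_URL = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

def download(url, filename):
    """Roughly equivalent to `curl -O`: save the file at `url` as `filename`."""
    urllib.request.urlretrieve(url, filename)
    return filename

def extract_tar_gz(filename, dest="."):
    """Roughly equivalent to `tar -xf` for a gzip-compressed tar archive."""
    with tarfile.open(filename, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):      # safer extraction where supported
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)

# Usage (a large download):
# extract_tar_gz(download(IMDB_URL, "aclImdb_v1.tar.gz"))
```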

Take a look at the data in the folder you have just downloaded.
* What's in the README file?
* Which two datasets is it split into?
* How are different classes distinguished in the datasets?

You can use the ```cat``` command line tool to read individual files.







In [None]:
!cat aclImdb/train/pos/6248_7.txt

Note that the ```train``` directory contains three folders:
* ```neg``` - negative reviews
* ```pos``` - positive reviews
* ```unsup``` - unlabelled reviews for unsupervised training

We don't need this last one. We must delete it, otherwise our next step will read it in as a third class.

In [None]:
# unix command to remove directory recursively
# check it has worked!
!rm -r aclImdb/train/unsup

## Read in the text
Next we need to read the text into our code, as a TensorFlow ```Dataset``` object. There's a handy Keras function, ```text_dataset_from_directory```, that does just this: it reads in the data and infers labels from the directory structure, so ```neg``` becomes class 0 and ```pos``` becomes class 1 (subdirectories are taken in alphabetical order). Texts and labels are returned in batches, as tensors. We will use the same function to split our training data into train and dev (validation) sets.

In [None]:
# Training data, 80% of the train directory
train_raw = text_dataset_from_directory(
    "aclImdb/train",
    batch_size=batch_size,           # batches for use in future processing
    validation_split=0.2,            # proportion of data to put in dev set
    subset="training",               # which train / val subset is this?
    seed=1337,                       # you need to set the same seed here
                                     # and in the val data to avoid overlap
)

# Validation / dev data - the remaining 20%
val_raw = text_dataset_from_directory(
    "aclImdb/train",
    batch_size=batch_size,
    validation_split=0.2,
    subset="validation",
    seed=1337,
)

# Held-out test data, all of it
test_raw = text_dataset_from_directory(
    "aclImdb/test",
    batch_size=batch_size
)
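Why must the seed be the same in both calls? Each call builds its own shuffled list of files and then cuts it at the same point, so only a shared seed guarantees that the training and validation subsets are complementary. The pure-Python sketch below illustrates that idea; the function is hypothetical and is a simplification, not the Keras source.

```python
import random

def split_filenames(filenames, validation_split, subset, seed):
    """Shuffle with a fixed seed, then cut the list into training and validation parts."""
    files = sorted(filenames)                # start from a fixed order
    random.Random(seed).shuffle(files)       # same seed -> same shuffle on every call
    n_val = int(validation_split * len(files))
    n_train = len(files) - n_val
    return files[:n_train] if subset == "training" else files[n_train:]

files = [f"review_{i}.txt" for i in range(10)]
train = split_filenames(files, 0.2, "training", seed=1337)
val = split_filenames(files, 0.2, "validation", seed=1337)
print(len(train), len(val), set(train) & set(val))   # no overlap between the subsets
```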

Let's take a look. We can use the ```Dataset.take()``` method to get a single batch out, and then iterate over that to look at the elements.

In [None]:
# take(n) creates a dataset with at most n elements;
# here each element is a whole batch
examples = train_raw.take(1)

# iterating over a dataset returns a tuple
# of instances and labels
for texts, labels in examples:

    # we'll look at five of the examples
    for i in range(5):

        # data and labels are held in tensors, so we 
        # convert to numpy arrays to print
        print(texts.numpy()[i])
        print(labels.numpy()[i])

## Preprocessing the text
Looking at the above examples, can you see anything we might need to remove or change when processing? We will define a function that takes an input of text data, and uses TensorFlow string processing functions to change them.

In [None]:
# Process text to standardise
def preprocess_text(input_data):

    # lowercase everything
    lowercase = tf.strings.lower(input_data)

    # remove html line breaks
    stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")

    # remove all ASCII punctuation characters (string.punctuation)
    esc_removed = tf.strings.regex_replace(
        stripped_html, f"[{re.escape(string.punctuation)}]", ""
        )
    
    return esc_removed
    

Does this do what we need? Let's take a look at a full batch of examples.

In [None]:
for texts, labels in train_raw.take(1):
  print(preprocess_text(texts))


What do you think?
* Are there still characters we might not want? What are they, and how might you remove them?
* Have we introduced any other problems?
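One way to explore these questions is a pure-Python approximation of `preprocess_text` that works on one ordinary string at a time. It is a sketch using Python's `re` module instead of `tf.strings`, so results may differ slightly (for example, `tf.strings.lower` only lowercases ASCII letters by default, while `str.lower` handles all Unicode letters). The function name is hypothetical.

```python
import re
import string

def preprocess_text_py(text):
    """Pure-Python approximation of preprocess_text, for a single string."""
    lowercase = text.lower()
    stripped_html = lowercase.replace("<br />", " ")      # HTML line breaks -> spaces
    # delete every character in string.punctuation
    return re.sub(f"[{re.escape(string.punctuation)}]", "", stripped_html)

print(preprocess_text_py("Great film!<br />I DIDN'T expect it."))
```

The example shows one side effect of stripping punctuation: contractions such as *didn't* become *didnt*. Non-ASCII punctuation such as curly quotes is not in `string.punctuation`, so it survives.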

## Vectorise the text

Next we will create a Keras layer to standardise the text (using the function we just wrote), split it into tokens, and map each token to an integer.

There are several ways in which we could represent the tokens as integers. We will use a simple mapping where each unique token is represented by a different integer.
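To make this mapping concrete, here is a simplified pure-Python sketch of what `TextVectorization` does with `output_mode="int"` (function names hypothetical). It follows the layer's documented conventions: index 0 is reserved for padding, index 1 for out-of-vocabulary tokens, and the remaining `max_tokens - 2` slots go to the most frequent tokens. Splitting on whitespace and breaking frequency ties by first appearance are simplifying assumptions.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to integers from 2 upwards (0 = padding, 1 = unknown)."""
    counts = Counter(tok for text in texts for tok in text.split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: i + 2 for i, tok in enumerate(most_common)}

def vectorize(text, vocab, sequence_length):
    """Turn a text into a fixed-length list of integer ids."""
    ids = [vocab.get(tok, 1) for tok in text.split()]      # unknown tokens -> 1
    ids = ids[:sequence_length]                            # truncate long texts
    return ids + [0] * (sequence_length - len(ids))        # pad short texts with zeros

vocab = build_vocab(["the film was good", "the film was bad", "the end"], max_tokens=5)
print(vocab)
print(vectorize("the film was great fun", vocab, sequence_length=4))
print(vectorize("the end", vocab, sequence_length=4))
```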



In [None]:
# Make a TextVectorization layer, using the preprocess_text
# function that we wrote above.
# output_mode="int" - builds an integer index, each unique
#                     token mapped to an integer
# max_tokens: maximum vocabulary size; only the most frequent
#             tokens are kept (the count includes the reserved
#             padding and out-of-vocabulary tokens)
# output_sequence_length: truncate or pad each output to this length
vectorize_layer = layers.TextVectorization(
    standardize=preprocess_text,
    output_mode="int",
    max_tokens=max_features,
    output_sequence_length=sequence_length,
)

# We need to "adapt" the layer to our corpus of texts,
# i.e. fit it to the vocabulary, computing the integer
# mappings. Note this does not vectorize the text,
# just computes the vocabulary

# To do this we first need a text-only dataset with no labels.
# Dataset.map(function) maps the values in a dataset using
# the function. Here we use a simple lambda expression for
# our function
train_texts = train_raw.map(lambda x, y: x)

# Now we can adapt to this text
vectorize_layer.adapt(train_texts)


Next, we will write a very simple function that uses our vectorize_layer to vectorize the texts.

Why didn't we just write a function in the first place? Why did we use a layer? It's not just to take advantage of the functionality of vectorize_layer. Now that we have a layer defined to do this, we could incorporate the layer into our final model later on, so that the model can take raw text as input. We have the flexibility to:
* use the function we are defining below to vectorize the text, and then pass it to the model
* or, incorporate the vectorize_layer into our model, so the model can handle raw text

In [None]:
# function to use our vectorize_layer to vectorize a text tensor
# and return it with the label tensor
def vectorize_text(text, label):

    # add an innermost (right hand) dimension to the text
    text = tf.expand_dims(text, -1)   
    return vectorize_layer(text), label


# Now we can vectorize the data.
train_clean = train_raw.map(vectorize_text)
val_clean = val_raw.map(vectorize_text)
test_clean = test_raw.map(vectorize_text)

What does our data look like, now we have transformed it?

In [None]:
# Let's take a look at a single example of our vectorized data
for t in train_clean:
  print(t)
  break       

* Why is the shape of the first tensor 32 x 500?
* What are the individual integers in the first array of vectors?
* What do you think of the distribution of integers, given that the vocabulary has a maximum size of 20000?
* Why do some of the vectors end in lots of zeros?
* What is the second tensor?

## Improving performance

We can improve the performance of our code by *caching* the data: after the first epoch, the vectorized batches are kept in memory, so later epochs do not have to read and preprocess the files again. We can also *prefetch* data. Normally, each batch of data is fetched, preprocessed, and run through the model before the next one is started. With prefetching, the next batch is fetched and preprocessed while the previous batch is being run through the model. Caching mainly speeds up the second and subsequent epochs; prefetching helps in every epoch.

In [None]:
# Cache the vectorized data in memory, and prefetch up to 10 batches
# asynchronously, for best performance on GPU.
train_clean = train_clean.cache().prefetch(buffer_size=10)
val_clean = val_clean.cache().prefetch(buffer_size=10)
test_clean = test_clean.cache().prefetch(buffer_size=10)
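To see the idea behind `prefetch`, here is a toy producer/consumer sketch in plain Python. The function is hypothetical and far simpler than `tf.data`: a background thread fills a bounded queue while the main loop consumes from it, so preparing the next item overlaps with using the current one. `cache()` is not modelled here.

```python
import queue
import threading

def prefetch(batches, buffer_size):
    """Yield items from `batches` while a background thread prepares up to
    `buffer_size` items ahead. Toy model only: errors in `batches` are not handled."""
    buffer = queue.Queue(maxsize=buffer_size)   # bounded: the producer waits when full
    done = object()                             # sentinel marking the end of the data

    def producer():
        for batch in batches:
            buffer.put(batch)                   # runs while the consumer is busy
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch

for batch in prefetch(range(5), buffer_size=2):
    print(batch)
```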

## Building the model

A neural network model is a directed acyclic graph of layers. To build our model, we create each individual layer and connect it to the previous one in our graph.

Imagine we have two kinds of layers modelled by classes called ```OneTypeOfLayer``` and ```AnotherTypeOfLayer```. We can construct instances of the two layers like this:

```
first_layer = OneTypeOfLayer(parameters)
second_layer = AnotherTypeOfLayer(more_parameters)
```

We could then connect the two layers by calling the second layer on the first (this uses the ```__call__``` method internally):

```
second_layer(first_layer)
```

There is a syntactic shortcut for this, using a [functional programming syntax](https://keras.io/guides/functional_api/):

```
first_layer = OneTypeOfLayer(parameters)
second_layer = AnotherTypeOfLayer(more_parameters)(first_layer)
```

Extending this in the examples below, we use variable ```x``` to refer to the last layer we have created:

```
x = OneTypeOfLayer(parameters)
x = AnotherTypeOfLayer(more_parameters)(x)
x = YetAnotherLayerType(params)(x)
```

We will use this syntax below.

In [None]:
# An integer input node for the vocabulary indices.
# This instantiates the input tensor.
inputs = tf.keras.Input(shape=(None,), dtype="int64")

# We add an embedding layer which maps
# our vocab indices into embedding_dim dimensions.
# We add a dropout layer which will randomly set a 
# proportion of its inputs to zero, thus ignoring them
x = layers.Embedding(max_features, embedding_dim)(inputs)
x = layers.Dropout(embedding_dropout)(x)

# Now we add two 1D convolution layers, which learn
# filters over windows of adjacent words. The number of
# filters learned is given by the parameter filters, the
# size of the convolution window by kernel_size, and
# strides=3 moves the window three tokens at a time
x = layers.Conv1D(filters, kernel_size, padding="valid", activation="relu", strides=3)(x)
x = layers.Conv1D(filters, kernel_size, padding="valid", activation="relu", strides=3)(x)

# We apply global max pooling, keeping the largest
# value of each filter over the whole sequence
x = layers.GlobalMaxPooling1D()(x)

# We add a standard densely connected hidden layer
x = layers.Dense(hidden_units, activation="relu")(x)

# and another dropout layer which will randomly set a 
# proportion of its inputs to zero, thus ignoring them
x = layers.Dropout(dense_dropout)(x)

# Finally we project the output of the previous layer onto
# a single unit output layer, and squash it with a sigmoid.
predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x)
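With `padding="valid"`, a Conv1D layer only places its window where it fits entirely inside the sequence, so for input length L, kernel size k and stride s the output length is floor((L - k) / s) + 1. The sketch below (function name hypothetical) checks what this means for 500-token inputs with `kernel_size=7` and `strides=3`, then uses NumPy to illustrate what `GlobalMaxPooling1D` does to the resulting feature maps.

```python
import numpy as np

def conv1d_output_length(length, kernel_size, strides):
    """Output length of a 1D convolution with padding="valid"."""
    return (length - kernel_size) // strides + 1

after_first = conv1d_output_length(500, kernel_size=7, strides=3)
after_second = conv1d_output_length(after_first, kernel_size=7, strides=3)
print(after_first, after_second)

# Global max pooling keeps each filter's largest activation over all positions
feature_maps = np.random.default_rng(0).random((32, after_second, 128))  # (batch, steps, filters)
pooled = feature_maps.max(axis=1)
print(pooled.shape)   # one value per filter per text
```

Because the model's input was declared with `shape=(None,)`, `model.summary()` may report `None` for this sequence dimension rather than these numbers.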


Now we have all of our layers, let's put them into a model, passing our input and our final predictions layer as parameters. The model wraps up the layers, adding training and inference functionality.

We can then compile our model, i.e. configure it for training by providing parameters for the loss function, optimisation, and metrics we will use.

In [None]:
# Create the model
model = tf.keras.Model(inputs, predictions)

# Compile the model with binary crossentropy loss and an adam optimizer.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
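The `binary_crossentropy` loss compares each predicted probability p with its true label y, scoring -(y log p + (1 - y) log(1 - p)) and averaging over the batch. Below is a pure-Python sketch of that formula; clipping p away from 0 and 1 mimics what Keras does with a small epsilon, and the epsilon value used here is an assumption.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)        # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_crossentropy([1, 0], [0.9, 0.2]))   # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.8]))   # confident and wrong: large loss
```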

## Take a look at the model

Keras can print out textual and graphical representations of a model, which tell us:

* The layers in the model, in the order in which they appear
* The output shape, i.e. the size of the matrices passed between layers. For dense layers the final dimension is the number of units; for convolution layers it is the number of filters.
* Parameters - the number of trainable weights (including biases) in each layer
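As a worked check of the parameter column, the counts for this model can be computed by hand. This sketch assumes the values from the Parameters cell; Dropout and pooling layers have no trainable parameters, so they are left out.

```python
# Values copied from the Parameters cell
max_features, embedding_dim = 20000, 128
filters, kernel_size, hidden_units = 128, 7, 250

embedding = max_features * embedding_dim                 # one vector per vocabulary entry
conv1 = kernel_size * embedding_dim * filters + filters  # kernel weights + one bias per filter
conv2 = kernel_size * filters * filters + filters        # input channels are now the 128 filters
dense = filters * hidden_units + hidden_units            # pooled vector (128) -> 250 units
output = hidden_units * 1 + 1                            # 250 units -> 1 sigmoid unit

total = embedding + conv1 + conv2 + dense + output
print(embedding, conv1, conv2, dense, output, total)
```

Nearly all of the parameters are in the embedding layer, which is one reason why changing `max_features` or `embedding_dim` has a large effect on model size.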

Let's take a look at our model...


In [None]:
# summary() prints the table itself (and returns None)
model.summary()

We can also visualise this:

In [None]:
SVG(model_to_dot(model).create(prog='dot', format='svg'))

## Train the model

Now let's train it. Keras will evaluate on our validation data after each epoch, showing us loss and accuracy as it goes, and saving these into a ```History``` object. We can use this ```History``` to display the results of each epoch after training has finished.

In [None]:
# Fit the model to the training data, validating against our validation
# data on each epoch. Save the results to a History object.
history = model.fit(train_clean, validation_data=val_clean, epochs=epochs)

## Visualise the training process

OK, but how did that change over time? We can plot the results from each epoch that were stored in our ```History``` object.

In [None]:
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

What do you think of the results? How good was it after 1 epoch? Is it going to improve much more if you run more epochs?

## Evaluate the model

Once we have settled on our final model, we can evaluate it against our held-out test data. Note that, if we wanted, we could first retrain on our training and validation data combined, since we no longer need the validation data for choosing the model. We won't retrain on this occasion.

In [None]:
score = model.evaluate(test_clean)
print(f"{'Test loss:':16}{score[0]:.2f}")
print(f"{'Test accuracy:':16}{score[1]:.2f}")

## Building an end-to-end model

We now have a model that is trained on and works on text represented by integer vectors. What we would now like is a model that will take raw text input, and give us the class of that text.

Recall that when we wrote code to preprocess our text, we wrapped it up in a Keras ```TextVectorization``` layer. We will now create a model that first uses this ```TextVectorization``` layer, and then passes the tensors from that to our existing model. We will compile this as a new end-to-end model.

Keras can also save models to disk and load them again, which we will not use in this example.



In [None]:
# Create a string input tensor
inputs = tf.keras.Input(shape=(1,), dtype="string")

# Connect a TextVectorization layer to our input
# layer, to turn the strings into vocab indices
indices = vectorize_layer(inputs)

# Connect these indices to our existing model
# to output predictions
outputs = model(indices)

# We now create a new end to end model
# from our inputs and outputs...
end_to_end_model = tf.keras.Model(inputs, outputs)

# ... and compile it
end_to_end_model.compile(
    loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"]
)


## Using the model

Let's use our final end-to-end model to make some predictions. There are two ways to do this:

* Call the ```end_to_end_model.predict(texts)``` method. This processes the input in batches, and is useful when processing lots of text.
* Call the model itself, ```end_to_end_model(texts)```. This uses the ```__call__``` method internally, and is useful for processing individual texts or small numbers of texts.

We will use the latter method.

Try running on different reviews - class 0 are bad reviews, class 1 are good reviews. There are some to get you started below, some made up, some from IMDb, and some recent ones from the Guardian. You could always find others from IMDb - anything from the last few years won't be in the training data.

How does it do? Could you improve it?

In [None]:
# Some examples
examples = [
    ['A truly awful movie. I never want to see rubbish like this again. Really bad, worthless, hopeless, bad acting',
     0],

    ['How do you update Shaft for the modern era? John Singleton tried in 2000 with a serviceable if unspectacular sequel, a rather asexual and anonymous follow-up to the far more stylish and distinctive original. Almost two decades later and somehow we\'re even further from the right answer. Because it turns out that a 2019 version of Shaft probably shouldn\'t turn into an unabashed celebration of regressively misogynistic and homophobic masculinity. In Ride Along director Tim Story\'s wildly misjudged follow-up, we\'re given a Jordan Peterson-level assault on so-called beta millennial males, a strange, angry attack on modernity that feels like the result of a group of bitter men griping about the metrosexualisation of a younger generation. In the latest incarnation, the younger Shaft, played by the undeniably charming Jessie T Usher, is a decaf coffee-drinking, gun-hating, women-respecting data analyst for the FBI, which turns him into a joke and a punchline for both the film and his absentee father. Played by a returning Samuel L Jackson, he\'s disgusted to see what his son has turned into at the hands of his mother (a wasted Regina Hall), who raised him in his absence. Convinced he must be gay, he\'s determined to turn him into his idea of a real man as the pair investigate the death of younger Shaft\'s ex-junkie friend. There\'s an inevitable culture clash between the two but elder Shaft is focused on showing his son that listening less to women and shooting more guns will help to show him the way. There\'s a smart, self-aware film to be made from a rough kernel of this setup. Focusing on the different definitions of masculinity shared by two generations of men is an intriguing entry point, especially given that one is a father who hasn\'t been present for his son\'s youth. There\'s space for suggesting a happy medium between two extremities but Story\'s update has zero interest in nuance or even debate. We\'re shown time and time again that for the younger Shaft, the more he embraces modern, “softer” qualities, the less of a man he then is. Skinny jeans, coconut water, desk work – all treated with unbridled disdain as for his father they all symbolise femininity or, even worse, homosexuality. The film is littered with uneasy jabs not just towards gay men but also the trans community (young Shaft\'s boss complains that his biggest problem is that his daughter “wants to be known as Frank”). “You sure you like pussy?” is repeated by a concerned Jackson with such alarming frequency that I was tempted to ask the critic next to me whether we had somehow been magically transported back to the 70s. But even in the 1971 original, sexual and gender politics were far less troubling. Back then, Shaft even had a gay friend of sorts, a barman who he treated as his equal, but somehow almost 20 years later, every reference to homosexuality is dripping in bile. While 70s Shaft might have been dismissive about the women he was having sex with, he didn\'t feel the need to pause the film to give a sermon about how all women desperately want and need to be treated with unquestioned strength and power, something that the modern incarnation deems necessary. It would be one thing if the film presented him as a dinosaur but the script, from Black-ish creator and Girls Trip co-writer Kenya Barris and Family Guy\'s Alex Barnow, is too busy hero-worshipping him to bother finding fault. The film presents us with two outwardly strong female characters, in Hall and younger Shaft\'s love interest, played by Love, Simon\'s Alexandra Shipp, but any feistiness is soon overruled by their visible arousal at men being men. In one of the film\'s strangest scenes, younger gun-hating Shaft is forced into a shootout as his date watches smiling, turned on by his blood thirst. Similarly Hall\'s date with a beta male (who\'s treated with contempt for having manners and being scared by another shootout) is interrupted by a swaggering Jackson, the man who abandoned her with a baby, making her quiver by showing off the two women he has arrived with and embarrassing her date. Even outside of the script\'s aggressively repetitive bigotry, the shambolic Scooby Doo plot struggles to grab even the slightest amount of attention. There are half-assed attempts to modernise a familiar narrative with references to an Islamic church and men suffering from PTSD but it soon devolves into dull, by-the-numbers, jarringly over-sentimental sitcom. When action arrives it\'s also haphazardly choreographed, especially in a shoddy, confusingly shot finale. Jackson shamelessly showboats throughout but his charisma is buried underneath a shtick that becomes so gratingly obnoxious I almost applauded when Richard Roundtree\'s original Shaft made his inevitable cameo. But rather than saving the day, he\'s given little to do, and the script chooses to defend both admittedly terrible absentee fathers while throwing Hall\'s beleaguered single mother under the bus for feminising her son. It\'s frustrating to see such an underrated comic actor like Hall struggle to find space to shine, although she does provide one of the film\'s rare funny moments as she talks to herself in a public bathroom. Usher similarly, who was so good in the now cancelled TV comedy Survivor\'s Remorse, does his best with a crudely etched character, but any resistance against the film\'s regressive politics is ultimately futile. The mission in Shaft is to break down a modern definition of masculinity, to toughen up “delicate” qualities, such as hating guns and listening to women, and return to the good old days instead. While Jackson\'s ribald relic might succeed in forcing his son back to the past, this embarrassingly tone-deaf film fails to take us with him.',
     0],

    ['It hurt to watch this movie, it really did... I wanted to like it, even going in. Shot obviously for very little cash, I looked past and told myself to appreciate the inspiration. Unfortunately, although I did appreciate the film on that level, the acting and editing was terrible, and the last 25-30 minutes were severe thumb-twiddling territory. A 95 minute film should not drag. The ratings for this one are good so far, but I fear that the friends and family might have had a say in that one. What was with those transitions? Dear Mr. Editor, did you just purchase your first copy of Adobe Premiere and make it your main goal to use all the goofy transitions that come with that silly program? Anyway... some better actors, a little more passion, and some more appealing editing and this makes a decent movie.',
     0],
      
    ['I loved this movie. The best I have seen this year. Great acting, brilliant plot, wonderful dialogue.',
     1],

    ['I went to an advance screening of this movie thinking I was about to embark on 120 minutes of cheezy lines, mindless plot, and the kind of nauseous acting that made "The Postman" one of the most malignant displays of cinematic blundering of our time. But I was shocked. Shocked to find a film starring Costner that appealed to the soul of the audience. Shocked that Ashton Kutcher could act in such a serious role. Shocked that a film starring both actually engaged and captured my own emotions. Not since "Robin Hood" have I seen this Costner: full of depth and complex emotion. Kutcher seems to have tweaked the serious acting he played with in "Butterfly Effect". These two actors came into this film with a serious, focused attitude that shone through in what I thought was one of the best films I\'ve seen this year. No, its not an Oscar worthy movie. It\'s not an epic, or a profound social commentary film. Rather, its a story about a simple topic, illuminated in a way that brings that audience to a higher level of empathy than thought possible. That\'s what I think good film-making is and I for one am throughly impressed by this work. Bravo!',
     1],

    ['A documentary about the pain of mothers losing a connection with their children might not sound like one of the most uplifting films of the year, but Kristof Bilsen\'s film is a radical achievement: a love letter to loss, sacrifice and yearning. It questions how we care for elderly loved ones, makes provocative contrasts between east and west in the economics of medicine, and, with a central character who\'s pure charisma, this is intimate observational documentary-making of a high standard. Pomm is a carer in Thailand for westerners with Alzheimer\'s. She gives her patient one-to-one care, which comprises singing, joking, hugging and confiding, as well as the basics of cleaning and welfare. This is more personal attention than would be possible in her patients\' home countries, and though it doesn\'t come cheap, families feel it\'s worth the expense of sending someone halfway across the world for it. When we first meet her, Pomm\'s patient is Elisabeth, who can communicate only in squeaks and other noises, but seems calm and content. Pomm has her own problems, primarily lack of contact with her children who live many miles away. Tender late-night confessions to Elisabeth make clear the extent to which the support is mutual. This is a brilliantly novel way to establish characters\' motivations and doesn\'t feel crass. It\'s natural and born of their trust in Bilsen, whose presence in the room appears to be minimal. Pomm and Elisabeth are each other\'s family, bound by their need for one another. A segue to Pomm\'s relationship with her children feels uncomfortable – she\'s out of their sight and out of their minds, and it hurts. Later, Pomm has another patient, Maya, brought from Switzerland by a family who love her but have decided the greatest love is to let her go and put her in the hands of a stranger. With Maya\'s arrival in Thailand, the film shifts up a gear, moving from tenderness and sensitivity to something much harder to watch, and Bilsen deserves praise for doing it. Earlier scenes with Maya and her family in the mountains of Switzerland are puzzling and even alienating, as we are plunged into their world without any context. We\'re intrigued but feel like intruders on their private foreboding as they know their time together is running out. Bilsen\'s plotting from start to finish is immaculate, never explaining too much but always hinting just enough about the melancholy destinies of the characters. Mother explores pain and dignity, and the way that some people\'s pain is considered more important than others. Pomm has lower status, only begrudgingly granted time off by her boss to see her family. No one asks her about her wellbeing, no one dotes over her. Yet she is the hero of this piece, giving everything for strangers, night and day, in an exhausting service so her own family can be cared for. This is an original piece of work, addressing the selflessness of mothers and the impact of dementia on families. These subjects have been the focus of documentaries before but rarely in this combination and with such an unflinching resolve to keep filming in uncomfortable moments.',
     1]
]

# We need to convert the examples into a 2D numpy array of texts
# and labels. We can then pass the row of texts to our model.

# *examples unpacks the examples list into its individual elements.
# zip then pairs up these elements, giving one tuple of texts and
# one tuple of labels.
# We then convert to a list, and construct an np.array from this.
# Note that numpy uses a single string dtype for the whole array,
# so the labels become the strings '0' and '1'.
examples = np.array(list(zip(*examples)))

# Let's take a look at that
print(examples)
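To see what `zip(*...)` does on a smaller scale, here is a sketch with two hypothetical [text, label] pairs. It also shows the dtype side effect mentioned in the comments above: NumPy stores both rows with one string dtype, so labels need converting back if you want to compare them numerically with predictions.

```python
import numpy as np

pairs = [["a great film", 1], ["a dull film", 0]]   # hypothetical [text, label] pairs

texts, labels = zip(*pairs)          # transpose: one tuple of texts, one tuple of labels
print(texts, labels)

arr = np.array(list(zip(*pairs)))    # the same construction as the cell above
print(arr.dtype, arr[1])             # labels are now strings such as '1' and '0'
int_labels = arr[1].astype(int)      # convert back to integers when needed
print(int_labels)
```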


In [None]:
# Now we can make our predictions

# The model outputs the probability of class 1
pred = end_to_end_model(examples[0])

# We convert this to 1 if above a threshold, otherwise 0
label = np.where(pred > pred_threshold, 1,0)
print(label)
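The final layer's sigmoid squashes a raw score into a probability between 0 and 1, and `np.where` then applies `pred_threshold`. With a threshold of 0.5 this is the same as checking whether the pre-sigmoid score is positive, since sigmoid(0) = 0.5. A NumPy sketch with hypothetical scores; raising the threshold would make the model more reluctant to predict class 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-2.0, -0.1, 0.0, 0.3, 4.0])   # hypothetical pre-sigmoid outputs
probs = sigmoid(scores)
labels = np.where(probs > 0.5, 1, 0)             # same thresholding as the cell above
print(probs.round(3), labels)
```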

## Next steps

Try these ideas next:

* **Change parameters:** you could vary some of these, and see what impact it has on performance:
  * Number of text vector features (size of the vocabulary)
  * Size of the embedding
  * Dropout proportions
  * Number of filters in the convolution layers
  * Add or remove convolution layers
  * Number of units in the hidden layer
  * Number of training epochs
* **LSTM:** adapt the code to use an LSTM. Keras has an [LSTM layer](https://keras.io/api/layers/recurrent_layers/lstm/). You can also find [example LSTM code](https://keras.io/examples/nlp/bidirectional_lstm_imdb/) to combine with this notebook.
* **PubMed RCT:** You could convert the PubMed RCT datasets into a format that can be used with the model, and train and evaluate on that.
* **Assignment:** Combining the above two ideas could make a good answer for the assignment.