# Neural Networks II

In [None]:
%pip install keras_nlp
%pip install tensorflow_datasets
%pip install transformers
%pip install tensorflow-hub

In [767]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

from transformers import pipeline, AutoTokenizer

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score, classification_report, balanced_accuracy_score

import seaborn as sns



## Neural Networks for Text Data

Neural networks are extremely flexible, which allows you to use them for all kinds of data. We've already seen this with image data, which comes in a 2-dimensional format. They can also be applied to text data, for supervised learning tasks such as sentiment analysis.



Go ahead and run the cell below (it will take a little while to finish); we'll discuss what it does shortly:

In [783]:
embed = hub.load("https://www.kaggle.com/models/google/universal-sentence-encoder/TensorFlow2/universal-sentence-encoder/2")
# wrap `embed` in a Keras Lambda layer: this doesn't change the embeddings, it only works around a compatibility issue so the model can be used as a Keras layer
embed_layer_wrapper = tf.keras.layers.Lambda(lambda x: embed(x))

## Reviews

We'll start by reloading the IMDB movie review corpus that we used a couple of weeks ago. To refresh your memory: this is a subset of a larger corpus of user-generated IMDB reviews. The dataset contains the full text of each review, along with a numeric label that is 0 if the review is negative and 1 if it is positive. Because this is just an example dataset, the reviews are evenly split between the two classes, so we have a balanced sample of 2500 positive reviews and 2500 negative reviews to work with:


In [789]:
reviews = pd.read_csv("imdb_reviews.csv")

reviews.head()

|   | text | label |
|---|------|-------|
| 0 | I always wrote this series off as being a comp... | 0 |
| 1 | 1st watched 12/7/2002 - 3 out of 10(Dir-Steve ... | 0 |
| 2 | This movie was so poorly written and directed ... | 0 |
| 3 | The most interesting thing about Miryang (Secr... | 1 |
| 4 | when i first read about "berlin am meer" i did... | 0 |


Just as in previous classes, we're going to evaluate the model by creating separate training and testing datasets. We'll also convert these datasets to tensors to make them easier to work with in TensorFlow.

In [793]:
train_examples, test_examples = train_test_split(reviews,
                                                 test_size=0.20,  # hold out 20% of reviews for testing
                                                 random_state=999)  # the split is random, so set a seed for reproducibility


# convert to tensor objects
train_tensor = tf.convert_to_tensor(train_examples['text'])
test_tensor = tf.convert_to_tensor(test_examples['text'])
train_labels = tf.convert_to_tensor(train_examples['label'])
test_labels = tf.convert_to_tensor(test_examples['label'])



In [794]:
print(f"Training entries: {len(train_examples)}, test entries: {len(test_examples)}")

Training entries: 4000, test entries: 1000
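A random split of a balanced corpus usually comes out close to balanced, but not exactly. If you want the class proportions preserved exactly, scikit-learn's `train_test_split` accepts a `stratify=` argument (for example `stratify=reviews['label']`). The sketch below shows the same idea with plain pandas on a small, made-up DataFrame (the toy data is an assumption, not the IMDB corpus): it samples 80% of each label group separately.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDB reviews (not the real corpus)
toy = pd.DataFrame({
    "text": [f"review {i}" for i in range(20)],
    "label": [0] * 10 + [1] * 10,
})

# Sample 80% of each label group for training, so both splits keep the 50/50 balance
train = toy.groupby("label", group_keys=False).sample(frac=0.8, random_state=999)
# Everything that was not sampled becomes the test set
test = toy.drop(train.index)

print(train["label"].value_counts().sort_index().tolist())  # [8, 8]
print(test["label"].value_counts().sort_index().tolist())   # [2, 2]
```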


## Embeddings

In a previous class, we trained a Naive Bayes classifier to distinguish positive from negative IMDB reviews with a fairly high degree of accuracy (~84%).

Now, we're going to try to do the same task using a neural network whose inputs come from a sentence embedding model. **Text embeddings** are one way that analysts can move away from the bag-of-words model to build classifiers that account for things like word order, synonyms and antonyms, and complex grammatical relationships.

Word, sentence, and document embedding models take strings of text and convert them into a "dense" vector of numbers whose values reflect some kind of abstract meaning. The precise method for creating them differs from model to model, but the general idea is that they are trained on a large body of text to "predict" some missing text or context. The representations this predictive model learns end up being similar for texts that have similar meanings.

In a well-trained word-embedding model, words with similar meanings will have similar values (<a href='https://projector.tensorflow.org/'>there's a good visual representation here</a>). Instead of using a bag-of-words as our input for a classifier, we can pass our text through an embedding model to get a representation that can account for things like synonyms and context.



The `embed` object we downloaded earlier is a pre-trained, general-purpose embedding model. It takes a list of strings as input and returns, for each string, a vector of 512 numbers that represents that sentence's "location" in a 512-dimensional space. Here's an example of getting the first ten elements of the embedding for a single sentence:

In [795]:
# embed a sentence about cacti and look at the first 10 elements

embed(["The rattail cactus is native to Mexico."])[0][:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([-3.59944254e-02,  2.86443513e-02,  5.79423613e-05, -1.09751895e-02,
       -3.56823625e-03,  2.59994646e-03,  1.08064972e-02, -1.86106842e-02,
       -2.18271017e-02, -2.75516417e-02], dtype=float32)>

To illustrate why this is useful, we can borrow a little code from the <a href='https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder'>online documentation</a> that lets us visualize the similarities between the embeddings of different sentences:

In [796]:
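def plot_similarity(labels, features, rotation):
  # pairwise inner products between the embeddings (higher = more similar)
  corr = np.inner(features, features)
  sns.set(font_scale=1)
  # draw the similarity matrix as a heatmap, one row/column per sentence
  g = sns.heatmap(
      corr,
      xticklabels=labels,
      yticklabels=labels,
      vmin=0,
      vmax=1,
      cmap="YlOrRd")
  g.set_xticklabels(labels, rotation=rotation)
  g.set_title("Semantic Textual Similarity")

def run_and_plot(messages_):
  # embed all the sentences, then plot their pairwise similarities
  message_embeddings_ = embed(messages_)
  plot_similarity(messages_, message_embeddings_, 90)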
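`plot_similarity()` uses `np.inner` rather than an explicit cosine similarity. The TensorFlow Hub tutorial linked above notes that the Universal Sentence Encoder's outputs are approximately normalized, and for unit-length vectors the inner product and the cosine similarity are the same number. A quick numpy check, using random unit vectors as a stand-in for real embeddings (an assumption made so the example runs without downloading the model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for sentence embeddings: 4 "sentences", 512 dimensions each
raw = rng.normal(size=(4, 512))

# Scale every row to unit length (USE embeddings are already approximately unit length)
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# The matrix that plot_similarity() draws: all pairwise inner products
inner = np.inner(unit, unit)

# Cosine similarity computed explicitly from the unnormalized vectors, for comparison
norms = np.linalg.norm(raw, axis=1)
cosine = (raw @ raw.T) / np.outer(norms, norms)

print(np.allclose(inner, cosine))         # True: for unit vectors, inner product == cosine
print(np.allclose(np.diag(inner), 1.0))   # True: each sentence is maximally similar to itself
```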

Below are some sentences from different Wikipedia entries. The first two are from the entry on *Citizen Kane*; the last two are from entries on cacti. Note that the sentences within each group share very few terms, but take a look at their similarities as measured by the inner products of their respective embeddings:

In [None]:
run_and_plot([
    # two sentences from the Wikipedia entry for citizen kane
    "Citizen Kane is often cited as the greatest film ever.",
    "Hollywood had shown interest in Welles as early as 1936.",
    # sentences from entries on cacti
    "The rattail cactus is native to Mexico.",
    "Prickly pears are frequently found around California."])


In essence, text embeddings give us a more flexible way to represent text that can account for nuanced aspects of meaning and context, so that sentences about the same general idea are "close" in the embedding space even if they share none of the exact same terms. Feeding these representations into a machine learning model, instead of a simple bag of words, can allow us to make more effective use of the same data.
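For contrast, here is how a plain bag-of-words representation scores the same Wikipedia sentences. This is a minimal sketch with a simple regex tokenizer (an assumption, not the preprocessing used in the earlier class): the two cactus sentences share no words, so their bag-of-words cosine similarity is exactly zero, and the two *Citizen Kane* sentences are related only through the word "as". A bag-of-words model has no way to see that "cactus" and "prickly pears" are about the same topic.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Lowercase the sentence and count alphanumeric tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def bow_cosine(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    ca, cb = bag_of_words(a), bag_of_words(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)

cacti = ("The rattail cactus is native to Mexico.",
         "Prickly pears are frequently found around California.")
kane = ("Citizen Kane is often cited as the greatest film ever.",
        "Hollywood had shown interest in Welles as early as 1936.")

print(bow_cosine(*cacti))           # 0.0: no shared words at all
print(round(bow_cosine(*kane), 3))  # 0.183: only "as" is shared
```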

## Fitting the model

Now, let's fit a model that uses the embedding model to predict the sentiment of movie reviews. We'll use the embedding layer as our input layer, followed by two hidden layers and a sigmoid output layer that returns the predicted probability that a review is positive (values near 0 indicate a negative review):

In [791]:
model = Sequential([
   
    embed_layer_wrapper,
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
    ])

model.summary()
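The number of trainable parameters in each Dense layer can be worked out by hand: one weight for every input-output pair plus one bias per output unit. The sketch below assumes, as described above, that the embedding layer produces 512 numbers per review. It also runs a forward pass with random numpy weights, purely to illustrate what each layer computes; it is not the trained Keras model.

```python
import numpy as np

def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer sizes from the Sequential model above: 512-d embedding -> 64 -> 32 -> 1
sizes = [512, 64, 32, 1]
params = [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
print(params, sum(params))  # [32832, 2080, 33] 34945

# A forward pass with random (untrained) weights, to show what each layer computes
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 512))  # stand-in for the embeddings of 3 reviews
for i, (a, b) in enumerate(zip(sizes[:-1], sizes[1:])):
    W, bias = rng.normal(scale=0.05, size=(a, b)), np.zeros(b)
    z = x @ W + bias
    # ReLU on the two hidden layers, sigmoid on the output layer
    x = np.maximum(z, 0) if i < len(sizes) - 2 else 1 / (1 + np.exp(-z))
print(x.shape)  # (3, 1): one probability per review
```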

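After that, we compile our model and train it for 15 epochs, in batches of 500 reviews. Because we pass the test set as `validation_data`, Keras also reports the model's accuracy on the held-out reviews after each epoch.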

In [None]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])


history = model.fit(train_tensor, 
                    train_labels,
                    epochs=15,
                    batch_size=500,
                    validation_data=(test_tensor, 
                                     test_labels),
                    verbose=1
                   )
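`binary_crossentropy` is the loss that training minimizes: for each review it is -[y log p + (1 - y) log(1 - p)], where y is the true label and p the predicted probability of a positive review, averaged over the batch. A small numpy sketch with made-up probabilities (Keras also clips probabilities away from 0 and 1; treat the exact `eps` value here as an assumption). The last line shows why each epoch in the training log has 8 steps.

```python
import math
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# Hypothetical labels and predicted probabilities for four reviews
y_true = [1, 0, 1, 0]
print(round(binary_crossentropy(y_true, [0.9, 0.1, 0.8, 0.3]), 4))  # confident and right: low loss
print(round(binary_crossentropy(y_true, [0.5, 0.5, 0.5, 0.5]), 4))  # always 0.5: ln 2, about 0.6931

# With 4000 training reviews and batch_size=500, each epoch is 8 gradient updates
print(math.ceil(4000 / 500))  # 8
```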

Now we can generate some predictions and look at our results:

In [None]:
# generate predictions from test data: a predicted probability >= .5 counts as positive (1)
preds = (model.predict(test_tensor).flatten() >= .5).astype(int)


<h3 style="color:red;">Q1: Create a confusion matrix and classification report from the predictions and assess the quality of the classifier</h3>

In [None]:
# create a confusion matrix (preds already holds thresholded predictions)
cmat = pd.crosstab(np.asarray(test_labels), preds, margins=True).rename_axis(index='Truth', columns='Predictions')
print(cmat)

# evaluate the performance on the held out data
print(classification_report(test_labels, preds,
                            # add target_names to show labels in the report:
                            target_names=['Negative', 'Positive']))


# add cohen's kappa and balanced accuracy
print("cohens kappa: ", cohen_kappa_score(test_labels, preds))
print("balanced accuracy: ", balanced_accuracy_score(test_labels, preds))
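On a balanced sample like this one, accuracy and balanced accuracy will be nearly identical; the extra metrics matter more when one class dominates. The sketch below computes all three by hand from confusion-matrix counts. The counts describe a hypothetical, imbalanced test set (90 negative and 10 positive reviews), not results from this notebook: accuracy looks respectable, but balanced accuracy and Cohen's kappa reveal that the model finds only half of the positive reviews.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Accuracy, balanced accuracy and Cohen's kappa from binary confusion-matrix counts."""
    n = tn + fp + fn + tp
    accuracy = (tn + tp) / n
    # balanced accuracy: the average of the recall for each class
    balanced = (tn / (tn + fp) + tp / (tp + fn)) / 2
    # kappa compares observed agreement with the agreement expected by chance,
    # given how often each class occurs in the truth and in the predictions
    p_chance = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, balanced, kappa

# Hypothetical, imbalanced test set: 90 negative and 10 positive reviews
acc, bal, kap = metrics_from_confusion(tn=80, fp=10, fn=5, tp=5)
print(f"accuracy={acc:.3f} balanced={bal:.3f} kappa={kap:.3f}")
# accuracy=0.850 balanced=0.694 kappa=0.318
```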

How does this do? Does it outperform the naive bayes classifier? Why might this be? 

## Changes to the Model

We can make changes to the model, such as adding more layers, using more nodes, training it for longer, or even using a different kind of model. This is part of the overall process of finding the model with the best performance. In practice, we would repeat these steps many, many times, tuning the model until it is as good as we can make it.

The full IMDB reviews corpus is also much larger than the sample we've been using here, so for a real-world application we would want to train on all of it. Since that takes a while, we can instead use a pre-made model that was trained on this data set to get a sense of how well we could do with more data and more fine-tuning.


<h3 style="color:red;">Q2: Change something about the model above and compare your results. </h3>

(Some options are: add an additional hidden layer, run the same model for more epochs, add more nodes to one or more of the layers, or add <a href='https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout'>dropout</a>)
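If you try dropout, note that the notebook already imports `Dropout`; one way to use it is to place a layer such as `Dropout(0.5)` after a hidden layer like `Dense(64, activation='relu')` (the rate 0.5 is only an illustrative choice). Per the Keras documentation, the layer randomly sets inputs to 0 with frequency `rate` during training and scales the remaining inputs by 1 / (1 - rate), so the expected total is unchanged; it does nothing at prediction time. A minimal numpy sketch of that training-time behavior:

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero each unit with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1 - rate), 0.0)

rng = np.random.default_rng(42)
activations = np.ones((1000, 64))  # stand-in hidden-layer outputs, all equal to 1
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # roughly half the units are zeroed
print(dropped.mean())         # but the average activation stays close to 1.0
```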

### Pre-built models from Hugging Face

The [Hugging Face Hub](https://huggingface.co/models) hosts many models that have been pre-trained for you to use. One of them is a <a href='https://huggingface.co/arnabdhar/tinybert-imdb'>small (TinyBERT) sentiment classifier that was fine-tuned on the IMDB corpus</a>. We can load this model and see how it performs on our held-out data.


In [None]:
tiny_bert = pipeline("text-classification", "arnabdhar/tinybert-imdb")


In [501]:
# converting to a list, since that's the input format this model uses
test_list = test_examples['text'].tolist()

# applying the model (reviews longer than 512 tokens are truncated)
results = tiny_bert(test_list, max_length=512, truncation=True)
# looking at the first five results
results[:5]

In [499]:
# reformatting to match our original test labels 
tiny_bert_preds = [int(i['label']=="POSITIVE") for i in results]
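Each element of `results` is a dictionary holding the predicted label and a score. The sketch below uses a few made-up results in that format and assumes, as the comparison with `"POSITIVE"` above implies, that the model's only two labels are `NEGATIVE` and `POSITIVE`. Under that assumption the score is the probability of the chosen label, so it can be turned into a probability of a positive review if you want to pick your own threshold.

```python
# Hypothetical pipeline output: one dict per input text, with the predicted label and its score
results_example = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.991},
    {"label": "POSITIVE", "score": 0.612},
]

# Same conversion as above: 1 for POSITIVE, 0 for anything else
preds_example = [int(r["label"] == "POSITIVE") for r in results_example]

# Assuming exactly two labels, P(positive) is the score for POSITIVE, else 1 - score
p_positive = [r["score"] if r["label"] == "POSITIVE" else 1 - r["score"] for r in results_example]

print(preds_example)  # [1, 0, 1]
```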

In [528]:
print(classification_report(test_labels, tiny_bert_preds, 
                            # add target_names to show labels in the report:
                              target_names=['Negative', 'Positive']))


# add cohen's kappa and balanced accuracy
print("cohens kappa: ", cohen_kappa_score(test_labels, tiny_bert_preds))
print("balanced accuracy: ", balanced_accuracy_score(test_labels, tiny_bert_preds))

              precision    recall  f1-score   support

    Negative       0.94      0.95      0.94       499
    Positive       0.95      0.94      0.94       501

    accuracy                           0.94      1000
   macro avg       0.94      0.94      0.94      1000
weighted avg       0.94      0.94      0.94      1000

cohens kappa:  0.8860018239708165
balanced accuracy:  0.9430097720390882


## Other Types of Sentiment

Another nice thing about the Hugging Face Hub is that it also hosts models pre-trained for other types of sentiment analysis. For example, let's take the `distilbert-base-uncased-emotion` model, which provides scores for emotions such as joy or anger. Here's an example of getting the emotions expressed in the first 100 reviews of the test set:

In [763]:
classifier = pipeline("text-classification",
                      model='bhadresh-savani/distilbert-base-uncased-emotion', 
                      top_k=None)


All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


In [764]:
prediction = classifier(test_list[:100], truncation=True, max_length=120)


In [765]:
emotion_prediction = pd.concat([pd.DataFrame(i) for i in prediction])
emotion_prediction.groupby('label').agg({'score':['min','max','median','mean']})

| label | score (min) | score (max) | score (median) | score (mean) |
|-------|-------------|-------------|----------------|--------------|
| anger | 0.000115 | 0.99525 | 0.01312 | 0.187991 |
| fear | 8.8e-05 | 0.986734 | 0.002353 | 0.103024 |
| joy | 0.000325 | 0.99884 | 0.493655 | 0.513088 |
| love | 0.000161 | 0.644965 | 0.001724 | 0.012831 |
| sadness | 0.00019 | 0.998333 | 0.005088 | 0.141641 |
| surprise | 0.000139 | 0.980268 | 0.001508 | 0.041424 |
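With `top_k=None`, the pipeline returns, for every review, a list containing a score for each of the six emotion labels. Because this model assigns a single emotion per text, those scores come from a softmax and should add up to approximately 1 per review. The sketch below uses made-up scores for two reviews in that nested format; it also shows `assign(doc_index=i)` as an alternative way to record which review each row came from.

```python
import pandas as pd

# Hypothetical output for two reviews with top_k=None: one list of
# {label, score} dicts per review, covering every emotion label
prediction_example = [
    [{"label": "joy", "score": 0.90}, {"label": "anger", "score": 0.04},
     {"label": "sadness", "score": 0.03}, {"label": "fear", "score": 0.01},
     {"label": "love", "score": 0.01}, {"label": "surprise", "score": 0.01}],
    [{"label": "anger", "score": 0.70}, {"label": "sadness", "score": 0.20},
     {"label": "fear", "score": 0.05}, {"label": "joy", "score": 0.03},
     {"label": "surprise", "score": 0.01}, {"label": "love", "score": 0.01}],
]

# One row per (review, emotion) pair, tagged with the review's position
long_fmt = pd.concat(
    [pd.DataFrame(p).assign(doc_index=i) for i, p in enumerate(prediction_example)],
    ignore_index=True,
)

# Softmax scores: each review's six scores sum to (about) 1
print(long_fmt.groupby("doc_index")["score"].sum().round(6).tolist())  # [1.0, 1.0]
```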


In [766]:
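# tag each row with the review it came from (one row per emotion label, per review)
emotion_prediction['doc_index'] = np.repeat(np.arange(len(prediction)), [len(p) for p in prediction])
# reshape from long (one row per review-emotion pair) to wide (one row per review)
wide_fmt = emotion_prediction.reset_index().pivot(index='doc_index', columns='label', values='score')
wide_fmt['text'] = test_list[:100]
wide_fmt.head()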

| doc_index | anger | fear | joy | love | sadness | surprise | text |
|-----------|-------|------|-----|------|---------|----------|------|
| 0 | 0.000151 | 0.000128 | 0.998835 | 0.000374 | 0.00019 | 0.000321 | i first saw this movie at the sundance film fe... |
| 1 | 0.006406 | 0.003685 | 0.977704 | 0.000793 | 0.002732 | 0.00868 | The case history of 'Mulholland Dr.' is known:... |
| 2 | 0.182236 | 0.722556 | 0.071156 | 0.002334 | 0.018521 | 0.003197 | Despite having a very pretty leading lady (Ros... |
| 3 | 0.000267 | 0.000179 | 0.99862 | 0.000282 | 0.000375 | 0.000277 | ... than this ;-) What would happen if Terry G... |
| 4 | 0.962185 | 0.001332 | 0.003103 | 0.00067 | 0.032253 | 0.000457 | Critics are falling over themselves within the... |
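A common next step with a wide table like this is to label each review with its highest-scoring emotion. On the real `wide_fmt` you would first drop the `text` column (for example `wide_fmt.drop(columns='text')`). The sketch below uses a small, made-up table with the same layout:

```python
import pandas as pd

# Hypothetical wide-format scores (same layout as wide_fmt, without the text column)
wide_example = pd.DataFrame(
    {"anger": [0.01, 0.80, 0.10], "fear": [0.01, 0.05, 0.60],
     "joy": [0.95, 0.05, 0.20], "sadness": [0.03, 0.10, 0.10]},
    index=pd.Index([0, 1, 2], name="doc_index"),
)

# Label each review with its highest-scoring emotion, and keep that score
dominant = wide_example.idxmax(axis=1)
confidence = wide_example.max(axis=1)

print(dominant.tolist())                  # ['joy', 'anger', 'fear']
print(dominant.value_counts().to_dict())  # how often each emotion wins
```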


You can check out some other options on the Hugging Face <a href='https://huggingface.co/models'>models page</a>.