In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Emotion prediction with GoEmotions and PRADO



<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/models/blob/master/research/seq_flow_lite/demo/colab/emotion_colab.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/models/blob/master/research/seq_flow_lite/demo/colab/emotion_colab.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

In this tutorial, we work through training a neural emotion prediction model using the tensorflow-models pip package and Bazel.

This tutorial uses GoEmotions, an emotion prediction dataset available on [TensorFlow TFDS](https://www.tensorflow.org/datasets/catalog/goemotions). We train a sequence projection model architecture named PRADO, available in the [TensorFlow Model Garden](https://github.com/tensorflow/models/blob/master/research/seq_flow_lite/models/prado.py). Finally, we examine an application of emotion prediction: suggesting emojis for input text.

## Setup

### Install the TensorFlow Datasets pip package

`tfds-nightly` is the nightly TensorFlow Datasets package, built automatically each day. We install it with pip.

In [1]:
!pip install tfds-nightly



### Install the Sequence Projection Models package

Install Bazel: This will allow us to build custom TensorFlow ops used by the PRADO architecture.

In [3]:
!sudo apt install curl gnupg
!curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
!echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
!sudo apt update
!sudo apt install bazel

Reading package lists... Done
Building dependency tree       
Reading state information... Done
curl is already the newest version (7.58.0-2ubuntu3.16).
gnupg is already the newest version (2.2.4-1ubuntu1.4).
The following packages were automatically installed and are no longer required:
  cuda-command-line-tools-10-0 cuda-command-line-tools-10-1
  cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1
  cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1
  cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0
  cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1
  cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1
  cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1
  cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0
  cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0
  cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0
  cuda-nsig

Install the library:
* `seq_flow_lite` includes the PRADO architecture and custom ops.
* We download the code from GitHub, and then build and install the TF and TFLite ops used by the model.


In [14]:
# Only run this cell if you are rerunning the cells below.
# !rm -rf tensorflow/models
# !rm -rf models

In [15]:
!git clone https://www.github.com/tensorflow/models
!models/research/seq_flow_lite/demo/colab/setup_workspace.sh
!pip install models/research/seq_flow_lite
!rm -rf models/research/seq_flow_lite/tf_ops
!rm -rf models/research/seq_flow_lite/tflite_ops

Cloning into 'models'...
remote: Enumerating objects: 69183, done.
remote: Total 69183 (delta 0), reused 0 (delta 0), pack-reused 69183
Receiving objects: 100% (69183/69183), 577.30 MiB | 24.38 MiB/s, done.
Resolving deltas: 100% (48768/48768), done.
Processing ./models/research/seq_flow_lite
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Building wheels for collected packages: seq-flow-lite
  Building wheel for seq-flow-lite (setup.py) ... done
  Created wheel for seq-flow-lite: filename=seq_flow_lite-0.1-py3-none-any.whl size=772854 sha256=a8f17eca187ba13c9155984f91e641717705c55a7cbb62b4d0d47050f475fa1b
  Stored in

## Training an Emotion Prediction Model

* First, we load the GoEmotions data from TFDS.
* Next, we prepare the PRADO model for training. We set up the model configuration, including hyperparameters and labels. We also prepare the dataset, which involves projecting the inputs from the dataset and passing the projections to the model. This is needed because a model training on TPU cannot handle string inputs.
* Finally, we train and evaluate the model and produce model-level and per-label metrics.

***Start here on Runtime reset***, once the packages above are properly installed:
* Go to the `seq_flow_lite` directory.

In [16]:
%cd models/research/seq_flow_lite

/content/models/research/seq_flow_lite/models/research/seq_flow_lite


* Import the TensorFlow and TensorFlow Datasets libraries.

In [17]:
import tensorflow as tf
import tensorflow_datasets as tfds

### The data: GoEmotions
In this tutorial, we use the [GoEmotions dataset from TFDS](https://www.tensorflow.org/datasets/catalog/goemotions).

GoEmotions is a corpus of comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.

*   Number of emotion labels: 27, plus Neutral.
*   Size of training dataset: 43,410.
*   Size of evaluation dataset: 5,427.
*   Maximum sequence length in training and evaluation datasets: 30.

The emotion categories are admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.


Load the data from TFDS:

In [18]:
ds = tfds.load('goemotions', split='train')

Downloading and preparing dataset 4.19 MiB (download: 4.19 MiB, generated: 32.25 MiB, total: 36.44 MiB) to ~/tensorflow_datasets/goemotions/0.1.0...


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Generating splits...:   0%|          | 0/3 [00:00<?, ? splits/s]

Generating train examples...:   0%|          | 0/43410 [00:00<?, ? examples/s]

Shuffling ~/tensorflow_datasets/goemotions/0.1.0.incompleteNA8KDZ/goemotions-train.tfrecord*...:   0%|        …

Generating validation examples...:   0%|          | 0/5426 [00:00<?, ? examples/s]

Shuffling ~/tensorflow_datasets/goemotions/0.1.0.incompleteNA8KDZ/goemotions-validation.tfrecord*...:   0%|   …

Generating test examples...:   0%|          | 0/5427 [00:00<?, ? examples/s]

Shuffling ~/tensorflow_datasets/goemotions/0.1.0.incompleteNA8KDZ/goemotions-test.tfrecord*...:   0%|         …

Dataset goemotions downloaded and prepared to ~/tensorflow_datasets/goemotions/0.1.0. Subsequent calls will reuse this data.


Print 5 sample data elements from the dataset:

In [19]:
for element in ds.take(5):
  print(element)

{'admiration': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'amusement': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'anger': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'annoyance': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'approval': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'caring': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'comment_text': <tf.Tensor: shape=(), dtype=string, numpy=b"It's just wholesome content, from questionable sources">, 'confusion': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'curiosity': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'desire': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disappointment': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disapproval': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disgust': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'embarrassment': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'excitement': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'fear': <tf.Tensor: shape=(

### The model: PRADO

We train an Emotion Prediction model, based on the [PRADO architecture](https://github.com/tensorflow/models/blob/master/research/seq_flow_lite/models/prado.py) from the [Sequence Projection Models package](https://github.com/tensorflow/models/tree/master/research/seq_flow_lite).

PRADO projects input sequences to fixed-size features. The idea behind this approach is to build embedding-free models that minimize model size. Instead of using an embedding table to look up embeddings, sequence projection models compute them on the fly, resulting in space-efficient models.
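
To make the idea of computing features on the fly more concrete, here is a minimal sketch in plain Python. It is not the `seq_flow_lite` projection op: the `toy_projection` and `project_text` helpers, the MD5-based hash, and the modulo-3 folding are all assumptions chosen for illustration. Each token is hashed into a fixed-length vector with values in {-1, 0, 1}, so no embedding table has to be stored.

```python
import hashlib

def toy_projection(token, feature_size=8):
    """Maps a token to a fixed-length ternary vector (hypothetical helper)."""
    features = []
    for i in range(feature_size):
        # Hash the token together with the feature index so that every
        # position gets its own pseudo-random but deterministic value.
        digest = hashlib.md5('{}:{}'.format(i, token).encode('utf-8')).digest()
        # Fold the first byte of the digest into one of -1, 0, or 1.
        features.append(digest[0] % 3 - 1)
    return features

def project_text(text, feature_size=8):
    """Projects each whitespace-separated token; no lookup table is needed."""
    return [toy_projection(token, feature_size) for token in text.split()]

projection = project_text('Good for you!')
print(len(projection), len(projection[0]))
```

The same token always maps to the same vector, which is what lets a projection stand in for a learned embedding lookup.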

In this section, we prepare the PRADO model for training.

The GoEmotions dataset cannot be fed directly into the PRADO model, so below we also handle the necessary preprocessing by providing a dataset builder.

Prepare the model configuration:
* Enumerate the labels expected to be found in the GoEmotions dataset.
* Prepare the `MODEL_CONFIG` dictionary which includes training parameters for the model. See sample configs for the PRADO model [here](https://github.com/tensorflow/models/tree/master/research/seq_flow_lite/configs).

In [20]:
LABELS = [
    'admiration',
    'amusement',
    'anger',
    'annoyance',
    'approval',
    'caring',
    'confusion',
    'curiosity',
    'desire',
    'disappointment',
    'disapproval',
    'disgust',
    'embarrassment',
    'excitement',
    'fear',
    'gratitude',
    'grief',
    'joy',
    'love',
    'nervousness',
    'optimism',
    'pride',
    'realization',
    'relief',
    'remorse',
    'sadness',
    'surprise',
    'neutral',
]

# Model training parameters.
CONFIG = {
    'name': 'models.prado',
    'batch_size': 1024,
    'train_steps': 10000,
    'learning_rate': 0.0006,
    'learning_rate_decay_steps': 340,
    'learning_rate_decay_rate': 0.7,
}

# Limits the amount of logging output produced by the training run, in order to
# avoid browser slowdowns.
CONFIG['save_checkpoints_steps'] = int(CONFIG['train_steps'] / 10)

MODEL_CONFIG = {
    'labels': LABELS,
    'multilabel': True,
    'quantize': False,
    'max_seq_len': 128,
    'max_seq_len_inference': 128,
    'exclude_nonalphaspace_unicodes': False,
    'split_on_space': True,
    'embedding_regularizer_scale': 0.035,
    'embedding_size': 64,
    'bigram_channels': 64,
    'trigram_channels': 64,
    'feature_size': 512,
    'network_regularizer_scale': 0.0001,
    'keep_prob': 0.5,
    'word_novelty_bits': 0,
    'doc_size_levels': 0,
    'add_bos_tag': False,
    'add_eos_tag': False,
    'pre_logits_fc_layers': [],
    'text_distortion_probability': 0.0,
}

CONFIG['model_config'] = MODEL_CONFIG

Write a function that builds the datasets for the model.  It will load the data, handle batching, and generate projections for the input text.

In [21]:
from layers import base_layers
from layers import projection_layers

def build_dataset(mode, inspect=False):
  if mode == base_layers.TRAIN:
    split = 'train'
    count = None
  elif mode == base_layers.EVAL:
    split = 'test'
    count = 1
  else:
    raise ValueError('mode={}, must be TRAIN or EVAL'.format(mode))

  batch_size = CONFIG['batch_size']
  if inspect:
    batch_size = 1

  # Convert examples from their dataset format into the model format.
  def process_input(features):
    # Generate the projection for each comment_text input.  The final tensor 
    # will have the shape [batch_size, number of tokens, feature size].
    # Additionally, we generate a tensor containing the number of tokens for
    # each comment_text (seq_length).  This is needed because the projection
    # tensor is a full tensor, and we are not using EOS tokens.
    text = features['comment_text']
    text = tf.reshape(text, [batch_size])
    projection_layer = projection_layers.ProjectionLayer(MODEL_CONFIG, mode)
    projection, seq_length = projection_layer(text)

    # Convert the labels into an indicator tensor, using the LABELS indices.
    label = tf.stack([features[label] for label in LABELS], axis=-1)
    label = tf.cast(label, tf.float32)
    label = tf.reshape(label, [batch_size, len(LABELS)])

    model_features = ({'projection': projection, 'sequence_length': seq_length}, label)

    if inspect:
      model_features = (model_features[0], model_features[1], features)

    return model_features

  ds = tfds.load('goemotions', split=split)
  ds = ds.repeat(count=count)
  ds = ds.shuffle(buffer_size=batch_size * 2)
  ds = ds.batch(batch_size, drop_remainder=True)
  ds = ds.map(process_input,
              num_parallel_calls=tf.data.experimental.AUTOTUNE,
              deterministic=False)
  ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
  return ds

train_dataset = build_dataset(base_layers.TRAIN)
test_dataset = build_dataset(base_layers.EVAL)
inspect_dataset = build_dataset(base_layers.TRAIN, inspect=True)
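
Because `build_dataset` batches with `drop_remainder=True`, only full batches reach the model. The arithmetic below is a small sketch of what that implies, using the split sizes listed earlier and the values in `CONFIG`. The constant names are introduced here for illustration only.

```python
TRAIN_EXAMPLES = 43410   # size of the 'train' split
TEST_EXAMPLES = 5427     # size of the 'test' split
BATCH_SIZE = 1024        # CONFIG['batch_size']
TRAIN_STEPS = 10000      # CONFIG['train_steps']

# The evaluation dataset is repeated only once, so drop_remainder discards
# the final partial batch.
eval_batches = TEST_EXAMPLES // BATCH_SIZE
eval_dropped = TEST_EXAMPLES % BATCH_SIZE

# The training dataset repeats indefinitely before batching, so batches run
# across pass boundaries; this is the approximate number of passes over the
# training data made during the whole run.
train_passes = TRAIN_STEPS * BATCH_SIZE / TRAIN_EXAMPLES

print(eval_batches, eval_dropped, round(train_passes, 1))
```

With these settings, each evaluation pass covers 5 full batches and leaves out 307 test examples. A smaller evaluation batch size would reduce that gap.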

Print a batch of examples in model format.  This will consist of:
* the projection tensors (projection and seq_length)
* the label tensor (second tuple value)

The projection tensor is a **[batch_size, max_seq_len, feature_size]** floating point tensor.  The **[b, i]** vector is the feature vector of the **i**th token of the **b**th comment_text.  The rest of the tensor is zero-padded, and the seq_length tensor indicates the number of feature vectors for each comment_text.

The label tensor is an indicator tensor of the set of true labels for the example.
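
As a minimal sketch of what an indicator (multi-hot) label looks like, the snippet below converts one example's boolean label fields into a list of 0.0/1.0 values in label order. The shortened `SAMPLE_LABELS` list, the `to_indicator` helper, and the sample example are made up for illustration; the dataset builder above does the equivalent with `tf.stack` and `tf.cast`.

```python
# A shortened label list, for illustration only.
SAMPLE_LABELS = ['admiration', 'amusement', 'anger', 'neutral']

def to_indicator(example, labels):
    """Returns 1.0 for every label set on the example and 0.0 otherwise."""
    return [1.0 if example[label] else 0.0 for label in labels]

sample_example = {
    'admiration': True,
    'amusement': False,
    'anger': False,
    'neutral': False,
    'comment_text': b'Good for you!',
}
indicator = to_indicator(sample_example, SAMPLE_LABELS)
print(indicator)
```

Since an example may carry several emotions at once, more than one position can be 1.0.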

In [22]:
example = next(iter(train_dataset))
print("inputs = {}".format(example[0]))
print("labels = {}".format(example[1]))

inputs = {'projection': <tf.Tensor: shape=(1024, 128, 512), dtype=float32, numpy=
array([[[ 0.,  1.,  0., ..., -1.,  0.,  0.],
        [ 1.,  1.,  0., ...,  0., -1.,  1.],
        [ 1.,  1.,  0., ...,  1.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       [[ 1.,  0.,  0., ..., -1.,  1.,  0.],
        [-1., -1., -1., ...,  1., -1.,  0.],
        [ 0., -1., -1., ...,  0.,  0.,  1.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       [[ 0.,  1.,  0., ...,  0.,  0.,  0.],
        [ 0., -1.,  0., ..., -1.,  0.,  1.],
        [-1.,  1.,  0., ...,  0.,  1.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

       ...,

       [[-1., -1.,  0., ...,  0., -1., -1.],
     

In the `inspect_dataset` version of the dataset (built with `inspect=True`), the original example is added as the third element of the tuple.

In [23]:
example = next(iter(inspect_dataset))
print("inputs = {}".format(example[0]))
print("labels = {}".format(example[1]))
print("original example = {}".format(example[2]))

inputs = {'projection': <tf.Tensor: shape=(1, 128, 512), dtype=float32, numpy=
array([[[ 1., -1.,  1., ...,  1.,  0., -1.],
        [ 0.,  1., -1., ..., -1., -1.,  0.],
        [-1.,  1.,  0., ..., -1., -1.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]]], dtype=float32)>, 'sequence_length': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([4.], dtype=float32)>}
labels = [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]]
original example = {'admiration': <tf.Tensor: shape=(1,), dtype=bool, numpy=array([ True])>, 'amusement': <tf.Tensor: shape=(1,), dtype=bool, numpy=array([False])>, 'anger': <tf.Tensor: shape=(1,), dtype=bool, numpy=array([False])>, 'annoyance': <tf.Tensor: shape=(1,), dtype=bool, numpy=array([False])>, 'approval': <tf.Tensor: shape=(1,), dtype=bool, numpy=array([False])>, 'caring': <tf.Tensor: shape=(1,), dtype=bool, numpy

### Train and Evaluate

First we define a function to build the model.  We vary the model inputs depending on task.  For training and evaluation, we'll take the projection and sequence length as inputs.  Otherwise, we'll take strings as inputs.

In [24]:
from models import prado

def build_model(mode):
  # First we define our inputs.
  inputs = []
  if mode == base_layers.TRAIN or mode == base_layers.EVAL:
    # For TRAIN and EVAL, we'll be getting dataset examples,
    # so we'll get projections and sequence_lengths.
    projection = tf.keras.Input(
        shape=(MODEL_CONFIG['max_seq_len'], MODEL_CONFIG['feature_size']),
        name='projection',
        dtype='float32')

    sequence_length = tf.keras.Input(
        shape=(), name='sequence_length', dtype='float32')
    inputs = [projection, sequence_length]
  else:
    # Otherwise, we get string inputs which we need to project.
    # (Named text_input to avoid shadowing Python's built-in input().)
    text_input = tf.keras.Input(shape=(), name='input', dtype='string')
    projection_layer = projection_layers.ProjectionLayer(MODEL_CONFIG, mode)
    projection, sequence_length = projection_layer(text_input)
    inputs = [text_input]

  # Next we add the model layer.
  model_layer = prado.Encoder(MODEL_CONFIG, mode)
  logits = model_layer(projection, sequence_length)

  # Finally we add an activation layer.
  if MODEL_CONFIG['multilabel']:
    activation = tf.keras.layers.Activation('sigmoid', name='predictions')
  else:
    activation = tf.keras.layers.Activation('softmax', name='predictions')
  predictions = activation(logits)

  model = tf.keras.Model(
      inputs=inputs,
      outputs=[predictions])
  
  return model
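
The choice between `sigmoid` and `softmax` in `build_model` matters for multilabel prediction. The plain-Python sketch below, with made-up logits and helper functions written only for this illustration, shows the difference: sigmoid scores each label independently, while softmax forces the scores to compete and sum to 1.

```python
import math

def sigmoid(logits):
    """Scores each label independently; the results need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Normalizes across labels so that the results sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

sample_logits = [2.0, 1.5, -3.0]  # made-up values for three labels
multilabel_scores = sigmoid(sample_logits)
single_label_scores = softmax(sample_logits)
print(multilabel_scores)
print(single_label_scores)
```

With sigmoid, the first two labels can both score above 0.5, which suits a dataset where one comment may express several emotions.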


Train the model:

In [25]:
# Remove any previous training data.
!rm -rf model

model = build_model(base_layers.TRAIN)

# Create the optimizer.
learning_rate = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=CONFIG['learning_rate'],
    decay_rate=CONFIG['learning_rate_decay_rate'],
    decay_steps=CONFIG['learning_rate_decay_steps'],
    staircase=True)

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Define the loss function.
loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)

model.compile(optimizer=optimizer, loss=loss)

epochs = int(CONFIG['train_steps'] / CONFIG['save_checkpoints_steps'])
model.fit(
    x=train_dataset,
    epochs=epochs,
    validation_data=test_dataset,
    steps_per_epoch=CONFIG['save_checkpoints_steps'])

model.save_weights('model/model_checkpoint')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
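
The training cell uses `ExponentialDecay` with `staircase=True`, which multiplies the learning rate by the decay rate once every `decay_steps` steps rather than continuously. A minimal sketch of that formula, using the values from `CONFIG` as defaults (the `staircase_decay` helper is written here for illustration):

```python
def staircase_decay(step, initial_rate=0.0006, decay_rate=0.7, decay_steps=340):
    """Learning rate at a given step under staircase exponential decay."""
    # Integer division makes the rate drop once every decay_steps steps.
    return initial_rate * decay_rate ** (step // decay_steps)

for step in (0, 339, 340, 680):
    print(step, staircase_decay(step))
```

The rate stays at its initial value for the first 340 steps, then drops by a factor of 0.7 at each subsequent multiple of 340.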


Load a training checkpoint and evaluate:

In [26]:
model = build_model(base_layers.EVAL)

# Define metrics over each category.
metrics = []
for i, label in enumerate(LABELS):
  metric = tf.keras.metrics.Precision(
      thresholds=[0.5],
      class_id=i,
      name='precision@0.5/{}'.format(label))
  metrics.append(metric)
  metric = tf.keras.metrics.Recall(
      thresholds=[0.5],
      class_id=i,
      name='recall@0.5/{}'.format(label))
  metrics.append(metric)

# Define metrics over the entire task.
metric = tf.keras.metrics.Precision(thresholds=[0.5], name='precision@0.5/all')
metrics.append(metric)
metric = tf.keras.metrics.Recall(thresholds=[0.5], name='recall@0.5/all')
metrics.append(metric)

model.compile(metrics=metrics)
model.load_weights('model/model_checkpoint')
result = model.evaluate(x=test_dataset, return_dict=True)



Print evaluation metrics for the model as a whole and for each emotion label:

In [27]:
for label in LABELS:
  precision_key = 'precision@0.5/{}'.format(label)
  recall_key = 'recall@0.5/{}'.format(label)
  if precision_key in result and recall_key in result:
    print('{}: (precision@0.5: {}, recall@0.5: {})'.format(
        label, result[precision_key], result[recall_key]))
    
precision_key = 'precision@0.5/all'
recall_key = 'recall@0.5/all'
if precision_key in result and recall_key in result:
  print('all: (precision@0.5: {}, recall@0.5: {})'.format(
      result[precision_key], result[recall_key]))

admiration: (precision@0.5: 0.5968109369277954, recall@0.5: 0.5402061939239502)
amusement: (precision@0.5: 0.7255814075469971, recall@0.5: 0.6527196764945984)
anger: (precision@0.5: 0.43918919563293457, recall@0.5: 0.3513513505458832)
annoyance: (precision@0.5: 0.23076923191547394, recall@0.5: 0.0397351011633873)
approval: (precision@0.5: 1.0, recall@0.5: 0.0030120480805635452)
caring: (precision@0.5: 0.0, recall@0.5: 0.0)
confusion: (precision@0.5: 0.0, recall@0.5: 0.0)
curiosity: (precision@0.5: 0.43478259444236755, recall@0.5: 0.11278195679187775)
desire: (precision@0.5: 0.5128205418586731, recall@0.5: 0.25641027092933655)
disappointment: (precision@0.5: 0.0, recall@0.5: 0.0)
disapproval: (precision@0.5: 0.75, recall@0.5: 0.035999998450279236)
disgust: (precision@0.5: 0.0, recall@0.5: 0.0)
embarrassment: (precision@0.5: 0.0, recall@0.5: 0.0)
excitement: (precision@0.5: 0.2142857164144516, recall@0.5: 0.030612245202064514)
fear: (precision@0.5: 0.0, recall@0.5: 0.0)
gratitude: (preci
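
To help read these numbers, here is a minimal plain-Python sketch of precision and recall at a 0.5 threshold for a single label. The `precision_recall` helper and the sample data are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def precision_recall(scores, truths, threshold=0.5):
    """Computes precision and recall for one label at the given threshold."""
    predicted = [s > threshold for s in scores]
    true_pos = sum(1 for p, t in zip(predicted, truths) if p and t)
    predicted_pos = sum(predicted)
    actual_pos = sum(truths)
    # Report 0.0 when a denominator is zero (an assumption of this sketch).
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Made-up scores and ground truth for a single label.
sample_scores = [0.9, 0.2, 0.4, 0.1]
sample_truths = [True, True, True, False]
print(precision_recall(sample_scores, sample_truths))
```

In this example only one of three true cases clears the threshold, so precision is perfect while recall is low. This is the same pattern as the `approval` row above, where the model rarely predicts the label but is right when it does.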

## Suggest Emojis using an Emotion Prediction model

In this section, we apply the Emotion Prediction model trained above to suggest emojis relevant to input text.

Refer to our [GoEmotions Model Card](https://github.com/google-research/google-research/blob/master/goemotions/goemotions_model_card.pdf) for additional uses of the model and considerations and limitations for using the GoEmotions data.

Map each emotion label to a relevant emoji:
* Emotions are subtle and multi-faceted. In many cases, no one emoji can truly capture the full complexity of the human experience behind each emotion.
* For the purpose of this exercise, we will select an emoji that captures at least one facet that is conveyed by an emotion label.

In [28]:
EMOJI_MAP = {
    'admiration': 'admiration 👏',
    'amusement': 'amusement 😂',
    'anger': 'anger 😡',
    'annoyance': 'annoyance 😒',
    'approval': 'approval 👍',
    'caring': 'caring 🤗',
    'confusion': 'confusion 😕',
    'curiosity': 'curiosity 🤔',
    'desire': 'desire 😍',
    'disappointment': 'disappointment 😞',
    'disapproval': 'disapproval 👎',
    'disgust': 'disgust 🤮',
    'embarrassment': 'embarrassment 😳',
    'excitement': 'excitement 🤩',
    'fear': 'fear 😨',
    'gratitude': 'gratitude 🙏',
    'grief': 'grief 😢',
    'joy': 'joy 😃',
    'love': 'love ❤️',
    'nervousness': 'nervousness 😬',
    'optimism': 'optimism 🤞',
    'pride': 'pride 😌',
    'realization': 'realization 💡',
    'relief': 'relief 😅',
    'remorse': 'remorse',
    'sadness': 'sadness 😞',
    'surprise': 'surprise 😲',
    'neutral': 'neutral',
}

Select sample inputs:

In [29]:
# PREDICT_TEXT = [
#   b'Good for you!',
#   b'Happy birthday!',
#   b'I love you.',
# ]

In [38]:
from google.colab import files
data = files.upload()

Saving emotions.csv to emotions.csv


In [45]:
import pandas as pd
import io
 
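# Parse the uploaded CSV from the in-memory bytes returned by files.upload().
df = pd.read_csv(io.BytesIO(data['emotions.csv']))
# Rows whose 'user' is not 'test' (computed here but not used below).
df2 = df.loc[(df['user'] != 'test')]
# The text column to tag; note that it is taken from the full DataFrame.
df3 = df['string']
print(df3)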

0      Useful for them to hear about the therapy proc...
1      Before you get into this management lock becau...
2      I first picked but the next two picks that rea...
3      Okay, so we talked about April 2016, right and...
4      I'm good a lot of talent --C:['We have a lot o...
                             ...                        
174    Leaving the church a and working to help these...
175    You've also put down in summary in summary not...
176    That compares to only 42 members of the Donner...
177    We're leaving this, you know, this Palace way ...
178    I need to take up the governor and I'm now a t...
Name: string, Length: 179, dtype: object


In [41]:
inputlist = []
for string in df3:
    inputlist.append(string.partition('--C:[')[0])
print(inputlist)

["Useful for them to hear about the therapy process and then there's a part of it ", "Before you get into this management lock because it's it's great fun, but challenging the genuine reasons need to be there because I think what a lot of people don't realize is that want to step into management ", 'I first picked but the next two picks that really or maybe just the third pick I guess because nothing wrong with the second trick ', 'Okay, so we talked about April 2016, right and then start blogging about common consent ', "I'm good a lot of talent ", "Happily of haffley's probably that the wrong word ", "I've got like a list of them in front of me that I've written out and I'm trying to decide which five I think of the most common in that ", "You've also put down in summary in summary notes that ", 'It blows my mind ', 'A really did everything you needed to so your way in the clear ', "Listen, he's not always born like how he was when you came with who's super quiet, right ", 'Just let 
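
The cell above relies on `str.partition`, which splits a string at the first occurrence of a separator and returns the text before it, the separator itself, and the text after it. A minimal illustration follows; the `--C:[` marker is the one used above, while the sample strings are made up.

```python
sample = 'Good for you! --C:[extra context]'
head, separator, tail = sample.partition('--C:[')
print(repr(head))

# When the separator is absent, the whole string comes back as the head.
head_only, missing_sep, empty_tail = 'No marker here'.partition('--C:[')
print(repr(head_only), repr(missing_sep), repr(empty_tail))
```

Taking element `[0]` therefore keeps the full text for rows without the marker, and keeps the leading text, including any trailing space, for rows with it.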

Run inference for the selected examples:

In [65]:
import numpy as np

model = build_model(base_layers.PREDICT)
model.load_weights('model/model_checkpoint')

tags = []
scores = []

for text in inputlist:
  results = model.predict(x=[text])
  print('')
  print('{}:'.format(text))
  # Sort label indices by descending score and keep the top three.
  labels = np.flip(np.argsort(results[0]))
  for x in range(3):
    label = LABELS[labels[x]]
    # Fall back to the plain label name if no emoji mapping exists.
    label = EMOJI_MAP.get(label, label)
    print('{}: {}'.format(label, results[0][labels[x]]))
    tags.append(label)
    scores.append(results[0][labels[x]])


Useful for them to hear about the therapy process and then there's a part of it :
neutral: 0.5566583871841431
curiosity 🤔: 0.3684433400630951
approval 👍: 0.3668678402900696

Before you get into this management lock because it's it's great fun, but challenging the genuine reasons need to be there because I think what a lot of people don't realize is that want to step into management :
admiration 👏: 0.5674764513969421
annoyance 😒: 0.3789424002170563
anger 😡: 0.37685948610305786

I first picked but the next two picks that really or maybe just the third pick I guess because nothing wrong with the second trick :
neutral: 0.44603627920150757
approval 👍: 0.3805331885814667
confusion 😕: 0.3744082450866699

Okay, so we talked about April 2016, right and then start blogging about common consent :
neutral: 0.5551586151123047
annoyance 😒: 0.39184656739234924
approval 👍: 0.369082510471344

I'm good a lot of talent :
admiration 👏: 0.6463674306869507
sadness 😞: 0.3782077729701996
approval 👍: 0.37397
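
The top-three selection in the inference loop can be sketched without NumPy as a sort of label indices by descending score. The `top_k` helper, the shortened label list, and the scores below are made up for illustration.

```python
def top_k(scores, labels, k=3):
    """Returns the k (label, score) pairs with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]

demo_labels = ['admiration', 'amusement', 'anger', 'neutral']
demo_scores = [0.65, 0.10, 0.37, 0.38]  # made-up prediction scores
print(top_k(demo_scores, demo_labels))
```

Because the model is multilabel, the scores are independent per label; the top three are simply the highest ranked and need not each exceed 0.5.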

In [69]:
# Each input contributed three consecutive entries (its top-3 labels), so a
# stride-3 slice collects the k-th ranked tag and score for every input.
tag1, tag2, tag3 = tags[0::3], tags[1::3], tags[2::3]
score1, score2, score3 = scores[0::3], scores[1::3], scores[2::3]

In [78]:
finaldf = pd.DataFrame(list(zip(inputlist, df['id'], tag1, score1, tag2, score2, tag3, score3)),
               columns =['string', 'stringID', 'tag1', 'score1', 'tag2', 'score2', 'tag3', 'score3'])

finaldf

| | string | stringID | tag1 | score1 | tag2 | score2 | tag3 | score3 |
|---|---|---|---|---|---|---|---|---|
| 0 | Useful for them to hear about the therapy proc... | 70590 | neutral | 0.556658 | curiosity 🤔 | 0.368443 | approval 👍 | 0.366868 |
| 1 | Before you get into this management lock becau... | 113791 | admiration 👏 | 0.567476 | annoyance 😒 | 0.378942 | anger 😡 | 0.376859 |
| 2 | I first picked but the next two picks that rea... | 11843 | neutral | 0.446036 | approval 👍 | 0.380533 | confusion 😕 | 0.374408 |
| 3 | Okay, so we talked about April 2016, right and... | 55688 | neutral | 0.555159 | annoyance 😒 | 0.391847 | approval 👍 | 0.369083 |
| 4 | I'm good a lot of talent | 116973 | admiration 👏 | 0.646367 | sadness 😞 | 0.378208 | approval 👍 | 0.373977 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 174 | Leaving the church a and working to help these... | 39658 | neutral | 0.432387 | annoyance 😒 | 0.376229 | anger 😡 | 0.370550 |
| 175 | You've also put down in summary in summary not... | 16470 | neutral | 0.619111 | sadness 😞 | 0.368254 | pride 😌 | 0.362264 |
| 176 | That compares to only 42 members of the Donner... | 52979 | neutral | 0.593373 | annoyance 😒 | 0.373522 | curiosity 🤔 | 0.359699 |
| 177 | We're leaving this, you know, this Palace way ... | 17587 | neutral | 0.527474 | annoyance 😒 | 0.388831 | anger 😡 | 0.383078 |


In [81]:
finaldf.to_csv('model_tagged_strings.csv')
files.download('model_tagged_strings.csv')
