# 3. Mood Detection of Tweets - Word Embeddings and LSTM

Now that we have a cleaned dataset, we can train a neural network to classify a tweet's mood.

## Testing for CUDA-enabled TF with GPU Support

**What this means:**
Deep learning involves a large number of matrix calculations, which CPUs are not well suited to. Graphics cards (the hardware your computer uses to make games run well) perform these operations far more efficiently, so we first check whether the Jupyter Notebook has access to your (Nvidia) GPU in order to train the model much faster.

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import os

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="2, 3"

config=tf.ConfigProto(device_count={'GPU':  2}, intra_op_parallelism_threads=1, inter_op_parallelism_threads=1, allow_soft_placement=True)
config.gpu_options.allow_growth=True

sess=tf.Session(graph=tf.get_default_graph(),config=config)

W0507 14:45:44.569529 140372727723776 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14


In [2]:
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

Default GPU Device: /device:GPU:0


In [3]:
print(tf.test.is_built_with_cuda())
print(tf.keras.__version__)
print(tf.__version__)

True
2.1.6-tf
1.12.0


## Processing our mood tweets dataset

Dataset can also be accessed at: [future Kaggle link for my dataset] 


### Reading the CSV dataset with pandas

In [4]:
import pandas as pd

df = pd.read_csv("datasets/mood_tweets.csv")
df.drop(df.columns[0], inplace=True, axis=1)
df["tweet"] = df["tweet"].astype(str)
df.head()

Unnamed: 0,class,tweet
0,suicidal,"""He was lost & scared"" says a Newport woman ab..."
1,suicidal,@TheDumbMedico Ameen. Or ye suicide wali bat a...
2,suicidal,You showed the leak in @HouseofCommons today @...
3,suicidal,The number in my bio is for a Suicide hotline ...
4,suicidal,cw: suicidal ideation it's unsurprisingly hard...


In [5]:
df.describe()

Unnamed: 0,class,tweet
count,25758,25758
unique,6,25758
top,cheerful,@Kkeke_99 truth bc i didnt know about them unt...
freq,5885,1


Now let's look at how the dataset is laid out and roughly what the data looks like.

In [6]:
pd.set_option('display.max_colwidth', -1) # option to be set so that the tweet's texts won't be truncated
df.sort_values("class", inplace=True)
display(df.head())
display(df.tail())
df["class"].unique()

Unnamed: 0,class,tweet
18115,cheerful,You know what's awesome? After a month of doing yoga I'm getting most of my range of motion and flexibility back in the knee that I blew out playing rugby in college & university. I honestly wish I had started this years ago.
4418,cheerful,Just rolled over 10k awesome miles in @corbett and my @Tesla 3. @elonmusk one suggestion: can I plz haz headlights on by default? Every time I drive during the day they start off and I think they improve visibility for other drivers as I pass them.
4417,cheerful,Glad to see that Craig continues to keep the orientalism industry alive
4416,cheerful,@_Kazma8 No problem dude. Glad she was found
4415,cheerful,Sign up for this awesome contest!


Unnamed: 0,class,tweet
14794,suicidal,But why does my client commit suicide if I press the ranked ladder tab?
14793,suicidal,Ngl I don’t think I can do this education thing anymore. I’m moving to Naij this summer to sell pure water. I can’t come and kill myself abeg
14792,suicidal,The Island's suicide rate likely to rise following change in law.Read the next part in a special investigation into suicide on the Isle of Wight by @megbaynesLDR .#isleofwight
8106,suicidal,"@liangweihan4 my dad fucked up and married a white woman and had me. How do I keep the ""species"" strong? Do I date white women and marry out the the Asian or the other way around? Or should I just kill myself?"
0,suicidal,"""He was lost & scared"" says a Newport woman about a 14y/o boy on the street today. Teen tells police he's Timmothy Pitzen. Pitzen disappeared in 2011 after being taken by his mother. In her suicide note she wrote he was safe but would never be found. @Local12 #TimmothyPitzen"


array(['cheerful', 'depressed', 'happy', 'overjoyed', 'sad', 'suicidal'],
      dtype=object)

We will now convert the class names into integer labels (0 for suicidal up to 5 for overjoyed), since the model expects numeric labels rather than strings.

In [7]:
classes_index = {
    0 : "suicidal",
    1 : "depressed",
    2 : "sad",
    3 : "happy",
    4 : "cheerful",
    5 : "overjoyed",
}

for val, class_name in classes_index.items():
    df.loc[(df["class"] == class_name), "class"] = val

display(df.head())
display(df.tail())

Unnamed: 0,class,tweet
18115,4,You know what's awesome? After a month of doing yoga I'm getting most of my range of motion and flexibility back in the knee that I blew out playing rugby in college & university. I honestly wish I had started this years ago.
4418,4,Just rolled over 10k awesome miles in @corbett and my @Tesla 3. @elonmusk one suggestion: can I plz haz headlights on by default? Every time I drive during the day they start off and I think they improve visibility for other drivers as I pass them.
4417,4,Glad to see that Craig continues to keep the orientalism industry alive
4416,4,@_Kazma8 No problem dude. Glad she was found
4415,4,Sign up for this awesome contest!


Unnamed: 0,class,tweet
14794,0,But why does my client commit suicide if I press the ranked ladder tab?
14793,0,Ngl I don’t think I can do this education thing anymore. I’m moving to Naij this summer to sell pure water. I can’t come and kill myself abeg
14792,0,The Island's suicide rate likely to rise following change in law.Read the next part in a special investigation into suicide on the Isle of Wight by @megbaynesLDR .#isleofwight
8106,0,"@liangweihan4 my dad fucked up and married a white woman and had me. How do I keep the ""species"" strong? Do I date white women and marry out the the Asian or the other way around? Or should I just kill myself?"
0,0,"""He was lost & scared"" says a Newport woman about a 14y/o boy on the street today. Teen tells police he's Timmothy Pitzen. Pitzen disappeared in 2011 after being taken by his mother. In her suicide note she wrote he was safe but would never be found. @Local12 #TimmothyPitzen"
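The same relabelling can be expressed as a single lookup by inverting `classes_index`. The sketch below uses a plain Python list in place of the DataFrame column; `name_to_index` and `sample_classes` are hypothetical names introduced only for illustration.

```python
# Mapping copied from the cell above: index -> class name.
classes_index = {
    0: "suicidal",
    1: "depressed",
    2: "sad",
    3: "happy",
    4: "cheerful",
    5: "overjoyed",
}

# Invert it once so that each class name maps to its numeric label.
name_to_index = {name: idx for idx, name in classes_index.items()}

# Hypothetical stand-in for the values of df["class"].
sample_classes = ["cheerful", "suicidal", "sad"]
numeric_labels = [name_to_index[name] for name in sample_classes]
print(numeric_labels)  # [4, 0, 2]
```

With pandas, the equivalent one-liner would be `df["class"].map(name_to_index)`.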


In [8]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory in which to store our model output and checkpoints. This can be a local directory, in which case you'd set `OUTPUT_DIR` to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. This notebook only uses a local directory, so no bucket is configured in the cell below.

`DO_DELETE` is meant to control whether an existing `OUTPUT_DIR` is cleared and recreated. Otherwise, TensorFlow will load existing model checkpoints from that directory (if they exist). Note that the cell below defines the flag but only creates the directory when it is missing; it never deletes it.

In [9]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'bert_models'

# Whether or not to clear/delete the directory and create a new one
DO_DELETE = False

if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: bert_models *****
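The cell above never reads `DO_DELETE`. One possible way to honour the flag is sketched below; `prepare_output_dir` is a hypothetical helper, not part of the notebook or of the BERT library, and the demonstration runs inside a temporary directory so that no real checkpoints are touched.

```python
import os
import shutil
import tempfile

def prepare_output_dir(path, do_delete=False):
    """Create `path`; if `do_delete` is set, remove any existing contents first."""
    if do_delete and os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)
    return path

# Demonstrate inside a temporary directory so nothing real is removed.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "bert_models")
    prepare_output_dir(out_dir)
    # Simulate a leftover checkpoint from an earlier run.
    open(os.path.join(out_dir, "stale.ckpt"), "w").close()
    prepare_output_dir(out_dir, do_delete=True)
    remaining = os.listdir(out_dir)

print(remaining)  # []
```

Leaving `do_delete` at `False` keeps existing checkpoints, which matches the behaviour of the original cell.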


In [10]:
DATA_COLUMN = 'tweet'
LABEL_COLUMN = 'class'
# label_list is the list of labels, i.e. True, False or 0, 1 or 'dog', 'cat'
label_list = list(classes_index.keys())
label_list

[0, 1, 2, 3, 4, 5]

## Data Preprocessing

We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `tweet` column of our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, which in this case is one of the integer classes 0 to 5.

In [11]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
df_InputExamples = df.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

In [12]:
df_InputExamples.head()

18115    <bert.run_classifier.InputExample object at 0x7faa88168d30>
4418     <bert.run_classifier.InputExample object at 0x7faa88168f98>
4417     <bert.run_classifier.InputExample object at 0x7faa88168fd0>
4416     <bert.run_classifier.InputExample object at 0x7faa8817d048>
4415     <bert.run_classifier.InputExample object at 0x7faa8817d080>
dtype: object

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
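To make steps 1 to 3 concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting. It is not the library's implementation: `toy_vocab` is a made-up vocabulary, the helper names are hypothetical, and words are split on whitespace only, whereas the real tokenizer also separates punctuation.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

def tokenize(text, vocab):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens

# Hypothetical toy vocabulary; BERT's real vocab file has roughly 30k entries.
toy_vocab = {"sally", "says", "hi", "call", "##ing", "token", "##izer"}

print(tokenize("Sally says hi", toy_vocab))  # ['sally', 'says', 'hi']
print(tokenize("calling", toy_vocab))        # ['call', '##ing']
print(tokenize("qwerty", toy_vocab))         # ['[UNK]']
```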

To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:



In [13]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what is stored in `tokenization_info["do_lower_case"]`) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [14]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.
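Roughly speaking, each example ends up as three fixed-length integer lists: `input_ids`, `input_mask` and `segment_ids`. The sketch below illustrates that layout for a single-sentence example. `to_features` is a hypothetical helper rather than the library function, and `toy_vocab` contains only the handful of ids that appear in this notebook's log output for the tweet "Sign up for this awesome contest!".

```python
def to_features(tokens, vocab, max_seq_length):
    """Build input_ids, input_mask and segment_ids for one single-sentence example."""
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = tokens[: max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)    # 1 marks a real token
    segment_ids = [0] * len(input_ids)   # only one sentence, so all zeros

    # Zero-pad all three lists up to the fixed length.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding

toy_vocab = {
    "[CLS]": 101, "[SEP]": 102, "sign": 3696, "up": 2039, "for": 2005,
    "this": 2023, "awesome": 12476, "contest": 5049, "!": 999,
}
tokens = ["sign", "up", "for", "this", "awesome", "contest", "!"]

input_ids, input_mask, segment_ids = to_features(tokens, toy_vocab, max_seq_length=12)
print(input_ids)    # [101, 3696, 2039, 2005, 2023, 12476, 5049, 999, 102, 0, 0, 0]
print(input_mask)   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(segment_ids)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The notebook uses a length of 128 rather than 12, which is why the logged examples end in long runs of zeros.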

In [15]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
df_features = bert.run_classifier.convert_examples_to_features(df_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 25758


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] you know what ' s awesome ? after a month of doing yoga i ' m getting most of my range of motion and flexibility back in the knee that i blew out playing rugby in college & university . i honestly wish i had started this years ago . [SEP]


INFO:tensorflow:input_ids: 101 2017 2113 2054 1005 1055 12476 1029 2044 1037 3204 1997 2725 13272 1045 1005 1049 2893 2087 1997 2026 2846 1997 4367 1998 16991 2067 1999 1996 6181 2008 1045 8682 2041 2652 4043 1999 2267 1004 2118 1012 1045 9826 4299 1045 2018 2318 2023 2086 3283 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] just rolled over 10 ##k awesome miles in @ corbett and my @ tesla 3 . @ el ##on ##mus ##k one suggestion : can i pl ##z ha ##z headlights on by default ? every time i drive during the day they start off and i think they improve visibility for other drivers as i pass them . [SEP]


INFO:tensorflow:input_ids: 101 2074 4565 2058 2184 2243 12476 2661 1999 1030 24119 1998 2026 1030 26060 1017 1012 1030 3449 2239 7606 2243 2028 10293 1024 2064 1045 20228 2480 5292 2480 18167 2006 2011 12398 1029 2296 2051 1045 3298 2076 1996 2154 2027 2707 2125 1998 1045 2228 2027 5335 16476 2005 2060 6853 2004 1045 3413 2068 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] glad to see that craig continues to keep the oriental ##ism industry alive [SEP]


INFO:tensorflow:input_ids: 101 5580 2000 2156 2008 7010 4247 2000 2562 1996 11481 2964 3068 4142 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] @ _ ka ##z ##ma ##8 no problem dude . glad she was found [SEP]


INFO:tensorflow:input_ids: 101 1030 1035 10556 2480 2863 2620 2053 3291 12043 1012 5580 2016 2001 2179 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] sign up for this awesome contest ! [SEP]


INFO:tensorflow:input_ids: 101 3696 2039 2005 2023 12476 5049 999 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 4 (id = 4)


INFO:tensorflow:Writing example 10000 of 25758


INFO:tensorflow:Writing example 20000 of 25758


## Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT tf hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our mood classification task (i.e. assigning each tweet to one of the six mood classes). This strategy of starting from a pretrained model and training it further on our own data is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [16]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for the mood classification data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want the predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
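The loss at the end of `create_model` is ordinary softmax cross-entropy. As a rough illustration of that arithmetic only, here is a pure-Python sketch for one example; the logits are made-up numbers and the helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def per_example_loss(logits, label, num_labels):
    """Cross-entropy between a one-hot label and the softmax of the logits."""
    log_probs = log_softmax(logits)
    one_hot = [1.0 if i == label else 0.0 for i in range(num_labels)]
    return -sum(o * lp for o, lp in zip(one_hot, log_probs))

# Made-up scores for the six mood classes (indices 0 to 5).
logits = [0.1, -1.2, 0.3, 2.0, 0.5, -0.7]

log_probs = log_softmax(logits)
predicted = max(range(len(log_probs)), key=log_probs.__getitem__)  # argmax
loss = per_example_loss(logits, label=3, num_labels=6)

print(predicted)  # 3
print(loss)       # equals -log_probs[3]
```

Because the one-hot vector is zero everywhere except at the true class, the loss reduces to the negative log-probability of that class. Note that `create_model` returns log-probabilities, so exponentiating them is needed to recover probabilities.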

Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [17]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for tf.estimator.Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for tf.estimator.Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

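      # Calculate evaluation metrics.
      # Note: precision, recall and the true/false positive/negative counters
      # cast labels and predictions to booleans, so with six classes they only
      # distinguish class 0 from every other class.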
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        #f1_score = tf.contrib.metrics.f1_score(
        #    label_ids,
        #    predicted_labels)
        #auc = tf.metrics.auc(
        #    label_ids,
        #    predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            #"f1_score": f1_score,
            #"auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
          return tf.estimator.EstimatorSpec(mode=mode,
            loss=loss,
            eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn
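`tf.metrics.precision`, `tf.metrics.recall` and the true/false positive/negative counters are documented as casting their inputs to booleans, so with six integer labels they effectively compare class 0 against everything else. A per-class breakdown is usually more informative for a multi-class task. The following is a minimal sketch in plain Python, run on predictions after evaluation rather than inside the estimator; the function name and the toy label lists are hypothetical.

```python
def per_class_precision_recall(labels, predictions, num_classes):
    """Return {class_index: (precision, recall)} using one-vs-rest counts."""
    results = {}
    pairs = list(zip(labels, predictions))
    for c in range(num_classes):
        tp = sum(1 for y, p in pairs if y == c and p == c)
        fp = sum(1 for y, p in pairs if y != c and p == c)
        fn = sum(1 for y, p in pairs if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[c] = (precision, recall)
    return results

# Toy data with three classes, for illustration only.
labels = [0, 0, 1, 2, 2, 2]
predictions = [0, 1, 1, 2, 2, 0]

metrics = per_class_precision_recall(labels, predictions, num_classes=3)
for c, (precision, recall) in metrics.items():
    print(c, round(precision, 3), round(recall, 3))
# 0 0.5 0.5
# 1 0.5 1.0
# 2 1.0 0.667
```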


In [18]:
# Compute train and warmup steps from batch size
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where the learning rate
# is small and gradually increases--usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100
VALIDATION_SPLIT = 0.2
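
To make the warmup comment concrete, here is a minimal sketch of a schedule with linear warmup followed by linear decay to zero. As far as we know this is the shape used by the reference BERT optimizer, but the optimizer code is not shown in this section, so treat it as an assumption. The helper name `lr_at_step` and the step counts in the example (1000 total, 100 warmup) are made up for illustration.

```python
def lr_at_step(step, init_lr, num_train_steps, num_warmup_steps):
    """Learning rate at a given global step (hypothetical helper, not part of BERT)."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to init_lr
        return init_lr * step / num_warmup_steps
    # After warmup: decay linearly so the rate reaches 0 at the last step
    remaining = max(0.0, 1.0 - step / num_train_steps)
    return init_lr * remaining


# Illustrative values only: 1000 training steps, 100 of them warmup
for step in (0, 50, 100, 500, 1000):
    print(step, lr_at_step(step, 2e-5, 1000, 100))
```

With `WARMUP_PROPORTION = 0.1`, the first 10% of the steps are spent ramping up before the rate starts to fall.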

In [19]:
len(df_features)

25758

In [20]:
import random
# Shuffle the (sorted) features in place so that the split below is random
random.shuffle(df_features)

# Number of samples to hold out, based on VALIDATION_SPLIT
num_validation_samples = int(VALIDATION_SPLIT * len(df_features))
num_validation_samples

5151

In [21]:
test_features = df_features[:num_validation_samples]
train_features = df_features[num_validation_samples:]
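
The shuffle above is unseeded, so the split changes on every run. If a reproducible split is wanted, something like the following sketch could be used. The helper name `shuffle_and_split` and the seed value are assumptions, not part of the original notebook, and the dummy list only stands in for `df_features`.

```python
import random


def shuffle_and_split(features, validation_split, seed=42):
    """Return (held_out, train) after a seeded shuffle of a copy of `features`."""
    shuffled = list(features)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, so the result is repeatable
    n_held_out = int(validation_split * len(shuffled))
    return shuffled[:n_held_out], shuffled[n_held_out:]


# Dummy stand-in with the same length as df_features
dummy_features = list(range(25758))
held_out, train = shuffle_and_split(dummy_features, 0.2)
print(len(held_out), len(train))
```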

In [22]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
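
As a worked check of the arithmetic, the sketch below plugs in the counts printed earlier (25758 features, 5151 held out). It reuses the notebook's constant names, but the variables `num_examples`, `num_held_out` and `num_train_examples` are introduced here only for illustration.

```python
BATCH_SIZE = 32
NUM_TRAIN_EPOCHS = 3.0
WARMUP_PROPORTION = 0.1

num_examples = 25758   # len(df_features)
num_held_out = 5151    # num_validation_samples
num_train_examples = num_examples - num_held_out

# Steps = batches per epoch * epochs, truncated to an integer
num_train_steps = int(num_train_examples / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
print(num_train_steps, num_warmup_steps)
```

The resulting 1931 training steps agree with the final checkpoint (`model.ckpt-1931`) in the training log.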

In [23]:
# Specify the output directory and how often to save summaries and checkpoints
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [24]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})

INFO:tensorflow:Using config: {'_model_dir': 'bert_models', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7faa31b08128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input function builder that takes our training feature set (`train_features`) and returns an input function for the Estimator to call. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [25]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
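
To illustrate the builder pattern, here is a simplified, TensorFlow-free sketch. It is not the BERT implementation: as far as we know the real `input_fn_builder` returns a function that builds a `tf.data.Dataset` (and repeats the data while training), whereas this hypothetical `simple_input_fn_builder` just yields Python lists. It also shows what `drop_remainder` does to the last, incomplete batch.

```python
import random


def simple_input_fn_builder(features, is_training, drop_remainder, seed=0):
    """Hypothetical stand-in for an Estimator-style input function builder."""

    def input_fn(params):
        # The Estimator passes its `params` dict (here: batch_size) to input_fn
        batch_size = params["batch_size"]
        items = list(features)
        if is_training:
            random.Random(seed).shuffle(items)
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            if drop_remainder and len(batch) < batch_size:
                break  # skip the final short batch
            yield batch

    # Return the inner function; the captured arguments live on in the closure
    return input_fn


# 5151 dummy items, the size of the held-out set in this notebook
input_fn = simple_input_fn_builder(list(range(5151)), is_training=False, drop_remainder=False)
batches = list(input_fn({"batch_size": 32}))
print(len(batches), len(batches[-1]))
```

Keeping the configuration in the closure means the Estimator only ever has to call `input_fn(params)`, without knowing where the data came from.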

Now we train our model!

In [26]:
from datetime import datetime

print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into bert_models/model.ckpt.


INFO:tensorflow:loss = 1.7532601, step = 0


INFO:tensorflow:global_step/sec: 1.47515


INFO:tensorflow:loss = 0.1330314, step = 101 (67.792 sec)


INFO:tensorflow:global_step/sec: 1.85797


INFO:tensorflow:loss = 0.01043457, step = 201 (53.823 sec)


INFO:tensorflow:global_step/sec: 1.85203


INFO:tensorflow:loss = 0.059115604, step = 301 (53.994 sec)


INFO:tensorflow:global_step/sec: 1.84755


INFO:tensorflow:loss = 0.04622001, step = 401 (54.125 sec)


INFO:tensorflow:Saving checkpoints for 500 into bert_models/model.ckpt.


INFO:tensorflow:global_step/sec: 1.73681


INFO:tensorflow:loss = 0.16945139, step = 501 (57.576 sec)


INFO:tensorflow:global_step/sec: 1.84998


INFO:tensorflow:loss = 0.005248884, step = 601 (54.056 sec)


INFO:tensorflow:global_step/sec: 1.85489


INFO:tensorflow:loss = 0.14718188, step = 701 (53.913 sec)


INFO:tensorflow:global_step/sec: 1.8506


INFO:tensorflow:loss = 0.0026476844, step = 801 (54.036 sec)


INFO:tensorflow:global_step/sec: 1.84994


INFO:tensorflow:loss = 0.0036297175, step = 901 (54.055 sec)


INFO:tensorflow:Saving checkpoints for 1000 into bert_models/model.ckpt.


INFO:tensorflow:global_step/sec: 1.75364


INFO:tensorflow:loss = 0.058256775, step = 1001 (57.023 sec)


INFO:tensorflow:global_step/sec: 1.85188


INFO:tensorflow:loss = 0.0023932336, step = 1101 (54.000 sec)


INFO:tensorflow:global_step/sec: 1.85292


INFO:tensorflow:loss = 0.0020429878, step = 1201 (53.968 sec)


INFO:tensorflow:global_step/sec: 1.84996


INFO:tensorflow:loss = 0.017281398, step = 1301 (54.056 sec)


INFO:tensorflow:global_step/sec: 1.85169


INFO:tensorflow:loss = 0.008529071, step = 1401 (54.005 sec)


INFO:tensorflow:Saving checkpoints for 1500 into bert_models/model.ckpt.


INFO:tensorflow:global_step/sec: 1.74969


INFO:tensorflow:loss = 0.0019483108, step = 1501 (57.152 sec)


INFO:tensorflow:global_step/sec: 1.85044


INFO:tensorflow:loss = 0.027974296, step = 1601 (54.042 sec)


INFO:tensorflow:global_step/sec: 1.8494


INFO:tensorflow:loss = 0.001222246, step = 1701 (54.071 sec)


INFO:tensorflow:global_step/sec: 1.85036


INFO:tensorflow:loss = 0.0018905711, step = 1801 (54.044 sec)


INFO:tensorflow:global_step/sec: 1.84937


INFO:tensorflow:loss = 0.0015011837, step = 1901 (54.073 sec)


INFO:tensorflow:Saving checkpoints for 1931 into bert_models/model.ckpt.


INFO:tensorflow:Loss for final step: 0.0012503576.


Training took time  0:18:49.375330
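
As a rough sanity check on these numbers, the overall throughput can be computed from the reported wall-clock time. This is only a sketch: the variable names are ours, the 1931 steps are read off the final checkpoint in the log, and the batch size is the `BATCH_SIZE` of 32 set earlier.

```python
from datetime import timedelta

# "Training took time 0:18:49.375330" from the output above
elapsed = timedelta(minutes=18, seconds=49.375330)
num_train_steps = 1931   # final checkpoint step in the log
batch_size = 32          # BATCH_SIZE

steps_per_sec = num_train_steps / elapsed.total_seconds()
examples_per_sec = steps_per_sec * batch_size
print(round(steps_per_sec, 2), round(examples_per_sec, 1))
```

The overall rate comes out a little below the roughly 1.85 `global_step/sec` reported in the log, presumably because the wall-clock time also covers building the graph and writing checkpoints.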


Now let's use our test data to see how well our model did:

In [27]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [28]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-05-07-07:05:11


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from bert_models/model.ckpt-1931


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2019-05-07-07:05:44


INFO:tensorflow:Saving dict for global step 1931: eval_accuracy = 0.9846632, false_negatives = 10.0, false_positives = 13.0, global_step = 1931, loss = 0.07248825, precision = 0.9971194, recall = 0.9977827, true_negatives = 628.0, true_positives = 4500.0


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1931: bert_models/model.ckpt-1931


{'eval_accuracy': 0.9846632,
 'false_negatives': 10.0,
 'false_positives': 13.0,
 'loss': 0.07248825,
 'precision': 0.9971194,
 'recall': 0.9977827,
 'true_negatives': 628.0,
 'true_positives': 4500.0,
 'global_step': 1931}
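
A word of caution when reading these metrics. To our understanding, the TF 1.x `tf.metrics` functions used in `metric_fn` for precision, recall and the true/false positive/negative counts cast labels and predictions to booleans. With six classes, that would make "positive" mean "any label id other than 0", so these numbers describe a binary split rather than per-class performance. This is our reading and is worth checking against the TF documentation. The sketch below only re-derives the reported values from the four counts.

```python
# Counts reported by estimator.evaluate above
tp, tn, fp, fn = 4500.0, 628.0, 13.0, 10.0

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

total = tp + tn + fp + fn
binary_accuracy = (tp + tn) / total  # accuracy of the implied binary split

print(precision, recall, f1)
print(total, binary_accuracy)
```

The four counts add up to the 5151 held-out samples. The binary accuracy derived from them (about 0.9955) is higher than the reported `eval_accuracy` of 0.9847, which is consistent with `eval_accuracy` being computed over all six classes.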

Now let's write code to make predictions on new sentences:

In [29]:
def getPrediction(in_sentences):
  # (label id, class name) pairs, looked up below by the predicted label id
  labels = list(classes_index.items())
  # guid="" and label=0 are dummy values; the true label is unknown at prediction time
  input_examples = [run_classifier.InputExample(guid="", text_a=x, text_b=None, label=0) for x in in_sentences]
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  # Note: 'probabilities' holds the log-probabilities (log_probs) returned by model_fn
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]
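
Because the `'probabilities'` entry actually contains log-probabilities, the values are negative and not directly readable as confidences. A minimal sketch of converting them back is shown here, using the six values the notebook reports for the first example sentence in the prediction output. The variable names are ours.

```python
import math

# Log-probabilities reported for the first example sentence
log_probs = [-0.05108946, -3.2444801, -4.8545737, -6.6508336, -7.8408146, -6.6161776]

# exp() undoes the log, giving one probability per class
probs = [math.exp(lp) for lp in log_probs]

# Index of the most likely class
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 3), round(sum(probs), 3))
```

The probabilities sum to roughly 1, and the largest one belongs to label id 0, matching the `(0, 'suicidal')` prediction for that sentence.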

In [30]:
pred_sentences = [
  "If I just disappear from the world right now, will anyone even care?",
  "Nobody can understand the stress and anxiety that exists within me right now. Nobody cares.",
  "When I'm training my neural network, I can't watch netflix because it lags like crazy. Sigh",
  "Today seems like a fine day. I can't wait to go to school today.",
  "Wow your scrambled eggs are delicious!",
  "I am so excited for Captain Marvel actually"
]

In [31]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 6


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] if i just disappear from the world right now , will anyone even care ? [SEP]


INFO:tensorflow:input_ids: 101 2065 1045 2074 10436 2013 1996 2088 2157 2085 1010 2097 3087 2130 2729 1029 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] nobody can understand the stress and anxiety that exists within me right now . nobody cares . [SEP]


INFO:tensorflow:input_ids: 101 6343 2064 3305 1996 6911 1998 10089 2008 6526 2306 2033 2157 2085 1012 6343 14977 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] when i ' m training my neural network , i can ' t watch netflix because it la ##gs like crazy . sigh [SEP]


INFO:tensorflow:input_ids: 101 2043 1045 1005 1049 2731 2026 15756 2897 1010 1045 2064 1005 1056 3422 20907 2138 2009 2474 5620 2066 4689 1012 6682 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] today seems like a fine day . i can ' t wait to go to school today . [SEP]


INFO:tensorflow:input_ids: 101 2651 3849 2066 1037 2986 2154 1012 1045 2064 1005 1056 3524 2000 2175 2000 2082 2651 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] wow your scrambled eggs are delicious ! [SEP]


INFO:tensorflow:input_ids: 101 10166 2115 13501 6763 2024 12090 999 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from bert_models/model.ckpt-1931


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


Voilà! We have a mood classifier!

In [32]:
predictions

[('If I just disappear from the world right now, will anyone even care?',
  array([-0.05108946, -3.2444801 , -4.8545737 , -6.6508336 , -7.8408146 ,
         -6.6161776 ], dtype=float32),
  (0, 'suicidal')),
 ('Nobody can understand the stress and anxiety that exists within me right now. Nobody cares.',
  array([-0.53542405, -0.89993197, -6.3480854 , -5.847459  , -6.9001694 ,
         -6.0589876 ], dtype=float32),
  (0, 'suicidal')),
 ("When I'm training my neural network, I can't watch netflix because it lags like crazy. Sigh",
  array([-5.0027833, -0.5956425, -3.074578 , -4.671892 , -1.0175464,
         -3.6877565], dtype=float32),
  (1, 'depressed')),
 ("Today seems like a fine day. I can't wait to go to school today.",
  array([-3.9587896 , -4.2296805 , -6.3801646 , -0.16406982, -4.8696446 ,
         -2.222822  ], dtype=float32),
  (3, 'happy')),
 ('Wow your scrambled eggs are delicious!',
  array([-3.9031124 , -4.7588186 , -5.7492213 , -0.19345228, -2.4745243 ,
         -2.8176239 