## Predicting Movie Review Sentiment with BERT on TF Hub

If you've been following Natural Language Processing over the past year, you've probably heard of BERT: Bidirectional Encoder Representations from Transformers. It's a neural network architecture designed by Google researchers that has transformed the state of the art for NLP tasks such as text classification, translation, summarization, and question answering.

Now that BERT's been added to TF Hub as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, fine-tuning BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a model to predict whether an IMDB movie review is positive or negative using BERT in TensorFlow with TF Hub. Some code was adapted from [this Colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb). Let's get started!

In [1]:
! pip install tensorflow_hub

Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting tensorflow_hub
  Downloading http://mirrors.aliyun.com/pypi/packages/10/5c/6f3698513cf1cd730a5ea66aec665d213adf9de59b34f362f270e0bd126f/tensorflow_hub-0.4.0-py2.py3-none-any.whl (75kB)
    100% |████████████████████████████████| 81kB 1.1MB/s
Installing collected packages: tensorflow-hub
Successfully installed tensorflow-hub-0.4.0
You are using pip version 18.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [1]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

W0412 18:38:59.049878 4535756224 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14


In addition to the standard libraries we imported above, we'll need to install BERT's Python package.

In [2]:
! pip install bert-tensorflow



In [3]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. Set DO_DELETE to True to delete and recreate OUTPUT_DIR if it already exists. Otherwise, TensorFlow will load existing model checkpoints from that directory (if they exist).

In [4]:
OUTPUT_DIR = 'OUTPUT'
DO_DELETE = False

if DO_DELETE:
    try:
        tf.gfile.DeleteRecursively(OUTPUT_DIR)
    except tf.errors.OpError:
        # The directory may not exist yet, in which case there is nothing to delete.
        pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

***** Model output directory: OUTPUT *****


### Data

First, let's download the dataset, hosted by Stanford. The code below, which downloads, extracts, and imports the IMDB Large Movie Review Dataset, is borrowed from [this TensorFlow tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub).

In [5]:
from tensorflow import keras
import os
import re

# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
    data = {}
    data["sentence"] = []
    data["sentiment"] = []
    for file_path in os.listdir(directory):
        with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
            data["sentence"].append(f.read())
            data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
    return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
    pos_df = load_directory_data(os.path.join(directory, "pos"))
    neg_df = load_directory_data(os.path.join(directory, "neg"))
    pos_df["polarity"] = 1
    neg_df["polarity"] = 0
    return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download and process the dataset files.
def download_and_load_datasets(force_download=False):
    dataset = tf.keras.utils.get_file(
        fname="aclImdb.tar.gz",
        origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
        extract=True
    )
    train_df = load_dataset(os.path.join(os.path.dirname(dataset), "aclImdb", "train"))
    test_df = load_dataset(os.path.join(os.path.dirname(dataset), "aclImdb", "test"))
    return train_df, test_df
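
To see what the regular expression in `load_directory_data` extracts, here is a small standalone sketch. It assumes review files are named `<id>_<rating>.txt`; the file names below are made up for illustration, and the helper name `rating_from_filename` is hypothetical.

```python
import re

# Assumed file-name pattern: "<review id>_<star rating>.txt".
RATING_PATTERN = re.compile(r"\d+_(\d+)\.txt")

def rating_from_filename(file_name):
    """Return the star rating embedded in a review file name, as a string."""
    match = RATING_PATTERN.match(file_name)
    if match is None:
        raise ValueError("unexpected file name: {}".format(file_name))
    return match.group(1)

# Made-up file names, for illustration only.
print(rating_from_filename("0_9.txt"))     # 9
print(rating_from_filename("127_10.txt"))  # 10
```

Note that the rating comes back as a string, so the `sentiment` column holds strings. The `polarity` label does not depend on it: it is set from the `pos` or `neg` folder the file was read from.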

In [6]:
train, test = download_and_load_datasets()

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz


In [7]:
train

| | sentence | sentiment | polarity |
|---|---|---|---|
| 0 | Yes, this movie make me feel real horror, when... | 1 | 0 |
| 1 | WWF Survivor Series 2001<br /><br />This was a... | 1 | 0 |
| 2 | This movie is incredibly realistic and I feel ... | 8 | 1 |
| 3 | College students, who are clearing out a conde... | 4 | 0 |
| 4 | Surprisingly not terrible and well animated fo... | 8 | 1 |
| 5 | "Dressed To Kill", is one of the best thriller... | 10 | 1 |
| 6 | I hadn't seen this film in probably 35 years, ... | 10 | 1 |
| 7 | I love this movie! 10 out of 10 hands down! It... | 10 | 1 |
| 8 | The Lack of content in this movie amazed me th... | 1 | 0 |
| 9 | The story line has been rehashed a number of t... | 4 | 0 |


To keep training fast, we'll take a random sample of 5,000 examples from each of the train and test sets.

In [8]:
train = train.sample(5000)
test = test.sample(5000)
train.shape

(5000, 3)

In [9]:
train.columns

Index(['sentence', 'sentiment', 'polarity'], dtype='object')

Our input data is the 'sentence' column and our label is the 'polarity' column (0 for negative and 1 for positive).

In [10]:
DATA_COLUMN = 'sentence'
LABEL_COLUMN = 'polarity'
label_list = [0, 1]

### Data Preprocessing

We'll need to transform our data into a format BERT understands. This involves two steps. First, we create InputExamples using the constructor provided in the BERT library.

* text_a is the text we want to classify, which in this case is the 'sentence' column of our DataFrame.
* text_b is used if we're training a model to understand the relationship between sentences (e.g., is text_b a translation of text_a? Is text_b an answer to the question asked by text_a?). This doesn't apply to our task, so we can leave text_b blank.
* label is the label for our example, which in this case is 0 or 1.

In [11]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry, this is also included in the Python library):

1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g., "sally says hi" -> \["sally", "says", "hi"\])
3. Break words into WordPieces (e.g., "calling" -> \["call", "##ing"\])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the readme)
6. Append "index" and "segment" tokens to each input (see the BERT paper)
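
Steps 1 and 2 can be pictured with a few lines of plain Python. This is a simplified sketch, not the library's tokenizer: the function name `basic_tokenize` is hypothetical, and it only handles whitespace and ASCII punctuation.

```python
import string

def basic_tokenize(text, do_lower_case=True):
    """Lowercase the text, then split on whitespace and around punctuation."""
    if do_lower_case:
        text = text.lower()
    tokens = []
    current = []  # characters of the word currently being built
    for ch in text:
        if ch.isspace():
            # Whitespace ends the current word and is dropped.
            if current:
                tokens.append("".join(current))
                current = []
        elif ch in string.punctuation:
            # Punctuation ends the current word and becomes its own token.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

print(basic_tokenize("Sally says hi!"))
# ['sally', 'says', 'hi', '!']
```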

Happily, we don't have to worry about most of these details.
To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT TF Hub module:

In [12]:
# this is a path to an uncased (all lowercase) version of BERT
# BERT_MODEL_HUB = "/ml/dfsj/tfhub_module"
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
    """ Get the vocab file and casing info from the Hub module. """
    with tf.Graph().as_default():
        bert_module = hub.Module(BERT_MODEL_HUB)
        tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
        with tf.Session() as sess:
            vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                                  tokenization_info["do_lower_case"]])
    return bert.tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great, we just learned that the BERT model we're using expects lowercase data (that's what is stored in tokenization_info\["do_lower_case"\]) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [14]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']
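
The `token`, `##izer` split in the output above comes from WordPiece tokenization. A minimal sketch of greedy longest-match-first splitting is shown below. It assumes a tiny made-up vocabulary (`TOY_VOCAB`) rather than BERT's real vocab file, and the function name `wordpiece` is hypothetical.

```python
# Made-up vocabulary for illustration; BERT's real vocab file is far larger.
TOY_VOCAB = {"token", "##izer", "call", "##ing", "bert", "[UNK]"}

def wordpiece(word, vocab=TOY_VOCAB, unk_token="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched, so the whole word is treated as unknown.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("tokenizer"))  # ['token', '##izer']
print(wordpiece("calling"))    # ['call', '##ing']
print(wordpiece("xyz"))        # ['[UNK]']
```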

Using our tokenizer, we'll call run_classifier.convert_examples_to_features on our InputExamples to convert them into features BERT understands.
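
As a rough picture of what each converted example contains, the sketch below builds `input_ids`, `input_mask`, and `segment_ids` by hand for one single-sentence example. It is a simplified stand-in, not the library's implementation: the helper `build_features`, the toy vocabulary, and its ids are made up for illustration.

```python
# Made-up vocabulary and ids, for illustration only.
TOY_VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "great": 3, "movie": 4}

def build_features(tokens, vocab, max_seq_length):
    """Return (input_ids, input_mask, segment_ids), each of length max_seq_length."""
    # Reserve two positions for [CLS] and [SEP], truncating if needed.
    tokens = tokens[: max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    input_ids = [vocab[t] for t in tokens]
    # The mask is 1 for real tokens and 0 for padding.
    input_mask = [1] * len(input_ids)
    # A single-sentence example uses segment 0 throughout.
    segment_ids = [0] * len(input_ids)
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding

ids, mask, segments = build_features(["great", "movie"], TOY_VOCAB, max_seq_length=6)
print(ids)       # [1, 3, 4, 2, 0, 0]
print(mask)      # [1, 1, 1, 1, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 0]
```

The reviews logged below are long enough to be truncated at the sequence limit, which is why their `input_mask` consists only of ones.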

In [15]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] with all due di ##sr ##es ##pe ##ct for this george stevens sr . " epic " of mis ##cast ##ings and mis ##rea ##ding ##s , i can only wonder that the james dean " legend " could survive this outing , i submit that then - studio obe ##isance ##s to bank ##able box office " giants " came a crop ##per of its own ' gig ##ant ##ism ##oses ' . nor were rock and liz that much better off . let us just say that the televised " dallas " was the authentic " heir , " even if con ##tem ##p ( tu ##ous ) latter ##day " texans " like lay and delay , not to mention our put ##ative [SEP]


INFO:tensorflow:input_ids: 101 2007 2035 2349 4487 21338 2229 5051 6593 2005 2023 2577 8799 5034 1012 1000 8680 1000 1997 28616 10526 8613 1998 28616 16416 4667 2015 1010 1045 2064 2069 4687 2008 1996 2508 4670 1000 5722 1000 2071 5788 2023 26256 1010 1045 12040 2008 2059 1011 2996 15578 28138 2015 2000 2924 3085 3482 2436 1000 7230 1000 2234 1037 10416 4842 1997 2049 2219 1005 15453 4630 2964 27465 1005 1012 4496 2020 2600 1998 9056 2008 2172 2488 2125 1012 2292 2149 2074 2360 2008 1996 13762 1000 5759 1000 2001 1996 14469 1000 8215 1010 1000 2130 2065 9530 18532 2361 1006 10722 3560 1007 3732 10259 1000 23246 1000 2066 3913 1998 8536 1010 2025 2000 5254 2256 2404 8082 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i loved the first " az ##umi " movie . i ' ve seen ms . u ##eto in a variety of her tv appearances and i ' ve seen my fair share of samurai and ninja flick ##s . i have to say that this movie was much weaker than i ' d expected . < br / > < br / > given the movie ' s cast and set up in " az ##umi " , they should have been able to do a much better job with this movie , but instead it was slow , pl ##od ##ding in parts , and sp ##rin ##kled with very poor , un ##con ##vin ##cing , and wooden acting . < br / [SEP]


INFO:tensorflow:input_ids: 101 1045 3866 1996 2034 1000 17207 12717 1000 3185 1012 1045 1005 2310 2464 5796 1012 1057 18903 1999 1037 3528 1997 2014 2694 3922 1998 1045 1005 2310 2464 2026 4189 3745 1997 16352 1998 14104 17312 2015 1012 1045 2031 2000 2360 2008 2023 3185 2001 2172 15863 2084 1045 1005 1040 3517 1012 1026 7987 1013 1028 1026 7987 1013 1028 2445 1996 3185 1005 1055 3459 1998 2275 2039 1999 1000 17207 12717 1000 1010 2027 2323 2031 2042 2583 2000 2079 1037 2172 2488 3105 2007 2023 3185 1010 2021 2612 2009 2001 4030 1010 20228 7716 4667 1999 3033 1010 1998 11867 6657 19859 2007 2200 3532 1010 4895 8663 6371 6129 1010 1998 4799 3772 1012 1026 7987 1013 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this movie is basically about some girls in a catholic school that end up getting into trouble because of putting red dye in one in one of their school mates sham ##poo and after being rep ##rim ##and ##ed for this act they decide to take off to florida for a vacation . on their way there they meet up with some guys in a local diner and decide that they would both meet up with each other in another location later on . the girls end up on a road side near the woods and stop for awhile and while one of the girls decides to walk around a bit she sees a murder happen in which the local sheriff himself is involved . she [SEP]


INFO:tensorflow:input_ids: 101 2023 3185 2003 10468 2055 2070 3057 1999 1037 3234 2082 2008 2203 2039 2893 2046 4390 2138 1997 5128 2417 18554 1999 2028 1999 2028 1997 2037 2082 14711 25850 24667 1998 2044 2108 16360 20026 5685 2098 2005 2023 2552 2027 5630 2000 2202 2125 2000 3516 2005 1037 10885 1012 2006 2037 2126 2045 2027 3113 2039 2007 2070 4364 1999 1037 2334 15736 1998 5630 2008 2027 2052 2119 3113 2039 2007 2169 2060 1999 2178 3295 2101 2006 1012 1996 3057 2203 2039 2006 1037 2346 2217 2379 1996 5249 1998 2644 2005 19511 1998 2096 2028 1997 1996 3057 7288 2000 3328 2105 1037 2978 2016 5927 1037 4028 4148 1999 2029 1996 2334 6458 2370 2003 2920 1012 2016 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] arm ##aged ##don pp ##v < br / > < br / > the last pp ##v of 2006 < br / > < br / > smackdown brand . < br / > < br / > match results ahead * * * * * * * * < br / > < br / > we are starting the show with the inferno match . kane v . mvp . this was an okay match . nothing about wrestling here . this was about the visuals . overall , this was not bad . there were a few close spots here with kane getting too close to the fire , but in the end , kane won with ram ##ming mvp into the fire [SEP]


INFO:tensorflow:input_ids: 101 2849 18655 5280 4903 2615 1026 7987 1013 1028 1026 7987 1013 1028 1996 2197 4903 2615 1997 2294 1026 7987 1013 1028 1026 7987 1013 1028 22120 4435 1012 1026 7987 1013 1028 1026 7987 1013 1028 2674 3463 3805 1008 1008 1008 1008 1008 1008 1008 1008 1026 7987 1013 1028 1026 7987 1013 1028 2057 2024 3225 1996 2265 2007 1996 21848 2674 1012 8472 1058 1012 12041 1012 2023 2001 2019 3100 2674 1012 2498 2055 4843 2182 1012 2023 2001 2055 1996 26749 1012 3452 1010 2023 2001 2025 2919 1012 2045 2020 1037 2261 2485 7516 2182 2007 8472 2893 2205 2485 2000 1996 2543 1010 2021 1999 1996 2203 1010 8472 2180 2007 8223 6562 12041 2046 1996 2543 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i enjoyed this movie quite a lot . i have always been a fan of who ##op ##i goldberg and this movie only emphasizes it . she portrays a house ##wife in an african - american family which is moving up the social chain due to the husband ' s ( danny glover ) success as an attorney . she moves to an all white neighborhood where the people are friendly , yet a little awkward toward her . the various events that arise during the course of the movie make for some laughs but mostly appeal to the other emotions . this movie is not so much a comedy as a drama . i give it a strong 8 / 10 . i highly recommend [SEP]


INFO:tensorflow:input_ids: 101 1045 5632 2023 3185 3243 1037 2843 1012 1045 2031 2467 2042 1037 5470 1997 2040 7361 2072 18522 1998 2023 3185 2069 20618 2009 1012 2016 17509 1037 2160 19993 1999 2019 3060 1011 2137 2155 2029 2003 3048 2039 1996 2591 4677 2349 2000 1996 3129 1005 1055 1006 6266 20012 1007 3112 2004 2019 4905 1012 2016 5829 2000 2019 2035 2317 5101 2073 1996 2111 2024 5379 1010 2664 1037 2210 9596 2646 2014 1012 1996 2536 2824 2008 13368 2076 1996 2607 1997 1996 3185 2191 2005 2070 11680 2021 3262 5574 2000 1996 2060 6699 1012 2023 3185 2003 2025 2061 2172 1037 4038 2004 1037 3689 1012 1045 2507 2009 1037 2844 1022 1013 2184 1012 1045 3811 16755 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this is one of the best episodes of doctor who ever . we have the cyber ##men , the cyber conversion units ( may scare young children ) and of coarse the doctor doing one of his best acts . bravo david ten ##nant . good scenes as if it was a movie , with thrilling scenes in some streets , an invasion on the cyber ##man ' s base , and leaving the world different to ours , basically a 45 minute movie . < br / > < br / > being part 2 of rise of the cyber ##men , this would never di ##sa ##pp ##oint . with it having a great build up to the final . < br / > [SEP]


INFO:tensorflow:input_ids: 101 2023 2003 2028 1997 1996 2190 4178 1997 3460 2040 2412 1012 2057 2031 1996 16941 3549 1010 1996 16941 7584 3197 1006 2089 12665 2402 2336 1007 1998 1997 20392 1996 3460 2725 2028 1997 2010 2190 4490 1012 17562 2585 2702 16885 1012 2204 5019 2004 2065 2009 2001 1037 3185 1010 2007 26162 5019 1999 2070 4534 1010 2019 5274 2006 1996 16941 2386 1005 1055 2918 1010 1998 2975 1996 2088 2367 2000 14635 1010 10468 1037 3429 3371 3185 1012 1026 7987 1013 1028 1026 7987 1013 1028 2108 2112 1016 1997 4125 1997 1996 16941 3549 1010 2023 2052 2196 4487 3736 9397 25785 1012 2007 2009 2383 1037 2307 3857 2039 2000 1996 2345 1012 1026 7987 1013 1028 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] minor spoil ##ers < br / > < br / > misunderstood classic remains one of henson ' s finest and most personal films . it may seem funny to call a movie as beloved as this one ' misunderstood , ' but people do seem to remember this one mostly for jerry ju ##hl ' s snap ##py screenplay and paul williams ' s knockout songs . now while these things are admitted ##ly great , as is the movie ' s formal playful ##ness ( screenplay - within - the - screenplay , film break , etc . ) , what distinguishes ' the mu ##ppet movie ' from the other mu ##ppet films is the serious , wi ##st ##ful thread that runs [SEP]


INFO:tensorflow:input_ids: 101 3576 27594 2545 1026 7987 1013 1028 1026 7987 1013 1028 28947 4438 3464 2028 1997 27227 1005 1055 10418 1998 2087 3167 3152 1012 2009 2089 4025 6057 2000 2655 1037 3185 2004 11419 2004 2023 2028 1005 28947 1010 1005 2021 2111 2079 4025 2000 3342 2023 2028 3262 2005 6128 18414 7317 1005 1055 10245 7685 9000 1998 2703 3766 1005 1055 11369 2774 1012 2085 2096 2122 2477 2024 4914 2135 2307 1010 2004 2003 1996 3185 1005 1055 5337 18378 2791 1006 9000 1011 2306 1011 1996 1011 9000 1010 2143 3338 1010 4385 1012 1007 1010 2054 27343 1005 1996 14163 29519 3185 1005 2013 1996 2060 14163 29519 3152 2003 1996 3809 1010 15536 3367 3993 11689 2008 3216 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] syn ##opsis correction : the ending does not show ben cruising online for guys . he is looking up arabic language courses at the pre ##si ##do military ac ##ada ##my in san francisco . perhaps to join the war in iraq as a translator , ( f ##yi - many of the dish ##ono ##rable discharge ##s from " d ' on ##t ask d ' on ##t tell have been translators ( they are now it major short supply ) ben also spoke russian . this movie is a good time capsule of life in man ##hat ##ten but quite a bit of non reality here . mostly a good laugh at lame social skills and the sad portrayal of " grown up " [SEP]


INFO:tensorflow:input_ids: 101 19962 22599 18140 1024 1996 4566 2515 2025 2265 3841 22206 3784 2005 4364 1012 2002 2003 2559 2039 5640 2653 5352 2012 1996 3653 5332 3527 2510 9353 8447 8029 1999 2624 3799 1012 3383 2000 3693 1996 2162 1999 5712 2004 1037 11403 1010 1006 1042 10139 1011 2116 1997 1996 9841 17175 16670 11889 2015 2013 1000 1040 1005 2006 2102 3198 1040 1005 2006 2102 2425 2031 2042 28396 1006 2027 2024 2085 2009 2350 2460 4425 1007 3841 2036 3764 2845 1012 2023 3185 2003 1037 2204 2051 18269 1997 2166 1999 2158 12707 6528 2021 3243 1037 2978 1997 2512 4507 2182 1012 3262 1037 2204 4756 2012 20342 2591 4813 1998 1996 6517 13954 1997 1000 4961 2039 1000 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this film is wonderful film for students of film . in mainstream american film it is common to see stylistic techniques used to draw the audience into the movie . in this film , the director uses stylistic techniques to push the story forward . < br / > < br / > this is a love story that offers no sex . to be honest , i can ' t even recall the characters kissing . rather , the plot focuses on the emotional ties between the two characters . < br / > < br / > i would not recommend this film for everybody . it is not very accessible . it is very slow moving and the subtle . it is a [SEP]


INFO:tensorflow:input_ids: 101 2023 2143 2003 6919 2143 2005 2493 1997 2143 1012 1999 7731 2137 2143 2009 2003 2691 2000 2156 24828 5461 2109 2000 4009 1996 4378 2046 1996 3185 1012 1999 2023 2143 1010 1996 2472 3594 24828 5461 2000 5245 1996 2466 2830 1012 1026 7987 1013 1028 1026 7987 1013 1028 2023 2003 1037 2293 2466 2008 4107 2053 3348 1012 2000 2022 7481 1010 1045 2064 1005 1056 2130 9131 1996 3494 7618 1012 2738 1010 1996 5436 7679 2006 1996 6832 7208 2090 1996 2048 3494 1012 1026 7987 1013 1028 1026 7987 1013 1028 1045 2052 2025 16755 2023 2143 2005 7955 1012 2009 2003 2025 2200 7801 1012 2009 2003 2200 4030 3048 1998 1996 11259 1012 2009 2003 1037 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i saw this film for the first time last night . i have been thinking about it all night and this morning . i cannot say that it was my favorite film , at least not yet . i need to see it again . < br / > < br / > the cinematography is stunning . each shot has a lyric ##ism that one would expect in a film that has wi ##m wen ##ders ' s name attached to it . < br / > < br / > it is always tempting to see de chi ##ric ##o in any picture of rows of orders vanishing into the gloom , but in this case the analogy fits . in many ways the [SEP]


INFO:tensorflow:input_ids: 101 1045 2387 2023 2143 2005 1996 2034 2051 2197 2305 1012 1045 2031 2042 3241 2055 2009 2035 2305 1998 2023 2851 1012 1045 3685 2360 2008 2009 2001 2026 5440 2143 1010 2012 2560 2025 2664 1012 1045 2342 2000 2156 2009 2153 1012 1026 7987 1013 1028 1026 7987 1013 1028 1996 16434 2003 14726 1012 2169 2915 2038 1037 13677 2964 2008 2028 2052 5987 1999 1037 2143 2008 2038 15536 2213 19181 13375 1005 1055 2171 4987 2000 2009 1012 1026 7987 1013 1028 1026 7987 1013 1028 2009 2003 2467 23421 2000 2156 2139 9610 7277 2080 1999 2151 3861 1997 10281 1997 4449 24866 2046 1996 24067 1010 2021 1999 2023 2553 1996 23323 16142 1012 1999 2116 3971 1996 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


### Creating a model

Now that we've prepared our data, let's focus on building a model. The create_model function below does just this. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our sentiment task (i.e., classifying whether a movie review is positive or negative). This strategy of starting from a pretrained model and training it further on a new task is called fine-tuning.

In [21]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels, num_labels):
    """ Creates a classification model. """
    
    bert_module = hub.Module(BERT_MODEL_HUB, trainable=True)
    bert_inputs = dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids)
    bert_outputs = bert_module(inputs=bert_inputs, signature="tokens", as_dict=True)
    # Use "pooled_output" for classification tasks on an entire sentence.
    # Use "sequence_output" for token-level output.
    output_layer = bert_outputs["pooled_output"]

    hidden_size = output_layer.shape[-1].value

    # Create our own classification layer to tune on the sentiment data.
    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))

    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())

    with tf.variable_scope("loss"):

        # Dropout helps prevent overfitting. As written, it is applied in every
        # mode, including evaluation and prediction.
        output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        log_probs = tf.nn.log_softmax(logits, axis=-1)

        # Convert labels into one-hot encoding
        one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

        predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
        # If we're predicting, we want the predicted labels and the log-probabilities.
        if is_predicting:
            return (predicted_labels, log_probs)

        # For train/eval, compute the loss between the predicted and actual labels.
        per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
        loss = tf.reduce_mean(per_example_loss)
        return (loss, predicted_labels, log_probs)
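
To make the new layer less of a black box, here is a rough pure-Python sketch of the same arithmetic for a single example: a linear layer on top of the pooled output, a log-softmax, and the cross-entropy loss. The input vector, weights, and helper names (log_softmax, classification_head) are made up for illustration; this is not the TensorFlow code path, just the math it performs.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def classification_head(pooled, weights, bias, label):
    """One example: pooled is [hidden], weights is [num_labels][hidden]."""
    # logits = pooled . weights^T + bias
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    log_probs = log_softmax(logits)
    predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
    # Equivalent to -sum(one_hot_labels * log_probs) for a single example.
    loss = -log_probs[label]
    return loss, predicted, log_probs

# Hypothetical values standing in for BERT's pooled_output and the new layer.
pooled = [0.5, -1.0, 0.25, 2.0]
weights = [[0.01, -0.02, 0.0, 0.01],   # row for label 0
           [-0.01, 0.02, 0.01, 0.0]]   # row for label 1
bias = [0.0, 0.0]

loss, predicted, log_probs = classification_head(pooled, weights, bias, label=1)
probs = [math.exp(lp) for lp in log_probs]  # exp turns log-probs into probabilities
```

With tiny initial weights, both classes get nearly equal probability, so the loss starts close to ln 2 ≈ 0.693. That is in line with the first training loss logged in this notebook (0.6989826).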


Next we'll wrap our model function in a model_fn_builder function that adapts our model to work for training, evaluation, and prediction.

In [24]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps, num_warmup_steps):
    """Returns `model_fn` closure for Estimator."""
    def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
        """The `model_fn` for Estimator."""

        input_ids = features["input_ids"]
        input_mask = features["input_mask"]
        segment_ids = features["segment_ids"]
        label_ids = features["label_ids"]

        is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
        # TRAIN and EVAL
        if not is_predicting:
            (loss, predicted_labels, log_probs) = create_model(is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)
            train_op = bert.optimization.create_optimizer(loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

            # Calculate evaluation metrics. 
            def metric_fn(label_ids, predicted_labels):
                accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
                f1_score = tf.contrib.metrics.f1_score(label_ids, predicted_labels)
                auc = tf.metrics.auc(label_ids, predicted_labels)
                recall = tf.metrics.recall(label_ids, predicted_labels)
                precision = tf.metrics.precision(label_ids, predicted_labels) 
                true_pos = tf.metrics.true_positives(label_ids, predicted_labels)
                true_neg = tf.metrics.true_negatives(label_ids, predicted_labels)   
                false_pos = tf.metrics.false_positives(label_ids, predicted_labels)  
                false_neg = tf.metrics.false_negatives(label_ids, predicted_labels)
                return {
                    "eval_accuracy": accuracy,
                    "f1_score": f1_score,
                    "auc": auc,
                    "precision": precision,
                    "recall": recall,
                    "true_positives": true_pos,
                    "true_negatives": true_neg,
                    "false_positives": false_pos,
                    "false_negatives": false_neg
                }

            eval_metrics = metric_fn(label_ids, predicted_labels)

            if mode == tf.estimator.ModeKeys.TRAIN:
                return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
            else:
                return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metrics)
        else:
            (predicted_labels, log_probs) = create_model(is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)
            predictions = {
                # Note: these are log-probabilities; apply exp to recover probabilities.
                'probabilities': log_probs,
                'labels': predicted_labels
            }
            return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Return the actual model function in the closure
    return model_fn
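
As a sanity check on what metric_fn reports, here is a small pure-Python sketch of the textbook definitions of accuracy, precision, recall, and F1 computed from the confusion counts. The helper name binary_metrics and the toy labels are assumptions for illustration. The TensorFlow metrics accumulate these counts across batches, and tf.contrib.metrics.f1_score in particular may differ in detail, so treat this as a reference for the definitions rather than a reimplementation.

```python
def binary_metrics(labels, predictions):
    """Confusion counts and derived metrics for 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "f1_score": f1, "true_positives": tp, "true_negatives": tn,
        "false_positives": fp, "false_negatives": fn,
    }

# Toy data, made up for illustration.
toy_labels      = [1, 1, 0, 0, 1, 0, 1, 1]
toy_predictions = [1, 0, 0, 1, 1, 0, 1, 1]
metrics = binary_metrics(toy_labels, toy_predictions)
```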

In [25]:
# Training hyperparameters.
# These are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where the learning rate
# is small and gradually increases; this usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100
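
To see what warmup means in practice, here is a minimal sketch of a schedule with linear warmup followed by linear decay to zero. As far as I understand, this is roughly the shape of schedule that bert.optimization.create_optimizer sets up, but the function name learning_rate_at and the step counts below are assumptions for illustration, not the library's code.

```python
def learning_rate_at(step, base_lr, num_train_steps, num_warmup_steps):
    """Linear warmup from 0 to base_lr, then linear decay back toward 0."""
    if num_warmup_steps and step < num_warmup_steps:
        # Warmup: scale the learning rate up proportionally to the step.
        return base_lr * step / num_warmup_steps
    # After warmup: decay linearly so the rate reaches 0 at the last step.
    return base_lr * max(0.0, 1.0 - step / num_train_steps)

# Hypothetical run: 1000 training steps, the first 10% spent warming up.
schedule = [learning_rate_at(s, 2e-5, 1000, 100) for s in (0, 50, 100, 500, 1000)]
```

The rate starts at zero, climbs during the first 100 steps, and then shrinks steadily until training ends.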

In [26]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
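
Here is the same arithmetic worked through as a standalone sketch. The count of 5000 examples is an assumption picked for illustration (the real value is len(train_features)), and compute_steps is a hypothetical helper, not part of the BERT package.

```python
def compute_steps(num_examples, batch_size, num_epochs, warmup_proportion):
    """Mirror the two lines above: total steps, then the warmup share."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps

# 5000 / 32 = 156.25 batches per epoch; times 3 epochs = 468.75, truncated to 468.
# 10% of 468 = 46.8, truncated to 46 warmup steps.
train_steps, warmup_steps = compute_steps(5000, 32, 3.0, 0.1)
```

Note that int() truncates rather than rounds, so a final partial batch does not add an extra step.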

In [27]:
# Specify the output directory and how often (in steps) to save checkpoints and summaries
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [28]:
model_fn = model_fn_builder(
    num_labels=len(label_list),
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'OUTPUT', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x133d55c88>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (train_features) and returns an input function, which the Estimator calls to get batches of training data. This is a pretty standard design pattern for working with TensorFlow Estimators.

In [29]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
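
If the drop_remainder flag is unfamiliar, this pure-Python sketch shows the idea: when the number of examples isn't a multiple of the batch size, the last batch comes out short, and drop_remainder decides whether to keep it. The batches helper is a made-up analogy, not how the BERT package or tf.data implements it.

```python
def batches(examples, batch_size, drop_remainder=False):
    """Yield consecutive batches; optionally skip a final short batch."""
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return  # only the last batch can be short, so we are done
        yield batch

toy_examples = list(range(10))
kept = list(batches(toy_examples, 4))                          # keeps the short batch
dropped = list(batches(toy_examples, 4, drop_remainder=True))  # fixed-size batches only
```

TPUs need every batch to have the same shape, which is why the comment above mentions setting drop_remainder to True there. On CPU or GPU we can keep the short batch and use every example.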

Now we train our model! For me, on a Colab notebook running on Google's GPUs, training took about 14 minutes.

In [30]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into OUTPUT/model.ckpt.


INFO:tensorflow:loss = 0.6989826, step = 1


KeyboardInterrupt: 