# Predicting Movie Review Sentiment with BERT on TF Hub

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a model to predict whether a movie review is positive or negative using BERT in TensorFlow with TF Hub.

In [1]:
!pip list

Package                            Version   
---------------------------------- ----------
absl-py                            0.9.0     
alabaster                          0.7.10    
anaconda-client                    1.6.14    
anaconda-project                   0.8.2     
asn1crypto                         0.24.0    
astor                              0.8.1     
astroid                            1.6.3     
astropy                            3.0.2     
attrs                              18.1.0    
Automat                            0.3.0     
autovizwidget                      0.15.0    
awscli                             1.18.20   
Babel                              2.5.3     
backcall                           0.1.0     
backports.shutil-get-terminal-size 1.0.0     
bcrypt                             3.1.7     
beautifulsoup4                     4.6.0     
bert-tensorflow                    1.0.1     
bitarray                           0.8.1     
bkcharts     


In [2]:
!pip install tensorflow-hub


In [3]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime




In [4]:
print(tf.__version__)

1.15.2


In addition to the standard libraries we imported above, we'll need to install BERT's Python package.

In [5]:
!pip install bert-tensorflow


In [3]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization





Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to True to delete and recreate OUTPUT_DIR if it already exists. Otherwise, TensorFlow will load existing model checkpoints from that directory (if they exist).

In [7]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = './output'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = True #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'BUCKET_NAME' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except tf.errors.OpError:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: ./output *****
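
For a purely local `OUTPUT_DIR`, the effect of `DO_DELETE` can be sketched with the standard library alone. This is an illustration rather than the notebook's code: `reset_output_dir` is a hypothetical helper, and unlike `tf.gfile` it does not understand `gs://` paths.

```python
import os
import shutil
import tempfile


def reset_output_dir(output_dir, do_delete=True):
    """Create output_dir, removing any existing copy first when do_delete is set."""
    if do_delete and os.path.isdir(output_dir):
        shutil.rmtree(output_dir)
    # exist_ok=True keeps an existing directory (and its checkpoints) untouched
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


# Demonstrate inside a throwaway temporary directory
base = tempfile.mkdtemp()
out = os.path.join(base, "output")

reset_output_dir(out)
open(os.path.join(out, "stale_checkpoint"), "w").close()

reset_output_dir(out, do_delete=True)
print(os.listdir(out))  # prints []
```

With `do_delete=False`, the existing files stay in place, which is the case where checkpoints from an earlier run would be picked up again.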


# Data

First, let's download the dataset, hosted by Stanford. The code below, which downloads, extracts, and imports the IMDB Large Movie Review Dataset, is borrowed from [this TensorFlow tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub).

In [8]:
from tensorflow import keras
import os
import re

# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
  data = {}
  data["sentence"] = []
  data["sentiment"] = []
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      data["sentence"].append(f.read())
      # Raw string so the regex escapes (\d, \.) are not treated as string escapes
      data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
  pos_df = load_directory_data(os.path.join(directory, "pos"))
  neg_df = load_directory_data(os.path.join(directory, "neg"))
  pos_df["polarity"] = 1
  neg_df["polarity"] = 0
  return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download and process the dataset files.
def download_and_load_datasets(force_download=False):
  dataset = tf.keras.utils.get_file(
      fname="aclImdb.tar.gz", 
      origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz", 
      extract=True)
  
  train_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                       "aclImdb", "train"))
  test_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                      "aclImdb", "test"))
  
  return train_df, test_df


In [9]:
train, test = download_and_load_datasets()
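
The `sentiment` column comes from the file names themselves: the regex in `load_directory_data` expects names of the form `<id>_<rating>.txt` and captures the rating. Here is a small standalone sketch of that parsing step. The file names are made up for illustration, and `rating_from_filename` is a hypothetical helper, not part of the tutorial code.

```python
import re

# Same pattern as in load_directory_data, written as a raw string
FILENAME_PATTERN = re.compile(r"\d+_(\d+)\.txt")


def rating_from_filename(file_name):
    """Return the rating encoded in a review file name, as a string."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        # The original code would fail with an AttributeError here instead
        raise ValueError("unexpected file name: {}".format(file_name))
    return match.group(1)


for name in ["127_9.txt", "4031_1.txt"]:
    print(name, "->", rating_from_filename(name))
```

As in the original loader, the rating is returned as a string; cast it with `int(...)` if you want to use it numerically.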

To keep training fast, we'll take a random sample of 5,000 examples from each of the train and test sets.

In [10]:
train = train.sample(5000)
test = test.sample(5000)

In [11]:
train.head(5)

| | sentence | sentiment | polarity |
|---|---|---|---|
| 6801 | Unlike endemol USA's two other current game sh... | 4 | 0 |
| 24977 | This film is a stunning piece that will convin... | 9 | 1 |
| 254 | 0.5/10. This movie has absolutely nothing good... | 1 | 0 |
| 11145 | Young, handsome, muscular Joe Buck (Jon Voight... | 10 | 1 |
| 12417 | The master of cheap erotic horror, Rolfe Kanef... | 2 | 0 |


In [12]:
test.head(5)

| | sentence | sentiment | polarity |
|---|---|---|---|
| 15617 | I had long wanted to watch this romantic drama... | 8 | 1 |
| 8469 | A remarkable film, bringing to the surface all... | 10 | 1 |
| 1673 | Emilio Estevez actually directed a good movie-... | 7 | 1 |
| 2986 | This is the funniest movie I have ever seen. H... | 10 | 1 |
| 20480 | A magazine columnist who writes about life on ... | 8 | 1 |


In [13]:
train.columns

Index(['sentence', 'sentiment', 'polarity'], dtype='object')

Our input data is the 'sentence' column and our label is the 'polarity' column (0 for negative, 1 for positive).

In [14]:
DATA_COLUMN = 'sentence'
LABEL_COLUMN = 'polarity'
# label_list is the list of labels, e.g. True, False or 0, 1 or 'dog', 'cat'
label_list = [0, 1]

# Data Preprocessing

We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `sentence` column in our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here 0 or 1.
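
To see the shape of these objects without installing anything, here is a minimal sketch using a stand-in type. `ExampleSketch` is hypothetical and only mirrors the four fields described above (`guid`, `text_a`, `text_b`, `label`), and the two rows are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the same four fields as InputExample
ExampleSketch = namedtuple("ExampleSketch", ["guid", "text_a", "text_b", "label"])

# Invented rows shaped like the DataFrame used in this notebook
rows = [
    {"sentence": "A stunning film.", "polarity": 1},
    {"sentence": "Dumb. Dumb. Dumb.", "polarity": 0},
]

examples = [
    ExampleSketch(
        guid=None,                # unused bookkeeping ID
        text_a=row["sentence"],   # the text to classify
        text_b=None,              # no second sentence for single-text classification
        label=row["polarity"],    # 0 = negative, 1 = positive
    )
    for row in rows
]

for example in examples:
    print(example)
```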

In [15]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. This takes several steps (but don't worry--they're all handled by the Python library):

1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
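
For intuition, steps 1 through 5 can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: `TOY_VOCAB` and its ids are invented (BERT's real vocab file is far larger and uses different ids), and `wordpiece` and `encode` are hypothetical helpers that imitate greedy longest-match-first splitting; the BERT library does the real work.

```python
# Invented toy vocabulary; real ids come from BERT's vocab file
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "sally": 4, "says": 5, "hi": 6, "call": 7, "##ing": 8,
}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercased word into pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocab
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab):
    """Lowercase, split on whitespace, apply WordPiece, add special tokens, map to ids."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[token] for token in tokens]


tokens, ids = encode("Sally says calling", TOY_VOCAB)
print(tokens)  # ['[CLS]', 'sally', 'says', 'call', '##ing', '[SEP]']
print(ids)     # [2, 4, 5, 7, 8, 3]
```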




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT TF Hub module:

In [16]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what's stored in `tokenization_info["do_lower_case"]`) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [17]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']
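
Notice that the output is lowercased and that the apostrophe in "here's" became its own token. That part happens before any WordPiece splitting. A rough approximation of this first pass, assuming only lowercasing and punctuation splitting (the real tokenizer also handles things like accents and non-Latin scripts), looks like this; `basic_tokenize` is a hypothetical name.

```python
import re


def basic_tokenize(text, do_lower_case=True):
    """Lowercase the text, then split it into word runs and single punctuation marks."""
    if do_lower_case:
        text = text.lower()
    # \w+ grabs a run of word characters; [^\w\s] grabs one punctuation character
    return re.findall(r"\w+|[^\w\s]", text)


print(basic_tokenize("This here's an example of using the BERT tokenizer"))
# ['this', 'here', "'", 's', 'an', 'example', 'of', 'using', 'the', 'bert', 'tokenizer']
```

The only difference from the real output above is the last word: the WordPiece step then splits `tokenizer` into `token` and `##izer`.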

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.
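
Each resulting feature holds three equal-length lists: `input_ids`, `input_mask`, and `segment_ids`. The sketch below shows how they relate for a single-sentence example: truncate to leave room for `[CLS]` and `[SEP]`, then pad with zeros up to the maximum length. `build_features` is a hypothetical helper, and the small `token_to_id` map uses ids copied from the log output of the conversion cell in this notebook.

```python
def build_features(tokens, token_to_id, max_seq_length):
    """Return (input_ids, input_mask, segment_ids), each of length max_seq_length."""
    # Reserve two positions for the [CLS] and [SEP] tokens
    tokens = tokens[: max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    input_ids = [token_to_id[token] for token in tokens]
    input_mask = [1] * len(input_ids)   # 1 = real token, 0 = padding
    segment_ids = [0] * len(input_ids)  # all 0: there is only a text_a segment

    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding


# Ids as they appear in this notebook's logs for the uncased model
token_to_id = {
    "[CLS]": 101, "[SEP]": 102,
    "this": 2023, "film": 2143, "is": 2003, "a": 1037, "stunning": 14726,
}

ids, mask, segments = build_features(
    ["this", "film", "is", "a", "stunning"], token_to_id, max_seq_length=8
)
print(ids)       # [101, 2023, 2143, 2003, 1037, 14726, 102, 0]
print(mask)      # [1, 1, 1, 1, 1, 1, 1, 0]
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

This is why long reviews in the logs end abruptly with `[SEP]`: everything past the first 126 WordPiece tokens is cut off when the maximum length is 128.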

In [18]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)







INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] unlike end ##em ##ol usa ' s two other current game shows ( deal or no deal and 1 vs . 100 ) , the pacing in this show is way too slow for what is happening on the screen . < br / > < br / > don ##d and 1 vs . 100 can get away with slow pacing because the games can change pace - - or end - - at any moment . there is risk involved in every action the player takes , the rewards are wildly variable , and it is difficult for the players to leave with a significant amount of money . suspense is usually put to good use . < br / > < br / [SEP]


INFO:tensorflow:input_ids: 101 4406 2203 6633 4747 3915 1005 1055 2048 2060 2783 2208 3065 1006 3066 2030 2053 3066 1998 1015 5443 1012 2531 1007 1010 1996 15732 1999 2023 2265 2003 2126 2205 4030 2005 2054 2003 6230 2006 1996 3898 1012 1026 7987 1013 1028 1026 7987 1013 1028 2123 2094 1998 1015 5443 1012 2531 2064 2131 2185 2007 4030 15732 2138 1996 2399 2064 2689 6393 1011 1011 2030 2203 1011 1011 2012 2151 2617 1012 2045 2003 3891 2920 1999 2296 2895 1996 2447 3138 1010 1996 19054 2024 13544 8023 1010 1998 2009 2003 3697 2005 1996 2867 2000 2681 2007 1037 3278 3815 1997 2769 1012 23873 2003 2788 2404 2000 2204 2224 1012 1026 7987 1013 1028 1026 7987 1013 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this film is a stunning piece that will convince even the most skeptical viewer that gerard de ##par ##die ##u is one of the finest film actors of the last 50 years . his performance shocks , entertain ##s , disgust ##s and charms you while leaving you breathless . this film was shot in the very early days of his film career and is very raw , but still is able to convey the mastery of de ##par ##die ##u . a must - see for any de ##par ##die ##u fan and by far his best early work . [SEP]


INFO:tensorflow:input_ids: 101 2023 2143 2003 1037 14726 3538 2008 2097 8054 2130 1996 2087 18386 13972 2008 11063 2139 19362 10265 2226 2003 2028 1997 1996 10418 2143 5889 1997 1996 2197 2753 2086 1012 2010 2836 28215 1010 20432 2015 1010 12721 2015 1998 24044 2017 2096 2975 2017 16701 1012 2023 2143 2001 2915 1999 1996 2200 2220 2420 1997 2010 2143 2476 1998 2003 2200 6315 1010 2021 2145 2003 2583 2000 16636 1996 26364 1997 2139 19362 10265 2226 1012 1037 2442 1011 2156 2005 2151 2139 19362 10265 2226 5470 1998 2011 2521 2010 2190 2220 2147 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] 0 . 5 / 10 . this movie has absolutely nothing good about it . the acting is among the worst i have ever seen , what is really amazing is that everyone is awful , not just a few here and there , everyone . the direction is a joke , the low budget is hopeless ##ly evident , the score is awful , i wouldn ' t say the movie was edited , brutally chopped would be a more appropriate phrase . it combines serial killings , voodoo and tar ##ot cards . dumb . dumb . dumb . it is not scary at all , the special effects are hopeless ##ly lame . laugh ##ably bad throughout . the writing was app ##all [SEP]


INFO:tensorflow:input_ids: 101 1014 1012 1019 1013 2184 1012 2023 3185 2038 7078 2498 2204 2055 2009 1012 1996 3772 2003 2426 1996 5409 1045 2031 2412 2464 1010 2054 2003 2428 6429 2003 2008 3071 2003 9643 1010 2025 2074 1037 2261 2182 1998 2045 1010 3071 1012 1996 3257 2003 1037 8257 1010 1996 2659 5166 2003 20625 2135 10358 1010 1996 3556 2003 9643 1010 1045 2876 1005 1056 2360 1996 3185 2001 5493 1010 23197 24881 2052 2022 1037 2062 6413 7655 1012 2009 13585 7642 16431 1010 21768 1998 16985 4140 5329 1012 12873 1012 12873 1012 12873 1012 2009 2003 2025 12459 2012 2035 1010 1996 2569 3896 2024 20625 2135 20342 1012 4756 8231 2919 2802 1012 1996 3015 2001 10439 8095 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] young , handsome , muscular joe buck ( jon vo ##ight ) moves from texas to new york thinking he ' ll make a living by being a stud . he gets there and finds out quickly that it isn ' t going to be easy - - he goes through one de ##grad ##ing experience after another . at the end of his rope he hooks up with crippled , sl ##ea ##zy rats ##o ri ##zzo ( dustin hoffman ) . together they try to survive and get out of the city and move to florida . but will they make it ? < br / > < br / > very dark , disturbing yet fascinating movie . director john sc ##hel ##sing [SEP]


INFO:tensorflow:input_ids: 101 2402 1010 8502 1010 13472 3533 10131 1006 6285 29536 18743 1007 5829 2013 3146 2000 2047 2259 3241 2002 1005 2222 2191 1037 2542 2011 2108 1037 16054 1012 2002 4152 2045 1998 4858 2041 2855 2008 2009 3475 1005 1056 2183 2000 2022 3733 1011 1011 2002 3632 2083 2028 2139 16307 2075 3325 2044 2178 1012 2012 1996 2203 1997 2010 8164 2002 18008 2039 2007 24433 1010 22889 5243 9096 11432 2080 15544 12036 1006 24337 15107 1007 1012 2362 2027 3046 2000 5788 1998 2131 2041 1997 1996 2103 1998 2693 2000 3516 1012 2021 2097 2027 2191 2009 1029 1026 7987 1013 1028 1026 7987 1013 1028 2200 2601 1010 14888 2664 17160 3185 1012 2472 2198 8040 16001 7741 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] the master of cheap erotic horror , rolf ##e kane ##fs ##ky , finally makes a movie that doesn ' t go straight to the playboy channel . " the ha ##zing " borrow ##s heavily from everything that came before it from nightmare on elm street to evil dead , but still manages to do it with enough humor to make it watch ##able . . . just barely . the characters are cardboard , the dialogue is wooden , the story is paper - thin and the actors couldn ' t act their way out of a grocery bag . put that all together and you have a pulp ##y ball of mu ##lch for a movie . sometimes , when i ' m [SEP]


INFO:tensorflow:input_ids: 101 1996 3040 1997 10036 14253 5469 1010 23381 2063 8472 10343 4801 1010 2633 3084 1037 3185 2008 2987 1005 1056 2175 3442 2000 1996 18286 3149 1012 1000 1996 5292 6774 1000 17781 2015 4600 2013 2673 2008 2234 2077 2009 2013 10103 2006 17709 2395 2000 4763 2757 1010 2021 2145 9020 2000 2079 2009 2007 2438 8562 2000 2191 2009 3422 3085 1012 1012 1012 2074 4510 1012 1996 3494 2024 19747 1010 1996 7982 2003 4799 1010 1996 2466 2003 3259 1011 4857 1998 1996 5889 2481 1005 1056 2552 2037 2126 2041 1997 1037 13025 4524 1012 2404 2008 2035 2362 1998 2017 2031 1037 16016 2100 3608 1997 14163 29358 2005 1037 3185 1012 2823 1010 2043 1045 1005 1049 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i had long wanted to watch this romantic drama ( with a wwii setting ) and , now that i have , all i can say is that it ' s a ve ##rita ##ble masterpiece of russian cinema ! < br / > < br / > soviet films are known for their over ##ze ##alo ##us prop ##aga ##ndi ##st approach but , thankfully , this one ' s free of such emphasis - with the interest firmly on the central tragic romance between a promising artist and a viva ##cious girl , doomed by the outbreak of war for which he gladly volunteers but from which he ' ll never return . the girl ( a remarkable performance from ta ##tya ##na sam [SEP]


INFO:tensorflow:input_ids: 101 1045 2018 2146 2359 2000 3422 2023 6298 3689 1006 2007 1037 25755 4292 1007 1998 1010 2085 2008 1045 2031 1010 2035 1045 2064 2360 2003 2008 2009 1005 1055 1037 2310 17728 3468 17743 1997 2845 5988 999 1026 7987 1013 1028 1026 7987 1013 1028 3354 3152 2024 2124 2005 2037 2058 4371 23067 2271 17678 16098 16089 3367 3921 2021 1010 16047 1010 2023 2028 1005 1055 2489 1997 2107 7902 1011 2007 1996 3037 7933 2006 1996 2430 13800 7472 2090 1037 10015 3063 1998 1037 20022 18436 2611 1010 20076 2011 1996 8293 1997 2162 2005 2029 2002 24986 7314 2021 2013 2029 2002 1005 2222 2196 2709 1012 1996 2611 1006 1037 9487 2836 2013 11937 21426 2532 3520 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] a remarkable film , bringing to the surface all sorts of feelings i had when i was much , much younger . i loved it , and the elton john music . i remember seeing in in the movies when i was a kid , and for some reason ( limited release ? ) i ' ve never known anyone else who saw this film when it was released . < br / > < br / > the dreams it inspired in me from decades ago have never left me , and seeing the film again recently brought it all rushing back , i confess , however , that my kids ( in their 20 ' s ) have not experienced a similar emotional rush [SEP]


INFO:tensorflow:input_ids: 101 1037 9487 2143 1010 5026 2000 1996 3302 2035 11901 1997 5346 1045 2018 2043 1045 2001 2172 1010 2172 3920 1012 1045 3866 2009 1010 1998 1996 19127 2198 2189 1012 1045 3342 3773 1999 1999 1996 5691 2043 1045 2001 1037 4845 1010 1998 2005 2070 3114 1006 3132 2713 1029 1007 1045 1005 2310 2196 2124 3087 2842 2040 2387 2023 2143 2043 2009 2001 2207 1012 1026 7987 1013 1028 1026 7987 1013 1028 1996 5544 2009 4427 1999 2033 2013 5109 3283 2031 2196 2187 2033 1010 1998 3773 1996 2143 2153 3728 2716 2009 2035 8375 2067 1010 1045 18766 1010 2174 1010 2008 2026 4268 1006 1999 2037 2322 1005 1055 1007 2031 2025 5281 1037 2714 6832 5481 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] emilio este ##vez actually directed a good movie - - who would ##a thought ? i sat through two previous films este ##vez directed - - " wisdom " ( with then girlfriend demi moore ) and " men at work " ( with brother charlie sheen ) . they are lou ##sy films - - - badly acted , directed , stupid and offensive . este ##vez is a good actor but lou ##sy as a director . i turned this on in pure curious ##ity - - it has a great cast and i had nothing else to do . damned if it didn ' t pull me in . < br / > < br / > it concerns este ##vez coming home [SEP]


INFO:tensorflow:input_ids: 101 18644 28517 26132 2941 2856 1037 2204 3185 1011 1011 2040 2052 2050 2245 1029 1045 2938 2083 2048 3025 3152 28517 26132 2856 1011 1011 1000 9866 1000 1006 2007 2059 6513 27668 5405 1007 1998 1000 2273 2012 2147 1000 1006 2007 2567 4918 20682 1007 1012 2027 2024 10223 6508 3152 1011 1011 1011 6649 6051 1010 2856 1010 5236 1998 5805 1012 28517 26132 2003 1037 2204 3364 2021 10223 6508 2004 1037 2472 1012 1045 2357 2023 2006 1999 5760 8025 3012 1011 1011 2009 2038 1037 2307 3459 1998 1045 2018 2498 2842 2000 2079 1012 9636 2065 2009 2134 1005 1056 4139 2033 1999 1012 1026 7987 1013 1028 1026 7987 1013 1028 2009 5936 28517 26132 2746 2188 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this is the fun ##nies ##t movie i have ever seen . however , i have laughed harder at plenty of movies . this is because best in show ' s brilliance lies not in slap ##stick or one - liner ##s , but in sophisticated and layered verbal wit . the improvised dialogue is is so quick that you end up laughing not at each individual joke , but only until after several jokes build on one another , each di ##sar ##ming your senses until the jokes climax and you can ' t help letting loose . < br / > < br / > it ' s a well - shot film , but what makes it extraordinary is the acting . i [SEP]


INFO:tensorflow:input_ids: 101 2023 2003 1996 4569 15580 2102 3185 1045 2031 2412 2464 1012 2174 1010 1045 2031 4191 6211 2012 7564 1997 5691 1012 2023 2003 2138 2190 1999 2265 1005 1055 28850 3658 2025 1999 14308 21354 2030 2028 1011 11197 2015 1010 2021 1999 12138 1998 21323 12064 15966 1012 1996 19641 7982 2003 2003 2061 4248 2008 2017 2203 2039 5870 2025 2012 2169 3265 8257 1010 2021 2069 2127 2044 2195 13198 3857 2006 2028 2178 1010 2169 4487 10286 6562 2115 9456 2127 1996 13198 14463 1998 2017 2064 1005 1056 2393 5599 6065 1012 1026 7987 1013 1028 1026 7987 1013 1028 2009 1005 1055 1037 2092 1011 2915 2143 1010 2021 2054 3084 2009 9313 2003 1996 3772 1012 1045 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] a magazine columnist who writes about life on her farm house when in fact she lives in a ny apartment must come up with a plan when she learns that her publisher and a war hero will spend christmas with her . after a slow start , it turns into an entertaining little screw ##ball comedy , thanks to a fine cast . in a big departure from her previous role as a femme fatal ##e in " double ind ##em ##nity , " stan ##wy ##ck displays a nice comedic flair . morgan is smooth as the af ##fa ##ble war hero while greens ##tree ##t is well cast as the publisher . however , sa ##kal ##l steals the film as a chef trying [SEP]


INFO:tensorflow:input_ids: 101 1037 2932 13317 2040 7009 2055 2166 2006 2014 3888 2160 2043 1999 2755 2016 3268 1999 1037 6396 4545 2442 2272 2039 2007 1037 2933 2043 2016 10229 2008 2014 6674 1998 1037 2162 5394 2097 5247 4234 2007 2014 1012 2044 1037 4030 2707 1010 2009 4332 2046 2019 14036 2210 11224 7384 4038 1010 4283 2000 1037 2986 3459 1012 1999 1037 2502 6712 2013 2014 3025 2535 2004 1037 26893 10611 2063 1999 1000 3313 27427 6633 22758 1010 1000 9761 18418 3600 8834 1037 3835 21699 22012 1012 5253 2003 5744 2004 1996 21358 7011 3468 2162 5394 2096 15505 13334 2102 2003 2092 3459 2004 1996 6674 1012 2174 1010 7842 12902 2140 15539 1996 2143 2004 1037 10026 2667 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)
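
The logged fields above follow a fixed recipe, which can be sketched in a few lines of plain Python. This is only an illustration, not the `bert.run_classifier` implementation: `build_features` is a hypothetical helper, the tiny `vocab` holds just a handful of ids read off the log (101 for `[CLS]`, 102 for `[SEP]`, and the ids for "a remarkable film"), and the toy sequence lengths are assumptions chosen to make padding and truncation visible.

```python
def build_features(tokens, vocab, max_seq_length):
    """Lay out WordPiece tokens as [CLS] ... [SEP], then zero-pad."""
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = ["[CLS]"] + tokens[:max_seq_length - 2] + ["[SEP]"]
    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)    # 1 marks a real token
    segment_ids = [0] * len(input_ids)   # single-text input: all segment 0
    padding = [0] * (max_seq_length - len(input_ids))
    return (tokens, input_ids + padding, input_mask + padding,
            segment_ids + padding)

# Toy vocabulary; the ids are the ones shown in the log output.
vocab = {"[CLS]": 101, "[SEP]": 102, "a": 1037, "remarkable": 9487, "film": 2143}

# Short input: padded with zeros up to the sequence length.
tokens, input_ids, input_mask, segment_ids = build_features(
    ["a", "remarkable", "film"], vocab, max_seq_length=8)
print(input_ids)   # [101, 1037, 9487, 2143, 102, 0, 0, 0]
print(input_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]

# Long input: truncated so that [CLS] and [SEP] still fit.
truncated = build_features(["a", "remarkable", "film"], vocab, max_seq_length=4)
print(truncated[0])  # ['[CLS]', 'a', 'remarkable', '[SEP]']
```

Every `input_mask` in the log is all ones because each of those reviews is long enough to fill the whole sequence after truncation; a shorter input would end in zeros, as in the first call here.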


# Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our sentiment task (i.e. classifying whether a movie review is positive or negative). Because the module is loaded with `trainable=True`, BERT's own weights are updated along with the new layer. This strategy of starting from a pretrained model and training it further on a new task is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [19]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for the sentiment data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want the predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
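
To make the new layer concrete, here is a minimal pure-Python sketch of the same arithmetic: a linear layer over the pooled output, a log-softmax, and a cross-entropy loss. It is not the TensorFlow code above; `log_softmax` and `classify` are hypothetical helpers, the toy `pooled`, `weights`, and `bias` values are made up, and dropout is left out.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def classify(pooled, weights, bias, label=None):
    """Linear layer + log-softmax, using the same shapes as create_model.

    pooled:  hidden_size floats (stand-in for BERT's pooled_output)
    weights: num_labels rows of hidden_size floats
    bias:    num_labels floats
    """
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    log_probs = log_softmax(logits)
    predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
    if label is None:
        return predicted, log_probs
    # Cross-entropy with a one-hot label reduces to -log_probs[label].
    loss = -log_probs[label]
    return loss, predicted, log_probs

# Toy values (hypothetical): hidden_size = 3, num_labels = 2.
pooled = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, -0.3], [-0.2, 0.1, 0.4]]
bias = [0.0, 0.0]
loss, predicted, log_probs = classify(pooled, weights, bias, label=1)
print(predicted, loss)
```

With a one-hot label, `-tf.reduce_sum(one_hot_labels * log_probs, axis=-1)` picks out a single term per example, which is why the sketch can index `log_probs[label]` directly.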


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [20]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          # Despite the key name, these are log-probabilities (log_softmax output).
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn
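
The evaluation metrics returned by `metric_fn` all derive from four confusion-matrix counts. As a rough illustration of the standard definitions (not of TensorFlow's streaming implementations, which accumulate counts across batches), here is a small pure-Python sketch. `binary_metrics` is a hypothetical helper and the label/prediction lists are made-up data.

```python
def binary_metrics(labels, predictions):
    """Accuracy, precision, recall and F1 from 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        "true_positives": tp,
        "true_negatives": tn,
        "false_positives": fp,
        "false_negatives": fn,
    }

# Made-up example: 8 reviews, 1 = positive, 0 = negative.
labels      = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(labels, predictions)
print(metrics)
```

Here three positives and three negatives are classified correctly, with one miss in each direction, so accuracy, precision, recall, and F1 all come out to 0.75.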


In [21]:
# Training hyperparameters and checkpoint settings
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period at the start of training where the learning rate
# is small and gradually increases; it usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [22]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
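
To see what these numbers look like, here is a worked example in plain Python. It assumes `train_features` holds 5,000 examples (the size suggested by the "Writing example 0 of 5000" log line), and `learning_rate_at` is a hypothetical helper that sketches the schedule as we understand `bert.optimization.create_optimizer` to apply it: a linear ramp up to the peak rate during warmup, then a linear decay to zero.

```python
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
WARMUP_PROPORTION = 0.1
num_examples = 5000  # assumption: len(train_features)

num_train_steps = int(num_examples / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

def learning_rate_at(step):
    """Linear warmup towards the peak rate, then linear decay to zero."""
    if step < num_warmup_steps:
        return LEARNING_RATE * step / num_warmup_steps
    return LEARNING_RATE * (1.0 - step / num_train_steps)

print(num_train_steps, num_warmup_steps)  # 468 46
for step in (0, 23, 46, 234, 468):
    print(step, learning_rate_at(step))
```

So three epochs over 5,000 examples is 468 optimizer steps, and the first 46 of them are spent warming up.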

In [23]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [24]:
model_fn = model_fn_builder(
    num_labels=len(label_list),
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params={"batch_size": BATCH_SIZE})

INFO:tensorflow:Using config: {'_model_dir': './output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f6889dcbe10>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and produces an input function (`input_fn`) that feeds batches of features to the Estimator. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [25]:
# Create an input function for training. Set drop_remainder=True only when using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
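
To illustrate what `drop_remainder` controls, here is a small pure-Python sketch of batching. It is not how `input_fn_builder` is implemented (as far as we know, that builds a `tf.data.Dataset` and batches it using `params["batch_size"]`); `batches` is a hypothetical helper, and the 5,000-example size is an assumption based on the earlier log output.

```python
def batches(examples, batch_size, drop_remainder):
    """Yield consecutive batches; optionally drop a final short batch."""
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return  # discard the incomplete last batch
        yield batch

examples = list(range(5000))  # stand-ins for 5,000 feature records (assumed size)
kept = list(batches(examples, 32, drop_remainder=False))
dropped = list(batches(examples, 32, drop_remainder=True))
print(len(kept), len(kept[-1]))        # 157 8
print(len(dropped), len(dropped[-1]))  # 156 32
```

TPUs need every batch to have the same static shape, which is why the short final batch has to go there. On a GPU or CPU we can keep it and use every example.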

Now we train our model! For me, using a Colab notebook running on Google's GPUs, my training time was about 14 minutes.

In [None]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into ./output/model.ckpt.


INFO:tensorflow:loss = 0.7039824, step = 1


Now let's use our test data to see how well our model did:

In [2]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [None]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

Now let's write code to make predictions on new sentences:

In [None]:
def getPrediction(in_sentences):
    labels = ["Negative", "Positive"]
    # guid="" and label=0 are dummy values; the label is not used when predicting.
    input_examples = [
        run_classifier.InputExample(guid="", text_a=x, text_b=None, label=0)
        for x in in_sentences]
    input_features = run_classifier.convert_examples_to_features(
        input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
    predict_input_fn = run_classifier.input_fn_builder(
        features=input_features, seq_length=MAX_SEQ_LENGTH,
        is_training=False, drop_remainder=False)
    predictions = estimator.predict(predict_input_fn)
    # 'probabilities' holds log-probabilities (the log_softmax output of create_model).
    return [(sentence, prediction['probabilities'], labels[prediction['labels']])
            for sentence, prediction in zip(in_sentences, predictions)]
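
One thing to keep in mind when reading the results: the `'probabilities'` entry comes from `log_softmax`, so it holds log-probabilities (negative numbers) rather than values between 0 and 1. A minimal sketch of converting them back, where `to_probabilities` is a hypothetical helper and the example values are made up:

```python
import math

def to_probabilities(log_probs):
    """Exponentiate log-probabilities to recover ordinary probabilities."""
    return [math.exp(lp) for lp in log_probs]

# Hypothetical log-probabilities for one sentence: [negative, positive].
example_log_probs = [math.log(0.03), math.log(0.97)]
probs = to_probabilities(example_log_probs)
label = ["Negative", "Positive"][probs.index(max(probs))]
print(label, [round(p, 2) for p in probs])
```

The predicted label is the same either way, since the exponential preserves ordering. The conversion only matters if you want to report a confidence.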

In [None]:
pred_sentences = [
  "That movie was absolutely awful",
  "The acting was a bit lacking",
  "The film was creative and surprising",
  "Absolutely fantastic!"
]

In [None]:
predictions = getPrediction(pred_sentences)

Voila! We have a sentiment classifier!

In [None]:
predictions