In [0]:
# Copyright 2019 Google Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a model to predict whether an email is spam or not using BERT in TensorFlow with TF Hub. Some code was adapted from [this colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb). Let's get started!

In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries we imported above, we'll need to install BERT's Python package.

In [0]:
!pip install bert-tensorflow

Collecting bert-tensorflow
  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


In [0]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization




Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to delete and recreate OUTPUT_DIR if it already exists. Otherwise, TensorFlow will load existing model checkpoints from that directory (if they exist).

In [0]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'EMAIL_MODEL'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = False #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = True #@param {type:"boolean"}
BUCKET = 'email_models' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE and tf.gfile.Exists(OUTPUT_DIR):
  # Only delete when the directory is actually there
  tf.gfile.DeleteRecursively(OUTPUT_DIR)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: gs://email_models/EMAIL_MODEL *****


#Data

First, let's load the dataset. The code below mounts Google Drive, reads a CSV of labeled emails (with `text` and `spam` columns), drops duplicates and missing values, strips links, HTML, bracketed text, and control characters, and then splits the result into train and test sets.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
import re
from bs4 import BeautifulSoup
TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

def strip_html(text):
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

def remove_between_square_brackets(text):
    return re.sub(r'\[[^]]*\]', '', text)

def remove_string_control(text):
    return re.sub(r'[\n\r\t]', '', text)

def remove_links(text):
    return re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', text)

dataset = pd.read_csv(r'/content/drive/My Drive/enron/large_emails.csv')
dataset.columns #Index(['text', 'spam'], dtype='object')
print(dataset.shape)
dataset.drop_duplicates(inplace = True)
print(dataset.shape)
dataset = dataset[pd.notnull(dataset['text'])]
print(pd.DataFrame(dataset.isnull().sum()))

# dataset['text'] = dataset['text'].apply(lambda x: x.replace("Subject: ", ""))
# dataset['text'] = dataset['text'].apply(lambda x: x.replace("re : ", ""))
# dataset['text'] = dataset['text'].apply(lambda x: x.replace("fw : ", ""))
# dataset['text'] = dataset['text'].apply(lambda x: x.replace("fwd : ", ""))

dataset['text'] = dataset['text'].apply(remove_links)
dataset['text'] = dataset['text'].apply(strip_html)
dataset['text'] = dataset['text'].apply(remove_between_square_brackets)
dataset['text'] = dataset['text'].apply(remove_string_control)

dataset = dataset[pd.notnull(dataset['text'])]
print(pd.DataFrame(dataset.isnull().sum()))
train, test = train_test_split(dataset, test_size = 0.20)

(58910, 2)
(52936, 2)
      0
text  0
spam  0
      0
text  0
spam  0
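
To make the effect of the cleaning helpers concrete, here is a minimal sketch that applies the same regular expressions to a made-up email snippet. The `clean` wrapper and the sample text are assumptions for illustration only, and the BeautifulSoup HTML-stripping step is left out so the example needs nothing beyond the standard library.

```python
import re

# Same patterns as the cleaning helpers in the cell above.
LINK_RE = re.compile(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*')
BRACKET_RE = re.compile(r'\[[^]]*\]')
CONTROL_RE = re.compile(r'[\n\r\t]')

def clean(text):
    """Hypothetical wrapper: remove links, bracketed text, then control characters."""
    text = LINK_RE.sub('', text)
    text = BRACKET_RE.sub('', text)
    text = CONTROL_RE.sub('', text)
    return text

# Made-up sample, not taken from the dataset.
sample = "Hi Vince,\nsee http://example.com/report [image] for details.\r\n"
cleaned = clean(sample)
print(repr(cleaned))
```

Note that control characters are replaced with an empty string rather than a space, so words on either side of a line break are joined together ("Vince," and "see" above). If that matters for your data, substituting a single space instead is one option to consider.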


In [0]:
print(dataset)

                                                    text  spam
0      Perhaps you already know, we help companies "G...   1.0
1       PrOfitab|e c0mpany with increased interest fr...   1.0
2      Following please find the Daily EnronOnline Ex...   0.0
3      Hey guys, Please review these and let me know ...   0.0
4      --Fig2xvG2VGoz8o/sContent-Type: text/plain; ch...   0.0
5      FYI6/8/009J49  Exxon 6,500 cut to 6,016(cut of...   0.0
6      As a follow-up to our discussion when I was in...   0.0
7      System information - January 5thChristmass s@|...   1.0
8      V|AGRA'S NEWEST R|VAL C!AL|S HAS BEEN AROUND F...   1.0
9      Staff department "World Logistic"Are you satis...   1.0
10     This is a multi-part message in MIME format--=...   1.0
11         Kysa M. AlportEnron North America503-464-7486   0.0
12     ----9200337155470286Content-Type: text/html; c...   1.0
13     Hello!Viagra is the #1 med to struggle with me...   1.0
14     Dear Vince,we have just received the signed co..

In [0]:
train.columns

Index(['text', 'spam'], dtype='object')

For us, the input data is the 'text' column and the label is the 'spam' column (0 for legitimate email, 1 for spam).

In [0]:
DATA_COLUMN = 'text'
LABEL_COLUMN = 'spam'
# label_list is the list of labels, i.e. True, False or 0, 1 or 'dog', 'cat'
label_list = [0, 1]
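
The `spam` column holds floats (0.0 and 1.0) while `label_list` holds integers. The sketch below shows why that still works, assuming the labels are looked up in a dictionary built by enumerating `label_list` (which is how the BERT classifier code maps labels to ids, as far as we can tell). The sample labels are made up.

```python
label_list = [0, 1]

# Assumed lookup: each label maps to its position in label_list.
label_map = {label: i for i, label in enumerate(label_list)}

# In Python, 1.0 == 1 and hash(1.0) == hash(1), so float labels
# resolve to the same dictionary entries as their integer counterparts.
sample_labels = [0.0, 1.0, 1.0]  # made-up labels for illustration
ids = [label_map[label] for label in sample_labels]
print(ids)
```

If your labels were strings such as `'spam'` and `'ham'`, `label_list` would need to contain exactly those strings.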

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `text` column in our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here 0 (not spam) or 1 (spam).

In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
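
For intuition, the last few steps can be sketched in plain Python. This is a simplified illustration and not the library's implementation: `build_features` and `toy_vocab` are hypothetical names, and the word ids are made up (only 101 for `[CLS]`, 102 for `[SEP]`, and 0 for padding follow the uncased BERT vocabulary).

```python
def build_features(tokens, vocab, max_seq_length):
    """Turn tokens into padded input_ids, input_mask and segment_ids lists."""
    # Reserve two positions for [CLS] and [SEP], truncating if needed.
    tokens = tokens[:max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)   # 1 = real token, 0 = padding
    segment_ids = [0] * len(input_ids)  # single-text task: everything is segment 0

    # Pad all three lists with zeros up to max_seq_length.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding

# Hypothetical toy vocabulary; the word ids are invented for this example.
toy_vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "sally": 1, "says": 2, "hi": 3}

tokens = "Sally says hi".lower().split()  # lowercase, then split on whitespace
input_ids, input_mask, segment_ids = build_features(tokens, toy_vocab, 8)
print(input_ids)
print(input_mask)
print(segment_ids)
```

The three lists always have length `max_seq_length`, which is why short emails end in runs of zeros and long emails are cut off.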




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:

In [12]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what's stored in tokenization_info["do_lower_case"]) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces.
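
To illustrate how a word gets broken into word pieces, here is a minimal sketch of greedy longest-match-first splitting against a vocabulary. It is a simplified stand-in for the library's tokenizer, not a copy of it: the `wordpiece` function and `toy_vocab` are hypothetical, and the real vocabulary contains roughly thirty thousand entries.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into word pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate substring until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # nothing matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary for illustration.
toy_vocab = {"call", "##ing", "en", "##ron"}

calling = wordpiece("calling", toy_vocab)
enron = wordpiece("enron", toy_vocab)
unknown = wordpiece("xyz", toy_vocab)
print(calling, enron, unknown)
```

With the real tokenizer created above, the equivalent call is `tokenizer.tokenize(...)` on a full string.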

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.

In [11]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)







INFO:tensorflow:Writing example 0 of 42348


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] > > > > > " r " = = robert harley writes : r > depends who writes it . one guy will write a bug every 5 lines , r > another every 5000 lines . put them both on a project and that r > will average out to a bug every 4 . 99 ##5 lines . and a java program , due to the extensive class libraries , will weigh ##in at 10 % the number of lines of the equivalent c program . q ##ed . - - gary lawrence murphy tel ##ed ##yna ##mics communications inc business advantage through community software : " computers are useless . they can only give you answers . " ( pablo picasso ) [SEP]


INFO:tensorflow:input_ids: 101 1028 1028 1028 1028 1028 1000 1054 1000 1027 1027 2728 13653 7009 1024 1054 1028 9041 2040 7009 2009 1012 2028 3124 2097 4339 1037 11829 2296 1019 3210 1010 1054 1028 2178 2296 13509 3210 1012 2404 2068 2119 2006 1037 2622 1998 2008 1054 1028 2097 2779 2041 2000 1037 11829 2296 1018 1012 5585 2629 3210 1012 1998 1037 9262 2565 1010 2349 2000 1996 4866 2465 8860 1010 2097 17042 2378 2012 2184 1003 1996 2193 1997 3210 1997 1996 5662 1039 2565 1012 1053 2098 1012 1011 1011 5639 5623 7104 10093 2098 18279 22924 4806 4297 2449 5056 2083 2451 4007 1024 1000 7588 2024 11809 1012 2027 2064 2069 2507 2017 6998 1012 1000 1006 11623 22457 1007 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0.0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this is standard procedure but i agree that the timing is questionable . i will forward this on to louise . - - - - - original message - - - - - from : forster , david sent : monday , november 12 , 2001 8 : 49 am ##to : duran , w . david ##su ##b ##ject : f ##w : conflict of interest policy mail ##ing to vendors dave , given how the audience might interpret this in light of recent events , do we really want to be sending a letter to customers bo ##ast ##ing of en ##ron conducting its affairs in accordance with the " highest ethical standards " ? dave - - - - - original message - [SEP]


INFO:tensorflow:input_ids: 101 2023 2003 3115 7709 2021 1045 5993 2008 1996 10984 2003 21068 1012 1045 2097 2830 2023 2006 2000 8227 1012 1011 1011 1011 1011 1011 2434 4471 1011 1011 1011 1011 1011 2013 1024 21316 1010 2585 2741 1024 6928 1010 2281 2260 1010 2541 1022 1024 4749 2572 3406 1024 22959 1010 1059 1012 2585 6342 2497 20614 1024 1042 2860 1024 4736 1997 3037 3343 5653 2075 2000 17088 4913 1010 2445 2129 1996 4378 2453 17841 2023 1999 2422 1997 3522 2824 1010 2079 2057 2428 2215 2000 2022 6016 1037 3661 2000 6304 8945 14083 2075 1997 4372 4948 9283 2049 3821 1999 10388 2007 1996 1000 3284 12962 4781 1000 1029 4913 1011 1011 1011 1011 1011 2434 4471 1011 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0.0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] st ##ins ##on / vince , i think this is important to know . regards , sand ##ee ##p . - - - - - - - - - - - - - - - - - - - - - - forward ##ed by sand ##ee ##p ko ##hli / en ##ron _ development on 03 / 28 / 2001 08 : 23 am - - - - - - - - - - - - - - - - - - - - - - - - - - - an ##shu ##man sri ##vas ##ta ##v ##0 ##3 / 28 / 2001 06 : 32 : 20 am ##to : marc de la roche @ ec ##tc ##c : tu ##sha ##r dh [SEP]


INFO:tensorflow:input_ids: 101 2358 7076 2239 1013 12159 1010 1045 2228 2023 2003 2590 2000 2113 1012 12362 1010 5472 4402 2361 1012 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 2830 2098 2011 5472 4402 2361 12849 27766 1013 4372 4948 1035 2458 2006 6021 1013 2654 1013 2541 5511 1024 2603 2572 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 1011 2019 14235 2386 5185 12044 2696 2615 2692 2509 1013 2654 1013 2541 5757 1024 3590 1024 2322 2572 3406 1024 7871 2139 2474 20162 1030 14925 13535 2278 1024 10722 7377 2099 28144 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0.0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] louise , this is better in my opinion . it tends to force better definition from the developer and his team . your costs are reduced when we have good definition . keith - - - - - original message - - - - - from : kitchen , louise sent : thursday , september 13 , 2001 12 : 20 pm ##to : duran , w . david ; jacob ##y , ben ; ir ##vin , steve ; k ##roll , heather ; vi ##rgo , robert ; may ##s , wayne ; morris , sandy ; golden , bruce ; rose , steven ; ch ##mie ##le ##wski , robert ; rim ##bau , robert w . ; co ##ffer , walter ; [SEP]


INFO:tensorflow:input_ids: 101 8227 1010 2023 2003 2488 1999 2026 5448 1012 2009 12102 2000 2486 2488 6210 2013 1996 9722 1998 2010 2136 1012 2115 5366 2024 4359 2043 2057 2031 2204 6210 1012 6766 1011 1011 1011 1011 1011 2434 4471 1011 1011 1011 1011 1011 2013 1024 3829 1010 8227 2741 1024 9432 1010 2244 2410 1010 2541 2260 1024 2322 7610 3406 1024 22959 1010 1059 1012 2585 1025 6213 2100 1010 3841 1025 20868 6371 1010 3889 1025 1047 28402 1010 9533 1025 6819 18581 1010 2728 1025 2089 2015 1010 6159 1025 6384 1010 7525 1025 3585 1010 5503 1025 3123 1010 7112 1025 10381 9856 2571 10344 1010 2728 1025 11418 27773 1010 2728 1059 1012 1025 2522 12494 1010 4787 1025 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0.0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] jill is not representing the pipeline anymore brian riley should be handling this for ee ##x [SEP]


INFO:tensorflow:input_ids: 101 10454 2003 2025 5052 1996 13117 4902 4422 7546 2323 2022 8304 2023 2005 25212 2595 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0.0 (id = 0)


INFO:tensorflow:Writing example 10000 of 42348


INFO:tensorflow:Writing example 20000 of 42348


INFO:tensorflow:Writing example 30000 of 42348


INFO:tensorflow:Writing example 40000 of 42348


INFO:tensorflow:Writing example 0 of 10587


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] monthly in ##vo ##ice n ##90 ##2 - 1818 ##7 ##7 ##de ##ar che ##ps ##oft customer , my name is ke ##rmi ##t mcconnell , and i work at che ##ps ##oft llc . you are important to me ! you spend your money and time on = cheap ##so ##ft , and i want to let you know that we have finished update our programs = store . i want to remind you that we are offering now more than 1500 popular ##so ##ft ##ware for low price with your personal customer ' s discount . please spend few moments of yours precious time to check our updated = software ##stor ##e : with regards , customer service department , ke ##rmi ##t mcconnell [SEP]


INFO:tensorflow:input_ids: 101 7058 1999 6767 6610 1050 21057 2475 1011 12094 2581 2581 3207 2906 18178 4523 15794 8013 1010 2026 2171 2003 17710 28550 2102 28514 1010 1998 1045 2147 2012 18178 4523 15794 11775 1012 2017 2024 2590 2000 2033 999 2017 5247 2115 2769 1998 2051 2006 1027 10036 6499 6199 1010 1998 1045 2215 2000 2292 2017 2113 2008 2057 2031 2736 10651 2256 3454 1027 3573 1012 1045 2215 2000 10825 2017 2008 2057 2024 5378 2085 2062 2084 10347 2759 6499 6199 8059 2005 2659 3976 2007 2115 3167 8013 1005 1055 19575 1012 3531 5247 2261 5312 1997 6737 9062 2051 2000 4638 2256 7172 1027 4007 23809 2063 1024 2007 12362 1010 8013 2326 2533 1010 17710 28550 2102 28514 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1.0 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] if your sex life is good . . . then make it fantastic ! the covers of this book are too far apart . real freedom lies in wild ##ness , not in civilization . with the gift of listening comes the gift of healing . [SEP]


INFO:tensorflow:input_ids: 101 2065 2115 3348 2166 2003 2204 1012 1012 1012 2059 2191 2009 10392 999 1996 4472 1997 2023 2338 2024 2205 2521 4237 1012 2613 4071 3658 1999 3748 2791 1010 2025 1999 10585 1012 2007 1996 5592 1997 5962 3310 1996 5592 1997 8907 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1.0 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] ) , a print media and advertising services company in china , discusses about the company ' s objectives and planned prospects for the coming months . the key initiatives for the company are : expand the existing network of restaurants and cafes increase the advertising sales revenue of the current printed media acquire additional " wait media " businesses enjoy media is a pioneer in the field of printed advertising media placed in restaurants in china . it supplies paper products , such as paper place ##mat ##s , napkin ##s and other displays , displaying advertisements , free of charge , to restaurants and cafes . enjoy media ' s growing list of restaurants and cafes is now over 1 , 200 in the [SEP]


INFO:tensorflow:input_ids: 101 1007 1010 1037 6140 2865 1998 6475 2578 2194 1999 2859 1010 15841 2055 1996 2194 1005 1055 11100 1998 3740 16746 2005 1996 2746 2706 1012 1996 3145 11107 2005 1996 2194 2024 1024 7818 1996 4493 2897 1997 7884 1998 23812 3623 1996 6475 4341 6599 1997 1996 2783 6267 2865 9878 3176 1000 3524 2865 1000 5661 5959 2865 2003 1037 7156 1999 1996 2492 1997 6267 6475 2865 2872 1999 7884 1999 2859 1012 2009 6067 3259 3688 1010 2107 2004 3259 2173 18900 2015 1010 20619 2015 1998 2060 8834 1010 14962 14389 1010 2489 1997 3715 1010 2000 7884 1998 23812 1012 5959 2865 1005 1055 3652 2862 1997 7884 1998 23812 2003 2085 2058 1015 1010 3263 1999 1996 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1.0 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] dillon seductive mere ##tric ##ious te ##tan ##us might ##n ' t nad ##ir ph ##os ##ph ##oric = climax an ##har ##mon ##ic co ##rus ##cate hey ##day deprivation du hood posts ##cript = de ##ple ##te pal ##l con ##se ##quent bien ur ##ea claus pentagon e ##ffa ##ce baby lent ##en = delhi h ##yp ##oc ##ris ##y ins ##ou ##cian ##t el ##ate the secret on how porn stars grew big dick ##s ! the answer is turn off notification ##s st ##ag ##nant ##ob ##li ##que ##de ##mo ##crat ##act ##ora ##tone ##go ##pin ##ve ##ig ##le ##ju ##an ##cl ##a = mm ##yas ##tro ##phy ##sic ##ist ##de ##tra ##ctor ##hea ##the ##nish ##in ##let ##per ##use ##cus ##tom . [SEP]


INFO:tensorflow:input_ids: 101 14602 23182 8210 12412 6313 8915 5794 2271 2453 2078 1005 1056 23233 4313 6887 2891 8458 29180 1027 14463 2019 8167 8202 2594 2522 7946 16280 4931 10259 29516 4241 7415 8466 23235 1027 2139 10814 2618 14412 2140 9530 3366 15417 29316 24471 5243 19118 20864 1041 20961 3401 3336 15307 2368 1027 6768 1044 22571 10085 6935 2100 16021 7140 14483 2102 3449 3686 1996 3595 2006 2129 22555 3340 3473 2502 5980 2015 999 1996 3437 2003 2735 2125 26828 2015 2358 8490 16885 16429 3669 4226 3207 5302 23185 18908 6525 5524 3995 8091 3726 8004 2571 9103 2319 20464 2050 1027 3461 16303 13181 21281 19570 2923 3207 6494 16761 20192 10760 24014 2378 7485 4842 8557 7874 20389 1012 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1.0 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this is a multi - part message in mi ##me format - - fe ##28 ##b ##6 ##cc - 85 ##64 - 495 ##6 - 84 ##1 ##d - 6 ##ed ##5 ##c ##f ##4 ##af ##27 ##4 ##con ##ten ##t - type : text / html ; char ##set = iso - 88 ##59 - 1 ##con ##ten ##t - transfer - encoding : quoted - print ##able ##unt ##it ##led document ad ##que ##en is an institute = committed to brand planning and designing for enterprises , providing end to end = ad services , from plan to design , and finally press . - - - - - - - make maximal use of your marketing fund ! http : / / www [SEP]


INFO:tensorflow:input_ids: 101 2023 2003 1037 4800 1011 2112 4471 1999 2771 4168 4289 1011 1011 10768 22407 2497 2575 9468 1011 5594 21084 1011 29302 2575 1011 6391 2487 2094 1011 1020 2098 2629 2278 2546 2549 10354 22907 2549 8663 6528 2102 1011 2828 1024 3793 1013 16129 1025 25869 13462 1027 11163 1011 6070 28154 1011 1015 8663 6528 2102 1011 4651 1011 17181 1024 9339 1011 6140 3085 16671 4183 3709 6254 4748 4226 2368 2003 2019 2820 1027 5462 2000 4435 4041 1998 12697 2005 9926 1010 4346 2203 2000 2203 1027 4748 2578 1010 2013 2933 2000 2640 1010 1998 2633 2811 1012 1011 1011 1011 1011 1011 1011 1011 2191 29160 2224 1997 2115 5821 4636 999 8299 1024 1013 1013 7479 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1.0 (id = 1)


INFO:tensorflow:Writing example 10000 of 10587


# Creating a model

Now that we've prepared our data, let's focus on building a model. The `create_model` function below does just this. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our sentiment task (i.e., classifying whether a movie review is positive or negative). This strategy of starting from a pretrained model and training it further on a new task is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own classification layer to fine-tune on our data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want the predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
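
The new layer and loss in `create_model` amount to a linear layer followed by a softmax cross-entropy. As a rough illustration, here is a minimal pure-Python sketch of the same arithmetic on a single example. The helper names (`log_softmax`, `classify`, `cross_entropy`) and the toy numbers are made up for this sketch and are not part of BERT or TensorFlow.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def classify(pooled, weights, bias):
    """Linear layer with one weight row per label, then log-softmax."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return log_softmax(logits)

def cross_entropy(log_probs, label):
    """Negative log-probability of the true label.

    This equals -sum(one_hot * log_probs) for a one-hot encoded label.
    """
    return -log_probs[label]

# Hypothetical toy values: a 3-dimensional "pooled output" and 2 labels.
pooled = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.4]]
bias = [0.0, 0.0]

log_probs = classify(pooled, weights, bias)
predicted = max(range(len(log_probs)), key=lambda i: log_probs[i])
loss = cross_entropy(log_probs, label=1)
print("predicted label:", predicted, "loss:", round(loss, 4))
```

In the real model the pooled output has BERT's hidden size rather than three dimensions, and the loss is averaged over the batch.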


Next, we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      # Note: the 'probabilities' entry holds log-probabilities (log_softmax output).
      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn
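
To make the evaluation metrics above more concrete, here is a small pure-Python sketch that computes accuracy, precision, recall, and F1 from hard 0/1 predictions using the textbook definitions. The function name `binary_metrics` and the toy labels are invented for illustration, and the TensorFlow streaming metrics may differ in detail (for example, in how thresholds are handled).

```python
def binary_metrics(labels, predictions):
    """Compute confusion counts and standard binary metrics."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "true_positives": tp, "true_negatives": tn,
        "false_positives": fp, "false_negatives": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1_score": f1,
    }

# Hypothetical labels and predictions for eight examples.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(toy_labels, toy_predictions)
print(metrics)
```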


In [0]:
# Training hyperparameters.
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where the learning rate
# is small and gradually increases; this usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100
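
The warmup described in the comment above can be sketched in plain Python. This assumes a linear ramp from zero over the warmup steps, followed by a linear decay to zero at the final step, which is my understanding of what `bert.optimization.create_optimizer` does. Treat the exact shape as an assumption. The function name `learning_rate_at` and the step counts are made up for illustration.

```python
def learning_rate_at(step, init_lr, num_train_steps, num_warmup_steps):
    """Sketch of a linear-warmup, linear-decay learning rate schedule."""
    if num_warmup_steps and step < num_warmup_steps:
        # Warmup: scale the learning rate up linearly from zero.
        return init_lr * step / num_warmup_steps
    # After warmup: decay linearly so the rate reaches zero at the last step.
    step = min(step, num_train_steps)
    return init_lr * (1.0 - step / num_train_steps)

# Hypothetical run: 1000 total steps with 10% warmup at a 2e-5 base rate.
for s in (0, 50, 100, 500, 1000):
    print(s, learning_rate_at(s, 2e-5, 1000, 100))
```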

In [0]:
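# Compute the number of training and warmup steps from the batch size.
# checkpoint_num is assumed to be the global step of the checkpoint being
# resumed, so that max_steps covers the steps already taken plus the new ones.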
num_train_steps = int(checkpoint_num) + int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
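
As a worked example of the step arithmetic above, the sketch below wraps the same two formulas in a function and evaluates them on made-up numbers. The name `compute_steps`, the example count of 10000, and the starting step of 533 are illustrative assumptions, not values confirmed for this run.

```python
def compute_steps(num_examples, batch_size, num_epochs,
                  warmup_proportion, start_step=0):
    """Return (num_train_steps, num_warmup_steps) using the formulas above."""
    num_train_steps = int(start_step) + int(
        num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps

# Hypothetical: 10000 examples, batch size 32, 3 epochs, resuming at step 533.
train_steps, warmup_steps = compute_steps(10000, 32, 3.0, 0.1, start_step=533)
print(train_steps, warmup_steps)
```

Because `estimator.train` is called with `max_steps`, which counts total global steps, adding the starting step keeps a resumed run from stopping early.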

In [0]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [0]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'gs://email_models/EMAIL_MODEL', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9fe2d4b1d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and produces an input function for the Estimator. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
# Create an input function for training. Use drop_remainder=True only when running on TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
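
To illustrate what `drop_remainder` controls, here is a minimal pure-Python sketch of batching with and without it. The `batches` helper is hypothetical and only mimics the idea. It is not how `input_fn_builder` is implemented. TPUs need every batch to have the same shape, so the final short batch is dropped there. On GPUs or CPUs it can be kept.

```python
def batches(items, batch_size, drop_remainder=False):
    """Yield consecutive batches; optionally drop a final short batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return
        yield batch

# Hypothetical data: ten examples in batches of four.
examples = list(range(10))
kept = list(batches(examples, 4))
dropped = list(batches(examples, 4, drop_remainder=True))
print(len(kept), len(dropped))
```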

Now we train our model! I ran this in a Colab notebook on Google's GPUs.

In [0]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from gs://email_models/EMAIL_MODEL/model.ckpt-533


Instructions for updating:
Use standard file utilities to get mtimes.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 533 into gs://email_models/EMAIL_MODEL/model.ckpt.


INFO:tensorflow:loss = 0.32937378, step = 533


INFO:tensorflow:global_step/sec: 1.04424


INFO:tensorflow:loss = 0.007825088, step = 633 (95.764 sec)


INFO:tensorflow:global_step/sec: 1.10293


INFO:tensorflow:loss = 0.14464949, step = 733 (90.667 sec)


INFO:tensorflow:global_step/sec: 1.1451


INFO:tensorflow:loss = 0.0028299731, step = 833 (87.328 sec)


INFO:tensorflow:global_step/sec: 1.0679


INFO:tensorflow:loss = 0.037625454, step = 933 (93.642 sec)


INFO:tensorflow:Saving checkpoints for 1033 into gs://email_models/EMAIL_MODEL/model.ckpt.


INFO:tensorflow:global_step/sec: 0.725931


INFO:tensorflow:loss = 0.025638482, step = 1033 (137.754 sec)


INFO:tensorflow:global_step/sec: 1.16078


INFO:tensorflow:loss = 0.0015837431, step = 1133 (86.150 sec)


INFO:tensorflow:global_step/sec: 1.08541


INFO:tensorflow:loss = 0.0034159417, step = 1233 (92.130 sec)


INFO:tensorflow:global_step/sec: 1.12932


INFO:tensorflow:loss = 0.0024019391, step = 1333 (88.549 sec)


INFO:tensorflow:global_step/sec: 1.07545


INFO:tensorflow:loss = 0.07385103, step = 1433 (92.985 sec)


INFO:tensorflow:Saving checkpoints for 1533 into gs://email_models/EMAIL_MODEL/model.ckpt.


INFO:tensorflow:global_step/sec: 0.720933
INFO:tensorflow:loss = 0.15941478, step = 1533 (138.709 sec)
INFO:tensorflow:global_step/sec: 1.16001
INFO:tensorflow:loss = 0.16284011, step = 1633 (86.206 sec)
INFO:tensorflow:global_step/sec: 1.07894
INFO:tensorflow:loss = 0.0023895537, step = 1733 (92.684 sec)
INFO:tensorflow:global_step/sec: 1.13018
INFO:tensorflow:loss = 0.080388114, step = 1833 (88.484 sec)
INFO:tensorflow:global_step/sec: 1.07804
INFO:tensorflow:loss = 0.0028186836, step = 1933 (92.761 sec)
INFO:tensorflow:Saving checkpoints for 2033 into gs://email_models/EMAIL_MODEL/model.ckpt.
Instructions for updating:
Use standard file APIs to delete files with this prefix.


INFO:tensorflow:global_step/sec: 0.719673
INFO:tensorflow:loss = 0.01838514, step = 2033 (138.950 sec)
INFO:tensorflow:global_step/sec: 1.16128
INFO:tensorflow:loss = 0.0017305312, step = 2133 (86.113 sec)
INFO:tensorflow:global_step/sec: 1.07529
INFO:tensorflow:loss = 0.002823508, step = 2233 (92.998 sec)
INFO:tensorflow:global_step/sec: 1.13046
INFO:tensorflow:loss = 0.12421471, step = 2333 (88.459 sec)
INFO:tensorflow:global_step/sec: 1.08145
INFO:tensorflow:loss = 0.0457922, step = 2433 (92.468 sec)
INFO:tensorflow:Saving checkpoints for 2533 into gs://email_models/EMAIL_MODEL/model.ckpt.
INFO:tensorflow:global_step/sec: 0.724003
INFO:tensorflow:loss = 0.00014217466, step = 2533 (138.121 sec)
INFO:tensorflow:global_step/sec: 1.16024
INFO:tensorflow:loss = 0.00017863805, step = 2633 (86.189 sec)
INFO:tensorflow:global_step/sec: 1.07944
INFO:tensorflow:loss = 0.0026365372, step = 2733 (92.641 sec)
INFO:tensorflow:global_step/sec: 1.12838
INFO:tensorflow:loss = 0.0019560945, step = 2833 (88.622 sec)
INFO:tensorflow:global_step/sec: 1.0606
INFO:tensorflow:loss = 0.000296951, step = 2933 (94.286 sec)
INFO:tensorflow:Saving checkpoints for 3033 into gs://email_models/EMAIL_MODEL/model.ckpt.


INFO:tensorflow:global_step/sec: 0.76254
INFO:tensorflow:loss = 9.823059e-05, step = 3033 (131.140 sec)
INFO:tensorflow:global_step/sec: 1.15754
INFO:tensorflow:loss = 0.0021998663, step = 3133 (86.389 sec)
INFO:tensorflow:global_step/sec: 1.08186
INFO:tensorflow:loss = 0.0015840314, step = 3233 (92.434 sec)
INFO:tensorflow:global_step/sec: 1.12859
INFO:tensorflow:loss = 6.0749484e-05, step = 3333 (88.606 sec)
INFO:tensorflow:global_step/sec: 1.05796
INFO:tensorflow:loss = 6.2061015e-05, step = 3433 (94.522 sec)
INFO:tensorflow:Saving checkpoints for 3533 into gs://email_models/EMAIL_MODEL/model.ckpt.
INFO:tensorflow:global_step/sec: 0.701699
INFO:tensorflow:loss = 0.0011707968, step = 3533 (142.511 sec)
INFO:tensorflow:global_step/sec: 1.16362
INFO:tensorflow:loss = 4.1513675e-05, step = 3633 (85.939 sec)
INFO:tensorflow:global_step/sec: 1.08895
INFO:tensorflow:loss = 4.5242552e-05, step = 3733 (91.831 sec)
INFO:tensorflow:global_step/sec: 1.12909
INFO:tensorflow:loss = 0.0013784661, step = 3833 (93.858 sec)
INFO:tensorflow:global_step/sec: 1.06471
INFO:tensorflow:loss = 7.064996e-05, step = 3933 (88.631 sec)
INFO:tensorflow:Saving checkpoints for 4033 into gs://email_models/EMAIL_MODEL/model.ckpt.


INFO:tensorflow:global_step/sec: 0.719804
INFO:tensorflow:loss = 0.1278688, step = 4033 (138.926 sec)
INFO:tensorflow:global_step/sec: 1.09825
INFO:tensorflow:loss = 0.00041302317, step = 4133 (91.054 sec)
INFO:tensorflow:global_step/sec: 1.13633
INFO:tensorflow:loss = 6.4634354e-05, step = 4233 (93.376 sec)
INFO:tensorflow:global_step/sec: 1.06786
INFO:tensorflow:loss = 6.6015346e-05, step = 4333 (88.272 sec)
INFO:tensorflow:global_step/sec: 1.08359
INFO:tensorflow:loss = 7.046978e-05, step = 4433 (92.286 sec)
INFO:tensorflow:Saving checkpoints for 4503 into gs://email_models/EMAIL_MODEL/model.ckpt.
INFO:tensorflow:Loss for final step: 7.5324824e-05.


Training took time  1:09:24.663371
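
The per-step loss in the log above is noisy (it jumps back up to about 0.16 around step 1633 before settling near 1e-4), so it can help to pull the values out and look at them as a series. The following is a minimal sketch that parses Estimator-style log text with a regular expression. The helper name `parse_loss_lines` and the `sample_log` string are illustrative assumptions, not part of the notebook.

```python
import re

# Matches lines such as "INFO:tensorflow:loss = 0.037625454, step = 933 (93.642 sec)"
LOSS_LINE = re.compile(r"loss = ([0-9.eE+-]+), step = (\d+)")

def parse_loss_lines(log_text):
    """Return a list of (step, loss) pairs found in the log text."""
    return [(int(m.group(2)), float(m.group(1))) for m in LOSS_LINE.finditer(log_text)]

# A few lines copied from the training log, used here only as sample input
sample_log = """\
INFO:tensorflow:loss = 0.037625454, step = 933 (93.642 sec)
INFO:tensorflow:loss = 9.823059e-05, step = 3033 (131.140 sec)
INFO:tensorflow:loss = 7.046978e-05, step = 4433 (92.286 sec)
"""

points = parse_loss_lines(sample_log)
for step, loss in points:
    print(step, loss)
```

The resulting pairs can be passed to any plotting library, or loaded into a pandas DataFrame, to inspect the trend.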


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [0]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-09-13T00:57:15Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from gs://email_models/EMAIL_MODEL/model.ckpt-4503
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-09-13-01:02:15
INFO:tensorflow:Saving dict for global step 4503: auc = 0.9928803, eval_accuracy = 0.9931992, f1_score = 0.99407405, false_negatives = 30.0, false_positives = 42.0, global_step = 4503, loss = 0.03366081, precision = 0.99309325, recall = 0.99505687, true_negatives = 4476.0, true_positives = 6039.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4503: gs://email_models/EMAIL_MODEL/model.ckpt-4503


{'auc': 0.9928803,
 'eval_accuracy': 0.9931992,
 'f1_score': 0.99407405,
 'false_negatives': 30.0,
 'false_positives': 42.0,
 'global_step': 4503,
 'loss': 0.03366081,
 'precision': 0.99309325,
 'recall': 0.99505687,
 'true_negatives': 4476.0,
 'true_positives': 6039.0}
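
As a sanity check, accuracy, precision, recall and F1 can be recomputed from the four confusion-matrix counts in the dictionary above. This is a minimal sketch using the standard definitions. The helper name `metrics_from_counts` is an assumption introduced for illustration and is not part of the notebook or of TensorFlow.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Derive standard classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts taken from the evaluation output above
derived = metrics_from_counts(tp=6039, tn=4476, fp=42, fn=30)
for name, value in derived.items():
    print(name, round(value, 6))
```

The derived values agree with the reported `eval_accuracy`, `precision`, `recall` and `f1_score` to within float32 rounding, which confirms that the reported metrics are consistent with one another.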

Now let's write code to make predictions on new sentences:

In [0]:
def getPrediction(in_sentences):
  labels = ["Negative", "Positive"]
  # guid is left empty, and label=0 is only a placeholder so that feature conversion has a valid label
  input_examples = [run_classifier.InputExample(guid="", text_a=x, text_b=None, label=0) for x in in_sentences]
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

In [0]:
def serving_input_fn():
    # Placeholders for the tensors the exported model receives at serving time
    label_ids = tf.placeholder(tf.int32, [None], name='label_ids')
    input_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='input_ids')
    input_mask = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='input_mask')
    segment_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='segment_ids')
    # build_raw_serving_input_receiver_fn returns a function; calling it
    # immediately yields the ServingInputReceiver that the exporter expects
    serving_input_receiver = tf.estimator.export.build_raw_serving_input_receiver_fn({
        'label_ids': label_ids,
        'input_ids': input_ids,
        'input_mask': input_mask,
        'segment_ids': segment_ids,
    })()
    return serving_input_receiver

In [0]:
print(serving_input_fn)

<function input_fn_builder.<locals>.input_fn at 0x7fddc7e83f28>


In [0]:
estimator._export_to_tpu = False
estimator.export_savedmodel(OUTPUT_DIR,serving_input_fn)

ValueError: ignored

In [0]:
pred_sentences = [
  "Base64 encoding schemes are commonly used when there is a need to encode binary data that needs be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport. Base64 is used commonly in a number of applications including email via MIME, and storing complex data in XML.",
  "Have to deal with Base64 format? Then this tool is made for you! Use the super simple online form below to decode or encode your data. Welcome!",
  "New meeting for next week will be scheduled on friday",
  "The University’s free Wi-Fi network now covers more outdoor areas, including the City Road bus stops and spaces around the Jane Foss Russell, Abercrombie and Carslaw buildings. "
]

In [0]:
pred_sentences = [
  "Purchasing phones from our website now and you can get a great discount",
  "The phone I was purchasing yesterday on the website got a great discount",
  "This one is a serious email. Please contact me later today. Peter O'Halloran Sales Engineer "
]

In [0]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 3
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] purchasing phones from our website now and you can get a great discount [SEP]


INFO:tensorflow:input_ids: 101 13131 11640 2013 2256 4037 2085 1998 2017 2064 2131 1037 2307 19575 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] the phone i was purchasing yesterday on the website got a great discount [SEP]


INFO:tensorflow:input_ids: 101 1996 3042 1045 2001 13131 7483 2006 1996 4037 2288 1037 2307 19575 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] this one is a serious email . please contact me later today . peter o ' hall ##oran sales engineer [SEP]


INFO:tensorflow:input_ids: 101 2023 2028 2003 1037 3809 10373 1012 3531 3967 2033 2101 2651 1012 2848 1051 1005 2534 18842 4341 3992 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from gs://email_models/EMAIL_MODEL/model.ckpt-4503
INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.
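
The `input_ids`, `input_mask` and `segment_ids` rows in the log above follow a simple pattern: the token ids are right-padded with zeros to a fixed length, the mask marks real tokens with 1 and padding with 0, and the segment ids are all 0 because each input is a single sentence. The sketch below reproduces that layout in plain Python. The helper `pad_features` is hypothetical (the notebook relies on `run_classifier.convert_examples_to_features` for this step), and the value 128 for `MAX_SEQ_LENGTH` is an assumption; use whatever the notebook defined earlier.

```python
MAX_SEQ_LENGTH = 128  # assumed value; replace with the notebook's own setting

def pad_features(token_ids, max_seq_length=MAX_SEQ_LENGTH):
    """Pad token ids and build the matching mask and segment ids for one sentence."""
    if len(token_ids) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    padding = [0] * (max_seq_length - len(token_ids))
    input_ids = token_ids + padding
    input_mask = [1] * len(token_ids) + padding  # 1 = real token, 0 = padding
    segment_ids = [0] * max_seq_length           # single-sentence input: all segment 0
    return input_ids, input_mask, segment_ids

# Ids copied from the first example in the log ([CLS] ... [SEP])
token_ids = [101, 13131, 11640, 2013, 2256, 4037, 2085, 1998,
             2017, 2064, 2131, 1037, 2307, 19575, 102]
input_ids, input_mask, segment_ids = pad_features(token_ids)
print(len(input_ids), sum(input_mask))
```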


In [0]:
for prediction in predictions:
    probability = prediction[1][0]/(prediction[1][0] + prediction[1][1])
    print(probability, prediction[2], prediction[0])


0.99999416 Positive Purchasing phones from our website now and you can get a great discount
0.0021353047 Negative The phone I was purchasing yesterday on the website got a great discount
0.00056452776 Negative This one is a serious email. Please contact me later today. Peter O'Halloran Sales Engineer 
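
A note on reading the first column: the score close to 1 is paired with the label "Positive", even though index 0 of `labels` is "Negative". One plausible explanation is that `prediction['probabilities']` holds log-probabilities (log-softmax outputs), as it does in the colab this notebook was adapted from, in which case the ratio computed above is not a true probability. That is an assumption, since the model function is not shown here. Under that assumption, a minimal sketch for turning log-probabilities into probabilities looks like this. The helper name and the example values are hypothetical.

```python
import math

def log_probs_to_probs(log_probs):
    """Convert log-probabilities into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability,
    # then renormalise so the result sums to exactly 1
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical log-probabilities for [Negative, Positive]; not taken from the model
example_log_probs = [-12.0, -6.0e-06]
probs = log_probs_to_probs(example_log_probs)
print(probs)
```

With values like these, `probs[1]` is the probability of the "Positive" class and can be reported directly instead of the ratio.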


Voila! We have a sentiment classifier!