<a href="https://colab.research.google.com/github/anjapago/AnalyzeAccountability/blob/master/Classifier_with_BERT_Policy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Binary Classifier BERT on TF Hub

Bidirectional Encoder Representations from Transformers(BERT) is a neural network architecture designed by Google researchers is a state-of-the-art approach or NLP tasks, including text classification, translation, summarization, and question answering.

BERT has been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, and in an existing pipeline, BERT can replace text embedding layers like ELMO and GloVE. 

[Finetuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a a classifier to detect accountability in news articles using BERT in Tensorflow with tf hub. Code was adapted from [this colab notebook](https://colab.research.google.com/github/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb).

In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries we imported above, we'll need to install BERT's python package.

In [7]:
!pip install bert-tensorflow



In [0]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory location to store our model output and checkpoints. We are running this code in Google's hosted Colab, so the directory won't persist after the Colab session ends.

Set DO_DELETE to rewrite the OUTPUT_DIR if it exists. Otherwise, Tensorflow will load existing model checkpoints from that directory (if they exist).

In [9]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'OUTPUT_DIR_NAME'#@param {type:"string"}
DO_DELETE = True #@param {type:"boolean"}
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'BUCKET_NAME' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: OUTPUT_DIR_NAME *****


#Data

Load the dataset of news excerpts annotated with the accountability label. The code below loads the data from xlsx files, formats it as a pandas data frame, and splits it into test and training sets.

In [74]:
from tensorflow import keras
import os
import re
import nltk
from nltk import sent_tokenize
nltk.download('punkt')
from sklearn.model_selection import train_test_split

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [0]:
filenames = [filename for filename in os.listdir() if 'xlsx' in filename]

DATA_COLUMN = 'excerpt'
LABEL_COLUMN = 'label'
label_list = [0, 1]
max_sent = 5
label = 'policy'

In [78]:
train_list = []
test_list = []
for file_name in filenames:
  data = pd.read_excel(file_name, sheet_name='Dedoose Excerpts Export')
  data = data.dropna(axis=0)



  # get relevant columns:
  label_cols = [l for l in data.columns if label in l.lower()]
  excerpt_col = [l for l in data.columns if DATA_COLUMN in l.lower()][0]
  data_subcols = data.loc[:, label_cols+[excerpt_col]]

  for colname in label_cols:
    data_subcols = data_subcols.astype({colname: int})

  #print(data_subcols.shape)
  print(label_cols)

  # filter out rows that do not have any policy subtype label
  label_ids = data_subcols.loc[:, label_cols].sum(axis=1) > 1 
  df_label = data_subcols.loc[label_ids,:]
  #print(df_label.shape)

  # filter out long excerpts

  short_ex_ids = [len(sent_tokenize(sent))>max_sent for sent in df_label.loc[:, excerpt_col]]
  df_label_short = df_label.loc[short_ex_ids, :]
  print(df_label_short.shape)

  # split into train and test dfs
  train, test = train_test_split(df_label_short, test_size=0.25, random_state=42)
  #print(train.shape)
  #print(test.shape)
  train_list.append(train)
  test_list.append(test)

['Code: Policy Applied', 'Code: Policy\\Advocacy by others Applied', 'Code: Policy\\Advocacy by victims families Applied', 'Code: Policy\\Guns Applied', 'Code: Policy\\Immigration Applied', 'Code: Policy\\Information Sharing Applied', 'Code: Policy\\Mental Health Applied', 'Code: Policy\\Other Applied', 'Code: Policy\\Practice Applied']
(59, 10)
['Code: Policy Applied', 'Code: Policy\\Advocacy by others Applied', 'Code: Policy\\Advocacy by victims families Applied', 'Code: Policy\\Guns Applied', 'Code: Policy\\Immigration Applied', 'Code: Policy\\Information Sharing Applied', 'Code: Policy\\Mental Health Applied', 'Code: Policy\\Other Applied', 'Code: Policy\\Practice Applied']
(50, 10)
['POLICY', 'POLICY_Guns', 'POLICY_InfoSharing', 'POLICY_MentalHealth', 'POLICY_Other', 'POLICY_VictimAdv', 'POLICY_OtherAdv', 'POLICY_Practice']
(472, 9)
['POLICY', 'POLICY_Guns', 'POLICY_InfoSharing', 'POLICY_MentalHealth', 'POLICY_Other', 'POLICY_VictimAdv', 'POLICY_OtherAdv', 'POLICY_Practice']
(1891

In [0]:
# transform columns for all data frames to be the same
col_dict = {
    'OtherAdv': ['POLICY_OtherAdv', 'Code: Policy\Advocacy by others Applied'],
    'VictimAdv': ['POLICY_VictimAdv', 'Code: Policy\Advocacy by victims families Applied'],
    'Guns': [ 'POLICY_Guns', 'Code: Policy\Guns Applied'],
    'InfoSharing': ['POLICY_InfoSharing', 'Code: Policy\Information Sharing Applied'],
    'MentalHealth': ['POLICY_MentalHealth', 'Code: Policy\Mental Health Applied'],
    'Other': ['POLICY_Other', 'Code: Policy\Other Applied'],
    'Practice': ['POLICY_Practice', 'Code: Policy\Practice Applied'],
    'Immigration': ['Code: Policy\Immigration Applied']
}

In [112]:
def merge_dfs(df_list):
  merged_df = pd.DataFrame(columns = list(col_dict.keys())+[DATA_COLUMN])

  for df in df_list:
    df_renamed = pd.DataFrame(columns = col_dict.keys(), index = df.index)
    print(df.shape)

    #renamed ex col
    df_renamed[DATA_COLUMN] = df.loc[:,[l for l in df.columns if DATA_COLUMN in l.lower()][0]]

    # make each dict in the list to have the columns in col_dict
    for new_colname in col_dict.keys():
      #check if df has subtype:
      col = [colname for colname in df.columns if colname in col_dict[new_colname]]
      if len(col) ==0:
        df_renamed[new_colname] = 0
      else:
        df_renamed[new_colname] = df.loc[:, col]
    merged_df = merged_df.append(df_renamed, ignore_index=True)
  print(merged_df.shape)
  return merged_df

test_merged = merge_dfs(test_list)
train_merged = merge_dfs(train_list)

(15, 10)
(13, 10)
(118, 9)
(473, 9)
(14, 10)
(13, 10)
(140, 9)
(786, 9)
(44, 10)
(37, 10)
(354, 9)
(1418, 9)
(42, 10)
(39, 10)
(420, 9)
(2354, 9)


View the loaded data, and inspect the first few entries in the training set.

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create  `InputExample`'s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case, is the `Request` field in our Dataframe. 
- `text_b` is used if we're training a model to understand the relationship between sentences (i.e. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, i.e. True, False

In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on:


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (i.e. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (i.e. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:

In [0]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

W0711 19:30:55.338241 139687083800448 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.



In [0]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.

In [0]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

W0711 19:30:55.488969 139687083800448 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/run_classifier.py:774: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.



#Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT tf hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our accountability detection task. This strategy of using a mostly trained model is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # beta for L2 regularizer
  beta = 0.1
  
  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_outputs" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for accountability data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the probabiltiies.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    regularizer = tf.nn.l2_loss(output_weights)
    #loss = tf.reduce_mean(per_example_loss + beta*regularizer)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for TPUEstimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for TPUEstimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        #f1_score = tf.contrib.metrics.f1_score(
        #    label_ids,
        #    predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            #"f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
          return tf.estimator.EstimatorSpec(mode=mode,
            loss=loss,
            eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [0]:
# Compute train and warmup steps from batch size
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where hte learning rate 
# is small and gradually increases--usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [0]:
# Compute # train and warmup steps from batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

In [0]:
# Specify outpit directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [0]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


Next we create an input builder function that takes our training feature set (`train_features`) and produces a generator. This is a pretty standard design pattern for working with Tensorflow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)

Now we train our model! For me, using a Colab notebook running on Google's GPUs, my training time was about 25 minutes for three epochs.

In [0]:
print(f'Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

W0711 19:30:59.997400 139687083800448 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


Beginning Training!


W0711 19:31:06.470019 139687083800448 deprecation.py:506] From <ipython-input-12-a183d214be12>:37: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0711 19:31:06.524541 139687083800448 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W0711 19:31:06.526797 139687083800448 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.

W0711 19:31:06.538047 139687083800448 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.

Training took time  0:13:14.827609


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [0]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
W0711 19:44:33.570834 139687083800448 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


{'auc': 0.776172,
 'eval_accuracy': 0.9467714,
 'false_negatives': 44.0,
 'false_positives': 17.0,
 'global_step': 429,
 'loss': 0.17867197,
 'precision': 0.7733333,
 'recall': 0.5686275,
 'true_negatives': 1027.0,
 'true_positives': 58.0}

In [0]:
test_train_input_fn = run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)
estimator.evaluate(input_fn=test_train_input_fn, steps=None)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


{'auc': 0.9834604,
 'eval_accuracy': 0.9941099,
 'false_negatives': 10.0,
 'false_positives': 17.0,
 'global_step': 429,
 'loss': 0.017777901,
 'precision': 0.95156693,
 'recall': 0.9709302,
 'true_negatives': 4223.0,
 'true_positives': 334.0}

### **Results from Single Sentence Classifiers:**

**single_sents_df:**

    Results from trainable = true, epochs = 3, random state = 77, dropout = 0.9

test = {'auc': 0.776172,
 'eval_accuracy': 0.9467714,
 'false_negatives': 44.0,
 'false_positives': 17.0,
 'global_step': 429,
 'loss': 0.17867197,
 'precision': 0.7733333,
 'recall': 0.5686275,
 'true_negatives': 1027.0,
 'true_positives': 58.0}
 
 train = {'auc': 0.9834604,
 'eval_accuracy': 0.9941099,
 'false_negatives': 10.0,
 'false_positives': 17.0,
 'global_step': 429,
 'loss': 0.017777901,
 'precision': 0.95156693,
 'recall': 0.9709302,
 'true_negatives': 4223.0,
 'true_positives': 334.0}

    Results from trainable = true, epochs = 2, random state = 42, dropout = 0.9, with L2, beta= 0.1

test= {'auc': 0.76895684,
 'eval_accuracy': 0.94502616,
 'false_negatives': 42.0,
 'false_positives': 21.0,
 'global_step': 286,
 'loss': 0.1732803,
 'precision': 0.7162162,
 'recall': 0.55789477,
 'true_negatives': 1030.0,
 'true_positives': 53.0}
 
 train= {'auc': 0.94493103,
 'eval_accuracy': 0.98756546,
 'false_negatives': 37.0,
 'false_positives': 20.0,
 'global_step': 286,
 'loss': 0.06178797,
 'precision': 0.94011974,
 'recall': 0.8945869,
 'true_negatives': 4213.0,
 'true_positives': 314.0}
 
     Results from trainable = true, epochs = 2, random state = 42, dropout = 0.9, with L2, beta= 0.1
     
test = {'auc': 0.7617608,
 'eval_accuracy': 0.94938916,
 'false_negatives': 44.0,
 'false_positives': 14.0,
 'global_step': 429,
 'loss': 0.2134014,
 'precision': 0.7846154,
 'recall': 0.5368421,
 'true_negatives': 1037.0,
 'true_positives': 51.0}
 
 train = {'auc': 0.96819556,
 'eval_accuracy': 0.99192846,
 'false_negatives': 21.0,
 'false_positives': 16.0,
 'global_step': 429,
 'loss': 0.046465907,
 'precision': 0.9537572,
 'recall': 0.94017094,
 'true_negatives': 4217.0,
 'true_positives': 330.0}
 
 **short_excerpts_df:**
 
        Results from trainable = true, epochs = 3, random state = 77, dropout = 0.9 -- took 1hr
      
 test = {'auc': 0.7981828,
 'eval_accuracy': 0.91728663,
 'false_negatives': 338.0,
 'false_positives': 229.0,
 'global_step': 2570,
 'loss': 0.308602,
 'precision': 0.71970624,
 'recall': 0.6349892,
 'true_negatives': 5700.0,
 'true_positives': 588.0}
 
 train = {'auc': 0.9667568,
 'eval_accuracy': 0.9836245,
 'false_negatives': 209.0,
 'false_positives': 240.0,
 'global_step': 2570,
 'loss': 0.044080183,
 'precision': 0.9358117,
 'recall': 0.9436354,
 'true_negatives': 23471.0,
 'true_positives': 3499.0}
 
 
     Results from trainable = true, epochs = 3, random state = 42, dropout = 0.9, with L2, beta= 0.1
     
     
test = {'auc': 0.80001724,
 'eval_accuracy': 0.9162655,
 'false_negatives': 347.0,
 'false_positives': 227.0,
 'global_step': 2570,
 'loss': 0.32646456,
 'precision': 0.7297619,
 'recall': 0.63854164,
 'true_negatives': 5668.0,
 'true_positives': 613.0}
 
 train = {'auc': 0.96409565,
 'eval_accuracy': 0.9830409,
 'false_negatives': 227.0,
 'false_positives': 238.0,
 'global_step': 2570,
 'loss': 0.06603416,
 'precision': 0.93541384,
 'recall': 0.9382145,
 'true_negatives': 23507.0,
 'true_positives': 3447.0}

_____________________________________________________________
### **Results from Excerpt Classifier:**

    Results from trainable = true, epochs = 6, random state = 0

test = {'auc': 0.85972005,
 'eval_accuracy': 0.9376,
 'false_negatives': 150.0,
 'false_positives': 123.0,
 'global_step': 3280,
 'loss': 0.2913218,
 'precision': 0.78719723,
 'recall': 0.75206614,
 'true_negatives': 3647.0,
 'true_positives': 455.0}
 
 train = {'auc': 0.9930013,
 'eval_accuracy': 0.9955421,
 'false_negatives': 25.0,
 'false_positives': 53.0,
 'global_step': 3280,
 'loss': 0.011221944,
 'precision': 0.97801745,
 'recall': 0.98950905,
 'true_negatives': 15061.0,
 'true_positives': 2358.0}
 
    Results from trainable = true, epochs = 3, random state = 0
 
test = {'auc': 0.8620361,
 'eval_accuracy': 0.9392,
 'f1_score': 0.77457625,
 'false_negatives': 148.0,
 'false_positives': 118.0,
 'global_step': 1640,
 'loss': 0.21235444,
 'precision': 0.7947826,
 'recall': 0.7553719,
 'true_negatives': 3652.0,
 'true_positives': 457.0}
 
    Results from trainable = true, epochs = 3, random state = 42, dropout = 0.9


test = {'auc': 0.8757505,
 'eval_accuracy': 0.9472,
 'false_negatives': 130.0,
 'false_positives': 101.0,
 'global_step': 1640,
 'loss': 0.20042753,
 'precision': 0.81867146,
 'recall': 0.778157,
 'true_negatives': 3688.0,
 'true_positives': 456.0}
 
 train = {'auc': 0.9838055,
 'eval_accuracy': 0.99108416,
 'false_negatives': 63.0,
 'false_positives': 93.0,
 'global_step': 1640,
 'loss': 0.028095467,
 'precision': 0.96175987,
 'recall': 0.97377187,
 'true_negatives': 15002.0,
 'true_positives': 2339.0}
 
     Results from trainable = true, epochs = 3, random state = 42, dropout = 0.9, with L2, beta= 0.1
 
 test = {'auc': 0.86386645,
 'eval_accuracy': 0.94285715,
 'false_negatives': 143.0,
 'false_positives': 107.0,
 'global_step': 1640,
 'loss': 0.23301551,
 'precision': 0.80545455,
 'recall': 0.7559727,
 'true_negatives': 3682.0,
 'true_positives': 443.0}
 
 train = {'auc': 0.9849268,
 'eval_accuracy': 0.99211293,
 'false_negatives': 60.0,
 'false_positives': 78.0,
 'global_step': 1640,
 'loss': 0.043376114,
 'precision': 0.9677686,
 'recall': 0.9750208,
 'true_negatives': 15017.0,
 'true_positives': 2342.0}
 
      Results from trainable = true, epochs = 1, random state = 42, dropout = 0.9, with L2, beta= 0.1
 
 test = {'auc': 0.8586856,
 'eval_accuracy': 0.94262856,
 'false_negatives': 150.0,
 'false_positives': 101.0,
 'global_step': 546,
 'loss': 0.16572995,
 'precision': 0.8119181,
 'recall': 0.7440273,
 'true_negatives': 3688.0,
 'true_positives': 436.0}
 
 train = {'auc': 0.9224019,
 'eval_accuracy': 0.96668,
 'false_negatives': 333.0,
 'false_positives': 250.0,
 'global_step': 546,
 'loss': 0.11190539,
 'precision': 0.8921949,
 'recall': 0.86136556,
 'true_negatives': 14845.0,
 'true_positives': 2069.0}
 
     Results from trainable = true, epochs = 3, random state = 42, dropout=1 (no dropout)
 
 test = {'auc': 0.86617136,
 'eval_accuracy': 0.9456,
 'false_negatives': 142.0,
 'false_positives': 96.0,
 'global_step': 1640,
 'loss': 0.22034082,
 'precision': 0.82222223,
 'recall': 0.75767916,
 'true_negatives': 3693.0,
 'true_positives': 444.0}
 
 train = {'auc': 0.9862277,
 'eval_accuracy': 0.99194145,
 'false_negatives': 52.0,
 'false_positives': 89.0,
 'global_step': 1640,
 'loss': 0.025706513,
 'precision': 0.9635096,
 'recall': 0.97835135,
 'true_negatives': 15006.0,
 'true_positives': 2350.0}
 
    Results from trainable = true, epochs = 3, random state = 42, dropout=0.8
  
  test = {'auc': 0.87114984,
 'eval_accuracy': 0.9442286,
 'false_negatives': 134.0,
 'false_positives': 110.0,
 'global_step': 1640,
 'loss': 0.20319265,
 'precision': 0.80427045,
 'recall': 0.7713311,
 'true_negatives': 3679.0,
 'true_positives': 452.0}
  
  train = {'auc': 0.9843448,
 'eval_accuracy': 0.99171287,
 'false_negatives': 62.0,
 'false_positives': 83.0,
 'global_step': 1640,
 'loss': 0.024463367,
 'precision': 0.965745,
 'recall': 0.97418815,
 'true_negatives': 15012.0,
 'true_positives': 2340.0}
 
     Results from trainable = true, epochs = 3, random state = 42, dropout = 0.6
 
 test = {'auc': 0.87173915,
 'eval_accuracy': 0.944,
 'false_negatives': 133.0,
 'false_positives': 112.0,
 'global_step': 1640,
 'loss': 0.21192765,
 'precision': 0.8017699,
 'recall': 0.77303755,
 'true_negatives': 3677.0,
 'true_positives': 453.0}
 
 train = {'auc': 0.9841035,
 'eval_accuracy': 0.99159855,
 'false_negatives': 63.0,
 'false_positives': 84.0,
 'global_step': 1640,
 'loss': 0.025331786,
 'precision': 0.9653322,
 'recall': 0.97377187,
 'true_negatives': 15011.0,
 'true_positives': 2339.0}
 
      Results from trainable = true, epochs = 3, random state = 42, dropout = 0.4
 
 test = {'auc': 0.8663647,
 'eval_accuracy': 0.9446857,
 'false_negatives': 141.0,
 'false_positives': 101.0,
 'global_step': 1640,
 'loss': 0.21934973,
 'precision': 0.8150183,
 'recall': 0.75938565,
 'true_negatives': 3688.0,
 'true_positives': 445.0}
 
 train = {'auc': 0.98442054,
 'eval_accuracy': 0.9915414,
 'false_negatives': 61.0,
 'false_positives': 87.0,
 'global_step': 1640,
 'loss': 0.026988935,
 'precision': 0.964168,
 'recall': 0.9746045,
 'true_negatives': 15008.0,
 'true_positives': 2341.0}
 
    Results from trainable = false, epochs = 3, random state = 42
  
  test = {'auc': 0.5003254,
 'eval_accuracy': 0.8653714,
 'false_negatives': 585.0,
 'false_positives': 4.0,
 'global_step': 1640,
 'loss': 0.37944606,
 'precision': 0.2,
 'recall': 0.0017064846,
 'true_negatives': 3785.0,
 'true_positives': 1.0}
  
  train = {'auc': 0.49989575,
 'eval_accuracy': 0.8616334,
 'false_negatives': 2399.0,
 'false_positives': 22.0,
 'global_step': 1640,
 'loss': 0.3832463,
 'precision': 0.12,
 'recall': 0.0012489592,
 'true_negatives': 15073.0,
 'true_positives': 3.0}
 
     Results from trainable =false, epochs = 10, random state = 42
  
  test = {'auc': 0.5003254,
 'eval_accuracy': 0.8653714,
 'false_negatives': 585.0,
 'false_positives': 4.0,
 'global_step': 5467,
 'loss': 0.35959232,
 'precision': 0.2,
 'recall': 0.0017064846,
 'true_negatives': 3785.0,
 'true_positives': 1.0}