# Predict labels of candidate relation triples with BERT

Some code was adapted from [this colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb).

First, install the BERT, TensorFlow, and TensorFlow Hub libraries.

In [1]:
# !pip install bert-tensorflow==1.0.1
# !pip install tensorflow==1.15.0
# !pip install tensorflow_hub==0.11.0 

In [2]:
import sys
sys.path.append("../")

from scripts import myutils

In [3]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In [4]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization




Below, set an output directory in which to store the model output and checkpoints.

In [5]:
OUTPUT_DIR = 'OUTPUT_DIR'

# Get your data

In [6]:
train_set_csv_path = "../csvs/processed_train_set.csv"
test_set_csv_path = "../csvs/processed_test_set.csv"

train = pd.read_csv(train_set_csv_path)
test = pd.read_csv(test_set_csv_path)

# Shuffle the rows: sampling len(df) rows without replacement returns every row in random order
train = train.sample(len(train))
test = test.sample(len(test))

# print(train)
# print(test)

## Take a look at our train and test data.

In [7]:
train.columns
train.head()

Unnamed: 0,entity1,relation,entity2,merged_sentence,label
993,iPad applications,are usually written in,the Xcode IDE,ipad applications are usually written in the x...,T
1494,Apple,announced,a new language,apple announced a new language,F
1204,native code,compiled to target,version,native code compiled to target version,F
909,the MVC pattern,is supported by,factory methods,the mvc pattern is supported by factory methods,T
1728,AngularJS,is used internally by,applications,angularjs is used internally by applications,F


In [8]:
test.columns
test.head()

Unnamed: 0,entity1,relation,entity2,merged_sentence,label
554,The Arduino Uno,has,an ICSP header,the arduino uno has an icsp header,T
564,Using Markdown,is simplified to,LaTeX documents,using markdown is simplified to latex documents,T
201,Entry,Controlled,loops,entry controlled loops,F
777,MFC classes,are identified by,CMFCPropertyPage,mfc classes are identified by cmfcpropertypage,T
428,RDB,loses created,snapshot,rdb loses created snapshot,T


## For me, the input data is the 'merged_sentence' column and the label is the 'label' column (T or F)

In [9]:
DATA_COLUMN = "merged_sentence"
LABEL_COLUMN = "label"
# label_list is the list of labels, i.e. True, False or 0, 1 or 'dog', 'cat'
label_list = ['T', 'F']

# Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `merged_sentence` column of our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here `T` or `F`.

In [10]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)
# print(train_InputExamples)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)
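
To make the mapping concrete, here is a minimal sketch of how one row becomes an example and how its label becomes an integer id. `ToyExample` and `row_to_example` are hypothetical stand-ins (so the snippet runs without the BERT library), the row is a plain dict rather than a DataFrame row, and the position-based label mapping is an assumption about how the conversion step treats `label_list`.

```python
from dataclasses import dataclass
from typing import Optional

DATA_COLUMN = "merged_sentence"
LABEL_COLUMN = "label"
label_list = ["T", "F"]


@dataclass
class ToyExample:
    """Hypothetical stand-in for bert.run_classifier.InputExample."""
    guid: Optional[str]
    text_a: str
    text_b: Optional[str]
    label: str


def row_to_example(row):
    # Same field assignment as the lambda used in the cell above.
    return ToyExample(guid=None,
                      text_a=row[DATA_COLUMN],
                      text_b=None,
                      label=row[LABEL_COLUMN])


# Assumed mapping: each label's id is its position in label_list.
label_map = {label: i for i, label in enumerate(label_list)}

row = {"merged_sentence": "apple announced a new language", "label": "F"}
example = row_to_example(row)
print(example.text_a, "->", label_map[example.label])
```

Under this assumption `T` maps to 0 and `F` maps to 1, which matters later when reading the evaluation metrics.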

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (i.e. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (i.e. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
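
For intuition only, the steps above can be sketched in plain Python. The nine-entry vocabulary, the `wordpiece` and `encode` helpers, and the sequence length of 8 are all made up for illustration; the real tokenizer uses BERT's own vocab file and handles punctuation, accents, and more.

```python
# Toy vocabulary (an assumption for this sketch, not BERT's real vocab).
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]",
             "sally", "says", "hi", "call", "##ing"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOY_VOCAB)}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, max_seq_length=8):
    """Lowercase, tokenize, add special tokens, map to ids, and pad."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, TOKEN_TO_ID))
    tokens = tokens[:max_seq_length - 1] + ["[SEP]"]

    input_ids = [TOKEN_TO_ID[t] for t in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks real tokens
    segment_ids = [0] * len(input_ids)  # single sentence: all segment 0

    padding = [0] * (max_seq_length - len(input_ids))
    return (tokens, input_ids + padding,
            input_mask + padding, segment_ids + padding)


tokens, input_ids, input_mask, segment_ids = encode("Sally says calling")
print(tokens)
print(input_ids)
print(input_mask)
```

The zero-padded `input_ids` and the matching `input_mask` have the same shape as the log lines printed by the conversion step further down, just with a much shorter sequence length.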




## Load the vocabulary file and lowercasing information from a URL or a local path.

In [11]:
# This is a path to an uncased (all lowercase) version of BERT

# BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
BERT_MODEL_HUB = hub.load_module_spec("E:/Dl/Chrome/bert_uncased_L-12_H-768_A-12_1")

In [12]:

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    print(tokenization_info)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)



In [13]:
tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
{'do_lower_case': <tf.Tensor 'module_apply_tokenization_info/Const:0' shape=() dtype=bool>, 'vocab_file': <tf.Tensor 'module_apply_tokenization_info/vocab_file:0' shape=() dtype=string>}




## Using the tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.

In [14]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)



INFO:tensorflow:Writing example 0 of 2266
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: None
INFO:tensorflow:tokens: [CLS] ipad applications are usually written in the x ##code id ##e [SEP]
INFO:tensorflow:input_ids: 101 25249 5097 2024 2788 2517 1999 1996 1060 16044 8909 2063 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our task (i.e. classifying whether a candidate relation triple should be labeled `T` or `F`). This strategy of starting from a pretrained model is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [15]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)
  
  print(bert_outputs.keys())
  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own classification layer to fine-tune on the relation triple data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want the predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
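
The loss computed at the end of `create_model` can be reproduced on a tiny batch in plain Python. This is only a sketch of the arithmetic (log-softmax, one-hot labels, mean negative log-likelihood); the helper names and the example logits are made up, and the real model does the same thing with TensorFlow ops.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(batch_logits, label_ids):
    """Return (mean loss, predicted label ids) for a batch."""
    losses, predicted = [], []
    for logits, label in zip(batch_logits, label_ids):
        log_probs = log_softmax(logits)
        one_hot = [1.0 if i == label else 0.0 for i in range(len(logits))]
        # Per-example loss: minus the log-probability of the true class.
        losses.append(-sum(o * lp for o, lp in zip(one_hot, log_probs)))
        predicted.append(max(range(len(log_probs)), key=log_probs.__getitem__))
    return sum(losses) / len(losses), predicted


# Two examples, two classes; both examples have true label id 0.
loss, predicted = cross_entropy([[2.0, 0.0], [0.0, 3.0]], [0, 0])
print(loss, predicted)
```

The first example is classified correctly and contributes a small loss; the second is confidently wrong and dominates the mean.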


## Next wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [16]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns a `model_fn` closure for `tf.estimator.Estimator`."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for the Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
          return tf.estimator.EstimatorSpec(mode=mode,
            loss=loss,
            eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

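      # Note: 'probabilities' holds log-probabilities (the log_softmax output);
      # apply exp() to recover actual probabilities.
      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }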
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [17]:
BATCH_SIZE = 10
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period of time where the learning rate
# is small and gradually increases; it usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

## Compute train and warmup steps from batch size

In [18]:
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
print(num_train_steps)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

679
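
As a worked check of the numbers above: with the 2266 training examples reported in the conversion log, the step counts follow directly from the hyperparameters. The `learning_rate_at` helper is a hypothetical sketch of a linear warmup followed by a linear decay to zero, which is how the BERT optimizer's schedule is assumed to behave here; it is not the library's code.

```python
NUM_EXAMPLES = 2266  # from "Writing example 0 of 2266" in the log above
BATCH_SIZE = 10
NUM_TRAIN_EPOCHS = 3.0
WARMUP_PROPORTION = 0.1
LEARNING_RATE = 2e-5

# 2266 / 10 * 3.0 = 679.8, truncated to 679 steps.
num_train_steps = int(NUM_EXAMPLES / BATCH_SIZE * NUM_TRAIN_EPOCHS)
# 679 * 0.1 = 67.9, truncated to 67 warmup steps.
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)


def learning_rate_at(step):
    """Assumed schedule: linear warmup, then linear decay to zero."""
    if step < num_warmup_steps:
        return LEARNING_RATE * step / num_warmup_steps
    return LEARNING_RATE * (1.0 - step / num_train_steps)


print(num_train_steps, num_warmup_steps)
for step in (0, 30, 67, 300, 679):
    print(step, learning_rate_at(step))
```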


In [19]:
# Specify the output directory and how often (in steps) to save checkpoints and summaries
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [20]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'OUTPUT_DIR', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001FB7341C7B8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

## Next, create an input builder function that takes our training feature set (`train_features`) and returns an input function. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [21]:
# Create an input function for training. drop_remainder only needs to be True when running on TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
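
To illustrate what `drop_remainder` changes, here is a small plain-Python sketch of batching. The `batches` generator is hypothetical and ignores shuffling and repetition, both of which a real training input function would normally add.

```python
def batches(items, batch_size, drop_remainder=False):
    """Yield consecutive batches; optionally drop a short final batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return
        yield batch


features = list(range(2266))  # stand-in for the 2266 training features

kept = list(batches(features, 10, drop_remainder=False))
dropped = list(batches(features, 10, drop_remainder=True))

print(len(kept), len(kept[-1]))        # number of batches, size of the last one
print(len(dropped), len(dropped[-1]))
```

With `drop_remainder=False` the last batch holds the 6 leftover examples; with `True` every batch has exactly 10 and the leftovers are skipped, which is the fixed-shape behavior TPUs need.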

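## Now let's train the model. For me, using a GPU, training took about 5 minutes. (The log below comes from a rerun: a checkpoint at the final step already existed in `OUTPUT_DIR`, so training was skipped.)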

In [22]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Skipping training since max_steps has already saved.
Training took time  0:00:00.015957


## Now let's test the model
### For me, the trained model reaches an accuracy of about 0.82, a recall of about 0.75 and an F1 of about 0.81

In [23]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [24]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
dict_keys(['sequence_output', 'pooled_output'])
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.






Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://githu

{'auc': 0.8275,
 'eval_accuracy': 0.8275,
 'f1_score': 0.81300807,
 'false_negatives': 100.0,
 'false_positives': 38.0,
 'global_step': 679,
 'loss': 0.77910703,
 'precision': 0.88757396,
 'recall': 0.75,
 'true_negatives': 362.0,
 'true_positives': 300.0}
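
The reported metrics can be recomputed from the four confusion counts as a sanity check. The `binary_metrics` helper is made up for this sketch. Two readings here are inferences rather than statements from the notebook: since `label_list = ['T', 'F']` presumably maps `F` to id 1, the "positive" class in these counts is likely `F`; and because the AUC is computed from hard 0/1 predictions, it appears to coincide with balanced accuracy.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts taken from the evaluation output above.
metrics = binary_metrics(tp=300, fp=38, tn=362, fn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```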

## Now let's make predictions on new sentences:

In [25]:
def getPrediction(in_sentences):
  # guid="" and label="T" are placeholders: InputExample requires them,
  # but the dummy label does not affect the predictions.
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = "T") for x in in_sentences]

  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)

  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)

  predictions = estimator.predict(predict_input_fn)

  # prediction['labels'] is an index into label_list
  return [[sentence, label_list[prediction['labels']]] for sentence, prediction in zip(in_sentences, predictions)]
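
The prediction dictionaries also carry the model's log-probabilities under the `'probabilities'` key. As a hedged sketch, a hypothetical helper such as `decode_prediction` below could turn one row of log-probabilities into a label and a confidence; the input values are invented for illustration.

```python
import math

label_list = ["T", "F"]


def decode_prediction(log_probs):
    """Turn one row of log-probabilities into (label, probability)."""
    probs = [math.exp(lp) for lp in log_probs]
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_list[best], probs[best]


# Made-up log-probabilities corresponding to P(T) = 0.9 and P(F) = 0.1.
label, confidence = decode_prediction([math.log(0.9), math.log(0.1)])
print(label, round(confidence, 3))
```

A confidence value like this could be used to keep only the triples the model is sure about, instead of relying on the hard label alone.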

In [26]:
pred_sentences = [
  "openshift implments a container",
  "anything deals with speed",
  "The film was creative and surprising",
  "PyTables is built on HDF5 library",
  "Jansi is small java library",
  "Absolutely fantastic!"
]

In [27]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 6
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] opens ##hi ##ft imp ##lm ##ents a container [SEP]
INFO:tensorflow:input_ids: 101 7480 4048 6199 17727 13728 11187 1037 11661 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_

## The predictions look reasonable. Let's keep going.

In [28]:
for prediction in predictions:
    print(prediction)

['openshift implments a container', 'T']
['anything deals with speed', 'F']
['The film was creative and surprising', 'F']
['PyTables is built on HDF5 library', 'T']
['Jansi is small java library', 'T']
['Absolutely fantastic!', 'F']


## Now we need to predict labels for all the remaining triples

In [40]:
import csv
from tqdm import tqdm

In [41]:
merged_relation_triples = myutils.open_csv("merged_relation_triples")
merged_relation_triples.writerow(["entity1", "relation", "entity2", "merged_sentence"])

42

## Merge each candidate relation triple into a sentence

In [42]:
with open("../csvs/candidate_relation_triples.csv", newline='', encoding="gb18030") as src_csv_path:
    src_csv = csv.reader(src_csv_path)
    for line in tqdm(src_csv):
        merged_sentence = line[0].lower().strip() + " " + line[1].lower().strip() + " " + line[2].lower().strip()
        merged_relation_triples.writerow([line[0], line[1], line[2], merged_sentence])


4298it [00:00, 205219.75it/s]
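
The merge step is just lowercasing, stripping, and joining with single spaces. A self-contained sketch on in-memory data follows; the `merge_triple` helper and the two sample rows (borrowed from the training preview earlier) are for illustration, and `io.StringIO` stands in for the real CSV file.

```python
import csv
import io


def merge_triple(entity1, relation, entity2):
    """Join a triple into one lowercase sentence, as the loop above does."""
    return " ".join(part.lower().strip() for part in (entity1, relation, entity2))


# In-memory stand-in for candidate_relation_triples.csv.
candidates = io.StringIO(
    "iPad applications,are usually written in,the Xcode IDE\n"
    "Apple, announced ,a new language\n"
)

merged_rows = []
for entity1, relation, entity2 in csv.reader(candidates):
    merged_rows.append([entity1, relation, entity2,
                        merge_triple(entity1, relation, entity2)])

for row in merged_rows:
    print(row[3])
```

Note that the original fields are written unchanged; only the merged sentence is normalized.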


## Use the model to predict labels

In [43]:
sentences = list()
with open("../csvs/merged_relation_triples.csv", newline='', encoding="gb18030") as src_csv_path:
    src_csv = csv.reader(src_csv_path)
    for line in tqdm(src_csv):
        # Column 3 is "merged_sentence". The header row is read as well,
        # so sentences[0] is the literal string "merged_sentence".
        sentences.append(line[3])

# print(sentences)


4230it [00:00, 469922.02it/s]


## Get predictions

In [44]:
predictions = getPrediction(sentences)

INFO:tensorflow:Writing example 0 of 4230
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] merged _ sentence [SEP]
INFO:tensorflow:input_ids: 101 5314 1035 6251 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

## Write the predicted results into labeled_relation_triples

In [45]:
for prediction in predictions:
    print(prediction)

['merged_sentence', 'F']
['nuxt.js comes with a framework', 'T']
['nuxt.js comes with features', 'T']
['nuxt.js following to create a rich web application development', 'F']
['file reduces the load time', 'F']
['nuxt generate provides a powerful tool', 'T']
['nuxt generate prerender pages', 'T']
['google cloud datastore offers the following features', 'F']
['markdown can be converted to html', 'T']
['markdown create rich text', 'T']
['markdown is often used to format readme files', 'T']
['the language takes many cues', 'F']
['the language takes existing conventions', 'F']
['a perl script converts up text input', 'T']
['a perl script replaces pointing angle brackets', 'T']
['a perl script replaces ampersands', 'T']
['markdown been re-implemented by others', 'T']
['markdown been re-implemented by a perl module', 'T']
['markdown has since been re-implemented others', 'F']
['markdown has since been re-implemented a perl module', 'T']
['markdown is distributed under style license', 'T']
['n

In [46]:
# print(type(predictions))
labeled_relation_triples = myutils.open_csv("labeled_relation_triples")
labeled_relation_triples.writerow(["entity1", "relation", "entity2", "merged_sentence", "label"])
with open("../csvs/merged_relation_triples.csv", newline='', encoding="gb18030") as src_csv_path:
    src_csv = csv.reader(src_csv_path)
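    # predictions[0] belongs to the header row (it was fed to the model as a sentence),
    # so skip the header here and let cnt index rows and predictions in step.
    cnt = 0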
    for line in src_csv:
        if cnt == 0:
            cnt += 1
            continue
        labeled_relation_triples.writerow([line[0], line[1], line[2], line[3], predictions[cnt][1]])
        # print(line[0], line[1], line[2], line[3], predictions[cnt][1])
        cnt += 1
    print(cnt)

4230
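
An alternative that avoids predicting on the header row is to let `csv.DictReader` consume the header and then pair rows with predictions using `zip`. This is a sketch on in-memory data: `toy_predict` is a hypothetical stand-in for `getPrediction`, and the two sample rows are invented to match sentences shown in the output above.

```python
import csv
import io

# In-memory stand-in for merged_relation_triples.csv (header included).
merged_csv = io.StringIO(
    "entity1,relation,entity2,merged_sentence\n"
    "Nuxt.js,comes with,a framework,nuxt.js comes with a framework\n"
    "file,reduces,the load time,file reduces the load time\n"
)


def toy_predict(sentences):
    """Hypothetical stand-in for getPrediction, using a fixed rule."""
    return [[s, "T" if "framework" in s else "F"] for s in sentences]


# DictReader consumes the header, so only data rows reach the model.
rows = list(csv.DictReader(merged_csv))
sentences = [row["merged_sentence"] for row in rows]
predictions = toy_predict(sentences)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["entity1", "relation", "entity2", "merged_sentence", "label"])
# zip keeps each row aligned with its own prediction without a manual counter.
for row, (_, label) in zip(rows, predictions):
    writer.writerow([row["entity1"], row["relation"], row["entity2"],
                     row["merged_sentence"], label])

labeled = out.getvalue()
print(labeled)
```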
