# Training a BERT model on new data

In this notebook we reuse the model architecture from the SciERC experiments, but this time we train and evaluate our models on a new dataset: the [TAC Relation Extraction Dataset](https://nlp.stanford.edu/projects/tacred/) (TACRED).

## The TAC Relation Extraction Dataset
TACRED is a large-scale relation extraction dataset with 106,264 examples built over newswire and web text from the [corpus](https://catalog.ldc.upenn.edu/LDC2018T03) used in the yearly [TAC Knowledge Base Population (TAC KBP) challenges](https://tac.nist.gov/2017/KBP/index.html). Examples in TACRED cover 41 relation types as used in the TAC KBP challenges (e.g., per:schools_attended and org:members) or are labeled as no_relation if no defined relation is held. These examples are created by combining available human annotations from the TAC KBP challenges and crowdsourcing.

### Bias towards predicting false positives
To ensure that models trained on TACRED are not biased towards predicting false positives on real-world text, the TACRED authors fully annotated all sampled sentences where no relation was found between the mention pairs as negative examples. As a result, 79.5% of the examples are labeled as no_relation. The remaining 20.5% are spread over the 41 relation types.

### The dataset is made up of 3 JSON files:
1. train.json: The training examples. 68,124 in total.
2. dev.json: The development examples. 22,631 in total.
3. test.json: The test examples. 15,509 in total.
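
To check these figures on a local copy, one can count the relation labels in each split. The sketch below assumes that each file is a single JSON array of example dictionaries with a `relation` field, like the sample record used later in this notebook. `relation_stats` is a hypothetical helper name, and the toy list stands in for the result of `json.load` on one of the files.

```python
from collections import Counter

def relation_stats(examples, negative_label="no_relation"):
    """Return (total, share of negative examples, Counter of relation labels)."""
    counts = Counter(example["relation"] for example in examples)
    total = sum(counts.values())
    negative_share = counts[negative_label] / total if total else 0.0
    return total, negative_share, counts

# Tiny in-memory stand-in for one loaded split (assumption: list of dicts).
toy_split = [
    {"relation": "no_relation"},
    {"relation": "no_relation"},
    {"relation": "per:title"},
    {"relation": "org:founded_by"},
]
total, negative_share, counts = relation_stats(toy_split)
print(total, negative_share, counts.most_common(1))
```

Run on the real splits, the three totals should add up to the 106,264 examples mentioned above.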

## Preparing the data

The datapoints of the TACRED dataset have a different shape from those of the SciERC dataset.
We must therefore first map the TACRED data to the shape of the SciERC data so that it fits our `Dataset` class.

### Install nltk
We first install the nltk library, which provides the `word_tokenize` tokenizer imported below.

In [1]:
! pip install nltk



### Install `punkt` for nltk
Since we'll use nltk's `word_tokenize`, we must also download the `punkt` tokenizer models.

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\odaim\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

### Some needed imports

In [3]:
import json
import os
from unidecode import unidecode
from tqdm import tqdm
import itertools

from nltk.tokenize import word_tokenize

In [4]:
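def normalize_tacred_sample(sample):
    """Convert one TACRED example into the SciERC-style document layout."""
    norm = {}
    norm['doc_key'] = sample['docid']
    norm['sentences'] = [sample['token']]
    relation = sample['relation']
    norm['ner'] = []
    # One relation per example: [subj_start, subj_end, obj_start, obj_end, label]
    norm['relations'] = [[[sample['subj_start'], sample['subj_end'], sample['obj_start'], sample['obj_end'], relation]]]
    norm['clusters'] = []

    # The annotated subject and object are always kept as entities.
    entities = []
    tac_ner = sample['stanford_ner']
    entities.append([sample['subj_start'], sample['subj_end'], sample['subj_type']])
    entities.append([sample['obj_start'], sample['obj_end'], sample['obj_type']])

    # Scan the Stanford NER tags and add runs of identical non-'O' tags
    # as [start, end, label] spans (end index inclusive).
    # Caveat: the scan does not always resume exactly at the start of the
    # next run, so it can also emit shorter fragments of a run, e.g.
    # [10, 10, 'PERSON'] and [15, 15, 'LOCATION'] in the sample output below.
    i = 0
    while i < len(tac_ner):
        if tac_ner[i] != 'O':
            ner = []
            ner.append(i)
            j = i
            while j < len(tac_ner):
                if tac_ner[i] == tac_ner[j]:
                    if j == len(tac_ner) - 1:
                        # The run reaches the end of the sentence.
                        ner.append(j)
                        ner.append(tac_ner[j])
                        break
                    j = j + 1
                    continue
                else:
                    # The run ended at the previous token.
                    ner.append(j - 1)
                    ner.append(tac_ner[j - 1])
                    i = j
                    break

            entities.append(ner)
        i = i + 1
    # Sort the spans and drop exact duplicates (groupby merges adjacent equal items).
    entities.sort()
    norm['ner'].append([span for span, _ in itertools.groupby(entities)])

    return norm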

In [5]:


print(normalize_tacred_sample(json.loads('{"id": "e779865fb91e34998dce", "docid": "eng-NG-31-142693-10075646", "relation": "no_relation", "token": ["To", "Fed", "Judge", "Jeff", "White", ",", "Cayman", "Isles", "Bank", "Julius", "Baer", "\'s", "Lapdog", "in", "San", "Francisco"], "subj_start": 9, "subj_end": 10, "obj_start": 1, "obj_end": 1, "subj_type": "PERSON", "obj_type": "ORGANIZATION", "stanford_pos": ["TO", "NNP", "NNP", "NNP", "NNP", ",", "NNP", "NNP", "NNP", "NNP", "NNP", "POS", "NNP", "IN", "NNP", "NNP"], "stanford_ner": ["O", "O", "O", "PERSON", "PERSON", "O", "ORGANIZATION", "ORGANIZATION", "ORGANIZATION", "PERSON", "PERSON", "O", "O", "O", "LOCATION", "LOCATION"], "stanford_head": [5, 5, 5, 5, 0, 5, 11, 11, 11, 11, 13, 11, 5, 16, 16, 13], "stanford_deprel": ["case", "compound", "compound", "compound", "ROOT", "punct", "compound", "compound", "compound", "compound", "nmod:poss", "case", "appos", "case", "compound", "nmod"]}')))

{'doc_key': 'eng-NG-31-142693-10075646', 'sentences': [['To', 'Fed', 'Judge', 'Jeff', 'White', ',', 'Cayman', 'Isles', 'Bank', 'Julius', 'Baer', "'s", 'Lapdog', 'in', 'San', 'Francisco']], 'ner': [[[1, 1, 'ORGANIZATION'], [3, 4, 'PERSON'], [6, 8, 'ORGANIZATION'], [9, 10, 'PERSON'], [10, 10, 'PERSON'], [14, 15, 'LOCATION'], [15, 15, 'LOCATION']]], 'relations': [[[9, 10, 1, 1, 'no_relation']]], 'clusters': []}
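
In the output above, `[10, 10, 'PERSON']` and `[15, 15, 'LOCATION']` are fragments of the longer spans `[9, 10, 'PERSON']` and `[14, 15, 'LOCATION']`; they come from the way the scan in `normalize_tacred_sample` advances its index. If such fragments are not wanted, one possible alternative is to group consecutive identical tags with `itertools.groupby`. This is only a sketch and not the code used for the results in this notebook (the logged numbers further down were produced with the original function); `ner_runs` is a hypothetical helper name.

```python
from itertools import groupby

def ner_runs(tags, outside="O"):
    """Return [start, end, label] (inclusive indices) for each run of equal non-outside tags."""
    spans = []
    index = 0
    for label, run in groupby(tags):
        length = len(list(run))
        if label != outside:
            spans.append([index, index + length - 1, label])
        index += length
    return spans

# The stanford_ner tags of the sample record printed above.
tags = ["O", "O", "O", "PERSON", "PERSON", "O", "ORGANIZATION", "ORGANIZATION",
        "ORGANIZATION", "PERSON", "PERSON", "O", "O", "O", "LOCATION", "LOCATION"]
print(ner_runs(tags))
```

Because the Stanford tags carry no begin/inside markers, two distinct entities of the same type that are directly adjacent would still be merged into one span by this approach.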


In [9]:
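def write_normal_data(in_path, out_path):
    """Normalize every sample in `in_path` and write one JSON object per line to `out_path`."""
    with open(in_path) as f:
        data = json.load(f)
    # Open the output once in write mode so that re-running the cell
    # overwrites the file instead of appending duplicate samples.
    with open(out_path, 'w') as normalized:
        for sample in tqdm(data, desc="Normalizing data samples..."):
            normalized.write(json.dumps(normalize_tacred_sample(sample)) + "\n")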

In [8]:
train_data_path = os.getcwd() + '/other_data/tacred/data/json/train.json'
normal_train_data_path = os.getcwd() + '/other_data/tacred/data/json/norm_train.json'

write_normal_data(train_data_path, normal_train_data_path)

In [9]:
test_data_path = os.getcwd() + '/other_data/tacred/data/json/test.json'
normal_test_data_path = os.getcwd() + '/other_data/tacred/data/json/norm_test.json'

write_normal_data(test_data_path, normal_test_data_path)

In [10]:
dev_data_path = os.getcwd() + '/other_data/tacred/data/json/dev.json'
normal_dev_data_path = os.getcwd() + '/other_data/tacred/data/json/norm_dev.json'

write_normal_data(dev_data_path, normal_dev_data_path)
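
Note that, despite the `.json` extension, each normalized file holds one JSON object per line rather than a single JSON array. A minimal sketch for reading such a file back, for example to sanity-check the number of documents, could look as follows. `read_json_lines` is a hypothetical helper, and the demo writes a throwaway temporary file instead of touching the real data.

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Load a file with one JSON object per line into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

# Demo on a temporary file with two documents in the normalized layout.
docs = [
    {"doc_key": "doc-1", "sentences": [["A", "b"]], "ner": [[]], "relations": [[]], "clusters": []},
    {"doc_key": "doc-2", "sentences": [["C"]], "ner": [[]], "relations": [[]], "clusters": []},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    for doc in docs:
        tmp.write(json.dumps(doc) + "\n")

loaded = read_json_lines(tmp.name)
os.remove(tmp.name)
print(len(loaded), [doc["doc_key"] for doc in loaded])
```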

## Training our BERT models

Now that the TACRED data has the proper shape, we can train a new BERT model on it.

### The entity model

#### Set up

First we run the entity_setup.ipynb notebook to set up our classes and functions in the kernel.

In [15]:
%run entity_model/entity_setup.ipynb

  from .autonotebook import tqdm as notebook_tqdm


#### Model training

Now we train our BERT-based entity model. This will look familiar from the work we've done before.
Before anything else, we set up some variables, the same ones we set before.

#### `task_ner_labels`
This is a map from each dataset to its entity types. Here we added the TACRED entity types.

In [16]:
task_ner_labels = {
    'ace04': ['FAC', 'WEA', 'LOC', 'VEH', 'GPE', 'ORG', 'PER'],
    'ace05': ['FAC', 'WEA', 'LOC', 'VEH', 'GPE', 'ORG', 'PER'],
    'scierc': ['Method', 'OtherScientificTerm', 'Task', 'Generic', 'Material', 'Metric'],
    'tacred': ['ORGANIZATION', 'NUMBER', 'MONEY', 'ORDINAL', 'DATE', 'PERCENT', 'PERSON', 'DURATION', 'MISC', 'LOCATION', 'SET', 'TIME', 'TITLE', 'NATIONALITY', 'RELIGION', 'URL', 'CAUSE_OF_DEATH', 'COUNTRY', 'STATE_OR_PROVINCE', 'CRIMINAL_CHARGE', 'CITY', 'IDEOLOGY']
}

Then we define the other variables:
- `data_dir`: The directory in which our input data is stored.
- `output_dir`: The directory to which the output of the model is written.
- `task`: The task (dataset) that the model will be used to make predictions on.
- `max_span_length`: The maximum length of spans to consider.
- `context_window`: The size of the context window to consider around each sentence.
- `eval_batch_size`: The batch size used during evaluation.
- `test_pred_filename`: The name of the prediction output file for the test set.
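
To illustrate what `max_span_length` controls: a span-based entity model scores candidate token spans, and this parameter bounds how long those candidates may be. The sketch below is an illustration only, not the code from entity_setup.ipynb; it assumes inclusive end indices, as in the normalized data, and `enumerate_spans` is a hypothetical name.

```python
def enumerate_spans(tokens, max_span_length):
    """List all (start, end) token spans, end inclusive, with at most max_span_length tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_span_length, len(tokens))):
            spans.append((start, end))
    return spans

tokens = ["Julius", "Baer", "'s", "Lapdog"]
spans = enumerate_spans(tokens, max_span_length=2)
print(spans)
```

The number of candidates grows roughly with sentence length times `max_span_length`, which is why the value is kept small.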

In [17]:
data_dir = os.getcwd() + '/other_data/tacred/data/json/'
output_dir = os.getcwd() + '/tacred_models/ent-scib-ctx0/'
task = 'tacred'
max_span_length = 8
test_pred_filename = 'ent_pred_test.json'
dev_pred_filename = 'ent_pred_dev.json'

num_ner_labels = len(task_ner_labels[task]) + 1
context_window = 300
eval_batch_size = 32
train_batch_size = 2
learning_rate = 1e-5
task_learning_rate = 5e-4
bertadam = True # if True, AdamW is created with correct_bias=False
num_epoch = 4 # number of training epochs
warmup_proportion = 0.1 # ratio of warmup steps to total training steps
eval_per_epoch = 1 # how many times per epoch to evaluate on the dev set
train_shuffle = True # whether to shuffle the training batches each epoch
print_loss_step = 100 # log the training loss every this many steps
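
`learning_rate` and `warmup_proportion` interact through the linear schedule with warmup that the training cell creates with `get_linear_schedule_with_warmup`. As a rough illustration, the multiplier applied to the base learning rate can be sketched in plain Python. This is a re-implementation for illustration and not the library code; `linear_warmup_factor` is a hypothetical name and the step counts are made-up values.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear increase from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total_steps = 1000                       # illustrative value, not taken from the run
warmup_steps = int(total_steps * 0.1)    # warmup_proportion = 0.1
base_lr = 1e-5                           # learning_rate above

for step in (0, 50, 100, 550, 1000):
    print(step, base_lr * linear_warmup_factor(step, warmup_steps, total_steps))
```

The learning rate thus peaks at `learning_rate` once 10% of the steps are done and then decays linearly to zero.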

#### Data File Paths:
Since the TACRED dataset is already split into training, development, and test sets, we don't need to perform any split. We just set the paths to the normalized data files created above.


In [18]:
train_data = os.path.join(data_dir, 'norm_train.json')
dev_data = os.path.join(data_dir, 'norm_dev.json')
test_data = os.path.join(data_dir, 'norm_test.json')

#### Output Directory Check

Then, just to be safe, we check if the specified output directory (`output_dir`) exists. If not, we create the directory. This ensures that the output directory is available for storing model checkpoints, predictions, or other outputs.

In [19]:
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

#### NER Label Mapping

The `get_labelmap` function is used to get the label-to-id mapping for the TACRED task, as discussed before.

In [20]:
ner_label2id, ner_id2label = get_labelmap(task_ner_labels[task])
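
`get_labelmap` itself is defined in the setup notebook and is not shown here. The sketch below illustrates what such a mapping may look like, under the assumption that id 0 is reserved for spans that are not entities (which would explain why `num_ner_labels` is `len(task_ner_labels[task]) + 1`). `sketch_labelmap` is a hypothetical stand-in, named differently so that it does not shadow the real function.

```python
def sketch_labelmap(label_list):
    """Map labels to ids starting at 1, keeping id 0 free for 'not an entity'."""
    label2id = {label: i + 1 for i, label in enumerate(label_list)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label

label2id, id2label = sketch_labelmap(["ORGANIZATION", "PERSON", "LOCATION"])
print(label2id)
print(id2label[2])
```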

#### Development Dataset Processing

The development dataset (`dev_data`) is loaded into a `Dataset` object. Then, it is processed using the `convert_dataset_to_samples` function to obtain samples and NER labels. The samples are batchified using the `batchify` function.

In [21]:
dev_data = Dataset(dev_data)

In [22]:
dev_samples, dev_ner = convert_dataset_to_samples(dev_data, max_span_length, ner_label2id=ner_label2id, context_window=context_window)
dev_batches = batchify(dev_samples, eval_batch_size)

06/02/2024 14:09:59 - INFO - root - # Overlap: 0
06/02/2024 14:09:59 - INFO - root - Extracted 22631 samples from 22631 documents, with 129939 NER labels, 35.463 avg input length, 95 max length
06/02/2024 14:09:59 - INFO - root - Max Length: 95, max NER: 29
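
As a rough picture of the batching step, the sketch below splits a list of samples into consecutive chunks. It is an assumption about the general idea only; the real `batchify` from the setup notebook may treat very long samples differently. `sketch_batchify` is a hypothetical name.

```python
import math

def sketch_batchify(samples, batch_size):
    """Split a list of samples into consecutive batches of at most batch_size items."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

batches = sketch_batchify(list(range(10)), batch_size=4)
print(batches)

# Number of batches for a given sample count and batch size.
print(math.ceil(68124 / 2))
```

With the 68,124 training samples and `train_batch_size = 2`, this gives 34,062 batches, which matches the progress bar total in the training output further down.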


#### Initialize our entity model

We initialize the entity model from the pretrained SciBERT checkpoint; it has not yet been trained on TACRED.

In [23]:
model = EntityModel(model='allenai/scibert_scivocab_uncased', use_albert=False, max_span_length=max_span_length, num_ner_labels=num_ner_labels)

06/02/2024 14:10:02 - INFO - transformers.tokenization_utils_base - Model name 'allenai/scibert_scivocab_uncased' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). Assuming 'allenai/scibert_scivocab_uncased' is a path, a model identifier, or url to a directory containing tokenizer files.
06/02/2024 14:10:03 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/allenai/scibert_scivocab_uncased/vocab.txt not fo

#### Load training data

We load the training data from the JSON file into a `Dataset` instance.

In [24]:
train_data = Dataset(train_data)

#### Training the model

Now we can train the model.

In [25]:
train_samples, train_ner = convert_dataset_to_samples(train_data, max_span_length, ner_label2id=ner_label2id, context_window=context_window)
train_batches = batchify(train_samples, train_batch_size)
best_result = 0.0

param_optimizer = list(model.bert_model.named_parameters())
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer
        if 'bert' in n]},
    {'params': [p for n, p in param_optimizer
        if 'bert' not in n], 'lr': task_learning_rate}]
optimizer = AdamW(optimizer_grouped_parameters, lr=learning_rate, correct_bias=not(bertadam))
t_total = len(train_batches) * num_epoch
scheduler = get_linear_schedule_with_warmup(optimizer, int(t_total*warmup_proportion), t_total)

tr_loss = 0
tr_examples = 0
global_step = 0
eval_step = len(train_batches) // eval_per_epoch
for epoch in tqdm(range(num_epoch), position=0, leave=True):
    if train_shuffle:
        random.shuffle(train_batches)
    for i in tqdm(range(len(train_batches)), position=0, leave=True):
        output_dict = model.run_batch(train_batches[i], training=True)
        loss = output_dict['ner_loss']
        loss.backward()

        tr_loss += loss.item()
        tr_examples += len(train_batches[i])
        global_step += 1

        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

        # Log the average loss per example since the last log line.
        if global_step % print_loss_step == 0:
            logger.info('Epoch=%d, iter=%d, loss=%.5f'%(epoch, i, tr_loss / tr_examples))
            tr_loss = 0
            tr_examples = 0

        # Evaluate on the dev set and keep the best checkpoint.
        if global_step % eval_step == 0:
            f1 = evaluate(model, dev_batches, dev_ner)
            if f1 > best_result:
                best_result = f1
                logger.info('!!! Best valid (epoch=%d): %.2f' % (epoch, f1*100))
                save_model(model, output_dir)

06/02/2024 14:19:59 - INFO - root - # Overlap: 0
06/02/2024 14:19:59 - INFO - root - Extracted 68124 samples from 68124 documents, with 420007 NER labels, 37.069 avg input length, 96 max length
06/02/2024 14:19:59 - INFO - root - Max Length: 96, max NER: 38
  0%|          | 99/34062 [02:02<1:59:17,  4.75it/s] 06/02/2024 14:22:01 - INFO - root - Epoch=0, iter=99, loss=898.54524
  1%|          | 199/34062 [02:22<1:57:52,  4.79it/s]06/02/2024 14:22:21 - INFO - root - Epoch=0, iter=199, loss=711.30392
  1%|          | 299/34062 [02:44<3:14:09,  2.90it/s]06/02/2024 14:22:43 - INFO - root - Epoch=0, iter=299, loss=201.32274
  1%|          | 399/34062 [03:04<1:39:24,  5.64it/s]06/02/2024 14:23:03 - INFO - root - Epoch=0, iter=399, loss=46.25861
  1%|▏         | 499/34062 [03:25<1:43:44,  5.39it/s]06/02/2024 14:23:24 - INFO - root - Epoch=0, iter=499, loss=46.17288
  2%|▏         | 599/34062 [03:45<1:49:39,  5.09it/s]06/02/2024 14:23:44 - INFO - root - Epoch=0, iter=599, loss=42.56036
  2%|▏  

#### Trained model evaluation

Now let's evaluate our trained model on the test data.

Again, the BERT-based entity model (`EntityModel`) is initialized with specific parameters, including the BERT model name (`allenai/scibert_scivocab_uncased`), the directory in which the trained model is stored (`bert_model_dir`), and the number of NER labels.

In [27]:
bert_model_dir = output_dir
num_ner_labels = len(task_ner_labels[task]) + 1
model = EntityModel(model='allenai/scibert_scivocab_uncased', bert_model_dir=bert_model_dir, use_albert=False, max_span_length=max_span_length, num_ner_labels=num_ner_labels)

06/02/2024 23:47:16 - INFO - root - Loading BERT model from C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0//
06/02/2024 23:47:16 - INFO - transformers.tokenization_utils_base - Model name 'C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0//' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). Assuming 'C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0//' is a path, a model ident

#### Dev Dataset Processing and Evaluation

Just like we did with the pre-trained model, the development dataset (`dev_data`) is loaded, processed, and batchified. The NER predictions are saved to a file using the `output_ner_predictions` function.

In [28]:
dev_data = Dataset(os.path.join(data_dir, 'norm_dev.json'))
prediction_file = os.path.join(output_dir, dev_pred_filename)
    
dev_samples, dev_ner = convert_dataset_to_samples(dev_data, max_span_length, ner_label2id=ner_label2id, context_window=context_window)
dev_batches = batchify(dev_samples, eval_batch_size)

output_ner_predictions(model, dev_batches, dev_data, output_file=prediction_file)

06/02/2024 23:47:29 - INFO - root - # Overlap: 0
06/02/2024 23:47:29 - INFO - root - Extracted 22631 samples from 22631 documents, with 129939 NER labels, 35.463 avg input length, 95 max length
06/02/2024 23:47:29 - INFO - root - Max Length: 95, max NER: 29
06/03/2024 00:04:38 - INFO - root - Total pred entities: 117582
06/03/2024 00:04:39 - INFO - root - Output predictions to C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0/ent_pred_dev.json..


#### Test Dataset Processing and Evaluation

Just like the dev dataset, the test dataset (`test_data`) is loaded, processed, and batchified. The model is then evaluated on the test data using the `evaluate` function, and the NER predictions are saved to a file using the `output_ner_predictions` function.

In [29]:
test_data = Dataset(os.path.join(data_dir, 'norm_test.json'))
prediction_file = os.path.join(output_dir, test_pred_filename)
    
test_samples, test_ner = convert_dataset_to_samples(test_data, max_span_length, ner_label2id=ner_label2id, context_window=context_window)
test_batches = batchify(test_samples, eval_batch_size)
evaluate(model, test_batches, test_ner)
output_ner_predictions(model, test_batches, test_data, output_file=prediction_file)

06/03/2024 00:04:45 - INFO - root - # Overlap: 0
06/03/2024 00:04:45 - INFO - root - Extracted 15509 samples from 15509 documents, with 85473 NER labels, 34.755 avg input length, 96 max length
06/03/2024 00:04:45 - INFO - root - Max Length: 96, max NER: 28
06/03/2024 00:04:45 - INFO - root - Evaluating...
06/03/2024 00:16:26 - INFO - root - Accuracy: 0.994121
06/03/2024 00:16:26 - INFO - root - Cor: 67458, Pred TOT: 77306, Gold TOT: 85473
06/03/2024 00:16:26 - INFO - root - P: 0.87261, R: 0.78923, F1: 0.82883
06/03/2024 00:16:26 - INFO - root - Used time: 701.299599
06/03/2024 00:28:05 - INFO - root - Total pred entities: 77306
06/03/2024 00:28:05 - INFO - root - Output predictions to C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0/ent_pred_test.json..
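
The precision, recall, and F1 values in the log above follow directly from the three counts it reports (`Cor`, `Pred TOT`, `Gold TOT`). A minimal sketch of that arithmetic, with `precision_recall_f1` as a hypothetical helper name:

```python
def precision_recall_f1(n_correct, n_pred, n_gold):
    """Compute precision, recall and F1 from raw counts, guarding against division by zero."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Counts taken from the evaluation log above.
p, r, f1 = precision_recall_f1(n_correct=67458, n_pred=77306, n_gold=85473)
print(f"P: {p:.5f}, R: {r:.5f}, F1: {f1:.5f}")
```

This reproduces the logged `P: 0.87261, R: 0.78923, F1: 0.82883`.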


### Results

**Accuracy**: 99.41%

**Precision**: 87.26%\
**Recall**: 78.92%\
**F1 Score**: 82.88%

**Implications:**

- The accuracy is far higher than precision and recall, which indicates that it also counts the many candidate spans correctly labeled as non-entities. On its own it therefore says little about entity recognition quality on the TACRED test set.
- The precision shows that about 87% of the predicted entities are correct, i.e. the rate of false positives is relatively low.
- The recall shows that the model finds about 79% of the gold entities in the test set, so roughly one in five is missed.
- The F1 score, the harmonic mean of precision and recall, summarizes both in a single number. Here precision is noticeably higher than recall.

In summary, our entity model performs reasonably well on the TACRED dataset, with precision ahead of recall.

### The relation model

Now that we've trained our entity model, we're ready to train the relation model.

#### Set up

First we run the relation_setup.ipynb notebook to set up our classes and functions in the kernel.

In [6]:
%run relation_model/relation_setup.ipynb

  from .autonotebook import tqdm as notebook_tqdm


In [22]:
def generate_relation_data(entity_data, use_gold=False, context_window=0):
    """
    Prepare data for the relation model
    If training: set use_gold = True
    """
    logger.info('Generate relation data from %s'%(entity_data))
    data = Dataset(entity_data)

    nner, nrel = 0, 0
    max_sentsample = 0
    samples = []
    for doc in data:
        for i, sent in enumerate(doc):
            sent_samples = []

            nner += len(sent.ner)
            nrel += len(sent.relations)
            if use_gold:
                sent_ner = sent.ner
            else:
                sent_ner = sent.predicted_ner
            
            gold_ner = {}
            for ner in sent.ner:
                gold_ner[ner.span] = ner.label
            
            gold_rel = {}
            for rel in sent.relations:
                gold_rel[rel.pair] = rel.label
            
            sent_start = 0
            sent_end = len(sent.text)
            tokens = sent.text

            if context_window > 0:
                add_left = (context_window-len(sent.text)) // 2
                add_right = (context_window-len(sent.text)) - add_left

                j = i - 1
                while j >= 0 and add_left > 0:
                    context_to_add = doc[j].text[-add_left:]
                    tokens = context_to_add + tokens
                    add_left -= len(context_to_add)
                    sent_start += len(context_to_add)
                    sent_end += len(context_to_add)
                    j -= 1

                j = i + 1
                while j < len(doc) and add_right > 0:
                    context_to_add = doc[j].text[:add_right]
                    tokens = tokens + context_to_add
                    add_right -= len(context_to_add)
                    j += 1
            
            for x in range(len(sent_ner)):
                for y in range(len(sent_ner)):
                    if x == y:
                        continue
                    sub = sent_ner[x]
                    obj = sent_ner[y]
                    if (sub.span, obj.span) not in gold_rel:
                        continue
                    label = gold_rel[(sub.span, obj.span)]
                    sample = {}
                    sample['docid'] = doc._doc_key
                    sample['id'] = '%s@%d::(%d,%d)-(%d,%d)'%(doc._doc_key, sent.sentence_ix, sub.span.start_doc, sub.span.end_doc, obj.span.start_doc, obj.span.end_doc)
                    sample['relation'] = label
                    sample['subj_start'] = sub.span.start_sent + sent_start
                    sample['subj_end'] = sub.span.end_sent + sent_start
                    sample['subj_type'] = sub.label
                    sample['obj_start'] = obj.span.start_sent + sent_start
                    sample['obj_end'] = obj.span.end_sent + sent_start
                    sample['obj_type'] = obj.label
                    sample['token'] = tokens
                    sample['sent_start'] = sent_start
                    sample['sent_end'] = sent_end

                    sent_samples.append(sample)

            max_sentsample = max(max_sentsample, len(sent_samples))
            samples += sent_samples
    
    tot = len(samples)
    logger.info('#samples: %d, max #sent.samples: %d'%(tot, max_sentsample))

    return data, samples, nrel
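
The nested loop at the end of `generate_relation_data` keeps an ordered (subject, object) pair only if that exact pair appears among the sentence's gold relations; all other entity pairs are skipped rather than turned into extra negative samples. The selection logic can be sketched on plain tuples as follows. `labelled_pairs` is a hypothetical helper, and spans are simplified to `(start, end)` tuples.

```python
from itertools import permutations

def labelled_pairs(entity_spans, gold_relations):
    """Return (subject, object, label) for ordered entity pairs that have a gold relation."""
    samples = []
    for subject, obj in permutations(entity_spans, 2):
        label = gold_relations.get((subject, obj))
        if label is not None:
            samples.append((subject, obj, label))
    return samples

# Three entities, one annotated pair (subject at tokens 9-10, object at token 1).
entity_spans = [(1, 1), (3, 4), (9, 10)]
gold_relations = {((9, 10), (1, 1)): "no_relation"}
print(labelled_pairs(entity_spans, gold_relations))
```

Since the relation is directional, the reversed pair `((1, 1), (9, 10))` does not produce a sample.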

#### Training and evaluating the relation model

Now we fine-tune our own relation model on the same dataset, starting from the pretrained SciBERT weights, and then evaluate it on the test data.

First we set up some variables.

In [23]:
model_name = 'allenai/scibert_scivocab_uncased'
add_new_tokens = False
no_cuda = False
do_train = True
do_eval = True
eval_test = True
do_lower_case = True
entity_output_dir = os.getcwd() + '/tacred_models/ent-scib-ctx0/'
entity_predictions_dev = 'ent_pred_dev.json'
eval_with_gold = True
context_window = 0
max_seq_length = 128
entity_predictions_test = 'ent_pred_test.json'
seed = 0
output_dir = os.getcwd() + '/tacred_models/rel-scib-ctx0/'
negative_label = 'no_relation'
task = 'tacred'
train_mode = 'random_sorted'
train_batch_size = 20
eval_batch_size = 20
num_train_epochs = 2
train_file = normal_train_data_path
eval_per_epoch = 10
learning_rate = 2e-5
prediction_file = 'predictions.json'
BertLayerNorm = torch.nn.LayerNorm
bertadam = True
warmup_proportion = 0.1
eval_metric = 'f1'
task_rel_labels = {
    'ace04': ['PER-SOC', 'OTHER-AFF', 'ART', 'GPE-AFF', 'EMP-ORG', 'PHYS'],
    'ace05': ['ART', 'ORG-AFF', 'GEN-AFF', 'PHYS', 'PER-SOC', 'PART-WHOLE'],
    'scierc': ['PART-OF', 'USED-FOR', 'FEATURE-OF', 'CONJUNCTION', 'EVALUATE-FOR', 'HYPONYM-OF', 'COMPARE'],
    'tacred': ['org:subsidiaries', 'org:political/religious_affiliation', 'per:cause_of_death', 'per:employee_of', 'org:number_of_employees/members', 'org:dissolved', 'per:city_of_birth', 'org:founded_by', 'org:alternate_names', 'org:members', 'per:stateorprovince_of_birth', 'org:founded', 'org:website', 'org:member_of', 'per:stateorprovinces_of_residence', 'per:siblings', 'per:other_family', 'per:title', 'org:city_of_headquarters', 'per:religion', 'per:charges', 'per:countries_of_residence', 'org:country_of_headquarters', 'per:stateorprovince_of_death', 'per:origin', 'per:schools_attended', 'per:spouse', 'no_relation', 'per:city_of_death', 'per:children', 'per:date_of_death', 'per:date_of_birth', 'org:shareholders', 'per:alternate_names', 'org:stateorprovince_of_headquarters', 'org:parents', 'per:age', 'per:cities_of_residence', 'per:parents', 'org:top_members/employees', 'per:country_of_birth', 'per:country_of_death']
}
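
One detail worth checking: `task_rel_labels['tacred']` above already contains `'no_relation'`, which is also the value of `negative_label`. If a label list is built by prepending `negative_label` to this list, `'no_relation'` appears twice, and a dictionary built from that list maps it to the later index instead of 0. The sketch below shows one way to build a duplicate-free list, assuming the negative class is meant to have id 0; `build_label_list` is a hypothetical helper.

```python
def build_label_list(negative_label, relation_labels):
    """Put the negative label first and keep every other label once, in order."""
    label_list = [negative_label]
    for label in relation_labels:
        if label not in label_list:
            label_list.append(label)
    return label_list

labels = build_label_list("no_relation", ["per:title", "no_relation", "org:members"])
label2id = {label: i for i, label in enumerate(labels)}
print(labels, label2id["no_relation"])
```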

Next we build the relation examples for the training, development, and test sets, and then train and evaluate the model.

In [24]:
CLS = "[CLS]"
SEP = "[SEP]"

RelationModel = BertForRelation

device = torch.device("cuda" if torch.cuda.is_available() and not no_cuda else "cpu")
n_gpu = torch.cuda.device_count()

# train set
if do_train:
    train_dataset, train_examples, train_nrel = generate_relation_data(train_file, use_gold=True, context_window=context_window)
# dev set
if (do_eval and do_train) or (do_eval and not(eval_test)):
    eval_dataset, eval_examples, eval_nrel = generate_relation_data(os.path.join(entity_output_dir, entity_predictions_dev), use_gold=eval_with_gold, context_window=context_window)
# test set
if eval_test:
    test_dataset, test_examples, test_nrel = generate_relation_data(os.path.join(entity_output_dir, entity_predictions_test), use_gold=eval_with_gold, context_window=context_window)

06/04/2024 11:11:58 - INFO - run_relation - Generate relation data from C:\Users\odaim\Documents\PURE reproduction/other_data/tacred/data/json/norm_train.json
06/04/2024 11:12:14 - INFO - run_relation - #samples: 77580, max #sent.samples: 4
06/04/2024 11:12:16 - INFO - run_relation - Generate relation data from C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0/ent_pred_dev.json
06/04/2024 11:12:24 - INFO - run_relation - #samples: 26250, max #sent.samples: 4
06/04/2024 11:12:25 - INFO - run_relation - Generate relation data from C:\Users\odaim\Documents\PURE reproduction/tacred_models/ent-scib-ctx0/ent_pred_test.json
06/04/2024 11:12:27 - INFO - run_relation - #samples: 17770, max #sent.samples: 4


In [25]:
setseed(seed)

if not do_train and not do_eval:
    raise ValueError("At least one of `do_train` or `do_eval` must be True.")

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
if do_train:
    logger.addHandler(logging.FileHandler(os.path.join(output_dir, "train.log"), 'w'))
else:
    logger.addHandler(logging.FileHandler(os.path.join(output_dir, "eval.log"), 'w'))
    
# get label_list
if os.path.exists(os.path.join(output_dir, 'label_list.json')):
    with open(os.path.join(output_dir, 'label_list.json'), 'r') as f:
        label_list = json.load(f)
else:
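    # task_rel_labels['tacred'] already contains negative_label, so skip it here to avoid a duplicate entry
    label_list = [negative_label] + [label for label in task_rel_labels[task] if label != negative_label]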
    with open(os.path.join(output_dir, 'label_list.json'), 'w') as f:
        json.dump(label_list, f)
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for i, label in enumerate(label_list)}
num_labels = len(label_list)

tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=do_lower_case)
if add_new_tokens:
    add_marker_tokens(tokenizer, task_ner_labels[task])

if os.path.exists(os.path.join(output_dir, 'special_tokens.json')):
    with open(os.path.join(output_dir, 'special_tokens.json'), 'r') as f:
        special_tokens = json.load(f)
else:
    special_tokens = {}
    
if do_eval and (do_train or not(eval_test)):
    eval_features = convert_examples_to_features(
        eval_examples, label2id, max_seq_length, tokenizer, special_tokens, unused_tokens=not(add_new_tokens))
    logger.info("***** Dev *****")
    logger.info("  Num examples = %d", len(eval_examples))
    logger.info("  Batch size = %d", eval_batch_size)
    all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)
    all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long)
    all_sub_idx = torch.tensor([f.sub_idx for f in eval_features], dtype=torch.long)
    all_obj_idx = torch.tensor([f.obj_idx for f in eval_features], dtype=torch.long)
    eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids, all_sub_idx, all_obj_idx)
    eval_dataloader = DataLoader(eval_data, batch_size=eval_batch_size)
    eval_label_ids = all_label_ids

    
if do_train:
    train_features = convert_examples_to_features(
        train_examples, label2id, max_seq_length, tokenizer, special_tokens, unused_tokens=not(add_new_tokens))
    if train_mode == 'sorted' or train_mode == 'random_sorted':
        train_features = sorted(train_features, key=lambda f: np.sum(f.input_mask))
    else:
        random.shuffle(train_features)
    all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long)
    all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.long)
    all_sub_idx = torch.tensor([f.sub_idx for f in train_features], dtype=torch.long)
    all_obj_idx = torch.tensor([f.obj_idx for f in train_features], dtype=torch.long)
    train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids, all_sub_idx, all_obj_idx)
    train_dataloader = DataLoader(train_data, batch_size=train_batch_size)
    train_batches = [batch for batch in train_dataloader]

    num_train_optimization_steps = len(train_dataloader) * num_train_epochs

    logger.info("***** Training *****")
    logger.info("  Num examples = %d", len(train_examples))
    logger.info("  Batch size = %d", train_batch_size)
    logger.info("  Num steps = %d", num_train_optimization_steps)

    best_result = None
    eval_step = max(1, len(train_batches) // eval_per_epoch)

    lr = learning_rate
    model = RelationModel.from_pretrained(
        'allenai/scibert_scivocab_uncased', cache_dir=str(PYTORCH_PRETRAINED_BERT_CACHE), num_rel_labels=num_labels)
    if hasattr(model, 'bert'):
        model.bert.resize_token_embeddings(len(tokenizer))
    elif hasattr(model, 'albert'):
        model.albert.resize_token_embeddings(len(tokenizer))
    else:
        raise TypeError("Unknown model class")

    model.to(device)
    if n_gpu > 1:
        model = torch.nn.DataParallel(model)

    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer
                    if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
        {'params': [p for n, p in param_optimizer
                    if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=lr, correct_bias=not bertadam)
    scheduler = get_linear_schedule_with_warmup(optimizer, int(num_train_optimization_steps * warmup_proportion), num_train_optimization_steps)

    start_time = time.time()
    global_step = 0
    tr_loss = 0
    nb_tr_examples = 0
    nb_tr_steps = 0
    for epoch in range(int(num_train_epochs)):
        model.train()
        logger.info("Start epoch #{} (lr = {})...".format(epoch, lr))
        if train_mode == 'random' or train_mode == 'random_sorted':
            random.shuffle(train_batches)
        for step, batch in enumerate(train_batches):
            batch = tuple(t.to(device) for t in batch)
            input_ids, input_mask, segment_ids, label_ids, sub_idx, obj_idx = batch
            loss = model(input_ids, segment_ids, input_mask, label_ids, sub_idx, obj_idx)
            if n_gpu > 1:
                loss = loss.mean()

            loss.backward()

            tr_loss += loss.item()
            nb_tr_examples += input_ids.size(0)
            nb_tr_steps += 1

            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
            global_step += 1

            if (step + 1) % eval_step == 0:
                logger.info('Epoch: {}, Step: {} / {}, used_time = {:.2f}s, loss = {:.6f}'.format(
                            epoch, step + 1, len(train_batches),
                            time.time() - start_time, tr_loss / nb_tr_steps))
                save_model = False
                if do_eval:
                    preds, result, logits = evaluate(model, device, eval_dataloader, eval_label_ids, num_labels, e2e_ngold=eval_nrel)
                    model.train()
                    result['global_step'] = global_step
                    result['epoch'] = epoch
                    result['learning_rate'] = lr
                    result['batch_size'] = train_batch_size

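                    if (best_result is None) or (result[eval_metric] > best_result[eval_metric]):
                        best_result = result
                        # NOTE: the best checkpoint is not written to output_dir here, so the
                        # from_pretrained(output_dir) call below only works once
                        # save_trained_model(output_dir, model, tokenizer) has been run.
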
    if eval_test: 
        eval_dataset = test_dataset
        eval_examples = test_examples
        eval_features = convert_examples_to_features(
            test_examples, label2id, max_seq_length, tokenizer, special_tokens, unused_tokens=not add_new_tokens)
        eval_nrel = test_nrel
        logger.info(special_tokens)
        logger.info("***** Test *****")
        logger.info("  Num examples = %d", len(test_examples))
        logger.info("  Batch size = %d", eval_batch_size)
        all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)
        all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)
        all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)
        all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long)
        all_sub_idx = torch.tensor([f.sub_idx for f in eval_features], dtype=torch.long)
        all_obj_idx = torch.tensor([f.obj_idx for f in eval_features], dtype=torch.long)
        eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids, all_sub_idx, all_obj_idx)
        eval_dataloader = DataLoader(eval_data, batch_size=eval_batch_size)
        eval_label_ids = all_label_ids
    model = RelationModel.from_pretrained(output_dir, num_rel_labels=num_labels)
    model.to(device)
    preds, result, logits = evaluate(model, device, eval_dataloader, eval_label_ids, num_labels, e2e_ngold=eval_nrel)

    logger.info('*** Evaluation Results ***')
    for key in sorted(result.keys()):
        logger.info("  %s = %s", key, str(result[key])) 

    print_pred_json(eval_dataset, eval_examples, preds, id2label, os.path.join(output_dir, prediction_file))

06/04/2024 11:14:00 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/allenai/scibert_scivocab_uncased/config.json from cache at C:\Users\odaim/.cache\torch\transformers\199e28e62d2210c23d63625bd9eecc20cf72a156b29e2a540d4933af4f50bda1.4b6b9f5d813f7395e7ea533039e02deb1723d8fd9d8ba655391a01a69ad6223d
06/04/2024 11:14:00 - INFO - transformers.configuration_utils - Model config BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 31090
}

06/04/2024 11:14:00 - INFO - transformers.tokenization_utils_base - Model name 'allenai/scibert_scivocab_uncased' not 

06/04/2024 11:14:03 - INFO - run_relation - input_ids: 102 19 11854 20 4758 579 12569 579 12569 1021 862 12569 862 3291 3315 30116 10282 3871 3923 1214 198 8 19536 10977 5233 9 422 13494 14919 422 241 797 7344 214 1972 23301 18086 121 111 15934 211 4556 131 1972 11854 8394 225 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:14:03 - INFO - run_relation - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:14:03 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

06/04/2024 11:14:03 - INFO - run_relation - label: org:shareholders (id = 33)
06/04/2024 11:14:03 - INFO - run_relation - sub_idx, obj_idx: 47, 32
06/04/2024 11:14:03 - INFO - run_relation - *** Example ***
06/04/2024 11:14:03 - INFO - run_relation - guid: APW_ENG_20081024.1282.LDC2009T13@0::(16,17)-(11,14)
06/04/2024 11:14:03 - INFO - run_relation - tokens: [CLS] ` ` she made a devast ##ating run around the turn , ' ' [unused30] u . s . hall of fam ##e [unused31] train ##er [unused7] bob ##by frank ##el [unused8] said . [SEP]
06/04/2024 11:14:03 - INFO - run_relation - input_ids: 102 5114 5114 2281 1827 106 29702 560 2004 2715 111 3216 422 2505 2505 31 504 205 112 205 6912 131 1651 30107 32 7434 114 8 14701 2301 11062 154 9 6032 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:14:03 - INFO - run_relation - input_mask: 1 1 1 1 1 1 

06/04/2024 11:14:03 - INFO - run_relation - label: per:city_of_death (id = 29)
06/04/2024 11:14:03 - INFO - run_relation - sub_idx, obj_idx: 12, 39
06/04/2024 11:14:03 - INFO - run_relation - *** Example ***
06/04/2024 11:14:03 - INFO - run_relation - guid: NYT_ENG_20091117.0047@0::(0,1)-(38,38)
06/04/2024 11:14:03 - INFO - run_relation - tokens: [CLS] [unused7] bob ##by frank ##el [unused8] , one of the most successful american thorough ##bred train ##ers of the last 40 years , whose horses included the champ ##ions bert ##rand ##o , ghost ##za ##pp ##er and empir ##e maker , the winner of the 2003 bel ##mont stake ##s , died [unused24] mond ##ay [unused25] at his home in pacific pal ##isa ##des , calif . . [SEP]
06/04/2024 11:14:03 - INFO - run_relation - input_ids: 102 8 14701 2301 11062 154 9 422 482 131 111 755 3026 3258 9763 19551 7434 270 131 111 2442 1921 1320 422 3489 18538 1936 111 24524 2377 24921 28561 30112 422 28901 6969 3459 114 137 4153 30107 23157 422 111 25766 131 111

06/04/2024 11:14:03 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:14:03 - INFO - run_relation - label: no_relation (id = 28)
06/04/2024 11:14:03 - INFO - run_relation - sub_idx, obj_idx: 6, 32
06/04/2024 11:14:03 - INFO - run_relation - *** Example ***
06/04/2024 11:14:03 - INFO - run_relation - guid: APW_ENG_20100528.1421@0::(28,31)-(33,34)
06/04/2024 11:14:03 - INFO - run_relation - tokens: [CLS] ` ` americans have a right to know the truth - - islam is a religion of intolerance and violence , ' ' said richard thompson , legal director of the [unused22] thomas more law center [unused23] in [unused15] ann arbor [unused16] . [SEP]
06/04/2024 11:14:03 - INFO - run_relation - input_ids: 102 5114 5114 15585 360 106 2083 147 871 111 

06/04/2024 11:15:04 - INFO - run_relation - input_ids: 102 238 241 1247 106 11098 131 1203 2686 31 10412 2399 32 2505 112 11774 30113 198 259 241 906 188 106 8772 168 5589 137 8854 140 191 106 4146 168 23 7049 30121 24 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:15:04 - INFO - run_relation - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:15:04 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

06/04/2024 11:15:05 - INFO - run_relation - input_ids: 102 186 579 10242 753 4481 4458 198 422 2739 263 1558 106 7945 1254 30118 422 351 30120 3887 23924 2526 30117 241 4312 168 5114 5114 5484 2325 137 19329 111 5863 422 2505 2505 198 13 299 14 3246 1743 6843 6888 6633 15530 1244 422 137 198 299 241 111 508 299 14606 147 603 8 299 9 2764 106 7278 2944 488 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:15:05 - INFO - run_relation - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:15:05 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

06/04/2024 11:15:05 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/04/2024 11:15:05 - INFO - run_relation - label: no_relation (id = 28)
06/04/2024 11:15:05 - INFO - run_relation - sub_idx, obj_idx: 81, 39
06/04/2024 11:15:05 - INFO - run_relation - *** Example ***
06/04/2024 11:15:05 - INFO - run_relation - guid: NYT_ENG_20090707.0058@0::(25,26)-(6,6)
06/04/2024 11:15:05 - INFO - run_relation - tokens: [CLS] in o ' bri ##en ' s home just off [unused18] broad ##way [unused19] in sar ##ato ##ga springs , there are shel ##ves of books that mention his association with bal ##anch ##ine as well as [unused7] jer ##ome rob ##bin ##s [unused8] , anth ##ony tud ##or , rud ##ol ##f nur ##eye ##v and mik ##hai ##l bary ##sh ##nik ##ov , among others .

06/04/2024 11:15:05 - INFO - run_relation - label: per:cities_of_residence (id = 38)
06/04/2024 11:15:05 - INFO - run_relation - sub_idx, obj_idx: 1, 9
06/04/2024 11:15:05 - INFO - run_relation - *** Example ***
06/04/2024 11:15:05 - INFO - run_relation - guid: eng-NG-31-141586-9960082@0::(2,5)-(8,11)
06/04/2024 11:15:05 - INFO - run_relation - tokens: [CLS] while the [unused22] massachusetts house of representatives [unused23] debates the [unused27] global warming solutions act [unused28] , please show that commitment , by publicly calling for strong near - and long - term reductions in global warming pollution covering all areas including utilities , transportation , and more . [SEP]
06/04/2024 11:15:05 - INFO - run_relation - input_ids: 102 969 111 23 13653 4159 131 16154 24 28578 111 28 2523 17669 2727 1438 29 422 8611 405 198 10442 422 214 16222 18247 168 1648 2396 579 137 1113 579 902 10287 121 2523 17669 10429 9822 355 2326 1471 19620 422 9826 422 137 475 205 103 0 0 0 0 0 0 0 0

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
06/04/2024 11:18:29 - INFO - run_relation - Start epoch #0 (lr = 2e-05)...
06/04/2024 12:19:06 - INFO - run_relation - Epoch: 0, Step: 387 / 3879, used_time = 3637.59s, loss = 1.214021
06/04/2024 12:33:53 - INFO - run_relation - ***** Eval results *****
06/04/2024 12:33:53 - INFO - run_relation -   accuracy = 0.7620190476190476
06/04/2024 12:33:53 - INFO - run_relation -   eval_loss = 1.06913327123269
06/04/2024 12:33:53 - INFO - run_relation -   f1 = 0.8184366113622881
06/04/2024 12:33:53 - INFO - run_relation -   n_correct = 20003
06/04/2024 12:33:53 - INFO - run_relation -   n_gold = 22631
06/04/2024 12:33:53 - INFO - run_relation -   n_pred = 26250
06/04/2024 12:33:53 - INFO - run_relation -   precision = 0.7620190476190476
06/04/2024 12:33:53 - INFO - run_relation -   recall = 0.8838760991560249
06/04/2024 12:33:53 - INFO - run_relation -   task_f1 = 0.7620190476190476

06/04/2024 22:22:01 - INFO - run_relation - ***** Eval results *****
06/04/2024 22:22:01 - INFO - run_relation -   accuracy = 0.8287238095238095
06/04/2024 22:22:01 - INFO - run_relation -   eval_loss = 0.49991512819784156
06/04/2024 22:22:01 - INFO - run_relation -   f1 = 0.8900799901802336
06/04/2024 22:22:01 - INFO - run_relation -   n_correct = 21754
06/04/2024 22:22:01 - INFO - run_relation -   n_gold = 22631
06/04/2024 22:22:01 - INFO - run_relation -   n_pred = 26250
06/04/2024 22:22:01 - INFO - run_relation -   precision = 0.8287238095238095
06/04/2024 22:22:01 - INFO - run_relation -   recall = 0.961247845875127
06/04/2024 22:22:01 - INFO - run_relation -   task_f1 = 0.8287238095238095
06/04/2024 22:22:01 - INFO - run_relation -   task_ngold = 26250
06/04/2024 22:22:01 - INFO - run_relation -   task_recall = 0.8287238095238095
06/04/2024 23:26:19 - INFO - run_relation - Epoch: 0, Step: 3870 / 3879, used_time = 43670.79s, loss = 0.554957
06/04/2024 23:41:25 - INFO - run_relatio

06/05/2024 08:17:05 - INFO - run_relation -   recall = 0.9751668065927268
06/05/2024 08:17:05 - INFO - run_relation -   task_f1 = 0.8407238095238095
06/05/2024 08:17:05 - INFO - run_relation -   task_ngold = 26250
06/05/2024 08:17:05 - INFO - run_relation -   task_recall = 0.8407238095238095
06/05/2024 09:15:35 - INFO - run_relation - Epoch: 1, Step: 3096 / 3879, used_time = 79026.00s, loss = 0.441319
06/05/2024 09:30:26 - INFO - run_relation - ***** Eval results *****
06/05/2024 09:30:26 - INFO - run_relation -   accuracy = 0.8401904761904762
06/05/2024 09:30:26 - INFO - run_relation -   eval_loss = 0.4767424902351685
06/05/2024 09:30:26 - INFO - run_relation -   f1 = 0.9023956138376875
06/05/2024 09:30:26 - INFO - run_relation -   n_correct = 22055
06/05/2024 09:30:26 - INFO - run_relation -   n_gold = 22631
06/05/2024 09:30:26 - INFO - run_relation -   n_pred = 26250
06/05/2024 09:30:26 - INFO - run_relation -   precision = 0.8401904761904762
06/05/2024 09:30:26 - INFO - run_relatio

06/05/2024 11:58:37 - INFO - run_relation - label: no_relation (id = 28)
06/05/2024 11:58:37 - INFO - run_relation - sub_idx, obj_idx: 10, 15
06/05/2024 11:58:37 - INFO - run_relation - *** Example ***
06/05/2024 11:58:37 - INFO - run_relation - guid: XIN_ENG_20100801.0069@0::(0,1)-(15,15)
06/05/2024 11:58:37 - INFO - run_relation - tokens: [CLS] [unused7] eug ##eni ##o vag ##ni [unused8] , the italian worker of the ic ##rc , and colleagues andre ##as not ##ter of [unused48] switzerland [unused49] and mary jean lac ##aba of the philipp ##ines were released by their ab ##u say ##ya ##f cap ##tors separately . [SEP]
06/05/2024 11:58:37 - INFO - run_relation - input_ids: 102 8 20315 22933 30112 7988 4564 9 422 111 14865 11750 131 111 1981 9608 422 137 7963 13619 142 302 192 131 49 11906 50 137 12079 20004 10570 15376 131 111 23000 865 267 7163 214 547 351 30120 4654 6559 30122 891 7687 6695 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

06/05/2024 11:58:37 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/05/2024 11:58:37 - INFO - run_relation - label: no_relation (id = 28)
06/05/2024 11:58:37 - INFO - run_relation - sub_idx, obj_idx: 2, 48
06/05/2024 11:58:37 - INFO - run_relation - *** Example ***
06/05/2024 11:58:37 - INFO - run_relation - guid: AFP_ENG_20070224.0025.LDC2009T13@0::(1,1)-(10,11)
06/05/2024 11:58:37 - INFO - run_relation - tokens: [CLS] the [unused22] ad ##f [unused23] said a group of australian sold ##iers , in [unused18] east tim ##or [unused19] as part of an international peace ##keeping force , was responding to a disturbance at the refuge ##e camp when the shoot ##ings took place fri ##day morning . [SEP]
06/05/2024 11:58:37 - INFO - run_relation - input

06/05/2024 11:58:37 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/05/2024 11:58:37 - INFO - run_relation - label: no_relation (id = 28)
06/05/2024 11:58:37 - INFO - run_relation - sub_idx, obj_idx: 35, 12
06/05/2024 11:58:37 - INFO - run_relation - *** Example ***
06/05/2024 11:58:37 - INFO - run_relation - guid: LTW_ENG_20070927.0047.LDC2009T13@0::(21,21)-(15,15)
06/05/2024 11:58:37 - INFO - run_relation - tokens: [CLS] lo ##max shares a story about al ##men ##a lo ##max , his mother and a newsp ##aper owner and [unused9] journal ##ist [unused10] in los angeles , taking [unused7] her [unused8] family on the bus to tu ##sk ##eg ##ee , ala . , in 196 ##1 . [SEP]
06/05/2024 11:58:37 - INFO - run_relation - input_ids: 102 881 3655 11985 106 10

06/05/2024 11:58:37 - INFO - run_relation - input_ids: 102 2089 238 2740 422 7049 4576 4522 2505 112 758 4274 14461 5575 465 17544 3034 2085 3234 12006 30120 16489 3331 6995 106 13407 17707 18640 6820 3113 188 13 299 14 2072 8 1972 9 5361 21009 263 254 29880 579 1196 6550 6813 579 22867 121 758 4811 205 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/05/2024 11:58:37 - INFO - run_relation - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/05/2024 11:58:37 - INFO - run_relation - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

ValueError: unable to parse C:\Users\odaim\Documents\PURE reproduction/tacred_models/rel-scib-ctx0/config.json as a URL or as a local path
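
The `ValueError` above is consistent with `output_dir` not yet containing a `config.json` when `from_pretrained` was called: the training cell never writes a checkpoint, and the load succeeds in the next cells only after `save_trained_model` has run. A minimal sketch of a guard is shown below. `has_saved_model` is a hypothetical helper (it is not part of this notebook or of `transformers`), and the two file names are taken from the loading logs further down.

```python
import os


def has_saved_model(model_dir: str) -> bool:
    """Return True if model_dir contains both files that appear in the loading logs.

    Hypothetical helper for illustration only.
    """
    required = ("config.json", "pytorch_model.bin")
    return all(os.path.isfile(os.path.join(model_dir, name)) for name in required)


# Possible use before reloading the model (names as used in this notebook):
# if not has_saved_model(output_dir):
#     save_trained_model(output_dir, model, tokenizer)
```

Checking for the files first turns the parsing error into an explicit decision to save (or retrain) before evaluating.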

In [29]:
save_trained_model(output_dir, model, tokenizer)

06/05/2024 19:19:13 - INFO - run_relation - Saving model to C:\Users\odaim\Documents\PURE reproduction/tacred_models/rel-scib-ctx0/


In [30]:
model = RelationModel.from_pretrained(output_dir, num_rel_labels=num_labels)
model.to(device)
preds, result, logits = evaluate(model, device, eval_dataloader, eval_label_ids, num_labels, e2e_ngold=eval_nrel)

logger.info('*** Evaluation Results ***')
for key in sorted(result.keys()):
    logger.info("  %s = %s", key, str(result[key])) 

print_pred_json(eval_dataset, eval_examples, preds, id2label, os.path.join(output_dir, prediction_file))

06/05/2024 19:19:22 - INFO - transformers.configuration_utils - loading configuration file C:\Users\odaim\Documents\PURE reproduction/tacred_models/rel-scib-ctx0/config.json
06/05/2024 19:19:22 - INFO - transformers.configuration_utils - Model config BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 31090
}

06/05/2024 19:19:22 - INFO - transformers.modeling_utils - loading weights file C:\Users\odaim\Documents\PURE reproduction/tacred_models/rel-scib-ctx0/pytorch_model.bin
06/05/2024 19:19:24 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing BertForRelation.


## Results

**Accuracy**: 0.9630

**Evaluation Loss**: 0.0950

**Precision**: 0.8455\
**Recall**: 0.6963\
**F1 Score**: 0.7637

Implications:

- The high accuracy (96.3%) indicates that the model performs well overall in predicting relations in the TACRED dataset. Because 79.5% of TACRED examples are labeled no_relation, accuracy should be read together with precision, recall, and F1.
- The F1 score (0.7637) is the harmonic mean of precision and recall. It is held down by the lower recall, so there is still room for improvement.
- The precision of 84.6% indicates that when the model predicts a relation, it is correct roughly 85% of the time.
- The recall of 69.6% means that the model misses around 30.4% of the actual relations, so capturing more of the true relations is the main area for improvement.
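
As a quick check on the numbers above, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall. It also derives precision, recall, and F1 from raw counts, assuming (as the logged values suggest) that precision is `n_correct / n_pred` and recall is `n_correct / n_gold`. The function names are illustrative and are not taken from the notebook's `evaluate` function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(n_correct: int, n_pred: int, n_gold: int):
    """Precision, recall and F1 from counts (assumed definitions, see lead-in)."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Values copied from the Results section.
f1 = f1_score(0.8455, 0.6963)
missed = 1.0 - 0.6963
print(f"F1 = {f1:.4f}")
print(f"Share of actual relations missed = {missed:.1%}")

# Counts copied from the first logged evaluation during training.
p, r, f = prf_from_counts(n_correct=20003, n_pred=26250, n_gold=22631)
print(f"precision = {p:.4f}, recall = {r:.4f}, f1 = {f:.4f}")
```

The recomputed F1 rounds to the reported 0.7637, and the count-based values agree with the precision, recall, and f1 lines in the first evaluation log.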