# Running the relation model
In this notebook we will run and evaluate the relation model proposed in the research paper [A Frustratingly Easy Approach for Entity and Relation Extraction](https://arxiv.org/pdf/2010.12812.pdf).

This is a reproduction based on the instructions left by the authors in their [GitHub repo](https://github.com/princeton-nlp/PURE).

**Environment information**

- Windows 11
- Python 3.6.13
- pip 21.2.2

## Basic setup
First, run the `relation_setup` notebook to load the required classes and functions.

In [None]:
%run relation_setup.ipynb



Next, we set up the notebook by importing the required libraries and modules and initializing a logger.

In [1]:
from transformers import BertModel, BertPreTrainedModel
from transformers import AlbertModel, AlbertPreTrainedModel

from transformers import AutoTokenizer
from transformers import AdamW, get_linear_schedule_with_warmup
from transformers.file_utils import PYTORCH_PRETRAINED_BERT_CACHE, WEIGHTS_NAME, CONFIG_NAME

import os
import time
import torch
import logging
import json
import numpy as np
import random
from torch.utils.data import DataLoader, TensorDataset
from torch import nn
from torch.nn import CrossEntropyLoss


logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S',
                    level=logging.INFO)
logger = logging.getLogger('run_relation')



## Running and evaluating the pre-trained relation model

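With the setup out of the way, we can run the relation model and evaluate it on the SciERC dataset, using a pre-trained SciBERT-based checkpoint.

The cells below follow the train, evaluate, and predict flow of the relation model. With the configuration used here (`do_train = False`, `do_eval = True`, `eval_test = True`), only evaluation and prediction on the test set are actually run.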

Configuration Parameters:
- model_name: The name of the pre-trained SciBERT model.
- add_new_tokens: A boolean indicating whether to add new task-specific tokens to the tokenizer.
- no_cuda: A boolean indicating whether to use CPU instead of GPU.
- do_train, do_eval, eval_test: Boolean flags to control whether to perform training, evaluation, and evaluation on the test set.
- do_lower_case: A boolean indicating whether to convert text to lowercase during tokenization.
- entity_output_dir: The directory containing output from the entity recognition model.
- entity_predictions_dev, entity_predictions_test: The filenames for entity predictions on the development and test sets.
- eval_with_gold: A boolean indicating whether to use gold standard entities during evaluation.
- context_window: The size of the context window around each sentence.
- max_seq_length: The maximum sequence length for tokenized input.
- seed: The random seed for reproducibility.
- output_dir: The directory to save the trained model and evaluation logs.
- negative_label: The label for negative relations.
- task: The specific relation extraction task (assumed to be "scierc").
- train_mode: The training mode, assumed to be "random_sorted".
- train_batch_size, eval_batch_size: Batch sizes for training and evaluation.
- num_train_epochs: The number of training epochs.
- train_file: The file containing training data.
- eval_per_epoch: The frequency of evaluation per epoch.
- learning_rate: The learning rate for training.
- prediction_file: The filename for saving predictions.
- BertLayerNorm: The LayerNorm class to be used (assumed to be from torch.nn).
- CLS, SEP: Special tokens for [CLS] (classification) and [SEP] (separator).
- RelationModel: The model class for relation extraction.
- device: The computing device (CPU or GPU).
- n_gpu: The number of available GPUs.

In [25]:
model_name = 'allenai/scibert_scivocab_uncased'
add_new_tokens = False
no_cuda = False
do_train = False
do_eval = True
eval_test = True
do_lower_case = True
entity_output_dir = os.getcwd() + '/scierc_models/ent-scib-ctx0/'
entity_predictions_dev = 'ent_pred_dev.json'
eval_with_gold = True
context_window = 0
max_seq_length = 128
entity_predictions_test = 'ent_pred_test.json'
seed = 0
output_dir = os.getcwd() + '/scierc_models/rel_approx-scib-ctx0/'
negative_label = 'no_relation'
task = 'scierc'
train_mode = 'random_sorted'
train_batch_size = 32
eval_batch_size = 8
num_train_epochs = 3.0
train_file = None
eval_per_epoch = 10
learning_rate = None
prediction_file = 'predictions.json'
BertLayerNorm = torch.nn.LayerNorm

#### Special Tokens and Model Initialization:

- `CLS` and `SEP` are special tokens used in BERT-like models, where [CLS] denotes the start of a sequence, and [SEP] separates segments or sentences.
- `RelationModel` is set to `BertForRelation`, since that is the model we will use.

In [26]:
CLS = "[CLS]"
SEP = "[SEP]"

RelationModel = BertForRelation

#### Device and GPU Configuration:

- `device`: Uses the GPU if one is available and `no_cuda` is False; otherwise, it falls back to the CPU.
- `n_gpu`: It counts the number of available GPUs.

In [59]:
device = torch.device("cuda" if torch.cuda.is_available() and not no_cuda else "cpu")
n_gpu = torch.cuda.device_count()

#### Data Preparation:

The input data format of the relation model is almost the same as that of the entity model, except that there is one more field, `"predicted_ner"`, which stores the predictions of the entity model.

```json
{
  "doc_key": "CNN_ENG_20030306_083604.6",
  "sentences": [...],
  "ner": [...],
  "relations": [...],
  "predicted_ner": [
    [...],
    [...],
    [[26, 26, "LOC"], [14, 15, "PER"], ...],
    ...
  ]
}
```
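
To make the role of `"predicted_ner"` more concrete, here is a minimal sketch that reads one document in this format and counts how many ordered (subject, object) entity pairs each sentence would yield. The document content is made up, `count_candidate_pairs` is a hypothetical helper, and the assumption that every ordered pair of distinct entities in a sentence becomes one candidate sample is ours; the actual `generate_relation_data` function may differ in detail.

```python
import json

# A tiny, made-up document in the format shown above, serialized as one JSON object.
line = json.dumps({
    "doc_key": "example_doc",
    "sentences": [["A", "parser", "for", "speech"], ["It", "works"]],
    "ner": [[[1, 1, "Method"], [3, 3, "Material"]], []],
    "relations": [[[1, 1, 3, 3, "USED-FOR"]], []],
    "predicted_ner": [[[1, 1, "Method"], [3, 3, "Material"]], [[4, 4, "Generic"]]],
})


def count_candidate_pairs(doc, use_gold=False):
    """Count ordered pairs of distinct entities per sentence (hypothetical helper)."""
    entities_per_sentence = doc["ner"] if use_gold else doc["predicted_ner"]
    # n entities give n * (n - 1) ordered (subject, object) pairs.
    return [len(ents) * (len(ents) - 1) for ents in entities_per_sentence]


doc = json.loads(line)
pairs_from_predictions = count_candidate_pairs(doc)
pairs_from_gold = count_candidate_pairs(doc, use_gold=True)
print(pairs_from_predictions, pairs_from_gold)
```

In this toy document the second sentence has a single predicted entity and no gold entity, so it contributes no candidate pairs either way.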

Let's prepare the data for the training, development (dev), and test sets using the `generate_relation_data` function. With the flags set above, only the test set is actually generated.
- `train_dataset`, `eval_dataset`, and `test_dataset` hold the generated datasets.
- `train_examples`, `eval_examples`, and `test_examples` contain examples from the respective datasets.
- `train_nrel`, `eval_nrel`, and `test_nrel` store the number of relations in the corresponding sets.

In [60]:
# train set
if do_train:
    train_dataset, train_examples, train_nrel = generate_relation_data(train_file, use_gold=True, context_window=context_window)
# dev set
if (do_eval and do_train) or (do_eval and not(eval_test)):
    eval_dataset, eval_examples, eval_nrel = generate_relation_data(os.path.join(entity_output_dir, entity_predictions_dev), use_gold=eval_with_gold, context_window=context_window)
# test set
if eval_test:
    test_dataset, test_examples, test_nrel = generate_relation_data(os.path.join(entity_output_dir, entity_predictions_test), use_gold=eval_with_gold, context_window=context_window)

11/21/2023 15:53:29 - INFO - run_relation - Generate relation data from C:\Users\odaim\Documents\PURE reproduction/scierc_models/ent-scib-ctx0/ent_pred_test.json
11/21/2023 15:53:30 - INFO - run_relation - #samples: 5062, max #sent.samples: 156


#### Random Seed Setting:

In [61]:
setseed(seed)
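
`setseed` is defined in the `relation_setup` notebook and is not shown here. As a rough sketch of what such a helper typically does, the hypothetical `set_all_seeds` below seeds Python's `random` module and, when they are installed, NumPy and PyTorch. The optional imports are only there to keep the example self-contained; this is an assumption about the helper's behavior, not its actual implementation.

```python
import random


def set_all_seeds(seed):
    """Seed every available random number generator (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draw.
set_all_seeds(0)
first = random.random()
set_all_seeds(0)
second = random.random()
print(first == second)
```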

#### Directory and Logging Setup:

- We ensure that at least one of `do_train` or `do_eval` is set to True.
- We also create the output directory if it doesn't exist and attach a file handler to the logger (`train.log` when training, `eval.log` otherwise).

In [62]:
if not do_train and not do_eval:
    raise ValueError("At least one of `do_train` or `do_eval` must be True.")

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
if do_train:
    logger.addHandler(logging.FileHandler(os.path.join(output_dir, "train.log"), 'w'))
else:
    logger.addHandler(logging.FileHandler(os.path.join(output_dir, "eval.log"), 'w'))

#### Label List and Mappings:

- We load the list of relation labels (`label_list`) from the output directory if it already exists; otherwise, we create it and save it to a file.
- We then create the mappings (`label2id` and `id2label`) and calculate the number of labels (`num_labels`).

In [63]:
# get label_list
if os.path.exists(os.path.join(output_dir, 'label_list.json')):
    with open(os.path.join(output_dir, 'label_list.json'), 'r') as f:
        label_list = json.load(f)
else:
    label_list = [negative_label] + task_rel_labels[task]
    with open(os.path.join(output_dir, 'label_list.json'), 'w') as f:
        json.dump(label_list, f)
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for i, label in enumerate(label_list)}
num_labels = len(label_list)
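
The following small sketch shows the shape of these mappings. The two relation labels are an illustrative subset chosen for the example, not the full label set stored in `task_rel_labels`.

```python
negative_label = "no_relation"
example_labels = ["USED-FOR", "PART-OF"]  # illustrative subset, not the full task label set

label_list = [negative_label] + example_labels
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for i, label in enumerate(label_list)}

# The negative label is placed first, so it always receives id 0.
print(label2id)
# The two mappings are inverses of each other.
print(all(id2label[label2id[label]] == label for label in label_list))
```

Because the negative label comes first in `label_list`, id 0 always stands for "no relation", which matters when metrics are computed over relation predictions only.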

#### Tokenizer and Special Tokens:

- It initializes the tokenizer using the specified pre-trained model (`model_name`).
- If `add_new_tokens` is set to True, it adds task-specific marker tokens using the `add_marker_tokens` function.

In [64]:
tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=do_lower_case)
if add_new_tokens:
    add_marker_tokens(tokenizer, task_ner_labels[task])

11/21/2023 15:53:36 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/allenai/scibert_scivocab_uncased/config.json from cache at C:\Users\odaim/.cache\torch\transformers\199e28e62d2210c23d63625bd9eecc20cf72a156b29e2a540d4933af4f50bda1.4b6b9f5d813f7395e7ea533039e02deb1723d8fd9d8ba655391a01a69ad6223d
11/21/2023 15:53:36 - INFO - transformers.configuration_utils - Model config BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 31090
}

11/21/2023 15:53:36 - INFO - transformers.tokenization_utils_base - Model name 'allenai/scibert_scivocab_uncased' not 

#### Special Tokens Saving/Loading:

It loads or initializes a dictionary for special tokens (`special_tokens`).

In [65]:
if os.path.exists(os.path.join(output_dir, 'special_tokens.json')):
    with open(os.path.join(output_dir, 'special_tokens.json'), 'r') as f:
        special_tokens = json.load(f)
else:
    special_tokens = {}
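
The evaluation log further down shows this dictionary mapping marker names such as `SUBJ_START` to reserved vocabulary entries such as `[unused1]`. A minimal sketch of how such a mapping could be filled on first use is given below; `get_special_token` is a hypothetical helper, and it only covers the case where markers reuse `[unusedN]` tokens (that is, `add_new_tokens = False`).

```python
def get_special_token(name, special_tokens):
    """Return the token for a marker name, registering it on first use (hypothetical helper)."""
    if name not in special_tokens:
        # Assign the next free reserved token: [unused1], [unused2], ...
        special_tokens[name] = "[unused%d]" % (len(special_tokens) + 1)
    return special_tokens[name]


special_tokens = {}
for name in ["SUBJ_START", "SUBJ_END", "OBJ_START", "OBJ_END"]:
    get_special_token(name, special_tokens)

# Asking for a known marker again returns the same token and adds nothing new.
repeated = get_special_token("SUBJ_START", special_tokens)
print(special_tokens, repeated)
```

Since the assignment depends on the order in which markers are first seen, the dictionary has to be saved and reloaded so that evaluation uses the same mapping as training.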

#### Evaluation Data Preparation:

- It converts evaluation examples (eval_examples) to features using the convert_examples_to_features function.
- It logs information about the evaluation dataset, such as the number of examples and the batch size.
- It creates PyTorch tensors for input features (all_input_ids, all_input_mask, etc.).
- It constructs a PyTorch TensorDataset and a corresponding DataLoader for the evaluation set.

In [66]:
if do_eval and (do_train or not(eval_test)):
    eval_features = convert_examples_to_features(
        eval_examples, label2id, max_seq_length, tokenizer, special_tokens, unused_tokens=not(add_new_tokens))
    logger.info("***** Dev *****")
    logger.info("  Num examples = %d", len(eval_examples))
    logger.info("  Batch size = %d", eval_batch_size)
    all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)
    all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long)
    all_sub_idx = torch.tensor([f.sub_idx for f in eval_features], dtype=torch.long)
    all_obj_idx = torch.tensor([f.obj_idx for f in eval_features], dtype=torch.long)
    eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids, all_sub_idx, all_obj_idx)
    eval_dataloader = DataLoader(eval_data, batch_size=eval_batch_size)
    eval_label_ids = all_label_ids
    

#### Save Special Tokens

It saves the `special_tokens` dictionary to a JSON file named 'special_tokens.json' in the output directory.

with open(os.path.join(output_dir, 'special_tokens.json'), 'w') as f:
    json.dump(special_tokens, f)

#### Model Evaluation

- It logs the `special_tokens` dictionary.
- If `eval_test` is True, it uses the test dataset (`test_dataset`, `test_examples`) for evaluation; otherwise, it uses the previously loaded dev dataset.
- It converts the evaluation examples to features and constructs a PyTorch TensorDataset and a corresponding DataLoader for the evaluation set.
- It loads the pre-trained relation extraction model (RelationModel) from the output directory.
- It evaluates the model on the evaluation set and logs the results.
- It writes the predictions to a JSON file.

In [67]:
if do_eval:
    logger.info(special_tokens)
    if eval_test:
        eval_dataset = test_dataset
        eval_examples = test_examples
        eval_features = convert_examples_to_features(
            test_examples, label2id, max_seq_length, tokenizer, special_tokens, unused_tokens=not(add_new_tokens))
        eval_nrel = test_nrel
        logger.info(special_tokens)
        logger.info("***** Test *****")
        logger.info("  Num examples = %d", len(test_examples))
        logger.info("  Batch size = %d", eval_batch_size)
        all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)
        all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)
        all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)
        all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long)
        all_sub_idx = torch.tensor([f.sub_idx for f in eval_features], dtype=torch.long)
        all_obj_idx = torch.tensor([f.obj_idx for f in eval_features], dtype=torch.long)
        eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids, all_sub_idx, all_obj_idx)
        eval_dataloader = DataLoader(eval_data, batch_size=eval_batch_size)
        eval_label_ids = all_label_ids
    model = RelationModel.from_pretrained(output_dir, num_rel_labels=num_labels)
    model.to(device)
    preds, result, logits = evaluate(model, device, eval_dataloader, eval_label_ids, num_labels, e2e_ngold=eval_nrel)

    logger.info('*** Evaluation Results ***')
    for key in sorted(result.keys()):
        logger.info("  %s = %s", key, str(result[key]))

    print_pred_json(eval_dataset, eval_examples, preds, id2label, os.path.join(output_dir, prediction_file))

11/21/2023 15:53:45 - INFO - run_relation - {'SUBJ_START': '[unused1]', 'SUBJ_END': '[unused2]', 'OBJ_START': '[unused3]', 'OBJ_END': '[unused4]', 'SUBJ=Generic': '[unused5]', 'OBJ=OtherScientificTerm': '[unused6]', 'SUBJ_START=Generic': '[unused7]', 'SUBJ_END=Generic': '[unused8]', 'OBJ_START=OtherScientificTerm': '[unused9]', 'OBJ_END=OtherScientificTerm': '[unused10]', 'OBJ=Material': '[unused11]', 'OBJ_START=Material': '[unused12]', 'OBJ_END=Material': '[unused13]', 'SUBJ=OtherScientificTerm': '[unused14]', 'OBJ=Generic': '[unused15]', 'SUBJ_START=OtherScientificTerm': '[unused16]', 'SUBJ_END=OtherScientificTerm': '[unused17]', 'OBJ_START=Generic': '[unused18]', 'OBJ_END=Generic': '[unused19]', 'SUBJ=Material': '[unused20]', 'SUBJ_START=Material': '[unused21]', 'SUBJ_END=Material': '[unused22]', 'OBJ=Task': '[unused23]', 'OBJ_START=Task': '[unused24]', 'OBJ_END=Task': '[unused25]', 'SUBJ=Task': '[unused26]', 'SUBJ_START=Task': '[unused27]', 'SUBJ_END=Task': '[unused28]', 'OBJ=Metho

## Results

**Accuracy**: 0.8558

**Evaluation Loss**: 0.4450

**Precision**: 0.5792\
**Recall**: 0.6684\
**F1 Score**: 0.6206
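
As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean formula:

```python
precision = 0.5792
recall = 0.6684

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6206
```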

Implications:

- The overall accuracy (0.8558) is relatively high, but it also counts candidate entity pairs labeled `no_relation`, so it should be read alongside precision, recall, and F1 rather than as evidence of strong relation extraction on its own.
- The F1 score (0.6206) reflects a reasonable balance between precision and recall, with room for improvement.
- The precision of 57.92% indicates that when the model predicts a relation, it is correct about 57.92% of the time.
- The recall of 66.84% indicates that the model recovers about two thirds of the actual relations.
- Recall is higher than precision, so the model's errors lean toward predicting relations that are not there rather than missing existing ones.
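
To illustrate why accuracy and F1 can diverge this much, here is a minimal sketch on made-up predictions. It assumes that label id 0 is the negative label (as in `label_list` above) and that precision, recall, and F1 are computed over non-negative labels only; `accuracy` and `relation_prf` are hypothetical helpers, and the `evaluate` function used in this notebook may compute its metrics differently.

```python
def accuracy(preds, labels):
    """Fraction of candidate pairs whose predicted label matches the gold label."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def relation_prf(preds, labels, negative_id=0):
    """Precision, recall, and F1 over non-negative labels only (hypothetical helper)."""
    n_pred = sum(p != negative_id for p in preds)
    n_gold = sum(l != negative_id for l in labels)
    n_correct = sum(p == l and l != negative_id for p, l in zip(preds, labels))
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: 8 of 10 candidate pairs have no relation.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
preds = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

acc = accuracy(preds, labels)
precision, recall, f1 = relation_prf(preds, labels)
print(acc, precision, recall, f1)
```

In this toy case the accuracy is 0.8 while precision, recall, and F1 are all 0.5, because the many correctly predicted negative pairs raise accuracy without contributing to the relation metrics.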

In summary, the model identifies entity relations reasonably well, but there is still room for improvement, particularly in precision. Because the evaluation runs on the test set, the results indicate how well the model generalizes to unseen data; note that they are obtained with gold entities (`eval_with_gold = True`) rather than with the entity model's predictions. Further fine-tuning or changes to the model architecture could be considered to improve performance.