# Relation Extraction Model

This notebook performs relation extraction on the annotated psychedelic research papers. It takes as input binary spaCy files in which the individual labels are defined. These files were converted from JSON into the required spaCy format with spacey_file_converter.py before running the relation extraction model.

To run the notebook, first enable the GPU by clicking 'Edit' ==> 'Notebook settings' and changing the hardware to 'GPU'.
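
Before installing anything, it can be worth confirming that the runtime actually exposes a GPU. The cell below is a minimal sketch that assumes an NVIDIA runtime where the `nvidia-smi` tool is on the PATH; the helper name `gpu_visible` is made up for this illustration.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not found: most likely no GPU runtime is attached.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False`, the later GPU training step will probably fail or fall back, so it is cheaper to fix the notebook settings first.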

## Install Dependencies

In [None]:
!pip install -U spacy-nightly --pre

In [None]:
!pip install -U pip setuptools wheel

### Clone the Repository

We're using spaCy's 'project' feature, which lets you clone a repository for a specific task and use it as a template for your own model. This repository contains the folder structure necessary for conducting relation extraction.

In [None]:
!python -m spacy project clone tutorials/rel_component

✔ Cloned 'tutorials/rel_component' from explosion/projects
/content/rel_component
✔ Your project is now ready!
To fetch the assets, run:
python -m spacy project assets /content/rel_component


### Install transformer pipeline

In [None]:
!python -m spacy download en_core_web_trf
!pip install -U spacy transformers

Collecting en-core-web-trf==3.2.0
  Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.2.0/en_core_web_trf-3.2.0-py3-none-any.whl (460.2 MB)
Collecting transformers<4.13.0,>=3.4.0
  Using cached transformers-4.12.5-py3-none-any.whl (3.1 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.15.0
    Uninstalling transformers-4.15.0:
      Successfully uninstalled transformers-4.15.0
Successfully installed transformers-4.12.5
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')
Collecting transformers
  Using cached transformers-4.15.0-py3-none-any.whl (3.4 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.12.5
    Uninstalling transformers-4.12.5:
      Successfully uninstalled transformers-4.12.5
ERROR: pip's dependency r

### Change the working directory

In [None]:
cd rel_component

/content/rel_component
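
Training fails late if a config or data file is missing, so a quick check from inside the project directory can save time. The sketch below only tests for the paths that appear in the training commands logged in this notebook; the helper `missing_files` is a hypothetical name, and the list may not cover everything the project needs (for example, test data).

```python
from pathlib import Path

# Paths taken from the `spacy train` commands shown in this notebook's output.
EXPECTED_FILES = [
    "configs/rel_trf.cfg",
    "configs/rel_tok2vec.cfg",
    "scripts/custom_functions.py",
    "data/train.spacy",
    "data/dev.spacy",
]


def missing_files(project_dir="."):
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files()
print("Missing files:", missing if missing else "none")
```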


## Train the SciBERT model

We train the model on the GPU using the 'train_gpu' command defined in the cloned spaCy project. Before running it, the train, test, and dev data must be added to the expected paths and the parameters must be set. We've amended some of the parameters so that the model runs with SciBERT and the maximum number of tokens allowed between two entities is set to 50.
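
To illustrate what the maximum-length setting does, here is a minimal pure-Python sketch of distance-based candidate filtering. It assumes that candidate relation instances are ordered pairs of distinct entities whose start tokens lie at most `max_length` tokens apart; the project's own implementation lives in its scripts and may differ in detail. The tuples stand in for spaCy spans, and the entity names and positions are invented for the example.

```python
from itertools import permutations
from typing import List, Tuple

# (entity text, index of its first token): a hypothetical stand-in for a spaCy Span
Entity = Tuple[str, int]


def candidate_pairs(entities: List[Entity], max_length: int = 50) -> List[Tuple[Entity, Entity]]:
    """Return ordered pairs of distinct entities whose start tokens are at most max_length apart."""
    return [
        (e1, e2)
        for e1, e2 in permutations(entities, 2)
        if abs(e2[1] - e1[1]) <= max_length
    ]


# Invented example entities, not taken from the annotated data.
entities = [("psilocybin", 3), ("depression", 20), ("anxiety", 90)]
for head, tail in candidate_pairs(entities, max_length=50):
    print(head[0], "->", tail[0])
```

A smaller limit gives the classifier fewer candidate pairs to score, at the cost of never proposing relations between entities that are far apart in the text.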

In [None]:
!spacy project run train_gpu

Running command: /usr/bin/python3 -m spacy train configs/rel_trf.cfg --output training --paths.train data/train.spacy --paths.dev data/dev.spacy -c ./scripts/custom_functions.py --gpu-id 0
ℹ Saving to output directory: training
ℹ Using GPU: 0
[2022-01-03 05:33:15,496] [INFO] Set up nlp object from config
[2022-01-03 05:33:15,507] [INFO] Pipeline: ['transformer', 'relation_extractor']
[2022-01-03 05:33:15,512] [INFO] Created vocabulary
[2022-01-03 05:33:15,514] [INFO] Finished initializing nlp object
Downloading: 100% 385/385 [00:00<00:00, 374kB/s]
Downloading: 100% 223k/223k [00:00<00:00, 700kB/s]
Downloading: 100% 422M/422M [00:10<00:00, 42.2MB/s]
Some weights of the model checkpoint at allenai/scibert_scivocab_uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.tran

## Evaluate the Model 

In [None]:
cd /content/drive/MyDrive/rel_component

/content/drive/MyDrive/rel_component


In [None]:
!spacy project run evaluate

## Compare with a non-transformer tok2vec model 

Our trained model performs disappointingly on the relation extraction task. To see how much difference the transformer architecture makes, we also train a tok2vec model and compare its performance.

In [None]:
!spacy project run train_cpu 
!spacy project run evaluate

Running command: /usr/bin/python3 -m spacy train configs/rel_tok2vec.cfg --output training --paths.train data/train.spacy --paths.dev data/dev.spacy -c ./scripts/custom_functions.py
ℹ Saving to output directory: training
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0
[2022-01-03 06:04:44,372] [INFO] Set up nlp object from config
[2022-01-03 06:04:44,384] [INFO] Pipeline: ['tok2vec', 'relation_extractor']
[2022-01-03 06:04:44,390] [INFO] Created vocabulary
[2022-01-03 06:04:44,392] [INFO] Finished initializing nlp object
[2022-01-03 06:04:46,863] [INFO] Initialized pipeline components: ['tok2vec', 'relation_extractor']
✔ Initialized pipeline
ℹ Pipeline: ['tok2vec', 'relation_extractor']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS RELAT...  REL_MICRO_P  REL_MICRO_R  REL_MICRO_F  SCORE 
---  ------  ------------  -------------  -----------  -----------  -----------  -

The tok2vec model is considerably worse than the transformer model: it classifies relations with an accuracy of only around 18%. This tells us that we need to rethink some of our relationships, and perhaps simplify the task from classifying between 5 relationships to only 2 or 3. These amendments will be considered in the evaluation of the pipeline.
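
The training log reports REL_MICRO_P, REL_MICRO_R and REL_MICRO_F, which suggests micro-averaged precision, recall and F-score over all relation labels. As a reminder of how such scores relate to each other, here is a minimal sketch that assumes gold and predicted relations are represented as sets of (head, tail, label) triples. This is not the project's evaluation code, and the entity and label names are placeholders.

```python
from typing import Set, Tuple

# (head entity, tail entity, relation label): a hypothetical representation
Relation = Tuple[str, str, str]


def micro_prf(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all relation triples."""
    tp = len(gold & predicted)   # predicted and correct
    fp = len(predicted - gold)   # predicted but not in gold
    fn = len(gold - predicted)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Placeholder triples, not taken from the annotated data.
gold = {("e1", "e2", "REL_A"), ("e2", "e3", "REL_B"), ("e1", "e3", "REL_A")}
predicted = {("e1", "e2", "REL_A"), ("e2", "e3", "REL_A")}
p, r, f = micro_prf(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Because every triple counts equally, merging rarely used or easily confused labels into fewer classes changes these scores directly, which is one way to test the simplification proposed above.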