Make sure that a folder called "models" exists in the current directory.
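
If the folder is not there yet, it can be created from within the notebook. This is a minimal sketch using only the standard library; it assumes the notebook runs from the project root, so that `models` resolves next to `dataset`.

```python
from pathlib import Path

# Create ./models relative to the current working directory.
# exist_ok=True makes this a no-op when the folder already exists.
models_dir = Path("models")
models_dir.mkdir(parents=True, exist_ok=True)
```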

<hr>

# Method A: Create pipeline and save as a spaCy model

In [1]:
import spacy
from phrase_extraction import *

# Create and save model with phrase_spans component - only needs to run once
nlp = spacy.load('en_core_web_trf')
nlp.add_pipe('merge_noun_chunks')
nlp.add_pipe('merge_entities')
nlp.add_pipe('phrase_spans')
nlp.to_disk('./models/method_a')

# Afterwards, load the saved pipeline directly instead of rebuilding it
# (phrase_extraction must still be imported so that 'phrase_spans' is registered)
# nlp = spacy.load('models/method_a')




<hr>

# Method B: Train and save model

## Option 1: Use spaCy

If you do not have Prodigy installed, then you can use our pre-annotated spaCy corpus in the `./dataset/` directory:

In [None]:
!python -m spacy train ./dataset/corpus_b/config.cfg --output ./models/method_b --paths.train ./dataset/corpus_b/train.spacy --paths.dev ./dataset/corpus_b/dev.spacy
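
The same call can also be issued from Python rather than through the `!` shell escape. The sketch below only assembles the argument list from the paths used above; the helper name `build_train_command` is hypothetical and not part of spaCy. Using `sys.executable` is a design choice so that training runs with the same interpreter (and installed packages) as the notebook kernel.

```python
import subprocess
import sys
from typing import List


def build_train_command(corpus_dir: str, output_dir: str) -> List[str]:
    """Assemble the `spacy train` call shown above as an argument list."""
    return [
        sys.executable, "-m", "spacy", "train", f"{corpus_dir}/config.cfg",
        "--output", output_dir,
        "--paths.train", f"{corpus_dir}/train.spacy",
        "--paths.dev", f"{corpus_dir}/dev.spacy",
    ]


cmd = build_train_command("./dataset/corpus_b", "./models/method_b")
print(" ".join(cmd))

# Uncomment to actually start training (requires spaCy and the corpus files):
# subprocess.run(cmd, check=True)
```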

## Option 2: Use Prodigy

If you have a Prodigy license, you can train directly with Prodigy.

Training directly with Prodigy requires annotations stored in the internal Prodigy database. You can either annotate your dataset manually (Step 1.A) or import our pre-annotated dataset from the `./dataset/` directory (Step 1.B).

### Step 1.A: Annotate using Prodigy manually:

Run the commands below to start a Prodigy web server and annotate the spans; the annotations are saved in the specified database (see [Prodigy docs](https://prodi.gy/docs/span-categorization#manual)):

In [None]:
!prodigy spans.manual gold_standard blank:en ./dataset/gold_standard.jsonl --label SUBJECT,SIGNAL,VERB,TIME,CONDITION,OBJECT,OP_SUBJECT,OP_SIGNAL,OP_VERB,OP_TIME,OP_CONDITION,OP_OBJECT

In [None]:
!prodigy spans.manual training_data blank:en ./dataset/training_data.jsonl --label SUBJECT,SIGNAL,VERB,TIME,CONDITION,OBJECT,OP_SUBJECT,OP_SIGNAL,OP_VERB,OP_TIME,OP_CONDITION,OP_OBJECT
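
Both commands pass the same twelve labels: six base labels followed by the same six with an `OP_` prefix. To avoid typos when retyping the list, the `--label` value could be generated as in this small sketch (the variable names are illustrative only):

```python
# Six base labels, in the order used by the commands above.
BASE_LABELS = ["SUBJECT", "SIGNAL", "VERB", "TIME", "CONDITION", "OBJECT"]

# Append the OP_-prefixed counterpart of every base label.
labels = BASE_LABELS + [f"OP_{name}" for name in BASE_LABELS]

# Comma-separated string suitable for the --label option.
label_arg = ",".join(labels)
print(label_arg)
```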

### Step 1.B: Import the pre-annotated dataset into Prodigy:

I have already annotated 3 datasets under `./dataset/` (`annotated_gold_standard.jsonl`, `annotated_reach_data.jsonl`, `annotated_training_data.jsonl`). You can simply load these files into the internal Prodigy database:

In [None]:
!prodigy db-in gold_standard ./dataset/annotated_gold_standard.jsonl

In [None]:
!prodigy db-in training_data ./dataset/annotated_training_data.jsonl

In [None]:
!prodigy db-in reach_data ./dataset/annotated_reach_data.jsonl
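
Before importing, it can be useful to check that a file parses and to see which labels it contains. The sketch below assumes the usual Prodigy span format, one JSON object per line with a `text` field and a `spans` list whose entries carry a `label`; the helper `summarize_annotations` and the demo record are made up for illustration.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_annotations(path):
    """Return (number of records, Counter of span labels) for a JSONL file."""
    label_counts = Counter()
    n_records = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            if "text" not in record:
                raise ValueError(f"{path}:{line_no}: record has no 'text' field")
            for span in record.get("spans", []):
                label_counts[span["label"]] += 1
            n_records += 1
    return n_records, label_counts


# Self-contained demo on a made-up record (hypothetical text and offsets).
sample = {
    "text": "The pump shall stop.",
    "spans": [
        {"start": 0, "end": 8, "label": "SUBJECT"},
        {"start": 9, "end": 14, "label": "SIGNAL"},
        {"start": 15, "end": 19, "label": "VERB"},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.jsonl"
    demo_path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    n_records, label_counts = summarize_annotations(demo_path)

print(n_records, dict(label_counts))
```

To inspect one of the real files, call for example `summarize_annotations("./dataset/annotated_training_data.jsonl")`.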

### Step 2: Train a model from the database

In [None]:
!prodigy train ./models/method_b --spancat training_data,eval:gold_standard

### (Optional) Export Prodigy database to spaCy training corpus

In [None]:
!prodigy data-to-spacy ./dataset/corpus_b --spancat training_data,eval:gold_standard

<hr>

# Method C: Train and save model

For Method C, since we use a custom training configuration (`config.cfg`), we must train the model with spaCy. We also need to pass the path to the Python script containing our custom suggester function via `--code`.

In [None]:
!python -m spacy train ./dataset/corpus_c/config.cfg --output ./models/method_c --paths.train ./dataset/corpus_c/train.spacy --paths.dev ./dataset/corpus_c/dev.spacy --code span_suggester.py
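
Because this command depends on four files, a quick check before starting a long training run can save time. This is only a sketch: the helper `missing_training_inputs` is hypothetical, and it assumes the corpus directory layout used in the command above.

```python
from pathlib import Path


def missing_training_inputs(corpus_dir, code_path=None):
    """Return the required input files that do not exist, as strings."""
    required = [
        Path(corpus_dir) / name
        for name in ("config.cfg", "train.spacy", "dev.spacy")
    ]
    if code_path is not None:
        # The script passed via --code must be present as well.
        required.append(Path(code_path))
    return [str(p) for p in required if not p.is_file()]


missing = missing_training_inputs("./dataset/corpus_c", code_path="span_suggester.py")
if missing:
    print("Missing before training:", ", ".join(missing))
else:
    print("All training inputs found.")
```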