## Prompt-based Relation Extraction
This notebook tests prompt-based relation extraction. It assumes that the `conspiracies` package is installed
and that we are currently working on Grundtvig.


## Create Pipeline
Because the examples are already coreference-resolved, we do not need a tweet ID. This assumes you have
the `da_core_news_trf` model installed. If not, run the following command in your terminal:

```bash
spacy download da_core_news_trf
```

In [1]:
from data import load_api_key
import spacy

nlp = spacy.load("da_core_news_trf")
config = {
    "prompt_template": "conspiracies/template_1",
    "examples": None,
    "task_description": None,
    "model_name": "text-davinci-002",
    "backend": "conspiracies/openai_gpt3_api",
    "api_key": load_api_key(),
    "split_doc_fn": None,
    "api_kwargs": {
        "max_tokens": 500,
        "temperature": 0.7,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
    },
    "force": True,
}

relation_component = nlp.add_pipe(
    "conspiracies/prompt_relation_extraction", last=True, config=config
)



## Load gold triplets

In [11]:
from data import load_gold_triplets

gold_docs = load_gold_triplets(nlp=nlp)

examples = gold_docs[:5]
gold_docs = gold_docs[5:]

# Uncomment the lines below to print an example and its gold triplets
# print("---")
# print(examples[0])
# for triplet in examples[0]._.relation_triplets:
#     print(" -", triplet)

## Set examples

In [3]:
prompt_template = relation_component.prompt_template
prompt_template.examples = examples
# print an example prompt
# print(prompt_template.create_prompt("This is an example of a target tweet"))
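To make the idea of "setting examples" concrete, the following is a minimal sketch of how a few-shot prompt can be assembled from annotated examples and a target text. It is not the format used by `conspiracies/template_1`, which is not shown in this notebook; the function `build_few_shot_prompt`, the `Tweet:`/`Triplets:` labels, and the `|` separator are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A triplet is (subject, predicate, object). This layout is an assumption
# for illustration, not the conspiracies triplet class.
Triplet = Tuple[str, str, str]


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, List[Triplet]]],
    target: str,
) -> str:
    """Build a hypothetical few-shot prompt from annotated examples."""
    parts = [task_description, ""]
    for text, triplets in examples:
        parts.append(f"Tweet: {text}")
        parts.append("Triplets:")
        for subj, pred, obj in triplets:
            parts.append(f"- {subj} | {pred} | {obj}")
        parts.append("")  # blank line between examples
    # The target is left open so that the model completes the triplets.
    parts.append(f"Tweet: {target}")
    parts.append("Triplets:")
    return "\n".join(parts)


few_shot_prompt = build_few_shot_prompt(
    task_description="Extract (subject, predicate, object) triplets.",
    examples=[("The cat sat on the mat", [("The cat", "sat on", "the mat")])],
    target="This is an example of a target tweet",
)
print(few_shot_prompt)
```

In the actual pipeline, the commented-out `prompt_template.create_prompt(...)` call above is the way to inspect the real prompt.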

## Check that pipeline works

In [4]:
doc = nlp("This is an example of a target tweet")
for triplet in doc._.relation_triplets:
    print(" -", triplet)

 - subject=This predicate=is object=an example of a target tweet


## Run forward pass

In [5]:
docs = nlp.pipe([doc.text for doc in gold_docs])
docs = list(docs)

## Evaluate
Evaluate the model using spaCy `Example` objects.

In [6]:
from spacy.training import Example

examples = [
    Example(predicted=pred_doc, reference=gold_doc)
    for pred_doc, gold_doc in zip(docs, gold_docs)
]

scores = relation_component.score(examples)

In [7]:
print("Scores:")

scores.pop("sample_scores")
scores

Scores:


{'exact_span_match_precision': 0.0,
 'exact_span_match_recall': 0.0,
 'exact_span_match_f1': nan,
 'exact_string_match_precision': 0.22727272727272727,
 'exact_string_match_recall': 0.125,
 'exact_string_match_f1': 0.16129032258064516,
 'normalized_span_overlap_precision': 0.0,
 'normalized_span_overlap_recall': 0.0,
 'normalized_span_overlap_f1': nan,
 'normalized_string_overlap_precision': 0.7161045420687514,
 'normalized_string_overlap_recall': 0.3938574981378133,
 'normalized_string_overlap_f1': 0.5082032234036301,
 'n_predictions': 44,
 'n_references': 80}
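The `nan` values for the span-based F1 scores are what one would expect if F1 is computed as the harmonic mean of precision and recall without special handling: when both are zero, the formula divides zero by zero. The component's scorer is not shown here, so the sketch below is an assumption about how the numbers relate, not its actual implementation; `f1_score` is a hypothetical helper. It reproduces the reported `exact_string_match_f1` from the reported precision and recall.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; nan when both are zero."""
    denominator = precision + recall
    if denominator == 0:
        # 0 / 0 is undefined, which matches the nan in the output above.
        return math.nan
    return 2 * precision * recall / denominator


# Values copied from the score output above.
string_f1 = f1_score(0.22727272727272727, 0.125)
span_f1 = f1_score(0.0, 0.0)

print(string_f1)  # approximately 0.1613
print(span_f1)    # nan
```

The exact string match numbers are also consistent with 10 matching triplets: 10 / 44 predictions gives the precision and 10 / 80 references gives the recall.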

In [8]:
# examine an example
ref = examples[0].reference
pred = examples[0].predicted

In [9]:
assert ref.text == pred.text

for triplet in ref._.relation_triplets:
    triplet.visualize()

In [10]:
for triplet in pred._.relation_triplets:
    triplet.visualize()