# Wrapping a fine-tuned Transformer

---
This tutorial briefly shows how to wrap an already trained transformer for text classification in a spaCy pipeline. We will use DaNLP's BertTone as the example, but the same approach also works for models taken directly from Hugging Face's model hub.

Before we start, let us make sure everything is installed:


In [None]:
!pip install dacy[all]

## Wrapping a Hugging Face model
---
Wrapping a Hugging Face model can be done with a single function call:


In [3]:
import spacy
import dacy

from dacy.subclasses import add_huggingface_model

nlp = spacy.blank("da")  # replace with your desired pipeline
nlp = add_huggingface_model(
    nlp,
    download_name="pin/senda",  # the model name on the huggingface hub
    doc_extension="senda_trf_data",  # the doc extension for transformer data, e.g. including wordpieces
    model_name="senda",  # the name of the model in the pipeline
    category="polarity",  # the category type it predicts
    labels=["negative", "neutral", "positive"],  # possible outcome labels
)



## Step by step
---
However, you might want to look a bit more closely at what happens, so let us go through it step by step. We start with the setup, using a Hugging Face transformer from DaNLP.

### Setup
Let's start by downloading the model and making sure that it can be loaded using Hugging Face transformers:

In [None]:
from transformers import AutoModelForSequenceClassification

# load and download the model
name = "DaNLP/da-bert-tone-sentiment-polarity"
berttone = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

Model bert.polarity exists in /Users/au554730/.danlp/bert.polarity


Assuming this works, you are ready to move on. However, if you were to do this with your own model, I would also test that the forward pass works as intended.
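Such a forward-pass check could be sketched as follows. This is a minimal sketch, assuming that a matching tokenizer is published alongside the model and can be loaded with `AutoTokenizer`; `check_forward_pass` is a hypothetical helper name and is not part of DaCy or transformers.

```python
def check_forward_pass(model, tokenizer, text, num_labels):
    """Run one text through the model and check the shape of the logits.

    `model` is assumed to be a Hugging Face sequence classification model
    and `tokenizer` its matching tokenizer.
    """
    import torch  # imported lazily so that defining the helper stays cheap

    model.eval()  # make sure dropout is disabled
    inputs = tokenizer(text, return_tensors="pt")  # PyTorch tensors, batch of one
    with torch.no_grad():  # no gradients are needed for a sanity check
        logits = model(**inputs).logits
    # expect one row per text and one column per label
    assert tuple(logits.shape) == (1, num_labels), logits.shape
    return logits


# Example usage (downloads the tokenizer, so it is left commented out):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(name)
# print(check_forward_pass(berttone, tokenizer, "Det er en god dag", num_labels=3))
```

If the assertion fails, the number of labels passed when loading the model most likely does not match the classification head of the checkpoint.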

## Wrapping the Model

---

Now we will start wrapping the model. DaCy provides utility functions for doing precisely this without making more changes than necessary to spaCy's transformer class. This should allow you to rely on spaCy's extensive documentation while working with this code.

This uses spaCy's config system, which might take a bit of getting used to, but it is well worth the effort. For now I will walk you through it:


In [8]:
from dacy.subclasses import ClassificationTransformer, install_classification_extensions

labels = ["positive", "neutral", "negative"]
doc_extension = "berttone_pol_trf_data"
category = "polarity"

config = {
    "doc_extension_attribute": doc_extension,
    "model": {
        "@architectures": "dacy.ClassificationTransformerModel.v1",
        "name": name,
        "num_labels": len(labels),
    },
}


# add the relevant extensions to the doc
install_classification_extensions(
    category=category, labels=labels, doc_extension=doc_extension, force=True
)

The config is an extension of the `Transformer` config in spaCy, but you will note a few changes:

1) `doc_extension_attribute`: This makes sure that the doc extension can be customized. The doc extension is how you fetch data relevant to your model. The spaCy transformer uses `trf_data`, but we don't want to overwrite this in case we are using multiple transformers.
2) `num_labels`: The number of labels. This argument is passed on when loading the model. Without it, the Hugging Face transformers package will raise an error (at least for cases where `num_labels` isn't 2).

`name` simply specifies the name of the model. You could swap it for any sequence classification model on Hugging Face's model hub. Lastly, `install_classification_extensions` adds the getter functions for the model. Here it adds `doc._.polarity` for extracting the predicted label, as well as `doc._.polarity_prop` for extracting the probability of each class.
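Because the config is a plain dictionary, it is easy to build one per model. The sketch below uses a hypothetical helper, `make_classification_config` (not part of DaCy), to show how two classifiers could each get their own doc extension while `num_labels` is derived from the label list. The model names, labels and extension names are the ones used elsewhere in this tutorial.

```python
def make_classification_config(name, labels, doc_extension):
    """Build a config dict shaped like the one used in this tutorial."""
    return {
        "doc_extension_attribute": doc_extension,
        "model": {
            "@architectures": "dacy.ClassificationTransformerModel.v1",
            "name": name,  # model name on the hub
            "num_labels": len(labels),  # kept in sync with the label list
        },
    }


polarity_config = make_classification_config(
    name="DaNLP/da-bert-tone-sentiment-polarity",
    labels=["positive", "neutral", "negative"],
    doc_extension="berttone_pol_trf_data",
)
senda_config = make_classification_config(
    name="pin/senda",
    labels=["negative", "neutral", "positive"],
    doc_extension="senda_trf_data",
)

# each model writes its transformer data to a separate doc extension
print(polarity_config["doc_extension_attribute"])
print(senda_config["doc_extension_attribute"])
```

Deriving `num_labels` from the label list avoids the two getting out of sync when you swap in a model with a different number of classes.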


## Adding it to the NLP pipeline
---

Now it can simply be added to the pipeline using `add_pipe`:

In [9]:
import spacy

nlp = spacy.blank("da")  # dummy nlp

clf_transformer = nlp.add_pipe(
    "classification_transformer", name="berttone", config=config
)
clf_transformer.model.initialize()

<thinc.model.Model at 0x7fa076a86f40>

## Final Test

--- 

We can then finish off with a final test to see if everything works as intended:

In [10]:
texts = [
    "Analysen viser, at økonomien bliver forfærdelig dårlig",
    "Jeg tror alligvel, det bliver godt",
]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.polarity)
    print(doc._.polarity_prop)

negative
{'prop': array([0.002, 0.008, 0.99 ], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}
positive
{'prop': array([0.854, 0.146, 0.001], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}
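In the printed dictionaries, the probabilities and labels pair up by position: the first probability belongs to the first label, and so on. As a small sketch, the values for the second text are copied below into a plain list (the real `prop` value is a NumPy array) and turned into a label-to-probability mapping:

```python
# values copied from the output above; in practice they come from doc._.polarity_prop
polarity_prop = {
    "prop": [0.854, 0.146, 0.001],
    "labels": ["positive", "neutral", "negative"],
}

# pair each label with its probability
prob_by_label = dict(zip(polarity_prop["labels"], polarity_prop["prop"]))
# pick the label with the highest probability
predicted = max(prob_by_label, key=prob_by_label.get)

print(prob_by_label)
print(predicted)
```

The label with the highest probability here is the same one that `doc._.polarity` printed for this text.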


In [11]:
# we can also examine the wordpieces used (and see the entire TransformerData)

doc._.berttone_pol_trf_data

TransformerData(wordpieces=WordpieceBatch(strings=[['[CLS]', 'jeg', 'tror', 'alli', '##g', '##vel', ',', 'det', 'bliver', 'godt', '[SEP]']], input_ids=array([[    2,   102,  1421,  9682, 31704,  1041,   883,    49,   352,
          380,     3]]), attention_mask=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), lengths=[11], token_type_ids=array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])), tensors=[array([[ 3.3191876,  1.5510517, -4.0984917]], dtype=float32)], align=Ragged(data=array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]], dtype=int32), lengths=array([1, 1, 3, 1, 1, 1, 1], dtype=int32), data_shape=(-1,), cumsums=None))
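The `tensors` field in this output holds three raw scores, one per label. Applying a softmax to them reproduces the probabilities printed earlier for the same text, which suggests that `polarity_prop` is a softmax over these logits. That reading is an assumption based on the numbers shown here, not on DaCy's source code. A standard-library sketch:

```python
import math

# logits copied from the TransformerData output above
logits = [3.3191876, 1.5510517, -4.0984917]
labels = ["positive", "neutral", "negative"]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.3f}")
# positive: 0.854
# neutral: 0.146
# negative: 0.001
```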