# Registering a Hugging Face (roBERTa Sentiment Classification) model on Verta

Within Verta, a "Model" can be any arbitrary function: a traditional ML model (e.g., sklearn, PyTorch, TF); a function (e.g., squaring a number, making a DB call); or a mixture of the above (e.g., pre-processing code, a DB call, and then a model application). See more [here](https://docs.verta.ai/verta/registry/concepts).

This notebook provides an example of how to catalog a Hugging Face model on Verta as a Verta Standard Model by extending [VertaModelBase](https://verta.readthedocs.io/en/master/_autogen/verta.registry.VertaModelBase.html?highlight=VertaModelBase#verta.registry.VertaModelBase).

Updated for Verta version: 0.21.0

This notebook walks through initializing a roBERTa sentiment classification model through Hugging Face and cataloging it on the Verta platform.

<a href="https://colab.research.google.com/github/VertaAI/examples/blob/registry_examples/registry/huggingface/roberta-tweet-sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 0. Imports

In [None]:
!python -m pip install verta
!python -m pip install transformers

In [None]:
from transformers import (
    pipeline,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

## 1. Register model

### 1.1 Define Model

A model has to exist before we can register it, so we will instantiate one here in the notebook.

This model class will be an extensible wrapper over a [pre-trained Hugging Face classifier](https://huggingface.co/transformers/index.html).

**Model Info**

This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021, and finetuned for sentiment analysis with the TweetEval benchmark. The original Twitter-based RoBERTa model can be found here and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.

Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829).
Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).

Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive
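
The label mapping above can be written out explicitly in code. The following is a minimal sketch that is not part of the original notebook: `ID2LABEL` and `top_label` are hypothetical names, and the scores are made-up values for illustration rather than real model output.

```python
# Hypothetical helper: map the model's class indices to readable labels.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def top_label(scores):
    """Return the label whose score is highest.

    `scores` is a sequence of three numbers ordered by class index.
    """
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best_index]


# Illustrative scores only, not real model output.
print(top_label([0.05, 0.15, 0.80]))  # Positive
```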

In [None]:
from verta.registry import VertaModelBase, verify_io
from verta.utils import ModelAPI

class roBERTa(VertaModelBase):
    MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

    def __init__(self, artifacts=None):
        self.model = pipeline(
            task="sentiment-analysis",
            model=AutoModelForSequenceClassification.from_pretrained(self.MODEL),
            tokenizer=AutoTokenizer.from_pretrained(self.MODEL),
        )

    @verify_io
    def predict(self, texts):
        return self.model(texts)

    def example(self):
        return [
            "I like you",
            "I don't like this film",
        ]

As a sanity check, we can validate that our model is instantiable and can produce predictions.

In [None]:
roberta = roBERTa()

texts = roberta.example()
prediction = roberta.predict(texts)

print(texts)
print(prediction)
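
The sanity check can be made a little stricter than printing the output. The sketch below assumes the pipeline returns one dictionary per input text with `label` and `score` keys, and that predictions should be JSON-serializable so they can be returned from a deployed endpoint. `check_predictions` is a hypothetical helper, and the sample predictions are hand-written stand-ins for real model output.

```python
import json


def check_predictions(texts, predictions):
    """Raise ValueError or TypeError if the predictions look malformed."""
    if len(predictions) != len(texts):
        raise ValueError("expected exactly one prediction per input text")
    for item in predictions:
        if not {"label", "score"} <= set(item):
            raise ValueError(f"missing 'label' or 'score' in {item}")
        if not 0.0 <= item["score"] <= 1.0:
            raise ValueError(f"score out of range in {item}")
    json.dumps(predictions)  # raises TypeError if not JSON-serializable
    return True


# Hand-written stand-ins for roberta.example() and roberta.predict(...).
sample_texts = ["I like you", "I don't like this film"]
sample_predictions = [
    {"label": "positive", "score": 0.93},
    {"label": "negative", "score": 0.88},
]
print(check_predictions(sample_texts, sample_predictions))  # True
```

In the notebook itself, the same helper could be called as `check_predictions(texts, prediction)`.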

### 1.2 Register Model to Verta Model Catalog

Now that the model is in good shape, we can register it on the Verta platform.

We'll connect to Verta through the [Verta Python Client](https://verta.readthedocs.io/en/main/_autogen/verta.Client.html),
then create a [registered model](https://verta.readthedocs.io/en/master/_autogen/verta.registry.entities.RegisteredModel.html) for our roBERTa text classification model
and a [version](https://verta.readthedocs.io/en/master/_autogen/verta.registry.entities.RegisteredModelVersion.html) to associate this particular model with.

All of these can be viewed in the Verta web app once they are created.

In [None]:
# Paste your credentials into this cell (or any cell above it) to connect to the Verta platform

from verta import Client

client = Client(
    # host="app.verta.ai",
    # email="user@verta.ai",
    # dev_key="a765b2de-786d-466c-b2d8-thiye06f80d5",
)
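
As an alternative to pasting credentials into the notebook, they can be supplied through environment variables. The sketch below assumes the client falls back to `VERTA_HOST`, `VERTA_EMAIL`, and `VERTA_DEV_KEY` when the corresponding arguments are omitted; confirm those names against the Verta client documentation for your version. `missing_verta_env` is a hypothetical helper that only reports which variables are unset.

```python
import os

# Assumed variable names; confirm against the Verta client documentation.
REQUIRED_VARS = ("VERTA_HOST", "VERTA_EMAIL", "VERTA_DEV_KEY")


def missing_verta_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_verta_env()
if missing:
    print("Set these before calling Client():", ", ".join(missing))
else:
    print("All Verta credentials found in the environment.")
```

Keeping the developer key out of the notebook also avoids committing it to version control by accident.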

In [None]:
# Create/Get a Verta registered model

from verta.registry import data_type, task_type

registered_model = client.get_or_create_registered_model(
    name="Twitter-roBERTa-base-example", # Name to identify on the catalog
    desc="Model for Sentiment Analysis on Tweets", # Small description to show on Model Card
    data_type=data_type.Text(), # Data Type of the Model
    task_type=task_type.Classification(), # Task Type of the Model
    labels=["NLP", "Neural Net"], # tags/labels to filter, search and categorize
)

In [None]:
from verta.environment import Python

roberta_model_version = registered_model.create_standard_model(
    model_cls=roBERTa,  # Model class defined over VertaModelBase
    name="v1", # Name to identify the version in the model versions tab
    environment=Python([
        "dill",  # used by torch internally for serialization
        "torch",
        "transformers",
    ]),
    labels=["std_model_verta_model_base"], # tags/labels to filter, search and categorize
    model_api=ModelAPI(texts, prediction), # (Optional) To Populate the Model API
)
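
The environment above lists bare package names. If you want a record of the exact versions used when the model was registered, one option is to build pinned requirement strings from the locally installed packages, as in the sketch below. `pin_requirements` is a hypothetical helper, it assumes that `Python([...])` accepts pip-style `name==version` strings, and Verta may already resolve versions on its own, so treat this as optional.

```python
from importlib import metadata


def pin_requirements(names):
    """Return 'name==version' for installed packages, bare names otherwise."""
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pinned.append(name)  # leave unpinned if not installed locally
    return pinned


# Output depends on what is installed in the current environment.
print(pin_requirements(["dill", "torch", "transformers"]))
```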

And that's it! You should now be able to see your model in the Model Catalog.