# Rubrix Cookbook

Yeah, you heard it right! Not a cheatsheet, but a cookbook. A notebook of recipes. 

In this quick guide, we will show you how easily Rubrix can be used side by side with some of the most popular AI Python libraries. Rubrix is *agnostic*: it can be used with any library or framework, with no need to implement any interface or modify your existing toolbox and workflows. With these few examples you will be able to start logging and exploring your data for any of these libraries at a glance, and maybe pick up some inspiration if your library of choice is not in this list.

If an AI library you use is missing from this list, tell us about it at [our GitHub forum](https://github.com/recognai/rubrix/discussions).

## HuggingFace Transformers

HuggingFace has given the NLP community many useful tools, and with HuggingFace Transformers using them is easier than ever. With a few lines of code we can take a Transformer model from their hub, make some predictions and then log them into Rubrix.

### Text Classification

Let's try a zero-shot classifier using SqueezeBERT for predicting the topic of a sentence.

In [37]:
import rubrix as rb
from transformers import pipeline

# We define our HuggingFace Pipeline
classifier = pipeline(
    "zero-shot-classification",
    model="typeform/squeezebert-mnli",
    framework="pt",
)

# Choosing our input
text_input = "I love watching rock climbing competitions!"

# Making the prediction
prediction = classifier(
    text_input,
    candidate_labels=[
        "politics",
        "sports",
        "technology",
    ],
    hypothesis_template="This text is about {}.",
)

# Creating a record object to log into rubrix.
record = rb.TextClassificationRecord(
    inputs={"text": prediction["sequence"]},
    prediction=list(zip(prediction["labels"], prediction["scores"])),
    prediction_agent="https://huggingface.co/typeform/squeezebert-mnli",
)

# Logging into Rubrix
rb.log(records=record, name="zeroshot-topic-classifier")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


BulkResponse(dataset='zeroshot-topic-classifier', processed=1, failed=0)
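
The `prediction` argument in the cell above is built by pairing each label with its score. Here is a minimal sketch of that step in isolation, using a hand-written dictionary that mimics the shape of the zero-shot pipeline output. The scores are made up for illustration and are not real model output.

```python
# Hand-written stand-in for the zero-shot pipeline output; the scores are
# illustrative assumptions, not real model predictions.
prediction = {
    "sequence": "I love watching rock climbing competitions!",
    "labels": ["sports", "technology", "politics"],
    "scores": [0.9, 0.07, 0.03],
}

# Pair each candidate label with its score, the shape passed as `prediction` above.
label_scores = list(zip(prediction["labels"], prediction["scores"]))

# The pipeline returns labels sorted by descending score, so the first pair is the top one.
top_label, top_score = label_scores[0]

print(label_scores)
print(top_label)
```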

### Token Classification

We will explore a NER classifier for English, a DistilBERT model fine-tuned on CoNLL03.

In [38]:
import rubrix as rb
from transformers import pipeline

# We define our HuggingFace Pipeline
classifier = pipeline(
    "ner",
    model="elastic/distilbert-base-cased-finetuned-conll03-english",
    framework="pt",
)

# Choosing our input
text_input = "My name is Sarah and I live in London"

# Making the prediction
predictions = classifier(
    text_input,
)

# Creating a record object to log into rubrix.
record = rb.TokenClassificationRecord(
    text=text_input,
    tokens=text_input.split(),
    prediction=[(pred["entity"], pred["start"], pred["end"]) for pred in predictions],
    prediction_agent="https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english",
)

# Logging into Rubrix
rb.log(records=record, name="zeroshot-ner")

BulkResponse(dataset='zeroshot-ner', processed=1, failed=0)
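
The `ner` pipeline used above emits one prediction per tagged piece, with `B-`/`I-` prefixed labels, so a multi-word entity arrives as several pieces. Below is a minimal sketch of how such pieces could be merged into single `(label, start, end)` spans before logging. `merge_bio_spans` is a hypothetical helper, not part of Rubrix or Transformers, and the predictions are hand-written to mimic the pipeline's `entity`, `start` and `end` keys.

```python
def merge_bio_spans(predictions):
    """Merge B-/I- tagged pieces into (label, start, end) entity spans."""
    spans = []
    for pred in predictions:
        prefix, _, label = pred["entity"].partition("-")
        # Extend the previous span when this piece is an I- tag with the same label.
        if prefix == "I" and spans and spans[-1][0] == label:
            spans[-1] = (label, spans[-1][1], pred["end"])
        else:
            spans.append((label, pred["start"], pred["end"]))
    return spans


text = "My name is Sarah Connor and I live in London"

# Hand-written stand-in for the pipeline output (not real model predictions).
predictions = [
    {"entity": "B-PER", "start": 11, "end": 16},
    {"entity": "I-PER", "start": 17, "end": 23},
    {"entity": "B-LOC", "start": 38, "end": 44},
]

spans = merge_bio_spans(predictions)
for label, start, end in spans:
    print(label, text[start:end])
```

This sketch assumes every piece carries a `B-` or `I-` prefix and that consecutive pieces of one entity appear in order.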

## spaCy



### Text Classification

### Token Classification

In [39]:
import rubrix as rb
import spacy

input_text = "Paris a un enfant et la forêt a un oiseau ; l’oiseau s’appelle le moineau ; l’enfant s’appelle le gamin"

# Loading spaCy model
nlp = spacy.load("fr_core_news_sm")

# Creating spaCy doc
doc = nlp(input_text)

# Building TokenClassificationRecord
record = rb.TokenClassificationRecord(
    text=input_text,
    tokens=[token.text for token in doc],
    prediction=[(ent.label_, ent.start_char, ent.end_char) for ent in doc.ents],
    prediction_agent="spacy.fr_core_news_sm",
)

# Logging into Rubrix
rb.log(records=record, name="lesmiserables-ner")

BulkResponse(dataset='lesmiserables-ner', processed=1, failed=0)
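
Character spans such as `(ent.label_, ent.start_char, ent.end_char)` are easiest to work with when they start and end on token boundaries. Below is a minimal sanity check in plain Python. It assumes that spans should line up with the provided tokens, which is an assumption and not something stated above. `token_offsets` and `misaligned_spans` are hypothetical helper names.

```python
def token_offsets(text, tokens):
    """Return the (start, end) character offsets of each token in the text."""
    offsets, cursor = [], 0
    for token in tokens:
        # Search from the end of the previous token; raises ValueError if not found.
        start = text.index(token, cursor)
        cursor = start + len(token)
        offsets.append((start, cursor))
    return offsets


def misaligned_spans(text, tokens, spans):
    """Return the spans whose start or end does not fall on a token boundary."""
    offsets = token_offsets(text, tokens)
    starts = {start for start, _ in offsets}
    ends = {end for _, end in offsets}
    return [span for span in spans if span[1] not in starts or span[2] not in ends]


text = "Paris a un enfant"
tokens = ["Paris", "a", "un", "enfant"]

# The second span is deliberately cut short to show what the check catches.
spans = [("LOC", 0, 5), ("LOC", 0, 3)]

bad_spans = misaligned_spans(text, tokens, spans)
print(bad_spans)
```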

## Flair

Developed by Humboldt University of Berlin, Flair is a simple yet powerful state-of-the-art NLP framework. It provides an NLP library, a text embedding library and a PyTorch framework for NLP. Flair offers sequence tagging models in English, Spanish, Dutch, German and many more languages, and they are also hosted on the [HuggingFace Model Hub](https://huggingface.co/models).

### Text Classification

Flair offers some pre-trained models ready to be used, which we will use to introduce logging `TextClassificationRecord` objects with Rubrix. Let's see how to integrate Rubrix with their German offensive language model (we promise not to get very explicit).

In [40]:
import rubrix as rb

from flair.models import TextClassifier
from flair.data import Sentence

# Load the pre-trained German offensive language classifier
classifier = TextClassifier.load('de-offensive-language')

input_text = 'Du erzählst immer Quatsch.'  # roughly: "You always talk nonsense."

# Creating Sentence object
sentence = Sentence(input_text)

# Make the prediction
classifier.predict(sentence, multi_class_prob=True)

# Creating a record object to log into rubrix.
record = rb.TextClassificationRecord(
    inputs={"text": input_text},
    prediction=[(pred.value, pred.score) for pred in sentence.labels],
    prediction_agent="flair/de-offensive-language",
)

# Logging into Rubrix
rb.log(records=record, name="german-offensive-language")

2021-05-29 21:50:30,492 loading file /Users/ignaciotalaveracepeda/.flair/models/germ-eval-2018-task-1-v0.5.pt


BulkResponse(dataset='german-offensive-language', processed=1, failed=0)
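
When several sentences need to be logged, it can help to separate collecting the record arguments from the Flair objects themselves. Below is a minimal sketch under stated assumptions: `Label` is a stand-in that mimics only the `value` and `score` attributes used in the cell above, `record_arguments` is a hypothetical helper, and the label names and scores are made up.

```python
from collections import namedtuple

# Stand-in for the label objects in `sentence.labels`; only the `value` and
# `score` attributes used above are mimicked (an assumption for illustration).
Label = namedtuple("Label", ["value", "score"])


def record_arguments(text, labels, agent):
    """Collect the keyword arguments for one text classification record."""
    return {
        "inputs": {"text": text},
        "prediction": [(label.value, label.score) for label in labels],
        "prediction_agent": agent,
    }


# Made-up labels and scores, not real model output.
labels = [Label("OFFENSE", 0.8), Label("OTHER", 0.2)]

arguments = record_arguments(
    "Du erzählst immer Quatsch.", labels, "flair/de-offensive-language"
)
print(arguments["prediction"])
```

The resulting dictionary uses the same keyword names as the cell above, so it could be unpacked as `rb.TextClassificationRecord(**arguments)`.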

### Token Classification

Flair offers a lot of tools for Token Classification, supporting tasks like named entity recognition (NER) and part-of-speech tagging (POS), with special support for biomedical data and a growing number of supported languages. Let's see some examples for NER and POS tagging.

#### NER

In this example, we will try the pretrained Dutch NER model from Flair.

In [41]:
import rubrix as rb
from flair.data import Sentence
from flair.models import SequenceTagger

input_text = "De Nachtwacht is in het Rijksmuseum"

# Loading our NER model
tagger = SequenceTagger.load('flair/ner-dutch')

# Creating Sentence object
sentence = Sentence(input_text)

# run NER over sentence
tagger.predict(sentence)

# Building TokenClassificationRecord
record = rb.TokenClassificationRecord(
    text=input_text,
    tokens=[token.text for token in sentence],
    prediction=[(entity.get_labels()[0].value, entity.start_pos, entity.end_pos) for entity in sentence.get_spans('ner')],
    prediction_agent="flair/ner-dutch",
)

# Logging into Rubrix
rb.log(records=record, name="dutch-flair-ner")

2021-05-29 21:50:34,951 loading file /Users/ignaciotalaveracepeda/.flair/models/ner-dutch/fd03fc5c7a02268a538432a010f4d09ec15e55fe70efd02dfea158916fa4cba8.04438768e42ba7d6599cea01fcabf77563c8c7e2b27a245618f0ed535ad8919c


BulkResponse(dataset='dutch-flair-ner', processed=1, failed=0)

#### POS tagging

In the following snippet we will use the multilingual POS tagging model from Flair.

In [42]:
import rubrix as rb
from flair.data import Sentence
from flair.models import SequenceTagger

input_text = "George Washington went to Washington. Dort kaufte er einen Hut."

# Loading our POS tagging model
tagger = SequenceTagger.load('flair/upos-multi')

# Creating Sentence object
sentence = Sentence(input_text)

# Run POS tagging over sentence
tagger.predict(sentence)

# Building TokenClassificationRecord
record = rb.TokenClassificationRecord(
    text=input_text,
    tokens=[token.text for token in sentence],
    prediction=[(entity.get_labels()[0].value, entity.start_pos, entity.end_pos) for entity in sentence.get_spans()],
    prediction_agent="flair/upos-multi",
)

# Logging into Rubrix
rb.log(records=record, name="flair-pos-tagging")

2021-05-29 21:50:51,193 loading file /Users/ignaciotalaveracepeda/.flair/models/upos-multi/1a44f168663182024fd3ea6d7dcaeee47fe5bcb537cc737ad058b64ad4db9736.5f899f25846741510a6567b89027d988bd6f634b2776a7c3e834fea4629367cb


BulkResponse(dataset='flair-pos-tagging', processed=1, failed=0)
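
POS tagging assigns a tag to every token, so each token ends up as its own `(tag, start, end)` span. Below is a minimal plain-Python sketch of how such spans can be derived from a token list and a tag list. `pos_spans` is a hypothetical helper, and the tags are assigned by hand rather than taken from the model.

```python
def pos_spans(text, tokens, tags):
    """Build one (tag, start, end) tuple per token, using character offsets."""
    spans, cursor = [], 0
    for token, tag in zip(tokens, tags):
        # Search from the end of the previous token so repeated words map correctly.
        start = text.index(token, cursor)
        cursor = start + len(token)
        spans.append((tag, start, cursor))
    return spans


text = "Dort kaufte er einen Hut."
tokens = ["Dort", "kaufte", "er", "einen", "Hut", "."]
tags = ["ADV", "VERB", "PRON", "DET", "NOUN", "PUNCT"]  # hand-assigned, not model output

spans = pos_spans(text, tokens, tags)
for tag, start, end in spans:
    print(tag, text[start:end])
```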