# Weak supervision with Rubrix, FlyingSquid and Epoxy

This tutorial walks you through using Rubrix to improve weak supervision and data programming workflows with the [Epoxy library](https://github.com/HazyResearch/epoxy), an extension to [FlyingSquid](https://github.com/HazyResearch/flyingsquid).

- Using Rubrix, we define heuristic rules for the [SST-2](https://nlp.stanford.edu/sentiment/index.html) dataset.
- We produce weak labels from our rules with FlyingSquid, and use them to train a sentiment classification model.
- We expand the weak labels produced by FlyingSquid with Epoxy, and compare them with our previous results.

# Introduction

Our goal is to show you how to incorporate Rubrix into data programming workflows to programmatically build training data with a human-in-the-loop approach. We will use the FlyingSquid and Epoxy libraries.

## What are weak supervision, FlyingSquid, and Epoxy?

Weak supervision is a branch of machine learning that trades label quality for labelling efficiency: instead of annotating every record by hand, we obtain noisier labels at scale from imperfect sources such as heuristic rules. FlyingSquid is a framework for automatically building label models that combine the votes of multiple user-defined labelling functions.
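
As a rough illustration of the idea, the sketch below defines three keyword-based labelling functions and combines their votes by simple majority. Majority voting is only a stand-in here: it is not how FlyingSquid fits its label model, and the function names and example reviews are made up for this sketch.

```python
from collections import Counter

ABSTAIN = None  # a labelling function abstains when it has no opinion


def lf_contains_funny(text):
    """Vote 'positive' if the review mentions 'funny', else abstain."""
    return "positive" if "funny" in text.lower().split() else ABSTAIN


def lf_contains_dull(text):
    """Vote 'negative' if the review mentions 'dull', else abstain."""
    return "negative" if "dull" in text.lower().split() else ABSTAIN


def lf_contains_worst(text):
    """Vote 'negative' if the review mentions 'worst', else abstain."""
    return "negative" if "worst" in text.lower().split() else ABSTAIN


LABELLING_FUNCTIONS = [lf_contains_funny, lf_contains_dull, lf_contains_worst]


def majority_vote(text):
    """Combine the non-abstaining votes; return None on no votes or a tie."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# made-up reviews, for illustration only
reviews = [
    "a funny and touching film",
    "dull , the worst kind of sequel",
    "funny at times but mostly dull",
    "an ordinary thriller",
]
weak_labels = [majority_vote(r) for r in reviews]
print(weak_labels)
```

The third review receives conflicting votes and the fourth receives none, so both stay unlabelled. A label model aims to resolve such cases better than a plain vote can.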

Epoxy is a library that turns the labelling functions used with FlyingSquid into *extended labelling functions* through nearest-neighbor search with pre-trained word embeddings. This allows us to expand the coverage of our heuristic rules beyond the records that they match directly.
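
The nearest-neighbor idea can be sketched in a few lines of plain Python. This is a simplified illustration, not Epoxy's actual code: `extend_votes`, the cosine-similarity threshold, and the 2-d toy embeddings are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def extend_votes(votes, embeddings, threshold):
    """Extend one labelling function's votes to similar abstained records.

    votes: list of labels, with None where the function abstained.
    embeddings: one vector per record, in the same order as votes.
    threshold: minimum cosine similarity needed to copy a neighbour's vote.
    """
    labelled = [i for i, v in enumerate(votes) if v is not None]
    extended = list(votes)
    for i, vote in enumerate(votes):
        if vote is not None or not labelled:
            continue
        # nearest record that the labelling function actually voted on
        nearest = max(labelled, key=lambda j: cosine(embeddings[i], embeddings[j]))
        if cosine(embeddings[i], embeddings[nearest]) >= threshold:
            extended[i] = votes[nearest]
    return extended


# toy 2-d "embeddings"; real ones would come from a pre-trained model
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
votes = ["positive", None, "negative", None, None]

extended_votes = extend_votes(votes, embeddings, threshold=0.9)
print(extended_votes)
```

Records 1 and 3 sit close to a labelled neighbour and inherit its vote, while the last record is too far from both and remains an abstention. The threshold therefore controls the trade-off between extra coverage and extra noise.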

## This tutorial

In this tutorial, we will show you how to extend weak supervision workflows in FlyingSquid and Epoxy with Rubrix.

# Setup 

Rubrix is a free and open-source tool to explore, annotate, and monitor data for NLP projects.

If you are new to Rubrix, check out the ⭐ GitHub repository.

If you have not installed and launched Rubrix yet, check the Setup and Installation guide.

For this tutorial, we also need some third-party libraries that can be installed via pip:

In [None]:
%pip install sentence_transformers datasets

# 1. Log the dataset into Rubrix

Rubrix allows you to log and track data for different NLP tasks (such as `Token Classification` or `Text Classification`).

In this tutorial, we will use the SST-2 dataset, a standard benchmark for sentiment analysis. SST-2 is made of movie reviews that must be classified as either positive or negative.


## The dataset

We will use FlyingSquid's data programming methods to annotate our training set, with the help of Rubrix for analyzing and reviewing the data. We will then train a model on this training set.

Although the gold labels for the training set of SST-2 are already known, we will purposefully ignore them, as our goal in this tutorial is to build our own annotations and see how well they perform on the development set.

In [1]:
from datasets import load_dataset

train = load_dataset("glue", "sst2", split="train")
dev = load_dataset("glue", "sst2", split="validation")

Reusing dataset glue (/home/user/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
Reusing dataset glue (/home/user/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)


In [None]:
import rubrix as rb

records = [
    rb.TextClassificationRecord(
        inputs=text
    )
    for text in train['sentence']
]

records += [
    rb.TextClassificationRecord(
        inputs=text,
        annotation=dev.features['label'].names[label]
    )
    for text, label in zip(dev['sentence'], dev['label'])
]


rb.log(records, name="weak_supervision_sst2")

In [2]:
from rubrix.labeling.text_classification import Rule, WeakLabels
import rubrix as rb

positive_keywords = [
    "funny", "comedy", "love",
    "fun", "entertaining", "romantic",
    "compelling", "worth", "sweet",
    "fascinating", "laughs", "comic",
    "enjoyable", "clever", "perfect",
    "beautiful", "amusing", "powerful",
    "charming", "engaging", 
]

negative_keywords = [
    "bad", "dull", "worst", "worse",
    "spiritless", "silly", "monotonous", 
    "terrible", "banal", "unimaginative", 
    "inane", "shallow", "offensive", 
    "redundant", "lazy", "loose", 
    "poorly", "awful", "pathetic", 
    "lousy", "inept"
]

rules = [ Rule(query=keyword, label="positive") for keyword in positive_keywords ]
rules += [ Rule(query=keyword, label="negative") for keyword in negative_keywords ]

from rubrix.labeling.text_classification import load_rules

# optionally add the rules defined in the web app UI
rules += load_rules(dataset="weak_supervision_sst2")

# apply the rules to a dataset to obtain the weak labels
weak_labels = WeakLabels(
    rules=rules,
    dataset="weak_supervision_sst2"
)

Preparing rules:   0%|          | 0/41 [00:00<?, ?it/s]

Applying rules:   0%|          | 0/68221 [00:00<?, ?it/s]
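
To make the result of applying the rules more concrete, the sketch below builds a small record-by-rule vote matrix and computes its coverage. It is a plain-Python stand-in, not Rubrix's implementation: Rubrix evaluates each rule's `query` against the logged dataset, whereas here a rule is reduced to a keyword lookup, and the `-1` abstain code, `LABEL2INT`, `toy_rules`, and `apply_rules` are assumptions made for illustration.

```python
import re

ABSTAIN = -1  # assumed integer code for "the rule did not match"
LABEL2INT = {"negative": 0, "positive": 1}

# (keyword, label) pairs standing in for Rule(query=..., label=...)
toy_rules = [("funny", "positive"), ("charming", "positive"), ("dull", "negative")]

# made-up reviews, for illustration only
texts = [
    "a charming and often funny film",
    "dull from start to finish",
    "neither here nor there",
    "funny , but not enough",
]


def apply_rules(texts, rules):
    """Return one row per text and one column per rule."""
    matrix = []
    for text in texts:
        tokens = set(re.findall(r"\w+", text.lower()))
        matrix.append(
            [LABEL2INT[label] if keyword in tokens else ABSTAIN for keyword, label in rules]
        )
    return matrix


matrix = apply_rules(texts, toy_rules)

# fraction of records on which each rule votes
rule_coverage = [
    sum(row[j] != ABSTAIN for row in matrix) / len(matrix)
    for j in range(len(toy_rules))
]
# fraction of records that receive at least one vote
total_coverage = sum(any(v != ABSTAIN for v in row) for row in matrix) / len(matrix)

print(matrix)
print(rule_coverage, total_coverage)
```

Records that no rule matches, such as the third text, cannot receive a weak label from the rules alone.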

In [3]:

from rubrix.labeling.text_classification import FlyingSquid

# we pass our WeakLabels instance to our FlyingSquid label model
flyingsquid_model = FlyingSquid(weak_labels)

# we fit the model
flyingsquid_model.fit()



In [4]:
print(flyingsquid_model.score(tie_break_policy="abstain", output_str=True))

                 precision    recall  f1-score   support
negative              0.90      0.50      0.64        94
positive              0.66      0.95      0.78        97
macro avg             0.78      0.72      0.71       191
weighted avg          0.78      0.73      0.71       191

efficacy                                  0.47       191
fscore_cautious                           0.34       191
coverage                                  0.22       191
accuracy                                  0.73       191


In [8]:
print(flyingsquid_model.score(tie_break_policy="random", output_str=True))

              precision    recall  f1-score   support

    negative       0.57      0.51      0.54       428
    positive       0.57      0.63      0.60       444

    accuracy                           0.57       872
   macro avg       0.57      0.57      0.57       872
weighted avg       0.57      0.57      0.57       872
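
The two reports above differ mainly in how tied predictions are handled. With `tie_break_policy="abstain"` only 191 of the 872 development records are scored, which agrees with the reported coverage (191 / 872 ≈ 0.22). With `"random"` all 872 records are scored and accuracy drops from 0.73 to 0.57. The sketch below reproduces that effect on made-up probabilities. It assumes that a tie means equal class probabilities and is not Rubrix's implementation; `predict`, `accuracy`, and the toy data are hypothetical.

```python
import random


def predict(probabilities, tie_break_policy, rng):
    """Turn (p_negative, p_positive) pairs into labels, or None for abstain."""
    predictions = []
    for p_neg, p_pos in probabilities:
        if p_neg == p_pos:
            if tie_break_policy == "abstain":
                predictions.append(None)
            else:  # "random": pick one of the tied classes
                predictions.append(rng.choice(["negative", "positive"]))
        else:
            predictions.append("negative" if p_neg > p_pos else "positive")
    return predictions


def accuracy(predictions, gold):
    """Return (accuracy, support) over the records that received a prediction."""
    scored = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    return sum(p == g for p, g in scored) / len(scored), len(scored)


# hypothetical label-model outputs: two confident records, two ties
probabilities = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.5, 0.5)]
gold = ["negative", "positive", "negative", "positive"]

abstain_acc, abstain_support = accuracy(
    predict(probabilities, "abstain", random.Random(0)), gold
)
random_acc, random_support = accuracy(
    predict(probabilities, "random", random.Random(0)), gold
)

print(abstain_acc, abstain_support)
print(random_acc, random_support)
```

Abstaining scores only the records the model is sure about, so it reports higher accuracy on a smaller support. Breaking ties at random scores every record, and the tied ones are correct only by chance.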



In [5]:
import pandas as pd

# get your training records with the predictions of the label model
records_for_training = flyingsquid_model.predict()

# log the records to a new dataset in Rubrix
rb.log(records_for_training, name="flyingsquid_results")

# extract training data
training_data = pd.DataFrame(
    [
        {"text": rec.inputs["text"], "label": flyingsquid_model.weak_labels.label2int[rec.prediction[0][0]]}
        for rec in records_for_training
    ]
)

  0%|          | 0/9574 [00:00<?, ?it/s]

9574 records logged to http://localhost:6900/ws/rubrix/flyingsquid_results


In [6]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# define our final classifier
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB())
])

# fit the classifier
classifier.fit(
    X=training_data.text.tolist(),
    y=training_data.label.values
)

Pipeline(steps=[('vect', CountVectorizer()), ('clf', MultinomialNB())])

In [7]:
# compute the test accuracy
accuracy = classifier.score(
    X=dev['sentence'],
    y=dev['label']
)

print(f"Test accuracy: {accuracy}")

Test accuracy: 0.6009174311926605


# Epoxy

In [9]:
train = load_dataset("glue", "sst2", split="train[:10%]")
dev = load_dataset("glue", "sst2", split="validation")

Reusing dataset glue (/home/user/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
Reusing dataset glue (/home/user/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)


In [10]:
import rubrix as rb

records = [
    rb.TextClassificationRecord(
        inputs=text
    )
    for text in train['sentence']
]

records += [
    rb.TextClassificationRecord(
        inputs=text,
        annotation=dev.features['label'].names[label]
    )
    for text, label in zip(dev['sentence'], dev['label'])
]


rb.log(records, name="weak_supervision_sst2_epoxy")

  0%|          | 0/7607 [00:00<?, ?it/s]

7607 records logged to http://localhost:6900/ws/rubrix/weak_supervision_sst2_epoxy


BulkResponse(dataset='weak_supervision_sst2_epoxy', processed=7607, failed=0)

In [11]:
# apply the rules to a dataset to obtain the weak labels
weak_labels_epoxy = WeakLabels(
    rules=rules,
    dataset="weak_supervision_sst2_epoxy"
)

Preparing rules:   0%|          | 0/41 [00:00<?, ?it/s]

Applying rules:   0%|          | 0/7607 [00:00<?, ?it/s]

In [12]:
# WARNING: this stops ALL running Docker containers, including any that serve Rubrix
!docker stop $(docker container ls -q)

67ca69ddac6b


In [13]:
from sentence_transformers import SentenceTransformer

class SentenceTransformerModel:
    """Wrap a sentence-transformers model so it can embed Rubrix records."""

    def __init__(self, embedding_model_name):
        self.embedding_model = SentenceTransformer(embedding_model_name)

    def __call__(self, records):
        # embed the "text" input of every record; returns one vector per record
        texts = [x.inputs["text"] for x in records]
        return self.embedding_model.encode(texts)

In [14]:
from rubrix.labeling.text_classification import Epoxy
embedding_model_name = "average_word_embeddings_glove.840B.300d"
model = SentenceTransformerModel(embedding_model_name)

In [15]:
embeddings = model(weak_labels_epoxy.records())
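
As its name suggests, the `average_word_embeddings_glove.840B.300d` model represents a sentence by averaging the GloVe vectors of its words. The sketch below shows that averaging step on toy data. The 3-d vectors, the whitespace tokenisation, and the choice to skip unknown words are assumptions made for this example, not details of the real model.

```python
# hypothetical 3-d word vectors; the real model uses 300-d GloVe vectors
toy_vectors = {
    "funny": [1.0, 0.0, 0.0],
    "film": [0.0, 1.0, 0.0],
    "dull": [-1.0, 0.0, 0.0],
}


def average_embedding(text, vectors, dim=3):
    """Average the vectors of known words; unknown words are skipped."""
    known = [vectors[token] for token in text.lower().split() if token in vectors]
    if not known:
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]


funny_film = average_embedding("funny film", toy_vectors)
dull_film = average_embedding("a dull film", toy_vectors)
print(funny_film, dull_film)
```

Sentences that share most of their words end up with nearby vectors, which is what the nearest-neighbor step relies on. Word order is ignored entirely.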

In [None]:
epoxy = Epoxy(weak_labels_epoxy, embeddings)
epoxy.fit()

In [None]:
print(epoxy.score(tie_break_policy="abstain"))

In [None]:
print(epoxy.score(tie_break_policy="random"))

In [None]:
# get your training records with the predictions of the label model
records_for_training = epoxy.predict()

# extract training data
training_data = pd.DataFrame(
    [
        {"text": rec.inputs["text"], "label": epoxy.weak_labels.label2int[rec.prediction[0][0]]}
        for rec in records_for_training
    ]
)

In [None]:
# preview training data
training_data

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# define our final classifier
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB())
])

# fit the classifier
classifier.fit(
    X=training_data.text.tolist(),
    y=training_data.label.values
)

In [None]:
# compute the test accuracy
accuracy = classifier.score(
    X=dev['sentence'],
    y=dev['label']
)

print(f"Test accuracy: {accuracy}")