# Annotate new data to improve NLP models using Rubrix and biome.text

## Introduction

Hey there! In this guide, we will show you how to use Rubrix to annotate new data and use this new data to improve existing Deep Learning models. Our use case will be Automatic Misogyny Identification (AMI): Deep Learning models able to detect the underlying misogyny in a given text. Ground-breaking work is being done every year on this subject, with shared tasks and new models that push performance closer and closer to the point where these models can be deployed in apps, social networks, and other digital environments.

To train these NLP models we are going to use [biome.text](https://github.com/recognai/biome-text), an open-source library to train models with a simple workflow. Rubrix is compatible with almost any library or service, so we will work back and forth with both of them. 

The data used to feed the models and make the annotations comes from the [IberEval 2018](https://sites.google.com/view/ibereval-2…) shared task. It's a compilation of tweets, analyzed by experts and classified into 5 different misogyny categories. We are also making available the specific datasets used in each step of this guide, so that you can reproduce the results as closely as possible.

## Dependencies

If you want to reproduce this code, make sure that all the libraries needed to run this guide are installed and imported.

In [None]:
%pip install -U git+https://github.com/recognai/biome-text
%pip install rubrix
%pip install pandas
exit(0)  # Force restart of the runtime

In [None]:
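# Optional: only needed if you log your training runs to Weights & Biases.
# Never commit a real API key; replace the placeholder with your own.
import os
os.environ['WANDB_API_KEY'] = '<your-wandb-api-key>'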

In [1]:
from biome.text import *
import pandas as pd
import rubrix as rb

# HPO utilities, used to define the hyperparameter search space
from biome.text.hpo import TuneExperiment
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray import tune
import math

import wandb

## Loading datasets

Let's load the datasets we've prepared, so we can quickly train our first model.

In [None]:
# Loading the datasets
training_ds = Dataset.from_csv('annotation_data/training_full_df.csv')
test_ds = Dataset.from_csv('annotation_data/test_df.csv')

# TODO: remove this mapping once the CSV files are saved without these columns
# Removing non-useful generated columns
training_ds = training_ds.map(remove_columns=["Unnamed: 0", "id"])
test_ds = test_ds.map(remove_columns=["Unnamed: 0", "id"])

## Training the first model

Creating NLP pipelines with biome.text is quick and convenient! We performed an HPO (hyperparameter optimization) process in the background to find suitable hyperparameters for this domain, so let's use them to create our first AMI model. Note that we're building the pipeline on top of BETO, a Spanish Transformer model, which provides the features for a text classification head. To learn more about what a Transformer is, please visit the [Transformer guide of biome.text](https://recognai.github.io/biome-text/v3.0.0/documentation/tutorials/4-Using_Transformers_in_biome_text.html).

In [None]:
pipeline_dict = {
    "name": "AMI_first_model",
    "features": {
        "transformers": {
            "model_name": "dccuchile/bert-base-spanish-wwm-cased", # BETO model
            "trainable": True,
            "max_length": 280,  # As we are working with data from Twitter, this is our max length
        }
    },
    "head": {
        "type": "TextClassification",
        
        # These are the possible misogyny categories. '0' indicates the text is non-sexist
        "labels": [
            'sexual_harassment',
            'dominance',
            'discredit',
            'stereotype',
            'derailing',
            'passive',
            'active',
            '0'
        ],
        "pooler": {
            "type": "lstm",
            "num_layers": 1,
            "hidden_size": 256,
            "bidirectional": True,
        },
    },
}

pl = Pipeline.from_config(pipeline_dict)

In [None]:
# Search space explored during the HPO process (kept for reference; the training
# below uses the best values found, not this dictionary)
batch_size = 16
trainer_dict = {
    "optimizer": {
        "type": "adamw",
        "lr": tune.loguniform(1e-5, 1e-4),
        "weight_decay": tune.loguniform(2e-3, 6e-2),
    },
    "learning_rate_scheduler": {
        "type": "linear_with_warmup",
        "num_epochs": 10,
        "num_steps_per_epoch": len(training_ds) // batch_size,
        "warmup_steps": 100,
    },
    "batch_size": batch_size,
    "num_epochs": 10,
}

In [None]:
trainer_config = TrainerConfiguration(
    optimizer={
        "type": "adamw",
        "lr": 0.000023636840436059507,
        "weight_decay": 0.01438297700463013,
    },
    batch_size=8,
    max_epochs=10,
)

In [None]:
trainer = Trainer(
    pipeline=pl,
    train_dataset=training_ds,
    valid_dataset=test_ds,
    trainer_config=trainer_config
)

In [None]:
trainer.fit()

After `trainer.fit()` finishes, the results of the training and the resulting model will be in the output folder. However, we know that this training can take a long time on non-dedicated machines, so we also provide the trained model to download and import. If you don't want to train the model yourself, run the cell below, which downloads the trained model and imports it into a biome.text pipeline.

In [None]:
# TODO: download and import the trained model

In [2]:
# TODO: remove
pl = Pipeline.from_pretrained("model_annotation_guide.tar.gz")

Some weights of BertModel were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


We can make some predictions, and take a look at the performance of the model.

In [3]:
pl.predict("Las mujeres no deberían tener derecho a voto")

{'labels': ['discredit',
  'derailing',
  '0',
  'stereotype',
  'dominance',
  'sexual_harassment',
  'passive',
  'active'],
 'probabilities': [0.7752683162689209,
  0.10697121918201447,
  0.056650321930646896,
  0.039366547018289566,
  0.012474359944462776,
  0.005319079849869013,
  0.0026962445117533207,
  0.001253939582966268]}
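
The prediction is a dictionary with two parallel lists, `labels` and `probabilities`. As a minimal sketch (the `top_prediction` helper is our own and not part of biome.text), this is how the most likely category could be extracted from such a dictionary:

```python
def top_prediction(prediction):
    """Return the (label, probability) pair with the highest probability."""
    pairs = zip(prediction["labels"], prediction["probabilities"])
    return max(pairs, key=lambda pair: pair[1])


# A shortened copy of the output above, used here only for illustration
example_prediction = {
    "labels": ["discredit", "derailing", "0"],
    "probabilities": [0.7752683162689209, 0.10697121918201447, 0.056650321930646896],
}

label, probability = top_prediction(example_prediction)
print(f"{label}: {probability:.2f}")  # discredit: 0.78
```

Using `max` instead of taking the first element avoids relying on the lists being sorted.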

## Annotating as a single agent

When we said that we prepared some datasets with the tweets from IberEval 2018, we might have lied a little bit. We prepared the datasets with almost all the tweets from IberEval 2018, but we also set aside a small compilation of 15 instances for you to start annotating. Picture yourself, after training your first model, trying to push its performance a little further, or to include some new data to cover as many different cases as possible. You come across new instances, and you want to annotate them and include them in a follow-up training. This is where Rubrix comes in.

In this chapter of the guide we will show you how to:
* Import datasets to Rubrix (in our case, from a csv file).
* Annotate datasets using Rubrix.
* Export the annotated datasets to use them in your pipelines.

Here we will cover the scenario of a single annotation agent. In the next chapter, we will give you some insight into how to annotate in teams.

### Logging a dataset into Rubrix

Let's start by logging the data into a Rubrix dataset. As these instances were initially annotated by the IberEval team, we can treat their labels as predictions. In the annotation process, therefore, we will decide whether we agree with those predictions or not. If we started from raw data, we wouldn't have these predictions to support our annotation process, but that's okay too!

The first step is to download the datasets. Then, we will iterate through all the instances, logging them into Rubrix.

In [2]:
#TODO: download to_annotate.csv

annotation_ds = Dataset.from_csv('annotation_data/to_annotate.csv')

Using custom data configuration default-28845c1d061ee336
Reusing dataset csv (/Users/ignaciotalaveracepeda/.cache/huggingface/datasets/csv/default-28845c1d061ee336/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)


In [6]:
records = []    # here we will store the TextClassificationRecord objects

# Possible labels, used to build the predictions
labels = ['sexual_harassment','dominance','discredit','stereotype','derailing','0']

for record in annotation_ds:

    # Prediction list of each record in the dataset
    predictions = []

    # We build the prediction list with tuples.
    for label in labels:

        # If the label is the one predicted in the dataset, it has a score of 1
        if label==record["label"]:
            pred = (label, 1)
            predictions.append(pred)

        # Else, it has a score of 0
        else:
            pred = (label, 0)
            predictions.append(pred)

    # Appending the record into the list
    records.append(rb.TextClassificationRecord(
        id=record["id"],
        inputs=record["text"],
        prediction=predictions,
        prediction_agent="IberEval 2018",
        metadata={'id': record["id"]},
        )
    )

# Logging the records into Rubrix
rb.log(records=records, name="annotation_misogyny")

BulkResponse(dataset='annotation_misogyny', processed=15, failed=0)
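
When no previous labels are available, the output of our own model could play the role of the prediction instead. A minimal sketch, assuming a prediction dictionary shaped like the `pl.predict` output shown earlier; `to_rubrix_prediction` is a hypothetical helper of ours, and the list of `(label, score)` tuples it returns has the same form as the `predictions` list built above:

```python
def to_rubrix_prediction(prediction):
    """Turn a {'labels': [...], 'probabilities': [...]} dict into (label, score) tuples."""
    return list(zip(prediction["labels"], prediction["probabilities"]))


# Made-up model output, for illustration only
model_output = {"labels": ["discredit", "0"], "probabilities": [0.9, 0.1]}

rubrix_prediction = to_rubrix_prediction(model_output)
print(rubrix_prediction)
```

The resulting list could then be passed as the `prediction` argument of `rb.TextClassificationRecord`, with a `prediction_agent` that names the model.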

### Annotating in the UI

Once we've logged our annotation dataset into Rubrix, we can start annotating in the UI. We know that the first time can be challenging, so here are some instructions and a GIF to show you around.

1. Open Rubrix in your browser. If you're running it locally, it is usually running on [http://localhost:6900](http://localhost:6900).
2. Select the `annotation_misogyny` dataset.
3. In the upper-right corner, toggle the `Annotation mode`.
4. Start selecting the categories that you think fit the input text. If you don't know Spanish, don't worry! 15 instances are not going to change the final model that much, and you will still learn how to annotate.
5. For each instance, you can annotate it with a category by clicking on that category, discard the record (if you think it does not fit the problem domain), or leave it without an annotation.

![Example of Annotation](https://imgur.com/e2ntn5c.gif)

If you're wondering why we annotated that instance as 'non-sexist', it is because we are trying to build a model capable of telling apart a text that is itself misogynistic from a text that talks about something misogynistic that happened. This second case is considered non-sexist.

And that's it! We have annotated a dataset as a single annotator, and this new data can be used to retrain and fine-tune our NLP model. To save the annotated dataset, just run the following code:

In [None]:
annotation_final = rb.load("annotation_misogyny")
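
Before retraining, the annotated records have to be turned into flat rows with one text and one label each. A minimal sketch, using a small hand-made DataFrame in place of the one returned by `rb.load`; it assumes that the `inputs` column holds a dictionary with a `text` key and that records without annotation carry `None`. The toy texts and the `loaded_df` and `new_training_df` names are placeholders of ours:

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by rb.load("annotation_misogyny")
loaded_df = pd.DataFrame({
    "inputs": [{"text": "first tweet"}, {"text": "second tweet"}, {"text": "third tweet"}],
    "annotation": ["discredit", None, "0"],
})

# Keep only the records that were actually annotated
annotated = loaded_df[loaded_df["annotation"].notna()]

# Build a flat table with one text and one label per row
new_training_df = pd.DataFrame({
    "text": annotated["inputs"].apply(lambda inputs: inputs["text"]),
    "label": annotated["annotation"],
}).reset_index(drop=True)

print(new_training_df)
```

From here, `new_training_df.to_csv(...)` would produce a file that can be loaded with `Dataset.from_csv`, as we did with the original training data.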

## Annotating with multiple agents

There will be times in which you won't annotate your data alone. Usually, it comes down to two reasons: there is a lot of data, or you want a certain degree of agreement in your annotations. When annotating sexist tweets, for example, it is crucial that you try to encode as few biases as possible. How can you fight against that? Well, one of the most effective ways to do it is by gathering a diverse, multicultural team to annotate.

In this case, we need a way to merge several annotations of the same instance into one, preserving the will of the majority, and that's when Inter-Annotator Agreement (IAA) comes in handy. There are many different types of IAA, some based on rules and others based on statistics. For our case, we want to show you how to implement a simple, rule-based IAA. Once you know how to merge annotations from different sources, you can do it the way you want, with more complex rules or using statistical measures like Cohen's kappa.
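
To give an idea of what a statistical measure looks like, here is a minimal sketch of Cohen's kappa for two annotators, computed as (observed agreement - chance agreement) / (1 - chance agreement). The `cohen_kappa` function and the toy annotations are ours, for illustration only; they are not part of Rubrix:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same records."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same, non-empty set of records")
    n = len(labels_a)

    # Observed agreement: fraction of records with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1:
        return 1.0  # both annotators used one single, identical label
    return (observed - expected) / (1 - expected)


# Made-up annotations of four records by two annotators
anna = ["discredit", "0", "stereotype", "0"]
bob = ["discredit", "0", "0", "0"]

kappa = cohen_kappa(anna, bob)
print(f"kappa = {kappa:.2f}")  # kappa = 0.56
```

A kappa of 1 means perfect agreement, while values around 0 mean the annotators agree no more often than they would by chance.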

Our simple rule system will consist of these few rules for three annotators:
* For an instance to be annotated with a category, there must be a consensus of at least two annotators.
* If there's a consensus on a sexism category, but another annotator finds there's no sexism in the instance, the instance will be discarded.
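
The two rules above can be sketched as a small function. This is an illustration under our own assumptions: `merge_annotations` is a hypothetical helper, each annotator contributes one label (or `None` if they skipped the record), `'0'` is the non-sexist category, and the `'no-consensus'` and `'non-annotated'` return values are names we chose for discarded and untouched records:

```python
from collections import Counter

NON_SEXIST = "0"


def merge_annotations(annotations):
    """Merge the labels given by several annotators into a single one."""
    # Annotators who skipped the record contribute None, which we ignore
    votes = Counter(label for label in annotations if label is not None)
    if not votes:
        return "non-annotated"

    top_label, top_count = votes.most_common(1)[0]

    # Rule 1: at least two annotators must agree on the category
    if top_count < 2:
        return "no-consensus"

    # Rule 2: a sexism category is discarded if anyone voted non-sexist
    if top_label != NON_SEXIST and NON_SEXIST in votes:
        return "no-consensus"

    return top_label


print(merge_annotations(["discredit", "discredit", "stereotype"]))  # discredit
print(merge_annotations(["discredit", "discredit", "0"]))           # no-consensus
print(merge_annotations(["discredit", "stereotype", "derailing"]))  # no-consensus
```

Note that this sketch reads the second rule as applying only to a consensus on a sexism category, so a majority of `'0'` votes still yields `'0'`.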

We have loaded and annotated three datasets, one for each of our annotators, called `annotation_misogyny_anna`, `annotation_misogyny_bob` and `annotation_misogyny_celia`. Each annotator has annotated their instances, and now it's time to merge them into a single, annotated dataset.

### Save the annotated datasets

Once our agents have done their annotations, we should start by saving those datasets, extracting them from Rubrix.

In [2]:
annotation_anna = rb.load("annotation_misogyny_anna").set_index("id").sort_index()

annotation_bob = rb.load("annotation_misogyny_bob").set_index("id").sort_index()

annotation_celia = rb.load("annotation_misogyny_celia").set_index("id").sort_index()

`rb.load()` returns a pandas DataFrame. We will use pandas to merge our annotations into a single dataset.
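
Since the merge later in this chapter walks through the three DataFrames by position, it may be worth checking first that they contain the same record ids in the same order. A minimal sketch with toy DataFrames; `same_records` is a hypothetical helper of ours:

```python
import pandas as pd


def same_records(*frames):
    """Return True if all DataFrames share the same index, in the same order."""
    first = frames[0].index
    return all(first.equals(frame.index) for frame in frames[1:])


# Toy stand-ins for the DataFrames of three annotators
toy_anna = pd.DataFrame({"id": [2, 1], "annotation": ["0", "discredit"]}).set_index("id").sort_index()
toy_bob = pd.DataFrame({"id": [1, 2], "annotation": ["discredit", "0"]}).set_index("id").sort_index()
toy_celia = pd.DataFrame({"id": [1, 3], "annotation": ["discredit", "0"]}).set_index("id").sort_index()

print(same_records(toy_anna, toy_bob))             # same ids after sorting
print(same_records(toy_anna, toy_bob, toy_celia))  # toy_celia has a different id
```

Sorting by `id`, as we do when loading the datasets, is what makes the positional comparison meaningful.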

Now, let's create the DataFrame that will hold our merged annotations:

In [3]:
annotation_final = pd.DataFrame(columns=['id','text', 'prediction', 'prediction_agent', 'annotation', 'annotation_agent'])

Finally, we write our main loop. In it, we extract the annotations made by the three annotators and apply the rules we discussed before. We also add two new statuses: `non-annotated` if no annotator has annotated the record, and `no-consensus` if no category could be assigned according to our rules. Keep these special categories in mind when reintroducing the dataset into a training pipeline.

In [41]:
# We will use this tool to count occurrences in a list
from collections import Counter

In [38]:
# Iterating through the datasets; all of them have the same length
merged_rows = []
for i in range(len(annotation_anna)):

    # Extracting the categories annotated by each annotator
    raw_annotations = [
        annotation_anna.iloc[i]["annotation"],
        annotation_bob.iloc[i]["annotation"],
        annotation_celia.iloc[i]["annotation"],
    ]

    # Flattening the annotations: an annotation may be stored as a list,
    # and annotators who skipped the record (None) are ignored
    annotated_categories = []
    for annotation in raw_annotations:
        if annotation is None:
            continue
        if isinstance(annotation, list):
            annotated_categories.extend(annotation)
        else:
            annotated_categories.append(annotation)

    # Counting the annotations
    counted_annotations = Counter(annotated_categories)

    # If no annotator has annotated the record, we mark it as 'non-annotated'
    if not counted_annotations:
        merged_annotation = 'non-annotated'
    else:
        top_category, top_count = counted_annotations.most_common(1)[0]

        # Rule 1: at least two annotators must agree on the category.
        # Rule 2: a consensus on a sexism category is discarded if another
        # annotator marked the record as non-sexist ("0").
        if top_count >= 2 and (top_category == "0" or "0" not in counted_annotations):
            merged_annotation = top_category
        else:
            merged_annotation = 'no-consensus'

    # As all elements in each row of the DataFrame except the annotations are the same,
    # we can retrieve the information from any of the annotators. In our case, it is Anna.
    merged_rows.append({
        'id': annotation_anna.iloc[i]["metadata"]["id"],
        'text': annotation_anna.iloc[i]["inputs"]["text"],
        'prediction': annotation_anna.iloc[i]["prediction"],
        'prediction_agent': annotation_anna.iloc[i]["prediction_agent"],
        'annotation': merged_annotation,
        'annotation_agent': 'Anna, Bob and Celia',
    })

# Building the final DataFrame in one go (DataFrame.append was removed in pandas 2.0)
annotation_final = pd.DataFrame(
    merged_rows,
    columns=['id', 'text', 'prediction', 'prediction_agent', 'annotation', 'annotation_agent'],
)
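
The special statuses are not real categories, so they should not reach the model. A minimal sketch of how they could be filtered out before retraining, using a toy DataFrame in place of the merged one; the toy texts and the `merged` and `trainable` names are placeholders of ours:

```python
import pandas as pd

# Toy stand-in for the merged `annotation_final` DataFrame
merged = pd.DataFrame({
    "text": ["tweet a", "tweet b", "tweet c", "tweet d"],
    "annotation": ["discredit", "no-consensus", "non-annotated", "0"],
})

# Statuses that are not real categories and must not be used as labels
special_statuses = ["non-annotated", "no-consensus"]

# Keep only the rows with a real category and rename the column to `label`
trainable = merged[~merged["annotation"].isin(special_statuses)]
trainable = trainable.rename(columns={"annotation": "label"}).reset_index(drop=True)

print(trainable["label"].value_counts())
```

Records left out this way could be sent back to the annotators for a second round instead of being thrown away.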