# Annotator modeling experiment

__Objective:__ develop a model for toxicity detection that models annotators as well as data, so that possibly diverging opinions can be consistently captured.

Main sources:<br>
[1] ["Architectural sweet spots" paper](https://aclanthology.org/2023.emnlp-main.687/)<br>
[2] ["Jury Learning" paper](https://arxiv.org/abs/2202.02950)

[1] is the main source. Here, we'll try to implement one of the architectures they suggested.

Other sources (fine-tuning scripts, docs, ...):
- [Script for fine-tuning RoBERTa models](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_classification.py)
- [Hugging Face RoBERTa docs](https://huggingface.co/docs/transformers/v4.46.3/en/model_doc/roberta)
- [Hugging Face source code for RoBERTa models](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_roberta.py)

Modelling plan (from lower to higher complexity, i.e. from less to more annotator modelling, see the [meeting notes](https://docs.google.com/document/d/1YRPDi0JVk2ijyNm_TURZWQW_wUkves-2S_3j3ZWhw-U/edit?pli=1&tab=t.0#heading=h.bt8gf8u1l9h4)):
1. Labels aggregated over annotators by majority vote, no annotator modelling at all (baseline).
2. *SepHeads* architecture (from [1]): a single LLM for text encoding (RoBERTa), with classification heads fine-tuned for each annotator.
3. *ShareREC* architecture (again from [1]): a single LLM for text encoding (RoBERTa, as above), a separate network for annotator encoding and a "neural combiner" component (either a feed-forward NN or a Deep Cross Network (DCN)) for merging the two pieces of information together and producing an output classification.
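
The baseline in step 1 can be sketched with pandas as follows. This is a minimal sketch, assuming one row per (text, annotator) annotation with the `text_id` and `toxic_score` columns used later in this notebook. The toy data, the `majority_vote` helper and the tie-breaking rule (the smallest label wins) are illustrative assumptions, not something prescribed by [1].

```python
import pandas as pd

# Toy annotations: three annotators per text (made-up values).
toy_annotations = pd.DataFrame({
    'text_id':     [0, 0, 0, 1, 1, 1],
    'toxic_score': [0, 0, 1, 1, 1, 0],
})


def majority_vote(df, item_col='text_id', label_col='toxic_score'):
    """Aggregate the labels of each item by majority vote.

    Ties are broken in favour of the smallest label, because
    `Series.mode()` returns the modes in sorted order (assumption
    made for this sketch).
    """
    return (
        df.groupby(item_col)[label_col]
        .agg(lambda labels: labels.mode().iloc[0])
        .rename('majority_label')
        .reset_index()
    )


aggregated = majority_vote(toy_annotations)
print(aggregated)
```

The result has one row per `text_id`, so all information about who assigned which label is discarded, which is exactly what steps 2 and 3 try to avoid.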

In [1]:
import sys
import pandas as pd
import torch

sys.path.append('../modules/')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device('cpu')

%load_ext autoreload
%autoreload 2

## Load data

In [2]:
DATASET_PATHS = {
    'popquorn': '../data/samples/POPQUORN_offensiveness.csv',
    'kumar': {
        'train': '/data1/moscato/personalised-hate-boundaries-data/data/kumar_perspective_clean/kumar_processed_with_ID_and_full_perspective_clean_train.csv',
        # 'train':  '/data/milanlp/moscato/personal_hate_bounds_data/kumar_processed_with_ID_and_full_perspective_clean.csv',
        'test': '/data1/moscato/personalised-hate-boundaries-data/data/kumar_perspective_clean/kumar_processed_with_ID_and_full_perspective_clean_test.csv',
    }
}

DATASET_NAME = 'kumar'

training_data = pd.read_csv(DATASET_PATHS[DATASET_NAME]['train'])
test_data = pd.read_csv(DATASET_PATHS[DATASET_NAME]['test'])

In [3]:
# N annotators.
training_data['worker_id'].unique().shape

(17110,)

In [6]:
# Min N annotations per annotator.
training_data.groupby('worker_id')['text_id'].count().min()

np.int64(12)

## DeBERTa model for text classification

**Notes:**
- When instantiating a `RobertaForSequenceClassification` model, the classification head is created automatically inside it, reading the `num_labels` and `classifier_dropout` parameters from the config. Unfortunately, the hidden size of the classification head is read from the same `hidden_size` config parameter used by the encoder, so the two must be equal (768 for the pretrained model). Alternative: instantiate the model with the default classification head, then replace the head with a new one built from a separate config.

In [7]:
from model_utils import get_deberta_model



In [8]:
num_labels = training_data['toxic_score'].unique().shape[0]

model_dir = '/data1/shared_models/'

print('N labels found in training data:', num_labels)

deberta_tokenizer, deberta_model = get_deberta_model(
    num_labels,
    model_dir,
    device,
    use_custom_head=False,
    pooler_out_features=768,  # Default: 768.
    pooler_drop_prob=0.0,  # Default: 0.0
    classifier_drop_prob=0.1,  # Default: 0.1
    use_fast_tokenizer=False
)

2025-02-19 15:01:28,176 - get_deberta_model - INFO - Instantiating DeBERTa tokenizer


N labels found in training data: 2


2025-02-19 15:01:29,271 - get_deberta_model - INFO - Instantiating DeBERTa model with default classification head
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [9]:
# Old experiment with a RoBERTa model.
# # from transformers import AutoConfig, PretrainedConfig, AutoTokenizer, RobertaForSequenceClassification, pipeline
# # from transformers.models.roberta.modeling_roberta import RobertaClassificationHead

# model_id = 'roberta-base'

# # Config for the encoder.
# roberta_classifier_config = AutoConfig.from_pretrained(
#     model_id,
#     finetuning_task="text-classification",
#     id2label={
#         0: 'non-toxic',
#         1: 'toxic'
#     }
# )

# # Config for the classification head. These are all the
# # parameters a `RobertaClassificationHead` requires.
# roberta_classification_head_config = PretrainedConfig()

# roberta_classification_head_config.classifier_dropout = 0.1
# roberta_classification_head_config.hidden_size = 768
# roberta_classification_head_config.num_labels = 2


# # Instantiate tokenizer.
# roberta_tokenizer = AutoTokenizer.from_pretrained(model_id)

# # Instantiate RoBERTa model.
# roberta_classifier = RobertaForSequenceClassification.from_pretrained(
#     'roberta-base',
#     config=roberta_classifier_config,
# )

# # Substitute the default classification head with a custom one.
# roberta_classifier.classifier = RobertaClassificationHead(roberta_classification_head_config)


# # Put everything together in a single pipeline object.
# roberta_classifier_pipeline = pipeline(
#     task='text-classification',
#     config=roberta_classifier_config,
#     tokenizer=roberta_tokenizer,
#     model=roberta_classifier
# )

# roberta_classifier_pipeline(data_df.iloc[:12]['text'].tolist())

## RoBERTa model for text encoding + classification head (old experiment)

Text encoding:
- RoBERTa outputs two tensors:
    - Latent representation of the `<s>` token, RoBERTa's equivalent of `[CLS]` (`model(**encoded_input).last_hidden_state[:, 0, :]`, where the first dimension is the batch size).
    - The former passed through the "RoBERTa pooler", a linear layer with tanh activation (`model(**encoded_input).pooler_output`).
- From [this issue](https://github.com/huggingface/transformers/issues/8776) it looks like the representation fed into the classification head is the pooled one, but `RobertaClassificationHead` only works with the full output of the encoder (`last_hidden_state`) as its input: it selects the first token itself and applies its own dense layer with tanh activation.
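
The difference between the two representations, and a head that consumes the full encoder output, can be illustrated without downloading any model. This is a toy sketch in plain PyTorch: the random tensor stands in for `last_hidden_state`, and `toy_pooler_dense` and `ToyClassificationHead` are hypothetical names that only mirror the layer layout printed for `RobertaClassificationHead` further down (dense, dropout, out_proj). It is not the Hugging Face implementation.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, hidden_size, num_labels = 2, 6, 768, 5

# Random stand-in for `model(**encoded_input).last_hidden_state`.
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)

# (1) Latent representation of the first token: (batch_size, hidden_size).
first_token_rep = last_hidden_state[:, 0, :]

# (2) Pooler-style representation: linear layer + tanh on the first token.
toy_pooler_dense = nn.Linear(hidden_size, hidden_size)
pooled_rep = torch.tanh(toy_pooler_dense(first_token_rep))


class ToyClassificationHead(nn.Module):
    """Toy head that takes the full encoder output and picks the first token itself."""

    def __init__(self, hidden_size, num_labels, dropout=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        x = features[:, 0, :]  # First-token representation.
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)


toy_head = ToyClassificationHead(hidden_size, num_labels).eval()
with torch.no_grad():
    toy_logits = toy_head(last_hidden_state)

# Shape: (batch_size, num_labels).
print(toy_logits.shape)
```

Passing `pooled_rep` (2D) to such a head would fail at the `features[:, 0, :]` indexing, which is why the head needs the 3D encoder output.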

In [9]:
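from transformers import PretrainedConfig, RobertaConfig, RobertaModel, RobertaTokenizer
from transformers.models.roberta.modeling_roberta import RobertaClassificationHead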

Tokenization and encoding.

In [9]:
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
text_encoder = RobertaModel.from_pretrained('roberta-base')

roberta_config = RobertaConfig.from_pretrained('roberta-base')

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [10]:
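# Note: `data_df` is not defined in this notebook; it is assumed to be a
#       DataFrame with a `text` column loaded in a cell not shown here.
encoded_input = tokenizer(data_df.iloc[:2]['text'].tolist(), return_tensors='pt', padding=True)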

output = text_encoder(**encoded_input)

Classification head.

In [11]:
# Instantiate config for the classification head.
roberta_classification_head_config = PretrainedConfig()

roberta_classification_head_config.classifier_dropout = 0.1
roberta_classification_head_config.hidden_size = 768
roberta_classification_head_config.num_labels = 5

# Instantiate the classification head.
classification_head = RobertaClassificationHead(roberta_classification_head_config)

In [12]:
classification_head.eval()

RobertaClassificationHead(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (out_proj): Linear(in_features=768, out_features=5, bias=True)
)

In [13]:
logits = classification_head(output.last_hidden_state)

logits

tensor([[-0.0100,  0.2229,  0.2679, -0.0165,  0.1054],
        [-0.0244,  0.2138,  0.2665, -0.0081,  0.1243]],
       grad_fn=<AddmmBackward0>)

In [14]:
type(text_encoder)

transformers.models.roberta.modeling_roberta.RobertaModel

## DeBERTa model with annotator-specific classification head

__Objective:__ create a model that
- uses the body of a pre-trained DeBERTa model for text encoding,
- has a separate classification head for each annotator and uses the head of the selected annotator to make the prediction.
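
The routing idea can be sketched in plain PyTorch as below. This is a toy sketch, not the `DebertaWithAnnotatorHeads` class from the local `models` module (which is not shown in this notebook): `ToySepHeads`, the small `toy_encoder` standing in for the DeBERTa encoder and pooler, the single linear layer per head and the cross-entropy loss are all assumptions made for illustration.

```python
import torch
from torch import nn


class ToySepHeads(nn.Module):
    """Shared encoder + one linear classification head per annotator (toy sketch)."""

    def __init__(self, encoder, hidden_size, num_labels, num_annotators):
        super().__init__()
        self.encoder = encoder
        self.num_labels = num_labels
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, num_labels) for _ in range(num_annotators)]
        )

    def forward(self, features, annotator_ids, labels=None):
        pooled = self.encoder(features)  # (batch_size, hidden_size)
        annotator_ids = torch.as_tensor(annotator_ids, device=pooled.device)
        logits = pooled.new_zeros(pooled.shape[0], self.num_labels)

        # Route each sample through the head of its own annotator.
        for annotator_id in annotator_ids.unique().tolist():
            mask = annotator_ids == annotator_id
            logits[mask] = self.heads[annotator_id](pooled[mask])

        output = {'logits': logits}
        if labels is not None:
            output['loss'] = nn.functional.cross_entropy(logits, labels)
        return output


torch.manual_seed(0)
# Stand-in for the DeBERTa encoder + pooler.
toy_encoder = nn.Sequential(nn.Linear(16, 8), nn.Tanh())
toy_model = ToySepHeads(toy_encoder, hidden_size=8, num_labels=2, num_annotators=2).eval()

# The same "text" repeated four times, seen by annotators 0, 0, 1, 1.
toy_features = torch.randn(1, 16).repeat(4, 1)
toy_output = toy_model(
    toy_features,
    annotator_ids=[0, 0, 1, 1],
    labels=torch.tensor([0, 0, 0, 1]),
)
print(toy_output['logits'])
```

Identical inputs give identical logits for the same annotator and different logits for different annotators, since only the head changes. Grouping samples by annotator keeps the cost proportional to the number of distinct annotators in the batch rather than to the total number of heads.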

**Notes:**
- In some form, the annotator's ID **must** be included among the model's inputs.
- The annotator's ID **cannot** be passed directly as an input; integer indices must be used instead. Reason: HF `Dataset` objects return batches of samples as lists, which a `DataCollator` object then converts to PyTorch tensors, and this conversion doesn't work for a list of strings. Therefore, **an explicit mapping between the `worker_id` field in the dataset (string) and integer IDs must be created**.
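
One way to build such a mapping is `pd.factorize`, sketched below on made-up data. The toy `worker_id` strings and the dictionary names are assumptions for illustration; the `annotator_id` column already present in the dataset used here may have been produced differently.

```python
import pandas as pd

toy_annotations = pd.DataFrame({
    'worker_id': ['dbc5', '2448', 'dbc5', 'a1f0'],  # Made-up string IDs.
    'toxic_score': [1, 0, 0, 1],
})

# Integer codes are assigned in order of first appearance.
codes, unique_worker_ids = pd.factorize(toy_annotations['worker_id'])
toy_annotations['annotator_id'] = codes

# Explicit mappings in both directions, to be stored alongside the model.
worker_id_to_annotator_id = {w: i for i, w in enumerate(unique_worker_ids)}
annotator_id_to_worker_id = dict(enumerate(unique_worker_ids))

print(toy_annotations)
```

The mapping should be built once on the training data and reused as-is on the test data, so that each integer ID keeps pointing to the same annotator-specific head.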

Explore the architecture of the DeBERTa model (`DebertaV2ForSequenceClassification` object).

In [10]:
def send_tensor_dict_to_device(tensor_dict, device):
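    """
    Move every tensor in `tensor_dict` (e.g. the output of a
    tokenizer called with `return_tensors='pt'`) to `device`
    and return the result as a new dict with the same keys.
    """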
    return {
        k: v.to(device=device)
        for k, v in tensor_dict.items()
    }

In [11]:
random_samples = training_data['comment'].sample(7).tolist()
random_samples_tokenized = send_tensor_dict_to_device(
    deberta_tokenizer(
        random_samples,
        padding=True,
        return_tensors='pt'
    ),
    device
)

# Get the latent representation for each token in each sequence
# in the batch via the DeBERTa encoder.
with torch.no_grad():
    tokens_latent_reps = deberta_model.deberta(**random_samples_tokenized)['last_hidden_state']

# Shape: (batch_size, seq_len, hidden_dim).
tokens_latent_reps.shape

torch.Size([7, 54, 768])

In [12]:
# Note: the encoder returns a `BaseModelOutput` object, which
#       behaves like an OrderedDict but also supports integer
#       indexing, so `['last_hidden_state']` and `[0]` coincide.
with torch.no_grad():
    print(
        (deberta_model.deberta(**random_samples_tokenized)['last_hidden_state']
         == deberta_model.deberta(**random_samples_tokenized)[0]).all()
    )

tensor(True, device='cuda:0')


In [13]:
# Get logits from the DeBERTa model by applying
# sequentially: encoder -> pooler -> dropout -> classifier.
with torch.no_grad():
    test_logits = deberta_model.classifier(
        deberta_model.dropout(
            deberta_model.pooler(
                deberta_model.deberta(**random_samples_tokenized)[0]
            )
        )
    )

# Shape: (batch_size, num_labels).
test_logits

tensor([[-0.0290, -0.1493],
        [-0.0317, -0.1563],
        [-0.0439, -0.1568],
        [-0.0301, -0.1575],
        [-0.0301, -0.1561],
        [-0.0308, -0.1580],
        [-0.0412, -0.1559]], device='cuda:0')

Test the DeBERTa model with annotator-specific heads.

In [14]:
from models import DebertaWithAnnotatorHeads
from copy import deepcopy

In [22]:
test_annotator_ids = [0, 1]

deberta_with_annotator_heads_model = DebertaWithAnnotatorHeads(
    deberta_encoder=deepcopy(deberta_model.deberta),
    deberta_pooler=deepcopy(deberta_model.pooler),
    deberta_dropout=deepcopy(deberta_model.dropout),
    num_labels=num_labels,
    annotator_ids=test_annotator_ids,
)

In [23]:
test_samples = pd.concat([
    training_data[training_data['annotator_id'] == annotator_id].iloc[:2]
    for annotator_id in test_annotator_ids
])
test_samples_tokenized = send_tensor_dict_to_device(
    deberta_tokenizer(test_samples['comment'].tolist(), padding=True, return_tensors='pt'),
    device
)

test_samples

| Unnamed: 0 | comment | text_id | worker_id | toxic_score | extreme_annotator | annotator_id |
|---|---|---|---|---|---|---|
| 0 | Just a matter of time before pick up on this s... | 0 | 24482c451b411b96d2c2880bafbab9884007e000d143c0... | 0 | no | 0 |
| 5 | this is QUINN you DUMBASS 😭😭😭 | 1 | 24482c451b411b96d2c2880bafbab9884007e000d143c0... | 0 | no | 0 |
| 1 | Just a matter of time before pick up on this s... | 0 | dbc501198ada6725d8e8cc6f0101824f04d4b4b8935059... | 0 | no | 1 |
| 6 | this is QUINN you DUMBASS 😭😭😭 | 1 | dbc501198ada6725d8e8cc6f0101824f04d4b4b8935059... | 1 | no | 1 |


In [26]:
deberta_with_annotator_heads_model.eval()
# deberta_with_annotator_heads_model.train()

with torch.no_grad():
    test_logits = deberta_with_annotator_heads_model(
        **test_samples_tokenized,
        annotator_ids=test_samples['annotator_id'].tolist()
    )

# Dict with 'logits' of shape (batch_size, num_labels).
test_logits

{'logits': tensor([[-0.0964, -0.0032],
         [-0.0991,  0.0088],
         [-0.1300,  0.0346],
         [-0.1335,  0.0504]], device='cuda:0')}

In [28]:
# Forward pass including the labels among the inputs.
with torch.no_grad():
    test_logits = deberta_with_annotator_heads_model(
        **test_samples_tokenized,
        annotator_ids=test_samples['annotator_id'].tolist(),
        labels=torch.tensor(test_samples['toxic_score'].values).to(device=device)
    )

# Dict with 'loss' (scalar) and 'logits' of shape (batch_size, num_labels).
test_logits

{'loss': tensor(0.7184, device='cuda:0'),
 'logits': tensor([[-0.0964, -0.0032],
         [-0.0991,  0.0088],
         [-0.1300,  0.0346],
         [-0.1335,  0.0504]], device='cuda:0')}

Model training.

Source for building custom models compatible with the Hugging Face Transformers framework:
- [Hugging Face custom models](https://huggingface.co/docs/transformers/custom_models)
- [Related discussion](https://discuss.huggingface.co/t/using-huggingface-trainer-for-custom-models/16882/6)
- [Resources](https://discuss.huggingface.co/t/resources-for-using-custom-models-with-trainer/4151)

In [29]:
import transformers
import datasets
from training_metrics import compute_metrics_sklearn
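
`compute_metrics_sklearn` comes from the local `training_metrics` module and is not shown in this notebook. As a rough idea of what a `Trainer`-compatible metrics function can look like, here is a minimal sketch. The name `toy_compute_metrics`, the macro averaging and the returned keys are assumptions, not the actual implementation.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def toy_compute_metrics(eval_pred):
    """Compute accuracy and macro-averaged precision/recall/F1.

    `eval_pred` unpacks into (logits, labels), as the `Trainer`
    passes predictions and label IDs to `compute_metrics`.
    """
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)

    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='macro', zero_division=0
    )

    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall,
    }


# Toy check: predictions are [0, 1, 0, 1] against labels [0, 1, 1, 1].
toy_metrics = toy_compute_metrics((
    np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]]),
    np.array([0, 1, 1, 1]),
))
print(toy_metrics)
```

The `'f1'` key matters because `metric_for_best_model` in `TrainingArguments` refers to a metric by the name returned here.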

In [30]:
def tokenize_function(examples):
    return deberta_tokenizer(
        examples["text"],
        padding='max_length',
        truncation=True,
        max_length=512,
        # return_tensors='pt'
    )

In [31]:
# For testing.
test_ds = datasets.Dataset.from_dict(
    test_samples[[
        'comment',
        'toxic_score',
        'annotator_id'
    ]].rename(
        columns={
            'comment': 'text',
            'toxic_score': 'label',
            'annotator_id': 'annotator_ids',
        }
    )
    .to_dict(orient='list')
)

tokenized_test_ds = (
    test_ds
    .map(tokenize_function, batched=True)
    .remove_columns("text")
)

tokenized_test_ds

Map: 100%|██████████| 4/4 [00:00<00:00, 703.77 examples/s]


Dataset({
    features: ['label', 'annotator_ids', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 4
})

In [38]:
EXPERIMENT_ID = 'sepheads_model_test'
MODEL_OUTPUT_DIR = f'/data1/moscato/personalised-hate-boundaries-data/models/{EXPERIMENT_ID}/'
N_EPOCHS = 10

training_args = transformers.TrainingArguments(
    output_dir=MODEL_OUTPUT_DIR,
    eval_strategy="epoch",
    save_strategy="no",  # Options: 'no', 'epoch', 'steps' (requires the `save_steps` argument to be set though).
    save_total_limit=2,
    load_best_model_at_end=False,
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # Default: 8.
    gradient_accumulation_steps=1,  # Default: 1.
    per_device_eval_batch_size=4,  # Default: 8.
    num_train_epochs=N_EPOCHS,
    warmup_ratio=0.0,  # For linear warmup of learning rate.
    metric_for_best_model="f1",
    push_to_hub=False,
    # label_names=list(roberta_classifier.config.id2label.keys()),
    logging_strategy='epoch',
    logging_first_step=True,
    logging_dir=None,
    # logging_steps=10,
    disable_tqdm=False
)

data_collator = transformers.DataCollatorWithPadding(tokenizer=deberta_tokenizer)

trainer = transformers.Trainer(
    model=deberta_with_annotator_heads_model,
    args=training_args,
    train_dataset=tokenized_test_ds,
    eval_dataset=tokenized_test_ds,
    data_collator=data_collator,
    tokenizer=deberta_tokenizer,
    compute_metrics=compute_metrics_sklearn,
)

training_output = trainer.train()

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.6728 | 0.674811 | 0.5 | 0.5 | 0.666667 | 0.666667 |
| 2 | 0.6909 | 0.667037 | 0.5 | 0.5 | 0.666667 | 0.666667 |
| 3 | 0.6655 | 0.660247 | 0.75 | 0.733333 | 0.75 | 0.833333 |
| 4 | 0.7255 | 0.653036 | 0.75 | 0.733333 | 0.75 | 0.833333 |
| 5 | 0.6683 | 0.645996 | 0.75 | 0.733333 | 0.75 | 0.833333 |
| 6 | 0.6705 | 0.63959 | 0.75 | 0.733333 | 0.75 | 0.833333 |
| 7 | 0.6348 | 0.634191 | 1.0 | 1.0 | 1.0 | 1.0 |
| 8 | 0.6373 | 0.630148 | 1.0 | 1.0 | 1.0 | 1.0 |
| 9 | 0.5996 | 0.627633 | 1.0 | 1.0 | 1.0 | 1.0 |
| 10 | 0.6801 | 0.626387 | 1.0 | 1.0 | 1.0 | 1.0 |




## Annotator encoding

In [17]:
from sklearn.preprocessing import OneHotEncoder

One-hot encoding of the annotators' categorical features.

In [22]:
annotators_categorical_features = [
    'gender',
    'race',
    'age',
    'occupation',
    'education'
]

In [32]:
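# Assumption: the encoder was instantiated in a cell not shown here.
# Dense output is needed to build a DataFrame from the result
# (`sparse_output` requires scikit-learn >= 1.2).
one_hot_encoder = OneHotEncoder(sparse_output=False)

data_df_one_hot = pd.concat(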
    [
        data_df.drop(columns=annotators_categorical_features),
        pd.DataFrame(
            one_hot_encoder.fit_transform(data_df[annotators_categorical_features]),
            columns=one_hot_encoder.get_feature_names_out()
        )
    ],
    axis=1
)

data_df_one_hot

Unnamed: 0,instance_id,user_id,text,offensiveness,gender_Man,gender_Non-binary,gender_Woman,race_Arab American,race_Asian,race_Black or African American,...,occupation_Prefer not to disclose,occupation_Retired,occupation_Self-employed,occupation_Student,occupation_Unemployed,education_College degree,education_Graduate degree,education_High school diploma or equivalent,education_Less than a high school diploma,education_Other
0,530,0,"I think a lot of Dethklok songs use drop C, wo...",1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
1,1280,0,There are relatively simple ways around all of...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
2,621,0,Tell the british soldier in WW1 to shoot that ...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
3,676,0,Top comment pretty much. I have gay friends an...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
4,635,0,Don't tell them just let them and their liniag...,3.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13031,471,262,They’re closed anti-vaxx Facebook groups. When...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
13032,1033,262,Bioethics; an interesting field in which the w...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
13033,740,262,Or they are really secure but hang around inse...,5.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
13034,894,262,Don't have to worry about being too big to fit...,4.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


In [29]:
one_hot_encoder.get_feature_names_out()

array(['gender_Man', 'gender_Non-binary', 'gender_Woman',
       'race_Arab American', 'race_Asian',
       'race_Black or African American', 'race_Hispanic or Latino',
       'race_Native American', 'race_White', 'age_18-24', 'age_25-29',
       'age_30-34', 'age_35-39', 'age_40-44', 'age_45-49', 'age_50-54',
       'age_54-59', 'age_60-64', 'age_>65', 'occupation_Employed',
       'occupation_Homemaker', 'occupation_Other',
       'occupation_Prefer not to disclose', 'occupation_Retired',
       'occupation_Self-employed', 'occupation_Student',
       'occupation_Unemployed', 'education_College degree',
       'education_Graduate degree',
       'education_High school diploma or equivalent',
       'education_Less than a high school diploma', 'education_Other'],
      dtype=object)