## Exploration of existing solutions
The given task is an example of a text2text problem, for which a variety of models have already been developed. The most straightforward solution is to take an existing text2text model and fine-tune it on the provided dataset.

Searching online, I found a [general text2text model](https://huggingface.co/t5-small) that is designed to be fine-tuned for different kinds of tasks. In the original [paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Raffel et al. tested the model on a variety of datasets and tasks, some of which are more complicated than the detoxification task. This suggests the model should be good enough for the task at hand.

The model is available on HuggingFace, and I'll be using their set of libraries to work with it. They provide a framework for translation tasks, so I'll follow their [example training pipeline](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb).

In [1]:
# load the cleaned dataset from previous notebook
import numpy as np
import pandas as pd

df = pd.read_csv("processed.csv")
df.head()

| | id | toxic | detoxified | tox_score | detox_score | similarity | length_diff |
|---|---|---|---|---|---|---|---|
| 0 | 0 | if Alkar floods her with her mental waste, it ... | If Alkar is flooding her with psychic waste, t... | 0.981983 | 0.014195 | 0.785171 | 0.010309 |
| 1 | 1 | you're becoming disgusting. | Now you're getting nasty. | 0.999039 | 0.065473 | 0.749687 | 0.071429 |
| 2 | 2 | well, we can spare your life. | Well, we could spare your life, for one. | 0.985068 | 0.213313 | 0.919051 | 0.268293 |
| 3 | 3 | monkey, you have to wake up. | Ah! Monkey, you've got to snap out of it. | 0.994215 | 0.053362 | 0.664333 | 0.309524 |
| 4 | 4 | I have orders to kill her. | I've got orders to put her down. | 0.999348 | 0.009402 | 0.726639 | 0.181818 |


In [2]:
# load the model
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
# test that it runs
input_ids = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt").input_ids
outputs = model.generate(input_ids)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))



Das Haus ist wunderbar.


In [4]:
# prepare the dataset
from datasets import Dataset
from sklearn.model_selection import train_test_split

NUM_VAL = 50000
NUM_TEST = 50000

df_text = df[['toxic','detoxified']].rename(columns={'toxic':'input','detoxified':'target'})
train, val = train_test_split(df_text, test_size=NUM_VAL / len(df_text), random_state=42)
train, test = train_test_split(train, test_size=NUM_TEST / len(train), random_state=42)

train_dataset = Dataset.from_dict(train.to_dict(orient='list'))
val_dataset = Dataset.from_dict(val.to_dict(orient='list'))
test_dataset = Dataset.from_dict(test.to_dict(orient='list'))
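
The split above is done in two stages: the validation set is held out from the full frame first, and the test set is then held out from what remains, so the second fraction is taken relative to the already reduced training set. A minimal sketch of that arithmetic, assuming the total of 577,777 rows implied by the `Map` progress bars further down (`split_sizes` is a hypothetical helper, not part of the notebook):

```python
def split_sizes(total, num_val, num_test):
    """Sizes produced by holding out validation first, then test."""
    remaining = total - num_val      # rows left after the validation hold-out
    train = remaining - num_test     # rows left after the test hold-out
    return {
        "train": train,
        "val": num_val,
        "test": num_test,
        # fractions as they would be passed to test_size in each stage
        "val_fraction": num_val / total,
        "test_fraction": num_test / remaining,
    }

# 577_777 is an assumption derived from 477777 + 50000 + 50000
sizes = split_sizes(577_777, 50_000, 50_000)
print(sizes["train"], sizes["val"], sizes["test"])  # 477777 50000 50000
```

Since `train_test_split` also accepts an integer `test_size` (an absolute number of samples), passing `NUM_VAL` and `NUM_TEST` directly would avoid the division altogether.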

In [5]:
# preprocess the dataset
max_input_length = 128
max_target_length = 128

def preprocess_function(examples):
    inputs = examples['input']
    targets = examples['target']
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Tokenize the targets; text_target replaces the deprecated as_target_tokenizer() context manager
    labels = tokenizer(text_target=targets, max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

In [6]:
train_processed = train_dataset.map(preprocess_function, batched=True)
val_processed = val_dataset.map(preprocess_function, batched=True)
# test_processed = test_dataset.map(preprocess_function, batched=True)

Map:   0%|          | 0/477777 [00:00<?, ? examples/s]



Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

In [7]:
# set up training arguments
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

batch_size = 16
args = Seq2SeqTrainingArguments(
    "T5-Small-finetuned-detoxification",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    predict_with_generate=True,
    fp16=True, # set to True if you have CUDA, False if CUDA is not available
)

In [8]:
# set up the data collator to pad the inputs and labels
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
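
As a rough illustration of what the collator does to the labels: sequences in a batch are padded to a common length, and label padding uses `-100` (the default `label_pad_token_id` of `DataCollatorForSeq2Seq`) so that padded positions are ignored by the loss. The sketch below mimics that padding in plain Python; `pad_labels` is a hypothetical helper written for illustration, not the library's implementation, and the token ids are made up.

```python
def pad_labels(label_batch, pad_value=-100):
    """Right-pad every label sequence to the length of the longest one."""
    max_len = max(len(labels) for labels in label_batch)
    return [labels + [pad_value] * (max_len - len(labels)) for labels in label_batch]

# Toy token ids, not real tokenizer output
toy_batch = [[71, 5, 1], [9, 1]]
print(pad_labels(toy_batch))  # [[71, 5, 1], [9, 1, -100]]
```

This is also why the metric code has to map `-100` back to the pad token id before decoding the labels.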

In [9]:
# set up the metrics for the training process.
# load_metric is deprecated in datasets in favour of the evaluate library
import evaluate

metric = evaluate.load("sacrebleu") # using the metric from the example

def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [[label.strip()] for label in labels]

    return preds, labels

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result



In [10]:
# set up the trainer itself
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=train_processed,
    eval_dataset=val_processed,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

My laptop is not really capable of running the training, so the training itself is done in Colab.

In [None]:
# run the training; this takes some time
trainer.train()

In [None]:
# save the model
trainer.save_model('T5Small-detoxification')

In [29]:
# load the model from file
tokenizer_trained = T5Tokenizer.from_pretrained("T5Small-detoxification")
model_trained = T5ForConditionalGeneration.from_pretrained('T5Small-detoxification')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [43]:
# check that it works on part of the test dataset
inputs = []   # renamed from `input` to avoid shadowing the builtin
results = []
targets = []
N = 50

for i in range(N):
    example = test_dataset[i]
    input_ids = tokenizer_trained(example['input'], return_tensors="pt").input_ids
    outputs = model_trained.generate(input_ids, max_new_tokens=100)
    inputs.append(example['input'])
    # decode with the tokenizer that was loaded alongside the fine-tuned model
    results.append(tokenizer_trained.decode(outputs[0], skip_special_tokens=True))
    targets.append(example['target'])

pd.DataFrame({'input': inputs, 'result': results, 'target': targets}).head(N)

| | input | result | target |
|---|---|---|---|
| 0 | so far, I have only planned to kill the Theban... | I only planned to kill the Thebans, and he org... | I have only planned and organized the killing ... |
| 1 | What the fuck are you gonna do, huh? | what are you going to do, huh? | what are you gonna do with me? |
| 2 | Or that thing that chopped you up like little ... | or that thing that smashed you like a little p... | and the thing that cut you in slices like cutl... |
| 3 | Shut up, I'm thinking aloud. | I'm thinking loud. | shut up, I'm thinking out loud |
| 4 | A reliable source from inside Palmer's campaig... | a reliable source from Palmer's campaign... co... | a source inside Senator Palmer's campaign earl... |
| 5 | you have terrible communication skills. | You have terrible communication skills. | You have terrible communication skills. I do? |
| 6 | I desire to catch her with the magic she pursu... | I want to catch her with the magic she pursues... | I want to catch her with the magic she craves,... |
| 7 | Where in the fuck do you expect me to go? | where do you expect me to go? | where do you want me to go? |
| 8 | Leeches. ...put that new boxtroll to work. | ...put this new boxtroll to work. | the leeches.... to bring the new Shataturan to... |
| 9 | How the hell do you set a catapult? | how do you set a catapult? | how the hell does a catapult be prepared? |


As you can see, the model shows some ability to detoxify the provided sentences. It is clearly capable of filtering out swear words and doing some paraphrasing. It is very hesitant to make drastic changes to the text, and avoids removing words like 'kill', 'damned' and 'hell'. Some of the shorter and more aggressive sentences, however, are butchered just as they are in the original dataset, e.g. "I'm fucking his wife." -> "I'm gonna be his wife."
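
These observations are qualitative; one way to put rough numbers on them would be to count how often an output still contains a word from a watch-list and how often it is returned essentially unchanged. A minimal sketch, assuming a watch-list built from the three words mentioned above and using rows copied from the table (the first pair is truncated exactly as the table displays it). `retained_words` and `is_unchanged` are hypothetical helpers, not part of the notebook.

```python
import re

WATCH_LIST = {"kill", "damned", "hell"}  # illustrative only, taken from the prose above

def retained_words(text, watch_list=WATCH_LIST):
    """Return the watch-list words that still appear in the text (case-insensitive)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(tokens & set(watch_list))

def is_unchanged(source, output):
    """True when the output differs from the source only by case or surrounding whitespace."""
    return source.strip().lower() == output.strip().lower()

pairs = [
    # truncated with "..." as displayed in the results table
    ("so far, I have only planned to kill the Theban...", "I only planned to kill the Thebans, and he org..."),
    ("you have terrible communication skills.", "You have terrible communication skills."),
    ("How the hell do you set a catapult?", "how do you set a catapult?"),
]

for source, output in pairs:
    print(retained_words(output), is_unchanged(source, output))
# ['kill'] False
# [] True
# [] False
```

Run over the whole test set, the same two checks would give a retention rate and a copy rate to compare between training runs.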

Regardless, the method works, and what remains is a matter of tuning. Increasing the learning rate or the number of epochs, or adding a task prefix to the inputs, could improve the model's performance. Trying another model is also an option.
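
The prefix idea follows the way T5 is prompted in the sanity check earlier (`translate English to German: ...`). A minimal sketch of how a prefix could be added before tokenization, assuming the `input`/`target` column names used above. The prefix string `detoxify: ` and the helper `add_prefix` are hypothetical choices, not something the model already knows.

```python
PREFIX = "detoxify: "  # hypothetical prefix; any short, consistent string would do

def add_prefix(examples, prefix=PREFIX):
    """Prepend the task prefix to every input sentence in a batch."""
    return {"input": [prefix + text for text in examples["input"]]}

batch = {"input": ["you're becoming disgusting."], "target": ["Now you're getting nasty."]}
print(add_prefix(batch)["input"])  # ["detoxify: you're becoming disgusting."]
```

A function of this shape can be passed to `Dataset.map(..., batched=True)` before `preprocess_function`; the same prefix would then have to be prepended to every sentence at inference time as well.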