# Text Detoxification with T5 Model

This notebook fine-tunes a sequence-to-sequence T5 model on a parallel corpus of toxic and neutral sentences for the task of text detoxification. The goal is to rewrite toxic sentences as neutral ones while preserving their original meaning.

## Setup and Data Loading
First, we load the dataset, which pairs each toxic sentence with a neutral counterpart.

## Splitting the Data
The training file is subsampled and split: 20% of its rows are used for training and 4% are held out as a validation set for evaluating the model.


In [None]:
from datasets import load_dataset

# Define the paths to the training and testing datasets
data_files = {
    "train": "../data/interim/training.csv",
    "test": "../data/interim/testing.csv"
}

# Load the dataset from CSV files
toxic_dataset = load_dataset("csv", data_files=data_files)

# Keep 20% of the rows for training and 4% for validation (the remaining rows are unused)
train_validation_split = toxic_dataset["train"].train_test_split(train_size=0.20, test_size=0.04, seed=20)
train_validation_split["validation"] = train_validation_split.pop("test")
train_validation_split

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'toxic', 'neutral', 'toxicity score', 'toxicity of neutral score'],
        num_rows: 92444
    })
    validation: Dataset({
        features: ['Unnamed: 0', 'toxic', 'neutral', 'toxicity score', 'toxicity of neutral score'],
        num_rows: 18489
    })
})

In [None]:
# Viewing an example from the training set
train_validation_split["train"][2]

{'Unnamed: 0': 369772,
 'toxic': 'damn it, Miel, stop pulling that thing.',
 'neutral': 'Christ, Miel, stop picking away at that thing.',
 'toxicity score': 0.99936705827713,
 'toxicity of neutral score': 0.000439585361164}

## Tokenization
Here, we load the tokenizer for our sequence-to-sequence model and demonstrate tokenization on a sample sentence.


In [None]:
from transformers import AutoTokenizer

# Load the tokenizer for the style transfer model
tokenizer = AutoTokenizer.from_pretrained("rajistics/informal_formal_style_transfer")

# Example of tokenization, using a toxic sentence and its own neutral counterpart
example = train_validation_split["train"][2]
input_sentence = example["toxic"]
target = example["neutral"]
tokenized_output = tokenizer(input_sentence, text_target=target)
tokenizer.convert_ids_to_tokens(tokenized_output["input_ids"])

['▁damn',
 '▁it',
 ',',
 '▁Mi',
 'e',
 'l',
 ',',
 '▁stop',
 '▁pulling',
 '▁that',
 '▁thing',
 '.',
 '</s>']
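
The `▁` prefix in the output marks the start of a new word in the SentencePiece vocabulary, and `</s>` is the end-of-sequence token. As a rough illustration, the pieces can be joined back into the original sentence. The helper `pieces_to_text` is a name made up for this sketch; it is a simplified stand-in for `tokenizer.decode`, not the library's implementation.

```python
def pieces_to_text(pieces, eos_token="</s>"):
    """Join SentencePiece-style pieces back into a plain string (simplified)."""
    # Drop the end-of-sequence marker, then turn word-start markers into spaces
    kept = [piece for piece in pieces if piece != eos_token]
    return "".join(kept).replace("▁", " ").strip()


# Token list copied from the tokenizer output above
pieces = ['▁damn', '▁it', ',', '▁Mi', 'e', 'l', ',', '▁stop',
          '▁pulling', '▁that', '▁thing', '.', '</s>']
text = pieces_to_text(pieces)
print(text)
```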

## Preprocessing
We define a function for preprocessing our dataset which includes tokenization and formatting required for our seq2seq model.


In [None]:
# Define the maximum token length
max_token_length = 128

def preprocessing_function(examples):
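    # Tokenize the toxic inputs and the neutral targets, padding both to a fixed length
    inputs = examples["toxic"]
    targets = examples["neutral"]
    model_inputs = tokenizer(inputs, padding="max_length", truncation=True, max_length=max_token_length)
    labels = tokenizer(text_target=targets, padding="max_length", truncation=True, max_length=max_token_length)["input_ids"]
    # Replace padding in the labels with -100 so the loss ignores those positions
    model_inputs["labels"] = [
        [token if token != tokenizer.pad_token_id else -100 for token in sequence]
        for sequence in labels
    ]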
    return model_inputs

# Apply the preprocessing function to the datasets
tokenized_datasets = train_validation_split.map(
    preprocessing_function,
    batched=True,
    remove_columns=train_validation_split["train"].column_names
)
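
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of fixed-length padding with an attention mask. The helper `pad_or_truncate` is hypothetical, and `pad_id=0` is an assumption for illustration. The real tokenizer also takes care to keep the end-of-sequence token when truncating, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    clipped = token_ids[:max_length]          # truncation
    n_pad = max_length - len(clipped)         # number of padding positions
    input_ids = clipped + [pad_id] * n_pad
    attention_mask = [1] * len(clipped) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([11, 12, 1], max_length=5)
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, 1], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```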


## Model Initialization
We load the pre-trained seq2seq model which will be fine-tuned for our task.


In [None]:
from transformers import AutoModelForSeq2SeqLM

# Load the pre-trained seq2seq model
model = AutoModelForSeq2SeqLM.from_pretrained("rajistics/informal_formal_style_transfer")

## Model Freezing
To limit overfitting and speed up training, we freeze the first 95% of the encoder and decoder blocks (rounded down) together with the shared token embeddings, leaving only the final blocks trainable.

In [None]:
# Freeze the first 95% of the encoder and decoder blocks (rounded down),
# whatever the number of blocks in the loaded checkpoint

total_encoder_layers = len(model.encoder.block)
total_decoder_layers = len(model.decoder.block)

# Calculate 95% of the layers to freeze
num_encoder_layers_to_freeze = int(total_encoder_layers * 0.95)
num_decoder_layers_to_freeze = int(total_decoder_layers * 0.95)

# Freeze 95% of the encoder layers
for layer in model.encoder.block[:num_encoder_layers_to_freeze]:
    for param in layer.parameters():
        param.requires_grad = False

# Freeze 95% of the decoder layers
for layer in model.decoder.block[:num_decoder_layers_to_freeze]:
    for param in layer.parameters():
        param.requires_grad = False

# Also freeze the shared token embeddings
for param in model.shared.parameters():
    param.requires_grad = False
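
Because `int()` truncates, "95%" leaves very few blocks trainable. The sketch below works through the arithmetic for a few hypothetical stack sizes (6, 12, and 24 blocks are used purely as examples); `blocks_to_freeze` is a helper name introduced here, not part of the notebook.

```python
def blocks_to_freeze(total_blocks, fraction=0.95):
    """Number of leading blocks frozen by int(total_blocks * fraction)."""
    return int(total_blocks * fraction)


# Hypothetical stack sizes, for illustration only
for total in (6, 12, 24):
    frozen = blocks_to_freeze(total)
    print(f"{total} blocks: freeze {frozen}, keep {total - frozen} trainable")
```

To see the effect on a loaded model, one option is to compare `sum(p.numel() for p in model.parameters() if p.requires_grad)` with the total parameter count.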


## Data Collator
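We instantiate a seq2seq data collator, which pads inputs and labels to the longest sequence in each batch (a no-op here, since preprocessing already pads to `max_token_length`) and leaves the target labels intact. A language-modeling collator would not work for this task, because it replaces the labels with a copy of the input IDs.


In [None]:
from transformers import DataCollatorForSeq2Seq

# Initialize the data collator; it keeps the neutral targets as labels
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer)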

# Example of creating a batch
sample_batch = data_collator([tokenized_datasets["train"][i] for i in range(10, 13)])
sample_batch.keys()


dict_keys(['input_ids', 'attention_mask', 'labels'])
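
Per-batch (dynamic) padding can be sketched in plain Python. This is an illustrative stand-in, not the `transformers` implementation: the function `collate` and its default values (`pad_id=0`, `label_pad_id=-100`) are assumptions made for the example, and the real collator returns tensors rather than lists.

```python
def collate(examples, pad_id=0, label_pad_id=-100):
    """Pad every field to the longest sequence in this batch."""
    max_input = max(len(ex["input_ids"]) for ex in examples)
    max_label = max(len(ex["labels"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        n_in = max_input - len(ex["input_ids"])
        n_lab = max_label - len(ex["labels"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * n_in)
        batch["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * n_in)
        # Labels are padded with -100 so the loss skips those positions
        batch["labels"].append(ex["labels"] + [label_pad_id] * n_lab)
    return batch


batch = collate([
    {"input_ids": [5, 6, 1], "labels": [7, 1]},
    {"input_ids": [8, 1], "labels": [9, 10, 11, 1]},
])
print(batch)
```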

## Style Transfer Accuracy Metric
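We import the `StyleTransferAccuracy` helper class from the project sources and define a `compute_metrics` function that reports the average style transfer accuracy of the decoded predictions, as scored by a pre-trained toxicity classifier.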


In [None]:
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import numpy as np
import tqdm
import torch
import sys
sys.path.append("../src/")
from metric.style_trasnfer_accuracy import StyleTransferAccuracy

# Initialize the accuracy measuring class
style_transfer_accuracy = StyleTransferAccuracy()

# Define the function to compute metrics for evaluation
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]

    # Replace -100 (used as padding when batches are gathered) so the IDs can be decoded
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)

    # Decode the predictions
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    
    # Calculate style accuracy
    style_accuracy = style_transfer_accuracy.classify_preds(batch_size=32, preds=decoded_preds)

    # Calculate the average style accuracy
    average_style_accuracy = sum(style_accuracy) / len(style_accuracy)

    print(average_style_accuracy)
    # Return the metric as a dictionary
    return {"average_style_accuracy": average_style_accuracy}


Some weights of the model checkpoint at SkolkovoInstitute/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
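
The averaging step in `compute_metrics` can be illustrated without the real classifier. The sketch below assumes `classify_preds` yields one numeric score per sentence, which is inferred from the `sum(...) / len(...)` call and not confirmed by the class's source. The word-list classifier `toy_is_neutral` is a deliberately crude, hypothetical stand-in, and the empty-input guard is an addition for the sketch.

```python
def average_accuracy(per_sentence_flags):
    """Mean of per-sentence flags (1 = judged neutral, 0 = judged toxic)."""
    if not per_sentence_flags:
        return 0.0  # guard added for this sketch
    return sum(per_sentence_flags) / len(per_sentence_flags)


# Hypothetical stand-in for the toxicity classifier: a tiny word list
BLOCKLIST = {"damn", "stupid"}


def toy_is_neutral(sentence):
    words = {word.strip(".,!?").lower() for word in sentence.split()}
    return int(words.isdisjoint(BLOCKLIST))


preds = ["Please stop pulling that thing.", "damn it, stop.", "That was unwise."]
flags = [toy_is_neutral(p) for p in preds]
score = average_accuracy(flags)
print(flags, score)
```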


## Training
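We set up the training arguments, initialize the trainer, and start training. Because `evaluation_strategy="no"`, the validation set and `compute_metrics` are not used during training; they only come into play if `trainer.evaluate()` is called explicitly.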


In [None]:
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Training arguments
training_args = Seq2SeqTrainingArguments(
    "T5-detoxification",
    evaluation_strategy="no",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
    push_to_hub=False,
    logging_steps=100,
)

# Initialize the trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)


In [None]:
# Start the training process
trainer.train()

| Step | Training Loss |
|------|---------------|
| 100  | 0.059         |
| 200  | 0.0475        |
| 300  | 0.0547        |
| 400  | 0.0418        |
| 500  | 0.037         |
| 600  | 0.0404        |
| 700  | 0.0315        |
| 800  | 0.0305        |
| 900  | 0.028         |
| 1000 | 0.0387        |


TrainOutput(global_step=8667, training_loss=0.02780443766241711, metrics={'train_runtime': 3677.8125, 'train_samples_per_second': 75.407, 'train_steps_per_second': 2.357, 'total_flos': 4.222087742619648e+16, 'train_loss': 0.02780443766241711, 'epoch': 3.0})
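
As a sanity check, the reported step count and throughput can be reproduced from the split size and the training arguments. The sketch assumes a single device and no gradient accumulation, which the notebook does not state explicitly; all numbers are copied from the outputs above.

```python
import math

train_rows = 92444       # rows in the training split
batch_size = 32          # per_device_train_batch_size
epochs = 3               # num_train_epochs
runtime_s = 3677.8125    # train_runtime reported by the trainer

# The last batch of each epoch is partial, hence the ceiling
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
steps_per_second = total_steps / runtime_s
samples_per_second = train_rows * epochs / runtime_s

print(total_steps, round(steps_per_second, 3), round(samples_per_second, 3))
```

These values agree with `global_step`, `train_steps_per_second`, and `train_samples_per_second` in the `TrainOutput`.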

## Saving the Model
After training, we save the model to a specified path for future use.


In [None]:
model_path = r"../models/final_solution"
trainer.save_model(model_path)

## Inference
We load the trained model and tokenizer for inference, preparing and testing with a sample prompt.


In [None]:
# Load the trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move model to the appropriate device (GPU or CPU)
model.to(device)

# Prepare a sample prompt
prompt = "this model is shit"

# Tokenize and prepare the input tensors
inputs = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=128)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Generate the output sequence without updating model weights
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=128, num_return_sequences=1)

# Decode the generated sequence to text
detoxified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Detoxified text: {detoxified_text}")


Detoxified text: this model is shit.
