# Train a DistilBERT base model (uncased) for toxicity classification

In this notebook, we will train a [DistilBERT base model (uncased)](https://huggingface.co/distilbert-base-uncased) to classify text as toxic or non-toxic.

In [None]:
!pip install transformers datasets evaluate
!pip install accelerate==0.20.3

### Perform inference with the pre-trained model

The model available at [tensor-trek/distilbert-toxicity-classifier](https://huggingface.co/tensor-trek/distilbert-toxicity-classifier) was trained exactly as described in the rest of this notebook. You can use this model directly from the Hugging Face Hub with the code below and test it with sample texts.

In [30]:
from transformers import pipeline

text = ["This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three.", "I wish i could kill that bird, I hate it"]

classifier = pipeline("text-classification", model="tensor-trek/distilbert-toxicity-classifier")
classifier(text)


[{'label': 'NEUTRAL', 'score': 0.9995143413543701},
 {'label': 'TOXIC', 'score': 0.9622979164123535}]
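
The `score` in this output is a probability. For a classifier with two labels, the text-classification pipeline is understood to apply a softmax to the model's raw logits and report the most probable label. The sketch below illustrates that step with NumPy only; the logit values and the helper name `logits_to_prediction` are made up for illustration and are not taken from the model.

```python
import numpy as np

# Hypothetical label map, mirroring the NEUTRAL/TOXIC labels in the output above.
ID2LABEL = {0: "NEUTRAL", 1: "TOXIC"}

def logits_to_prediction(logits):
    """Turn one row of raw logits into a {'label', 'score'} dict via softmax."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    best = int(probs.argmax())
    return {"label": ID2LABEL[best], "score": float(probs[best])}

# Made-up logits for two texts (not real model outputs).
example_logits = [[4.0, -3.5], [-1.2, 2.0]]
predictions = [logits_to_prediction(row) for row in example_logits]
print(predictions)
```

Because the two probabilities sum to one, a reported score close to 1.0 means the other label received almost no probability mass.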

If you want to train a `distilbert-base-uncased` model yourself, follow the steps below.

### Data preparation

Download the [Jigsaw dataset](https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/data) from Kaggle, and place it in a folder named `data`. The `data` folder must contain the `train.csv`, `test.csv`, and `test_labels.csv` files. Once the data is downloaded, we will do some slicing and dicing, then use `Dataset` to load the data and prepare it for training our model.

In [None]:
import pandas as pd
df_train = pd.read_csv('./data/train.csv')
df_train.drop(['id'], inplace=True, axis=1)
df_train['label'] = df_train[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].max(axis=1)

new_df_train = df_train[['comment_text', 'label']].rename(columns={'comment_text': 'text'})
label_counts = new_df_train['label'].value_counts()
print(label_counts)

This new data frame condenses all the toxicity categories into a single `label` column, where `0` signifies `neutral` (no toxic content exists) and `1` signifies that the text contains toxic content. With this, we can train a binary text classification model.
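
To make the condensing step concrete, here is a small sketch on made-up rows. It assumes, as in the cell above, that each category column holds a 0/1 flag, so the row-wise maximum is 1 whenever at least one category is set.

```python
import pandas as pd

CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Three made-up rows: no flags set, one flag set, several flags set.
toy = pd.DataFrame(
    [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
    ],
    columns=CATEGORIES,
)

# The row-wise maximum of 0/1 flags is 1 if any flag is set, else 0.
toy["label"] = toy[CATEGORIES].max(axis=1)
labels = toy["label"].tolist()
print(labels)  # [0, 1, 1]
```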

In [None]:
new_df_train

Prepare test data

In [None]:
import pandas as pd
df_test = pd.read_csv('./data/test.csv')
df_test.drop(['id'], inplace=True, axis=1)

new_df_test = df_test.rename(columns={'comment_text': 'text'})
new_df_test
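
The test frame built above has no `label` column, and `test_labels.csv` is not used anywhere in this notebook. If you want a labeled test set, one possible approach is sketched below on toy frames. It assumes that `test_labels.csv` shares an `id` column with `test.csv`, carries the same six category columns as `train.csv`, and marks rows without usable labels with `-1`; check these assumptions against your download. The merge has to happen before the `id` column is dropped.

```python
import pandas as pd

CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy stand-ins for test.csv and test_labels.csv (made-up rows).
toy_test = pd.DataFrame(
    {"id": ["a", "b", "c"], "comment_text": ["hello there", "you idiot", "unscored row"]}
)
toy_test_labels = pd.DataFrame(
    [
        ["a", 0, 0, 0, 0, 0, 0],
        ["b", 1, 0, 0, 0, 1, 0],
        ["c", -1, -1, -1, -1, -1, -1],
    ],
    columns=["id"] + CATEGORIES,
)

# Join texts and labels on the shared id column.
merged = toy_test.merge(toy_test_labels, on="id", how="inner")

# Assumption: -1 marks rows without usable labels, so keep only rows with flags >= 0.
scored = merged[(merged[CATEGORIES] >= 0).all(axis=1)].copy()

# Collapse the category flags into one binary label, as done for the training data.
scored["label"] = scored[CATEGORIES].max(axis=1)
labeled_test = (
    scored[["comment_text", "label"]]
    .rename(columns={"comment_text": "text"})
    .reset_index(drop=True)
)
print(labeled_test)
```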

#### Create a Dataset

In [6]:
from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_pandas(new_df_train)
test_dataset = Dataset.from_pandas(new_df_test)

dataset = DatasetDict({
    'train': train_dataset,
    'test': test_dataset
})


In [None]:
dataset

#### Perform tokenization

Load a DistilBERT tokenizer to preprocess the `text` field.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_data = dataset.map(preprocess_function, batched=True)

Create a batch of examples using `DataCollatorWithPadding`. It’s more efficient to dynamically pad the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.

In [9]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
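
The saving from dynamic padding can be illustrated without any library. The sketch below is not how `DataCollatorWithPadding` is implemented; it only counts how many token slots are needed when each batch is padded to its own longest sequence versus padding everything to the dataset-wide maximum. The token ids, the batch size of two, the `pad_id` value, and the helper name `pad_batch` are all made up for illustration.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in one batch to that batch's longest sequence."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]

# Made-up token-id sequences of lengths 3, 5, 2 and 12.
sequences = [
    [101, 7, 102],
    [101, 7, 8, 9, 102],
    [101, 102],
    [101] + [5] * 10 + [102],
]

# Dynamic padding: batches of two, each padded to its own longest sequence.
batches = [sequences[i:i + 2] for i in range(0, len(sequences), 2)]
dynamic_tokens = sum(len(row) for batch in batches for row in pad_batch(batch))

# Static padding: every sequence padded to the dataset-wide maximum length.
static_tokens = sum(len(row) for row in pad_batch(sequences))

print(dynamic_tokens, static_tokens)  # 34 48
```

In this toy case only the batch containing the long sequence pays for its length, which is where the efficiency gain comes from.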

### Define evaluation function

In [10]:
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
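
The `compute_metrics` function receives raw logits, not class ids, which is why it takes an `argmax` first. The sketch below reproduces that logic with NumPy alone on made-up logits and labels, assuming column 0 corresponds to neutral and column 1 to toxic.

```python
import numpy as np

# Made-up logits for four examples; columns are assumed to be (neutral, toxic).
logits = np.array([[2.1, -1.0], [0.3, 0.9], [1.5, 0.2], [-0.7, 1.8]])
labels = np.array([0, 1, 1, 1])

# Pick the index of the larger logit in each row, then compare with the labels.
predicted = np.argmax(logits, axis=1)
accuracy_value = float((predicted == labels).mean())

print(predicted.tolist(), accuracy_value)  # [0, 1, 0, 1] 0.75
```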

We will create a map from the labels to their respective IDs and vice versa. Remember `0` is neutral and `1` is toxic.

In [11]:
id2label = {1: "TOXIC", 0: "NEUTRAL"}
label2id = {"TOXIC":1, "NEUTRAL": 0}

### Train the model

In this step, we train the model and save it locally. Remember to set the `output_dir` argument to the directory where the model artifacts should be saved. Training is faster on GPU instances, but CPU instances should work as well.

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
)

training_args = TrainingArguments(
    output_dir="my-toxicity-classifier",
    label_names=["label"],
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.save_model()
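
As prepared in this notebook, the test split appears to contain only a `text` column, so accuracy cannot be computed on it during evaluation. One option, which is an assumption here and not part of the original training run, is to hold out a slice of the labeled training rows and pass that slice as `eval_dataset`. The sketch below shows such a split on a toy frame; the 20% fraction, the seed, and the frame contents are arbitrary choices for illustration.

```python
import pandas as pd

# Toy stand-in for the labeled training frame (made-up rows).
toy_train = pd.DataFrame(
    {
        "text": [f"comment {i}" for i in range(10)],
        "label": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    }
)

# Hold out 20% of the labeled rows for validation; fixed seed for repeatability.
valid_df = toy_train.sample(frac=0.2, random_state=42)
train_df = toy_train.drop(valid_df.index)

# Reset the indexes so no leftover index column is carried into later steps.
valid_df = valid_df.reset_index(drop=True)
train_df = train_df.reset_index(drop=True)

print(len(train_df), len(valid_df))  # 8 2
```

Both frames could then be converted with `Dataset.from_pandas` and tokenized in the same way as the splits above.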

### Perform inference with the trained model

In [15]:
from transformers import pipeline

toxicity = pipeline("text-classification", model="./my-toxicity-classifier/<replace_your_checkpoint_dir>")

In [18]:
toxicity("Hey There, how are you?")

[{'label': 'NEUTRAL', 'score': 0.9966611862182617}]