## Detecting Toxic Content using BERT

This project demonstrates the application of a BERT-based model for detecting toxic content in text data. It covers the complete process from data loading and preprocessing to training and deploying the model. The model is capable of identifying various types of toxicity in comments, such as **toxic**, **severe toxic**, **obscene**, **threat**, **insult**, and **identity hate**.

### Project Overview

1. **Dataset Analysis and Preparation**
    - The dataset comes from the [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data) on Kaggle.
    - The data is rebalanced by keeping every comment with at least one toxicity label (16,225 rows) and sampling 20,000 of the 143,346 comments with no label.
    - The `comment_text` field is used as input, and the six toxicity labels are the outputs.

2. **Tokenization and Dataset Preparation**
    - The `BertTokenizer` is used to tokenize the text data.
    - Custom PyTorch datasets are created for training and testing.

3. **Model Training**
    - A BERT model (`bert-base-uncased`) is fine-tuned using Hugging Face's `Trainer` API.
    - The model is trained to predict multiple toxicity labels simultaneously.

4. **Model Evaluation and Deployment**
    - The model's loss is evaluated after each epoch on the held-out 20% split.
    - The best model is uploaded to the Hugging Face Hub for easy access and deployment.

5. **Inference**
    - A simple function is provided to predict toxicity levels for new comments.
    - Example predictions demonstrate the model's ability to handle various text inputs.

### Key Features

- **Preprocessing:** Tokenizes and prepares input text for BERT.
- **Multi-label Classification:** Simultaneously predicts multiple toxicity categories.
- **Ease of Use:** Deployed on Hugging Face Hub for easy access.
- **Reproducibility:** Includes all steps and code for reproducing the training process.
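
To make the multi-label idea concrete, here is a minimal sketch in plain Python that contrasts a per-label sigmoid with a softmax. The logits are hypothetical values chosen for illustration, not outputs of the trained model. With a sigmoid, each label gets its own independent probability, so several labels can be "on" at once; with a softmax, the labels compete for a total of 1.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def sigmoid(x: float) -> float:
    """Map one logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits: list[float]) -> list[float]:
    """Map logits to probabilities that compete and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one comment (not produced by the trained model).
logits = [4.0, -3.0, 2.0, -6.0, 3.0, -4.0]

multi_label = [sigmoid(x) for x in logits]
single_label = softmax(logits)

for name, p_sig, p_soft in zip(LABELS, multi_label, single_label):
    print(f"{name:14s} sigmoid={p_sig:.3f} softmax={p_soft:.3f}")
```

In this toy case three labels exceed 0.5 under the sigmoid, which a softmax could never express.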


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
import torch

### Load the dataset

In [2]:
data = pd.read_csv('/content/train.csv')
data.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


### Analyze the dataset

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159571 entries, 0 to 159570
Data columns (total 8 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   id             159571 non-null  object
 1   comment_text   159571 non-null  object
 2   toxic          159571 non-null  int64 
 3   severe_toxic   159571 non-null  int64 
 4   obscene        159571 non-null  int64 
 5   threat         159571 non-null  int64 
 6   insult         159571 non-null  int64 
 7   identity_hate  159571 non-null  int64 
dtypes: int64(6), object(2)
memory usage: 9.7+ MB


In [4]:
num_columns = data.select_dtypes(include=['number']).columns
num_columns

Index(['toxic', 'severe_toxic', 'obscene', 'threat', 'insult',
       'identity_hate'],
      dtype='object')

In [5]:
# Check if we have more than one type of toxicity in one comment
data.loc[data[num_columns].sum(axis=1) > 1].head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
6,0002bcb3da6cb337,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1,1,1,0,1,0
42,001810bf8c45bf5f,You are gay or antisemmitian? \n\nArchangel WH...,1,0,1,0,1,1
43,00190820581d90ce,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",1,0,1,0,1,0
51,001dc38a83d420cf,GET FUCKED UP. GET FUCKEEED UP. GOT A DRINK T...,1,0,1,0,0,0
55,0020e7119b96eeeb,Stupid peace of shit stop deleting my stuff as...,1,1,1,0,1,0


In [6]:
# Check the balance in labels
data.loc[data[num_columns].sum(axis=1) > 0].shape

(16225, 8)

In [7]:
data.loc[data[num_columns].sum(axis=1) == 0].shape

(143346, 8)

In [8]:
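# Keep every toxic comment and downsample the non-toxic ones.
# random_state is an addition so the sample is repeatable; the original run did not set one.
toxic_mask = data[num_columns].sum(axis=1) > 0
balanced_data = pd.concat([data.loc[toxic_mask], data.loc[~toxic_mask].sample(n=20000, random_state=42)])
balanced_data.head()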

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
6,0002bcb3da6cb337,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1,1,1,0,1,0
12,0005c987bdfc9d4b,Hey... what is it..\n@ | talk .\nWhat is it......,1,0,0,0,0,0
16,0007e25b2121310b,"Bye! \n\nDon't look, come or think of comming ...",1,0,0,0,0,0
42,001810bf8c45bf5f,You are gay or antisemmitian? \n\nArchangel WH...,1,0,1,0,1,1
43,00190820581d90ce,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",1,0,1,0,1,0
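
The effect of this downsampling can be checked with simple arithmetic. The sketch below uses only the row counts reported in the cell outputs above; it does not touch the dataset itself. It suggests the share of toxic comments rises from roughly 10% to roughly 45%.

```python
# Counts taken from the cell outputs above.
toxic_rows = 16225       # comments with at least one positive label
clean_rows = 143346      # comments with no positive label
sampled_clean = 20000    # clean comments kept by .sample(n=20000)

share_before = toxic_rows / (toxic_rows + clean_rows)
balanced_rows = toxic_rows + sampled_clean
share_after = toxic_rows / balanced_rows

print(f"toxic share before: {share_before:.1%}")
print(f"toxic share after:  {share_after:.1%}")
print(f"rows after balancing: {balanced_rows}")
```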


### Prepare the dataset for training

In [9]:
X_train, X_test, y_train, y_test = train_test_split(balanced_data['comment_text'], balanced_data[num_columns], test_size=0.2, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((28980,), (7245,), (28980, 6), (7245, 6))

In [10]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
train_encodings = tokenizer(list(X_train), truncation=True, padding=True)
test_encodings = tokenizer(list(X_test), truncation=True, padding=True)
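
`padding=True` pads every sequence to the longest one in the call. Because each call above tokenizes a whole split at once, a single long comment sets the padded length for every row, up to the 512-token limit applied by `truncation=True`. The following sketch uses made-up token counts (an assumption, not measured from this dataset) to compare that with padding each batch only to its own longest sequence, which is what a collator such as `DataCollatorWithPadding` from `transformers` is designed to do.

```python
import random

random.seed(0)
# Hypothetical token counts: mostly short comments, a few very long ones.
lengths = [random.randint(10, 80) for _ in range(960)] + [512] * 40
random.shuffle(lengths)

BATCH_SIZE = 16

# Strategy 1: pad everything to the longest sequence in the whole split.
global_cost = max(lengths) * len(lengths)

# Strategy 2: pad each batch only to its own longest sequence.
batches = [lengths[i:i + BATCH_SIZE] for i in range(0, len(lengths), BATCH_SIZE)]
per_batch_cost = sum(max(batch) * len(batch) for batch in batches)

real_tokens = sum(lengths)
print(f"real tokens:       {real_tokens}")
print(f"global padding:    {global_cost}")
print(f"per-batch padding: {per_batch_cost}")
```

Whether the saving matters in practice depends on the real length distribution of the comments, which this sketch does not measure.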


In [11]:
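class ToxicCommentsDataset(torch.utils.data.Dataset):
  """Pairs tokenized comments with their six toxicity labels."""

  def __init__(self, encodings, labels):
    self.encodings = encodings  # tokenizer output (input_ids, attention_mask, ...)
    self.labels = labels        # DataFrame with one column per label

  def __len__(self):
    return len(self.labels)

  def __getitem__(self, idx):
    item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    # Float labels select the multi-label loss; float32 matches the dtype of the logits
    item['labels'] = torch.tensor(self.labels.iloc[idx].values, dtype=torch.float)
    return item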

In [12]:
train_dataset = ToxicCommentsDataset(train_encodings, y_train)
test_dataset = ToxicCommentsDataset(test_encodings, y_test)

In [13]:
train_dataset[0]

{'input_ids': tensor([  101,  3160,  1045,  2031,  2464,  8810,  1997,  1996,  6594,  3931,
          2008,  1045,  2031,  2517,  2131, 17159,  1012,  2025,  9749,  1010,
          2021, 17159,  1012,  2031,  2017,  5561,  8208,  2005,  2216,  2111,
          1029,  1045,  2572,  2035,  2005,  4363,  1996,  2624,  6593,  3012,
          1997,  2115,  3931,  1010,  2021,  2025,  2012,  1996,  3465,  1997,
         28616,  2378, 14192,  3370,  1012,  2019, 26445, 21170,  5576,  2052,
          2022,  9544, 16670,  2005,  2119,  1997,  2149,  2004,  1045,  2572,
          2469,  1045,  2064,  2224,  2151,  2193,  1997, 12997, 11596,  1012,
          1045,  3198,  2000,  3499,  2033,  2000,  6869,  2000,  2339,  1045,
          2903,  1045,  2572,  2025, 20084,  1996,  3513,  1010,  2059,  2017,
          8756,  2065,  2017,  2031,  1037, 10465,  1012,  2017,  2187,  7696,
          2005,  2033,  2525,  1010,  2681,  1037,  4471,  2007,  1037, 10465,
          1997,  1996, 16884,  1998,  1

### Create and train the model

In [14]:
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=len(num_columns))


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
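
Since the dataset yields float label vectors and `num_labels` is greater than one, `BertForSequenceClassification` should treat the task as multi-label and apply a binary cross-entropy loss to the logits. The sketch below reimplements that loss in plain Python for a single comment, purely as an illustration; the logits are hypothetical and the function name `bce_with_logits` is not part of any library.

```python
import math

def bce_with_logits(logits: list[float], targets: list[float]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Hypothetical comment labeled toxic, obscene, and insult.
targets = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
confident = [4.0, -4.0, 4.0, -4.0, 4.0, -4.0]  # confident and correct
untrained = [0.0] * 6                          # every probability is 0.5

loss_confident = bce_with_logits(confident, targets)
loss_untrained = bce_with_logits(untrained, targets)
print(f"confident and correct: {loss_confident:.4f}")
print(f"untrained (all zeros): {loss_untrained:.4f}")
```

An all-zero logit vector gives a loss of ln 2 per label, which is a useful reference point when reading the training and validation losses.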


In [15]:
training_arguments = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss'
)



In [16]:
trainer = Trainer(
    model=model,
    args=training_arguments,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)
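
The `Trainer` above is created without a `compute_metrics` callable, so evaluation reports only the loss. As one possible complement, the sketch below computes a micro-averaged F1 score at a 0.5 threshold in plain Python. The threshold, the function name `micro_f1`, and the toy inputs are all assumptions for illustration; plugging it into `Trainer` would require wrapping it in a function that unpacks the evaluation predictions and returns a dictionary.

```python
import math

def micro_f1(logits, labels, threshold=0.5):
    """Micro-averaged F1 over all (comment, label) pairs."""
    tp = fp = fn = 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            predicted = 1.0 / (1.0 + math.exp(-z)) >= threshold
            actual = y >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical logits and labels for three comments and six labels.
toy_logits = [
    [3.0, -2.0, 1.5, -4.0, 2.0, -3.0],
    [-3.0, -5.0, -4.0, -6.0, -3.5, -5.0],
    [2.0, -1.0, -0.5, -3.0, 0.5, -2.0],
]
toy_labels = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
score = micro_f1(toy_logits, toy_labels)
print(f"micro-F1: {score:.3f}")
```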

In [17]:
trainer.train()

Epoch,Training Loss,Validation Loss
1,0.1251,0.130725
2,0.1171,0.128327


TrainOutput(global_step=3624, training_loss=0.12818459224916015, metrics={'train_runtime': 5688.0215, 'train_samples_per_second': 10.19, 'train_steps_per_second': 0.637, 'total_flos': 1.525046446006272e+16, 'train_loss': 0.12818459224916015, 'epoch': 2.0})
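
The figures in `TrainOutput` can be cross-checked against the training configuration. The sketch below assumes a single device and no gradient accumulation (neither is stated explicitly in the notebook) and reproduces the step count and throughput from the split size, batch size, and runtime.

```python
import math

train_rows = 28980         # size of the training split reported above
batch_size = 16            # per_device_train_batch_size
epochs = 2                 # num_train_epochs
train_runtime = 5688.0215  # seconds, from TrainOutput

# The last batch of each epoch is partial, hence the ceiling.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
steps_per_second = total_steps / train_runtime
samples_per_second = train_rows * epochs / train_runtime

print(steps_per_epoch, total_steps)
print(round(steps_per_second, 3), round(samples_per_second, 2))
```

Under those assumptions the results agree with `global_step`, `train_steps_per_second`, and `train_samples_per_second` in the output above.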

### Upload and test the model

In [None]:
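from huggingface_hub import login
import os

# Authenticate with the token stored in the HF_TOKEN environment variable
login(token=os.environ['HF_TOKEN'])
repo_name = "bert-toxic-comment"
# Trainer.push_to_hub takes a commit message as its first argument, not a repo name,
# so push the fine-tuned model object to the named repo directly
trainer.model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)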

In [5]:
model = BertForSequenceClassification.from_pretrained('InnaK342/bert-toxic-comment')
tokenizer = BertTokenizer.from_pretrained('InnaK342/bert-toxic-comment')

In [6]:
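def predict_toxicity(texts, model, tokenizer):
    """Return an array of per-label probabilities, one row per input text."""
    model.eval()  # disable dropout for inference
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():  # no gradients are needed at inference time
        outputs = model(**inputs)
    # Sigmoid gives an independent probability for each of the six labels
    probabilities = torch.sigmoid(outputs.logits).numpy()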
    return probabilities
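
The function returns probabilities, not decisions. One simple way to turn them into labels is a fixed threshold, sketched below in plain Python. The 0.5 cutoff and the helper name `flagged_labels` are assumptions for illustration; the probabilities are copied from the example output of this notebook, so the sketch runs without loading the model.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Probabilities copied from the notebook's example predictions.
predictions = {
    "You are an amazing person!":
        [0.00466, 0.000746, 0.001335, 0.000741, 0.002481, 0.000838],
    "I hate you and everything you stand for.":
        [0.965783, 0.003944, 0.011073, 0.020813, 0.106406, 0.030609],
    "Shut up, idiot!":
        [0.991827, 0.055659, 0.866921, 0.002202, 0.957499, 0.012786],
}

def flagged_labels(probabilities, threshold=0.5):
    """Return the names of labels whose probability reaches the threshold."""
    return [name for name, p in zip(LABELS, probabilities) if p >= threshold]

flags = {text: flagged_labels(probs) for text, probs in predictions.items()}
for text, names in flags.items():
    print(f"{text!r}: {names or 'no labels flagged'}")
```

A single global threshold is the simplest choice; rare labels such as `threat` may call for a different cutoff, which would need to be tuned on held-out data.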

In [7]:
texts = [
    "You are an amazing person!",
    "I hate you and everything you stand for.",
    "Shut up, idiot!"
]
predictions = predict_toxicity(texts, model, tokenizer)
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df_results = pd.DataFrame(predictions, columns=labels, index=texts)
df_results

Unnamed: 0,toxic,severe_toxic,obscene,threat,insult,identity_hate
You are an amazing person!,0.00466,0.000746,0.001335,0.000741,0.002481,0.000838
I hate you and everything you stand for.,0.965783,0.003944,0.011073,0.020813,0.106406,0.030609
"Shut up, idiot!",0.991827,0.055659,0.866921,0.002202,0.957499,0.012786
