# Developing a Content Moderation System with Natural Language Inference

This notebook builds a Natural Language Inference (NLI) system for content moderation using the DistilBERT model. It loads the MultiNLI dataset and uses it to fine-tune DistilBERT to classify the relationship between a premise and a hypothesis as entailment, neutral, or contradiction. The fine-tuned model can then evaluate user posts (the hypothesis) against a platform's terms of service (the premise) to detect policy violations, so the system can automatically flag inappropriate or non-compliant content.
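To make the premise/hypothesis roles concrete, here is a minimal sketch of how one user post could be paired with several policy clauses before being sent to an NLI model. The function `build_nli_pairs` and the example clauses are hypothetical. The `premise`/`hypothesis` keys mirror the MultiNLI column names that the tokenization step uses later in this notebook.

```python
from typing import Dict, List


def build_nli_pairs(policy_clauses: List[str], post: str) -> List[Dict[str, str]]:
    """Pair one user post (hypothesis) with each policy clause (premise).

    Hypothetical helper: key names mirror the MultiNLI 'premise' and
    'hypothesis' columns so the pairs can be tokenized the same way.
    """
    return [{"premise": clause, "hypothesis": post} for clause in policy_clauses]


# Illustrative policy clauses, not taken from any real terms of service.
clauses = [
    "The platform prohibits harassment of other users.",
    "The platform prohibits the spread of scientifically unverified claims.",
]
pairs = build_nli_pairs(clauses, "Climate change is a hoax.")
for pair in pairs:
    print(pair["premise"], "->", pair["hypothesis"])
```

Checking each clause separately keeps every premise short, which matters because each pair is truncated to a fixed token length during tokenization.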

## Section 1: Install necessary packages and import libraries

In [None]:
# Install the required libraries.
# 'transformers' for access to pre-trained models like DistilBERT and related utilities.
# 'datasets' for easy loading and handling of datasets like MultiNLI.
# 'torch' for building and running deep learning models using PyTorch.
# 'accelerate' is required by the transformers Trainer when it trains PyTorch models.
!pip install transformers datasets torch accelerate

# Import specific classes and functions from the installed libraries.
# 'DistilBertTokenizer' for tokenizing text data using the DistilBERT vocabulary.
# 'DistilBertForSequenceClassification' is the DistilBERT model with a classification head on top, suitable for tasks like NLI.
# 'Trainer' is a utility class from the transformers library to simplify model training and evaluation.
# 'TrainingArguments' is used to define the hyperparameters and configuration for the Trainer.
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments

# Import the 'load_dataset' function from the datasets library to easily load benchmark datasets.
from datasets import load_dataset

# Import the 'torch' library, the main deep learning framework used.
import torch

# Import the 'warnings' module to manage warnings that might be generated during execution.
import warnings

# Import the 'os' module to interact with the operating system, such as setting environment variables.
import os

# Set the environment variable 'WANDB_DISABLED' to "true" so the Trainer does not initialize
# Weights & Biases (wandb) logging, which would otherwise prompt for a wandb API key.
# Recent transformers versions deprecate this variable in favor of report_to="none" in TrainingArguments.
os.environ["WANDB_DISABLED"] = "true"



## Section 2: Device setup and dataset loading

In [None]:
# Detect if GPU is available and set the device (either GPU or CPU) accordingly for faster computation.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the MultiNLI dataset for Natural Language Inference (NLI).
mnli_dataset = load_dataset("multi_nli")

# Initialize the tokenizer for DistilBERT (a smaller, faster distilled version of BERT); it converts text into token ids.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")




## Section 3: Tokenization and dataset preprocessing

In [None]:
# Define a tokenization function to process premise and hypothesis pairs. We apply truncation and padding
# to ensure each input has the same length for efficient model processing.
def tokenize_function(example):
    return tokenizer(example['premise'], example['hypothesis'], truncation=True, padding='max_length', max_length=128)

# Suppress warnings related to token truncation for a cleaner output.
warnings.filterwarnings("ignore", message="Be aware, overflowing tokens are not returned*")

# Tokenize the dataset in batches, using multiple processes to speed things up.
# Adjust 'num_proc' to match the number of available CPU cores.
tokenized_mnli = mnli_dataset.map(tokenize_function, batched=True, num_proc=4)

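# Remove the raw text columns (premise and hypothesis) and return PyTorch tensors. Other columns the model
# does not use (e.g., 'genre') are dropped by the Trainer, since remove_unused_columns=True by default.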
tokenized_mnli = tokenized_mnli.remove_columns(['premise', 'hypothesis'])
tokenized_mnli.set_format("torch")
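The `truncation=True, padding='max_length', max_length=128` settings can be hard to picture. The following pure-Python sketch approximates what happens to a premise/hypothesis pair of token ids: special tokens are added, the currently longer sequence is trimmed first until the pair fits, and the result is padded with an attention mask. It assumes the BERT uncased special-token ids `[CLS]=101`, `[SEP]=102`, `[PAD]=0`, which `distilbert-base-uncased` shares. `encode_pair` is a hypothetical helper, and the real tokenizer's tie-breaking during truncation may differ.

```python
from typing import List, Tuple

# Assumed special-token ids from the BERT uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def encode_pair(premise_ids: List[int], hypothesis_ids: List[int],
                max_length: int = 128) -> Tuple[List[int], List[int]]:
    """Approximate pair encoding: [CLS] premise [SEP] hypothesis [SEP] + padding."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    budget = max_length - 3  # room left after [CLS] and two [SEP] tokens
    premise, hypothesis = list(premise_ids), list(hypothesis_ids)

    # 'longest_first' truncation: drop tokens from the end of whichever
    # sequence is currently longer until the pair fits in the budget.
    while len(premise) + len(hypothesis) > budget:
        if len(premise) >= len(hypothesis):
            premise.pop()
        else:
            hypothesis.pop()

    input_ids = [CLS_ID] + premise + [SEP_ID] + hypothesis + [SEP_ID]
    attention_mask = [1] * len(input_ids)

    # 'max_length' padding: pad every example to the same length and
    # mark the padding positions with 0 in the attention mask.
    pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad


ids, mask = encode_pair([1, 2, 3, 4, 5], [6, 7], max_length=8)
print(ids)   # [101, 1, 2, 3, 102, 6, 7, 102]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Padding everything to 128 tokens wastes some computation on short pairs, but it keeps every example the same shape, so the default data collator can stack them without extra work.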


## Section 4: Model setup and training configuration

In [None]:
# Load pretrained DistilBERT with a newly initialized classification head for the 3 MultiNLI labels (0 = entailment, 1 = neutral, 2 = contradiction).
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)

# Move the model to the appropriate device (GPU or CPU).
model.to(device)

# Define training arguments for fine-tuning the model, including batch sizes, learning rate, number of epochs, and weight decay.
training_args = TrainingArguments(
    output_dir="./results",          # Directory for saving model checkpoints and results.
    eval_strategy="epoch",           # Evaluate the model at the end of each epoch.
    learning_rate=2e-5,              # Learning rate for the optimizer.
    per_device_train_batch_size=16,  # Batch size for training on each device.
    per_device_eval_batch_size=16,   # Batch size for evaluation on each device.
    num_train_epochs=3,              # Number of training epochs.
    weight_decay=0.01,               # Weight decay for regularization.
    report_to="none",                # Disable all reporting integrations (the string "none", not None).
)

# Initialize the Trainer with the model, training arguments, and datasets for training and evaluation.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_mnli["train"].shuffle(seed=42).select(range(2000)),  # Using a small subset of the training data for a quick run.
    eval_dataset=tokenized_mnli["validation_matched"].select(range(500)),        # Using a small subset of the validation data for quick evaluation.
)

# Fine-tune the model using the Trainer API.
trainer.train()

# Evaluate the fine-tuned model on the validation subset and print the results.
eval_results = trainer.evaluate()
print(f"Evaluation Results: {eval_results}")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | No log | 1.075831 |
| 2 | No log | 0.971679 |
| 3 | No log | 0.937023 |
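The "No log" entries are expected with this configuration. Assuming the Trainer's default `logging_steps=500`, a single device, and no gradient accumulation, a quick calculation shows that training finishes before the first training-loss log is due:

```python
import math

# Values from the training configuration in this notebook.
train_examples = 2000   # size of the shuffled training subset
batch_size = 16         # per_device_train_batch_size (single device assumed)
epochs = 3              # num_train_epochs
logging_steps = 500     # assumed TrainingArguments default

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 125 375
print(total_steps >= logging_steps)   # False: no training-loss log is written
```

Setting a smaller `logging_steps` (for example, 50) or `logging_strategy="epoch"` in `TrainingArguments` would fill in the Training Loss column.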


Evaluation Results: {'eval_loss': 0.9370229244232178, 'eval_runtime': 1.7378, 'eval_samples_per_second': 287.717, 'eval_steps_per_second': 18.414, 'epoch': 3.0}
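The evaluation results contain only the validation loss, because the `Trainer` above is built without a `compute_metrics` function. A minimal sketch of an accuracy metric, assuming NumPy and that the function receives a `(logits, labels)` pair, which is how the `Trainer` passes predictions and gold label ids:

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return accuracy from (logits, labels), as passed by the Trainer."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {"accuracy": float((predictions == labels).mean())}


# Toy check with made-up logits for three examples and the 3 NLI classes.
toy_logits = np.array([[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.3, 0.9, 0.1]])
toy_labels = np.array([0, 2, 2])
print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 0.6666666666666666}
```

Passing `compute_metrics=compute_metrics` when constructing the `Trainer` would add an `eval_accuracy` entry to the dictionary returned by `trainer.evaluate()`, which is easier to interpret than the loss alone.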


## Section 5: Inference and testing the model

In [None]:
# Define a function for making predictions using the fine-tuned model.
def predict_nli(premise, hypothesis):
    # Tokenize the premise and hypothesis inputs.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, padding='max_length', max_length=128)

    # Move the tokenized inputs to the same device as the model (GPU or CPU).
    inputs = {key: val.to(device) for key, val in inputs.items()}

    # Put the model in evaluation mode and perform inference without gradient calculation.
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits

    # Get the predicted class label (entailment, neutral, contradiction).
    predicted_class_id = torch.argmax(logits, dim=-1).item()
    label_map = {0: "entailment", 1: "neutral", 2: "contradiction"}
    return label_map[predicted_class_id]

# Example inference related to climate change misinformation
premise = "The platform prohibits the spread of scientifically unverified claims, particularly those that can mislead people about critical global issues like climate change."
hypothesis = "Climate change is a hoax, and global warming is just a natural cycle that has nothing to do with human activity."
result = predict_nli(premise, hypothesis)
print(f"Prediction: {result}")

Prediction: contradiction
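`predict_nli` returns only the highest-scoring label, which hides how confident the model is. The sketch below converts raw logits into probabilities with a softmax and abstains when the top probability is low. The name `classify_with_confidence` and the 0.6 threshold are hypothetical choices; in the notebook, the logits would come from `outputs.logits[0].tolist()` inside `predict_nli`.

```python
import math
from typing import List, Tuple

LABELS = ["entailment", "neutral", "contradiction"]  # MultiNLI label ids 0, 1, 2


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_confidence(logits: List[float], threshold: float = 0.6) -> Tuple[str, float]:
    """Return (label, probability); the label is 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best] if probs[best] >= threshold else "uncertain"
    return label, probs[best]


print(classify_with_confidence([0.1, 0.2, 2.5]))  # contradiction, probability about 0.84
print(classify_with_confidence([0.5, 0.4, 0.6]))  # uncertain: no class dominates
```

An abstaining classifier like this could send borderline posts to human review rather than flagging them automatically.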


## Section 6: Experiment with your own examples

Now that you have a fine-tuned model, experiment with your own premise and hypothesis pairs using the `predict_nli` function. Try different scenarios and observe how the model classifies the relationship between each premise and hypothesis.
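One way to organize these experiments is to run several hypotheses against a single premise and print the predictions side by side. The sketch takes the prediction function as a parameter, so in the notebook you would pass `predict_nli`. The stand-in `keyword_stub` is a hypothetical placeholder that lets the snippet run without the model; it is not an NLI model.

```python
from typing import Callable, List, Tuple


def run_experiments(premise: str, hypotheses: List[str],
                    predict: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Return (hypothesis, predicted label) for each hypothesis."""
    return [(h, predict(premise, h)) for h in hypotheses]


def keyword_stub(premise: str, hypothesis: str) -> str:
    # Placeholder predictor (NOT an NLI model): a deterministic stand-in.
    return "contradiction" if "hoax" in hypothesis.lower() else "neutral"


policy = "The platform prohibits the spread of scientifically unverified claims."
hypotheses = [
    "Climate change is a hoax.",
    "I planted three trees in my garden this weekend.",
]
for hypothesis, label in run_experiments(policy, hypotheses, keyword_stub):
    print(f"{label:>13} | {hypothesis}")
```

With the fine-tuned model loaded, `run_experiments(policy, hypotheses, predict_nli)` would produce the same table from real predictions.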

From your own examples, find at least one prediction that you disagree with. Share your findings and examples with your classmates!