# Toxic Comment Classifier (Multi-Label NLP Project)

This project focuses on building a multi-label text classification model that can automatically detect different forms of toxicity in online comments. The model predicts one or more of the following labels for each comment:

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

I used the **Jigsaw Toxic Comment Classification Challenge dataset** from Kaggle, which contains real user comments from Wikipedia's talk pages along with labeled toxicity categories.

---

## Project Workflow

1. **Load Dataset**  
   Import the Kaggle dataset and explore the structure of comments and toxicity labels.

2. **Preprocess Text and Labels**  
   Clean the comment text and extract the six binary label columns as the target matrix for multi-label classification.

3. **Vectorize Text**  
   Use TF-IDF vectorization to convert cleaned text into numerical features.

4. **Train Classifier**  
   Apply a One-vs-Rest strategy with Logistic Regression for multi-label classification.

5. **Evaluate Model**  
   Use metrics like Precision, Recall, and F1-Score to assess performance for each label.

6. **Deploy or Package**  
   Save the trained model and vectorizer so they can be reused for deployment or in a portfolio.

## Dataset Description

The training set contains 159,571 Wikipedia comments, each annotated with six binary toxicity labels. A comment can carry none, one, or several of these labels.

- **comment_text**: The actual text of the comment
- **Labels**:
  - `toxic`
  - `severe_toxic`
  - `obscene`
  - `threat`
  - `insult`
  - `identity_hate`

This is a classic example of a **multi-label classification task**.

In [1]:
# Import required libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/train.csv')

# Display the shape and first few rows
print("Dataset shape:", df.shape)
df.head()

Dataset shape: (159571, 8)


Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


## Preview of the Dataset

The dataset contains the following columns:

- `id`: Unique identifier for the comment
- `comment_text`: The actual comment made by the user
- `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`: Binary labels indicating types of toxicity

We will primarily focus on the `comment_text` and these six label columns for training our multi-label classifier.

In [2]:
# Check how many comments are labeled as each category
label_columns = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
label_counts = df[label_columns].sum().sort_values(ascending=False)

# Display the counts
print("Label distribution:")
print(label_counts)

Label distribution:
toxic            15294
obscene           8449
insult            7877
severe_toxic      1595
identity_hate     1405
threat             478
dtype: int64


## Label Distribution

The output above shows the total number of comments carrying each toxicity label.
It reveals a strong class imbalance: `toxic` (15,294 comments, about 9.6% of the dataset), `obscene`, and `insult` are the most frequent labels, while `severe_toxic`, `identity_hate`, and `threat` (478 comments, about 0.3%) are rare.

We'll keep this in mind during evaluation and report metrics per label, so that weak performance on rare labels is not hidden by the common ones.
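The imbalance can also be expressed as a share of comments rather than raw counts. The sketch below assumes the same `label_columns` list and uses a small made-up DataFrame (`toy_df`, hypothetical) in place of the real data; on the real `df` the same two expressions would apply.

```python
import pandas as pd

label_columns = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Toy stand-in for the real DataFrame (values are made up for illustration)
toy_df = pd.DataFrame(
    [
        [0, 0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
    ],
    columns=label_columns,
)

# The mean of a 0/1 column is the share of comments carrying that label
prevalence = toy_df[label_columns].mean().sort_values(ascending=False)

# Number of labels per comment; 0 means no toxicity label at all
labels_per_comment = toy_df[label_columns].sum(axis=1)
clean_share = (labels_per_comment == 0).mean()

print(prevalence)
print("Share of comments with no label:", clean_share)
```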

In [3]:
import re

# Function to clean text: lowercase, remove newlines, links, punctuation
def clean_text(text):
    text = text.lower()
    text = re.sub(r'\n', ' ', text)
    text = re.sub(r'http\S+|www\.\S+', '', text)  # escape the dot so only a literal "www." matches
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# Apply to comment_text column
df['cleaned_comment'] = df['comment_text'].apply(clean_text)

# Show sample
df[['comment_text', 'cleaned_comment']].head()

Unnamed: 0,comment_text,cleaned_comment
0,Explanation\nWhy the edits made under my usern...,explanation why the edits made under my userna...
1,D'aww! He matches this background colour I'm s...,daww he matches this background colour im seem...
2,"Hey man, I'm really not trying to edit war. It...",hey man im really not trying to edit war its j...
3,"""\nMore\nI can't make any real suggestions on ...",more i cant make any real suggestions on impro...
4,"You, sir, are my hero. Any chance you remember...",you sir are my hero any chance you remember wh...


## Text Cleaning

I applied basic text preprocessing to normalize the comments before feature extraction:
- Convert to lowercase
- Remove line breaks and URLs
- Remove punctuation, digits, and any other non-letter characters, then collapse extra whitespace

The cleaned version of each comment will be used for feature extraction in the next step.
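To make the effect of these steps concrete, here is a small self-contained sketch that applies the same cleaning rules to one made-up comment. The sample string is an assumption for illustration only. Note that contractions such as "I'm" collapse to "im" because the apostrophe is dropped.

```python
import re

def clean_text(text):
    """Lowercase, drop newlines, URLs and non-letter characters, collapse spaces."""
    text = text.lower()
    text = re.sub(r'\n', ' ', text)
    text = re.sub(r'http\S+|www\.\S+', '', text)  # `\.` matches a literal dot
    text = re.sub(r'[^a-zA-Z\s]', '', text)       # also removes digits
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# Hypothetical sample comment, not taken from the dataset
sample = "Hey!\nSee http://example.com, I'm NOT happy :("
cleaned = clean_text(sample)
print(cleaned)  # hey see im not happy
```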

In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Limit features for speed; you can increase later
vectorizer = TfidfVectorizer(max_features=10000)

# Fit-transform the cleaned comment text
X = vectorizer.fit_transform(df['cleaned_comment'])

# Check shape of the feature matrix
print("TF-IDF matrix shape:", X.shape)

TF-IDF matrix shape: (159571, 10000)


In [5]:
# Extract the target labels as a NumPy array
y = df[label_columns].values

# Confirm shape of label matrix
print("Label matrix shape:", y.shape)

Label matrix shape: (159571, 6)


## Feature and Label Preparation

- We used **TF-IDF** to convert the cleaned comments into a numerical matrix of token importance values.
- We also extracted the binary label matrix representing each of the six toxicity types.

These will serve as the input (`X`) and output (`y`) for our multi-label classification model.
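As a rough illustration of what the TF-IDF step produces, the sketch below vectorizes three made-up comments (hypothetical, not from the dataset). It shows that the matrix has one row per comment and one column per vocabulary term, and that with scikit-learn's default settings each row is scaled to unit L2 length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for df['cleaned_comment']
toy_comments = [
    "you sir are my hero",
    "you are wrong",
    "thanks for the edit",
]

toy_vectorizer = TfidfVectorizer(max_features=10000)
X_toy = toy_vectorizer.fit_transform(toy_comments)

# Rows = comments, columns = learned vocabulary terms
print("TF-IDF matrix shape:", X_toy.shape)
print("Vocabulary:", sorted(toy_vectorizer.vocabulary_))

# Default norm='l2' scales every non-empty row to length 1
row_norms = np.sqrt(np.asarray(X_toy.multiply(X_toy).sum(axis=1))).ravel()
print("Row norms:", row_norms)
```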

In [6]:
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])

Training samples: 127656
Test samples: 31915


In [7]:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Build the model using One-vs-Rest with Logistic Regression
model = OneVsRestClassifier(LogisticRegression(solver='liblinear'))

# Train the model
model.fit(X_train, y_train)

## Model Training

We use a **One-vs-Rest strategy** with Logistic Regression, which trains a separate binary classifier for each toxicity label. This is a common and efficient baseline for multi-label text classification tasks.

The model is trained on 80% of the data, and we will evaluate it on the remaining 20%.
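One caveat about the cells above: the vectorizer is fit on the full dataset before the train/test split, so vocabulary and IDF statistics from the test comments leak into the features. A possible alternative, sketched here on toy data and not part of the original notebook, is to wrap both steps in a scikit-learn `Pipeline` so that TF-IDF is fit on the training texts only. The texts, the two toy label columns, and the step names `tfidf` and `clf` are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Toy training data; label columns are (insult, threat), made up for this sketch
train_texts = [
    "you are an idiot",
    "thanks for fixing the article",
    "i will hurt you",
    "great edit thank you",
    "you idiot i will hurt you",
    "please see the talk page",
]
y_train_toy = np.array([[1, 0], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0]])

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10000)),
    ("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear"))),
])

# fit() learns the vocabulary and IDF weights from the training texts only
pipe.fit(train_texts, y_train_toy)

# Unseen texts are transformed with the training vocabulary, then classified
test_texts = ["what an idiot", "nice work on the page"]
y_pred_toy = pipe.predict(test_texts)

print("Prediction matrix shape:", y_pred_toy.shape)
# One-vs-Rest keeps one fitted binary classifier per label column
print("Binary classifiers:", len(pipe.named_steps["clf"].estimators_))
```

With this layout the raw (cleaned) text is split first and the pipeline is fit on the training portion, which also means a single object can be saved and reloaded later.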

In [8]:
from sklearn.metrics import classification_report

# Predict on the test set
y_pred = model.predict(X_test)

# Display evaluation report
report = classification_report(y_test, y_pred, target_names=label_columns)
print(report)

               precision    recall  f1-score   support

        toxic       0.90      0.62      0.74      3056
 severe_toxic       0.61      0.25      0.35       321
      obscene       0.92      0.62      0.74      1715
       threat       0.80      0.16      0.27        74
       insult       0.84      0.52      0.64      1614
identity_hate       0.70      0.13      0.22       294

    micro avg       0.88      0.55      0.68      7074
    macro avg       0.80      0.38      0.49      7074
 weighted avg       0.87      0.55      0.67      7074
  samples avg       0.06      0.05      0.05      7074





## Model Evaluation

We evaluate the model using the following metrics for each toxicity label:
- **Precision**: How many predicted positives were actually correct
- **Recall**: How many actual positives were correctly predicted
- **F1-Score**: Harmonic mean of precision and recall

This helps assess how well the model performs for both common and rare labels in this multi-label classification task.
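For a single label, the three metrics can be computed directly from counts of true positives, false positives, and false negatives. The following sketch uses two short made-up arrays (hypothetical values) for one label column.

```python
import numpy as np

# Toy ground truth and predictions for ONE label (values are made up)
true_labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
pred_labels = np.array([1, 0, 0, 0, 0, 1, 1, 0])

tp = int(np.sum((true_labels == 1) & (pred_labels == 1)))  # correctly flagged
fp = int(np.sum((true_labels == 0) & (pred_labels == 1)))  # flagged by mistake
fn = int(np.sum((true_labels == 1) & (pred_labels == 0)))  # missed positives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```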

## Evaluation Summary

- The model performs best on the common labels `toxic`, `obscene`, and `insult`, with F1-scores of 0.74, 0.74, and 0.64.
- Performance on the rare labels (`threat`, `severe_toxic`, `identity_hate`) is much weaker, with F1-scores between 0.22 and 0.35. Precision stays reasonable (0.61 to 0.80), but recall drops to 0.13 to 0.25 because there are few positive training examples.
- This is expected with imbalanced labels and can be improved with more data or sampling strategies.
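Besides more data or resampling, one inexpensive option is to lower the decision threshold for flagging a label, which trades some precision for recall. This is not something the notebook does; the sketch below is only an illustration on toy data, and the texts, the two toy labels, and the threshold value 0.3 are all assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data; label columns are (insult, threat), made up for this sketch
texts = [
    "you are an idiot",
    "thanks for fixing the article",
    "i will hurt you",
    "great edit thank you",
    "you idiot i will hurt you",
    "please see the talk page",
]
y_toy = np.array([[1, 0], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0]])

X_feat = TfidfVectorizer().fit_transform(texts)
clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
clf.fit(X_feat, y_toy)

# One probability per (comment, label) pair
proba = clf.predict_proba(X_feat)

default_pred = (proba >= 0.5).astype(int)  # usual cut-off
lenient_pred = (proba >= 0.3).astype(int)  # hypothetical lower cut-off

print("Positives at 0.5:", int(default_pred.sum()))
print("Positives at 0.3:", int(lenient_pred.sum()))
```

A lower threshold can only add positive predictions, never remove them, so recall cannot decrease. In practice the threshold would be chosen per label on a held-out validation split. `LogisticRegression` also accepts `class_weight="balanced"` as a reweighting alternative.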

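The micro-averaged F1 (0.68) pools all label decisions and is therefore dominated by the frequent labels, while the macro-averaged F1 (0.49) weights all six labels equally and exposes the weak results on rare ones. Overall, precision (0.88 micro) is much higher than recall (0.55 micro): the model is usually right when it flags a label but misses roughly 45% of the labeled instances.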
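The macro and weighted averages can be reproduced from the per-label rows of the report. The short sketch below copies the rounded F1-scores and support counts from the output above; because the inputs are rounded to two decimals, the results are approximate.

```python
# Per-label F1 and support, copied from the classification report above
f1_scores = {
    "toxic": 0.74, "severe_toxic": 0.35, "obscene": 0.74,
    "threat": 0.27, "insult": 0.64, "identity_hate": 0.22,
}
support = {
    "toxic": 3056, "severe_toxic": 321, "obscene": 1715,
    "threat": 74, "insult": 1614, "identity_hate": 294,
}

# Macro average: every label counts equally, however rare it is
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: each label counts in proportion to its support
total_support = sum(support.values())
weighted_f1 = sum(f1_scores[k] * support[k] for k in f1_scores) / total_support

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.49 0.67
```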

In [9]:
import joblib

# Save the trained model and vectorizer
joblib.dump(model, 'toxic_comment_model.pkl')
joblib.dump(vectorizer, 'tfidf_vectorizer.pkl')

['tfidf_vectorizer.pkl']
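To reuse the saved artifacts, a new comment has to go through the same cleaning and vectorization before prediction. The sketch below shows that round trip under stated assumptions: it trains a tiny stand-in model on made-up data with two toy labels and writes to a temporary directory, so the file names match the notebook but the contents do not.

```python
import os
import re
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def clean_text(text):
    """Same cleaning rules as in the notebook."""
    text = text.lower()
    text = re.sub(r'\n', ' ', text)
    text = re.sub(r'http\S+|www\.\S+', '', text)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return re.sub(r'\s+', ' ', text).strip()

# Tiny stand-in model; label columns are (insult, threat), made up for this sketch
toy_texts = ["you are an idiot", "thanks for the edit", "i will hurt you",
             "great work", "you idiot i will hurt you", "see the talk page"]
toy_labels = np.array([[1, 0], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0]])
toy_vectorizer = TfidfVectorizer()
toy_model = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toxic_comment_model.pkl")
    vectorizer_path = os.path.join(tmp_dir, "tfidf_vectorizer.pkl")
    joblib.dump(toy_model, model_path)
    joblib.dump(toy_vectorizer, vectorizer_path)

    # Later, e.g. in a separate script: load both artifacts back
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# Clean, vectorize with the loaded vocabulary, then predict
new_comments = ["You are an IDIOT!!!"]
features = loaded_vectorizer.transform([clean_text(c) for c in new_comments])
predictions = loaded_model.predict(features)
print(predictions.shape)
```

The vectorizer must be loaded alongside the model, because the classifier's columns only make sense with the vocabulary it was trained on.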