# Toxic Comment Detection

This notebook documents the end-to-end development of a machine learning model
for detecting toxic online comments. The focus is on minimizing false positives
while maintaining reasonable recall, making the system suitable for real-world
content moderation scenarios.


Importing necessary libraries

In [2]:
import pandas as pd
import numpy as np

In [3]:
pd.set_option('display.max_colwidth', None)


## Dataset Overview

The dataset is sourced from the Jigsaw Toxic Comment Classification challenge.
Each comment is annotated with multiple toxicity-related labels.

For this project, the multi-label problem is converted into a binary
classification task indicating whether a comment is toxic or non-toxic.

In [4]:
url = "./train.csv"
df = pd.read_csv(url)
df

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,"Explanation\nWhy the edits made under my username Hardcore Metallica Fan were reverted? They weren't vandalisms, just closure on some GAs after I voted at New York Dolls FAC. And please don't remove the template from the talk page since I'm retired now.89.205.38.27",0,0,0,0,0,0
1,000103f0d9cfb60f,"D'aww! He matches this background colour I'm seemingly stuck with. Thanks. (talk) 21:51, January 11, 2016 (UTC)",0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page. He seems to care more about the formatting than the actual info.",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on improvement - I wondered if the section statistics should be later on, or a subsection of """"types of accidents"""" -I think the references may need tidying so that they are all in the exact same format ie date format etc. I can do that later on, if no-one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know.\n\nThere appears to be a backlog on articles for review so I guess there may be a delay until a reviewer turns up. It's listed in the relevant form eg Wikipedia:Good_article_nominations#Transport """,0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember what page that's on?",0,0,0,0,0,0
...,...,...,...,...,...,...,...,...
159566,ffe987279560d7ff,""":::::And for the second time of asking, when your view completely contradicts the coverage in reliable sources, why should anyone care what you feel? You can't even give a consistent argument - is the opening only supposed to mention significant aspects, or the """"most significant"""" ones? \n\n""",0,0,0,0,0,0
159567,ffea4adeee384e90,You should be ashamed of yourself \n\nThat is a horrible thing you put on my talk page. 128.61.19.93,0,0,0,0,0,0
159568,ffee36eab5c267c9,"Spitzer \n\nUmm, theres no actual article for prostitution ring. - Crunch Captain.",0,0,0,0,0,0
159569,fff125370e4aaaf3,And it looks like it was actually you who put on the speedy to have the first version deleted now that I look at it.,0,0,0,0,0,0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159571 entries, 0 to 159570
Data columns (total 8 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   id             159571 non-null  object
 1   comment_text   159571 non-null  object
 2   toxic          159571 non-null  int64 
 3   severe_toxic   159571 non-null  int64 
 4   obscene        159571 non-null  int64 
 5   threat         159571 non-null  int64 
 6   insult         159571 non-null  int64 
 7   identity_hate  159571 non-null  int64 
dtypes: int64(6), object(2)
memory usage: 9.7+ MB


### Initial Observations

- The dataset contains ~160k comments
- No missing values are present
- Comments are stored as strings (pandas `object` dtype)
- Toxicity labels are binary (0 or 1)

These observations inform the preprocessing steps that follow.

In [6]:
df['comment_text'] = df['comment_text'].astype(str)

## Label Engineering

The original dataset contains multiple toxicity-related labels.
For simplicity and practical deployment considerations, these labels
are combined into a single binary target variable `is_toxic`.

A comment is considered toxic if **any** of the original toxicity labels is present.

In [7]:
label_cols = [
    'toxic',
    'severe_toxic',
    'obscene',
    'threat',
    'insult',
    'identity_hate'
]

df['is_toxic'] = df[label_cols].any(axis=1).astype(int)
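As a small illustration of the "any label" rule, the sketch below applies the same expression to a toy frame. The rows are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Toy frame with the same label columns as the dataset (values are made up).
label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
toy = pd.DataFrame(
    [
        [0, 0, 0, 0, 0, 0],  # no label set  -> non-toxic
        [1, 0, 0, 0, 0, 0],  # one label set -> toxic
        [0, 0, 1, 0, 1, 0],  # two labels    -> toxic
    ],
    columns=label_cols,
)

# Same expression as in the notebook: 1 if any label is non-zero, else 0.
toy['is_toxic'] = toy[label_cols].any(axis=1).astype(int)
is_toxic_values = toy['is_toxic'].tolist()
print(is_toxic_values)

# Share of each class; a quick way to gauge imbalance before modelling.
print(toy['is_toxic'].value_counts(normalize=True))
```

Running `value_counts(normalize=True)` on the real `df['is_toxic']` would give the actual class ratio.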

In [8]:
df.drop(columns=label_cols, inplace=True)

In [9]:
df.drop(columns=['id'], inplace=True)

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159571 entries, 0 to 159570
Data columns (total 2 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   comment_text  159571 non-null  object
 1   is_toxic      159571 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 2.4+ MB


## Data Cleaning

The following cleaning steps are applied:
- Ensuring the comment column has string type
- Removal of empty or whitespace-only comments
- Removal of duplicate comments

These steps reduce noise and prevent the model from learning redundant patterns.


In [11]:
# ensure text is string
df['comment_text'] = df['comment_text'].astype(str)

# strip whitespace (str.strip() returns a new Series, so assign it back)
df['comment_text'] = df['comment_text'].str.strip()

# remove empty comments
df.drop(df[df['comment_text'] == ''].index, inplace=True)

# remove duplicate comments
df.drop_duplicates(subset='comment_text', inplace=True)

# reset index
df.reset_index(drop=True, inplace=True)
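The cleaning steps can be checked on a toy frame. This is a minimal sketch with made-up comments; it mainly shows that `str.strip()` only takes effect when its result is assigned back to the column.

```python
import pandas as pd

# Toy comments (made up): one whitespace-only entry and one duplicate after stripping.
toy = pd.DataFrame({'comment_text': ['hello', '   ', ' hello ', 'bye']})

# Calling str.strip() without assignment leaves the column unchanged.
toy['comment_text'].str.strip()
unchanged = toy['comment_text'].tolist()

# Assigning the result back is what actually updates the column.
toy['comment_text'] = toy['comment_text'].str.strip()

# Drop empty comments, then duplicates, then reset the index.
toy = toy[toy['comment_text'] != '']
toy = toy.drop_duplicates(subset='comment_text').reset_index(drop=True)
cleaned = toy['comment_text'].tolist()
print(cleaned)
```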

## Train–Validation Split

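The dataset is split into training (80%) and validation (20%) sets to evaluate
model generalization. The split is stratified on `is_toxic` so that both sets
keep the same toxic/non-toxic ratio, which matters because toxic comments are
a minority class (about 10% of the validation set).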

In [12]:
from sklearn.model_selection import train_test_split

X = df['comment_text']
y = df['is_toxic']

X_train, X_val, y_train, y_val = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
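The effect of `stratify` can be verified on synthetic labels. The sketch below assumes a made-up 10% positive rate and checks that both splits keep it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labels (made up): 90 non-toxic, 10 toxic, i.e. a 10% positive rate.
y = pd.Series([0] * 90 + [1] * 10)
X = pd.Series([f"comment {i}" for i in range(100)])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# With stratification, both splits keep the original 10% positive rate.
train_rate = y_train.mean()
val_rate = y_val.mean()
print(train_rate, val_rate)
```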

## Baseline Model

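A baseline text classification model is built using:
- Word-level TF-IDF features (unigrams and bigrams, English stop words removed, at most 20,000 features)
- Logistic Regression classifier with balanced class weights

This model serves as a reference point for all subsequent improvements.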

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(
    max_features=20000,
    ngram_range=(1, 2),
    stop_words='english'
)

X_train_tfidf = tfidf.fit_transform(X_train)
X_val_tfidf = tfidf.transform(X_val)


In [14]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    max_iter=1000,
    class_weight='balanced'
)

model.fit(X_train_tfidf, y_train)


LogisticRegression(class_weight='balanced', max_iter=1000)


In [15]:
from sklearn.metrics import classification_report, f1_score

y_pred = model.predict(X_val_tfidf)

print("F1 Score:", f1_score(y_val, y_pred))
print(classification_report(y_val, y_pred))


F1 Score: 0.7333862854117024
              precision    recall  f1-score   support

           0       0.98      0.95      0.96     28670
           1       0.64      0.85      0.73      3245

    accuracy                           0.94     31915
   macro avg       0.81      0.90      0.85     31915
weighted avg       0.95      0.94      0.94     31915
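The high recall and lower precision on the toxic class are consistent with the baseline's `class_weight='balanced'` setting, which up-weights the minority class. The sketch below shows how those weights are derived, using made-up labels with a 9:1 imbalance rather than the real training labels.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (made up): 9 non-toxic comments for every toxic one.
y = np.array([0] * 90 + [1] * 10)
classes = np.array([0, 1])

# 'balanced' weights follow n_samples / (n_classes * count_of_class).
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)

# Manual computation of the same formula, for comparison.
manual = len(y) / (len(classes) * np.bincount(y))
print(dict(zip(classes.tolist(), weights.tolist())))
```

In this toy setting each toxic example counts nine times as much as a non-toxic one in the loss, which pushes the classifier toward flagging more comments.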



## Error Analysis

To understand model behavior, misclassified examples were analyzed,
including both false positives and false negatives.

This analysis guided subsequent feature engineering decisions.

In [16]:
errors = pd.DataFrame({
    'comment': X_val,
    'true_label': y_val,
    'pred_label': y_pred
})


In [17]:
errors = errors[errors['true_label'] != errors['pred_label']]


In [18]:
false_positives = errors[
    (errors['true_label'] == 0) &
    (errors['pred_label'] == 1)
]


In [19]:
false_negatives = errors[
    (errors['true_label'] == 1) &
    (errors['pred_label'] == 0)
]

In [20]:
false_positives.sample(10, random_state=42)['comment']


105367    At least answer this question for me little guy: Why do you feel the need to bolster your low self-esteem being a cyber bully? Were you shunned at school dances, maybe stuffed into lockers? You're not a better person than me despite what you've convinced yourself. This is the internet where *gasp* I have the First Amendment on my side. Silence me if that makes you feel better about yourself but I truly pray for you.
145018                                                                   "\n\nYou are another Genius I see. ;-)\n\nFred Alan Wolf\nWiki has a page on my long-time partner Fred Alan Wolf who has written many books on the same kind of paranormal consciousness stuff that I am skewered for. Yet nowhere on Fred's page is he called a ""kook"", a ""crackpot"" etc. Try the stupid Google test on him.\n 12:26 AM, 14 October 2005 (PCT)"
150832                                                                                                                                      

In [21]:
false_negatives.sample(5, random_state=42)['comment']


157938                                                                                                                                                                                                                                                                                                                         glad i dont read your bible \n\nyou have a member of the kkk on your team. way to go, asswipe!
144576    If she is a linguist then what theoretical framework does she work in? Generative Grammar? She probably doesn't even know what Generative Grammar is, she has probably never read anything by Chomsky (to do with linguistics) and she grossly misrepresents our field. All of these things are worthy of being in the article! Hey wiki nazis, don't you bastards take this down again because I'll keep doing it.
72244                                                                                                                                                                       

## Attempted Improvement: Threshold Tuning

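An attempt was made to trade precision for recall by choosing a decision
threshold from the precision-recall curve with a target recall of 0.80.
The selected threshold (about 0.0002) flags every validation comment as toxic:
recall reaches 1.00 while precision collapses to 0.10, making the model
unsuitable for moderation use cases where false positives are costly.

Note that `precision_recall_curve` returns recall in decreasing order, so the
first index with `recall >= target_recall` is always the lowest threshold.
The result below therefore reflects this selection rule rather than threshold
tuning in general; the baseline at the default 0.5 threshold already reaches
0.85 recall.

This configuration was therefore rejected.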

In [22]:
y_scores = model.predict_proba(X_val_tfidf)[:, 1]

In [23]:
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_val, y_scores)


In [24]:
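target_recall = 0.80
# Note: recall is sorted in decreasing order (recall[0] == 1.0), so the first
# match is index 0, i.e. the lowest threshold rather than the highest one
# that still meets the target.
idx = np.where(recall >= target_recall)[0][0]
chosen_threshold = thresholds[idx]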


In [25]:
chosen_threshold

np.float64(0.00021918324199347462)

In [26]:
y_pred_custom = (y_scores >= chosen_threshold).astype(int)


In [27]:
print("Custom threshold:", chosen_threshold)
print(classification_report(y_val, y_pred_custom))
# This threshold gave the worst result: every comment is predicted toxic

Custom threshold: 0.00021918324199347462
              precision    recall  f1-score   support

           0       0.00      0.00      0.00     28670
           1       0.10      1.00      0.18      3245

    accuracy                           0.10     31915
   macro avg       0.05      0.50      0.09     31915
weighted avg       0.01      0.10      0.02     31915
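For comparison, here is a minimal sketch of one way to pick the highest threshold that still meets a recall target. The scores and labels are made up, and `target_recall = 0.70` is chosen for the toy data; this alternative was not run on the notebook's validation set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels and scores (made up) standing in for y_val and y_scores.
y_true = np.array([1, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.40, 0.45, 0.60, 0.80, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

target_recall = 0.70

# precision/recall have one more element than thresholds; drop the last point.
meets_target = np.where(recall[:-1] >= target_recall)[0]

# Thresholds increase while recall decreases, so:
first_threshold = thresholds[meets_target[0]]   # lowest threshold, recall = 1.0
last_threshold = thresholds[meets_target[-1]]   # highest threshold meeting the target


def recall_at(threshold):
    """Recall of the positive class when predicting 1 for scores >= threshold."""
    pred = (y_prob >= threshold).astype(int)
    return pred[y_true == 1].mean()


print(first_threshold, recall_at(first_threshold))
print(last_threshold, recall_at(last_threshold))
```

Taking the last matching index keeps recall at or above the target while flagging as few comments as possible, which is usually the intent when false positives are costly.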





## Feature Engineering: Word + Character TF-IDF

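To reduce keyword bias and improve robustness to spelling variations,
word-level and character-level TF-IDF features are combined.

On the validation set, the combined model reaches an F1 of 0.77 on the toxic
class (precision 0.91, recall 0.66), compared with 0.73 for both the word-level
baseline and the character-only model. The baseline also uses
`class_weight='balanced'` and stop-word removal, so the comparison is not
purely between feature sets.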
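As a rough illustration of why character n-grams help with spelling variations, the sketch below compares the tokens produced by a word analyzer and a character analyzer for a word and a hypothetical misspelling of it. The word pair is an assumption chosen for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same analyzer settings as the character-level vectorizer in this section.
char_analyzer = TfidfVectorizer(analyzer="char", ngram_range=(3, 5)).build_analyzer()
word_analyzer = TfidfVectorizer().build_analyzer()

original = "stupid"
obfuscated = "stuupid"  # hypothetical misspelling, chosen for illustration

# Word-level tokens do not overlap at all ...
shared_words = set(word_analyzer(original)) & set(word_analyzer(obfuscated))

# ... but several character n-grams are shared.
shared_chars = set(char_analyzer(original)) & set(char_analyzer(obfuscated))
print(sorted(shared_chars))
```

A word-level model treats the misspelling as an unseen token, while the character-level model still sees some of the n-grams it learned from the original spelling.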

In [28]:
tfidf_char = TfidfVectorizer(
    analyzer="char",
    ngram_range=(3, 5),
    min_df=5
)

X_train_char = tfidf_char.fit_transform(X_train)
X_val_char = tfidf_char.transform(X_val)


In [29]:
model_char = LogisticRegression(max_iter=1000)
model_char.fit(X_train_char, y_train)


LogisticRegression(max_iter=1000)


In [30]:
y_pred_char = model_char.predict(X_val_char)
print(classification_report(y_val, y_pred_char))

              precision    recall  f1-score   support

           0       0.96      0.99      0.98     28670
           1       0.92      0.60      0.73      3245

    accuracy                           0.95     31915
   macro avg       0.94      0.80      0.85     31915
weighted avg       0.95      0.95      0.95     31915



In [31]:
from sklearn.pipeline import FeatureUnion

word_tfidf = TfidfVectorizer(
    ngram_range=(1, 2),
    max_features=50000
)

char_tfidf = TfidfVectorizer(
    analyzer="char",
    ngram_range=(3, 5),
    min_df=5
)


In [32]:
combined_tfidf = FeatureUnion([
    ("word", word_tfidf),
    ("char", char_tfidf)
])

In [33]:
X_train_combined = combined_tfidf.fit_transform(X_train)
X_val_combined = combined_tfidf.transform(X_val)
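`FeatureUnion` fits each vectorizer on the same input and concatenates their outputs column-wise. The sketch below checks this on a made-up three-document corpus; the vectorizer settings are simplified (no `max_features` or `min_df`) so the toy vocabulary is not filtered away.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Toy corpus (made up) standing in for X_train.
docs = ["you are great", "you are awful", "thanks for the edit"]

combined = FeatureUnion([
    ("word", TfidfVectorizer(ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
])
X_combined = combined.fit_transform(docs)

# Fit the same vectorizers separately to count their features.
n_word = len(TfidfVectorizer(ngram_range=(1, 2)).fit(docs).vocabulary_)
n_char = len(TfidfVectorizer(analyzer="char", ngram_range=(3, 5)).fit(docs).vocabulary_)

# The combined matrix has one row per document and n_word + n_char columns.
print(X_combined.shape, n_word, n_char)
```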


In [34]:
model_combined = LogisticRegression(max_iter=1000)
model_combined.fit(X_train_combined, y_train)


LogisticRegression(max_iter=1000)


In [35]:
y_pred_combined = model_combined.predict(X_val_combined)
print(classification_report(y_val, y_pred_combined))


              precision    recall  f1-score   support

           0       0.96      0.99      0.98     28670
           1       0.91      0.66      0.77      3245

    accuracy                           0.96     31915
   macro avg       0.94      0.83      0.87     31915
weighted avg       0.96      0.96      0.96     31915



## Conclusion

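The final model (word + character TF-IDF with Logistic Regression) reaches
0.91 precision and 0.66 recall on the toxic class (F1 0.77). Compared with the
baseline (0.64 precision, 0.85 recall), it gives up some recall for
substantially higher precision, in line with the emphasis on reducing false
positives.

Key takeaways:
- Error analysis was critical for meaningful improvements
- Feature engineering had a larger impact than the threshold tuning attempted here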