---
Life-Threatening Comment Detection and Filtering

*author*:
- Name: Saif-Ur-Rehman
- Date: 2024-03-21
- Tags: NLP, Sentiment Analysis, Text Classification, Python, Regular Expression
---

## Description

This project aims to develop a Python script for detecting and filtering life-threatening comments from a dataset of social media comments. The script uses Natural Language Processing (NLP) techniques, including sentiment analysis, to identify comments that have negative sentiment and contain language indicative of life-threatening behavior.



## Project Overview

Social media platforms often face challenges in managing harmful content, including comments that contain threats of violence or harm to individuals. Automatic detection and filtering of such comments are essential for maintaining a safe online environment.

In this project, we employ the following steps:

1. **Data Preprocessing**: The comments are preprocessed to remove non-alphanumeric characters and converted to lowercase for uniformity.

2. **Sentiment Analysis**: We use NLTK's SentimentIntensityAnalyzer (VADER) to score the sentiment of each comment. Comments with a negative compound score (below 0) are considered for further analysis.

3. **Keyword Matching**: We identify comments containing whole-word keywords associated with life-threatening behavior, such as "kill", "murder", and "bomb".

4. **Filtering**: Comments that meet both criteria (negative sentiment and a keyword match) are flagged as potentially life-threatening and extracted from the dataset.

5. **Output**: The filtered comments are saved to a new CSV file for further analysis or moderation.
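Steps 1 to 4 above can be sketched as one small, dependency-free function. This is a sketch under stated assumptions, not the notebook's own code. The names `THREAT_PATTERN`, `preprocess`, `flag_threats`, and `toy_sentiment` are hypothetical. The sentiment scorer is passed in as a callable so the example runs without NLTK. In the notebook the scorer is VADER (`sia.polarity_scores(text)['compound']`), and `toy_sentiment` below is only a stand-in for demonstration. Step 5, writing the CSV, is left to pandas as in the notebook.

```python
import re
from typing import Callable, Iterable, List

# Same keyword list as the notebook, written with a non-capturing group (?:...)
THREAT_PATTERN = re.compile(
    r"\b(?:kill|murder|choke|strangle|bomb|shoot|stab|assault|attack|harm|"
    r"destroy|exterminate|terminate|eliminate|slaughter|execute|smother|"
    r"suffocate|blow up|explode|gun down|bludgeon|snuff out|strife|"
    r"annihilate|obliterate|eradicate|crush)\b"
)


def preprocess(comment) -> str:
    """Step 1: drop non-alphanumeric characters and lowercase the text."""
    return re.sub(r"[^a-zA-Z0-9\s]", "", str(comment).lower())


def flag_threats(
    comments: Iterable[str],
    sentiment: Callable[[str], float],
) -> List[str]:
    """Steps 2-4: keep comments with negative sentiment AND a threat keyword.

    `sentiment` is any callable returning a compound score in [-1, 1];
    with NLTK it could be: lambda t: sia.polarity_scores(t)["compound"].
    """
    flagged = []
    for raw in comments:
        text = preprocess(raw)
        if sentiment(text) < 0 and THREAT_PATTERN.search(text):
            flagged.append(text)
    return flagged


# Hypothetical stand-in scorer for demonstration only (NOT VADER):
# negative if the text contains a word from a tiny hand-picked list.
def toy_sentiment(text: str) -> float:
    negative_words = {"kill", "hate", "bomb", "destroy"}
    return -0.5 if negative_words & set(text.split()) else 0.5


demo = [
    "I will KILL you!!!",
    "That skill is amazing",
    "Great game, you crushed it",
]
print(flag_threats(demo, toy_sentiment))  # ['i will kill you']
```

Injecting the scorer keeps the keyword logic testable on its own. Switching to VADER changes only the argument passed to `flag_threats`.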

## Tools Used

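- Python: Programming language used for script development.
- re: Python's built-in regular expression module, used for keyword matching and text cleanup.
- NLTK: Natural Language Toolkit library; its VADER-based `SentimentIntensityAnalyzer` is used for sentiment analysis.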
- Pandas: Library for data manipulation and handling CSV files.
- Google Colab: Online platform for running Python scripts collaboratively.

## Dataset

The dataset used in this project consists of social media comments scraped from various platforms using the Google API client. The comments include opinions, threats, and discussions that may contain life-threatening language.

## Instructions

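1. Upload the 'ThreatsComments.csv' file of social media comments (with `ID` and `Comments` columns) to your Google Colab environment.
2. Run the provided Python code to detect and filter life-threatening comments: the regular-expression baseline, the sentiment-analysis version, or both.
3. Open the output CSV files: 'FilteredLifeThreatComments.csv' (regular expression only) and 'FilteredLifeThreatCommentsUsingSentimentAnalysis.csv' (sentiment analysis plus keyword matching).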

Feel free to customize and extend the script as needed for your specific use case or dataset.
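Before running either script, it can help to confirm that the uploaded CSV has the columns the code reads (`ID` and `Comments`, taken from the code below). The following is a minimal sketch using only the standard library. `check_columns` and `REQUIRED_COLUMNS` are hypothetical helpers, not part of the original notebook.

```python
import csv
import io

# Column names the notebook code reads from ThreatsComments.csv
REQUIRED_COLUMNS = {"ID", "Comments"}


def check_columns(csv_file) -> set:
    """Return the required columns missing from the CSV header (empty set = OK)."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields no header
    return REQUIRED_COLUMNS - {name.strip() for name in header}


# In-memory examples; for the real file use something like
# open('ThreatsComments.csv', newline='', encoding='utf-8-sig'),
# where 'utf-8-sig' strips a byte-order mark that would otherwise prefix 'ID'.
good = io.StringIO("ID,Comments\n1,hello there\n")
bad = io.StringIO("id,comment\n1,hello\n")
print(check_columns(good))          # set()
print(sorted(check_columns(bad)))   # ['Comments', 'ID']
```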

## **Basic Filtering of Life-Threatening Comments Using a Simple Regular Expression**

```python
import pandas as pd

# Input and output file paths
input_file = 'ThreatsComments.csv'
output_file = 'FilteredLifeThreatComments.csv'

# Read the CSV file using pandas
comments_df = pd.read_csv(input_file)

# Regular expression pattern to match life-threatening language; (?:...) is a
# non-capturing group, so str.contains does not warn about match groups
pattern = r'\b(?:kill|murder|choke|strangle|bomb|shoot|stab|assault|attack|harm|destroy|exterminate|terminate|eliminate|slaughter|execute|smother|suffocate|blow up|explode|gun down|bludgeon|snuff out|strife|annihilate|obliterate|eradicate|crush)\b'

# Keep comments containing life-threatening language (case-insensitive; missing comments are skipped)
filtered_comments_df = comments_df[comments_df['Comments'].str.contains(pattern, case=False, na=False)]

# Write filtered life-threatening comments to a new CSV file
filtered_comments_df.to_csv(output_file, index=False)

print("Filtered life-threatening comments saved to", output_file)
```

```
Filtered life-threatening comments saved to FilteredLifeThreatComments.csv
```
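The `\b` word boundaries in the pattern matter, but a keyword regex still has limits. The sketch below runs a shortened version of the pattern (five of its keywords, abbreviated for readability) against a few invented sentences. Word boundaries stop "kill" from matching inside "skills". Inflected forms such as "crushed" slip through, though, and idioms such as "the bomb" are flagged. Idioms like this are the kind of false positive that a second signal, such as the sentiment check in the next section, is meant to reduce.

```python
import re

# Shortened keyword alternation from the notebook, with a non-capturing group
pattern = re.compile(r"\b(?:kill|bomb|attack|crush|blow up)\b", re.IGNORECASE)

examples = {
    "I am going to kill him": True,         # whole word: matches
    "Her skills are unmatched": False,      # 'kill' inside 'skills': no match
    "Time to BLOW UP the building": True,   # multi-word phrase, case-insensitive
    "You crushed that exam": False,         # inflected form 'crushed' is missed
    "This song is the bomb": True,          # idiomatic use: a false positive
}

for text, expected in examples.items():
    found = bool(pattern.search(text))
    assert found == expected, text
    print(f"{found!s:5}  {text}")
```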


## **Life-Threatening Comment Filtering Using Sentiment Analysis**

```python
import re

import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Loading the sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Keywords indicating life-threatening language, compiled once as a
# word-bounded regular expression (text is already lowercased by preprocessing)
LIFE_THREAT_PATTERN = re.compile(
    r'\b(?:kill|murder|choke|strangle|bomb|shoot|stab|assault|attack|harm|destroy|exterminate|terminate|eliminate|slaughter|execute|smother|suffocate|blow up|explode|gun down|bludgeon|snuff out|strife|annihilate|obliterate|eradicate|crush)\b'
)

# Function to preprocess text
def preprocess_text(comment):
    if not isinstance(comment, str):
        comment = str(comment)
    # Remove non-alphanumeric characters and convert to lowercase
    return re.sub(r'[^a-zA-Z0-9\s]', '', comment.lower())

# Function to check whether a comment contains life-threatening language.
# The pattern must be applied with re.search: looping over the pattern string
# would visit single characters, not keywords, and match almost any comment.
def filter_life_threat_comments(comment):
    return LIFE_THREAT_PATTERN.search(comment) is not None

# Input and output file paths
input_file = 'ThreatsComments.csv'
output_file = 'FilteredLifeThreatCommentsUsingSentimentAnalysis.csv'

# Read the CSV file using pandas
comments_df = pd.read_csv(input_file)

filtered_comments = []

for _, row in comments_df.iterrows():
    # Preprocess the comment
    comment = preprocess_text(row['Comments'])
    # Perform sentiment analysis (the compound score ranges from -1 to 1)
    sentiment_score = sia.polarity_scores(comment)['compound']
    # Keep comments with negative sentiment that also contain life-threatening language
    if sentiment_score < 0 and filter_life_threat_comments(comment):
        filtered_comments.append({'ID': row['ID'], 'Comments': comment})

# Convert the filtered comments list to a DataFrame; explicit columns keep the
# CSV header even when no comment is flagged
filtered_comments_df = pd.DataFrame(filtered_comments, columns=['ID', 'Comments'])

# Write filtered life-threatening comments to a new CSV file
filtered_comments_df.to_csv(output_file, index=False)

print("Filtered life-threatening comments saved to", output_file)
```

```
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
Filtered life-threatening comments saved to FilteredLifeThreatCommentsUsingSentimentAnalysis.csv
```
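A note on the keyword check. A loop over a regular-expression string, as in `for keyword in life_threat_keywords:`, visits single characters (`\`, `b`, `(`, `k`, ...) rather than keywords. As a result, `keyword in comment` is true for almost any comment, and the filter collapses to the sentiment test alone. The sketch below contrasts that loop with a `re.search` call on the same string. It uses a shortened, hypothetical three-keyword pattern and the hypothetical helper names `buggy_contains` and `fixed_contains`.

```python
import re

# Shortened keyword pattern for illustration
life_threat_keywords = r"\b(kill|murder|bomb)\b"


def buggy_contains(comment: str) -> bool:
    # Iterating over a string yields single characters, so any comment that
    # shares one character with the pattern (e.g. 'l') returns True.
    for keyword in life_threat_keywords:
        if keyword in comment:
            return True
    return False


def fixed_contains(comment: str) -> bool:
    # Treat the string as a regular expression and search for a whole-word match.
    return re.search(life_threat_keywords, comment) is not None


print(buggy_contains("what a lovely day"))  # True  (matches the character 'l')
print(fixed_contains("what a lovely day"))  # False
print(fixed_contains("i will kill you"))    # True
```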
