### Hate Speech Detection Using Detoxify: Project Overview
This notebook demonstrates a project pipeline for detecting hate speech in YouTube comments using the Detoxify model, a pretrained NLP model developed for toxicity detection. The project's focus is to automate hate speech labeling for text data, leveraging the Detoxify model to classify comments as either "Hate" or "Not Hate."

**Notebook Outline**
**Objective**: The goal of this project is to label YouTube comments related to the Lok Sabha elections to identify and analyze hate speech. By automating the labeling process with a pretrained model, we can efficiently categorize large volumes of comments and gain insights into public discourse.

**Detoxify Model:** Detoxify is an open-source library of pretrained transformer models for toxic-comment classification. This notebook loads its `original` checkpoint, which returns a probability score for each toxicity label. The transformer architecture lets the model use the context around a word rather than isolated keywords.

# **Data Preprocessing:**

**Text Cleaning:** Before feeding comments into Detoxify, basic text preprocessing steps are applied to remove links, special characters, and other irrelevant content.

**Translation:** Because the comments dataset is multilingual, non-English comments are translated to English before Detoxify scores them. The notebook starts from the already-translated file `translated.xlsx`.
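
The text-cleaning step described above can be sketched with the standard library alone. The helper name `basic_clean` and the URL pattern are illustrative assumptions, not the notebook's own code; the notebook's actual cleaning cells appear further down.

```python
import re

# Assumed patterns for illustration only.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')
WHITESPACE_PATTERN = re.compile(r'\s+')

def basic_clean(text: str) -> str:
    """Lower-case a comment, drop links, and collapse runs of whitespace."""
    text = text.lower()
    text = URL_PATTERN.sub(' ', text)
    return WHITESPACE_PATTERN.sub(' ', text).strip()

print(basic_clean("Watch this  https://youtu.be/abc123 NOW"))  # watch this now
```

Replacing a link with a space rather than an empty string keeps the words on either side from being glued together.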

# **Detoxify Inference:**

**Thresholding for Labeling:** Detoxify outputs a probability score for each toxicity label (e.g., toxicity, severe_toxicity, insult, identity_attack). Comparing these scores against a threshold categorizes each comment as "Hate" or "Not Hate."

**Custom Labeling Logic:** The threshold is a tunable choice intended to suit election-related discourse. The inference cell in this notebook labels a comment "hate" when its `toxicity` score exceeds 0.3.
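
As a rough sketch of the thresholding idea, the function below maps a dictionary of scores to a label. The function name `label_from_scores`, the per-label `thresholds` dictionary, and the example scores are assumptions made for illustration; the inference cell in this notebook checks only `toxicity` against 0.3.

```python
from typing import Dict

def label_from_scores(scores: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Return 'hate' if any listed score exceeds its threshold, else 'not hate'."""
    for name, limit in thresholds.items():
        if scores.get(name, 0.0) > limit:
            return 'hate'
    return 'not hate'

# Hypothetical scores shaped like a Detoxify result for a single comment.
example_scores = {'toxicity': 0.42, 'severe_toxicity': 0.01,
                  'insult': 0.20, 'identity_attack': 0.35}

print(label_from_scores(example_scores, {'toxicity': 0.3}))                          # hate
print(label_from_scores(example_scores, {'toxicity': 0.5, 'identity_attack': 0.5}))  # not hate
```

Keeping the thresholds in a dictionary makes it easy to try stricter or looser settings without touching the labeling logic.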

In [None]:
import pandas as pd
import matplotlib.pyplot as pyplot
import seaborn as sns
import numpy as np
import nltk
import re

In [None]:
df=pd.read_excel('/content/translated.xlsx')

In [None]:
df.head(10)

Unnamed: 0,text
0,Dont remember the last time hindus crashed a p...
1,Being a Muslim it is our duty to te...
2,Very good
3,All Indian muslim go Pakistan
4,So modi pushing for more children 🧒
5,40 million Hindus killed in bangladesh
6,He is telling what people want every politicia...
7,🫡🫡 India
8,modi is not anti muslim\npakistanis dont want ...
9,Please 🙏 muslim leave india 😂😂😂😂😂


In [None]:
# df=df.drop('Unnamed: 1',axis=1)

In [None]:
# df.head()

In [None]:
# removal of capitalization
def lower_case(text):
    return text.lower()
df['text'] = df['text'].apply(lower_case)

In [None]:
df.head(10)

Unnamed: 0,text
0,dont remember the last time hindus crashed a p...
1,being a muslim it is our duty to te...
2,very good
3,all indian muslim go pakistan
4,so modi pushing for more children 🧒
5,40 million hindus killed in bangladesh
6,he is telling what people want every politicia...
7,🫡🫡 india
8,modi is not anti muslim\npakistanis dont want ...
9,please 🙏 muslim leave india 😂😂😂😂😂


In [None]:
# Compile the regex pattern to match @mentions
regex_pat = re.compile(r'@[\w\-]+')

# Function to remove @mentions from the text
def remove_mentions(text):
    return re.sub(regex_pat, '', text)

# Apply the function to the 'text' column
df['text'] = df['text'].apply(remove_mentions)
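
One thing to watch with the pattern above is that it also matches the `@` inside an email address. A possible variant, shown here only as a sketch, adds a lookbehind so that `@` counts as a mention only when it does not follow a word character. The names `STRICT_MENTION` and `remove_mentions_strict` are hypothetical.

```python
import re

# Pattern used in the cell above, repeated here for comparison.
ORIGINAL_MENTION = re.compile(r'@[\w\-]+')
# Assumed variant: '@' must not be preceded by a word character.
STRICT_MENTION = re.compile(r'(?<!\w)@[\w\-]+')

def remove_mentions_strict(text: str) -> str:
    """Remove @mentions but leave email-like tokens untouched."""
    return STRICT_MENTION.sub('', text)

sample = 'mail me at user@example.com'
print(ORIGINAL_MENTION.sub('', sample))          # mail me at user.com
print(remove_mentions_strict(sample))            # mail me at user@example.com
print(remove_mentions_strict('@some-user well said'))
```

Both versions leave a stray space where the mention used to be, which the later whitespace-collapsing and `str.strip()` cells clean up.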

In [None]:
df.head(10)

Unnamed: 0,text
0,dont remember the last time hindus crashed a p...
1,being a muslim it is our duty to te...
2,very good
3,all indian muslim go pakistan
4,so modi pushing for more children 🧒
5,40 million hindus killed in bangladesh
6,he is telling what people want every politicia...
7,🫡🫡 india
8,modi is not anti muslim\npakistanis dont want ...
9,please 🙏 muslim leave india 😂😂😂😂😂


In [None]:
# Removal of extra spaces using pandas' str.replace with regex=True
df['text'] = df['text'].str.replace(r'\s+', ' ', regex=True)

In [None]:
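# Replace any run of whitespace with a single space
# (regex=True is needed so that r'\s+' is treated as a regular expression)
df['text'] = df['text'].str.replace(r'\s+', ' ', regex=True)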

In [None]:
# from google.colab import files
# df.to_excel('trans.xlsx', index=False)
# files.download('trans.xlsx')

In [None]:
# Find duplicate comments
# df[df.duplicated(subset='text')]

In [None]:
# Remove duplicates while keeping the first occurrence
# df = df.drop_duplicates(subset='text', keep='first')

In [None]:
# Optionally, reset index after removing duplicates
# df.reset_index(drop=True, inplace=True)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21428 entries, 0 to 21427
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   text    21428 non-null  object
dtypes: object(1)
memory usage: 167.5+ KB


In [None]:
# Removing leading and trailing whitespace from the 'text' column
df['text'] = df['text'].str.strip()

In [None]:
hash_comments = df[df['text'] == '#value!']

In [None]:
hash_comments

Unnamed: 0,text
5954,#value!
15438,#value!
20944,#value!
21214,#value!


In [None]:
df=df[df['text'] != '#value!']

In [None]:
# Install Emoji library.
!pip install emoji

Collecting emoji
  Downloading emoji-2.14.0-py3-none-any.whl.metadata (5.7 kB)
Downloading emoji-2.14.0-py3-none-any.whl (586 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 586.9/586.9 kB 7.7 MB/s eta 0:00:00
Installing collected packages: emoji
Successfully installed emoji-2.14.0


In [None]:
# Import module emoji
import emoji

In [None]:
# Function to extract emojis from a comment
def extract_emojis(comment):
    return ''.join([char for char in comment if char in emoji.EMOJI_DATA])

# Apply the function to the 'text' column
emojis = df['text'].apply(extract_emojis)

# Display the DataFrame with extracted emojis
print(emojis)

0        ☪
1         
2         
3         
4        🧒
        ..
21423     
21424     
21425     
21426     
21427     
Name: text, Length: 21424, dtype: object


In [None]:
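# Concatenate every emoji in the dataset into one string.
# Note: the name `str` shadows Python's built-in; it is kept because the following cells use it.
str = ''.join(c for i in df.text for c in i if c in emoji.EMOJI_DATA)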

In [None]:
# How many emojis do we have in our dataset?
len(str)

18668

In [None]:
# This is what our str looks like
str

'☪🧒\U0001fae1\U0001fae1🙏😂😂😂😂😂😢😢😢😢😂😂😢😢😂❤😊😂❤❤❤😅😅😅❤🤔🤔🤔🤔😂😂😂💩💀❤❤😂🙏🩴😢😢❤❤😊😡😡😡🙏😂😂😂😂😂😂⛑🍬❤❤😂😂😂😂😂😂🗿😂😂😂😂😊😊❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤😊😅😅🤬🤬🤬❤❤❤❤👏❤😂😂😅😂❤❤❤😂😂😂🚩🚩🚩🚩😂😂😂😅😅😢☕👎☕☕\U0001faf5😃😃😂😂😂😂😅😂😅😂😅❤❤❤😂🙏🙏🙏😢❤❤❤❤🙂🔥💪😊😢😂😂😂😂❌✅😂😂😂😂😂😂🥱😂🤦♂🤷♂🕉🙏❤🕉😢😢😊😊🔥🔥🔥❤❤❤🙏🙏🙏🤮🤮🤮❤❤❤❤❤❤❤❤❤🌼🌺🙏🌼🌺🙏❤❤❤❤❤❤😢😢😢😢😢❤😂✅😢👎👎👎👎👎👎❤😂😂😂😂😂😂✅🤚🤚🌎😅😅😡😭😭😢🤣🤣🤣🤣🤣😂😡😡😡😡😡😡😡😡😡😡😡😂✌💔🥲😂😂😂😂😂😂❤🚩🚩🚩🚩😂😂👈👈👎👎👎😂😂😂😂😂😂😂😂🤣🤣🤣🤣😢😢😢😢😢😢😢😢🕉❤😢😢😢😢😢❤😂😂😂😂😂😅🥶🗿😂🗿❤❤❤❤❤👏😊👎😱😈👿😡😱🤪😨😡👿👿👿👿😂😂💞🤲🕌💖🤞😂😂😂🎉😡😂😂🎉😂❤❤❤❤❤❤❤😂😂👹👹❌✅❤❤❤❤😊😂🗳🤬🤬🤬😡😡😢😢😭😢🔥🔥🔥😂😊😂😂😂😂😂😂🕉🌸☝🏻💚🤍😂😂😢😢😢👍👍👍❤🤔💔😂😂😂😂❎✅⚡😂😡😂🥺🥺☪☝🏼☝🏾☝🏾☝🏾☝🏾☝🏽😢😂😂😂😂😂😂😂😂😂😂😡😡😡🔥😂😮😢😮😮😮😮😮😮😮😅😡😂😂😂💩🤮🤮😤😂😂❌✅🙄🙏😕❤💙😂😂😂😂😂😂😂🤡❤❤❤❤❤😮❤📈📈📈📈🗿🗿🗿🗿😢😢😊😢😢😂😂😂😂😂😂😂🙏😢🤣🤣🤣🤣❌✅💀😂😅😅😅✌😂🤣💔❤💯💯💪💪💪🚩🚩🚩😈🐄🚩🚩🤣🤣🤣🤣🤣🤣😂😮👍💐💐💐😂😢😢😢🗳😢😢😢😢😢😢😢😅🤩🤩😂❌😡😠😡😠❤❤❤😊😂😂😂😂😂💩💩💩💩😢😢🙂🤣🤦🏼♂😊😢👹👺😡🤡😂😂😢👈🐶🐶🐶🐶🐶🐶🐶🐶🐶🐶🐶🐶😂😂😂😂😂😂😂☕☕☕😅😂😂😂❤🚩😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡❌❌❌❌❌❌❌❌❌❌❌❌🙂❤❤❤❤😢😢😢😢😢😢😢❤❤❤❤😊😢😢😢😢😢😢😮💨🦛🦄🥲🥲🥲😢😢😢😢🎉❤😂😂😂😂😂😂☪😅😅😅😅😂🪑😢😨😂😂🤢😢🅱😂❤😢😢😢😂✊😆🤬🤬🤬😂😂😂😂🤡🙌🏻🙃❤☪😁😢😂💪👍💪👍❤🤣🤣👍😢😢😢😴😂😂❤❤❤❤❤❤😢🚩🚩🚩🚩🚩🚩🚩🚩🕉🔯❤😅😡😡😡😡😡😡😢😢😢😢😡🎈😂😂😂😂😂😃😆😡❤❤❤❤🚩🚩🚩🚩😂😢😢😢😢👍🙏❤❤❤😢😅🐖☕🐶😂😂😂😂🙏🙏😮😡😢😢❤🤣🤣🤣😶❤🩹✊❤🩹😢😂😂👎👎👎😆😆😆😆😆😆😆😆😆❤😂☯😡🛑🤐🤫😢😢😢😢😢☠😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😅👹👹👍🙏❤😡👍💯👎👎👎🙄👺👺👺😜😜🥊🥊🥊🥊😡😍😢😢🤮💩🐖🐍🤣🤣😅😅☪🕉☪😂❌❌❌❌❌❌❌❌❌❌✅✅✅✅✅✅✅✅✅✅✅

In [None]:
# Let's count the unique emojis
result={}
for i in set(str):
    result[i]= str.count(i)

In [None]:
result.items()

dict_items([('📲', 1), ('🤣', 535), ('🌙', 2), ('🌿', 1), ('🤙', 1), ('🐮', 15), ('🥲', 6), ('⚡', 2), ('💬', 2), ('😋', 5), ('🌀', 1), ('💜', 1), ('🌻', 2), ('\U0001faf5', 2), ('😑', 4), ('💨', 1), ('🤢', 1), ('🤩', 7), ('🛑', 1), ('☝', 23), ('🎇', 5), ('🦜', 1), ('💓', 7), ('👹', 11), ('🚀', 12), ('🧒', 1), ('🎅', 3), ('🤪', 24), ('🩸', 2), ('☘', 1), ('🐍', 5), ('🩳', 1), ('\U0001fae8', 1), ('💌', 1), ('😎', 8), ('👐', 1), ('💥', 3), ('🤯', 2), ('👣', 1), ('👠', 9), ('📙', 15), ('🦨', 1), ('🌹', 110), ('🚷', 2), ('📜', 2), ('🪑', 6), ('🔥', 216), ('🤞', 4), ('😈', 22), ('☺', 3), ('🤘', 3), ('📚', 1), ('🛕', 12), ('💙', 57), ('🪲', 1), ('🙉', 5), ('👕', 1), ('🚨', 3), ('😌', 4), ('🤮', 34), ('🚳', 2), ('👦', 5), ('🐂', 1), ('🐖', 16), ('💦', 2), ('🌐', 1), ('☪', 26), ('🏍', 1), ('😉', 2), ('🦾', 1), ('🌽', 1), ('✝', 6), ('😖', 1), ('💊', 8), ('😘', 4), ('🏭', 3), ('⭐', 5), ('🐕', 17), ('🐣', 4), ('🦛', 1), ('🧡', 79), ('👋', 6), ('🧍', 1), ('⛔', 2), ('🍷', 1), ('🙃', 5), ('🥿', 5), ('🐭', 2), ('🏽', 11), ('🙌', 17), ('🐎', 5), ('😨', 2), ('🪁', 1), ('😄', 85), ('😯', 2

In [None]:
# Build a dictionary `final` mapping each emoji (key) to its count (value), sorted by count in ascending order
final={}
for key, value in sorted(result.items(), key= lambda item:item[1]):
    final[key]= value
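
The counting and sorting done in the two cells above can also be written in a single pass with `collections.Counter`. This is an alternative sketch, not the notebook's method; the helper name `count_symbols` and the sample string are made up for illustration.

```python
from collections import Counter

def count_symbols(symbols: str) -> dict:
    """Count each character and return a dict ordered from least to most frequent."""
    counts = Counter(symbols)
    return dict(sorted(counts.items(), key=lambda item: item[1]))

sample = '😂😂😂🙏🙏🔥'
ordered = count_symbols(sample)
print(ordered)                     # {'🔥': 1, '🙏': 2, '😂': 3}
print(list(ordered.items())[-2:])  # the two most frequent symbols
```

`Counter` reads the string once, whereas calling `count` for every unique character rescans the whole string each time.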

In [None]:
# Display our final result
final

{'📲': 1,
 '🌿': 1,
 '🤙': 1,
 '🌀': 1,
 '💜': 1,
 '💨': 1,
 '🤢': 1,
 '🛑': 1,
 '🦜': 1,
 '🧒': 1,
 '☘': 1,
 '🩳': 1,
 '\U0001fae8': 1,
 '💌': 1,
 '👐': 1,
 '👣': 1,
 '🦨': 1,
 '📚': 1,
 '🪲': 1,
 '👕': 1,
 '🐂': 1,
 '🌐': 1,
 '🏍': 1,
 '🦾': 1,
 '🌽': 1,
 '😖': 1,
 '🦛': 1,
 '🧍': 1,
 '🍷': 1,
 '🪁': 1,
 '\U0001faf4': 1,
 '🪐': 1,
 '💛': 1,
 '🐑': 1,
 '🍥': 1,
 '🤥': 1,
 '🐦': 1,
 '♋': 1,
 '😺': 1,
 '🎀': 1,
 '🤧': 1,
 '🥋': 1,
 '👸': 1,
 '🕊': 1,
 '⛑': 1,
 '🎩': 1,
 '🦠': 1,
 '👖': 1,
 '⚠': 1,
 '👀': 1,
 '🍬': 1,
 '\U0001f979': 1,
 '🐜': 1,
 '🦉': 1,
 '✈': 1,
 '\U0001fae4': 1,
 '🤓': 1,
 '🛺': 1,
 '🅱': 1,
 '\U0001fae2': 1,
 '🤠': 1,
 '💷': 1,
 '🍎': 1,
 '🔯': 1,
 '\U0001faf2': 1,
 '🗡': 1,
 '🍍': 1,
 '🦋': 1,
 '💺': 1,
 '📷': 1,
 '🍒': 1,
 '📿': 1,
 '🦗': 1,
 '🐀': 1,
 '😛': 1,
 '👻': 1,
 '🖤': 1,
 '🦟': 1,
 '🥧': 1,
 '🐾': 1,
 '😵': 1,
 '‼': 1,
 '🦇': 1,
 '🌝': 1,
 '🙅': 1,
 '😬': 1,
 '🐩': 1,
 '🎓': 1,
 '☯': 1,
 '📰': 1,
 '🦶': 1,
 '\U0001faf6': 1,
 '🏁': 1,
 '🔘': 1,
 '🌙': 2,
 '⚡': 2,
 '💬': 2,
 '🌻': 2,
 '\U0001faf5': 2,
 '🩸': 2,
 '🤯': 2,
 '🚷': 2,
 '📜': 2,
 

In [None]:
# Build a data frame of the 10 most-used emojis.
# `final` is sorted by count in ascending order, so the last 10 entries are the most frequent.
keys = [*final.keys()]
values = [*final.values()]

In [None]:
# Slice both lists so that each emoji keeps its own count
# (values[-10] is a single number and would be broadcast to every row).
emojis = pd.DataFrame({'chars': keys[-10:], 'num': values[-10:]})

In [None]:
emojis.head()


In [None]:
# Import libraries and modules
import plotly.graph_objs as go
from plotly.offline import iplot

In [None]:
graph = go.Bar(
x= emojis['chars'],
y= emojis['num'])
iplot([graph] )
# Hover over the bars to view the emojis along with the count

In [None]:
from transformers import AutoTokenizer
import emoji

In [None]:
import re
import emoji

# Function to remove duplicate emojis
def remove_duplicate_emojis(text):
    # Create a set to track used emojis
    used_emojis = set()
    # Iterate over each character in the text
    result = []
    for char in text:
        # Check if the character is an emoji
        if char in emoji.EMOJI_DATA:
            # If emoji is not already used, add it to result and mark as used
            if char not in used_emojis:
                used_emojis.add(char)
                result.append(char)
        else:
            # If it's not an emoji, just add the character to result
            result.append(char)
    return ''.join(result)

# Apply the function to the 'text' column in your dataset
df['text'] = df['text'].apply(remove_duplicate_emojis)

# Display the updated DataFrame
print(df[['text']])


                                                    text
0      dont remember the last time hindus crashed a p...
1      being a muslim it is our duty to tell you on i...
2                                              very good
3                          all indian muslim go pakistan
4                    so modi pushing for more children 🧒
...                                                  ...
21423                          bjp win 25 seat in bengal
21424  an opinion poll done on the theme of 400 plus ...
21425                              paid channel from bjp
21426                                maharashtra bjp+ 44
21427                                      manipur bjp 1

[21424 rows x 1 columns]
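
The function above keeps only the first occurrence of each emoji anywhere in a comment. If the goal were instead to collapse only back-to-back repeats (so that an emoji reused later in the comment survives), one possible sketch uses `itertools.groupby`. The name `collapse_emoji_runs`, the `is_emoji` parameter, and the stand-in emoji set are assumptions for this example.

```python
from itertools import groupby
from typing import Callable

def collapse_emoji_runs(text: str, is_emoji: Callable[[str], bool]) -> str:
    """Collapse runs of the same emoji to one; leave other characters untouched."""
    pieces = []
    for char, run in groupby(text):
        run_length = len(list(run))
        # Emojis are emitted once per run; ordinary characters keep their repeats.
        pieces.append(char if is_emoji(char) else char * run_length)
    return ''.join(pieces)

# Stand-in emoji test so the sketch runs without extra packages; in the notebook
# one could pass `lambda c: c in emoji.EMOJI_DATA` instead.
SAMPLE_EMOJIS = {'😂', '🙏'}
print(collapse_emoji_runs('good 😂😂😂 speech 😂 book 🙏🙏',
                          lambda c: c in SAMPLE_EMOJIS))
# good 😂 speech 😂 book 🙏
```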


In [None]:
def emoji2description(text):
    # Replace each emoji with its English name from the emoji library, e.g. 🧒 -> :child:
    # (the name already includes the surrounding colons)
    return emoji.replace_emoji(text, replace=lambda chars, data_dict: data_dict['en'] )


In [None]:
df['text'] = df['text'].apply(emoji2description)

In [None]:
df.head(10)

Unnamed: 0,text
0,dont remember the last time hindus crashed a p...
1,being a muslim it is our duty to tell you on i...
2,very good
3,all indian muslim go pakistan
4,so modi pushing for more children :child:
5,40 million hindus killed in bangladesh
6,he is telling what people want every politicia...
7,:saluting_face: india
8,modi is not anti muslim pakistanis dont want u...
9,please :folded_hands: muslim leave india :face...


In [None]:
# Assuming your text data is in the 'text' column of the dataframe
special_chars = set()

# Regular expression to match special characters (excluding alphanumeric characters and spaces)
regex = re.compile(r'[^a-zA-Z0-9\s]')

# Loop through each text entry in the dataframe
for text in df['text']:
    # Find all special characters in the text
    found_chars = regex.findall(text)
    special_chars.update(found_chars)

# Display the unique special characters found
print(special_chars)

{'`', '।', '¹', '&', '(', '۔', '-', '\u200b', '!', '“', '》', '⁹', '❞', '/', '<', '￼', '\u2060', '❝', '"', '٫', '=', '‘', '”', '–', '•', '…', '{', '\\', '%', '[', '\U0001fbed', '>', '’', '}', '౼', ':', ')', '+', '~', '²', '″', '.', '∞', ']', '#', '☬', '✓', '@', '★', '|', "'", '*', '→', '·', '°', ';', '?', '$', ',', '±', '_', '—', '₹', '✧', '\u200d'}


In [None]:
# Function to remove special characters except # * @ ! ? :
# Note: '_' is also removed, so emoji names such as :saluting_face: become :salutingface:
def clean_comments(comment):
    # Keep letters, numbers, whitespace, and the specified characters
    return re.sub(r'[^a-zA-Z0-9\s#*@!?:]', '', comment)

# Apply the function to the 'text' column
df['text'] = df['text'].apply(clean_comments)
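
Because the pattern above strips underscores, emoji names produced by the earlier step lose their word boundaries. If readable names matter for the classifier, one option is to add `_` to the kept characters. The sketch below is an assumed variant, not the cell the notebook ran; `KEEP_PATTERN` and `clean_keep_underscores` are hypothetical names.

```python
import re

# Assumed variant of the cleaning pattern: '_' is added to the kept characters.
KEEP_PATTERN = re.compile(r'[^a-zA-Z0-9\s#*@!?:_]')

def clean_keep_underscores(comment: str) -> str:
    """Remove special characters but keep # * @ ! ? : and underscores."""
    return KEEP_PATTERN.sub('', comment)

print(clean_keep_underscores('please :folded_hands: leave, now!!'))
# please :folded_hands: leave now!!
```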

In [None]:
df.head(10)

Unnamed: 0,text
0,dont remember the last time hindus crashed a p...
1,being a muslim it is our duty to tell you on i...
2,very good
3,all indian muslim go pakistan
4,so modi pushing for more children :child:
5,40 million hindus killed in bangladesh
6,he is telling what people want every politicia...
7,:salutingface: india
8,modi is not anti muslim pakistanis dont want u...
9,please :foldedhands: muslim leave india :facew...


In [None]:
# Removal of extra spaces using pandas' str.replace with regex=True
df['text'] = df['text'].str.replace(r'\s+', ' ', regex=True)


In [None]:
!pip install detoxify

Collecting detoxify
  Downloading detoxify-0.5.2-py3-none-any.whl.metadata (13 kB)
Downloading detoxify-0.5.2-py3-none-any.whl (12 kB)
Installing collected packages: detoxify
Successfully installed detoxify-0.5.2


In [None]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from detoxify import Detoxify

# Load the model once; calling Detoxify('original') for every comment
# reloads the checkpoint each time and slows labeling considerably
detox_model = Detoxify('original')

# Function to get hate speech label
def get_hate_speech_label(text):
    results = detox_model.predict(text)
    if results['toxicity'] > 0.3:  # Adjust threshold as needed
        return 'hate'
    else:
        return 'not hate'

# Function to process a chunk of the dataframe
def process_chunk(chunk):
    chunk['label'] = chunk['text'].apply(get_hate_speech_label)
    return chunk

# Split the dataframe into chunks
chunk_size = 100  # Adjust based on your memory and system capability
chunks = [df.iloc[i:i + chunk_size].copy() for i in range(0, df.shape[0], chunk_size)]

# Process the chunks in parallel
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(process_chunk, chunk) for chunk in chunks]
    results = []
    first_chunk_processed = False
    for future in as_completed(futures):
        result = future.result()
        results.append(result)
        if not first_chunk_processed:
            print("First batch processed:")
            print(result.head())
            first_chunk_processed = True

# Concatenate the results back into a single dataframe.
# as_completed yields chunks in completion order, so sort by index to restore the original row order
df_labeled = pd.concat(results).sort_index()

# Save the labeled data
df_labeled.to_excel('labeled_comments.xlsx', index=False)

# Also save individual chunks (process_chunk added the 'label' column to each chunk in place)
for i, chunk in enumerate(chunks):
    chunk.to_excel(f'labeled_comments_chunk_{i + 1}.xlsx', index=False)

Downloading: "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_original-c1212f89.ckpt" to /root/.cache/torch/hub/checkpoints/toxic_original-c1212f89.ckpt

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]


`clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884





First batch processed:
                                                  text     label
500  modi fail hatred spreading is not good for cou...  not hate
501  he shouldnt be selected because he cant respec...  not hate
502  indian butcher prime minister main accused of ...      hate
503  its very shameful speech by a pm of world bigg...  not hate
504  in india muslim is rape and murder without any...      hate
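
An alternative to threading is to hand the model several comments per call. The sketch below assumes the predictor accepts a list of strings and returns one list of scores per label, which is how the Detoxify README describes `predict`. The function name `label_in_batches` and the stand-in `fake_predict` are hypothetical, and `fake_predict` exists only so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence

def label_in_batches(
    texts: Sequence[str],
    predict: Callable[[List[str]], Dict[str, List[float]]],
    batch_size: int = 32,
    threshold: float = 0.3,
) -> List[str]:
    """Score texts batch by batch and label each one by its toxicity score."""
    labels: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        scores = predict(batch)
        labels.extend('hate' if s > threshold else 'not hate'
                      for s in scores['toxicity'])
    return labels

# Stand-in predictor (an assumption for this sketch, not a real model).
def fake_predict(batch: List[str]) -> Dict[str, List[float]]:
    return {'toxicity': [0.9 if 'bad' in t else 0.1 for t in batch]}

print(label_in_batches(['very good', 'bad words', 'ok'], fake_predict, batch_size=2))
# ['not hate', 'hate', 'not hate']
```

In the notebook this could be called as `label_in_batches(df['text'].tolist(), Detoxify('original').predict)`, which also keeps the labels in the original row order.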




`clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by defaul