## End-to-End Hate Speech Detection with Python
To build an end-to-end application for hate speech detection, you first need to train a machine learning model that detects whether a piece of text contains hate speech.

To deploy this model as an end-to-end application, I will use the streamlit library in Python, which lets us see the model's predictions in real time. If you have never used streamlit before, install it with the pip command:
* pip install streamlit

**Import Libraries:**

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

**Import Dataset:**

In [2]:
# Load the dataset from a CSV file
data = pd.read_csv("twitter.csv")

data.head()

| | Unnamed: 0 | count | hate_speech | offensive_language | neither | class | tweet |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... |
| 1 | 1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... |
| 2 | 2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... |
| 3 | 3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... |
| 4 | 4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... |


**Cleaning and Transformation:**

In [3]:
'''
Map numerical labels to descriptive categories:
'''
data["labels"] = data["class"].map({0: "Hate Speech", 
                                    1: "Offensive Language", 
                                    2: "No Hate and Offensive"})

# Display the updated dataset
data.head()



| | Unnamed: 0 | count | hate_speech | offensive_language | neither | class | tweet | labels |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | No Hate and Offensive |
| 1 | 1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | Offensive Language |
| 2 | 2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | Offensive Language |
| 3 | 3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | Offensive Language |
| 4 | 4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | Offensive Language |
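
A quick sanity check after mapping labels can catch class codes that are missing from the dictionary, since `Series.map` leaves those rows as `NaN`. The sketch below uses a small made-up frame (the `toy` DataFrame and the unknown code `3` are assumptions for illustration, not rows from `twitter.csv`):

```python
import pandas as pd

# Toy stand-in for the "class" column; 3 is a deliberately unknown code
toy = pd.DataFrame({"class": [0, 1, 2, 1, 3]})

label_map = {0: "Hate Speech",
             1: "Offensive Language",
             2: "No Hate and Offensive"}
toy["labels"] = toy["class"].map(label_map)

# Codes that are not in the dictionary become NaN after map()
unmapped = int(toy["labels"].isna().sum())

# value_counts() skips NaN by default and shows how balanced the classes are
counts = toy["labels"].value_counts().to_dict()

print("Unmapped rows:", unmapped)
print(counts)
```

Running the same two checks on the real `data` frame would show whether any tweets lost their label and how unevenly the three classes are represented.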


In [4]:
'''
Select relevant columns for analysis
'''
# .copy() avoids pandas' SettingWithCopyWarning when the tweet column is overwritten later
data = data[["tweet", "labels"]].copy()

data.head()

| | tweet | labels |
|---|---|---|
| 0 | !!! RT @mayasolovely: As a woman you shouldn't... | No Hate and Offensive |
| 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | Offensive Language |
| 2 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | Offensive Language |
| 3 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | Offensive Language |
| 4 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | Offensive Language |


In [5]:
'''
Perform text cleaning on the tweet column
'''
import re
import string
import nltk
from nltk.corpus import stopwords

# nltk.download("stopwords")  # uncomment on first run if the stopword corpus is missing
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text inside square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    # Remove stopwords, then stem the remaining words
    words = [word for word in text.split(' ') if word not in stopword]
    words = [stemmer.stem(word) for word in words]
    return " ".join(words)
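
To see what the regular-expression steps do without needing the NLTK downloads, here is a reduced sketch of the same idea. The function name `clean_demo`, the tiny `STOPWORDS` set, and the sample sentence are assumptions made for illustration. Stemming is left out, and `split()` is used instead of `split(' ')` so that repeated spaces are collapsed:

```python
import re
import string

# Tiny hand-picked stopword set, standing in for NLTK's English list
STOPWORDS = {"a", "the", "is", "this", "to", "at"}

def clean_demo(text):
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # text inside square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>+", "", text)                 # HTML-like tags
    text = re.sub(r"[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\w*\d\w*", "", text)               # words containing digits
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)

sample = "Check THIS out!!! https://example.com <b>Wow</b> at 9am"
cleaned = clean_demo(sample)
print(cleaned)  # check out wow
```

The URL, the tags, the punctuation, the token `9am`, and the two stopwords are all dropped, which is the same kind of reduction the full `clean` function applies to each tweet before stemming.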



In [6]:
'''
Apply the cleaning function to the tweet column
'''
data["tweet"] = data["tweet"].apply(clean)

data.head()

| | tweet | labels |
|---|---|---|
| 0 | rt mayasolov woman shouldnt complain clean ho... | No Hate and Offensive |
| 1 | rt boy dat coldtyga dwn bad cuffin dat hoe ... | Offensive Language |
| 2 | rt urkindofbrand dawg rt ever fuck bitch sta... | Offensive Language |
| 3 | rt cganderson vivabas look like tranni | Offensive Language |
| 4 | rt shenikarobert shit hear might true might f... | Offensive Language |


In [7]:
'''
Split the dataset into features (X) and labels (y)
'''
x = np.array(data["tweet"])
y = np.array(data["labels"])

# Initialize the CountVectorizer to convert text data into a bag-of-words representation
cv = CountVectorizer()
X = cv.fit_transform(x)  # Fit the Data

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Initialize the Decision Tree Classifier and fit it to the training data
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Evaluate the classifier on the test set (accuracy as a percentage)
accuracy = accuracy_score(y_test, clf.predict(X_test)) * 100

# Display the accuracy of the model
print(f"Model Accuracy: {accuracy:.2f}")



Model Accuracy: 87.61
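
A single accuracy number can hide weak performance on a rare class. As a hedged illustration, the sketch below uses hand-written labels (the `y_true` and `y_pred` lists are made up, not results from this model) to show how a confusion matrix and a per-class report expose that:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels: 8 offensive tweets, 2 hate speech tweets
y_true = ["Offensive Language"] * 8 + ["Hate Speech"] * 2
# A lazy classifier that always predicts the majority class
y_pred = ["Offensive Language"] * 10

labels = ["Hate Speech", "Offensive Language"]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
# zero_division=0 silences the warning for classes that are never predicted
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

print(f"Accuracy: {acc:.2f}")
print(cm)
print(report)
```

Here accuracy is 0.80 even though no hate speech example is ever detected. Calling the same functions with `y_test` and `clf.predict(X_test)` would give the per-class picture for the trained model.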


**Test Hate Speech Model:**

In [8]:
'''
Function to detect hate speech using the trained model
'''
import streamlit as st

def hate_speech_detection():
    st.title("Hate Speech Detection")
    user = st.text_area("Enter any Tweet: ")
    if len(user) < 1:
        st.write("  ")
    else:
        # Apply the same cleaning used on the training tweets before vectorizing
        sample = clean(user)
        data = cv.transform([sample])
        prediction = clf.predict(data)
        # predict() returns an array; display the single predicted label
        st.title(prediction[0])

# Run the hate_speech_detection function when the script is executed
# (the app only renders when the script is launched with `streamlit run`)
hate_speech_detection()


In [None]:
#Run
!streamlit run Hate.py
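
The contents of `Hate.py` are not shown here. One way a standalone script could reuse the notebook's fitted objects, instead of retraining on every launch, is to pickle the vectorizer and classifier together and load them in the script. The following is a minimal sketch under that assumption: the file name `hate_speech_model.pkl`, the toy training texts, and the `toy_cv`/`toy_clf` names are hypothetical stand-ins for `cv` and `clf`.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the fitted `cv` and `clf` from the notebook
texts = ["you are awful", "have a nice day",
         "awful awful person", "nice weather today"]
labels = ["Offensive Language", "No Hate and Offensive",
          "Offensive Language", "No Hate and Offensive"]

toy_cv = CountVectorizer()
toy_clf = DecisionTreeClassifier(random_state=42)
toy_clf.fit(toy_cv.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "hate_speech_model.pkl")  # hypothetical name
    # Save the vectorizer and model together so they cannot get out of sync
    with open(model_path, "wb") as f:
        pickle.dump({"vectorizer": toy_cv, "model": toy_clf}, f)
    # A separate script would do only this loading step
    with open(model_path, "rb") as f:
        bundle = pickle.load(f)

sample = ["what an awful day"]
original_pred = toy_clf.predict(toy_cv.transform(sample))
restored_pred = bundle["model"].predict(bundle["vectorizer"].transform(sample))
print(restored_pred[0])
```

Pickle files should only be loaded from sources you trust, and the loading environment needs a compatible scikit-learn version.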

**Summary**

So, this is how you can build an end-to-end application to detect hate speech using the Python programming language. Hate speech is a serious problem on social media platforms like Facebook and Twitter, mostly from people expressing political views.