# End-to-End Hate Speech Detection with Python

Hate speech is a serious problem on social media platforms such as Facebook and Twitter, much of it tied to political debates. I recently shared an article on how to train a machine learning model for hate speech detection. As a continuation of it, in this article I'll walk you through how to build an end-to-end hate speech detection system with Python.

To create an end-to-end application for hate speech detection, you first need to train a machine learning model that detects whether a piece of text contains hate speech. You can learn how to train such a model <a href="https://github.com/Rasel1435/Advanced-Data-Science-Machine-Learning-Projects/blob/main/Hate%20Speech%20Detection%20with%20Machine%20Learning/Hate_Speech_Detection_with_Machine_Learning.ipynb">here</a>. To deploy this model as an end-to-end application, I will use the Streamlit library in Python, which lets us see the predictions of the hate speech detection model in real time. If you have never used Streamlit before, install it with pip:

```bash
pip install streamlit
```

I hope you have gone through <a href="https://github.com/Rasel1435/Advanced-Data-Science-Machine-Learning-Projects/blob/main/Hate%20Speech%20Detection%20with%20Machine%20Learning/Hate_Speech_Detection_with_Machine_Learning.ipynb">this article</a> to learn how to train a hate speech detection model. Now here is how you can build an end-to-end application for hate speech detection using Python:

```python
# Importing necessary libraries
import re
import string

import nltk
import numpy as np
import pandas as pd
import streamlit as st
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Data collection
data = pd.read_csv("data/twitter.csv")

# Add a "labels" column that maps the numeric classes to readable names
data["labels"] = data["class"].map({0: "Hate Speech",
                                    1: "Offensive Language",
                                    2: "No Hate and Offensive"})

data = data[["tweet", "labels"]]
# print(data.head())

# Text cleaning: the stopword list is a separate NLTK download (needed once)
nltk.download("stopwords", quiet=True)
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words("english"))


def clean(text):
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                  # text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)     # URLs
    text = re.sub(r"<.*?>+", "", text)                    # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", " ", text)                       # newlines (a space keeps words apart)
    text = re.sub(r"\w*\d\w*", "", text)                  # words containing digits
    words = [word for word in text.split() if word not in stopword]  # drop stopwords
    return " ".join(stemmer.stem(word) for word in words)            # stem each word


data["tweet"] = data["tweet"].apply(clean)

# Features (x) and target (y)
x = np.array(data["tweet"])
y = np.array(data["labels"])

# Bag-of-words features, then a train/test split
cv = CountVectorizer()
X = cv.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Train the model and report accuracy on the held-out test set
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))


def hate_speech_detection():
    st.title("Hate Speech Detection")
    user = st.text_area("Enter any Tweet: ")
    if len(user) < 1:
        st.write(" ")
    else:
        # Clean the input the same way as the training tweets before vectorizing
        sample = cv.transform([clean(user)]).toarray()
        prediction = clf.predict(sample)
        st.title(prediction[0])


hate_speech_detection()
```
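To see what the regular expressions in `clean()` remove, here is a small standalone sketch. It mirrors only the regex steps in a hypothetical `strip_noise()` helper and leaves out stopword removal and stemming, so it runs without downloading NLTK data. The sample tweet is made up for illustration.

```python
import re
import string


def strip_noise(text: str) -> str:
    """Hypothetical helper: the regex steps of clean(), without stopwords or stemming."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # bracketed text such as [video]
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>+", "", text)                 # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", " ", text)                    # newlines become spaces
    text = re.sub(r"\w*\d\w*", "", text)               # words containing digits
    return " ".join(text.split())                      # collapse leftover whitespace


print(strip_noise("RT @user: Check <b>this</b> out https://t.co/abc123 [video] 2nite!!"))
# prints: rt user check this out
```

Note that the order matters: HTML tags are removed before punctuation, otherwise `<b>` would turn into a stray `b` token.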

Because this app uses the Streamlit library, you cannot run it the same way you run your other Python programs. Instead, run the command below in your command prompt or terminal:

```bash
streamlit run filename.py
```
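Streamlit re-executes the whole script from top to bottom whenever the input changes, so the script above reloads the CSV and retrains the decision tree on every prediction. A common alternative, sketched below under assumptions, is to train once, save the fitted vectorizer and model together, and only load them in the app. The toy texts and labels are invented for illustration, `model.pkl` is a hypothetical file name, and `make_pipeline` bundles `CountVectorizer` and `DecisionTreeClassifier` into a single object that can be pickled.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data for illustration; in the article this would be the cleaned tweets and labels.
texts = ["you are awful and stupid", "have a great day", "i hate those people", "lovely weather today"]
labels = ["Offensive Language", "No Hate and Offensive", "Hate Speech", "No Hate and Offensive"]

# Training step: run once, for example in a separate training script.
model = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(texts, labels)

# Serialize to bytes here; on disk this would be:
#   with open("model.pkl", "wb") as f:
#       pickle.dump(model, f)
blob = pickle.dumps(model)

# App step: only load and predict, no retraining on every Streamlit rerun.
restored = pickle.loads(blob)
print(restored.predict(["have a great day"])[0])
```

Inside the app, the loading step could additionally be wrapped in a function decorated with `st.cache_resource` so it runs once per server process rather than on every rerun. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.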

Once you run the above command, Streamlit starts a local server and opens the app in your default web browser. There you can type some text, and the app will predict whether it contains hate speech, offensive language, or neither, as shown in the image below.
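The prediction in the app is only reliable if the text typed into the box is preprocessed the same way as the training tweets. One way to guarantee that, sketched here with made-up data and a hypothetical `normalize()` standing in for `clean()`, is to pass the preprocessing function to `CountVectorizer` through its `preprocessor` parameter, so that `fit_transform` and `transform` both apply it automatically. The hypothetical `predict_label()` mirrors the app's empty-input check.

```python
from typing import Optional

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier


def normalize(text: str) -> str:
    # Hypothetical stand-in for clean(): lowercase and keep only letters and whitespace.
    return "".join(ch for ch in text.lower() if ch.isalpha() or ch.isspace())


# The preprocessor runs inside both fit_transform() and transform(),
# so training text and app input are cleaned identically.
cv = CountVectorizer(preprocessor=normalize)

# Toy training data, made up for illustration only.
texts = ["You are AWFUL!!", "Have a great day :)", "I hate those people.", "Lovely weather today"]
labels = ["Offensive Language", "No Hate and Offensive", "Hate Speech", "No Hate and Offensive"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(cv.fit_transform(texts), labels)


def predict_label(text: str) -> Optional[str]:
    """Return the predicted label, or None when the text box is empty."""
    if not text.strip():
        return None
    return str(clf.predict(cv.transform([text]))[0])


print(predict_label("YOU ARE AWFUL!!!"))  # prints: Offensive Language
print(predict_label("   "))               # prints: None
```

Keeping preprocessing inside the vectorizer also means a pickled vectorizer carries it along, as long as the app can import the same `normalize` function when unpickling.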

# Summary

So, this is how you can build an end-to-end application to detect hate speech using the Python programming language. I hope you liked this article on building an end-to-end hate speech detection app with Python. Please feel free to ask your valuable questions in the comments section below.

# Sheikh Rasel Ahmed

#### Data Science || Machine Learning || Deep Learning || Artificial Intelligence Enthusiast

##### LinkedIn - https://www.linkedin.com/in/shekhnirob1
##### GitHub - https://github.com/Rasel1435
##### Behance - https://www.behance.net/Shekhrasel2513