
A prediction model that classifies sentences as hate speech or not. It uses three scikit-learn (sklearn) classifiers: LogisticRegression(), MultinomialNB(), and LinearSVC().

DarlanNoetzold/HateSpeech-portuguese

HateSpeech-portuguese

Detection of hate speech in Portuguese, English, and Spanish using machine learning techniques.

Development:

  • Python 3.8 was used as the base language;
  • Auxiliary libraries were used for data preparation (pandas, numpy, nltk, pickle);
  • An API that I developed with Flask and hosted on Heroku was used to store the data;
  • scikit-learn (sklearn) was used to build the prediction models: Logistic Regression, Multinomial Naive Bayes, and Linear SVC (SVM).
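A minimal sketch of how the three classifiers could be trained and exported. It assumes a TF-IDF bag-of-words representation (the README does not say which vectorizer the project uses) and a tiny hypothetical dataset standing in for the real one; `train_classifiers`, `export_models`, and `load_models` are illustrative names, not the repository's code.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real project trains on its own labelled dataset.
# Label 1 = hate speech, 0 = not hate speech.
TEXTS = [
    "I hate you and your people",
    "people like you should disappear",
    "you are worthless trash",
    "have a wonderful day",
    "thanks for the help yesterday",
    "the weather is nice today",
]
LABELS = [1, 1, 1, 0, 0, 0]


def train_classifiers(texts, labels):
    """Fit one TF-IDF + classifier pipeline per estimator, keyed by name."""
    estimators = {
        "logistic_regression": LogisticRegression(),
        "multinomial_nb": MultinomialNB(),  # works on non-negative TF-IDF weights
        "linear_svc": LinearSVC(),
    }
    models = {}
    for name, estimator in estimators.items():
        # A fresh vectorizer per pipeline, so each model owns its vocabulary.
        pipeline = make_pipeline(TfidfVectorizer(), estimator)
        pipeline.fit(texts, labels)
        models[name] = pipeline
    return models


def export_models(models, path):
    """Serialize the trained pipelines to a single file with pickle."""
    with open(path, "wb") as f:
        pickle.dump(models, f)


def load_models(path):
    """Load pipelines previously written by export_models (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `export_models(train_classifiers(TEXTS, LABELS), "classifiers.pkl")` writes all three pipelines to one file. Keeping the vectorizer inside each pipeline means the unpickled object can be called directly on raw text, e.g. `model.predict(["some phrase"])`.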

Project:

  • A proof-of-concept project for developing NLP models that recognize texts and predict the sentiment they convey;
  • This prediction model is part of a larger project called Remote-Analyser, a system I developed to collect suspicious data from corporate and/or institutional computers, enabling more efficient monitoring of these organizations' assets;
  • The classifiers (Logistic Regression, Multinomial Naive Bayes, and Linear SVC (SVM)) were trained on a dataset and exported with pickle. The exported file is loaded by an API built with Flask, which receives a JSON body at the /predict endpoint containing a phrase and predicts whether it is hate speech or not;
  • The input body for the /predict endpoint should look like this: { "valor": 0, "frase": "test API" }
  • The response is a JSON object with the same shape, with "valor" set to 0 if the phrase is not hate speech or 1 if it is.
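A minimal Flask sketch of the /predict contract described above. It assumes the model object exposes a scikit-learn style `predict()` that accepts a list of raw strings (for instance, an unpickled text pipeline); `create_app` and the 400 error response for malformed bodies are illustrative choices, not the repository's actual implementation.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app exposing POST /predict.

    `model` is any object with a predict(list_of_texts) method returning
    0 (not hate speech) or 1 (hate speech) per text.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        body = request.get_json(silent=True)  # None if the body is not valid JSON
        if not isinstance(body, dict) or not isinstance(body.get("frase"), str):
            return jsonify({"error": "expected a JSON body with a 'frase' string"}), 400
        label = int(model.predict([body["frase"]])[0])
        # Same shape as the request, with 'valor' holding the prediction.
        return jsonify({"valor": label, "frase": body["frase"]})

    return app
```

In a deployment, the model would be unpickled once at startup, passed to `create_app`, and served with `app.run(port=5000)` or a production WSGI server; the port here is an assumption.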

How to use:

  • The complete application, with all microservices configured, is available on Docker Hub.
  • The simplest way to run it is with the following commands:
docker container run --platform=linux/amd64 -it -p 8091:8091 -p 8090:8090 -p 5000:5000 -p 9091:9090 -p 3000:3000 --name=app -d darlannoetzold/tcc-spyware:4.0

docker exec -itd app /init-spyware-api.sh
docker exec -itd app /init-remoteanalyser.sh
docker exec -itd app /init-handler-hatespeech.sh
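Once the container is running, the HateSpeech API can be called from Python using only the standard library. This sketch assumes the API is reachable at http://localhost:5000 (one of the ports published by the `docker container run` command, and Flask's default) and serves POST /predict with the body documented in the Project section; the port choice and the helper names `build_predict_request` and `predict` are assumptions.

```python
import json
import urllib.request

# Assumption: the HateSpeech API listens on host port 5000.
DEFAULT_BASE_URL = "http://localhost:5000"


def build_predict_request(phrase, base_url=DEFAULT_BASE_URL):
    """Create a POST request for /predict with the documented JSON body."""
    payload = json.dumps({"valor": 0, "frase": phrase}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(phrase, base_url=DEFAULT_BASE_URL, timeout=10):
    """Send the request and return 'valor' (1 = hate speech, 0 = not)."""
    request = build_predict_request(phrase, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["valor"]
```

For example, `predict("test API")` would return 0 or 1 if the service is up; if the API is exposed on a different port, pass `base_url` explicitly.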


Screenshots (images not included):

  • HateSpeech API
  • spyware API
  • Spyware Script
  • Remote-Analyser

Charts:

  • Accuracy (chart image not included)

