HateSpeech-portuguese

Detection of hate speech in Portuguese, English, and Spanish using machine learning techniques.

Development:

Python 3.8 was used as the base language;
Auxiliary libraries were used for data preparation (pandas, numpy, nltk, pickle);
A Flask API, developed by me and hosted on Heroku, was used to save the data;
scikit-learn (sklearn) was used to build the prediction models: Logistic Regression, Multinomial Naive Bayes, and Linear SVC (SVM).
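
As a rough illustration of the stack listed above, the following minimal sketch trains the three named classifiers and round-trips them through pickle. It is not the repository's training script: the TF-IDF vectorizer, the pipeline layout, the dictionary keys, and the tiny made-up dataset are all assumptions chosen only to keep the example self-contained.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data for illustration only; 1 = hate speech, 0 = not hate speech.
texts = [
    "you people are worthless and should disappear",
    "I despise everyone from that group",
    "have a great day at work",
    "the meeting starts at ten tomorrow",
]
labels = [1, 1, 0, 0]

# The three model families named in this README.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "multinomial_nb": MultinomialNB(),
    "linear_svc": LinearSVC(),
}

# Assumption: text is vectorized with TF-IDF before classification.
models = {}
for name, classifier in classifiers.items():
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(texts, labels)
    models[name] = pipeline

# Export and re-import with pickle (in memory here; a file works the same way).
exported = pickle.dumps(models)
restored = pickle.loads(exported)

# Each restored model returns 0 or 1 for a new phrase.
predictions = {
    name: int(model.predict(["test API"])[0]) for name, model in restored.items()
}
print(predictions)
```

Bundling the vectorizer and classifier in one pipeline means a single pickled object carries everything needed at prediction time, which is one common way to hand a model to a separate API process.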

Project:

Proof-of-concept project for developing NLP models that recognize texts and predict the sentiment they convey;
This prediction model is part of a larger project called Remote-Analyser, a system I developed to collect suspicious data on corporate and/or institutional computers, enabling more efficient monitoring of these entities' assets;
The Python model relies on several specialized libraries. The classifiers (Logistic Regression, Multinomial Naive Bayes, and Linear SVC (SVM)) were trained on a dataset and exported using pickle. The exported file is loaded by an API built with Flask, which receives a JSON body through the /predict endpoint containing a phrase and predicts whether it is hate speech or not;
The input body for the /predict endpoint should look like this: { "valor": 0, "frase": "test API" }
The response is a JSON object with the same shape, but "valor" will be 0 if the phrase is not hate speech or 1 if it is.
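
The request and response shape described above can be exercised from a small client. This is a sketch using only the Python standard library; the base URL, the port, and the use of the POST method are assumptions, since the README does not state them, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the API is reachable at this address; adjust host and port as needed.
BASE_URL = "http://localhost:5000"


def build_predict_request(phrase, base_url=BASE_URL):
    """Build a request for /predict carrying the 'valor' and 'frase' fields."""
    body = json.dumps({"valor": 0, "frase": phrase}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint accepts POST
    )


def is_hate_speech(response_body):
    """Interpret the response: 'valor' is 1 for hate speech, 0 otherwise."""
    return int(json.loads(response_body)["valor"]) == 1


def predict(phrase, base_url=BASE_URL):
    """Send the phrase to the API and return True if it is flagged."""
    request = build_predict_request(phrase, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return is_hate_speech(response.read())


# Example call (requires a running API instance):
#     predict("test API")
```

Keeping request construction and response parsing in separate functions lets both be checked without a running server.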

How to use:

The complete application, with all microservices configured, is available on Docker Hub.
To run it, execute the following commands:

docker container run --platform=linux/amd64 -it -p 8091:8091 -p 8090:8090 -p 5000:5000 -p 9091:9090 -p 3000:3000 --name=app -d darlannoetzold/tcc-spyware:4.0

docker exec -itd app /init-spyware-api.sh
docker exec -itd app /init-remoteanalyser.sh
docker exec -itd app /init-handler-hatespeech.sh

HateSpeech API:

GitHub Repository:
Link: https://github.com/DarlanNoetzold/HateSpeech-portuguese

Spyware API:

GitHub Repository:
Link: https://github.com/DarlanNoetzold/spyware-API

Spyware Script:

GitHub Repository:
Link: https://github.com/DarlanNoetzold/spyware

Remote-Analyser:

GitHub Repository:
Link: https://github.com/DarlanNoetzold/Remote-Analyser

Charts:

Accuracy:

⭐️ From DarlanNoetzold
