ReSt - A Deep Learning Approach for Italian Stereotype Detection in Social Media

Introduction

The Hate Speech Detection (HaSpeeDe 2) task presented at Evalita 2020 consisted of a main task (hate speech detection) and two pilot tasks (stereotype detection and nominal utterance detection). This project investigates different models for the stereotype detection task. Our study covers several types of neural networks: convolutional neural networks (CNNs), a recurrent neural network (BiLSTM), and a BiLSTM with a soft-attention module. We also evaluated BERT by taking an Italian pre-trained model and fine-tuning the entire model for our classification task. Our experiments showed that both the choice of model and the combination of features extracted from the deep models were important. Moreover, with BERT we observed that models pre-trained on large datasets can give a significant improvement when applied to other tasks.

This project was developed for the Human Language Technologies course at the University of Pisa under the guidance of Prof. Giuseppe Attardi.

All the details can be found in the full report.

Usage

Set up the repo

This code requires Python 3.8 or later. To download the repository:

git clone https://github.com/alessandrocuda/ReSt

Then you need to install the basic dependencies to run the project on your system:

cd ReSt
pip install -r requirements.txt

Download the Italian Twitter Embeddings and move the file into the word2vec folder:

mv twitter128.bin results/model/word2vec

and you are ready to go.
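
As a quick sanity check after the setup, the sketch below verifies that the embeddings file sits where the previous step puts it. The helper name `embeddings_path` is hypothetical and not part of the repository, and the code assumes that `results/model/word2vec` is a directory; only the file name and folder come from the step above.

```python
from pathlib import Path

# File name and folder taken from the setup step above
# (assumes results/model/word2vec is a directory).
EMBEDDINGS_RELATIVE_PATH = Path("results") / "model" / "word2vec" / "twitter128.bin"


def embeddings_path(repo_root="."):
    """Return the expected embeddings path, or raise if the file is missing.

    Hypothetical helper for illustration; it is not part of the ReSt code base.
    """
    path = Path(repo_root) / EMBEDDINGS_RELATIVE_PATH
    if not path.is_file():
        raise FileNotFoundError(f"Expected the Italian Twitter Embeddings at {path}")
    return path
```

Calling `embeddings_path()` from the root of the cloned repository either returns the path or fails early with a readable message, before any model code tries to load the file.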

Docker file

As an alternative, there is also a Dockerfile that can be used to launch a web app.

You can build the Docker image via:

cd ReSt/app
docker build -t rest .

Then run it with:

docker run -dp 3000:3000 rest

After that, you can access the web app at the following URL: localhost:3000
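
To confirm that the container is actually serving on the mapped port, a small check like the following may help. It is only a sketch: the function name `webapp_is_up` is hypothetical, and it assumes the app answers plain HTTP on port 3000 as in the `docker run` command above.

```python
import urllib.error
import urllib.request


def webapp_is_up(url="http://localhost:3000", timeout=2.0):
    """Return True if something answers HTTP at `url`, False otherwise.

    Hypothetical helper for illustration; not part of the ReSt code base.
    """
    # An empty ProxyHandler bypasses any system proxy, which is what we
    # want when probing a service on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, etc.
        return False


if __name__ == "__main__":
    print("web app reachable:", webapp_is_up())
```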

Models

All the models explored in this project are listed below; all of them are available as H5 TensorFlow models in the results folder:

KCNN, inspired by Kim’s model
D-KCNN, a KCNN that combines text, PoS tags and all the extra features extracted in this project
D-BiLSTM, which follows the D-KCNN architecture but with two BiLSTMs
A-BiLSTM, which takes the concatenation of text and PoS tags as input to a BiLSTM; to take advantage of all the features extracted by the BiLSTM, each output is weighted with an attention mechanism
BERT, a cased pre-trained BERT model provided by dbmdz and fine-tuned on our task
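
To illustrate what "weighting each output with an attention mechanism" can look like, here is a minimal pure-Python sketch of soft-attention pooling over a sequence of hidden states. It uses one common formulation (score each step with a context vector applied to `tanh(W h + b)`, then softmax and a weighted sum). Whether A-BiLSTM uses exactly this scoring function is an assumption, and all names below are hypothetical.

```python
import math


def soft_attention_pool(hidden_states, weights, bias, context):
    """Collapse a sequence of hidden states into a single vector.

    hidden_states: list of T vectors of size d (e.g. BiLSTM outputs per token)
    weights:       d x d matrix given as a list of rows
    bias:          vector of size d
    context:       vector of size d used to score each time step
    Returns (pooled_vector, attention_weights).
    """
    # Score each time step: u_t = context . tanh(W h_t + b)
    scores = []
    for h in hidden_states:
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(weights, bias)
        ]
        scores.append(sum(c_i * p_i for c_i, p_i in zip(context, projected)))

    # Softmax over time steps (shifted by the max for numerical stability).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    pooled = [
        sum(a * h[i] for a, h in zip(alphas, hidden_states)) for i in range(dim)
    ]
    return pooled, alphas
```

With all-zero `weights` and `bias`, every time step gets the same score, so the attention weights are uniform and the pooled vector reduces to the plain average of the hidden states. Learned parameters let the model move away from that average toward the most informative tokens.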

Results

Model	Macro F1-score (test set)
BERT	0.737
A-BiLSTM	0.722
D-KCNN	0.715
Baseline_SVC	0.714
D-BiLSTM	0.703
KCNN	0.700
Baseline_MFC	0.354
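
For readers unfamiliar with the metric, the sketch below computes the macro F1-score (the unweighted mean of the per-class F1-scores) and applies it to a most-frequent-class baseline. The labels are toy data invented for illustration, not the task's dataset, and the function names are hypothetical.

```python
def f1_for_class(y_true, y_pred, label):
    """F1-score of a single class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


def most_frequent_class_predictions(y_train, n):
    """Predict the majority class of the training labels for n examples."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * n


# Toy labels (invented for illustration): 3 negatives, 2 positives.
y_true = [0, 0, 0, 1, 1]
y_pred = most_frequent_class_predictions(y_true, len(y_true))
print(macro_f1(y_true, y_pred))  # 0.375
```

Because the minority class is never predicted, its F1-score is 0 and the macro average is pulled well below 0.5, which is why a most-frequent-class baseline scores low on this metric even when its accuracy looks acceptable.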

Contributing

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request :D

Contact

Alessandro Cudazzo - @alessandrocuda - alessandro@cudazzo.com

Giulia Volpi - giuliavolpi25.93@gmail.com

Project Link: https://github.com/alessandrocuda/ReSt

License

This library is free software; you can redistribute it and/or modify it under the terms of the MIT license.

MIT license
