Twitter-Inappropriate-Language-Detection

Project Description

This project detects inappropriate language in tweets. The tweets will be fetched from the Twitter Streaming API and temporarily stored in Apache Kafka. Two consumers will then read the tweets from Kafka using Spark Streaming, preprocess them, and classify each tweet as inappropriate or not.
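The per-tweet work of a consumer (decode, preprocess, classify) can be sketched in plain Python, without Kafka or Spark. This is only an illustration under assumptions: the `text` and `user` field names, the cleaning rules, and the keyword-based `classify` stand-in are hypothetical. The project itself relies on a pretrained model for classification.

```python
import json
import re

# Hypothetical placeholder vocabulary; the real project uses a pretrained model.
PLACEHOLDER_BAD_WORDS = {"idiot", "stupid"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text: str) -> str:
    """Lowercase, drop URLs, mentions and non-letter characters, collapse spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())


def classify(clean_text: str) -> bool:
    """Stand-in classifier: flags a tweet if it contains a placeholder word."""
    return any(word in PLACEHOLDER_BAD_WORDS for word in clean_text.split())


def handle_message(raw: bytes) -> dict:
    """Turn one raw message (assumed to be UTF-8 JSON) into a classified record."""
    tweet = json.loads(raw.decode("utf-8"))
    clean = preprocess(tweet["text"])
    return {"user": tweet["user"], "text": clean, "inappropriate": classify(clean)}


example = json.dumps(
    {"user": "alice", "text": "@bob you are an IDIOT! http://t.co/x"}
).encode("utf-8")
record = handle_message(example)
```

In the actual pipeline the same steps would run inside the Spark streaming job on each batch read from Kafka, with the pretrained model in place of `classify`.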

The two consumers will do the following:

One consumer will save the preprocessed tweets to an Apache Cassandra NoSQL database and compute analytics on them. The data will then be read from the database and visualized with Tableau. The statistics include the number of inappropriate tweets per country and per user.
The other consumer will compute analytics in real time and print the results. These real-time statistics are grouped by user (i.e. how many offensive tweets each user has posted).
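The per-user and per-country statistics described above boil down to a grouped count over the classified tweets. A minimal sketch in plain Python, assuming each classified record is a dictionary with hypothetical `user`, `country`, and `inappropriate` fields (the sample records are made up for illustration):

```python
from collections import Counter
from typing import Iterable


def count_inappropriate(tweets: Iterable[dict], key: str) -> Counter:
    """Count inappropriate tweets grouped by the given field (e.g. 'user')."""
    counts = Counter()
    for tweet in tweets:
        if tweet["inappropriate"]:
            counts[tweet[key]] += 1
    return counts


# Made-up sample records, only to exercise the function.
classified = [
    {"user": "alice", "country": "GR", "inappropriate": True},
    {"user": "bob", "country": "US", "inappropriate": False},
    {"user": "alice", "country": "GR", "inappropriate": True},
    {"user": "carol", "country": "US", "inappropriate": True},
]

per_user = count_inappropriate(classified, "user")
per_country = count_inappropriate(classified, "country")
```

In a streaming job the equivalent would be a group-by and count over each batch, with the real-time consumer printing the result and the other consumer writing it to Cassandra.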

To guard against failures, checkpoints are added to both consumers; if a consumer fails, it restarts from the latest checkpoint. Moreover, the predictions of inappropriate tweets will be based on the pretrained model from the Toxic-Language-Detection-in-Online-Content repository.
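The restart-from-checkpoint idea can be illustrated with a small, self-contained sketch. This is not how the consumers implement it (Spark manages its own checkpoint data); the file name, the `next_offset` field, and the helper functions are hypothetical and only show why a restarted consumer neither skips nor repeats messages.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the next offset to process, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)["next_offset"]


def save_checkpoint(path: str, next_offset: int) -> None:
    """Write to a temporary file, then rename, so a crash leaves no partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"next_offset": next_offset}, fh)
    os.replace(tmp_path, path)


def run_consumer(messages, path, processed, fail_at=None):
    """Process messages from the last checkpoint; optionally simulate a crash."""
    start = load_checkpoint(path)
    for offset in range(start, len(messages)):
        if offset == fail_at:
            raise RuntimeError("simulated consumer crash")
        processed.append(messages[offset])
        save_checkpoint(path, offset + 1)


messages = ["t0", "t1", "t2", "t3"]
processed = []
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    try:
        run_consumer(messages, ckpt, processed, fail_at=2)  # crashes before t2
    except RuntimeError:
        pass
    run_consumer(messages, ckpt, processed)  # resumes at offset 2
```

After the simulated crash, the second run picks up at the saved offset, so every message ends up processed exactly once in this toy setting.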

The architecture is shown in architecture.png.

Dependencies

Docker (allocate at least 4 GB of memory to Docker)
Tableau (optional)

RUNNING

Start Docker, then open a terminal and run:

$ cd /path/to/root/folder
$ docker-compose up -d --build

Additional

Portainer, a web interface for Docker that makes it easy to manage containers, is included in the docker-compose.yml file. To view the containers along with their logs and results, open a browser and go to http://DOCKER_HOST:9000 (or http://localhost:9000 if running locally). Create an account and log in to see the containers.

valavanisleonidas/Twitter-Inappropriate-Language-Detection
