The Reddit Streaming classifier

This project classifies Reddit comments, streamed live from a set of subreddits, against given criteria. The first version detects hate speech in new comments and in submission titles.

The streaming components and the storage utilities are written in Go, and the classification is written in Python. The services communicate through Kafka, and the data is stored in a Cassandra database. The stored data is exposed through Elasticsearch (Elassandra).
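To make the flow between services more concrete, here is a minimal sketch of what one classification step could look like once a message has been read from Kafka. It is not the project's code: the JSON encoding, the `body` field, the output field names, the threshold and the `classify_message` helper are all assumptions made for illustration, and `score_fn` stands in for the trained model.

```python
import json

# Assumed cut-off for flagging a comment; not taken from the project.
HATE_SPEECH_THRESHOLD = 0.5


def classify_message(raw_message, score_fn, threshold=HATE_SPEECH_THRESHOLD):
    """Decode one JSON message, score its text, and return the enriched record as bytes.

    raw_message: bytes or str holding a JSON object with a "body" field (assumed schema).
    score_fn: callable mapping the comment text to a score in [0, 1]
              (a stand-in for the trained classifier).
    """
    comment = json.loads(raw_message)
    score = float(score_fn(comment["body"]))
    # Hypothetical output fields, added next to whatever the message already carried.
    comment["hate_speech_score"] = score
    comment["is_hate_speech"] = score >= threshold
    return json.dumps(comment).encode("utf-8")
```

In a real service this function would sit between a Kafka consumer and a Kafka producer, with the storage service reading the enriched records and writing them to Cassandra.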

Dependencies

Go:

Python:

Keras
Kafka

For the others, see the Python requirements (requirements.txt).

Requirements

A Reddit account and a reddit_kafka/auth.conf file that looks like this:

CLIENT_ID=<client-id>
CLIENT_SECRET=<client-secret>
USER_AGENT=<user-agent>
USERNAME=<username>
PASSWORD=<password>

For further information, see the Reddit API documentation and the API access rules.
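A missing or empty entry in auth.conf is easy to overlook before starting the services. The following is a small, optional pre-flight check, assuming the file contains only KEY=VALUE lines and blank lines as shown above. The helper names `parse_auth_conf` and `missing_keys` are hypothetical and not part of the project, and this is not how the Go service itself reads the file.

```python
# The five keys listed in the auth.conf example above.
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT", "USERNAME", "PASSWORD")


def parse_auth_conf(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw_line!r}")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

To use it, read reddit_kafka/auth.conf as text, pass the contents to `parse_auth_conf`, and check that `missing_keys` returns an empty list before building the images.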

Build & Run instructions:

Set up the environment in reddit_classifier:

Create a virtual env: python3 -m venv venv
Activate the virtual env: . venv/bin/activate
Install the required packages: pip3 install -r requirements.txt
Train the word2vec model: python train_word2vec.py
Train the hate speech classifier: python train_lstm.py

To run everything together:

Build the images: docker-compose -f docker-compose.yml build
Run the docker-compose: docker-compose -f docker-compose.yml up

Kubernetes setup

If you want to deploy this project to Kubernetes, see the Kubernetes README.
