communicabio-ml

Our solution for the Metachallenge hackathon.

This repo contains:

GPT-2 deployment configs for Kubernetes (CPU works; the GPU config does not) and Cloud Run.
GPT-2 models in Russian and English.
Training scripts for BERT-based classification.

See communicabio-hints for the BERT deployment configs.

Training

We tried to create two models:

For toxicity detection/measurement
For positivity detection

For deployment reasons we could not fine-tune BERT itself*. Instead, we added a small network with a single linear layer on top of BERT and trained only that layer, not the whole model.

*To minimize server cold-start time, we aimed to unify the BERT servers. Because of the 2 GB Cloud Run limit, it's impossible to store several different BERT models in one image.
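
The idea of one shared, frozen encoder with a small trainable head per task can be sketched without any dependencies. This is an illustration only, not the code in train.py or models.py: `encode`, `VOCAB`, `LinearHead`, `score` and the toy texts are all hypothetical, and a fixed bag-of-words lookup stands in for the frozen BERT encoder.

```python
import math

# Hypothetical stand-in for a frozen BERT encoder: a fixed bag-of-words lookup.
VOCAB = ("thanks", "great", "love", "idiot", "stupid", "hate")


def encode(text):
    """Map text to a fixed feature vector; nothing here is ever trained."""
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    return [float(tokens.count(word)) for word in VOCAB]


class LinearHead:
    """One linear layer followed by a sigmoid: the only trainable part."""

    def __init__(self, dim):
        self.weights = [0.0] * dim
        self.bias = 0.0

    def predict(self, features):
        logit = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))

    def fit(self, rows, labels, lr=0.5, epochs=200):
        for _ in range(epochs):
            for features, label in zip(rows, labels):
                # Gradient of the log-loss with respect to the logit.
                error = self.predict(features) - label
                self.weights = [w - lr * error * x
                                for w, x in zip(self.weights, features)]
                self.bias -= lr * error
        return self


# Invented toy examples, for illustration only (not taken from the listed datasets).
TEXTS = ["you are an idiot", "stupid idea, i hate it",
         "thanks, great work", "i love this"]
TOXIC = [1, 1, 0, 0]
POSITIVE = [0, 0, 1, 1]

# Encode once; both heads train on the same frozen features.
FEATURES = [encode(text) for text in TEXTS]
HEADS = {
    "toxicity": LinearHead(len(VOCAB)).fit(FEATURES, TOXIC),
    "positivity": LinearHead(len(VOCAB)).fit(FEATURES, POSITIVE),
}


def score(text):
    """Run the shared encoder once, then apply every head to its output."""
    features = encode(text)
    return {name: head.predict(features) for name, head in HEADS.items()}


print(score("great, thanks"))
```

Because the encoder is never updated, both heads can sit behind a single copy of it, so one image only needs to hold one large model plus a few small weight vectors. With a real BERT, the same pattern would mean freezing the encoder's parameters and optimizing only the linear layer.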

Datasets

Sentiment

http://www.dialog-21.ru/evaluation/2016/sentiment/

https://gitlab.com/kensand/rusentiment

http://study.mokoron.com/

https://github.com/oldaandozerskaya/auto_reviews

Toxic

http://tpc.at.ispras.ru/prakticheskoe-zadanie-2015/

https://www.kaggle.com/blackmoon/russian-language-toxic-comments/data

http://files.deeppavlov.ai/models/obscenity_classifier/ru_obscenity_dataset.zip
