Our solution for the Metachallenge hackathon.
This repo contains:
- GPT-2 deployment configs for Kubernetes (CPU; the GPU config is not working) and Cloud Run. GPT-2 models are available in Russian and English.
- BERT training scripts for classification. Check communicabio-hints for the BERT deployment configs.
We tried to create two models:
- For toxicity detection/measurement
- For positivity detection
For deployment purposes we were unable to fine-tune BERT*; instead, we added a small one-linear-layer head on top of BERT and trained only that layer rather than the whole model.
*To minimize server cold-start time we aimed to unify the BERT servers. Due to the 2 GB Cloud Run restriction, it's impossible to store different BERT models in one image.
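The training setup above (a frozen encoder with a single trainable linear head) can be sketched as follows. This is a minimal illustration, not the repo's actual training script: random features stand in for the frozen BERT [CLS] embeddings, and all dimensions and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen BERT [CLS] embeddings: 200 samples, 16-dim features.
# In the real setup these would come from a frozen BERT forward pass.
X = rng.standard_normal((200, 16))
true_w = rng.standard_normal(16)
y = (X @ true_w > 0).astype(float)  # synthetic binary labels (e.g. toxic / not)

# The only trainable parameters: one linear layer (weights + bias).
w = np.zeros(16)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on the logistic loss; the "encoder" is never updated.
lr = 0.5
for _ in range(300):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

Training only the head keeps the trainable parameter count tiny, which is what makes this workable when full fine-tuning is off the table.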
Datasets:
- http://www.dialog-21.ru/evaluation/2016/sentiment/
- https://gitlab.com/kensand/rusentiment
- https://github.com/oldaandozerskaya/auto_reviews
- http://tpc.at.ispras.ru/prakticheskoe-zadanie-2015/
- https://www.kaggle.com/blackmoon/russian-language-toxic-comments/data
- http://files.deeppavlov.ai/models/obscenity_classifier/ru_obscenity_dataset.zip