The Named Entity Recognition task in DeepPavlov is solved with BERT-based models. The models predict a tag (in BIO format) for each token in the input.
The BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
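Since the models emit per-token BIO tags rather than entity spans, a decoding step is needed to turn tag sequences into entities. A minimal sketch of that decoding (illustrative, not DeepPavlov's own implementation):

```python
def bio_to_spans(tags):
    """Collect (entity_type, start, end) spans from a BIO tag sequence.
    `end` is exclusive; a stray I- tag with no matching B- starts a new span."""
    spans, start, ent = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and ent != tag[2:]):
            if ent is not None:          # close the previous span
                spans.append((ent, start, i))
            ent, start = tag[2:], i      # open a new span
        elif tag == "O":
            if ent is not None:
                spans.append((ent, start, i))
            ent, start = None, None
        # an I- tag matching the open entity simply extends the span
    if ent is not None:
        spans.append((ent, start, len(tags)))
    return spans

# For tokens "John lives in New York":
print(bio_to_spans(["B-PER", "O", "O", "B-LOC", "I-LOC"]))
# → [('PER', 0, 1), ('LOC', 3, 5)]
```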
Dataset | Lang | Model | Test F1 |
---|---|---|---|
Persons-1000 dataset with additional LOC and ORG markup (Collection 3) | Ru | ner_rus_bert.json <ner/ner_rus_bert.json> | 97.9 |
Persons-1000 dataset with additional LOC and ORG markup (Collection 3) | Ru | ner_rus_convers_distilrubert_2L.json <ner/ner_rus_convers_distilrubert_2L.json> | 88.4 ± 0.5 |
Persons-1000 dataset with additional LOC and ORG markup (Collection 3) | Ru | ner_rus_convers_distilrubert_6L.json <ner/ner_rus_convers_distilrubert_6L.json> | 93.3 ± 0.3 |
Ontonotes | Multi | ner_ontonotes_bert_mult.json <ner/ner_ontonotes_bert_mult.json> | 88.9 |
Ontonotes | En | ner_ontonotes_bert.json <ner/ner_ontonotes_bert.json> | 89.2 |
CoNLL-2003 | En | ner_conll2003_bert.json <ner/ner_conll2003_bert.json> | 91.7 |
Models for classification tasks (intents, sentiment, etc.) at the word level. Shallow-and-wide CNN, Deep CNN, BiLSTM, BiLSTM with self-attention, and other architectures are available. The models also support multi-label classification of texts. Several pre-trained models are available and listed in the table below.
Task | Dataset | Lang | Model | Metric | Valid | Test | Downloads |
---|---|---|---|---|---|---|---|
Insult detection | Insults | En | English BERT <classifiers/insults_kaggle_bert.json> | ROC-AUC | 0.9327 | 0.8602 | 1.1 Gb |
Sentiment | SST | En | 5-classes SST on conversational BERT <classifiers/sentiment_sst_conv_bert.json> | Accuracy | 0.6293 | 0.6626 | 1.1 Gb |
Sentiment | Twitter mokoron | Ru | RuWiki+Lenta emb w/o preprocessing <classifiers/sentiment_twitter.json> | Accuracy | 0.9918 | 0.9923 | 5.8 Gb |
Sentiment | RuSentiment | Ru | Multi-language BERT <classifiers/rusentiment_bert.json> | F1-weighted | 0.6787 | 0.7005 | 1.3 Gb |
Sentiment | RuSentiment | Ru | Conversational RuBERT <classifiers/rusentiment_convers_bert.json> | F1-weighted | 0.739 | 0.7724 | 1.5 Gb |
Sentiment | RuSentiment | Ru | Conversational DistilRuBERT-tiny <classifiers/rusentiment_convers_distilrubert_2L.json> | F1-weighted | 0.703 ± 0.0031 | 0.7348 ± 0.0028 | 690 Mb |
Sentiment | RuSentiment | Ru | Conversational DistilRuBERT-base <classifiers/rusentiment_convers_distilrubert_6L.json> | F1-weighted | 0.7376 ± 0.0045 | 0.7645 ± 0.035 | 1.0 Gb |
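The RuSentiment rows above report F1-weighted, i.e. per-class F1 averaged with weights proportional to each class's support. A small pure-Python sketch of that metric (illustrative; in practice a library such as scikit-learn's `f1_score(average="weighted")` would be used):

```python
from collections import Counter

def f1_weighted(y_true, y_pred):
    """Per-class F1, averaged with weights = class support in y_true."""
    labels = set(y_true) | set(y_pred)
    support = Counter(y_true)
    total, score = len(y_true), 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * support[c] / total
    return score

print(f1_weighted(["pos", "pos", "neg", "neg"],
                  ["pos", "neg", "neg", "neg"]))  # → 0.7333...
```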
Since no published results on intent recognition for the DSTC-2 data exist, the presented model is compared on the SNIPS dataset. The evaluation was conducted in the same way as in the report by the dataset's authors, so the scores are directly comparable. The results were achieved by tuning hyperparameters and using embeddings trained on a Reddit dataset.
Model | AddToPlaylist | BookRestaurant | GetWeather | PlayMusic | RateBook | SearchCreativeWork | SearchScreeningEvent |
---|---|---|---|---|---|---|---|
api.ai | 0.9931 | 0.9949 | 0.9935 | 0.9811 | 0.9992 | 0.9659 | 0.9801 |
ibm.watson | 0.9931 | 0.9950 | 0.9950 | 0.9822 | 0.9996 | 0.9643 | 0.9750 |
microsoft.luis | 0.9943 | 0.9935 | 0.9925 | 0.9815 | 0.9988 | 0.9620 | 0.9749 |
wit.ai | 0.9877 | 0.9913 | 0.9921 | 0.9766 | 0.9977 | 0.9458 | 0.9673 |
snips.ai | 0.9873 | 0.9921 | 0.9939 | 0.9729 | 0.9985 | 0.9455 | 0.9613 |
recast.ai | 0.9894 | 0.9943 | 0.9910 | 0.9660 | 0.9981 | 0.9424 | 0.9539 |
amazon.lex | 0.9930 | 0.9862 | 0.9825 | 0.9709 | 0.9981 | 0.9427 | 0.9581 |
Shallow-and-wide CNN | 0.9956 | 0.9973 | 0.9968 | 0.9871 | 0.9998 | 0.9752 | 0.9854 |
Pipelines that correct spelling errors by searching for candidates in a static dictionary and re-ranking them with an ARPA language model.
Note
The Russian language model requires about 4.4 GB of disk space; the English one requires about 7 GB.
Comparison on the test set for the SpellRuEval competition on Automatic Spelling Correction for Russian:
Correction method | Precision | Recall | F-measure | Speed (sentences/s) |
---|---|---|---|---|
Yandex.Speller | 83.09 | 59.86 | 69.59 | 5. |
Damerau Levenshtein 1 + lm <spelling_correction/levenshtein_corrector_ru.json> | 53.26 | 53.74 | 53.50 | 29.3 |
Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 |
JamSpell | 44.57 | 35.69 | 39.64 | 136.2 |
Hunspell | 30.30 | 34.02 | 32.06 | 20.3 |
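The "Damerau Levenshtein 1" corrector considers dictionary words within edit distance 1 of the input, where the distance counts insertions, deletions, substitutions, and adjacent transpositions. A minimal sketch of that distance (the restricted, optimal-string-alignment variant; illustrative, not DeepPavlov's implementation):

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions, and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(osa_distance("teh", "the"))  # → 1 (one transposition)
```

Candidates found at distance 1 are then re-ranked by the ARPA language model, which is why "Damerau Levenshtein 1 + lm" outperforms plain dictionary lookup.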
Available pre-trained models for paraphrase identification:
Dataset | Model config | Val (accuracy) | Test (accuracy) | Val (F1) | Test (F1) | Val (log_loss) | Test (log_loss) | Downloads |
---|---|---|---|---|---|---|---|---|
paraphraser.ru | paraphrase_rubert <classifiers/paraphraser_rubert.json> | | | | | | | 1325M |
paraphraser.ru | paraphraser_convers_distilrubert_2L <classifiers/paraphraser_convers_distilrubert_2L.json> | | | 81.8 ± 0.2 | 73.9 ± 0.8 | | | 618M |
paraphraser.ru | paraphraser_convers_distilrubert_6L <classifiers/paraphraser_convers_distilrubert_6L.json> | | | 89.6 ± 0.3 | 83.2 ± 0.5 | | | 930M |
Based on Reading Wikipedia to Answer Open-Domain Questions, the model solves the task of retrieving relevant documents for a given query.
Dataset | Model | Wiki dump | Recall@5 | Downloads |
---|---|---|---|---|
SQuAD-v1.1 | doc_retrieval <doc_retrieval/en_ranker_tfidf_wiki.json> | enwiki (2018-02-11) | 75.6 | 33 GB |
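Recall@5 here is the fraction of queries for which a relevant document appears among the top 5 retrieved. A short sketch of how such a metric is computed (illustrative helper, not part of the DeepPavlov API):

```python
def recall_at_k(ranked_docs, relevant, k=5):
    """Fraction of queries whose gold document appears in the top k.
    ranked_docs: one ranked list of doc ids per query;
    relevant: the gold doc id for each query."""
    hits = sum(gold in docs[:k] for docs, gold in zip(ranked_docs, relevant))
    return hits / len(relevant)

ranked = [["d3", "d1", "d7"], ["d9", "d2", "d5"]]
gold = ["d1", "d8"]
print(recall_at_k(ranked, gold, k=5))  # → 0.5
```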
Models in this section solve the task of finding an answer to a question in a given context (SQuAD task format). DeepPavlov provides two models for this task: BERT-based and R-Net. Both predict the answer's start and end positions in the given context.
The BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
The RuBERT-based model is described in Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
Dataset | Model config | Lang | EM (dev) | F-1 (dev) | Downloads |
---|---|---|---|---|---|
SQuAD-v1.1 | DeepPavlov BERT <squad/squad_bert.json> | En | | | |
SQuAD-v2.0 | DeepPavlov BERT <squad/qa_squad2_bert.json> | En | | | |
SDSJ Task B | DeepPavlov RuBERT <squad/squad_ru_bert.json> | Ru | | | |
SDSJ Task B | DeepPavlov RuBERT, trained with tfidf-retrieved negative samples <squad/qa_sberquad2_bert.json> | Ru | | | |
SDSJ Task B | DeepPavlov DistilRuBERT-tiny <squad/squad_ru_convers_distilrubert_2L.json> | Ru | | | |
SDSJ Task B | DeepPavlov DistilRuBERT-base <squad/squad_ru_convers_distilrubert_6L.json> | Ru | | | |
When the answer is not necessarily present in the given context, use the qa_squad2_bert <squad/qa_squad2_bert.json> model. It outputs an empty string if the context contains no answer.
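Since both models predict start and end positions, the final answer span is the pair of positions maximizing the combined score, subject to the start not exceeding the end. A minimal sketch of that selection step over raw logits (illustrative; names and the `max_len` cap are assumptions, not DeepPavlov's exact code):

```python
import math

def best_span(start_logits, end_logits, max_len=30):
    """Pick the (start, end) token pair maximizing start_logit + end_logit,
    subject to start <= end and a maximum answer length in tokens."""
    best, best_score = (0, 0), -math.inf
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

print(best_span([0.1, 2.0, 0.3], [0.0, 0.5, 3.0]))  # → (1, 2)
```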
A set of pipelines for the FAQ task: classify an incoming question against a set of known questions and return the prepared answer. You can build different pipelines based on tf-idf, weighted fastText, cosine similarity, or logistic regression.
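The tf-idf + cosine-similarity variant reduces to: vectorize the known questions, vectorize the incoming question, and return the answer of the closest match. A self-contained sketch of that idea (illustrative toy code, not a DeepPavlov pipeline):

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build tf-idf vectors for whitespace-tokenized texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    df = Counter(w for d in docs for w in d)
    n = len(docs)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in d.items()} for d in docs], idf

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def faq_answer(query, questions, answers):
    """Return the prepared answer of the closest known question."""
    vecs, idf = tfidf_vectors(questions)
    q = Counter(query.lower().split())
    qv = {w: c * idf.get(w, 0.0) for w, c in q.items()}
    scores = [cosine(qv, v) for v in vecs]
    return answers[max(range(len(scores)), key=scores.__getitem__)]

questions = ["how to install deeppavlov", "how to train a model"]
answers = ["pip install deeppavlov", "use the train command"]
print(faq_answer("install deeppavlov", questions, answers))
# → "pip install deeppavlov"
```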
An open-domain question answering skill. The skill accepts free-form questions about the world and outputs an answer based on its Wikipedia knowledge.
Dataset | Model config | Wiki dump | F1 | Downloads |
---|---|---|---|---|
SQuAD-v1.1 | ODQA <odqa/en_odqa_infer_wiki.json> | enwiki (2018-02-11) | | 9.7Gb |
SDSJ Task B | ODQA with RuBERT <odqa/ru_odqa_infer_wiki.json> | ruwiki (2018-04-01) | | 4.3Gb |
Hyperparameter optimization by cross-validation for DeepPavlov models; it requires only small changes to a config file.
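Conceptually, the search tries each combination of candidate parameter values, patches the config, and keeps the variant with the best validation score. A generic sketch of that loop (illustrative; `evaluate` stands in for a real train-and-validate run, and the config is a plain dict here rather than a DeepPavlov JSON config):

```python
from itertools import product

def grid_search(base_config, grid, evaluate):
    """Try every combination from `grid`, patch base_config with it,
    and keep the variant with the highest validation score."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = {**base_config, **dict(zip(grid.keys(), values))}
        score = evaluate(cfg)  # train + validate in a real run
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy scoring function: prefers small learning rate and large batch size.
cfg, score = grid_search({"epochs": 3},
                         {"lr": [0.1, 0.01], "batch_size": [16, 32]},
                         lambda c: -c["lr"] + c["batch_size"] / 100)
print(cfg)  # → {'epochs': 3, 'lr': 0.01, 'batch_size': 32}
```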
Word vectors for the Russian language trained on joint Russian Wikipedia and Lenta.ru corpora.
Run the insults detection model with a console interface:

```bash
python -m deeppavlov interact insults_kaggle_bert -d
```

Run the insults detection model with a REST API:

```bash
python -m deeppavlov riseapi insults_kaggle_bert -d
```

Predict whether every line in a file is an insult:

```bash
python -m deeppavlov predict insults_kaggle_bert -d --batch-size 15 < /data/in.txt > /data/out.txt
```