Playground for Russian NLP packages.
Tested under Python 3.9.9 (Anaconda miniforge) on Apple arm (M1).
Install pymystem3 (https://pypi.org/project/pymystem3/). Installing the latest version, as described on that page, errors out, but the current stable version works as advertised.
Notes:
- It is not necessary to install MyStem separately; the wrapper package fetches it. If desired, though, a command-line executable for MyStem can be downloaded from https://download.cdn.yandex.net/mystem/mystem-3.1-macosx.tar.gz. Untar with `tar -xvf mystem-3.1-macosx.tar.gz`, which unpacks an executable called `mystem`. Remove quarantine with `xattr -d com.apple.quarantine mystem` and run from the command line.
- Downloading the command-line executable failed on 2022-03-12 from the link on the main MyStem page (https://yandex.ru/dev/mystem/), but worked from the direct link above.
- The executable is for Apple Intel, but works under M1.
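Once the wrapper is installed, lemmatizing from Python might look like the sketch below. It assumes the `Mystem` class and its `lemmatize()` method from pymystem3; the helper names `clean_lemmas` and `lemmatize` are made up for this illustration and are not part of the package. The import is done lazily inside the function so that the module can be loaded even before pymystem3 is installed.

```python
def clean_lemmas(raw_lemmas):
    """Drop whitespace-only items from a list of lemmas.

    Mystem.lemmatize() returns the separators between words (spaces,
    a trailing newline) as list items alongside the lemmas themselves.
    """
    return [lemma for lemma in raw_lemmas if lemma.strip()]


def lemmatize(text):
    """Lemmatize Russian text with pymystem3 (hypothetical helper)."""
    # Imported here, not at module level; assumes pymystem3 is installed.
    # The first call may download the MyStem binary.
    from pymystem3 import Mystem

    return clean_lemmas(Mystem().lemmatize(text))
```

Calling `lemmatize("Красивая мама красиво мыла раму")` should return one lemma per word. Punctuation is kept, since only whitespace-only items are filtered out.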
spaCy (https://spacy.io/) has three Russian models (small, medium, large), all trained on news data. Install spaCy and then the Russian models (https://spacy.io/models/ru).
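A minimal sketch of loading one of the three models and inspecting tokens follows. The package names `ru_core_news_sm`, `ru_core_news_md`, and `ru_core_news_lg` are the ones listed on the spaCy models page; the helpers `model_name` and `analyze` are hypothetical names for this example. It assumes the chosen model has already been downloaded, for instance with `python -m spacy download ru_core_news_sm`.

```python
# Size label -> spaCy package name for the Russian news models.
RU_MODELS = {
    "small": "ru_core_news_sm",
    "medium": "ru_core_news_md",
    "large": "ru_core_news_lg",
}


def model_name(size="small"):
    """Return the spaCy package name for a model size label."""
    return RU_MODELS[size]


def analyze(text, size="small"):
    """Return (token, lemma, part of speech) triples for Russian text."""
    # Imported lazily; assumes spaCy and the requested model are installed.
    import spacy

    nlp = spacy.load(model_name(size))
    return [(token.text, token.lemma_, token.pos_) for token in nlp(text)]
```

For example, `analyze("Мама мыла раму.")` yields one triple per token, including the final period.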
See also spaCy + Stanza (formerly StanfordNLP) https://github.com/explosion/spacy-stanza, a wrapper for using Stanford models (https://stanfordnlp.github.io/stanfordnlp/) from inside spaCy.
A custom tokenizer for adjusting errata in spaCy's tokenization of Russian: https://github.com/aatimofeev/spacy_russian_tokenizer
Slovnet (https://github.com/natasha/slovnet) is part of the Natasha project (https://github.com/natasha).
DeepPavlov (https://github.com/deepmipt/DeepPavlov) is designed for the development of production-ready chatbots and complex conversational systems, and for research in NLP, particularly in dialog systems.
- Red Hen Lab Russian NLP https://www.redhenlab.org/home/the-cognitive-core-research-topics-in-red-hen/the-barnyard/russian-nlp
- Stanford RussianNLP for literature and history https://russiannlp.sites.stanford.edu/resources
- Iuliia Volkova’s Russian NLP links https://github.com/xnuinside/russian-language-nlp
- Tatiana Shavrina’s exhaustive list of open-source corpora for Russian https://tatianashavrina.github.io/2018/08/30/datasets/