Skip to content

StorkST/RussianWordsClusters

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Russian Words Clusters

Russian Words Clusters offers a way to cluster russian words by criterias (by a common stem, by the closeness of vowels or consonants).

For now it supports verbs but was not built for clusterings words that may have different suffixes, as would be a noun and an adjective of a same stem.

It offers options:

  • to merge clusters built on different criterias
  • to input words pairs, useful for clustering verbs and their aspects

The clustering algorithm can be used either with the CLI or with the class's methods.

Simple example: clustering verbs by stem and vowel transformation

Content of file1:

выстрелить
отличать
застрелить
отличить

python3.8 clustering.py --input file1 --criterias STEM TRANS --merge

выстрелить
застрелить
отличать
отличить

Usage

CLI

The CLI clustering.py offers the possibility to cluster words and words pairs.

Classes

Classes in clustering.py can be called to cluster words. As for usage examples, you can refer to the code in the Main part of clustering.py, or to the code contained in the tests folder.

Project has a pip package: https://pypi.org/project/russian-words-clusters/
Once the package installed you can import classes into your Python code using from russianwords.clustering import *.

A more complex example: clustering words pairs

Content of file2:

посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять

python3.8 clustering.py --input file2 --criterias STEM TRANS --merge --are-pairs

посещать/посетить
разделять
выделяться/выделять
выделиться/выделить
разбираться/разобраться
изменять/изменить

About

Russian Words Clusters offers a way to cluster russian words by criterias (by a common stem, by the proximity of vowels or consonants).

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published