branover/word_frequencifizer

word_frequencifizer

Description

This script uses the Yandex Mystem 3.0 morphological analyzer to read a Russian text file and produce a frequency list after converting every word to its base (dictionary) form. Words are listed in descending order of frequency and can be written to a file in CSV format for easy import into Anki or other flashcard programs.
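The pipeline described above (lemmatize, count, sort, write CSV) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `main.py`: the helper names `lemmatize_with_mystem`, `count_lemmas`, and `to_csv` are hypothetical, and the token filtering is one possible choice. The only library call assumed is `Mystem().lemmatize(text)` from pymystem3, which returns a list of lemmas interleaved with whitespace and punctuation tokens.

```python
import csv
import io
from collections import Counter


def lemmatize_with_mystem(text):
    """Return Mystem's token list for `text` (hypothetical helper).

    The import is local so the rest of the sketch works without pymystem3.
    """
    from pymystem3 import Mystem
    return Mystem().lemmatize(text)


def count_lemmas(lemmas):
    """Count lemmas, skipping whitespace and punctuation tokens."""
    words = (token.strip().lower() for token in lemmas)
    # Keep only tokens that contain at least one letter.
    return Counter(w for w in words if w and any(ch.isalpha() for ch in w))


def to_csv(counts):
    """Render word,count rows in descending order of frequency."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for word, n in counts.most_common():
        writer.writerow([word, n])
    return buf.getvalue()
```

With pymystem3 installed, `to_csv(count_lemmas(lemmatize_with_mystem(text)))` would produce CSV text that a flashcard program can import.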

Installation

The script has a single dependency, pymystem3, which can be installed with pip:

pip install pymystem3

Usage

```
Usage: python main.py [options]

Options:
  -h, --help                              Show this help message and exit
  -i FILENAME, --input=FILENAME           Read input from a file
  -o OUTFILE, --output=OUTFILE            Write results to a file
  -d DIRECTORY, --directory=DIRECTORY     Read input from every file in a directory
  -q, --quantity                          Add the quantity (number of occurrences) to each word
  -m MINIMUM, --minimum=MINIMUM           Minimum number of occurrences before a word is shown
  -l LENGTH, --length=LENGTH              Maximum number of results to show (sorted by frequency)
  -e EXCLUDE, --exclude=EXCLUDE           Load a file with a list of words to exclude from the results
                                          (these are also converted to their root word before comparison)
```
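To make the filtering options concrete, here is a minimal sketch of how `--minimum`, `--length`, and `--exclude` might combine. The function `filter_results` is hypothetical and is not taken from `main.py`; it assumes that `--minimum` is an inclusive threshold and that the exclusion list has already been converted to root words.

```python
from collections import Counter


def filter_results(counts, minimum=1, length=None, exclude=()):
    """Apply assumed --minimum, --exclude and --length semantics.

    counts  -- Counter mapping root word -> number of occurrences
    minimum -- keep words seen at least this many times (assumed inclusive)
    length  -- keep at most this many rows; None keeps all of them
    exclude -- root words to drop from the results
    """
    excluded = set(exclude)
    rows = [
        (word, n)
        for word, n in counts.most_common()  # descending frequency
        if n >= minimum and word not in excluded
    ]
    return rows[:length]


counts = Counter({"и": 5, "мама": 3, "рама": 1})
print(filter_results(counts, minimum=2, exclude=["и"]))
```

Filtering before truncating means `--length` counts only the words that survive the other two options, which is one reasonable reading of the help text.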
