This script creates a list of unique words from Persian text. Words are sorted by how often they appear in the source.txt file. This is a new project, so there could be major bugs in the code. Words with accent marks (diacritics) are excluded from the results.
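For illustration, here is a minimal sketch of the idea described above. It is not the code in 'main.py'. The function names (`tokenize`, `has_diacritic`, `unique_words_by_frequency`) are made up for this example, and it assumes that a "word" is a run of letters, combining marks and the zero-width non-joiner, and that an "accent mark" is any Unicode nonspacing mark (category Mn).

```python
import unicodedata
from collections import Counter

ZWNJ = "\u200c"  # zero-width non-joiner, common inside Persian words


def tokenize(text):
    """Split text into words (assumption: letters, marks and ZWNJ form words)."""
    cleaned = "".join(
        ch if unicodedata.category(ch)[0] in "LM" or ch == ZWNJ else " "
        for ch in text
    )
    # Drop stray ZWNJ characters at word edges and skip empty leftovers.
    return [w.strip(ZWNJ) for w in cleaned.split() if w.strip(ZWNJ)]


def has_diacritic(word):
    """Return True if the word contains a nonspacing mark (e.g. fatha, kasra)."""
    return any(unicodedata.category(ch) == "Mn" for ch in word)


def unique_words_by_frequency(text):
    """Return unique words without diacritics, most frequent first."""
    counts = Counter(w for w in tokenize(text) if not has_diacritic(w))
    return [word for word, _ in counts.most_common()]


if __name__ == "__main__":
    # The last word carries a kasra (U+0650), so it is left out of the result.
    sample = "سلام دنیا سلام کتاب، ک\u0650تاب"
    print(unique_words_by_frequency(sample))
```

Words that appear the same number of times keep the order in which they were first seen.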
- Sort by frequency or alphabetical order
- Extract words from source.txt or online links
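The two sort orders could look roughly like this. This is a sketch only: `sort_words` and its `mode` argument are hypothetical names, not the script's actual interface. Note that plain string sorting in Python compares code points, which does not match Persian dictionary order (for example, پ sorts after the Arabic letters), so a locale-aware collation would be needed for a true alphabetical order.

```python
from collections import Counter


def sort_words(counts, mode="frequency"):
    """Order the words of a Counter by frequency or by code point.

    `mode` is a hypothetical parameter for this sketch.
    """
    if mode == "frequency":
        # Most frequent first; ties are broken by code point order.
        return sorted(counts, key=lambda word: (-counts[word], word))
    if mode == "alphabetical":
        # Code point order, which only approximates Persian alphabetical order.
        return sorted(counts)
    raise ValueError(f"unknown sort mode: {mode!r}")


if __name__ == "__main__":
    counts = Counter(["کتاب", "سلام", "سلام", "دنیا"])
    print(sort_words(counts, "frequency"))
    print(sort_words(counts, "alphabetical"))
```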
- Create a file named 'source.txt' in the root directory and paste the source text into it.
- Run 'main.py'.
- Follow the CLI instructions.
- Results will be written to 'output.txt' in the root directory.
Feel free to tweak the code to suit your needs.
I ran this script on a large body of Persian text to extract words to contribute to Monkeytype. I added the "Persian 1k" & "Persian 5k" tests. My first open-source contribution!!