ddasturbek/UzbekLemma

Authors

Author1: Maksudbek

Author2: Dasturbek

Lemma & Lemmatization

This package finds the lemmas of Uzbek words using a dictionary.

A lemma is the base (dictionary) form of a word, and the process of finding it is called lemmatization.

There are four approaches to lemmatization: rule-based, dictionary-based, model-based, and hybrid.

This package is a dictionary-based lemmatizer.
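
To illustrate the dictionary-based idea in general terms, here is a minimal sketch. It is not the package's implementation: `TOY_DICTIONARY` and `toy_lemmatize` are hypothetical names, the entries are illustrative only, and falling back to the input word for unknown forms is an assumption.

```python
from typing import Dict, Optional

# Hypothetical toy dictionary: inflected form -> lemma.
# The real package ships its own dictionary; these entries are illustrative only.
TOY_DICTIONARY: Dict[str, str] = {
    "kelganlar": "kelmoq",
    "kitoblar": "kitob",
}


def toy_lemmatize(word: str, dictionary: Optional[Dict[str, str]] = None) -> str:
    """Return the lemma listed for `word`, or the normalized word if it is not listed."""
    table = TOY_DICTIONARY if dictionary is None else dictionary
    key = word.strip().lower()  # normalize case and surrounding whitespace
    return table.get(key, key)  # assumption: unknown words are returned unchanged


print(toy_lemmatize("Kelganlar"))
print(toy_lemmatize("kitoblar"))
```

A plain lookup like this only covers forms that are listed, which is the main trade-off of a purely dictionary-based approach compared with rule-based or hybrid ones.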

Install & Clone

pip install UzbekLemma
git clone https://github.com/ddasturbek/UzbekLemma.git

Usage

import UzbekLemma as UL

print(UL.lemmatize("kelganlar"))  # kelmoq
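
The example above passes a single word. Whether `lemmatize` also accepts whole sentences is not stated here, so the sketch below assumes one word per call and splits the text first. `lemmatize_text` is a hypothetical helper, not part of the package, and the tokenization rule is an assumption.

```python
import re
from typing import Callable, List

# Letters, optionally joined by the apostrophe variants used in Uzbek Latin
# spelling (as in "bog'da"). This tokenization rule is an assumption.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['ʻ’][^\W\d_]*)*")


def lemmatize_text(text: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Split `text` into word tokens and apply `lemmatize` to each lowercased token."""
    tokens = WORD_PATTERN.findall(text)
    return [lemmatize(token.lower()) for token in tokens]


# Stand-in lemmatizer so the sketch runs without the package installed.
print(lemmatize_text("Bolalar bog'da, keldi!", lambda word: word))
```

With the package installed, the second argument would be `UL.lemmatize` instead of the stand-in function.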

The algorithm flowchart

[Figure: flowchart of the algorithm]

The dictionary structure

[Figure: soz_turkumlari (parts of speech)]

Scientific field

Certificate

Patent

image

Some results of the program

image

Corpus & Results

We collected an equal number of texts from each of 23 different fields and stored them as a corpus.

We then ran the program on every file in the corpus and obtained these results.
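
How the results were measured is not described here, so the following is only a sketch of what a per-field run could look like. It tallies the number of tokens processed and the number of distinct lemmas produced for each field. `summarize_by_field`, the field names, and the toy data are all hypothetical. No real results are reproduced.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def summarize_by_field(
    corpus: Dict[str, Iterable[str]],
    lemmatize: Callable[[str], str],
) -> Dict[str, Dict[str, int]]:
    """For each field, count the tokens processed and the distinct lemmas produced."""
    summary: Dict[str, Dict[str, int]] = {}
    for field, tokens in corpus.items():
        lemma_counts = Counter(lemmatize(token) for token in tokens)
        summary[field] = {
            "tokens": sum(lemma_counts.values()),
            "distinct_lemmas": len(lemma_counts),
        }
    return summary


# Hypothetical two-field corpus and stand-in lemmatizer, for illustration only.
toy_corpus = {
    "sport": ["kelganlar", "kelganlar", "kitoblar"],
    "fan": ["kitoblar"],
}
toy_table = {"kelganlar": "kelmoq", "kitoblar": "kitob"}
print(summarize_by_field(toy_corpus, lambda word: toy_table.get(word, word)))
```

Passing the lemmatizer in as a function keeps the sketch independent of the package, so `UL.lemmatize` could be substituted for the stand-in.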
