
drkbluescience/NGrams

This project was completed in the first half of 2021.

It implements an n-gram algorithm (1-grams, 2-grams, and 3-grams under a Markov model) and tests it on a part of the Turkish Novel Corpus, which includes 5 novels.

What are N-Grams?

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.

An n-gram model is a word prediction algorithm that uses probabilistic methods to predict the next item after observing the previous N-1 items. Computing the probability of the next word is therefore closely related to computing the probability of a sequence of items.
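The relationship between next-word prediction and sequence probability can be sketched for the bigram case (N = 2). This is a minimal illustration, not the repository's code: the function names, the toy token list, and the use of unsmoothed relative-frequency estimates are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers


def next_word_probability(followers, prev, candidate):
    """Estimate P(candidate | prev) as a relative frequency (no smoothing)."""
    counts = followers.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return 0.0  # prev was never seen with a following word
    return counts[candidate] / total


def predict_next(followers, prev):
    """Return the most frequent follower of prev, or None if prev is unseen."""
    counts = followers.get(prev)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def sequence_probability(followers, words):
    """Probability of words[1:] given words[0], as a product of bigram estimates."""
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= next_word_probability(followers, prev, nxt)
    return prob


# Hypothetical toy input; a real run would use tokens from the corpus.
tokens = "bir gün bir adam bir gün geldi".split()
model = train_bigram_model(tokens)
```

With this toy input, `predict_next(model, "bir")` picks the word seen most often after "bir", and `sequence_probability(model, ["bir", "gün", "geldi"])` multiplies the two conditional estimates along the sequence. Unseen word pairs get probability 0 here; a practical model would normally add smoothing.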

The items can be phonemes, syllables, letters, words, or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus.

An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (less commonly, "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.
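As a rough sketch of what these sizes look like in practice, the snippet below extracts unigrams, bigrams, and trigrams from a list of items. The helper name `ngrams` and the sample sentence are assumptions for illustration and do not come from the repository.

```python
def ngrams(items, n):
    """Return all contiguous n-item tuples from a sequence, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical sample sentence, split into word items.
words = ["ben", "eve", "gidiyorum"]

unigrams = ngrams(words, 1)  # one tuple per word
bigrams = ngrams(words, 2)   # adjacent word pairs
trigrams = ngrams(words, 3)  # adjacent word triples

# The same helper works on letters, since a string is also a sequence.
letter_bigrams = ngrams("kitap", 2)
```

Because the helper only relies on slicing, the choice of item (word, letter, syllable) is made by how the input is split, not by the extraction step itself.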
