Completed in the first half of 2021.
Implemented an n-gram algorithm (1-, 2-, and 3-grams of a Markov model) and tested it on a part of the Turkish Novel Corpus, which includes 5 novels.
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
An n-gram model is a word prediction method that uses probabilities to predict the next item after observing the previous n-1 items. Computing the probability of the next word is therefore closely related to computing the probability of a sequence of items.
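As a rough illustration of this idea, the sketch below estimates the probability of the next word from raw counts (a maximum-likelihood estimate with no smoothing). The function names `train_ngram_counts` and `next_word_probs` and the toy English sentence are assumptions made for this example; they are not taken from the project's code or corpus.

```python
from collections import Counter, defaultdict


def train_ngram_counts(tokens, n):
    """Count how often each item follows each (n-1)-item context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])  # the n-1 observed items
        nxt = tokens[i + n - 1]               # the item that follows them
        counts[context][nxt] += 1
    return counts


def next_word_probs(counts, context):
    """Maximum-likelihood estimate of P(next | context); empty dict if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()}


# Toy data (hypothetical): "the" is followed by "cat" twice and "mat" once.
tokens = "the cat sat on the mat the cat ran".split()
bigram_counts = train_ngram_counts(tokens, 2)
probs = next_word_probs(bigram_counts, ["the"])
print(probs)
```

With `n=2` the context is a single word; with `n=3` it would be the previous two words. A context that never occurred in the training text gets no estimate here, which is one reason practical models add smoothing or back-off.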
The items can be phonemes, syllables, letters, words, or base pairs, depending on the application. The n-grams are typically collected from a text or speech corpus.
An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (less commonly, "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.
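To make the terminology concrete, here is a minimal sketch that extracts unigrams, bigrams, and trigrams from a sequence. The helper name `ngrams` and the sample inputs are hypothetical and chosen only for illustration; the same function works on words or on letters because it only relies on slicing.

```python
def ngrams(items, n):
    """Return every contiguous run of n items from the sequence, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level n-grams over a short, made-up token list.
words = "bir varmış bir yokmuş".split()
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

# Letter-level bigrams: a string is just a sequence of characters.
letter_bigrams = ngrams("kitap", 2)

print(unigrams)
print(bigrams)
print(trigrams)
print(letter_bigrams)
```

When `n` is larger than the sequence, the function returns an empty list, since no contiguous run of that length exists.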