GitHub - drkbluescience/NGrams: Unigram, Bigram, Trigram

This project was completed in the first half of 2021.

This project implements an n-gram algorithm (1-, 2-, and 3-grams of a Markov model) and tests it on a part of the Turkish Novel Corpus, which includes 5 novels.

What are N-Grams?

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
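The definition above can be illustrated with a minimal Python sketch. It is not taken from this repository's code; the function name `ngrams` and the sample sentence are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def ngrams(items: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n items, in original order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows of size n (none if n > L).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical sample sentence, split on whitespace into word tokens.
tokens = "the cat sat on the mat".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Here `bigrams` starts with `("the", "cat")` and `("cat", "sat")`, and a request for n larger than the sequence length simply yields an empty list.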

An n-gram model is a probabilistic word-prediction method: it predicts the next item after observing the previous N-1 items. Therefore, computing the probability of the next word is closely related to computing the probability of a sequence of items.
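As a rough sketch of this idea for the bigram case (N = 2), the Python snippet below estimates the probability of the next word from raw counts and multiplies those conditional probabilities to score a sequence. It assumes plain maximum-likelihood estimates with no smoothing, and all function names and the toy corpus are hypothetical; it is not the implementation used in this repository.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram_counts(tokens: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(counts: Dict[str, Counter], prev: str, nxt: str) -> float:
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, anything)."""
    followers = counts.get(prev)
    if not followers:
        return 0.0  # prev was never seen with a successor
    return followers[nxt] / sum(followers.values())


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequent successor of prev, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def sequence_probability(counts: Dict[str, Counter], tokens: List[str]) -> float:
    """Multiply the bigram probabilities along the sequence.

    The result is conditioned on the first token; the probability of the
    first token itself is not included in this simplified sketch.
    """
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= next_word_probability(counts, prev, nxt)
    return prob


# Toy corpus, assumed only for illustration.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram_counts(corpus)

best_after_the = predict_next(model, "the")
p_cat_given_the = next_word_probability(model, "the", "cat")
p_sequence = sequence_probability(model, ["the", "cat", "sat"])
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the estimate of P(cat | the) is 2/3. Any unseen pair gets probability zero, which is why practical models usually add some form of smoothing.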

Depending on the application, the items can be phonemes, syllables, letters, words, or base pairs. N-grams are typically collected from a text or speech corpus.
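Because the item type is a free choice, the same counting routine can work on letters or on words. The following Python sketch shows this with a hypothetical helper `count_ngrams` and a made-up sample string; both are assumptions for illustration and do not come from this repository.

```python
from collections import Counter
from typing import Sequence


def count_ngrams(items: Sequence, n: int) -> Counter:
    """Count every contiguous window of n items in any sequence type."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


text = "banana band"  # made-up sample text

# Letters as items: drop the space and slide over the characters.
letter_bigrams = count_ngrams(text.replace(" ", ""), 2)

# Words as items: split on whitespace and slide over the tokens.
word_unigrams = count_ngrams(text.split(), 1)
```

Only the tokenization step changes between the two calls; the windowing and counting logic stays the same.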

An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (less commonly, "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.

Folders and files (16 commits):

WindowsFormsApp1
.gitattributes
.gitignore
NGrams.sln
README.md
output_ss.PNG
