Bigram-Model-Sentence-Probability-Estimation

Builds a bigram model in Prolog from a small data corpus and uses the model to estimate the probability of a word sequence.

Steps Involved:

1. Normalization of the text:
   1.1 Removing capitalization
   1.2 Replacing sentence-boundary punctuation with '@'
   1.3 Removing any other punctuation
   1.4 Normalizing contractions (I’d, she’d, he’d, I’ll, etc.)
2. Creation of a Prolog-readable bigram model.
3. A Prolog program that uses the bigram model to estimate the probability of any word sequence (even sequences containing words not in the model).
4. Laplace smoothing is applied during probability estimation.
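
The normalization steps (1.1 to 1.4) can be sketched in a few lines. The repository itself does this with Unix commands, so the Python below is only an illustration. The function name `normalize`, the choice of `.`, `!` and `?` as sentence-boundary punctuation, and the contraction expansions (`'d` to "would", `'ll` to "will") are assumptions, since the README does not specify them.

```python
import re

# Assumed expansions; the README does not say how contractions are normalized.
CONTRACTIONS = {"'d": " would", "'ll": " will"}


def normalize(text: str) -> str:
    """Illustrative normalization following steps 1.1 to 1.4."""
    text = text.lower()                     # 1.1 remove capitalization
    text = text.replace("\u2019", "'")      # treat curly apostrophes like straight ones
    # 1.4 expand contractions first, while the apostrophes are still present
    for suffix, expansion in CONTRACTIONS.items():
        text = re.sub(r"(?<=[a-z])" + re.escape(suffix) + r"\b", expansion, text)
    text = re.sub(r"[.!?]+", " @ ", text)   # 1.2 sentence boundaries become '@'
    text = re.sub(r"[^a-z0-9@\s]", "", text)  # 1.3 drop any other punctuation
    return " ".join(text.split())           # collapse runs of whitespace


print(normalize("I\u2019d go. She'll stay!"))
```

Contractions are expanded before the remaining punctuation is stripped, because step 1.3 would otherwise delete the apostrophes that step 1.4 needs to see.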

Note:

The probability is reported as a log probability (base 10).
Linux commands such as tr, sed, and egrep are used for normalization and for creating the bigram and unigram models.
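
As a rough illustration of how a Laplace-smoothed bigram log probability (base 10) can be computed, here is a minimal Python sketch. It is not the repository's Prolog code: the function names `train` and `log10_prob` are hypothetical, and it assumes add-one smoothing with the vocabulary size V taken as the number of distinct tokens in the corpus (including '@').

```python
import math
from collections import Counter


def train(tokens):
    """Count unigrams and adjacent-word bigrams in a normalized token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log10_prob(words, unigrams, bigrams):
    """Sum of log10 P(w2 | w1) over adjacent pairs, with add-one smoothing."""
    vocab_size = len(unigrams)  # assumption: V = number of distinct corpus tokens
    total = 0.0
    for w1, w2 in zip(words, words[1:]):
        # Counter returns 0 for unseen words and bigrams, so the smoothed
        # probability is never zero, even for words outside the model.
        p = (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
        total += math.log10(p)
    return total


corpus = "@ the cat sat @ the dog sat @".split()
unigrams, bigrams = train(corpus)
print(log10_prob(["the", "cat"], unigrams, bigrams))
print(log10_prob(["the", "bird"], unigrams, bigrams))  # unseen word still scores
```

Summing logarithms rather than multiplying raw probabilities avoids numerical underflow on longer sequences, which is one common reason to report a log probability.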

The data corpus is also included in the repository.

Files Included:

'DA.txt' is the data corpus.
'unix_achopra6.txt' contains the commands for normalization and bigram model creation.
'lm_achopra6.pl' is the Prolog code for probability estimation.
'bigram_achopra6.pl' and 'unigram_achopra6.pl' are the Prolog-readable bigram and unigram models created by running the commands in 'unix_achopra6.txt'.
