Statistical Language Modeling Using N-grams

Course Project for MA 202 [Probability and Statistics]


Abstract

Using Natural Language Processing techniques, the model predicts the most probable next word and scores the correctness of an input English sentence. To achieve optimum accuracy, a large, reliable corpus is extracted from Wikipedia, preprocessed, and analyzed before being used to train the model. Analyzing and visualizing the dataset is an insightful way to understand the corpus before training. Choosing an appropriate model for a problem is a crucial step; in our case, a trigram model proved to be the best trade-off between accuracy and complexity. The trained model is then used to predict the next word and to compute the perplexity of a given sentence.
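The core idea above, counting trigrams to predict the next word and to compute sentence perplexity, can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual code: the toy corpus, the add-one (Laplace) smoothing, and all function names here are assumptions for the example; the real project trains on a Wikipedia corpus.

```python
from collections import Counter
import math

# Toy corpus standing in for the preprocessed Wikipedia text (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

def tokens(sent):
    # Pad each sentence so trigrams cover sentence boundaries.
    return ["<s>", "<s>"] + sent.split() + ["</s>"]

trigram_counts = Counter()
bigram_counts = Counter()
vocab = set()
for sent in corpus:
    toks = tokens(sent)
    vocab.update(toks)
    for i in range(len(toks) - 2):
        trigram_counts[tuple(toks[i:i + 3])] += 1
        bigram_counts[tuple(toks[i:i + 2])] += 1

V = len(vocab)

def prob(w1, w2, w3):
    # P(w3 | w1, w2) with add-one smoothing to avoid zero probabilities.
    return (trigram_counts[(w1, w2, w3)] + 1) / (bigram_counts[(w1, w2)] + V)

def predict_next(w1, w2):
    # Most probable word following the two-word context (w1, w2).
    return max(vocab, key=lambda w: prob(w1, w2, w))

def perplexity(sentence):
    # exp of the average negative log-probability over the sentence's trigrams;
    # lower perplexity means the model finds the sentence more plausible.
    toks = tokens(sentence)
    log_p = sum(math.log(prob(*toks[i:i + 3])) for i in range(len(toks) - 2))
    return math.exp(-log_p / (len(toks) - 2))

print(predict_next("the", "cat"))
print(perplexity("the cat sat on the mat"))
```

On this toy data, a sentence seen during training receives a lower perplexity than a scrambled one, which is exactly the signal the project uses to judge the correctness of an input sentence.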

Problem Statement

Computers were once thought of as “dumb terminals,” and human-computer interaction followed the principle of “garbage in, garbage out”: computers could communicate only through sophisticated hand-coded rules. Natural Language Processing bridges this gap by enabling humans to interact with computers in human languages. Its use cases include voice assistants, speech recognition, computer-assisted coding, and word and sentence prediction. The boundless, still-unexplored possibilities of NLP motivate us to work in this field.

Interface

[Screenshot of the interface]

Link to the Interface

Requirements

nltk

License

The code is licensed under the MIT License and is free for anyone to use without restriction.


Created with ❤️ by Mumuksh Tayal
