Our project is a typing prediction program that combines trigram and bigram probabilistic models to suggest the next word based on user input. It consists of two components written in Python: model training and a graphical user interface (GUI).

The model training component takes a raw text file of well-formed sentences as training data. The text is first split into words, and special characters and symbols are then removed with regular expressions. Walking through the words in order, the program counts how often each word follows its immediate predecessor (bigram model) and its two immediate predecessors (trigram model), storing the counts in Python dictionaries. Both models are then trimmed: for each preceding word or word pair, only the three most frequent following words are kept and all other entries are pruned. Besides reducing model size, trimming also reduces noise. Because the text is processed as one continuous stream rather than sentence by sentence, some word pairs span sentence boundaries, and these rare pairs are mostly pruned. The trimmed models are saved as JSON files, so pre-trained models can be reused without retraining.

The GUI component has two primary sections: a text input field and a row of suggestion labels. Each time the user presses the spacebar while typing in the input field, the program reads the one or two most recently typed words, strips special characters from them, and looks them up in the bigram and trigram models loaded from the JSON files. The three most likely next words (those with the highest counts) are displayed as clickable labels at the bottom of the interface; clicking a label inserts that word into the text.
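The lookup performed on each spacebar press can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names `clean` and `suggest`, the cleaning regex, the space-joined trigram key format (`"word1 word2"`), and the merge policy (trigram suggestions first, then bigram suggestions to fill the remaining slots) are all assumptions for illustration.

```python
import re


def clean(word):
    """Lowercase a typed word and strip special characters (assumed rule)."""
    return re.sub(r"[^a-z0-9']", "", word.lower())


def suggest(text, bigram, trigram, k=3):
    """Return up to k next-word suggestions for the text typed so far.

    bigram:  {"word": {"next": count, ...}, ...}
    trigram: {"word1 word2": {"next": count, ...}, ...}  (assumed key format)
    """
    words = [w for w in (clean(t) for t in text.split()) if w]
    candidates = []
    # Prefer trigram followers of the last two words, when two are available.
    if len(words) >= 2:
        followers = trigram.get(words[-2] + " " + words[-1], {})
        candidates += sorted(followers.items(), key=lambda kv: kv[1], reverse=True)
    # Then fall back to bigram followers of the last word.
    if words:
        followers = bigram.get(words[-1], {})
        candidates += sorted(followers.items(), key=lambda kv: kv[1], reverse=True)
    # Deduplicate while keeping the trigram-first order, up to k words.
    suggestions = []
    for word, _count in candidates:
        if word not in suggestions:
            suggestions.append(word)
        if len(suggestions) == k:
            break
    return suggestions
```

For example, with `bigram = {"the": {"cat": 5, "dog": 2}}` and `trigram = {"saw the": {"dog": 4}}`, calling `suggest("I saw the ", bigram, trigram)` returns `['dog', 'cat']`: the trigram match comes first, and the bigram model fills the next slot without repeating `dog`.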
- Python 3.5+
- tkinter
- numpy
- json
- re
python typepred.py
The program loads pre-built models from trigram_model.json and bigram_model.json and starts the GUI application automatically.
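One detail worth noting when saving and loading these files: JSON object keys must be strings, so a trigram context held as a Python tuple such as `("the", "cat")` cannot be written directly (`json.dump` raises `TypeError`). A minimal sketch, assuming contexts are stored as space-joined strings; the helper names `save_models` and `load_models` and the key format are hypothetical, not taken from the project's code.

```python
import json


def save_models(bigram, trigram, bigram_path, trigram_path):
    """Write both models to JSON files.

    JSON object keys must be strings, so tuple contexts such as
    ("the", "cat") are stored under the space-joined key "the cat".
    """
    with open(bigram_path, "w", encoding="utf-8") as f:
        json.dump(bigram, f)
    with open(trigram_path, "w", encoding="utf-8") as f:
        json.dump({" ".join(ctx): followers for ctx, followers in trigram.items()}, f)


def load_models(bigram_path, trigram_path):
    """Read both models back; trigram keys remain "word1 word2" strings."""
    with open(bigram_path, encoding="utf-8") as f:
        bigram = json.load(f)
    with open(trigram_path, encoding="utf-8") as f:
        trigram = json.load(f)
    return bigram, trigram
```

Usage: `bigram, trigram = load_models("bigram_model.json", "trigram_model.json")`. Keeping the loaded trigram keys as strings means a lookup only needs to join the last two typed words with a space.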
python typepredBI.py
The program loads the pre-built model from bigram_model.json and starts the GUI application automatically.
python train.py
The program reads training data from the file input.txt and creates trigram_model.json and bigram_model.json.
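The training steps (split into words, strip symbols, count followers, keep the top three) can be sketched as below. This is a minimal sketch under stated assumptions, not the contents of train.py: the names `tokenize` and `build_models`, the lowercasing, and the exact regex are assumptions, and trigram contexts are held as `(word1, word2)` tuples in memory.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text on whitespace, lowercase it, and strip special characters."""
    words = (re.sub(r"[^a-z0-9']", "", w) for w in text.lower().split())
    return [w for w in words if w]


def build_models(words, top_k=3):
    """Build trimmed bigram and trigram models from a flat list of words.

    bigram:  {word: {next_word: count}}
    trigram: {(word1, word2): {next_word: count}}
    """
    bigram = defaultdict(Counter)
    trigram = defaultdict(Counter)
    # The corpus is one continuous stream, so contexts may cross sentence
    # boundaries; keeping only the top_k followers tends to drop such rare pairs.
    for i in range(1, len(words)):
        bigram[words[i - 1]][words[i]] += 1
        if i >= 2:
            trigram[(words[i - 2], words[i - 1])][words[i]] += 1

    def trim(model):
        # Keep only the top_k most frequent followers for each context.
        return {ctx: dict(counts.most_common(top_k)) for ctx, counts in model.items()}

    return trim(bigram), trim(trigram)
```

For example, `build_models(tokenize("The cat sat. The cat ran! The dog sat."))` gives a bigram entry `{"cat": 2, "dog": 1}` for `"the"`. On a real corpus, the input would be read with something like `tokenize(open("input.txt", encoding="utf-8").read())`.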
python trainBI.py
The program reads training data from the file input.txt and creates bigram_model.json.
Training data is stored in the input.txt file. There are no strict format requirements, but well-formed natural-language sentences work best.
- Unzip this zip file
- Copy the contents of final/en_US/en_US.news.txt
- Paste the contents into input.txt
- Run train.py or trainBI.py
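After unzipping, the copy-and-paste steps can also be scripted. A minimal sketch using the paths from the steps above; the helper `prepare_training_file` and its optional `max_lines` parameter are hypothetical, not part of the project. `max_lines` copies only the first lines of the corpus, which can shorten training on a large file.

```python
from itertools import islice


def prepare_training_file(source="final/en_US/en_US.news.txt",
                          dest="input.txt", max_lines=None):
    """Copy source into dest, optionally only its first max_lines lines.

    Returns the number of lines written. Undecodable bytes are replaced
    rather than aborting the copy.
    """
    written = 0
    with open(source, encoding="utf-8", errors="replace") as src, \
            open(dest, "w", encoding="utf-8") as out:
        # islice(src, None) yields every line when max_lines is None.
        for line in islice(src, max_lines):
            out.write(line)
            written += 1
    return written
```

Usage: run `prepare_training_file()` from the directory containing the unzipped `final/` folder, then run train.py or trainBI.py as usual.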