
This project implements Markov analysis for text prediction from a given text file. It uses urllib.request to read the text file from Project Gutenberg.


bpbirch/markov_analysis


markov_analysis

This project implements Markov analysis for text prediction from a given text file. It uses urllib.request to read the text of a book from Project Gutenberg.

The program first fetches the text file of a book from Project Gutenberg and strips punctuation from the words. It then builds a dictionary for that book: each unique word is a key, and its value is a list of the words that follow it in the text. So if the word 'he' is followed at different points in the book by 'went', 'said', 'will', 'needs', 'went', 'said', 'said', and 'can', then the dictionary entry is wordDic['he'] = ['went', 'said', 'will', 'needs', 'went', 'said', 'said', 'can']. This is essentially a graph structure: individual words are vertices, and an edge is drawn from each word to every word that follows it in the text.

To predict a sentence, the next word is chosen at random from the value list of the current word. Because the list keeps duplicates, the probability of choosing any particular word after 'he' is proportional to how often that word actually follows 'he' in the book; in the example above, 'said' would be chosen with probability 3/8. If we want to predict a 10-word sentence that starts with 'he', and the second word chosen is 'said', then the third word is chosen from the dictionary values for the key 'said'. The sentence becomes 'he said' followed by some word, and so on up to ten words.
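The steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `build_word_dict` and `predict_sentence` are hypothetical, the lowercasing step and the choice to strip punctuation only from the ends of each word are assumptions, and a short inline sample string stands in for a book downloaded with urllib.request so the example runs without network access.

```python
import random
import string
from collections import defaultdict


def build_word_dict(text):
    """Map each word to the list of words that follow it (duplicates kept)."""
    # Assumption: lowercase everything and strip punctuation from word edges.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    word_dict = defaultdict(list)
    for current, following in zip(words, words[1:]):
        word_dict[current].append(following)
    return dict(word_dict)


def predict_sentence(word_dict, first_word, length):
    """Walk the chain from first_word, producing at most `length` words."""
    sentence = [first_word]
    while len(sentence) < length:
        followers = word_dict.get(sentence[-1])
        if not followers:  # e.g. the final word of the text has no successor
            break
        # Duplicates in the list make frequent followers more likely.
        sentence.append(random.choice(followers))
    return " ".join(sentence)


sample = "He went home. He said hello. He said he will go. He went out."
word_dict = build_word_dict(sample)
print(word_dict["he"])  # ['went', 'said', 'said', 'will', 'went']
print(predict_sentence(word_dict, "he", 10))
```

Keeping duplicates in each list is what makes a plain `random.choice` sample in proportion to observed frequency; a dictionary of counts with `random.choices` and weights would be an equivalent alternative. The sketch stops early if it reaches a word with no recorded followers, so the result can be shorter than the requested length.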
