# Markov Chain Sentence Builder
This program builds random sentences from a text corpus that is fed into it. It uses simple Markov chains that pick the next word based on the previous one, two, or three words, and the user chooses which of these chain lengths to apply.
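
As a minimal sketch of the idea, the snippet below builds a one-word model from a toy corpus invented for illustration (it is not the Frankenstein text used later). The names `toy_corpus` and `followers` are hypothetical.

```python
from collections import defaultdict

# Toy corpus, invented purely for illustration.
toy_corpus = "the cat sat. the cat ran. the dog sat.".split()

# One-word model: each word maps to every word observed right after it.
# Repeated followers are kept, so frequent continuations are drawn more often.
followers = defaultdict(list)
for current, following in zip(toy_corpus, toy_corpus[1:]):
    followers[current].append(following)

print(followers["the"])  # ['cat', 'cat', 'dog']
```

Picking uniformly from `followers["the"]` would return `cat` about twice as often as `dog`, which is how the list representation encodes the transition probabilities.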

## Import Libraries

In [1]:
import random
from collections import defaultdict

## Load and Process Corpus

In [2]:
def load_training_file(file):
    # Assumes the corpus file is UTF-8 encoded (it contains curly quotes).
    with open(file, encoding="utf-8") as f:
        raw_sentences = f.read()
        return raw_sentences

def prep_training(raw_sentences):
    raw_sentences = raw_sentences.lower()
    # Remove apostrophes, commas, quotes, and underscores everywhere in the text.
    # Sentence-ending punctuation is kept because sentence_builder relies on it.
    raw_sentences = raw_sentences.replace('\'', "")
    raw_sentences = raw_sentences.replace(',', "")
    raw_sentences = raw_sentences.replace('"', "")
    raw_sentences = raw_sentences.replace('_', "")
    raw_sentences = raw_sentences.replace('”', "")
    raw_sentences = raw_sentences.replace('“', "")
    corpus = raw_sentences.replace('\n',' ').split()
    return corpus
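
The chain of `replace` calls can also be expressed as a single pass with `str.translate`. The sketch below is one possible equivalent, not the notebook's own code: `prep_training_sketch` and `_DROP` are hypothetical names, and the set of deleted characters is assumed to match the characters removed above.

```python
# Characters to delete; assumed to match the set removed by prep_training.
_DROP = str.maketrans("", "", "',\"_”“")

def prep_training_sketch(raw_sentences):
    """Lower-case the text, delete quote-like punctuation, and split on whitespace."""
    # str.split() with no arguments already treats newlines as separators.
    return raw_sentences.lower().translate(_DROP).split()

sample = "“It's alive,” he said.\n_Truly_ alive!"
print(prep_training_sketch(sample))
# ['its', 'alive', 'he', 'said.', 'truly', 'alive!']
```

Periods, exclamation marks, and similar characters stay attached to their words, so a later step can still detect where a sentence ends.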

## Build Markov Models

In [3]:
def map_word_to_word(corpus):
    limit = len(corpus) - 1
    dict1_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            suffix = corpus[index + 1]
            dict1_to_1[word].append(suffix)
    return dict1_to_1

def map_2_words_to_word(corpus):
    limit = len(corpus) - 2
    dict2_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            key = word + ' ' + corpus[index + 1]
            suffix = corpus[index + 2]
            dict2_to_1[key].append(suffix)
    return dict2_to_1

def map_3_words_to_word(corpus):
    limit = len(corpus) - 3
    dict3_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            key = word + ' ' + corpus[index + 1] + ' ' + corpus[index + 2]
            suffix = corpus[index + 3]
            dict3_to_1[key].append(suffix)
    return dict3_to_1
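
The three builders differ only in the prefix length, so they could be folded into one function. The following is a sketch under that assumption; `map_n_words_to_word` is a hypothetical name and the toy corpus is invented for illustration.

```python
from collections import defaultdict

def map_n_words_to_word(corpus, n):
    """Map every run of n consecutive words (joined by spaces) to the words seen right after it."""
    suffix_map = defaultdict(list)
    # Stop n words before the end so corpus[index + n] always exists.
    for index in range(len(corpus) - n):
        key = " ".join(corpus[index:index + n])
        suffix_map[key].append(corpus[index + n])
    return suffix_map

toy_corpus = "i am here. i am there. i was here.".split()
print(map_n_words_to_word(toy_corpus, 2)["i am"])  # ['here.', 'there.']
```

With `n` set to 1, 2, or 3 this should produce the same keys and suffix lists as the three functions above.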

## Select a Seed Word

In [4]:
def random_word(corpus):            
    seed = input("Enter a word to start a sentence: ")
    if seed in corpus:
        word = seed
    else:
        word = None
        print("That word is not in the corpus. Try a seed word that appears in the corpus.")
    return word
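
For non-interactive runs, the seed could instead be passed as an argument, with a random corpus word as the fallback. This is only a sketch: `pick_seed` and its parameters are hypothetical and not part of the notebook.

```python
import random

def pick_seed(corpus, seed=None, rng=random):
    """Return seed if it occurs in corpus; otherwise fall back to a random corpus word."""
    if seed is not None and seed in corpus:
        return seed
    # rng.choice raises IndexError on an empty corpus, so callers should pass a non-empty one.
    return rng.choice(corpus)

toy_corpus = "the cat sat. the dog ran.".split()
print(pick_seed(toy_corpus, "dog"))                    # dog
print(pick_seed(toy_corpus, "unicorn") in toy_corpus)  # True
```

Passing a seeded `random.Random` instance as `rng` would make the fallback reproducible.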

## Apply the Markov Models

In [5]:
def word_after_single(prefix, suffix_map_1):
    # Copy of the words recorded after this one-word prefix; empty if the prefix was never seen.
    return list(suffix_map_1.get(prefix, []))

def word_after_double(prefix, suffix_map_2):
    # Same lookup for a two-word prefix.
    return list(suffix_map_2.get(prefix, []))

def word_after_triple(prefix, suffix_map_3):
    # Same lookup for a three-word prefix.
    return list(suffix_map_3.get(prefix, []))
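
The three lookup functions perform the same operation on different maps, so a single helper could serve all of them. The sketch below assumes that simplification; `next_word_choices` is a hypothetical name and the toy maps are invented for illustration.

```python
from collections import defaultdict

def next_word_choices(prefix, suffix_map):
    """Return a copy of the suffixes recorded for prefix, or an empty list if it was never seen."""
    # .get() avoids inserting a new empty entry, which indexing a defaultdict would do.
    return list(suffix_map.get(prefix, []))

toy_map = defaultdict(list, {"the": ["cat", "dog"], "the cat": ["sat."]})
print(next_word_choices("the", toy_map))      # ['cat', 'dog']
print(next_word_choices("unicorn", toy_map))  # []
print("unicorn" in toy_map)                   # False
```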

## Build a Sentence

In [6]:
def sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus):
    final_sentence = ""
    try:
        number_of_sentences = int(input("How many sentences do you want? "))
        number_of_markov_chains = int(input("Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? "))
    except ValueError:
        print("You entered something other than integers. Enter only integers.")
        return final_sentence
    stop_characters = [".",":","!","?"]
    current_sentence = []
    word = random_word(corpus)    
    if word is not None:
        current_sentence.append(word)
        for i in range(number_of_sentences):
            keep_building = True
            while keep_building:
                if number_of_markov_chains == 1:
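                    word_choices = word_after_single(word, suffix_map_1)
                    if not word_choices:
                        # Dead end: this word is never followed by another word in the corpus.
                        keep_building = False
                        break
                    word = random.choice(word_choices)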
                    current_sentence.append(word)
                    if any(character in word[-1] for character in stop_characters):
                        keep_building = False
                        break
                elif number_of_markov_chains == 2:
                    try:
                        if len(current_sentence) >= 2:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) <= 1:
                            word_choices = word_after_single(word, suffix_map_1)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
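                    except IndexError:
                        # random.choice raised because the prefix has no recorded suffixes; fall back to the one-word model.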
                        word_choices = word_after_single(word, suffix_map_1)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_double(prefix, suffix_map_2)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                elif number_of_markov_chains == 3:
                    try:
                        if len(current_sentence) >= 3:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                            prefix = current_sentence[-3] + ' ' + current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_triple(prefix, suffix_map_3)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) == 2:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) <= 1:
                            word_choices = word_after_single(word, suffix_map_1)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
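                    except IndexError:
                        # random.choice raised because the prefix has no recorded suffixes; fall back to the one-word model.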
                        word_choices = word_after_single(word, suffix_map_1)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_double(prefix, suffix_map_2)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-3] + ' ' + current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_triple(prefix, suffix_map_3)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                else:
                    print("Please choose 1, 2, or 3 Markov chains. Other values are not available.")
                    # Return here; otherwise the while loop would repeat this message forever.
                    return final_sentence
    # Join the collected words with single spaces; an empty list yields an empty string.
    return " ".join(current_sentence)
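
The fallback logic in `sentence_builder` can be viewed as a backoff: try the longest prefix the chosen chain length allows, and drop to a shorter prefix when it has no recorded suffixes. The sketch below illustrates that idea for a single sentence without user input. It is not the notebook's implementation: `next_word_with_backoff`, `build_sentence`, the `max_words` cap, and the dictionary of maps keyed by prefix length are all assumptions made for the example.

```python
import random

STOP_CHARACTERS = (".", ":", "!", "?")

def next_word_with_backoff(words, suffix_maps, order, rng=random):
    """Try the longest usable prefix first, then shorter ones; return None at a dead end."""
    for n in range(min(order, len(words)), 0, -1):
        prefix = " ".join(words[-n:])
        choices = suffix_maps[n].get(prefix)
        if choices:
            return rng.choice(choices)
    return None

def build_sentence(seed, suffix_maps, order, max_words=50, rng=random):
    """Extend seed until a word ends in a stop character, a dead end occurs, or max_words is reached."""
    words = [seed]
    while len(words) < max_words:
        word = next_word_with_backoff(words, suffix_maps, order, rng)
        if word is None:
            break
        words.append(word)
        if word.endswith(STOP_CHARACTERS):
            break
    return " ".join(words)

# Toy maps keyed by prefix length, invented for illustration.
toy_maps = {
    1: {"the": ["cat"], "cat": ["sat."]},
    2: {"the cat": ["sat."]},
}
print(build_sentence("the", toy_maps, order=2))  # the cat sat.
```

The `max_words` cap guards against corpora in which a stop character is never reached, and returning `None` at a dead end avoids calling `random.choice` on an empty list.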

## Code to Generate Random Sentences

In [7]:
raw_sentences = load_training_file("Frankenstein.txt")
corpus = prep_training(raw_sentences)
suffix_map_1 = map_word_to_word(corpus)
suffix_map_2 = map_2_words_to_word(corpus)
suffix_map_3 = map_3_words_to_word(corpus)

In [8]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 1
Enter a word to start a sentence: the
the tree shattered spirit had been made a pearly whiteness; but a sensitive and three weeks; and my promise with the others he endeavours to his prize-money the agony of agitation; then assuredly take a glorious for me from the duty towards him to fly from the blood boils at my first made for the spoiler has ever been drowned and more obvious laws of the poor medium for another means of the house at secheron a burning hatred and explained to me make us that moment i sought my hand feebly and examine. good will be perfect forms of it moved by and then i alighted and godlike in their conversation. i have feared that we resided principally in the projects but unlike his soul. my bosom. i uncovered it for all my prejudices unsoftened by a deep wood on one feeling of certain dignity in a feeble voice became more terrible re

In [9]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 2
Enter a word to start a sentence: the
the trees formed a scene terrifically desolate. in a few jewels to have feared the effects of the room which had that night at evian and continuing our voyage on the foe. he spoke i drew near a close and in the gloom and misery were strongly expressed. sometimes she struggled with her foster parents and bloomed in their most terrific guise. still as i spoke of penury in its series internal evidence of facts a weight of anguish that was spoken. while i watched the tempest so beautiful yet terrific i wandered towards these mountains i felt as if to seek amusement in society. i abhorred the face of the country town where the corpse lay and was overcome by gloom and he hoped that with which i have already reached a very confused knowledge of which haunted me. i continued to utter exclamations of grief commences. yet from w

In [10]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 3
Enter a word to start a sentence: the
the cause of all this? do not destroy the saviour of his child? nay these are virtuous and immaculate beings! i the creator who would believe me to life. soon after this he inquired if i thought of my aunt; grief had given me life? among the earliest sensations i can make you so wretched a condition. we attempted to carry him into the ocean. i walked i sought for the latter town in a fit. poor clerval! what must have been and are certainly possessed of dauntless courage. but i fear from what you have yourself described to be his properties that this will prove impracticable; and thus i was a mere skeleton and fever night and day preyed upon my wasted frame. still as i urged our leaving ireland with such inquietude and impatience my father discovered his abode. overjoyed at this discovery he hastened to paris and delive