# Markov Chain Sentence Builder
This program builds random sentences from a text corpus that is fed into it. It uses a simple Markov chain that picks each next word based on the preceding one, two, or three words; the user chooses which of these three options to apply.
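
To make the idea concrete, here is a minimal sketch of the one-word case on a made-up toy corpus. The corpus and the name `followers` are illustrative assumptions and do not appear in the notebook's own code.

```python
from collections import defaultdict

# Toy corpus, invented for illustration only.
words = "the cat sat. the dog sat. the cat ran.".split()

# Map each word to the list of words observed directly after it.
followers = defaultdict(list)
for current, following in zip(words, words[1:]):
    followers[current].append(following)

print(followers["the"])  # ['cat', 'dog', 'cat']
print(followers["cat"])  # ['sat.', 'ran.']
```

Generating text then amounts to repeatedly picking a random entry from the list that belongs to the current word.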

## Import Libraries

In [1]:
import random
from collections import defaultdict

## Load and Process Corpus

In [2]:
def load_training_file(file):
    with open(file) as f:
        raw_sentences = f.read()
        return raw_sentences

def prep_training(raw_sentences):
    raw_sentences = raw_sentences.lower()
    # Remove quotation marks and underscores wherever they occur in the text.
    for character in ('"', '_', '”', '“'):
        raw_sentences = raw_sentences.replace(character, "")
    # split() with no arguments splits on any whitespace, including newlines.
    corpus = raw_sentences.split()
    return corpus
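
As a possible alternative for the cleanup step, the unwanted characters can be deleted in a single pass with `str.translate`. This is only a sketch: `UNWANTED`, `REMOVE_TABLE`, and `clean_and_tokenize` are hypothetical names, and the sample string is invented.

```python
# Hypothetical helper, not part of the notebook above.
UNWANTED = '"_”“'
REMOVE_TABLE = str.maketrans("", "", UNWANTED)  # third argument lists characters to delete

def clean_and_tokenize(raw_text):
    """Lowercase the text, delete unwanted characters everywhere, split on whitespace."""
    return raw_text.lower().translate(REMOVE_TABLE).split()

sample = '“It was _late_,” she said.\n"Go home."'
tokens = clean_and_tokenize(sample)
print(tokens)
# ['it', 'was', 'late,', 'she', 'said.', 'go', 'home.']
```

Sentence punctuation stays attached to the words, which the sentence builder relies on to detect where a sentence ends.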

## Build Markov Models

In [3]:
def map_word_to_word(corpus):
    limit = len(corpus) - 1
    dict1_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            suffix = corpus[index + 1]
            dict1_to_1[word].append(suffix)
    return dict1_to_1

def map_2_words_to_word(corpus):
    limit = len(corpus) - 2
    dict2_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            key = word + ' ' + corpus[index + 1]
            suffix = corpus[index + 2]
            dict2_to_1[key].append(suffix)
    return dict2_to_1

def map_3_words_to_word(corpus):
    limit = len(corpus) - 3
    dict3_to_1 = defaultdict(list)
    for index, word in enumerate(corpus):
        if index < limit:
            key = word + ' ' + corpus[index + 1] + ' ' + corpus[index + 2]
            suffix = corpus[index + 3]
            dict3_to_1[key].append(suffix)
    return dict3_to_1
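
The three builders above follow the same pattern and differ only in how many words form the key. A generalized version might look like the sketch below; `map_n_words_to_word` is a hypothetical name and the toy corpus is invented.

```python
from collections import defaultdict

def map_n_words_to_word(corpus, n):
    """Map every run of n consecutive words to the words that follow it."""
    suffix_map = defaultdict(list)
    for index in range(len(corpus) - n):
        key = ' '.join(corpus[index:index + n])
        suffix_map[key].append(corpus[index + n])
    return suffix_map

toy = "i saw the cat and i saw the dog".split()
two_word_map = map_n_words_to_word(toy, 2)
print(two_word_map["saw the"])  # ['cat', 'dog']
print(two_word_map["i saw"])    # ['the', 'the']
```

With `n` set to 1, 2, or 3 this should produce the same keys and suffix lists as the three separate functions.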

## Select a Seed Word

In [4]:
def random_word(corpus):
    # The corpus is lowercased, so normalize the seed the same way.
    seed = input("Enter a word to start a sentence: ").strip().lower()
    if seed in corpus:
        word = seed
    else:
        word = None
        print("Try another word as a seed that exists in the corpus used.")
    return word

## Apply the Markov Models

In [5]:
def word_after_single(prefix, suffix_map_1):
    # Return a copy of the suffix list, or an empty list if the prefix is unknown.
    return list(suffix_map_1.get(prefix, []))

def word_after_double(prefix, suffix_map_2):
    # Same lookup for a two-word prefix.
    return list(suffix_map_2.get(prefix, []))

def word_after_triple(prefix, suffix_map_3):
    # Same lookup for a three-word prefix.
    return list(suffix_map_3.get(prefix, []))
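
Because the suffix lists keep duplicates, picking uniformly from a list favors words that followed the prefix more often in the corpus. The sketch below illustrates this with an invented suffix list; the list contents are an assumption for demonstration only.

```python
import random
from collections import Counter

# Hypothetical suffix list, shaped like one value in the suffix maps.
suffixes = ["was", "was", "was", "had", "felt"]

# Relative frequency of each suffix, i.e. the estimated transition probability.
counts = Counter(suffixes)
total = sum(counts.values())
probabilities = {word: count / total for word, count in counts.items()}
print(probabilities)  # {'was': 0.6, 'had': 0.2, 'felt': 0.2}

# random.choice over the raw list samples according to those frequencies.
random.seed(0)  # seeded only so the sketch is repeatable
draws = Counter(random.choice(suffixes) for _ in range(10_000))
print(draws.most_common(1)[0][0])
```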

## Build a Sentence

In [6]:
def sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus):
    final_sentence = ""
    try:
        number_of_sentences = int(input("How many sentences do you want? "))
        number_of_markov_chains = int(input("Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? "))
    except ValueError:
        print("You entered something other than integers. Enter only integers.")
        return final_sentence
    stop_characters = [".",":","!","?"]
    current_sentence = []
    word = random_word(corpus)    
    keep_building = True
    if word is not None:
        current_sentence.append(word)         
        for i in range(number_of_sentences):
            keep_building = True      
            while keep_building:
                if number_of_markov_chains == 1:
                    word_choices = word_after_single(word, suffix_map_1)
                    word = random.choice(word_choices)
                    current_sentence.append(word)
                    if any(character in word[-1] for character in stop_characters):
                        keep_building = False
                        break
                elif number_of_markov_chains == 2:
                    try:
                        if len(current_sentence) >= 2:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) <= 1:
                            word_choices = word_after_single(word, suffix_map_1)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                    except IndexError:  # random.choice() found no suffix for this prefix
                        word_choices = word_after_single(word, suffix_map_1)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_double(prefix, suffix_map_2)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                elif number_of_markov_chains == 3:
                    try:
                        if len(current_sentence) >= 3:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                            prefix = current_sentence[-3] + ' ' + current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_triple(prefix, suffix_map_3)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) == 2:
                            prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                            word_choices = word_after_double(prefix, suffix_map_2)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                        elif len(current_sentence) <= 1:
                            word_choices = word_after_single(word, suffix_map_1)
                            word = random.choice(word_choices)
                            current_sentence.append(word)
                            if any(character in word[-1] for character in stop_characters):
                                keep_building = False
                                break
                    except IndexError:  # random.choice() found no suffix for this prefix
                        word_choices = word_after_single(word, suffix_map_1)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_double(prefix, suffix_map_2)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                        prefix = current_sentence[-3] + ' ' + current_sentence[-2] + ' ' + current_sentence[-1]
                        word_choices = word_after_triple(prefix, suffix_map_3)
                        word = random.choice(word_choices)
                        current_sentence.append(word)
                        if any(character in word[-1] for character in stop_characters):
                            keep_building = False
                            break
                else:
                    print("Only 1, 2, or 3 Markov chains are available. Please choose 1, 2, or 3.")
                    # Stop here; otherwise the while loop would never end.
                    return final_sentence
    final_sentence = ' '.join(current_sentence)
    return final_sentence
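
For comparison, the fallback from longer prefixes to shorter ones can be written as a single loop. The following is a compact sketch under stated assumptions, not a drop-in replacement for `sentence_builder`: `build_maps`, `next_word`, `generate`, and the `max_words` cap are hypothetical, it takes its settings as arguments instead of calling `input()`, and it adds at most one word per step.

```python
import random
from collections import defaultdict

def build_maps(corpus, max_order):
    """Return {order: suffix_map} for orders 1..max_order."""
    maps = {}
    for n in range(1, max_order + 1):
        suffix_map = defaultdict(list)
        for i in range(len(corpus) - n):
            suffix_map[' '.join(corpus[i:i + n])].append(corpus[i + n])
        maps[n] = suffix_map
    return maps

def next_word(history, maps, order):
    """Try the longest available prefix first, then back off to shorter ones."""
    for n in range(min(order, len(history)), 0, -1):
        choices = maps[n].get(' '.join(history[-n:]))
        if choices:
            return random.choice(choices)
    return None  # dead end: no known suffix for any prefix length

def generate(seed, maps, order, sentences=1, stop_chars=".:!?", max_words=200):
    """Build up to `sentences` sentences starting from `seed`."""
    words = [seed]
    finished = 0
    while finished < sentences and len(words) < max_words:
        word = next_word(words, maps, order)
        if word is None:
            break
        words.append(word)
        if word[-1] in stop_chars:
            finished += 1
    return ' '.join(words)

# Toy corpus, invented for illustration only.
toy = "the cat sat. the dog sat. the cat ran.".split()
print(generate("the", build_maps(toy, 2), order=2, sentences=2))
```

Returning `None` at a dead end lets the caller stop cleanly when a prefix has no recorded suffix, for example at the last word of the corpus.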

## Code to Generate Random Sentences

In [7]:
raw_sentences = load_training_file("Frankenstein.txt")
corpus = prep_training(raw_sentences)
suffix_map_1 = map_word_to_word(corpus)
suffix_map_2 = map_2_words_to_word(corpus)
suffix_map_3 = map_3_words_to_word(corpus)

In [8]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 2
Enter a word to start a sentence: the
the company of strangers. such were the sight of her, and she quitted our house; and so was disposed to set a greater appearance of courage. when she again lived in daily fear lest my fire should be united to his work, and after an interval reconciled me to record your words which agatha endeavoured to change my course southwards. this speech convinced my father that my disorder indeed owed its origin to some degree consoled. she indeed gained the friendship of the weather; my imagination as the lion rends the antelope. but my heart yearned to be a creature capable of observing outward objects with any kind of conversation, although i could not refuse. we were affectionate playfellows during childhood, and, i believe, suffered the same spot from which a naval adventurer might derive the greatest attention to it. as i a

In [9]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 3
Enter a word to start a sentence: the
the scenes of home so dear to me, now, in my feelings: he exerted himself to have possessed while he associated with the lighthearted gaiety of boyhood. the very remembrance of us both will speedily vanish. i shall quit your vessel on its course and wrecked it—thus! i am inexorable. it is impossible to return to your families with the greatest avidity. when i was about five in the morning to labour; but in giving an account of the conclusions i had desired it with an air of divine benignity to one so utterly occupied by far other ideas than those of summer were already in bud. i was impatient to arrive at once at the summit of the kitchen stove. by slow degrees he recovered and ate a little soup, which restored him wonderfully. two days passed in the same reason that ariosto gives concerning the probabilities by which 

In [10]:
print(sentence_builder(suffix_map_1, suffix_map_2, suffix_map_3, corpus))

How many sentences do you want? 30
Choose 1, 2, or 3 Markov chains to be applied. How many Markov chains would you want to apply? 1
Enter a word to start a sentence: the
the lessons of combustibles around me the ancient mariner. you smile at other familiarly by a cabriolet, and expressed all a never-ending source in the death of the inmate of scene of money, on their idol, and by the fare was, all that night on his own situation. the standard of her tears, but very poignantly, especially as omnipotence—and i knew that while we shall quit the traces of the tale and the hope which i saw a word from the prospect of deformity. as i will you and a stranger. he could i dare not to a small possession of elizabeth i felt a student’s thirst for some mode unknown powers, and sometimes he did my ashes this night of happiness that of agrippa and i respect and the structure with groans. how happy and i should load my dear elizabeth! i discovered on the sun, which she looked upon it. by stones conti