# Poetry Generator

Using a collection of poems by *Robert Frost*, this notebook generates poetry with a Markov model.

The `Markov model` used here combines first-order and second-order transitions: the second word of a line depends only on the first word, while every later word depends not only on the word preceding it but also on the word before that one.

## Markov Model

A Markov model is a statistical model for sequences of events in which the probability of the next event depends only on the current state, not on the states that came before it. This is the Markov property: the future depends only on the present. A model of order k relaxes this so that the next event depends on the k most recent states.
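
In symbols, the property can be written as follows. The notation $x_t$ for the state at step $t$ is introduced here for illustration and is not used elsewhere in the notebook. The first line is the first-order property; the second line is the second-order variant, which conditions on the two preceding states.

$$
P(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1) = P(x_t \mid x_{t-1})
$$

$$
P(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1) = P(x_t \mid x_{t-1}, x_{t-2})
$$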

A Markov process is defined by:

1) `States`: A finite set of possible states (often denoted by letters, numbers, or names).

2) `Transition Matrix`: A matrix representing the transition probabilities between states. Each element (i, j) of the matrix represents the probability of transitioning from state i to state j in a single step.

`Markov chains` are a common type of Markov process in which states and transitions between them are used to model sequential behavior in a wide range of applications.
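
To make the two ingredients concrete, here is a minimal sketch of a three-state chain. The state names, the probabilities, and the helper `simulate_chain` are all made up for illustration; none of them are used by the poetry model. The sketch uses its own random generator, so it does not touch the global NumPy seed.

```python
import numpy as np

# Hypothetical three-state chain; the states and probabilities are invented.
states = ["sun", "rain", "snow"]

# transition_matrix[i, j] = P(next state is j | current state is i).
transition_matrix = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

# Each row is a probability distribution, so it must sum to 1.
assert np.allclose(transition_matrix.sum(axis=1), 1.0)

def simulate_chain(start_index, num_steps, rng):
    """Take num_steps transitions and return the names of the visited states."""
    path = [start_index]
    for _ in range(num_steps):
        current = path[-1]
        # The next state depends only on the current one (Markov property).
        next_index = rng.choice(len(states), p=transition_matrix[current])
        path.append(int(next_index))
    return [states[i] for i in path]

rng = np.random.default_rng(0)
walk = simulate_chain(start_index=0, num_steps=5, rng=rng)
print(walk)
```

The poetry model below follows the same idea, except that the states are words and the transition probabilities are stored in dictionaries rather than in a dense matrix.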

In [9]:
import numpy as np 
import string

np.random.seed(42)
print("Done")

Done


In [10]:
initial = {}           # distribution of the first word of a line
first_order = {}       # first-order transition probabilities for the second word
second_order = {}      # second-order transition probabilities for all later words

**`clean_text_and_tokenize`:** Function that makes the text easier to process: it strips trailing whitespace, converts the text to lowercase, and removes punctuation. Finally, `split()` breaks the line into a list of individual words.

In [11]:
def clean_text_and_tokenize(text):
    text = text.rstrip().lower()
    return text.translate(str.maketrans('', '', string.punctuation)).split()
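
As a quick illustration of what the function returns, the sketch below repeats its definition so the snippet runs on its own and applies it to two sample strings. The sample strings are written for this example and are not read from `robert_frost.txt`.

```python
import string

def clean_text_and_tokenize(text):
    # Same definition as in the cell above, repeated so this snippet is standalone.
    text = text.rstrip().lower()
    return text.translate(str.maketrans('', '', string.punctuation)).split()

# Trailing newline and comma are removed, and the text is lowercased.
tokens = clean_text_and_tokenize("Two roads diverged in a yellow wood,\n")
print(tokens)  # ['two', 'roads', 'diverged', 'in', 'a', 'yellow', 'wood']

# Apostrophes count as punctuation, so contractions collapse into one token.
contraction = clean_text_and_tokenize("I don't know.")
print(contraction)  # ['i', 'dont', 'know']
```

This is why generated lines contain tokens such as `dont` or `shes` without an apostrophe.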

**Add2Dict:** Function that appends a value to the list stored under a key in the passed dictionary; if the key is not present yet, it first creates an empty list for it.

In [12]:
def Add2Dict(dict_, key, value):
    if key not in dict_:
        dict_[key] = []
    dict_[key].append(value)
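
For comparison, the same "create the list on first use" behavior is available from the standard library through `collections.defaultdict`. The sketch below is an alternative, not what this notebook uses, and the dictionary name `followers` is hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a dictionary filled through Add2Dict.
followers = defaultdict(list)
followers["the"].append("road")
followers["the"].append("wood")
followers["a"].append("yellow")

print(dict(followers))  # {'the': ['road', 'wood'], 'a': ['yellow']}
```

One difference to keep in mind: looking up a missing key in a `defaultdict` silently creates an empty list instead of raising `KeyError`, so a plain dictionary with `Add2Dict` fails more visibly when a key is unexpectedly absent.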

**List2ProbDict:** Function that computes the relative frequency of each element in the passed list and returns the result as a dictionary. This way, each element can be sampled with its respective probability in a later step.

In [13]:
def List2ProbDict(word_list):
    # Convert each list of possibilities into a dictionary of probabilities
    word_prob_dict = {}
    total_words = len(word_list)
    
    for word in word_list:
        word_prob_dict[word] = word_prob_dict.get(word, 0.0) + 1
    
    for word, count in word_prob_dict.items():
        word_prob_dict[word] = count / total_words
    
    return word_prob_dict
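
The same computation can be sketched with `collections.Counter`. The function name `list_to_prob_dict_counter` and the sample list are hypothetical, chosen only to show the expected input and output shape.

```python
from collections import Counter

def list_to_prob_dict_counter(word_list):
    # Hypothetical alternative to List2ProbDict built on collections.Counter.
    counts = Counter(word_list)
    total = len(word_list)
    return {word: count / total for word, count in counts.items()}

# 'road' appears three times out of four, 'wood' once.
followers = ["road", "wood", "road", "road"]
probs = list_to_prob_dict_counter(followers)
print(probs)  # {'road': 0.75, 'wood': 0.25}
```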

**Fill Dictionary:** This loop populates the containers for the first-order and second-order Markov models. `initial` has every word that begins a line as a key, with the number of times it does so as its value.

`first_order` is a dictionary whose keys are words that begin a line; each value is a list of all the words observed as the second word after that key.

`second_order` is a dictionary where each key is a tuple of two consecutive words (the first and second, the second and third, and so on), while its value is a list of all the words observed after that pair. When the pair closes a line, the marker `'END'` is added to the list.

In [14]:
with open('robert_frost.txt', 'r') as poem_file:  # close the file when done
    for line in poem_file:
        words = clean_text_and_tokenize(line)
        num_words = len(words)

        for i in range(num_words):
            current_word = words[i]

            if i == 0:
                # Measure the distribution of the first word
                initial[current_word] = initial.get(current_word, 0.) + 1
            else:
                previous_word = words[i - 1]

                if i == num_words - 1:
                    # Measure the probability of ending the line
                    Add2Dict(second_order, (previous_word, current_word), 'END')

                if i == 1:
                    # Measure the distribution of the second word given only the first word
                    Add2Dict(first_order, previous_word, current_word)
                else:
                    two_words_back = words[i - 2]
                    Add2Dict(second_order, (two_words_back, previous_word), current_word)
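
To see what the loop records, the sketch below applies the same bookkeeping to a single tokenized line. The helper `count_line` and the `toy_*` dictionaries are hypothetical and exist only for this illustration; `setdefault` stands in for `Add2Dict`.

```python
def count_line(words, initial, first_order, second_order):
    # Hypothetical helper: same bookkeeping as the loop above, for one line.
    for i, current_word in enumerate(words):
        if i == 0:
            initial[current_word] = initial.get(current_word, 0.) + 1
            continue
        previous_word = words[i - 1]
        if i == len(words) - 1:
            # The last pair of the line can be followed by the END marker.
            second_order.setdefault((previous_word, current_word), []).append('END')
        if i == 1:
            first_order.setdefault(previous_word, []).append(current_word)
        else:
            second_order.setdefault((words[i - 2], previous_word), []).append(current_word)

toy_initial, toy_first, toy_second = {}, {}, {}
count_line(['the', 'woods', 'are', 'lovely'], toy_initial, toy_first, toy_second)

print(toy_initial)  # {'the': 1.0}
print(toy_first)    # {'the': ['woods']}
print(toy_second)
# {('the', 'woods'): ['are'], ('are', 'lovely'): ['END'], ('woods', 'are'): ['lovely']}
```

Note that a line consisting of a single word only adds an entry to `initial`. If such a word never starts a longer line, `first_order` has no entry for it, which is worth checking for in the corpus before sampling.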

## Normalize the Distributions

In [15]:
# Turn the first-word counts into probabilities
initial_total = sum(initial.values())
for word, count in initial.items():
    initial[word] = count / initial_total

In [16]:
for word, next_words in first_order.items():
    # Replace the list of observed next words with a dictionary of probabilities
    first_order[word] = List2ProbDict(next_words)

In [17]:
for key, element_list in second_order.items():
    second_order[key] = List2ProbDict(element_list)
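
After this step, every distribution should sum to 1 up to floating-point error. A minimal sanity check could look like the sketch below; the helper `is_normalized` and the toy dictionary are hypothetical.

```python
import math

def is_normalized(prob_dict, tol=1e-9):
    # Hypothetical helper: True when the probabilities sum to 1 within tol.
    return math.isclose(sum(prob_dict.values()), 1.0, abs_tol=tol)

# Toy data with the same shape as first_order after normalization.
toy_first_order = {'the': {'woods': 0.5, 'road': 0.5}, 'a': {'yellow': 1.0}}
all_ok = all(is_normalized(dist) for dist in toy_first_order.values())
print(all_ok)  # True
```

In the notebook itself, the same check could be applied to `initial` and to each value of `first_order` and `second_order`.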

## Sampling and Generation

In [18]:
def SampleWord(probability_dict):
    random_value = np.random.random()  # random value in [0, 1)
    cumulative_probability = 0
    for word, probability in probability_dict.items():
        cumulative_probability += probability
        if random_value < cumulative_probability:
            return word
    assert False, "Should never get here"
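
`SampleWord` walks the cumulative distribution until it passes the random value. As an alternative sketch, the draw can be delegated to NumPy's `Generator.choice`. The function name `sample_word_choice` and the toy distribution are hypothetical, and the snippet uses its own generator rather than the global seed set earlier.

```python
import numpy as np

def sample_word_choice(probability_dict, rng):
    # Hypothetical alternative to SampleWord that delegates the draw to NumPy.
    words = list(probability_dict.keys())
    probabilities = list(probability_dict.values())
    index = rng.choice(len(words), p=probabilities)
    return words[index]

rng = np.random.default_rng(0)
toy_dist = {'woods': 0.75, 'road': 0.25}

# Draw many samples and compare the observed share with the stated probability.
draws = [sample_word_choice(toy_dist, rng) for _ in range(10_000)]
share_woods = draws.count('woods') / len(draws)
print(round(share_woods, 2))
```

With enough draws, the observed share of `'woods'` should be close to 0.75, which is a simple way to check that a sampler respects the probabilities it is given.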

In [19]:
LINES = 3

def GenerateText():
    for i in range(LINES):  # Generate LINES lines
        sentence = []

        # Initial word
        first_word = SampleWord(initial)
        sentence.append(first_word)

        # Sample second word
        second_word = SampleWord(first_order[first_word])
        sentence.append(second_word)

        # Second-order transitions until END
        while True:
            third_word = SampleWord(second_order[(first_word, second_word)])
            if third_word == 'END':
                break
            sentence.append(third_word)
            first_word = second_word
            second_word = third_word
        print(' '.join(sentence))

In [20]:
GenerateText()

but estelle dont complain shes like him there
ill double theirs for both of them
she hadnt found the fingerbone she wanted


In [21]:
GenerateText()

come straight down off this mountain just as well off is it isnt
one lizard at the kitchen to yourself
it ought to have the job
