## Working with text files

Before we get started, we'll need some text! Grab two [plain text files from Project Gutenberg](http://www.gutenberg.org/) (or from another source of your choice) and save them to the same directory as this notebook. (I suggest working with two files because we'll be running some code that explicitly compares two texts. Also, I think seeing two different outputs from the text generation methods discussed in this notebook will help you better understand how those methods work.) The code in the following cell loads the contents of the two plain text files into the Python variables `text_a` and `text_b`, each as a list of lines. You'll need to replace the filenames with the names of the files that you downloaded, keeping the quotation marks (`"`) intact.

In [241]:
with open("pg12116.txt") as file_a:
    text_a = file_a.read().split("\n")
with open("rupikaur.txt") as file_b:
    text_b = file_b.read().split("\n")

In [242]:
text_a = [item.strip(".!?-:;, ") for item in text_a]
text_b = [item.strip(".!?-:;, ") for item in text_b]

In [142]:
text_a

['',
 '',
 'Merry Stories and',
 'Funny Pictures',
 '',
 '',
 'When the children have been good',
 'That is, be it understood',
 'Good at meal-times, good at play',
 'Good all night and good all day',
 'They shall have the pretty things',
 'Merry Christmas always brings',
 '',
 'Naughty, romping girls and boys',
 'Tear their clothes and make a noise',
 'Spoil their pinafores and frocks',
 'And deserve no Christmas-box',
 'Such as these shall never look',
 'At this pretty Picture-book',
 '',
 '',
 '',
 '',
 'Shock-headed Peter',
 '',
 '',
 'Just look at him! there he stands',
 'With his nasty hair and hands',
 'See! his nails are never cut',
 'They are grimed as black as soot',
 'And the sloven, I declare',
 'Never once has combed his hair',
 'Anything to me is sweeter',
 'Than to see Shock-headed Peter',
 '',
 '',
 '',
 '',
 'Cruel Frederick',
 '',
 '',
 'Here is cruel Frederick, see',
 'A horrid wicked boy was he',
 'He caught the flies, poor little things',
 'And then tore off their 

In [243]:
text_b

['',
 '',
 'hurting',
 '',
 '',
 '',
 'how is it so easy for you',
 'to be kind to people he asked',
 '',
 'milk and honey dripped',
 'from my lips as i answered',
 '',
 'cause people have not',
 'been kind to me',
 '',
 '',
 '',
 'the first boy that kissed me',
 'held my shoulders down',
 'like the handlebars of',
 'the first bicycle',
 'he ever rode',
 'i was five',
 '',
 'he had the smell of',
 '',
 'starvation on his lips',
 '',
 'which he picked up from',
 '',
 'his father feasting on his mother at 4 a.m',
 '',
 'he was the first boy',
 'to teach me my body was',
 'for giving to those that wanted',
 'that i should feel anything',
 'less than whole',
 '',
 'and my god',
 '',
 'did i feel as empty',
 '',
 'as his mother at 4:25 a.m',
 '',
 '',
 '',
 "have Ix'c n",
 'isus&rhE your liegs',
 'ii.ru: b pis Mop Ibr men',
 'Ifml ix-ud a place? Jo rest',
 'sl \\ atan( body empty enough',
 'for guests bttE rnn one',
 'ever tomes ;md i*',
 '',
 '\\ will its" io /',
 '',
 '',
 '',
 'it is you

In [146]:
# text_a and text_b are lists of lines, so join the lines before splitting into words
a_words = " ".join(text_a).split()
b_words = " ".join(text_b).split()

In [147]:
import random
import pronouncing
import markovify

In [148]:
reversed_by_word = [" ".join(list(reversed(item.split(" ")))) for item in text_a]

In [149]:
rhyme_reversed_by_word = "\n".join(reversed_by_word)

In [150]:
reversed_model = markovify.NewlineText(rhyme_reversed_by_word, state_size=1)

In [151]:
reversed_model.make_sentence()

'river the To'

In [152]:
" ".join(list(reversed(reversed_model.make_sentence().split(" "))))

'And laughs to see the cottage there'

In [153]:
" ".join(list(reversed(reversed_model.make_sentence(init_state=('good',)).split())))

'Through the good'

In [154]:
end_words = [item.split()[0] for item in reversed_by_word if len(item) > 0]

In [155]:
# find rhyming words for word, limited to words in the list other_words
def find_rhyme_in(word, other_words):
    rhymes = []
    phones = pronouncing.phones_for_word(word)
    # if there are no pronunciations for this word, return empty
    if len(phones) == 0:
        return []
    # get the "rhyming part" of the list of phones
    word_rhyme = pronouncing.rhyming_part(phones[0])
    # for each of the words in the other_words list...
    # (TODO: optimize this for really big corpora)
    for item in other_words:
        phones = pronouncing.phones_for_word(item)
        if len(phones) == 0:
            continue
        # check to see if its rhyming part is the same as the word
        if pronouncing.rhyming_part(phones[0]) == word_rhyme:
            rhymes.append(item)
    return list(set(rhymes)) # remove duplicates

In [156]:
find_rhyme_in(random.choice(end_words), end_words)

['claws', 'paws']
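
The TODO in `find_rhyme_in` notes that the loop rescans the whole word list on every lookup. One possible optimization, sketched here under assumptions, is to group the words by their rhyming part once and then look rhymes up in a dictionary. The function `build_rhyme_index` and its `rhyme_key` parameter are hypothetical names, not part of `pronouncing`. The demonstration uses a crude stand-in key (the last two letters of the word) so that it runs without the library.

```python
from collections import defaultdict

def build_rhyme_index(words, rhyme_key):
    """Group words by rhyme key in one pass; words whose key is None are skipped."""
    index = defaultdict(set)
    for word in words:
        key = rhyme_key(word)
        if key is not None:
            index[key].add(word)
    return index

def last_two_letters(word):
    # Stand-in for a real pronunciation lookup (an assumption for this demo only)
    return word[-2:].lower() if len(word) >= 2 else None

demo_words = ["head", "bed", "red", "day", "play", "a"]
rhyme_index = build_rhyme_index(demo_words, last_two_letters)
print(sorted(rhyme_index["ed"]))
print(sorted(rhyme_index["ay"]))
```

In this notebook, the key function would instead call `pronouncing.phones_for_word(word)` and return `pronouncing.rhyming_part(phones[0])` when a pronunciation exists, or `None` otherwise. That mirrors the checks in `find_rhyme_in`. Note that the spelling-based stand-in misses "head" as a rhyme for "bed", which is why the real lookup works from pronunciations.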

In [157]:
end_words_with_rhymes = {}
for item in end_words:
    rhymes = find_rhyme_in(item, end_words)
    # a word always rhymes with itself, so require at least one other rhyme
    if len(rhymes) >= 2:
        end_words_with_rhymes[item] = rhymes

In [158]:
random.sample(a_words, 10)

['her',
 'will',
 'nigh!"',
 'stairs;',
 'say:',
 'and',
 'legs',
 'and,',
 'Hare!"',
 'at']

In [225]:
random.sample(b_words, 10)

['me',
 'too',
 'you',
 'confession',
 'of',
 'the',
 'turn',
 'thing',
 'knowing',
 'the']

In [160]:
end_words_with_rhymes['head']

['thread', 'bed', 'Ned', 'red', 'bled', 'said', 'head']

In [161]:
import sys
!{sys.executable} -m pip install markovify



And then run this cell to make the library available in your notebook:

In [162]:
import markovify

In [163]:
generator_a = markovify.Text(text_a)

In [164]:
print(generator_a.make_sentence())

None


In [165]:
print(generator_a.make_short_sentence(50))

They have been good


In [166]:
print(generator_a.make_short_sentence(40, tries=100))

He's like a little gentleman


In [167]:
print(generator_a.make_short_sentence(40, test_output=False))

Once, with head as high as ever


In [168]:
gen_a_1 = markovify.Text(text_a, state_size=1)
gen_a_4 = markovify.Text(text_a, state_size=4)

In [169]:
print("order 1")
print(gen_a_1.make_sentence(test_output=False))
print()
print("order 4")
print(gen_a_4.make_sentence(test_output=False))

order 1
And stretch their tiny wings

order 4
I must go out and leave you here


In [170]:
class SentencesByChar(markovify.Text):
    def word_split(self, sentence):
        return list(sentence)
    def word_join(self, words):
        return "".join(words)

In [171]:
con_model = SentencesByChar("condescendences", state_size=2)

In [172]:
gen_a_char = SentencesByChar(text_a, state_size=7)

In [173]:
print(gen_a_char.make_sentence(test_output=False).replace("\n", " "))

And then, I declare


In [244]:
generator_a = markovify.Text(text_a)
generator_b = markovify.Text(text_b)
combo = markovify.combine([generator_a, generator_b], [0.8, 0.2])

The bit of code `[0.8, 0.2]` controls the "weights" of the models, i.e., how much to emphasize the probabilities of each model. You can change this to suit your tastes. (E.g., if you want mostly text A with but a *soupçon* of text B, you would write `[0.9, 0.1]`. Try it!)
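
To make the effect of the weights concrete, here is a minimal pure-Python sketch of a weighted combination of two word-level models. It does not call markovify. The helper names `build_counts` and `combine_counts` are hypothetical, and scaling each model's transition counts by its weight before summing is an assumption about how such a combination can work, not a statement about markovify's internals.

```python
from collections import defaultdict

def build_counts(words):
    """Count how often each word is followed by each other word (order-1 model)."""
    counts = defaultdict(lambda: defaultdict(float))
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def combine_counts(models, weights):
    """Sum the transition counts of several models, scaling each by its weight."""
    combined = defaultdict(lambda: defaultdict(float))
    for model, weight in zip(models, weights):
        for current, followers in model.items():
            for following, count in followers.items():
                combined[current][following] += count * weight
    return combined

model_a = build_counts("the cat sat on the mat".split())
model_b = build_counts("the dog sat on the log".split())
combo_counts = combine_counts([model_a, model_b], [0.8, 0.2])

# After "the", each follower from text A carries four times the weight
# of each follower from text B
print(dict(combo_counts["the"]))
```

Transitions that appear in both toy texts (such as "sat" followed by "on") accumulate weight from both models, while transitions unique to one text keep only that text's share.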

Then you can create sentences using the combined model:

In [227]:
print(combo.make_sentence())

thai will be the most beautiful thing i’d ever meet, how i’d spend the rest of us wants to leave


In [322]:
# "word" selects a word-level model; any other value (e.g. "char") selects the character-level model
level = "word"
# controls the length of the n-gram
order = 7
# controls the number of lines to output
output_n = 5
# weights between the models; text A first, text B second.
# if you want to completely exclude one model, set its corresponding value to 0
weights = [0.6, 0.4]
# limit sentence output to this number of characters
length_limit = 200

In [311]:
first = random.choice(list(end_words_with_rhymes.keys()))

In [312]:
end_words_with_rhymes[first]

['hark', 'mark']

In [313]:
print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))

And left her mark


In [314]:
second = random.choice(end_words_with_rhymes[first])

In [315]:
print(" ".join(list(reversed(reversed_model.make_sentence(init_state=(second,)).split()))))

What a little hare stopped short, took aim and, hark


In [206]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

if one person is lying there not doing anything
And tilts up a knife, and wild
i struggle so deeply
And ere they are off at last he fell and wild

And his hat flew on before him
And the gun--she missed her toes
it means there is beauty rooted
And the picture shows

my heartbeat quickens at
All good at him, now she's trying all alone
you have helped me
That both his hat flew open, in an angry tone

at the end of the day all this
This is able
skin the color of earth
Next day, he is Philip, where is able

Each popped out his little head
Bob was high
Just look at him! there he stands
Such as high



In [212]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

a daughter should
And was too pleased to be looking at last
out of love
He took with all so fast

how do you turn
Philip is in poor little hare stopped short, took aim and, hark
i will not have you
He scarcely turned her mark

And William came in jacket trim
When she saw with joy
what you mean is
What great Agrippa foams with joy

And hooting at the Black-a-moor
In the table close by his might
But Harriet would not take advice
Dog and wonder how it out of sight

tell stories
Look at last he slept like a tree
to help those around
What great Agrippa lived close at the nasty physic too pleased to be



In [219]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

could set it on fire
He takes his chair
that even in a bed full of safety
Yet, when the little pond at him, now the birds, and hair

what i’ve lost
And hooting at play
There lived close by the cottage there
Such as before him lay

carry
The fire has combed his toys
with their body it’s not love
But mind a noise

Hooked poor Johnny out again
Headlong in he is Philip, this is plain
when my mother was pregnant
Philip, where is plain

if that’s what you want to do
And raise their toys
give to those
And he a noise



In [252]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

you must
And arms, and call
except love and human connection
Their tears ran down all

don’t need him
Bang went the hat went the bright round sun
Among her ashes on the ground
Bob was a heavy gun

Came a little dog one day
Bob was never will forget
to kiss
He cries out one and wet

Head over ears, and in he fell
Then you see the sky
The Story of Fidgety Philip
And everybody saw them cry

And looks quite sad, and shows his hands
And never will forget
One step more! oh! sad to tell
And never will forget



In [271]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

is so erotic
And threw the skies
with your solitude
Bob was all night and cries

curled inside your legs
Here you should have been made such a roar
is the most you have to be proud of when your
How Mamma had nothing more

and men
Snip! the bank was very page
the breaking
And they fell, with rage

on your body is okay
Cloth and call
you belong only to yourself
And brought his hat flew on him all

What foolish Harriet befell
Good all his thumbs
Now, as the sun grew very hot
Where is cruel Fred did not take the rain comes



In [324]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

when the two of you
No one sultry day
to be
When the nasty soup get cold winter's day

I must go out and leave you here
Just like the sloven, I declare
and to have a grown man tell me something
Merry Christmas always comes home: there

i am ready for you
Exactly like any top
accept yourself
The hare came, hop, hop

And left her all alone at play
Of spectacles, to tell
And they may scream and kick and call
And laughed and Johnny fell

Up he flies
The fire has caught out little scarlet shoes
his father feasting on his mother at 4 a.m
In his little scarlet shoes



In [325]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    out2 = gen_combo.make_short_sentence(length_limit, test_output=False)
    # pick a line-ending word, then one of the words it rhymes with
    first = random.choice(list(end_words_with_rhymes.keys()))
    second = random.choice(end_words_with_rhymes[first])
    # replace newlines before printing (only matters for character-level models)
    print(out.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(first,)).split())))
    print(out2.replace("\n", " "))
    print(" ".join(reversed(reversed_model.make_sentence(init_state=(second,)).split())))
    print()

even they can’t fix
And arms, and poor little noisy wag
pointing upward to the sky
Underneath his flag

there are no neon lights here
Quite black can be looking at home and heads, and bit of sight
when
He takes his might

Up they came the moment after
To the table close at meal-times, good Tray till he
Till they are black as black can be
And brought his Mary, till he had a tree

he only whispers i love you
Never once has pulled down
or both
Snip! Snap! Snip! They shall see the town

you have to stop
And Papa bade Phil behave
you whisper
But fidgety Phil behave

