
# **Text generation with Markovify**
This notebook implements a simple text generation algorithm using Markov chains: a statistical model that estimates the probability of the next character or word from the previous one(s).

# **Working with text files**

Before we get started, we'll need some text! Grab two plain text files from Project Gutenberg (or from another source of your choice) and save them to the same directory as this notebook. (I suggest working with two files because some of the code below explicitly compares two texts. I also think seeing two different outputs from the text generation methods in this notebook will help you understand how those methods work.) The code in the following cell reads the contents of the two files into the variables text_a and text_b. Replace the filenames with the names of the files that you downloaded, keeping the quotation marks (") intact.

In [3]:
text_a = open("book1.txt").read()
text_b = open("book2.txt").read()

These variables are strings, which are essentially just long lists of the characters that occur in the text, in the order that they occur. The code in the following cell shows the first two hundred characters of text A:

In [4]:
print(text_a[:200])

﻿The Project Gutenberg eBook of Aaron the Jew: A Novel
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictio


You can change text_a to text_b to see the output from your second text, or change 200 to a number of your choosing.

The random.sample() function gives us a random sampling of the contents of a variable (as long as that variable is a sequence of things, like a string or a list). So, for example, to see twenty random characters from text B:

In [5]:
import random
random.sample(text_b, 20)

['u',
 ' ',
 'a',
 'l',
 'h',
 ' ',
 'r',
 'r',
 'l',
 'y',
 'm',
 ',',
 'l',
 'y',
 'b',
 's',
 '.',
 ' ',
 'e',
 'o']
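
One detail worth knowing here: random.sample() draws without replacement, so it cannot return more items than the sequence holds, whereas random.choices() draws with replacement. The sketch below shows the difference on a made-up five-letter string (the variable names are illustrative and not part of the notebook):

```python
import random

letters = "abcde"  # tiny stand-in for text_b

# sample() picks distinct positions, so each position is used at most once
picked = random.sample(letters, 5)
print(sorted(picked))  # every letter appears exactly once

# asking for more items than exist raises ValueError
try:
    random.sample(letters, 6)
except ValueError as err:
    print("sample failed:", err)

# choices() draws with replacement, so k may exceed the length
repeated = random.choices(letters, k=8)
print(repeated)
```

With a whole book as input the limit rarely matters, but it is a likely cause of errors when sampling from very short texts.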

Perhaps more interesting would be to see a randomly sampled list of words. To do this, we'll make separate variables for the words in each text, using a string method called .split(), which breaks a string apart at whitespace and returns a list of the words it contains. The following cell makes two new variables that contain the words from text A and text B respectively:

In [6]:
a_words = text_a.split()
b_words = text_b.split()

Now, ten random words from both text A and text B:

In [8]:
random.sample(a_words, 10)

['through',
 'with',
 'never',
 'match,',
 'child',
 'beloved!',
 'in',
 "affliction?'",
 'a',
 'that']

In [9]:
random.sample(b_words, 10)

['and',
 'in',
 'any',
 'learn',
 'based',
 'any',
 'facility:',
 'Gutenberg™.',
 'a',
 'the']
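
Tokens such as 'match,' and 'Gutenberg™.' show up in these samples because .split() with no arguments separates only on whitespace and leaves punctuation attached. The sketch below illustrates this on a made-up line and shows one simple (and lossy) way to tidy the tokens; the sample text and variable names are assumptions for illustration only:

```python
import string

line = "Never, never!  said the child.\nNever?"

# split() with no arguments breaks on any run of whitespace, including newlines
raw_words = line.split()
print(raw_words)

# one simple option: trim punctuation from both ends and lowercase each token
clean_words = [w.strip(string.punctuation).lower() for w in raw_words]
print(clean_words)
```

Cleaning like this changes word counts (here three spellings of "never" collapse into one), so whether to do it depends on what you want to measure.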

The code in the following cell uses Python's Counter object to count the most common characters in the first of these texts (the space character counts too, and comes out on top):

In [10]:
from collections import Counter
Counter(text_a).most_common(12)

[(' ', 120468),
 ('e', 66340),
 ('t', 44644),
 ('o', 41780),
 ('a', 40269),
 ('n', 36241),
 ('h', 34379),
 ('i', 34318),
 ('s', 33532),
 ('r', 31302),
 ('d', 23476),
 ('l', 19907)]

Specifying the a_words variable gives the most frequent words instead:

In [11]:
Counter(a_words).most_common(12)


[('the', 5610),
 ('to', 3848),
 ('and', 3694),
 ('of', 3316),
 ('a', 2719),
 ('in', 2221),
 ('was', 1890),
 ('he', 1773),
 ('I', 1712),
 ('his', 1473),
 ('not', 1246),
 ('that', 1237)]

Compare these to the most common words in text B:



In [12]:
Counter(b_words).most_common(12)


[('the', 209),
 ('of', 134),
 ('and', 113),
 ('to', 109),
 ('in', 84),
 ('Project', 79),
 ('you', 73),
 ('a', 72),
 ('or', 71),
 ('with', 55),
 ('Gutenberg™', 53),
 ('this', 43)]
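
The two texts differ greatly in length, so the raw counts above are hard to compare directly. One option is to divide each count by the total number of words. The sketch below does this with a hypothetical helper, relative_freq, on made-up word lists standing in for a_words and b_words:

```python
from collections import Counter

def relative_freq(words, n=3):
    """Return the n most common words with their share of all words."""
    counts = Counter(words)
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(n)]

# made-up word lists standing in for a_words and b_words
long_text = "the cat and the dog and the bird".split()
short_text = "the end".split()

print(relative_freq(long_text))
print(relative_freq(short_text))
```

In the notebook you could call relative_freq(a_words, 12) and relative_freq(b_words, 12) to compare the two texts on the same scale.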

# **Generating with Markovify**
To install Markovify on your computer, run the cell below:

In [13]:
import sys
!{sys.executable} -m pip install markovify


Collecting markovify
  Downloading markovify-0.9.4.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Collecting unidecode (from markovify)
  Downloading Unidecode-1.3.8-py3-none-any.whl.metadata (13 kB)
Downloading Unidecode-1.3.8-py3-none-any.whl (235 kB)
Building wheels for collected packages: markovify
  Building wheel for markovify (setup.py) ... done
  Created wheel for markovify: filename=markovify-0.9.4-py3-none-any.whl size=18607 sha256=b4e3764ae40198a43c5a1e47f1b68ab573e5bbde067e1e4e5b4d0a755d1bafdc
  Stored in directory: /root/.cache/pip/wheels/9c/20/eb/1a3fb93f3132f2f9683e4efd834800f80c53aeddf50e84ae80
Successfully built markovify
Installing collected packages: unidecode, markovify
Successfully installed markovify-0.9.4 unidecode-1.3.8


In [14]:
import markovify

The code in the following cell creates a new text generator, using the text in the variable specified to build the Markov model, which is then assigned to the variable generator_a.

In [17]:
generator_a = markovify.Text(text_a)

You can then call the .make_sentence() method to generate a sentence from the model:

In [16]:
print(generator_a.make_sentence())

Thus, we do not deny it; he undertook to find a home and an expression of her feelings it seemed as if for guidance.
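
To make the idea of a Markov model more concrete, here is a minimal word-level chain written with the standard library only. It is a sketch of the general technique, not Markovify's own code (Markovify also keeps track of sentence beginnings and endings, among other things). The function names build_chain and generate and the tiny corpus are made up for illustration:

```python
import random
from collections import defaultdict

def build_chain(words, state_size=2):
    """Map each run of `state_size` words to the words that followed it."""
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain

def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start` until a dead end or max_words is reached."""
    out = list(start)
    state = tuple(start)
    while len(out) < max_words and state in chain:
        nxt = rng.choice(chain[state])  # repeats in the list act as weights
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

corpus = "she sat by the sea and she sat by the fire and slept".split()
chain = build_chain(corpus)
print(chain[("by", "the")])   # two possible continuations
print(generate(chain, ("she", "sat")))
```

Each step looks only at the last two words, picks one of the words that followed that pair in the source, and slides the window forward.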


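The .make_short_sentence() method allows you to specify a maximum length, in characters, for the generated sentence. Two optional parameters also appear in the cells that follow: tries sets how many attempts Markovify makes before giving up and returning None (the default is 10), and test_output=False turns off the check that rejects sentences overlapping too closely with the source text, so a result can be lifted straight from the source: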

In [18]:
print(generator_a.make_short_sentence(50))


She stared at him in respect of his religion.


In [19]:
print(generator_a.make_short_sentence(40, tries=100))

Rachel, stealing to the men gave way.


In [20]:
print(generator_a.make_short_sentence(40, test_output=False))


CHAPTER XLIV.
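
An output like the one above looks like a line taken directly from the book, which is the kind of result an overlap check is meant to filter out. The sketch below shows a simplified version of such a check. It is an assumption-laden illustration, not Markovify's implementation: the helper name copies_source and the max_run threshold are hypothetical, and plain substring matching can also match across word boundaries.

```python
def copies_source(sentence, source, max_run=3):
    """Return True if a run of more than `max_run` consecutive words from
    `sentence` appears verbatim in `source`. Sentences too short to contain
    such a run are compared as a whole."""
    words = sentence.split()
    window = max_run + 1
    if len(words) <= window:
        return " ".join(words) in source
    return any(
        " ".join(words[i:i + window]) in source
        for i in range(len(words) - window + 1)
    )

source = "the quick brown fox jumps over the lazy dog"
print(copies_source("a quick brown fox jumps away", source))  # True
print(copies_source("the lazy fox jumps quick", source))      # False
```

A generator would typically discard any candidate for which a check like this returns True and try again, up to some number of attempts.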


# **Changing the order**
When you create the model, you can specify its order (the number of preceding words used to predict the next one) with the state_size parameter. It defaults to 2. Let's make two models with different orders and compare:

In [21]:
gen_a_1 = markovify.Text(text_a, state_size=1)
gen_a_4 = markovify.Text(text_a, state_size=4)

In [22]:
print("order 1")
print(gen_a_1.make_sentence(test_output=False))
print()
print("order 4")
print(gen_a_4.make_sentence(test_output=False))

order 1
That upon his judgment upon his eyes.

order 4
Of such stuff are martyrs made; from such elements springs the lofty ideal into which, once in a generation, is breathed the breath of life, the self-sacrificing hero who sheds his blood and dies with a glad light on his face in the battle of weak innocence against the ruthless hand of power.
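
The order 4 output reads much more like the source than the order 1 output, probably because longer states leave fewer possible continuations. The sketch below measures this on a made-up word list by counting the average number of distinct next words per state; average_choices is a hypothetical helper, not part of Markovify:

```python
from collections import defaultdict

def average_choices(words, state_size):
    """Average number of distinct next words per state for a given order."""
    followers = defaultdict(set)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        followers[state].add(words[i + state_size])
    return sum(len(f) for f in followers.values()) / len(followers)

words = ("the cat saw the dog and the dog saw the cat "
         "and the cat ran and the dog sat").split()

for order in (1, 2, 3):
    print(order, round(average_choices(words, order), 2))
```

As the average approaches 1, the model has almost no choice at each step and tends to reproduce the source text word for word.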


# **Changing the level**
Markovify, by default, works with words as the individual unit. It doesn't come out-of-the-box with support for character-level models. The following code defines a new kind of Markovify generator that implements character-level models. Execute it before continuing:

In [23]:
class SentencesByChar(markovify.Text):
    def word_split(self, sentence):
        return list(sentence)
    def word_join(self, words):
        return "".join(words)

Any of the parameters you passed to markovify.Text you can also pass to SentencesByChar. The state_size parameter still controls the order of the model, but now the n-grams are characters, not words.

In [24]:
con_model = SentencesByChar("condescendences", state_size=2)

Execute the cell below to see the output: a new string stitched together from character sequences that occur in the source word.

In [25]:
con_model.make_sentence()

'condendescescendesces'
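
To see where an output like this comes from, it can help to list which character followed each two-character state in "condescendences". The sketch below builds that table by hand with the standard library; char_table is a hypothetical helper for illustration and not how Markovify stores its model:

```python
from collections import defaultdict

def char_table(text, state_size=2):
    """List, for each run of `state_size` characters, what came next."""
    table = defaultdict(list)
    for i in range(len(text) - state_size):
        table[text[i:i + state_size]].append(text[i + state_size])
    return dict(table)

table = char_table("condescendences")
for state, followers in table.items():
    print(state, "->", followers)
```

States with more than one follower, such as "de", "ce" and "en", are the points where the generator can branch away from the original spelling.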

Of course, you can use a character-level model on any text of your choice. So, for example, the following cell creates a character-level order-7 Markov chain text generator from text A:

In [26]:
gen_a_char = SentencesByChar(text_a, state_size=7)

And the cell below prints out a random sentence from this generator. (The .replace() is to get rid of any newline characters in the output.)

In [28]:
print(gen_a_char.make_sentence(test_output=False).replace("\n", " "))

You are about it.


# **Combining models**
Markovify has a handy feature that allows you to combine models, creating a new model that draws on probabilities from both of the source models. You can use this to create hybrid output that mixes the style and content of two (or more!) different source texts. To do this, you need to create the models independently, and then pass them to markovify.combine() to combine them.

In [29]:
generator_a = markovify.Text(text_a)
generator_b = markovify.Text(text_b)
combo = markovify.combine([generator_a, generator_b], [0.5, 0.5])

The bit of code [0.5, 0.5] controls the "weights" of the models, i.e., how much to emphasize the probabilities of any model. You can change this to suit your tastes. (E.g., if you want mostly text A with but a soupçon of text B, you would write [0.9, 0.1]. Try it!)

Then you can create sentences using the combined model:

In [30]:
print(combo.make_sentence())

It is a necessity for impressing it upon him.


We all know that his religion and taught to look on the subject, but that the arguments with which I feared to meet her again.
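
As far as I understand it, combining amounts to scaling each model's follower counts by its weight and adding them together. The sketch below shows that arithmetic for a single state. It is an illustration under that assumption rather than Markovify's code, and the helper combine_counts and the counts themselves are made up:

```python
from collections import Counter

def combine_counts(tables, weights):
    """Merge follower counts for one state, scaling each table by its weight."""
    combined = Counter()
    for table, weight in zip(tables, weights):
        for word, count in table.items():
            combined[word] += count * weight
    return combined

# made-up follower counts for the same state in two different models
after_in_a = Counter({"child": 6, "house": 2})
after_in_b = Counter({"work": 4, "house": 4})

even = combine_counts([after_in_a, after_in_b], [0.5, 0.5])
mostly_a = combine_counts([after_in_a, after_in_b], [0.9, 0.1])
print(even)
print(mostly_a)
```

With even weights, "house" becomes as likely as "child" because both models contribute to it; with weights of 0.9 and 0.1, the followers from the first model dominate.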

# **Bringing it all together**
I've pre-written some code below to make it easy for you to experiment and produce output from Markovify. Just make adjustments to the values assigned to the variables in the cell below:

In [31]:
# change to "word" for a word-level model
level = "char"
# controls the length of the n-gram
order = 7
# controls the number of lines to output
output_n = 14
# weights between the models; text A first, text B second.
# if you want to completely exclude one model, set its corresponding value to 0
weights = [0.5, 0.5]
# limit sentence output to this number of characters
length_limit = 280

(The lines beginning with # are "comments"—they don't do anything, they're just there to explain what's happening in the code.)

After making your changes above, run the cell below to generate text according to your parameters. Repeat as necessary until you get something you really like!

In [32]:
model_cls = markovify.Text if level == "word" else SentencesByChar
gen_a = model_cls(text_a, state_size=order)
gen_b = model_cls(text_b, state_size=order)
gen_combo = markovify.combine([gen_a, gen_b], weights)
for i in range(output_n):
    out = gen_combo.make_short_sentence(length_limit, test_output=False)
    # make_short_sentence returns None when no sentence fits, so skip those
    if out is None:
        continue
    out = out.replace("\n", " ")
    print(out)
    print()

Blessed with the name of them were the rear of the poor.

His argument, because you if you could do was to make is--and a worthy endeavour to her, should inhabitant when I think, to act in a holiday times, or from burglars, there was angry at the point in Aaron was waiting for.

She, on her aunt's shop, found that they gazed up and donations, and heard the wilderness.

There is another.

Not a harsh thought it was electronically precious to obtain from such unselfishness than flight as to how far the finger is more than a kingdom; but a large a particular state visit has had never did, and the country.

Esther Moss, without inconvenience.

Simply dressed his should be here the day I received the victory have been something which is the following more than himself in the Bank of England she awoke from the fair vales beneath its roof.

I promise to carry out my best workmen and which filled herself.

You would reach the son of his wife.

With Aaron expectancy of the child.

She trembled 

# **Generating with non-prose text**
Markovify assumes you're feeding it prose, i.e., a text file that can be parsed into sentences by separating on sentence-ending punctuation. But often you're not working with text like this. For example, let's generate some sonnets. First, download this plaintext version of Shakespeare's sonnets and keep it in the same directory as this notebook. We'll define the sonnet-generating task as consisting of (a) training a Markov chain on lines of poetry and then (b) generating a sequence of fourteen lines of poetry. Since the line is the unit now and not the sentence, we need to use Markovify's NewlineText class instead of Text:

In [34]:
sonnets_text = open("sonnets.txt").read()
sonnets_model = markovify.NewlineText(sonnets_text, state_size=1)

And then we can generate:

In [35]:
sonnets_model.make_sentence()

'To give physic to make,'

And now make a sonnet, sorta:

In [36]:
for i in range(14):
    print(sonnets_model.make_sentence())

And, thou the travail of life, and fell sick withal, the proudest sail to life, being dead;
Serving with eyes and think on all,
And threescore year would corrupt by this change my lover's life:
When swift foot back?
Why should nothing that I hallow'd thy self uprear,
And make those.
And heavily he so possess'd with outward praise that I have been before,
As, thou not at pleasure of seldom coming end you yourself may character,
With my soul's thought,
And in the blind fool, Love, what wealth she stores, to remove:
Which my rose,
Save that, to flow,
Since every where.
Nor think the day?
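
The difference between sentence-based and line-based training units can be seen with plain string operations. The sketch below is a rough illustration using three lines of Sonnet 18; the regular expression is a crude stand-in for prose sentence splitting and is not the rule Markovify uses:

```python
import re

poem = (
    "Shall I compare thee to a summer's day?\n"
    "Thou art more lovely and more temperate:\n"
    "Rough winds do shake the darling buds of May,"
)

# line-based units: every non-empty line is one training "sentence"
lines = [ln.strip() for ln in poem.split("\n") if ln.strip()]

# crude prose-style units: break after ., ! or ? followed by whitespace
sentences = re.split(r"(?<=[.!?])\s+", poem)

print(len(lines))      # 3
print(len(sentences))  # 2
```

Splitting on punctuation merges the second and third lines into one unit, so a model trained that way would not learn where lines of verse begin and end.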


Doing this with a character-level model is a bit more tricky. I've written code in the cell below that defines a new class, LinesByChar, that works like NewlineText but operates character-by-character instead of word-by-word:

In [37]:
class LinesByChar(markovify.NewlineText):
    def word_split(self, sentence):
        return list(sentence)
    def word_join(self, words):
        return "".join(words)

Now we can create a character model with the sonnets, line by line:

In [38]:
sonnets_char_model = LinesByChar(sonnets_text, state_size=4)

And generate new sonnets:

In [39]:
for i in range(14):
    print(sonnets_char_model.make_sentence())

For the perfumes of stage,
For I hast dost his sworn, thy child, those is, by thou should be.
Knowing best my nature of all his to gazers remain,
Happy your me upon dwell-seeing sights,
Her proud as pity of thy delight,
This they knight,
Since vouches go,
Beshrew acquainting Time do cries:
Praise is they loss,
Was it was of he lamb he trouble-vantage for tented face
When let not to thee view;
Before I sweet beauty tell;
O! less gone
Anothere onsecrate:


# **New moods**
Character-level Markov chains are especially suitable, in my experience, for generating shorter texts, like individual words or names. Let's generate names of new moods using this technique. First, download this JSON file of moods from the Corpora Project and save it to the same directory as this notebook.

Then load the JSON file and grab just the list of words naming moods:

In [42]:
import json
mood_data = json.loads(open("./moods.json").read())
moods = mood_data['moods']

The easiest way to use this is to make one big string with the moods joined together with newlines:

In [43]:
moods_text = "\n".join(moods)

Then use LinesByChar to make the model:

In [44]:
moods_char_model = LinesByChar(moods_text, state_size=3)

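And voilà, new moods (a None in the output means the model could not produce an acceptable result within its allotted tries):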

In [45]:
for i in range(24):
    print(moods_char_model.make_sentence())

impeless
introng
powerloadequashed
overwhelplenished
difficated
None
molenchallow
disheepy
sombenterrupted
revenate
suppointronious
talienthreased
horribald
talkatirictory
indescued
ridicalm
warnisconfit
neurottenewed
melancholistalistled
thrillenischievil
phonorant
rembarratenished
peacefulfish
guardly
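
If you want a clean list, you may wish to drop the None results and any outputs that are simply moods already in the source list. The sketch below shows one way to do that with a hypothetical helper, collect_new. To keep it runnable on its own it uses a stand-in generator; in the notebook you would pass moods_char_model.make_sentence and set(moods) instead.

```python
import random

def collect_new(make, known, n, max_attempts=1000):
    """Call `make` until n distinct results that are neither None nor in `known`."""
    found = []
    for _ in range(max_attempts):
        candidate = make()
        if candidate is None or candidate in known or candidate in found:
            continue
        found.append(candidate)
        if len(found) == n:
            break
    return found

# stand-in generator that sometimes fails and sometimes repeats a known mood
rng = random.Random(0)

def fake_make():
    return rng.choice([None, "happy", "sombenterrupted", "guardly"])

new_moods = collect_new(fake_make, known={"happy"}, n=2)
print(new_moods)
```

The max_attempts limit keeps the loop from running forever if the model cannot produce enough new words.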
