In [1]:
%reload_ext nb_black

<IPython.core.display.Javascript object>

# Text Summarization and Natural Language Generation Assignment

In [2]:
import re
import requests
from bs4 import BeautifulSoup
import markovify
import nltk
from nltk import pos_tag
from nltk import sent_tokenize
from gensim.summarization import summarize

<IPython.core.display.Javascript object>

### Scrape and clean the text from the 3 Presidential State of the Union Address URLs below and save them into a list.

In [3]:
lincoln = "https://en.wikisource.org/wiki/Abraham_Lincoln%27s_First_State_of_the_Union_Address"
roosevelt = "https://en.wikisource.org/wiki/Theodore_Roosevelt%27s_First_State_of_the_Union_Address"
obama = (
    "https://en.wikisource.org/wiki/Barack_Obama%27s_Second_State_of_the_Union_Address"
)
urls = {"lincoln": lincoln, "roosevelt": roosevelt, "obama": obama}

<IPython.core.display.Javascript object>

In [4]:
full_texts = {}

for pres, url in urls.items():
    response = requests.get(url)
    content = response.text
    soup = BeautifulSoup(content)
    text = [
        p.text.replace("\n", "").strip()
        for p in soup.find("div", class_="mw-parser-output").find_all("p")
    ]

    full_texts[pres] = " ".join(text)

<IPython.core.display.Javascript object>

### For each State of the Union Address, use the Gensim `summarize` function and print a summary of each address approximately 200 words long.

In [5]:
for pres, speech in full_texts.items():
    print(pres.title() + ":")
    print(summarize(speech, word_count=200))
    print("-------------------------------------")

Lincoln:
But the powers of Congress, I suppose, are equal to the anomalous occasion, and therefore I refer the whole matter to Congress, with the hope that a plan may be devised for the administration of justice in all such parts of the insurgent States and Territories as may be under the control of this Government, whether by a voluntary return to allegiance and order or by the power of our arms; this, however, not to be a permanent institution, but a temporary substitute, and to cease as soon as the ordinay courts can be reestablished in peace.
In such case I recommend that Congress provide for accepting such persons from such States, according to some mode of valuation, in lieu, pro tanto, of direct taxes, or upon some other plan to be agreed on with such States respectively; that such persons, on such acceptance by the General Government, be at once deemed free, and that in any event steps be taken for colonizing both classes (or the one first mentioned if the other shall not be br

<IPython.core.display.Javascript object>

### Sentence tokenize each address and save the tokenized sentences to a separate list.

In [6]:
speech_sents = {}
for pres, speech in full_texts.items():
    speech_sents[pres] = sent_tokenize(speech)

<IPython.core.display.Javascript object>

### Train a Markov chain model for each tokenized address and generate 5 sentences based on the language used for each one.

In [7]:
for pres, sents in speech_sents.items():
    model = markovify.Text(sents, state_size=3)
    sentence_list = []
    print(pres.title() + ":")
    while len(sentence_list) <= 5:
        sentence = model.make_short_sentence(max_chars=200, min_chars=30, tries=100)
        if sentence not in sentence_list:
            sentence_list.append(sentence)
            print(sentence, end="\n")
    print("---------------------")

Lincoln:
Much of the national loan has been taken by citizens of the United States steamer Massachusetts for a supposed breach of the blockade.
Two mates of vessels engaged in the practical administration of them.
I repeat the recommendation of the Secretary of the Navy by introducing additional grades in the service.
Again, as has already been said, there is not of necessity any such thing as a free man being fixed for life in the condition of the entire Army.
I submit, therefore, for your consideration the expediency of regaining that part of the District of Columbia.
In those documents we find the abridgment of the existing war, have already been made.
---------------------
Roosevelt:
In all industries carried on directly or indirectly for the United States have been committed to the Smithsonian Institution.
For the sake of the welfare of any other portion of our country.
The standard of living on our ships is far superior to the standard of living is also higher than ever before.
N

<IPython.core.display.Javascript object>

### Add part of speech tags to the Markov chain model and regenerate 5 sentences for each address.

In [8]:
class POSifiedText(markovify.Text):
    def word_split(self, sentence):
        words = re.split(self.word_split_pattern, sentence)
        words = ["::".join(tag) for tag in nltk.pos_tag(words)]
        return words

    def word_join(self, words):
        sentence = " ".join(word.split("::")[0] for word in words)
        return sentence

<IPython.core.display.Javascript object>

In [9]:
for pres, sents in speech_sents.items():
    model = POSifiedText(sents, state_size=3)
    sentence_list = []
    print(pres.title() + ":")
    while len(sentence_list) <= 5:
        sentence = model.make_short_sentence(max_chars=200, min_chars=30, tries=100)
        if sentence not in sentence_list:
            sentence_list.append(sentence)
            print(sentence, end="\n")
    print("---------------------")

Lincoln:
I ask attention to the views of the Secretary of the Navy presents in detail the operations of the Patent and General Land Offices.
In the exercise of my best discretion I have adhered to the act of Congress to confiscate property used for insurrectionary purposes.
If useful, no State should be denied them; if not useful, no State should be denied them; if not useful, no State should be denied them; if not useful, no State should have them.
In most of the Southern States a majority of the whole people of all right to participate in the selection of the country as well as with regiments.
I commend their interests and their duties.
Much of the national loan has been taken by citizens of the United States steamer Massachusetts for a supposed breach of the blockade.
---------------------
Roosevelt:
The anarchist, and especially the anarchist in the United States, the product of this period.
The Congress should provide means whereby it will be covered by this kind of service.
But f

<IPython.core.display.Javascript object>