Topic analysis of US Presidents' State of the Union Addresses and Messages

The text is scraped from: The American Presidency Project

Inspiration for this project: Topic Modeling the State of the Union: TV and Partisanship

US Presidents' State of the Union Addresses and Messages

State of the Union messages to Congress are mandated by the US Constitution. In modern times the message is delivered orally to a joint session of Congress, but in earlier periods the State of the Union was a written report sent to Congress to coincide with a new session of Congress.

In the texts considered here, Nixon submitted multiple documents or gave both oral and written messages. Roosevelt's last (1945) and Eisenhower's 4th (1956) were technically written messages, although both presidents also addressed the American people via radio.


The text is scraped using urllib2 and BeautifulSoup from this site: The American Presidency Project

To avoid having to scrape the site too often, the scraped texts are stored in documents_raw.pkl using Pickle.

See Scrape.ipynb for the code doing the scraping.
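
The caching idea can be sketched roughly as follows. This is a minimal illustration, not the code from Scrape.ipynb: the helper name `load_or_scrape` and the `scrape` callable are hypothetical, and only the file name documents_raw.pkl comes from this README.

```python
import os
import pickle

CACHE_PATH = "documents_raw.pkl"  # file name taken from this README


def load_or_scrape(scrape, cache_path=CACHE_PATH):
    """Return cached documents if the pickle exists, otherwise scrape and cache.

    `scrape` is a hypothetical zero-argument callable that returns the
    scraped texts (for example a dict or a list of strings).
    """
    if os.path.exists(cache_path):
        # Cache hit: read the stored texts instead of contacting the site.
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)

    # Cache miss: scrape once and store the result for later runs.
    documents = scrape()
    with open(cache_path, "wb") as fh:
        pickle.dump(documents, fh)
    return documents
```

With a helper like this, the site is only contacted when the pickle file is missing; deleting the file forces a fresh scrape.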

Topic analysis

The text is imported from documents_raw.pkl and preprocessed. Preprocessing includes removing non-unicode characters, dropping words that both start and end with a non-letter character ("1st" is kept, "123" is not), removing punctuation and stop words ("and", "won't"), and lemmatization.
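
The filtering rules above can be illustrated with a small standard-library sketch. It is not the notebook's code: the stop-word set is a tiny hypothetical sample, dropping non-ASCII characters is only an assumed stand-in for the character cleanup step, and lemmatization is left out because it needs NLTK data.

```python
import string

# Tiny illustrative stop-word list (an assumption, not the notebook's list).
STOP_WORDS = {"and", "the", "of", "to", "a", "in", "won't"}


def keep_token(token):
    """Keep a token unless it both starts and ends with a non-letter."""
    return token[0].isalpha() or token[-1].isalpha()


def preprocess(text):
    """Lowercase, strip punctuation, and filter stop words and number-like tokens."""
    # Assumed stand-in for the character cleanup step: drop non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(string.punctuation)  # remove surrounding punctuation
        if not token or token in STOP_WORDS:
            continue
        if not keep_token(token):
            continue  # e.g. "123" is dropped, "1st" is kept
        tokens.append(token)
    return tokens


tokens = preprocess("The 1st Congress and 123 won't meet.")
```

In this sketch "1st" survives because it ends with a letter, while "123" is dropped because it neither starts nor ends with one.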

After that, Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) are applied. The topics and the analysis are plotted using pyLDAvis and WordCloud.
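
Common LDA and NMF implementations take a document-term matrix as input rather than raw text. The sketch below builds such a count matrix with the standard library only, to show the shape of the data; the function name `document_term_matrix` is hypothetical, and the notebook may well use a library vectorizer (and possibly TF-IDF weights for NMF) instead.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i.

    `docs` is a list of token lists, such as the output of a preprocessing step.
    """
    vocabulary = sorted({token for doc in docs for token in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = [["union", "congress", "union"], ["economy", "congress"]]
vocabulary, matrix = document_term_matrix(docs)
```

Each row describes one address and each column one term; a topic model then factors or explains this matrix in terms of a small number of topics.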

See Presidentspeech.ipynb


Install the dependencies inside a virtual environment, or pass the --user flag to pip.

pip3 install -r requirements.txt

Also download NLTK data with a command similar to the following (more details on

python -m nltk.downloader -d /usr/local/share/nltk_data all