This repo contains the transcript from every State of the Union (SOTU) address, from George Washington's first address on Jan 8, 1790 through Donald Trump's address on Jan 30, 2018.
The dataset also contains transcripts of addresses that aren't technically SOTU addresses but are instead classified as an "Address Before a Joint Session of the Congress" (see an explanation here).
Transcripts through 2017 were scraped from the University of California, Santa Barbara's American Presidency Project.
The 2018 transcript was scraped from CNN.
Transcripts are exported as both CSV and JSON. The code used to scrape and clean the transcripts is included in the get_transcripts.R file.
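
For readers who want to work with the exports outside of R, here is a minimal Python sketch of loading either file. It is not part of this repo: the helper name `load_transcripts` is hypothetical, the export file names and column names are not assumed (pass whatever path the export has), and the JSON file is assumed to be an array of records. The field size limit is raised because a single transcript can be longer than the `csv` module's default limit of 131072 characters.

```python
import csv
import json
from pathlib import Path

# Long transcripts can exceed the csv module's default field size limit
# (131072 characters), so raise it to a generous fixed value.
csv.field_size_limit(10_000_000)


def load_transcripts(path):
    """Load records from a .csv or .json export as a list of dicts.

    Hypothetical helper for illustration; not part of the repo.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        # newline="" lets the csv module handle line breaks that occur
        # inside quoted transcript text.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the JSON export is an array of records.
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data

    raise ValueError(f"unsupported file type: {suffix!r}")
```

Calling `load_transcripts` with the path to either export returns one dict per row or record, keyed by whatever column names the export actually uses.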