This R data package loads a concatenated dataframe composed of two-party
US Presidential debate transcripts from 1960 to 2020. Each debate
transcript is formatted with one row per utterance, and each row is marked
by a distinct document_id (debate and year), candidate (e.g., REAGAN), and
turn number.
Each utterance is also marked with associated metadata regarding the
candidate (e.g., name, party), the election outcome (e.g., party_winner),
and various other statistics about the US in that debate year, including:
1) Democrat candidate age
2) Republican candidate age
3) Age difference between candidates
4) Inflation rate in the year of the debate
5) GDP % change from the year prior to the debate
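To make the layout concrete, here is a minimal sketch of a data frame in the same one-row-per-utterance shape. Only document_id, candidate, and party_winner are names taken from the description above; the turn and text column names and every value are placeholders invented for illustration, not real transcript data. Check names(debates) for the actual columns.

```r
# Toy data frame mimicking the described layout.
# All values are placeholders; `turn` and `text` are assumed column names.
toy_debate <- data.frame(
  document_id  = rep("debate_A", 4),   # hypothetical debate ID
  candidate    = c("SPEAKER_1", "SPEAKER_2", "SPEAKER_1", "SPEAKER_2"),
  turn         = 1:4,                  # assumed name for the turn counter
  text         = c("first utterance", "second utterance",
                   "third utterance", "fourth utterance"),
  party_winner = rep("PARTY_X", 4),    # debate-level metadata repeats on every row
  stringsAsFactors = FALSE
)

# All utterances by one speaker, in turn order
speaker_1 <- toy_debate[toy_debate$candidate == "SPEAKER_1", ]
speaker_1 <- speaker_1[order(speaker_1$turn), ]

# Number of turns per speaker
turns_per_speaker <- table(toy_debate$candidate)

print(speaker_1$text)
print(turns_per_speaker)
```

Because debate-level metadata is repeated on every row, a plain row subset like the one above carries that metadata along with the utterances.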
We edited each raw transcript to omit commentator questions and remarks
(e.g., “What is the most important issue facing the economy?”). We also
omitted parenthetical transcription notes (e.g., […coughing]).
We initially compiled debate transcripts from the Commission on
Presidential Debates (debates.org). Because each of the debates we
included was widely televised and publicly available, the data are
considered fair use (https://www.copyright.gov/fair-use/). That is,
their intended use is for academic/scholarly research.
Researchers are free to make use of these data for any purpose. We
optimized the format to work well with our companion R package
(ConversationAlign) designed to analyze alignment between two
interlocutors.
Install the development version of USAPresidentialDebates from GitHub by typing the following in your console or script (make sure you have devtools installed):
install.packages("devtools")
devtools::install_github("Reilly-ConceptsCognitionLab/USAPresidentialDebates")
Loading the data could not be simpler: call the library, then type the word debates.
library(USAPresidentialDebates)
str(debates)
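Beyond str(), a quick per-debate row count is a useful first check. The sketch below is not part of the package: count_utterances is a hypothetical helper written here for illustration, and it assumes the ID column is literally named document_id. The package-specific part only runs if USAPresidentialDebates is installed and that column exists.

```r
# Hypothetical helper (not part of the package): count rows per debate
# in any data frame that has an ID column.
count_utterances <- function(df, id_col = "document_id") {
  stopifnot(is.data.frame(df), id_col %in% names(df))
  counts <- table(df[[id_col]])
  data.frame(
    document_id  = names(counts),
    n_utterances = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Only touch the real data if the package is installed.
if (requireNamespace("USAPresidentialDebates", quietly = TRUE)) {
  library(USAPresidentialDebates)
  print(names(debates))  # actual column names
  print(dim(debates))    # rows (utterances) by columns
  # Assumption: the ID column is named document_id; skip quietly if not.
  if ("document_id" %in% names(debates)) {
    print(head(count_utterances(debates)))
  }
}
```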
Contact jamie_reilly@temple.edu for feedback and assistance.