Democratic Primaries 2020 Analysis

Applying Topic Modeling to Analyze the 2020 Democratic Debates

Topic modeling is a form of dimensionality reduction that reveals latent topics in large collections of text. In this analysis, I use one topic-modeling method, Nonnegative Matrix Factorization (NMF), to see what each of the 2020 Democratic presidential candidates focused on during the debates.

To learn more about topic modeling, see my two-page overview of the mathematics behind common topic-modeling methods, including SVD, NMF, and LDA: https://github.com/branden-ciranni/papers/blob/main/Mathematics_of_Topic_Modeling.pdf

This repository contains two notebooks:

  • Scraping Debate Transcripts.ipynb: Data Scraping, Cleaning, and Transformation
  • eda_topic_modeling_where_candidates_focus.ipynb: EDA, Text Preprocessing, Topic Modeling
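
As a rough illustration of how topic weights could be turned into a per-candidate view of focus, the sketch below averages document-topic weights by speaker and normalizes each row into shares. It assumes pandas is installed. The column names (`speaker`, `topic_0`, `topic_1`), the placeholder speakers, and the numbers are all hypothetical and are not taken from the notebooks or the dataset.

```python
import pandas as pd

# Hypothetical document-topic weights, one row per speech segment.
# Speakers and values are placeholders, not real debate results.
weights = pd.DataFrame(
    {
        "speaker": ["Candidate A", "Candidate A", "Candidate B", "Candidate B"],
        "topic_0": [0.8, 0.6, 0.1, 0.3],
        "topic_1": [0.2, 0.4, 0.9, 0.5],
    }
)
topic_cols = ["topic_0", "topic_1"]

# Average topic weight per speaker.
per_speaker = weights.groupby("speaker")[topic_cols].mean()

# Normalize each row so a speaker's topic shares sum to 1.
shares = per_speaker.div(per_speaker.sum(axis=1), axis=0)

# The topic with the largest share for each speaker.
dominant = shares.idxmax(axis=1)
print(shares.round(3))
print(dominant)
```

Normalizing by row makes speakers comparable even when one of them spoke far more than another.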

The debate transcript dataset is available on Kaggle: https://www.kaggle.com/brandenciranni/democratic-debate-transcripts-2020