Topic Modeling Analysis on the Democratic Debates & Data Cleanup on Scraped data

branden-ciranni/Topic-Modeling-Debates

Democratic Primaries 2020 Analysis

Applying Topic Modeling to Analyze the 2020 Democratic Debates

Topic Modeling is a form of dimensionality reduction that reveals latent topics in large collections of text. In this analysis, I use one Topic Modeling method, Nonnegative Matrix Factorization (NMF), to see what each of the 2020 Democratic presidential candidates focused on during the debates.
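As a rough illustration of the idea, NMF approximates a nonnegative document-term matrix V as the product W H, where W holds per-document topic weights and H holds per-topic term weights. The sketch below is not the code from the notebooks: the toy documents, the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), and the choice of plain multiplicative updates in pure Python are all assumptions made to keep the example dependency-free. A real analysis would more likely use a library implementation and TF-IDF weighting.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)
    using multiplicative updates; all entries stay nonnegative."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [
            [h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
            for k in range(n_topics)
        ]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [
            [w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
            for i in range(n_docs)
        ]
    return w, h


# Hypothetical toy "speeches"; the real data comes from the debate transcripts.
docs = [
    "healthcare insurance medicare healthcare",
    "medicare insurance costs healthcare",
    "climate carbon energy climate",
    "energy climate carbon jobs",
]
vocab = sorted({term for doc in docs for term in doc.split()})
# Raw term counts per document (rows = documents, columns = vocabulary terms).
v = [[doc.split().count(term) for term in vocab] for doc in docs]

w, h = nmf(v, n_topics=2)
for k, row in enumerate(h):
    top_terms = [term for _, term in sorted(zip(row, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_terms}")
```

Each row of `w` can then be read as how strongly a document (or, after aggregation, a candidate) loads on each topic, and each row of `h` as the terms that characterize that topic.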

To learn more about Topic Modeling, see my two-page paper, which gives a brief mathematical explanation of Topic Modeling methods, including SVD, NMF, and LDA: https://github.com/branden-ciranni/papers/blob/main/Mathematics_of_Topic_Modeling.pdf

This repository contains two notebooks:

  • Scraping Debate Transcripts.ipynb: Data Scraping, Cleaning, and Transformation
  • eda_topic_modeling_where_candidates_focus.ipynb: EDA, Text Preprocessing, Topic Modeling

The debate transcript dataset is available on Kaggle: https://www.kaggle.com/brandenciranni/democratic-debate-transcripts-2020
