In this project, I will use natural language processing techniques to explore a dataset of tweets from members of the 116th United States Congress, which met from January 3, 2019 to January 3, 2021. The dataset has already been cleaned and includes information about each legislator. Concretely, I will do the following:
- Preprocess the text of legislators’ tweets
- Conduct exploratory data analysis (EDA) of the text
- Use sentiment analysis to explore differences between legislators’ tweets
- Featurize text with manual feature engineering, frequency-based techniques, and vector-based techniques
- Predict legislators’ political parties and whether they are a Senator or Representative
- Explore whether asymmetric polarization shows up in how politicians communicate with their constituents through tweets
- Explore whether Senators' tweets support the theory that the Senate is more moderate
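
As a rough illustration of the first and fourth steps above, the sketch below preprocesses a single tweet and turns it into a frequency-based (bag-of-words) feature. It is a minimal example under stated assumptions, not the pipeline used in this project: the function names `preprocess` and `bag_of_words`, the tiny stopword set, and the example tweet are all hypothetical and are not drawn from the dataset.

```python
import re
from collections import Counter

# Patterns for content that is usually stripped from tweets before analysis.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# A word is a run of lowercase letters, optionally with one internal apostrophe.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "for", "on", "is", "we"}


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return non-stopword tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Frequency-based featurization: map each token to its count."""
    return Counter(tokens)


# Made-up example tweet, not taken from the dataset.
example = (
    "Proud to vote for the infrastructure bill today! "
    "https://example.com @SenateFloor #Infrastructure"
)
tokens = preprocess(example)
features = bag_of_words(tokens)
print(tokens)
print(features.most_common(3))
```

Note that the hashtag symbol is dropped but its word is kept, so `#Infrastructure` counts toward the same token as `infrastructure`. Whether hashtags should be merged with ordinary words this way is a design choice that depends on the analysis.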