Article link: https://towardsdatascience.com/analyzing-justin-trudeaus-speeches-3ba2690ad57a
Canada will be entering the election season soon, with the federal election scheduled for October 21, 2019. In many ways, this election will be an interesting event. Between the rise of populism across the world and refugee crises, Prime Minister Justin Trudeau has had an extremely difficult term. The election will give Canadian citizens the chance to voice their concerns over his policies.
Usually, citizens listen to candidates' debates and campaign speeches, and occasionally dive into party platforms. But I propose another way of judging candidates, especially incumbents: their official speeches, which more often than not represent the government's agenda. I was inspired to analyze Prime Minister Trudeau's speeches when I heard of people examining President Trump's tweets. Speeches seemed like a great way to track a politician's sentiment over time, especially in the lead-up to an election.
The general structure of the project was as follows:
- Find a way to scrape speeches from Prime Minister Trudeau's website (https://pm.gc.ca/en/news/speeches)
- Store the speeches in a database (MongoDB)
- Analyze the speeches' sentiment
- Analyze and predict speech topics from speech transcripts
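The first two steps, scraping and storage, can be sketched as follows. This is a minimal illustration, not the code in `src/crawler_ajax.py`: it assumes the AJAX endpoint returns JSON shaped like `{"items": [{"title", "date", "url", "body"}]}` with HTML in `body`, and those field names are hypothetical. The `collection` argument is expected to behave like a pymongo `Collection`; only `update_one(filter, update, upsert=True)` is used, keyed on the speech URL so that re-running the crawler does not create duplicates.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, discarding the tags."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; are decoded.
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags, joining text nodes with spaces and collapsing whitespace."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()
    return " ".join(" ".join(extractor.parts).split())


def parse_speeches(payload_text):
    """Turn a JSON payload into speech records.

    The payload layout (an "items" list with title/date/url/body keys)
    is a hypothetical assumption, not the real pm.gc.ca response format.
    """
    payload = json.loads(payload_text)
    speeches = []
    for item in payload.get("items", []):
        speeches.append({
            "title": item["title"].strip(),
            "date": item["date"],
            "url": item["url"],
            "text": html_to_text(item["body"]),
        })
    return speeches


def store_speeches(collection, speeches):
    """Upsert each speech keyed on its URL; returns how many were written.

    `collection` is assumed to be a pymongo Collection, or any object
    with a compatible update_one(filter, update, upsert=True) method.
    """
    for speech in speeches:
        collection.update_one({"url": speech["url"]}, {"$set": speech}, upsert=True)
    return len(speeches)
```

With a local `mongod` running, a collection could be obtained with pymongo as `MongoClient()["speeches"]["speeches"]`; the database and collection names there are illustrative.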
I am proud that I was able to accomplish each of these steps and learn so many new techniques and technologies. If you would like to experiment with this on your own, please follow these instructions:
- Navigate to your local directory and `git clone` this repo
- Navigate to the project repo using your CLI and type the following commands:
  - `source env/bin/activate`: activates the virtual environment that hosts all modules
  - `mongod`: starts the MongoDB server that stores the speeches
- Run the scraping script by typing `python src/crawler_ajax.py`. Note: AJAX requests were used. Selenium was the initial choice, but it was hard to implement. The code for my initial work can be found in `src/crawler_selenium.py`
- Clean the speeches by running `python src/speech_clean.py`
- Process the speeches for natural language processing by running `python src/speech_process.py`
- Analyze speech sentiments by running `python src/sentiment_analysis.py`
- Find and predict speech topics by running `python src/topic_modelling.py`. Please follow the CLI instructions!
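The cleaning and processing scripts are not reproduced here. As a rough illustration of what such a pass typically does, the sketch below lowercases the text, normalizes curly apostrophes, tokenizes, and drops stopwords and very short words. It uses only the standard library and a tiny hand-picked stopword list, which is an assumption; with NLTK available, its English stopword corpus would be the usual, much larger choice.

```python
import re

# Small illustrative stopword list (hypothetical); NLTK's English
# stopword corpus would normally replace it.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "have", "i", "in", "is", "it", "of", "on", "our", "that", "the", "this",
    "to", "we", "will", "with", "you",
}

# Words made of letters, optionally with one apostrophe part (e.g. "canada's").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean_text(text):
    """Lowercase and turn curly apostrophes into plain ASCII ones."""
    return text.lower().replace("\u2019", "'")


def tokenize(text, min_length=3):
    """Return word tokens with stopwords and words shorter than min_length removed."""
    tokens = TOKEN_RE.findall(clean_text(text))
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_length]
```

Numbers and punctuation are dropped simply because the token pattern only matches letters.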
Note: visualizations with accompanying analysis can be found in `src/Visualizations.ipynb`
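The sentiment step (`python src/sentiment_analysis.py`) is not shown in detail. NLTK ships the lexicon-based VADER analyser (`nltk.sentiment.vader.SentimentIntensityAnalyzer`, whose `polarity_scores` returns `neg`, `neu`, `pos` and `compound` values), but whether the script uses it is not stated here. To show the idea without any downloads, the sketch below is a simplified lexicon-based scorer: the word list and its values are made up, negation only looks at the immediately preceding word, and the final squashing has the same form as VADER's compound normalization (`x / sqrt(x^2 + alpha)`) with a smaller `alpha` to suit the [-1, 1] lexicon.

```python
import math
import re

# Toy polarity lexicon (hypothetical values in [-1, 1]); real analysers
# such as VADER rely on thousands of human-rated entries.
LEXICON = {
    "proud": 0.8, "strong": 0.6, "hope": 0.5, "progress": 0.6,
    "welcome": 0.5, "grateful": 0.7, "tragedy": -0.9, "crisis": -0.7,
    "loss": -0.6, "fear": -0.6, "hate": -0.9, "unacceptable": -0.8,
}
NEGATORS = {"not", "no", "never"}
WORD_RE = re.compile(r"[a-z]+")


def sentiment_score(text, alpha=1.0):
    """Score text in (-1, 1): sum word polarities, then squash.

    A lexicon word directly preceded by a negator has its polarity flipped.
    """
    words = WORD_RE.findall(text.lower())
    total = 0.0
    for i, word in enumerate(words):
        polarity = LEXICON.get(word, 0.0)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    # Same form as VADER's normalization; alpha > 0 keeps the result in (-1, 1).
    return total / math.sqrt(total * total + alpha)
```

A speech-level score could then be the score of the full transcript, or an average over sentences if finer detail is wanted.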
- Languages: Python
- Techniques learned: natural language processing, topic modelling via latent Dirichlet allocation (LDA) models, sentiment analysis, web scraping via AJAX requests and Selenium, database storage
- Frameworks and tools: Selenium, MongoDB, NLTK
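To make the LDA technique listed above concrete, here is a small standalone implementation using collapsed Gibbs sampling. It is for illustration only and is not the code in `src/topic_modelling.py`; in practice a library implementation would normally be used. The hyperparameters `alpha` and `beta`, the iteration count and the seed are arbitrary choices, and the input is assumed to be already tokenized.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): one word->probability dict per topic,
    and one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of the topic-word and document-topic distributions.
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """List the n most probable words for each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]
```

"Predicting" the topic of a speech then amounts to reading off the largest entry of its `doc_topic` row.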
Future improvements:
- Refactor the project so that all scripts run with one command
- Scrape more speeches and potentially predict sentiment scores over time
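For the last improvement above, one simple starting point is to average speech sentiment scores per month and fit an ordinary least-squares trend line. Extrapolating that line is a crude baseline, not a real forecast. The `(date, score)` input format is an assumption about how scored speeches might be stored.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Tuple


def monthly_means(
    scored_speeches: Iterable[Tuple[date, float]],
) -> List[Tuple[Tuple[int, int], float]]:
    """Average (date, score) pairs by calendar month, sorted chronologically."""
    buckets = defaultdict(list)
    for day, score in scored_speeches:
        buckets[(day.year, day.month)].append(score)
    return [(month, sum(s) / len(s)) for month, s in sorted(buckets.items())]


def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x with x = 0, 1, 2, ...; returns (a, b)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    b = sxy / sxx
    return mean_y - b * mean_x, b
```

Given `n` months of averages, the naive next-month estimate is `a + b * n`. Months without speeches are simply skipped here, which a more careful version would need to handle.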