@realDonaldTrump's top mentions

I wanted to run a meaningful analysis of @realDonaldTrump's frequently used words when he was tweeting as presidential candidate, president-elect and president.

Here's how I did it. (And here's what I found out.)

Scraping and cleaning tweets

Got Twitter OAuth access via apps.twitter.com.
Ran a scraping script in R using stringr, twitteR, etc. to collect tweets for further analysis. Here's the script I used.
After running the script with two separate packages and Twitter API keys, I found that the package/Twitter API failed to return the complete collection of tweets I wanted: it fell short of 3,200 tweets, and it also left out numerous tweets from certain time periods (for example, tweets between 29 September and 5 October 2016 were completely missing).
I then ran a Python script forked from bpb27 with a separate Twitter API key, to see if my scrape would be better. All the scripts I used can be found in my fork of bpb27's repo, and I've pasted the scripts from my scrape in part 1 of this repo.
Oddly enough, tweets were missing from this scrape too (those between 11 March and 15 March 2017, for instance, were absent entirely). At this point I started to think it was my Twitter API keys, rather than the scripts, acting up. Sad!
To resolve this issue, I merged the data from the R and Python scrapes and removed the duplicates to get the final collection. Because the tweets contained many different special characters, I had to go through every tweet again by hand to find the duplicates.
I divided the final dataset into three time frames: tweets as candidate, from 6 October 2016 to 8 November 2016; as president-elect, from 9 November 2016 to 20 January 2017; and as president, from 20 January 2017 to 15 March 2017. This gave me 160 days' worth of tweets to examine. The csv files are in part 2 of this repo, here, here and here.
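The merge, de-duplication and time-frame steps above can be sketched in Python. This is only an illustration under stated assumptions, not the script used for this repo: each scrape is assumed to be a list of `(date, text)` tuples, the function names (`normalise`, `merge_scrapes`, `find_gaps`, `split_by_period`) are hypothetical, duplicates are assumed to share the same date, and 20 January 2017 is assigned to the "president" bucket because the text lists that date in both ranges.

```python
import html
import unicodedata
from datetime import date, timedelta


def normalise(text):
    """Build a comparison key: unescape HTML entities, apply Unicode NFKC,
    lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", html.unescape(text))
    return " ".join(text.lower().split())


def merge_scrapes(*scrapes):
    """Merge lists of (date, text) tuples, keeping the first copy of each tweet.
    Assumption: two copies of the same tweet carry the same date."""
    seen = set()
    merged = []
    for scrape in scrapes:
        for day, text in scrape:
            key = (day, normalise(text))
            if key not in seen:
                seen.add(key)
                merged.append((day, text))
    return sorted(merged, key=lambda row: row[0])


def find_gaps(tweets, max_gap_days=2):
    """Return (first_missing_day, last_missing_day) pairs for every run of
    days without tweets where consecutive tweet dates are more than
    max_gap_days apart."""
    days = sorted({day for day, _ in tweets})
    one_day = timedelta(days=1)
    return [
        (earlier + one_day, later - one_day)
        for earlier, later in zip(days, days[1:])
        if (later - earlier).days > max_gap_days
    ]


# Time frames from the text; 20 January 2017 is counted as "president" here.
PERIODS = [
    ("candidate", date(2016, 10, 6), date(2016, 11, 8)),
    ("president-elect", date(2016, 11, 9), date(2017, 1, 19)),
    ("president", date(2017, 1, 20), date(2017, 3, 15)),
]


def split_by_period(tweets):
    """Sort (date, text) tuples into the three time frames; tweets outside
    every range are dropped."""
    buckets = {name: [] for name, _, _ in PERIODS}
    for day, text in tweets:
        for name, start, end in PERIODS:
            if start <= day <= end:
                buckets[name].append((day, text))
                break
    return buckets
```

Running `find_gaps` on each scrape before merging is one way to spot missing stretches such as the 29 September to 5 October 2016 gap without reading through the tweets by eye. Normalising the text catches many special-character mismatches, but probably not all of them, so a manual pass may still be needed.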

Examining data

Ran text mining and wordcloud programmes on the scraped tweets, now sorted into separate csv files according to the time frame. I found eight2late's tutorial the clearest and the most useful guide.
The scripts are in part 3 of this repo, or here, here and here.
Downloaded data as csv files for further cleaning and analysis. The cleaned files are in part 4 of this repo, or here: top words as Candidate, as President-elect and as President.
Created a separate Excel file highlighting mentions of the media (e.g. "news", "fake", "nytimes") during these three time periods. Here it is.
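For readers who don't use R, the word counts and media tallies described above can be approximated with a short Python sketch. It is not a port of the text mining scripts in part 3: the stop-word list is a tiny illustrative assumption, the function names are hypothetical, and the only media terms included are the three examples named above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {
    "the", "a", "an", "and", "to", "of", "in", "is", "for", "on",
    "will", "be", "it", "that", "this", "with", "at", "are", "was",
}

# Example media terms taken from the text above.
MEDIA_TERMS = ["news", "fake", "nytimes"]

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def tokenise(text):
    """Lowercase, drop URLs, and split into word tokens.
    '@' and '#' are not token characters, so '@nytimes' becomes 'nytimes'."""
    text = URL_RE.sub(" ", text.lower())
    tokens = (tok.strip("'") for tok in TOKEN_RE.findall(text))
    return [tok for tok in tokens if tok]


def top_words(tweets, n=10):
    """Return the n most frequent non-stop-words across a list of tweet texts."""
    counts = Counter(
        tok
        for text in tweets
        for tok in tokenise(text)
        if tok not in STOP_WORDS and len(tok) > 1
    )
    return counts.most_common(n)


def media_mentions(tweets, terms=MEDIA_TERMS):
    """Count how often each media term appears across a list of tweet texts."""
    counts = Counter(tok for text in tweets for tok in tokenise(text))
    return {term: counts[term] for term in terms}
```

Calling `top_words` and `media_mentions` once per time frame gives one table of counts per period, which is the shape the csv and Excel files in part 4 take.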

Presenting data

Ran ggplot scripts to visualise top mentions per period. The scripts are in part 5 of this repo: here, here and here. The final jpegs of the graphs are here, here and here.
Created an interactive presentation of the graphs: script here.
Created an interactive line graph with JavaScript, following the Highcharts documentation, to visualise mentions of the media during each period: script here.
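To show how per-period counts could feed a line graph like this, here is a hedged Python sketch that builds a Highcharts-style options object and prints it as JSON. It is not the `media.js` script from this repo. The helper name `build_line_chart_options` and the chart title are assumptions, and the counts in the usage example are zero placeholders, not results.

```python
import json

# Period labels for the x-axis, following the three time frames in the text.
PERIODS = ["Candidate", "President-elect", "President"]


def build_line_chart_options(counts_by_term, periods=PERIODS):
    """Build a Highcharts-style options dict for a line chart.

    counts_by_term maps each term to a list of counts, one per period.
    """
    for term, counts in counts_by_term.items():
        if len(counts) != len(periods):
            raise ValueError(
                f"{term!r} needs {len(periods)} counts, got {len(counts)}"
            )
    return {
        "chart": {"type": "line"},
        "title": {"text": "Media mentions by period"},  # hypothetical title
        "xAxis": {"categories": list(periods)},
        "yAxis": {"title": {"text": "Mentions"}},
        "series": [
            {"name": term, "data": list(counts)}
            for term, counts in counts_by_term.items()
        ],
    }


if __name__ == "__main__":
    # Placeholder zeros only; substitute the real counts from part 4.
    placeholder = {"news": [0, 0, 0], "fake": [0, 0, 0], "nytimes": [0, 0, 0]}
    print(json.dumps(build_line_chart_options(placeholder), indent=2))
```

The printed JSON could then be passed as the options argument to `Highcharts.chart` on the page, with one series per media term and one point per period.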

The finishing touches

Files in this repo:

1. pythonScrapingScript
1. twitterScraper.R
2. raw_tweetsAsCandidate.csv
2. raw_tweetsAsPeotus.csv
2. raw_tweetsAsPotus.csv
3. tm_tweetsAsCandidate.R
3. tm_tweetsAsPeotus.R
3. tm_tweetsAsPotus.R
4. media mentions.xlsx
4. wordsAsCandidate.csv
4. wordsAsPeotus.csv
4. wordsAsPotus.csv
5. plot_wordsAsCandidate.R
5. plot_wordsAsPeotus.R
5. plot_wordsAsPotus.R
6. media.js
6. topMentions.js
7. index.html
7. style.css
README.md
jquery-3.1.1.min.js
wordsAsCandidate_Rplot.jpeg
wordsAsPeotus_Rplot.jpeg
wordsAsPotus_Rplot.jpeg

alexandrama/scraping-twitter-for-mentions
