This small project helps researchers working on SNS and learning analytics collect data more easily, specifically researchers who want to collect Twitter data by hashtag. There are two scenarios:
- Collect data from the past
- Collect data in the future
Authentication: first things first
- Install R (http://www.r-project.org). If you are on a PC, you may also want to install RStudio (http://www.rstudio.com).
- Create a new application on https://apps.twitter.com (you need a Twitter account first). Remember to set the Callback URL to http://127.0.0.1:1410 when creating the application.
- Save the consumer key and consumer secret for future use.
- Run the code in Authentication.R, replacing "xxxxx" with your own consumer key and consumer secret.
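As a rough illustration of the last step, the sketch below shows what an authentication call can look like. It assumes the twitteR package, which this README does not confirm, so Authentication.R may differ. The names `is_placeholder` and `authenticate` are hypothetical helpers, not functions from this project.

```r
# Sketch only: assumes the twitteR package; Authentication.R may differ.

# Hypothetical helper: TRUE if a credential is empty or still the "xxxxx" placeholder.
is_placeholder <- function(value) {
  !nzchar(value) || grepl("^x+$", value, ignore.case = TRUE)
}

# Hypothetical wrapper: stop early with a clear message instead of a failed OAuth request.
authenticate <- function(consumer_key, consumer_secret) {
  if (is_placeholder(consumer_key) || is_placeholder(consumer_secret)) {
    stop("Replace \"xxxxx\" with your own consumer key and consumer secret.")
  }
  # In an interactive session this opens a browser window for authorization;
  # the Callback URL set on apps.twitter.com (http://127.0.0.1:1410) receives the result.
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret)
}

# Usage (with the real values saved from apps.twitter.com):
# authenticate("your-consumer-key", "your-consumer-secret")
```

Checking for the placeholder first means a forgotten "xxxxx" produces a readable error rather than a failed request.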
Collect data from the past
The Twitter API limits how far back you can collect tweets by hashtag. If you need the full set of past tweets, consider the following steps instead.
- Search for the hashtag on Twitter, and scroll down until all tweets are loaded on that webpage.
- Save the webpage as an HTML file on your local drive, and put it in the working directory of R.
- Run the code in parseTwitterPageByHashtag.R
- In the Console, call the function named getData. Provide the required parameters: the hashtag, the file name of the HTML file you just saved, and the name of the output CSV file. For example: getData('#mayedit2000', 'tweets.html', 'cleanTotalTweets').
Note: If the code in parseTwitterPageByHashtag.R does not run or gives you errors, please try the code in parseTwitterPageByHashtag-version2.R. The steps are exactly the same as above.
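The steps above can be sketched as a short script. This is a minimal sketch: `check_inputs` is a hypothetical pre-flight helper that is not part of this project, and its rule that the hashtag starts with '#' is an assumption based on the examples in this README.

```r
# Hypothetical pre-flight check; not part of parseTwitterPageByHashtag.R.
# Returns a character vector of problems (empty if everything looks fine).
check_inputs <- function(hashtag, html_file) {
  problems <- character(0)
  if (!startsWith(hashtag, "#")) {
    # Assumption: hashtags are passed with the leading '#', as in the README examples.
    problems <- c(problems, "hashtag should start with '#'")
  }
  if (!file.exists(html_file)) {
    problems <- c(problems,
                  sprintf("'%s' not found in working directory %s", html_file, getwd()))
  }
  problems
}

problems <- check_inputs("#mayedit2000", "tweets.html")
if (length(problems) == 0) {
  source("parseTwitterPageByHashtag.R")  # running the script defines getData()
  getData("#mayedit2000", "tweets.html", "cleanTotalTweets")
} else {
  message(paste(problems, collapse = "\n"))
}
```

If the saved page is not in the working directory, the script prints the problem instead of calling getData.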
Collect data in the future
With some planning, it is easier to collect data by hashtag for an upcoming activity (e.g., a learning activity on Twitter). Please note that this method works only for small-scale studies. You may repeat the following steps once per week while the online activity goes on.
- Run the code in hashtagSearch.R
- In the Console, call the function named tweetCollect. Provide the required parameters: the hashtag you would like to collect data by, the number of tweets you would like to collect, and the name of the output CSV file. For example: tweetCollect('#edit2000', 300, 'tweets'). The call is the same on a PC.
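Because these steps are repeated weekly, consecutive runs may return some of the same tweets. The sketch below shows one way to combine the weekly files and drop repeats. It assumes each CSV has the same columns, including an `id` column that identifies the tweet; the columns actually written by tweetCollect are not documented here. `merge_weekly` and the file names are hypothetical.

```r
# Hypothetical helper; not part of hashtagSearch.R.
# Assumes every weekly CSV has the same columns, including a tweet id column.
merge_weekly <- function(files, id_col = "id") {
  # Read ids as text: tweet ids are too long to be stored exactly as numbers.
  weekly <- lapply(files, read.csv,
                   stringsAsFactors = FALSE,
                   colClasses = setNames("character", id_col))
  combined <- do.call(rbind, weekly)
  # Keep the first occurrence of each tweet id.
  combined[!duplicated(combined[[id_col]]), , drop = FALSE]
}

# Usage with hypothetical file names:
# all_tweets <- merge_weekly(c("tweets_week1.csv", "tweets_week2.csv"))
# write.csv(all_tweets, "tweets_all.csv", row.names = FALSE)
```

If your output files use a different identifier column, pass its name as `id_col`.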