Twitter Data Basics

Connect to Twitter’s API using Mike Kearney’s ‘rtweet’ package. Collect tweets via keywords, from a specific geographic location, or from a specific user’s timeline, then do some basic plotting to explore the data, including a geo-map and a word cloud. Important: this script assumes you have a Twitter developer account, which you can easily set up by following this tutorial.


# load libraries ----
library(rtweet)     # create_token(), search_tweets(), lookup_coords(), lat_lng(), save_as_csv()
library(readr)      # read_csv()
library(dplyr)      # %>% and anti_join()
library(tidytext)   # unnest_tokens() and stop_words
library(stringr)    # str_detect()
library(wordcloud)  # wordcloud()
# maps and SnowballC are called with :: below and only need to be installed

1 - Authenticate for Twitter API access

If you don’t know how to get these values, see this tutorial

# first establish authentication: store your API keys
# (these are fake example values; replace them with your own)
api_key <- "afYS4vbIlPAj096E60c4W1fiK"
api_secret_key <- "bI91kqnqFoNCrZFbsjAWHD4gJ91LQAhdCJXCj3yscfuULtNkuu"

# authenticate via web browser
token <- create_token(
  app = "YourApp",
  consumer_key = api_key,
  consumer_secret = api_secret_key)
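
A quick sanity check can confirm the token works before you start collecting. This is a minimal sketch; the query term is arbitrary.

# fetch a single tweet to confirm authentication succeeded
test <- search_tweets("rstats", n = 1, token = token)
nrow(test)  # should be 1 if the token is valid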

2 - Search tweets and save them

(Or jump to step 3 if you’ve downloaded the “trump_tweets.csv” data directly)

# search for 5000 tweets sent from the US mentioning Trump
tweets <- search_tweets("#trump", geocode = lookup_coords("usa"), n = 5000)

# you could also search a user's timeline
trump <- get_timelines("realdonaldtrump", n = 500)

# flatten list columns and save as csv
save_as_csv(tweets, "trump_tweets.csv")
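
A note on rate limits: the standard search endpoint returns at most about 18,000 tweets per 15-minute window. For larger pulls, search_tweets() can wait out resets and resume automatically via its retryonratelimit argument; the n below is just an illustration.

# for larger pulls, let rtweet sleep through rate-limit resets
big <- search_tweets("#trump", n = 50000, retryonratelimit = TRUE)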

3 - In what states are people tweeting about Trump?

# if you didn't collect your own tweets, load the file of Trump tweets
tweets <- read_csv("trump_tweets.csv")

# plot on a map
# create lat/lng variables using all available tweet and profile geo-location data
# note: geo-tagged data is often sparse, so you may not have many lat/lng values,
# depending on which users appear in your set
tweets <- lat_lng(tweets)
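
Because geo-tagged data is sparse, it is worth checking how many usable coordinates you actually have before plotting. A minimal sketch:

# count rows with usable coordinates; expect a small share of the total
sum(!is.na(tweets$lat) & !is.na(tweets$lng))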

# plot state boundaries
par(mar = c(0, 0, 0, 0))
maps::map("state", lwd = .25)

# plot lat and lng points onto the state map
with(tweets, points(lng, lat, pch = 20, cex = .75, col = rgb(0, .3, .7, .75)))
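
The heading asks which states people are tweeting from, and you can answer it directly by tallying points per state. This is a sketch using maps::map.where(), which returns the state polygon each coordinate pair falls in.

# tally tweets by state
geo <- tweets[!is.na(tweets$lat) & !is.na(tweets$lng), ]
geo$state <- maps::map.where("state", geo$lng, geo$lat)
head(sort(table(geo$state), decreasing = TRUE), 10)  # top ten states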

4 - Explore the most used words in the dataset - construct a word cloud

# remove URLs
tweets$text <- gsub("https\\S*", "", tweets$text)

# remove "@username" tags
tweets$text <- gsub("@\\w+", "", tweets$text)

# put data into tidy text format - note we use token = "tweets" for Twitter-specific text preprocessing
tweets_tokens <- tweets %>% 
  unnest_tokens(word, text, token = "tweets") %>% 
  # remove numbers
  filter(!str_detect(word, "^[0-9]*$")) %>%
  # remove stop words
  anti_join(stop_words) %>%
  # stem the words
  mutate(word = SnowballC::wordStem(word))

wordcloud(tweets_tokens$word, min.freq = 200)

You can take out commonly occurring expressions that aren’t of interest to make a cleaner word cloud, using something like this:

tweets_tokens_trim <- tweets_tokens %>% filter(word != 'amp')

wordcloud(tweets_tokens_trim$word, min.freq = 200)
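
For more control over the cloud, you can pass explicit frequencies and cap the word count. A sketch, assuming the RColorBrewer package is installed for the palette:

# compute word frequencies and draw a capped, colored cloud
word_freqs <- sort(table(tweets_tokens_trim$word), decreasing = TRUE)
wordcloud(names(word_freqs), as.numeric(word_freqs),
          max.words = 100, colors = RColorBrewer::brewer.pal(8, "Dark2"))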
