# Lab 4.0 stream_tweets()

This lab introduces four different ways of capturing Twitter data in realtime.

```r
library(rtweet)
library(tidyverse)
```

```r
## source rlib script
source("../rlib.R")
```

### `stream_tweets()` basics

The `stream_tweets()` function offers several arguments, but to get started using the function, you really need to know two of them: `q` and `timeout`. 

The first argument, `q`, is the stream query, and its value determines which streaming method is used (each is described below). By default, `q = ""`, which selects the random sample method. For most queries, `q` will be a comma-separated string of keywords, where each comma acts like a boolean `OR` operator. `q` also accepts a vector of geographical coordinates or a vector of user IDs. (NOTE: tracking by user ID is broken in the current version of rtweet; it is still possible to track by screen name.)
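As a rough illustration of how the shape of `q` picks the method, here is a hypothetical helper, `stream_method()`, that mirrors the rules described above. It is a sketch, not rtweet's internal code: it only inspects the value you pass in, the user ID in the last call is made up, and it does not handle the `coords` object returned by `lookup_coords()`, which `stream_tweets()` also accepts.

```r
## Hypothetical helper (NOT part of rtweet): report which streaming method a
## given q value would select, following the rules described in this lab.
stream_method <- function(q) {
  if (identical(q, "")) return("sample")                            # default: random sample
  if (is.numeric(q) && length(q) == 4) return("filter/locations")   # bounding box
  if (is.character(q) && all(grepl("^[0-9]+$", q))) return("filter/follow")  # user IDs
  "filter/track"                                                    # comma-separated keywords
}

stream_method("")                     # "sample"
stream_method("boston,philadelphia")  # "filter/track"
stream_method(c(-180, -90, 180, 90))  # "filter/locations"
stream_method("12345")                # "filter/follow" (made-up user ID)
```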

The second argument, `timeout`, is the amount of time **in seconds** to keep the stream running. It's easiest to build this number up from common time intervals. For example, to stream for 3 minutes, set `timeout = 60 * 3`; to stream for 3 days, set `timeout = 60 * 60 * 24 * 3`. Read each product from left to right: each factor converts one unit into the next larger one (60 seconds = 1 minute, 60 minutes = 1 hour, 24 hours = 1 day), and the last factor is how many of those units you want. Or just type "how many seconds in 3 days" into Google, because that's what Google is for!
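If you find yourself computing these products often, a tiny helper can do the arithmetic. `as_timeout()` below is a hypothetical convenience function, not part of rtweet; the last line cross-checks it against base R's `difftime` conversion.

```r
## Hypothetical helper (NOT part of rtweet): convert a duration into the
## number of seconds expected by stream_tweets(timeout = ...).
as_timeout <- function(secs = 0, mins = 0, hours = 0, days = 0) {
  secs + 60 * mins + 60 * 60 * hours + 60 * 60 * 24 * days
}

as_timeout(mins = 3)   # 180
as_timeout(days = 3)   # 259200

## Cross-check with base R's difftime conversion
as.numeric(as.difftime(3, units = "days"), units = "secs")  # 259200
```

You could then write, for example, `stream_tweets(q = "", timeout = as_timeout(mins = 3))`.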

Read more about `stream_tweets()` in the package documentation.

```r
?stream_tweets
```

### 1. Stream via `statuses/sample`

The first method for streaming tweets is Twitter's random sampling method, which Twitter asserts will return a "random" sample of about 1% of all tweets. The methodology is a bit of a black box, but I'd venture to guess the algorithm isn't completely worthless. And even though 1% is small, it still works out to quite a few tweets: the 15-second stream below returned 358 of them.

To use this method in rtweet, pass an empty string to `q` (this is the default value).

```r
st <- stream_tweets(q = "", timeout = 15)
head(st)
```

```
Streaming tweets for 15 seconds...
Downloading: 530 kB
Finished streaming tweets!
opening file input connection.
 Found 358 records... Imported 358 records. Simplifying...
closing file input connection.

status_id,created_at,user_id,screen_name,text,source,reply_to_status_id,reply_to_user_id,reply_to_screen_name,is_quote,⋯,retweet_text,place_url,place_name,place_full_name,place_type,country,country_code,geo_coords,coords_coords,bbox_coords
963610834650976256,2018-02-14 03:08:20,770129197,gabbyygandini,RT @AMIGALESI: i need a bag not a nigga,Twitter for iPhone,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963610834646786049,2018-02-14 03:08:20,170383225,_vanessabsilva,RT @NetflixBrasil: AGORA AS MADAMES VOLTAM NÉ.,Twitter for Android,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963610834650980353,2018-02-14 03:08:20,4561608132,zoealbury,RT @NBCOlympics: SHAUN. WHITE.,Twitter for iPhone,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963610834646814720,2018-02-14 03:08:20,110151486,katiee_babay,"RT @cloutboyjojo: so is macaroni and cheese ""mac"" because it’s short for macaroni...or is it because ""mac"" is an acronym for macaroni and c…",Twitter for iPhone,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963610834655236096,2018-02-14 03:08:20,716249509978849280,agosstinaperez,@JoaquinCabral10 @nicomagliano6,Twitter for Android,9.636013897284851e+17,3035433801.0,JoaquinCabral10,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963610834655211520,2018-02-14 03:08:20,879178964664811520,imanis146,I liked a @YouTube video https://t.co/umKuwrxbTU Snoop dogg Smoke weed everyday HD (dubstep remix) [Antoine Daniel],Google,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
```
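To put "quite a few tweets" in perspective, here is a back-of-the-envelope extrapolation from the stream above (358 tweets in 15 seconds). It assumes the rate stays constant over a whole day, which real traffic will not, so treat the result as an order-of-magnitude figure only.

```r
## Extrapolate the 15-second sample stream above to a daily volume.
## Assumption: the tweet rate is constant (real traffic varies by hour).
n_tweets <- 358                  # records imported from the 15-second stream
secs     <- 15                   # timeout used for that stream
rate     <- n_tweets / secs      # tweets per second
per_day  <- rate * 60 * 60 * 24  # tweets per day at that rate

round(rate, 1)  # 23.9
round(per_day)  # 2062080
```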


### 2. Stream via `statuses/filter/track`

The second streaming method is to filter tweets in realtime by tracking keywords and phrases. To do this with rtweet, separate the search terms with commas. Each comma acts like a boolean `OR` operator, so the stream returns any tweet that matches at least one of the provided terms. Multi-word terms are allowed, but the search syntax is limited: it does not do partial matching (`boston` will not match `Bostonia`) and has few advanced features.
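To make the `OR` behaviour concrete, here is a hypothetical local emulation of keyword tracking, `matches_track()`. It is not Twitter's or rtweet's matching code, and it only looks at the text string you give it. It assumes Twitter's `track` rules as I understand them: commas separate alternatives (`OR`), the words of a multi-word term must all appear in any order (`AND`), matching is case-insensitive, and only whole words match.

```r
## Hypothetical emulation (NOT Twitter's or rtweet's code) of track matching.
## Assumptions: "," separates alternatives (OR); the words of a multi-word
## term must all appear, in any order (AND); case-insensitive; whole words only.
matches_track <- function(text, q) {
  terms <- trimws(strsplit(q, ",", fixed = TRUE)[[1]])
  words <- tolower(strsplit(text, "[^[:alnum:]_]+")[[1]])  # tokenise the tweet
  any(vapply(terms, function(term) {
    all(tolower(strsplit(term, " +")[[1]]) %in% words)
  }, logical(1)))
}

q <- "boston,philadelphia"
matches_track("Spend it with us #boston !", q)  # TRUE: "boston" matches "#boston"
matches_track("Say hi to BostoniaPublic", q)    # FALSE: no partial matching
matches_track("York is new to me", "new york")  # TRUE: word order is ignored
```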

```r
st <- stream_tweets(q = "boston,philadelphia", timeout = 15)
head(st)
```

```
Streaming tweets for 15 seconds...
Downloading: 30 kB
Finished streaming tweets!
opening file input connection.
 Found 18 records... Imported 18 records. Simplifying...
closing file input connection.

status_id,created_at,user_id,screen_name,text,source,reply_to_status_id,reply_to_user_id,reply_to_screen_name,is_quote,⋯,retweet_text,place_url,place_name,place_full_name,place_type,country,country_code,geo_coords,coords_coords,bbox_coords
963613596872790016,2018-02-14 03:19:19,2285254543,BostoniaPublic,"RT @BostoniaPublic: Forgot to make #Valentinesday plans?! Spend it with us #boston ! #oysters special, #jdktrio at 830pm. call and make you…",Twitter for Android,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963613602103152643,2018-02-14 03:19:20,287430319,MikeCampbell2,"RT @kylegriffin1: Just 2 days after WaPo’s report on Scott Pruitt's extensive first class travel, Playbook finds Pruitt flying first class…",Twitter for iPhone,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963613606251134976,2018-02-14 03:19:21,82069319,Unge_369,"RT @McJesse: GREAT JOB BOSTON DYNAMICS. YOU NAILED THE WAY I, A NORMAL HUMAN, ALWAYS ENTER A ROOM. https://t.co/OGmGTKHnC1",Twitter for Android,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963613606729404416,2018-02-14 03:19:21,529686942,the__mariano,RT @PauGG: So it begins 😧 #MetalHead #BlackMirror https://t.co/ruwxUzDvB3,Twitter for iPhone,,,,True,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963613611884204033,2018-02-14 03:19:22,2874991611,34_colin,"RT @GenMJohansson: Dearest Abigail, GLORY TO NEW JERSEY! The dirty rat Gudas and the pinstriped militia did all they could to stop us, but…",Twitter for iPhone,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963613612869865472,2018-02-14 03:19:23,9341102,cbrew,"RT @McJesse: GREAT JOB BOSTON DYNAMICS. YOU NAILED THE WAY I, A NORMAL HUMAN, ALWAYS ENTER A ROOM. https://t.co/OGmGTKHnC1",Twitter Web Client,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
```


### 3. Stream via `statuses/filter/locations`

The third streaming method is to filter tweets in realtime by locating tweets via a bounding box of geographical coordinates. rtweet provides a convenience function `lookup_coords()` for looking up coordinate values. One nice feature is that there appears to be no limit on the size of the bounding box.

```r
## lookup coords (the API can be a bit picky; you may need to run the first call for COMO twice to get the values)
lookup_coords("Columbia, MO")
```

To format a bounding box query, give the coordinates of the bottom-left (south-west) corner followed by the top-right (north-east) corner, each as (long, lat). To stream tweets from anywhere in the world, you would run `stream_tweets(c(-180, -90, 180, 90))`. Alternatively, you can pass the output from `lookup_coords()` and `stream_tweets()` will do the rest!
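The corner order is easy to get wrong, so here is a hypothetical pair of helpers (not part of rtweet): `make_bbox()` assembles and sanity-checks a `c(sw_long, sw_lat, ne_long, ne_lat)` vector, and `in_bbox()` tests whether a single point falls inside it. The Guatemala box and point are taken from the `bbox_coords` and `coords_coords` columns of the output below, reading `bbox_coords` as four longitudes followed by four latitudes. This is a point-in-box check only, not a description of exactly how Twitter decides which tweets match a box.

```r
## Hypothetical helpers (NOT part of rtweet).
## make_bbox(): build a locations query, south-west corner first, (long, lat) order.
make_bbox <- function(sw_long, sw_lat, ne_long, ne_lat) {
  stopifnot(sw_long < ne_long, sw_lat < ne_lat,
            abs(c(sw_long, ne_long)) <= 180, abs(c(sw_lat, ne_lat)) <= 90)
  c(sw_long, sw_lat, ne_long, ne_lat)
}

## in_bbox(): is the point (long, lat) inside the box (edges included)?
in_bbox <- function(long, lat, bbox) {
  long >= bbox[1] && long <= bbox[3] && lat >= bbox[2] && lat <= bbox[4]
}

world <- make_bbox(-180, -90, 180, 90)
gt    <- make_bbox(-92.22739, 13.72873, -88.22106, 17.81621)  # Guatemala place box

in_bbox(-90.48416, 14.59526, gt)  # TRUE: the Guatemala tweet's coordinates
in_bbox(-43.4, -22.9, gt)         # FALSE: a point inside Rio de Janeiro's box
in_bbox(-43.4, -22.9, world)      # TRUE
```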

```r
st <- stream_tweets(q = lookup_coords("world"), timeout = 15)
head(st)
```

```
Streaming tweets for 15 seconds...
Downloading: 780 kB
Finished streaming tweets!
opening file input connection.
 Found 500 records... Found 579 records... Imported 579 records. Simplifying...
closing file input connection.

status_id,created_at,user_id,screen_name,text,source,reply_to_status_id,reply_to_user_id,reply_to_screen_name,is_quote,⋯,retweet_text,place_url,place_name,place_full_name,place_type,country,country_code,geo_coords,coords_coords,bbox_coords
963616045637316608,2018-02-14 03:29:03,198647300,madgx_,this me rn. lol https://t.co/L1H4feqGmY,Twitter for iPhone,,,,True,,,https://api.twitter.com/1.1/geo/id/0de54c88126954b8.json,Waipahu,"Waipahu, HI",city,United States,US,"NA, NA","NA, NA","-158.03213, -158.03213, -157.99021, -157.99021, 21.36976, 21.39942, 21.39942, 21.36976"
963616045901721600,2018-02-14 03:29:03,1146959894,Double_Dimple13,😂🤷🏽‍♀️ https://t.co/BjEYfolfnS,Twitter for iPhone,,,,True,,,https://api.twitter.com/1.1/geo/id/b8a1ceabefed490b.json,Muskegon Heights,"Muskegon Heights, MI",city,United States,US,"NA, NA","NA, NA","-86.26031, -86.26031, -86.22440, -86.22440, 43.18518, 43.21614, 43.21614, 43.18518"
963616046350561280,2018-02-14 03:29:03,2513587834,IsasmendiMingo,"@Clon_40 Perdón ¿Y q culpa tiene el volcán, para q los llenen de mierda?",Twitter for Android,9.636039472598221e+17,2737379453.0,Clon_40,False,,,https://api.twitter.com/1.1/geo/id/0085b01d4f65f576.json,Salta,"Salta, Argentina",city,Argentina,AR,"NA, NA","NA, NA","-65.53251, -65.53251, -65.35870, -65.35870, -24.87162, -24.74685, -24.74685, -24.87162"
963616041178935297,2018-02-14 03:29:01,3333648687,FelipePontin,@grazzibfk @hiromi_jdv @giupimentell,Twitter for iPhone,9.63610867786027e+17,2538254939.0,grazzibfk,False,,,https://api.twitter.com/1.1/geo/id/c77de3537317f046.json,Engenheiro Beltrão,"Engenheiro Beltrão, Brasil",city,Brasil,BR,"NA, NA","NA, NA","-52.44698, -52.44698, -52.11107, -52.11107, -23.88948, -23.63676, -23.63676, -23.88948"
963616044375052288,2018-02-14 03:29:02,2859804285,ManuelSmyller,falo pra mim que gosta de sexo bruto se eu quebrar essa mina aonde é que eu compro outra 🎶,Twitter Web Client,,,,False,,,https://api.twitter.com/1.1/geo/id/97bcdfca1a2dca59.json,Rio de Janeiro,"Rio de Janeiro, Brasil",city,Brasil,BR,"NA, NA","NA, NA","-43.79545, -43.79545, -43.08771, -43.08771, -23.08302, -22.73982, -22.73982, -23.08302"
963616046329495559,2018-02-14 03:29:03,344784262,Mendeeeeez,Aquí casual enseñándole a rocio.98 la nueva técnica del Tenis ((: Te… https://t.co/66ly48qZjl,Instagram,,,,False,,,https://api.twitter.com/1.1/geo/id/13d479b108707983.json,Guatemala,Guatemala,country,Guatemala,GT,"14.59526, -90.48416","-90.48416, 14.59526","-92.22739, -92.22739, -88.22106, -88.22106, 13.72873, 17.81621, 17.81621, 13.72873"
```


### 4. Stream via `statuses/filter/follow`

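The fourth streaming method is to filter tweets in realtime by following tweets from or about specific users. At the current time, this functionality is broken in rtweet, though you can approximate it by tracking a comma-separated list of screen names. Keep in mind that this is really the keyword-tracking method, so it returns tweets that mention those accounts (as most of the results below do) rather than only tweets posted by them.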
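If your list of accounts comes from somewhere else (a spreadsheet, say, or a vector of handles with `@` signs), a small hypothetical helper, `handles_to_q()`, can turn it into the comma-separated string that `q` expects. It is not part of rtweet; it just trims whitespace, strips a leading `@`, drops duplicates, and joins the rest with commas.

```r
## Hypothetical helper (NOT part of rtweet): build a track query from handles.
handles_to_q <- function(handles) {
  handles <- sub("^@", "", trimws(handles))  # drop surrounding spaces and a leading "@"
  paste(unique(handles), collapse = ",")     # comma = OR in the track query
}

handles_to_q(c("@realDonaldTrump", "HillaryClinton", "@realDonaldTrump"))
# "realDonaldTrump,HillaryClinton"
```

The result can be passed straight to `stream_tweets(q = ...)`.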

```r
st <- stream_tweets(q = "realDonaldTrump,HillaryClinton", timeout = 15)
head(st)
```

```
Streaming tweets for 15 seconds...
Downloading: 290 kB
Finished streaming tweets!
opening file input connection.
 Found 150 records... Imported 150 records. Simplifying...
closing file input connection.

status_id,created_at,user_id,screen_name,text,source,reply_to_status_id,reply_to_user_id,reply_to_screen_name,is_quote,⋯,retweet_text,place_url,place_name,place_full_name,place_type,country,country_code,geo_coords,coords_coords,bbox_coords
963617056309547009,2018-02-14 03:33:04,340354157,PapasunBill,RT @krassenstein: Hilarious! RT to @realDonaldTrump https://t.co/Hs0w7cnNME,Twitter for iPad,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963617056666001408,2018-02-14 03:33:04,1731983996,theinfoaddict,RT @tedlieu: Michael Cohen admits he paid $130k in hush money to Daniels during presidential campaign to cover up her affair with @realDona…,Twitter Web Client,,,,True,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963617058176069632,2018-02-14 03:33:04,3960210861,trumpcardiac,RT @complaintdept__: @realDonaldTrump working on that heart attack I see. Eat up! https://t.co/xor7S5CnmN,Twitter for iPhone,,,,True,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963617058956152837,2018-02-14 03:33:04,516447380,BacksweetBranch,RT @4YrsToday: Are you disgusted @realDonaldTrump had a conversation with Putin yesterday (02/12) and no one knew about it?,Twitter Web Client,,,,False,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963617059115483136,2018-02-14 03:33:04,15685838,ToddHeadleeAZ,"The more I hear about this innovative idea the more I like it. This is what we need in this country, someone like… https://t.co/6Vv2GXutwp",Twitter for iPhone,,,,True,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
963617059497234432,2018-02-14 03:33:04,156680202,RMJewell,"RT @WilDonnelly: Michael Cohen now admits that he paid the $130,000 to Stormy Daniels out of his own pocket. Since it was meant to help Tru…",Twitter for iPhone,,,,True,,,,,,,,,"NA, NA","NA, NA","NA, NA, NA, NA, NA, NA, NA, NA"
```
