# **Collecting Twitter Data**

This interactive Jupyter notebook lets you experiment with twarc without prior setup or installation.

Most of the cells below include code that should be run on the command line. These cells all begin with an exclamation point `!`. The `!` tells the Jupyter notebook to pass the rest of the line to a shell instead of running it as Python.
![the command line](images/command-line.png)
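
For readers curious about what the `!` prefix is doing, here is a minimal sketch of a rough plain-Python equivalent using the standard library's `subprocess` module. The helper name `run_shell` is hypothetical, and Jupyter's own implementation differs in its details; this only illustrates the idea of handing a command string to the shell.

```python
import subprocess


def run_shell(command):
    """Run a command string through the shell and return its standard output."""
    result = subprocess.run(
        command,
        shell=True,           # hand the whole string to the shell, as `!` does
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Comparable in spirit to running `!echo hello from the shell` in a cell
print(run_shell("echo hello from the shell"))
```

Inside a notebook the `!` form is shorter and more convenient; the function above is mainly useful if you later move your commands into a regular Python script.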


# **Run a Jupyter cell**

In [119]:
print('Nice! You did it. You just ran a cell.')

Nice! You did it. You just ran a cell.


# **Installation**

### Python

I recommend installing Python with Anaconda:

https://docs.continuum.io/anaconda/install/

### Twarc 

To install twarc, you can run `pip install twarc` on the command line. The command below installs a specific version, 1.7.5, which was the latest version at the time of writing.

In [None]:
!pip install 'twarc == 1.7.5'

Or you can download and open twarc as a zip file: https://github.com/DocNow/twarc/archive/master.zip

More detailed instructions about twarc and its installation can be found at https://github.com/DocNow/twarc

# **Set up a Twitter developer account**

*Twarc won't work in this notebook unless you configure it with your own consumer key, consumer secret, access token, and access token secret.*

1. Create a Twitter developer account and Twitter application. If you haven't done so yet, you can follow the instructions on our GitHub repo: https://github.com/melaniewalsh/Humanities-Data-Society/blob/master/TwitterFirstSteps.md

2. Record your consumer key, consumer secret, access token, and access token secret

3. Open a terminal

![](images/terminal.png)

4. Configure twarc by entering `twarc configure` and following the prompts

![](images/twarc-configure.png)

Now you should be able to use twarc in this notebook!

# **Collecting Twitter Data**

>**“Ok boomer”** has become Generation Z’s endlessly repeated retort to the problem of older people who just don’t get it, a rallying cry for millions of fed up kids. Teenagers use it to reply to cringey YouTube videos, Donald Trump tweets, and basically any person over 30 who says something condescending about young people — and the issues that matter to them.

> -Taylor Lorenz, ["‘OK Boomer’ Marks the End of Friendly Generational Relations"](https://www.nytimes.com/2019/10/29/style/ok-boomer.html)

## Filter realtime (live)

In [None]:
!twarc filter "ok boomer" > ok_boomer_filter.jsonl

## Search (last 7 days)

In [None]:
!twarc search "ok boomer" > ok_boomer_search.jsonl

## Check how many tweets have been collected

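*The command `wc` with the `-l` flag tells you how many lines are in a file. Because each line of a `.jsonl` file holds one tweet, the line count is the number of tweets collected.*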

In [124]:
!wc -l ok_boomer_filter.jsonl

     166 ok_boomer_filter.jsonl


In [126]:
!wc -l ok_boomer_search.jsonl

   18000 ok_boomer_search.jsonl
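
If `wc` is not available (for example on Windows), the same check can be sketched in plain Python. The function name `count_tweets` is hypothetical. Unlike `wc -l`, which counts newline characters, this version counts non-empty lines, so the two can differ slightly if a file contains blank lines or lacks a final newline.

```python
def count_tweets(path):
    """Count the non-empty lines in a JSONL file (one tweet per line)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

Calling `count_tweets('ok_boomer_filter.jsonl')` should report the number of tweets in the file produced by the filter command above.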


## Convert JSON file to CSV file

In [127]:
!python twarc/utils/json2csv.py ok_boomer_filter.jsonl > ok_boomer_filter.csv
!python twarc/utils/json2csv.py ok_boomer_search.jsonl > ok_boomer_search.csv
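
To make the conversion less of a black box, here is a minimal sketch of the same idea in plain Python. It is not twarc's `json2csv.py`, which writes many more columns; the column selection and the names `FIELDS`, `tweet_to_row`, and `jsonl_to_csv` are hypothetical, and the field names (`id_str`, `full_text`, `user.screen_name`, and so on) assume tweets in Twitter's v1.1 JSON format.

```python
import csv
import json

# Hypothetical column selection; twarc's json2csv.py writes many more columns.
FIELDS = ["id", "created_at", "user_screen_name", "text", "retweet_count"]


def tweet_to_row(tweet):
    """Flatten one tweet dictionary (assumed v1.1 format) into a flat CSV row."""
    user = tweet.get("user", {})
    return {
        "id": tweet.get("id_str", ""),
        "created_at": tweet.get("created_at", ""),
        "user_screen_name": user.get("screen_name", ""),
        # Extended-mode tweets store their text under "full_text"
        "text": tweet.get("full_text") or tweet.get("text", ""),
        "retweet_count": tweet.get("retweet_count", 0),
    }


def jsonl_to_csv(jsonl_path, csv_path):
    """Write one CSV row per tweet and return the number of rows written."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            writer.writerow(tweet_to_row(json.loads(line)))
            count += 1
    return count
```

The key step is flattening: a tweet is a nested JSON object, while a CSV row is flat, so nested values such as the user's screen name have to be pulled up into their own columns.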

## Import the Python library "pandas" and read in tweet CSV files

In [132]:
import pandas
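# Full option names are used because short forms such as 'max_rows' can match more than one option in newer pandas releases
pandas.set_option('display.max_colwidth', 2000)
pandas.set_option('display.max_columns', 2000)
pandas.set_option('display.max_rows', 100)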
ok_boomer_filter = pandas.read_csv('ok_boomer_filter.csv')
ok_boomer_search = pandas.read_csv('ok_boomer_search.csv')

## See all columns of the CSV file for the first 10 rows

In [129]:
ok_boomer_filter.head(10)

Unnamed: 0,id,tweet_url,created_at,parsed_created_at,user_screen_name,text,tweet_type,coordinates,hashtags,media,urls,favorite_count,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,lang,place,possibly_sensitive,retweet_count,retweet_or_quote_id,retweet_or_quote_screen_name,retweet_or_quote_user_id,source,user_id,user_created_at,user_default_profile_image,user_description,user_favourites_count,user_followers_count,user_friends_count,user_listed_count,user_location,user_name,user_statuses_count,user_time_zone,user_urls,user_verified
0,1195371497151447041,https://twitter.com/add_a_shack/status/1195371497151447041,Fri Nov 15 16:02:28 +0000 2019,2019-11-15 16:02:28+00:00,add_a_shack,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,retweet,,,,,0,,,,en,,,0,1.194878e+18,thenoelmiller,2633625000.0,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",577169778,Fri May 11 12:18:01 +0000 2012,False,"#SHS2016 Not class of TU2020. Hello, is it me you’re looking for? No? I’ll see myself out.",7585,93,188,0,Olney,Noam Adashek,4025,,,False
1,1195371497206042629,https://twitter.com/Tarleton_exe/status/1195371497206042629,Fri Nov 15 16:02:28 +0000 2019,2019-11-15 16:02:28+00:00,Tarleton_exe,"RT @RAZ0RFIST: 'ok boomer' came and went so fast, actual boomers didn't even have time to use it incorrectly.",retweet,,,,,0,,,,en,,,0,1.195298e+18,RAZ0RFIST,210424000.0,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",795081826197008384,Sun Nov 06 01:54:14 +0000 2016,False,"Reporter for Conflict News, owner of r/rickandmorty",22820,373,1527,4,"District of Columbia, USA",Richard Tarleton,31418,,,False
2,1195371500804673537,https://twitter.com/miagreymane/status/1195371500804673537,Fri Nov 15 16:02:29 +0000 2019,2019-11-15 16:02:29+00:00,miagreymane,RT @PlayStationUK: OK boomer https://t.co/SdQf3DIiap,retweet,,,https://pbs.twimg.com/media/EJafcafW4AIcsNA.jpg,,0,,,,en,,False,0,1.195318e+18,PlayStationUK,347877300.0,"<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>",1069676753142960128,Mon Dec 03 19:36:26 +0000 2018,False,,5513,46,41,0,,Mia,2116,,,False
3,1195371501064605697,https://twitter.com/elitepettersson/status/1195371501064605697,Fri Nov 15 16:02:29 +0000 2019,2019-11-15 16:02:29+00:00,elitepettersson,ok boomer,quote,,,,,0,,,,en,,,0,1.19519e+18,ByMHarrington,44474000.0,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",1003739311,Tue Dec 11 10:08:56 +0000 2012,False,im going to die on this planet,19737,264,268,12,hell,Karen the Manager,9913,,,False
4,1195371504197931009,https://twitter.com/jldobrinsky/status/1195371504197931009,Fri Nov 15 16:02:30 +0000 2019,2019-11-15 16:02:30+00:00,jldobrinsky,RT @FauxNealBrown: ThE StUdEnT SeCtiOn Is a PrObLeM!!!\n\nok boomer https://t.co/7Oks84x43x,retweet,,,https://pbs.twimg.com/media/EJYGB-OXUAAn9oC.jpg https://pbs.twimg.com/media/EJYGCC5WoAE2dw7.jpg,,0,,,,en,,False,0,1.195149e+18,FauxNealBrown,1.046555e+18,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",3389510987,Thu Jul 23 16:50:20 +0000 2015,False,"Chairman @wvfcr, mountaineer, big school choice gal, writer, & dog mom. #wvpol INTJ. ✞ opinions are my own.",21695,1648,696,24,"Morgantown, WV",Jessica Dobrinsky,3674,,,False
5,1195371505158426624,https://twitter.com/Blablaaff/status/1195371505158426624,Fri Nov 15 16:02:30 +0000 2019,2019-11-15 16:02:30+00:00,Blablaaff,RT @Maillloche: ok boomer https://t.co/U3AA1eoqx5,retweet,,,,https://twitter.com/libe/status/1194921347350245376,0,,,,en,,False,0,1.194956e+18,Maillloche,469450700.0,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",465702985,Mon Jan 16 17:04:44 +0000 2012,False,"On m'a demandé de choisir un avenir, mais j'ai décidé de devenir technicien du spectacle 🤷‍♂️",11171,88,386,1,,Bébé D'amour,2316,,,False
6,1195371507511431168,https://twitter.com/uchronik451/status/1195371507511431168,Fri Nov 15 16:02:30 +0000 2019,2019-11-15 16:02:30+00:00,uchronik451,@Le___Doc Ok boomer.,reply,,,,,0,Le___Doc,1.195327e+18,8.046639e+17,en,,,0,,,,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",301995765,Fri May 20 12:27:38 +0000 2011,False,Underground Railroad,8139,198,462,4,"Paris, France",Uchronik,23657,,,False
7,1195371513790251008,https://twitter.com/ratatouvee/status/1195371513790251008,Fri Nov 15 16:02:32 +0000 2019,2019-11-15 16:02:32+00:00,ratatouvee,i take joy in commenting ok boomer on evan stans and people who say they should have made nancy drew had sex on tv time,original,,,,,0,,,,en,,,0,,,,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",981938856999247872,Thu Apr 05 16:57:21 +0000 2018,False,𝘧𝘪𝘳𝘴𝘵 𝘰𝘧𝘧... 𝙞𝙩 𝙨𝙪𝙘𝙠𝙨 𝙮𝙤𝙪’𝙧𝙚 𝙖𝙫𝙤𝙞𝙙𝙞𝙣𝙜 𝙢𝙚. @parkfivel,24111,2313,1165,27,18 | they/them 💫,vee 🐀,11997,,,False
8,1195371517032435712,https://twitter.com/Hufflepoouff/status/1195371517032435712,Fri Nov 15 16:02:33 +0000 2019,2019-11-15 16:02:33+00:00,Hufflepoouff,RT @Maillloche: ok boomer https://t.co/U3AA1eoqx5,retweet,,,,https://twitter.com/libe/status/1194921347350245376,0,,,,en,,False,0,1.194956e+18,Maillloche,469450700.0,"<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>",50816873,Thu Jun 25 23:37:09 +0000 2009,False,"Absolutely no racism, homophobia, transphobia, sexism, ableism, fatphobia or general hatefulness allowed in this area",46683,364,678,34,Paris,Qui m'a envoyé ? ⭐⭐,22703,,,False
9,1195371518009757696,https://twitter.com/cronebitch/status/1195371518009757696,Fri Nov 15 16:02:33 +0000 2019,2019-11-15 16:02:33+00:00,cronebitch,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,retweet,,,,,0,,,,en,,,0,1.194878e+18,thenoelmiller,2633625000.0,"<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>",708452051609587716,Sat Mar 12 00:38:26 +0000 2016,False,I could be a man with a fistful of hammers and a trunk full of duct tape. -Dennis Reynolds,33485,269,322,2,Death Mountain,B🔮,21659,,,False


## See only select columns

In [133]:
ok_boomer_filter[['created_at', 'tweet_type', 'text', 'user_name', 'user_screen_name' , 'user_location', 'hashtags', 'urls', 'retweet_count']].head(100)

Unnamed: 0,created_at,tweet_type,text,user_name,user_screen_name,user_location,hashtags,urls,retweet_count
0,Fri Nov 15 16:02:28 +0000 2019,retweet,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,Noam Adashek,add_a_shack,Olney,,,0
1,Fri Nov 15 16:02:28 +0000 2019,retweet,"RT @RAZ0RFIST: 'ok boomer' came and went so fast, actual boomers didn't even have time to use it incorrectly.",Richard Tarleton,Tarleton_exe,"District of Columbia, USA",,,0
2,Fri Nov 15 16:02:29 +0000 2019,retweet,RT @PlayStationUK: OK boomer https://t.co/SdQf3DIiap,Mia,miagreymane,,,,0
3,Fri Nov 15 16:02:29 +0000 2019,quote,ok boomer,Karen the Manager,elitepettersson,hell,,,0
4,Fri Nov 15 16:02:30 +0000 2019,retweet,RT @FauxNealBrown: ThE StUdEnT SeCtiOn Is a PrObLeM!!!\n\nok boomer https://t.co/7Oks84x43x,Jessica Dobrinsky,jldobrinsky,"Morgantown, WV",,,0
5,Fri Nov 15 16:02:30 +0000 2019,retweet,RT @Maillloche: ok boomer https://t.co/U3AA1eoqx5,Bébé D'amour,Blablaaff,,,https://twitter.com/libe/status/1194921347350245376,0
6,Fri Nov 15 16:02:30 +0000 2019,reply,@Le___Doc Ok boomer.,Uchronik,uchronik451,"Paris, France",,,0
7,Fri Nov 15 16:02:32 +0000 2019,original,i take joy in commenting ok boomer on evan stans and people who say they should have made nancy drew had sex on tv time,vee 🐀,ratatouvee,18 | they/them 💫,,,0
8,Fri Nov 15 16:02:33 +0000 2019,retweet,RT @Maillloche: ok boomer https://t.co/U3AA1eoqx5,Qui m'a envoyé ? ⭐⭐,Hufflepoouff,Paris,,https://twitter.com/libe/status/1194921347350245376,0
9,Fri Nov 15 16:02:33 +0000 2019,retweet,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,B🔮,cronebitch,Death Mountain,,,0


In [134]:
ok_boomer_search[['created_at', 'tweet_type','text', 'user_name', 'user_screen_name', 'user_location', 'hashtags', 'urls','retweet_count']].head(100)

Unnamed: 0,created_at,tweet_type,text,user_name,user_screen_name,user_location,hashtags,urls,retweet_count
0,Fri Nov 15 16:05:37 +0000 2019,reply,@TheresaWarring @kozykaychuck @TheSocialCTV Ok boomer,♛ Gordon Bombay ♛,blackpumahat,"Oakland, CA",,,0
1,Fri Nov 15 16:05:36 +0000 2019,reply,@Buff78Sea Ok boomer,Anthony 🎸,anthonyonguitar,"Nashville, TN",,,0
2,Fri Nov 15 16:05:36 +0000 2019,quote,"......\n\n""OK, boomer."" https://t.co/4Fnh3mRIRh",♞ : 世界を喰う者。,ruovedevour,"⠀⠀──zwei seiten, endymion's.",,https://twitter.com/averruncare/status/1195361939095863296,0
3,Fri Nov 15 16:05:35 +0000 2019,retweet,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,XxxJolly_Lee_ClarkxxX,TrapRegent,"DeKalb, IL",,,32397
4,Fri Nov 15 16:05:35 +0000 2019,retweet,RT @thenoelmiller: y’all sayin “ok boomer” like you don’t smoke 4 juul packs a day. U and grandma gona have matching neck holes,ThisDudeNamedMichael,MikeKKatch22,,,,32397
5,Fri Nov 15 16:05:30 +0000 2019,original,"I think the fact boomers get mad at the meme “ok boomer” is keeping it alive, \n\nand they don’t understand that because they are 𝘉𝘰𝘰𝘮𝘦𝘳𝘴",M.DOOM,MotherOfDoggons,"Las Vegas, NV",,,0
6,Fri Nov 15 16:05:30 +0000 2019,reply,@lgoonyflyboy sag ihr einfach: ok boomer,Jower,jowajohv,"Berlin, Deutschland",,,0
7,Fri Nov 15 16:05:29 +0000 2019,retweet,RT @qyutapie: nct: lets all colour code to support dream!\nyuta: ok boomer\n\n#THE_DREAM_SHOW https://t.co/UidjTucQNf,jackie misses superm :(,etherealjackie,ENG | ESP | 日本語,THE_DREAM_SHOW,,15
8,Fri Nov 15 16:05:24 +0000 2019,reply,@teobrien3 Ok boomer.,TJ Ledbury,tjledbury,781 --- 603,,,0
9,Fri Nov 15 16:05:22 +0000 2019,retweet,RT @Maillloche: ok boomer https://t.co/U3AA1eoqx5,Purveiller et sunir,PtiteSourceuse,,,https://twitter.com/libe/status/1194921347350245376,496


# **Basic Tweet Analysis**

### Twarc utilities `twarc/utils`

## Identify Top Hashtags `twarc/utils/tags.py`

In [None]:
!python twarc/utils/tags.py ok_boomer_search.jsonl
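
As a rough illustration of what a hashtag count involves, here is a minimal sketch in plain Python. It is not the code in `tags.py`, whose handling of retweets and letter case may differ. The function name `top_hashtags` is hypothetical, and the sketch assumes each tweet stores its hashtags under `entities.hashtags` with a `text` key, as in Twitter's v1.1 JSON format.

```python
import json
from collections import Counter


def top_hashtags(jsonl_path, n=10):
    """Return the n most common hashtags (lowercased) with their counts."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            # Tweets without hashtags contribute nothing to the count
            for tag in tweet.get("entities", {}).get("hashtags", []):
                counts[tag["text"].lower()] += 1
    return counts.most_common(n)
```

Calling `top_hashtags('ok_boomer_search.jsonl')` returns a list of `(hashtag, count)` pairs, most frequent first.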

## Create a Word Cloud `twarc/utils/wordcloud.py`

In [118]:
!python twarc/utils/wordcloud.py ok_boomer_search.jsonl > ok_boomer_search.html

[ok_boomer_search.html](ok_boomer_search.html)

In [121]:
!python twarc/utils/wordcloud.py ok_boomer_filter.jsonl > ok_boomer_filter.html

View your word cloud:

[ok_boomer_filter.html](ok_boomer_filter.html)

## Identify Top Emojis `twarc/utils/emojis.py`

In [None]:
!python twarc/utils/emojis.py ok_boomer_search.jsonl | head -n 20