# Article Notebook for Scraping Twitter Using snscrape's CLI Commands With Python
<br>Package GitHub: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook uses the development version of snscrape.

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 26th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's CLI commands with Python.

<b>Dependencies:</b>
- Your <b>Python</b> version must be <b>3.8</b> or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- The <b>development version of snscrape</b>. If you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>. DataFrames make it easy to manipulate and index the data. This is more of a personal preference, but it is what this notebook uses.
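Because the development version of snscrape needs Python 3.8 or higher, a small check like the one below can fail fast with a clear message before you try to install anything. This is a sketch that is not part of the original notebook, and the helper name `check_python_version` is hypothetical.

```python
import sys

def check_python_version(minimum=(3, 8)):
    """Raise a RuntimeError if the running interpreter is older than `minimum`."""
    # sys.version_info compares element-wise like a tuple, so (3, 9, 1, 'final', 0) >= (3, 8)
    if sys.version_info < minimum:
        raise RuntimeError(
            "snscrape's development version needs Python {}.{}+, found {}.{}".format(
                minimum[0], minimum[1], sys.version_info.major, sys.version_info.minor
            )
        )
    return True

check_python_version()
```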

```python
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the command below if you don't already have Pandas
# !pip install pandas

# Imports
import os
import pandas as pd
```

# Query by Username
The code below scrapes up to 100 tweets from a given username and then exports them to a CSV file with Pandas.

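```python
# Set the variables used in the format-string command below
tweet_count = 100
username = "jack"

# Use the os module to run the snscrape CLI command from Python.
# The single quotes around from:{} assume a POSIX shell (bash, zsh);
# the > redirect saves the JSON Lines output to user-tweets.json
os.system("snscrape --jsonl --max-results {} twitter-search 'from:{}' > user-tweets.json".format(tweet_count, username))
```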
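`os.system` hands the whole string to a shell, so how the quotes around `from:{}` are treated depends on which shell runs it. As an alternative sketch (not the article's method), the same command can be built as an argument list and run with Python's standard `subprocess` module, which passes each argument to snscrape unchanged and lets Python write the output file. The helper names `build_user_command` and `run_to_file` are hypothetical; the snscrape flags are only the ones already used above.

```python
import subprocess

def build_user_command(username, tweet_count):
    """Build the snscrape argument list for the `from:` username query."""
    # Each list item reaches snscrape as-is, so no shell quoting is needed
    return ["snscrape", "--jsonl", "--max-results", str(tweet_count),
            "twitter-search", "from:{}".format(username)]

def run_to_file(args, out_path):
    """Run a command and write its raw standard output to out_path."""
    with open(out_path, "wb") as out_file:
        # check=True raises CalledProcessError if the command exits with a non-zero status
        subprocess.run(args, stdout=out_file, check=True)

# Usage (requires snscrape to be installed and on PATH):
# run_to_file(build_user_command("jack", 100), "user-tweets.json")
```

If snscrape is not on PATH, `subprocess.run` raises `FileNotFoundError` instead of failing silently, which is one reason to prefer it over checking `os.system`'s return value by hand.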

```python
# Read the JSON Lines file generated by the CLI command above into a pandas DataFrame
tweets_df1 = pd.read_json('user-tweets.json', lines=True)

# Display the first 5 rows of the DataFrame
tweets_df1.head()
```

,url,date,content,renderedContent,id,user,outlinks,tcooutlinks,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,media,retweetedTweet,quotedTweet,mentionedUsers
0,https://twitter.com/jack/status/13324354308016...,2020-11-27 21:25:36+00:00,@JesseDorogusker @Square ❤️,@JesseDorogusker @Square ❤️,1332435430801690624,"{'username': 'jack', 'displayname': 'jack', 'i...",[],[],54,8,226,1,1332428871891775488,und,"<a href=""http://twitter.com/download/iphone"" r...",,,,"[{'username': 'JesseDorogusker', 'displayname'..."
1,https://twitter.com/jack/status/13291496370060...,2020-11-18 19:49:02+00:00,@NeerajKA Welcome!,@NeerajKA Welcome!,1329149637006041088,"{'username': 'jack', 'displayname': 'jack', 'i...",[],[],72,14,800,8,1329140522565439490,en,"<a href=""http://twitter.com/download/iphone"" r...",,,,"[{'username': 'NeerajKA', 'displayname': 'Neer..."
2,https://twitter.com/jack/status/13291372550263...,2020-11-18 18:59:50+00:00,Join @CashApp! #Bitcoin https://t.co/SbYANIZyix,Join @CashApp! #Bitcoin twitter.com/owenbjenni...,1329137255026311168,"{'username': 'jack', 'displayname': 'jack', 'i...",[https://twitter.com/owenbjennings/status/1329...,[https://t.co/SbYANIZyix],585,277,2507,132,1329137255026311168,en,"<a href=""http://twitter.com/download/iphone"" r...",,,{'url': 'https://twitter.com/owenbjennings/sta...,"[{'username': 'CashApp', 'displayname': 'Cash ..."
3,https://twitter.com/jack/status/13291366656847...,2020-11-18 18:57:29+00:00,@kateconger @sarahintampa Nah,@kateconger @sarahintampa Nah,1329136665684705280,"{'username': 'jack', 'displayname': 'jack', 'i...",[],[],38,5,176,10,1329126492731699203,und,"<a href=""http://twitter.com/download/iphone"" r...",,,,"[{'username': 'kateconger', 'displayname': 'o...."
4,https://twitter.com/jack/status/13291358061921...,2020-11-18 18:54:05+00:00,@mmasnick Terrible idea! And terribly false.,@mmasnick Terrible idea! And terribly false.,1329135806192107521,"{'username': 'jack', 'displayname': 'jack', 'i...",[],[],51,13,222,16,1329128773845860352,en,"<a href=""http://twitter.com/download/iphone"" r...",,,,"[{'username': 'mmasnick', 'displayname': 'Mike..."
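In the output above, the `user` column (like `mentionedUsers`) holds nested dictionaries rather than plain values. One way to pull those fields into their own columns is `pandas.json_normalize`, sketched below on a tiny hand-made DataFrame. The rows are hypothetical placeholders shaped like the snscrape output, not real scraped tweets, and only the `username` and `displayname` keys visible above are assumed.

```python
import pandas as pd

# Hypothetical placeholder rows shaped like tweets_df1 (not real data)
df = pd.DataFrame({
    "id": [1, 2],
    "content": ["first tweet", "second tweet"],
    "user": [
        {"username": "example_a", "displayname": "Example A"},
        {"username": "example_b", "displayname": "Example B"},
    ],
})

# Expand each dict in the `user` column into columns prefixed with "user_"
user_cols = pd.json_normalize(df["user"].tolist()).add_prefix("user_")

# Join the new columns back on by position and drop the nested column
flat_df = pd.concat([df.drop(columns="user").reset_index(drop=True), user_cols], axis=1)
print(flat_df.columns.tolist())
# ['id', 'content', 'user_username', 'user_displayname']
```

On the real data, the same two steps would apply with `tweets_df1` in place of `df`.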


```python
# Export the DataFrame to a CSV file, leaving out the pandas index
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
```
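A CSV file stores every value as text, so after the export above, nested columns such as `user` come back as plain strings, and `date` comes back as text unless it is parsed. A minimal sketch of a round trip, assuming you only need a few flat columns, selects them before writing and uses `parse_dates` when reading back. The in-memory buffer and placeholder rows stand in for `user-tweets.csv` and are hypothetical.

```python
import io
import pandas as pd

# Hypothetical placeholder rows using column names from the snscrape output
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-18 19:49:02+00:00", "2020-11-18 18:59:50+00:00"], utc=True),
    "content": ["placeholder one", "placeholder two"],
    "likeCount": [10, 20],
})

# Keep only flat columns, then write CSV text into an in-memory buffer
buffer = io.StringIO()
df[["date", "content", "likeCount"]].to_csv(buffer, index=False)

# Read it back; parse_dates turns the `date` text back into datetimes
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["date"])
print(restored.dtypes)
```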

# Query by Text Search
The code below scrapes up to 500 tweets that match a text search and were posted from June 1st, 2020 up to (but not including) July 31st, 2020, and then exports them to a CSV file with Pandas.

```python
# Set the variables used in the format-string command below
tweet_count = 500
text_query = "its the elephant"
since_date = "2020-06-01"
until_date = "2020-07-31"

# Use the os module to run the snscrape CLI command from Python.
# --since is passed to snscrape itself (before the scraper name), while until: is part of
# the Twitter search query and only matches tweets sent before that date
os.system('snscrape --jsonl --max-results {} --since {} twitter-search "{} until:{}" > text-query-tweets.json'.format(tweet_count, since_date, text_query, until_date))
```
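Twitter's `until:` operator only matches tweets sent before the given date, so `until:2020-07-31` stops at the end of July 30th, which is consistent with the latest tweet in the output below. If you want July 31st included, one approach is to add one day to the last date you want before building the query. The sketch below does that and returns the arguments as a list that could be passed to Python's `subprocess.run`; the helper name `build_text_command` is hypothetical, and the flags and operator are the ones used in the command above.

```python
from datetime import date, timedelta

def build_text_command(text_query, first_day, last_day, tweet_count):
    """Build snscrape arguments intended to cover first_day through last_day.

    first_day and last_day are 'YYYY-MM-DD' strings.
    """
    start = date.fromisoformat(first_day)
    end = date.fromisoformat(last_day)
    if end < start:
        raise ValueError("last_day must not be earlier than first_day")
    # until: is exclusive, so move it one day past the last day we want
    until_day = end + timedelta(days=1)
    query = "{} until:{}".format(text_query, until_day.isoformat())
    return ["snscrape", "--jsonl", "--max-results", str(tweet_count),
            "--since", start.isoformat(), "twitter-search", query]

args = build_text_command("its the elephant", "2020-06-01", "2020-07-31", 500)
print(args[-1])  # its the elephant until:2020-08-01
```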

```python
# Read the JSON Lines file generated by the CLI command above into a pandas DataFrame
tweets_df2 = pd.read_json('text-query-tweets.json', lines=True)

# Display the first 5 rows of the DataFrame
tweets_df2.head()
```

,url,date,content,renderedContent,id,user,outlinks,tcooutlinks,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,media,retweetedTweet,quotedTweet,mentionedUsers
0,https://twitter.com/TylerPaulUtt1/status/12889...,2020-07-30 23:57:02+00:00,@SiBuduh @langoinstitute do you know the Ko wo...,@SiBuduh @langoinstitute do you know the Ko wo...,1288986997143601152,"{'username': 'TylerPaulUtt1', 'displayname': '...",[],[],1,0,0,0,1288307058928947204,en,"<a href=""http://twitter.com/#!/download/ipad"" ...",,,,"[{'username': 'SiBuduh', 'displayname': 'Ed Lu..."
1,https://twitter.com/EndlessSynthwav/status/128...,2020-07-30 23:44:04+00:00,@RockstarGames Any idea if the elephant rifle ...,@RockstarGames Any idea if the elephant rifle ...,1288983731122966534,"{'username': 'EndlessSynthwav', 'displayname':...",[],[],0,0,0,0,1288983731122966534,en,"<a href=""http://twitter.com/download/android"" ...",,,,"[{'username': 'RockstarGames', 'displayname': ..."
2,https://twitter.com/aanalyst50/status/12889677...,2020-07-30 22:40:40+00:00,@realDonaldTrump Trump just keeps ignoring the...,@realDonaldTrump Trump just keeps ignoring the...,1288967774795116550,"{'username': 'aanalyst50', 'displayname': 'Don...",[],[],0,0,1,0,1288966119676616704,en,"<a href=""http://twitter.com/download/iphone"" r...",,,,"[{'username': 'realDonaldTrump', 'displayname'..."
3,https://twitter.com/RozeyBozzy/status/12889669...,2020-07-30 22:37:18+00:00,@cslogan88 Famous 19th century song from Engla...,@cslogan88 Famous 19th century song from Engla...,1288966929236066309,"{'username': 'RozeyBozzy', 'displayname': 'Roz...",[],[],0,0,0,0,1288965246602838017,en,"<a href=""https://mobile.twitter.com"" rel=""nofo...",,,,"[{'username': 'cslogan88', 'displayname': 'Chr..."
4,https://twitter.com/alfred_hanan/status/128896...,2020-07-30 22:32:44+00:00,@realDonaldTrump #RepublicanTrumpVirus.\nLets ...,@realDonaldTrump #RepublicanTrumpVirus.\nLets ...,1288965780030144512,"{'username': 'alfred_hanan', 'displayname': 'A...",[],[],0,0,0,0,1288947487911419905,en,"<a href=""http://twitter.com/download/android"" ...",,,,"[{'username': 'realDonaldTrump', 'displayname'..."
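Since `--max-results` caps the number of tweets, it is worth checking which dates the DataFrame actually covers before relying on it. The sketch below uses hypothetical placeholder rows with the same `date` and `lang` columns; on the real data, `tweets_df2` would take the place of `df`. `pd.read_json` normally parses a column named `date` into datetimes, which the sketch mimics with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical placeholder rows shaped like tweets_df2 (not real data)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-07-30 23:57:02+00:00",
        "2020-07-29 12:00:00+00:00",
        "2020-06-15 08:30:00+00:00",
    ], utc=True),
    "lang": ["en", "und", "en"],
    "content": ["placeholder a", "placeholder b", "placeholder c"],
})

# Earliest and latest tweet actually returned by the scrape
print("From", df["date"].min(), "to", df["date"].max())

# Keep only tweets whose `lang` value is English
english_df = df[df["lang"] == "en"]

# Count tweets per calendar day (UTC)
per_day = df.groupby(df["date"].dt.date).size()
print(per_day)
```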


```python
# Export the DataFrame to a CSV file, leaving out the pandas index
tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)
```
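Tweet text often contains emoji and other non-ASCII characters, such as the ❤️ in the first query's output. `to_csv` writes UTF-8 by default, which some spreadsheet programs, notably Excel, may misread unless the file starts with a byte order mark. If that happens, one option is `encoding='utf-8-sig'`, sketched below with a hypothetical placeholder row and a temporary file path.

```python
import os
import tempfile
import pandas as pd

# Hypothetical placeholder row containing non-ASCII text
df = pd.DataFrame({"content": ["placeholder ❤️"]})

out_path = os.path.join(tempfile.mkdtemp(), "tweets-excel.csv")

# 'utf-8-sig' prepends the UTF-8 byte order mark (EF BB BF) to the file
df.to_csv(out_path, index=False, encoding="utf-8-sig")

with open(out_path, "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```

Reading the file back with `pd.read_csv(out_path, encoding="utf-8-sig")` strips the byte order mark again.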