# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper
<br>Package Github: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook will be using the development version of snscrape

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 28th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's Python wrapper.

<b>Dependencies: </b> 
- Your <b>Python</b> version must be <b>3.8</b> or higher; the development version of snscrape does not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>: if you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>: DataFrames make the data easy to manipulate and index. This is more of a preference, but it is what I follow in this notebook.
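
Before installing anything, it can help to confirm that the interpreter meets the version requirement listed above. The check below is a minimal sketch; the names `MIN_PYTHON` and `python_is_supported` are hypothetical helpers and are not part of snscrape.

```python
import sys

# Minimum version taken from the dependency list above
MIN_PYTHON = (3, 8)

def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum

if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required; found {found}")
```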

In [1]:
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the below command if you don't already have Pandas
# !pip install pandas

# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Query by Username
The code below scrapes a user's most recent tweets, capped by `maxTweets` (set to 10 in the cell below), then writes them to a CSV file with Pandas.

In [2]:
# Setting variables to be used below
maxTweets = 10

# Creating list to append tweet data to
tweets_list1 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    # i is zero-based, so stop as soon as maxTweets tweets have been collected
    if i >= maxTweets:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.retweetCount, tweet.likeCount, tweet.replyCount, tweet.content, tweet.lang, tweet.user.username])
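
The `enumerate` / `break` pattern above caps how many items are pulled from a generator. An alternative way to express the same cap is `itertools.islice`, sketched here against a stand-in generator so it runs without network access. `take` and `fake_items` are hypothetical names used only for illustration; in the notebook the iterable would be `sntwitter.TwitterSearchScraper('from:jack').get_items()`.

```python
from itertools import islice

def take(items, limit):
    """Return at most `limit` items from any iterable."""
    return list(islice(items, limit))

def fake_items():
    # Hypothetical stand-in for a scraper's endless get_items() generator
    n = 0
    while True:
        yield n
        n += 1

first_ten = take(fake_items(), 10)
print(len(first_ten))
```

Because `islice` stops pulling from the generator once the limit is reached, there is no index comparison to get off by one.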


In [3]:

# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'RetweetCount', 'Likes','Replies','Text','language','Username'])

# Display first 5 entries from dataframe
tweets_df1.head()

| | Datetime | Tweet Id | RetweetCount | Likes | Replies | Text | language | Username |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-31 22:38:10+00:00 | 1620552043609096192 | 606 | 2423 | 247 | and Google Play Store: https://t.co/1Ve7GIBG0F... | en | jack |
| 1 | 2023-01-31 22:38:09+00:00 | 1620552041600000000 | 1960 | 7723 | 848 | a milestone for open protocols...\n\n#nostr is... | en | jack |
| 2 | 2023-01-02 01:13:43+00:00 | 1609719555827408897 | 64 | 263 | 45 | @damusapp 👀 | und | jack |
| 3 | 2023-01-01 06:00:30+00:00 | 1609429338641793027 | 1481 | 16128 | 2274 | pura vida | pt | jack |
| 4 | 2022-12-30 19:05:37+00:00 | 1608902142924062722 | 37 | 84 | 13 | @jb55 👀 | und | jack |


In [4]:
# Export dataframe into a CSV
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
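
When the exported CSV is read back later, tweet IDs are worth loading as strings: they are 19-digit integers, and any step that routes them through a float or a spreadsheet can round the trailing digits. The sketch below uses an in-memory two-row sample based on the output above (emoji omitted) rather than the real `user-tweets.csv`, so the variable names and sample data are illustrative assumptions.

```python
import io
import pandas as pd

# Two-row sample in the same column layout as user-tweets.csv
sample_csv = io.StringIO(
    "Datetime,Tweet Id,RetweetCount,Likes,Replies,Text,language,Username\n"
    "2023-01-02 01:13:43+00:00,1609719555827408897,64,263,45,@damusapp,und,jack\n"
    "2023-01-01 06:00:30+00:00,1609429338641793027,1481,16128,2274,pura vida,pt,jack\n"
)

# Keep IDs as text so no digits are lost
loaded = pd.read_csv(sample_csv, dtype={"Tweet Id": str})

# Convert the timestamp strings back into timezone-aware datetimes
loaded["Datetime"] = pd.to_datetime(loaded["Datetime"], utc=True)

print(loaded.dtypes)
```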

# Query by Text Search
The code below scrapes up to 5,000 tweets per keyword (here "concern" and "problem") by text search, restricted to 2018 with `since:2018-01-01 until:2018-12-31`, then combines the results and writes them to a CSV file with Pandas.
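
The search string passed to `TwitterSearchScraper` in the cell below is built by concatenating fixed words, a keyword, and the `since:` / `until:` date operators. A small helper can make that assembly explicit and easier to vary. This is a sketch only: `build_search_query` and its `prefix` / `suffix` parameters are hypothetical names, chosen so that the defaults reproduce the string used in this notebook.

```python
from datetime import date

def build_search_query(keyword, since, until, prefix="social", suffix="Africa"):
    """Assemble a search string with since:/until: date operators."""
    return (
        f"{prefix} {keyword} {suffix} "
        f"since:{since.isoformat()} until:{until.isoformat()}"
    )

query = build_search_query("concern", date(2018, 1, 1), date(2018, 12, 31))
print(query)  # social concern Africa since:2018-01-01 until:2018-12-31
```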

In [11]:
# Scrape up to maxTweets tweets matching 'social <title> Africa' posted in 2018
def twitter_scraper(title, maxTweets):

    # Creating list to append tweet data to
    tweets_list2 = []

    # Using TwitterSearchScraper to scrape data and append tweets to list
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('social ' + title + ' Africa since:2018-01-01 until:2018-12-31').get_items()):
        # i is zero-based, so stop as soon as maxTweets tweets have been collected
        if i >= maxTweets:
            break
        tweets_list2.append([tweet.date, tweet.id, tweet.retweetCount, tweet.likeCount, tweet.replyCount, tweet.content, tweet.lang, tweet.user.username])

    return tweets_list2

# Run twitter_scraper once per keyword and collect one result list per keyword
def list_twitter_scraper(list_title, maxTweets):
    list_title_tweets = []
    for title in list_title:
        list_title_tweets.append(twitter_scraper(title, maxTweets))

    return list_title_tweets

tweet_lists = list_twitter_scraper(["concern", "problem"], 5000)
# tweet_lists = list_twitter_scraper(["concern", "problem", "challenge", "worry", "issue", "question"], 5000)


In [12]:
# Creating one dataframe per keyword from the tweet lists above
tweet_df = []
for tweet in tweet_lists:
    tweet_df.append(pd.DataFrame(tweet, columns=['Datetime', 'Tweet Id', 'RetweetCount', 'Likes','Replies', 'Text','language','Username']))

# Combine the per-keyword dataframes and display the first 10 entries
df_tweet = pd.concat(tweet_df)
df_tweet.head(10)

| | Datetime | Tweet Id | RetweetCount | Likes | Replies | Text | language | Username |
|---|---|---|---|---|---|---|---|---|
| 0 | 2018-12-23 20:17:23+00:00 | 1076934818481491968 | 1 | 37 | 4 | 4/ Any changes to our fares will be communicat... | en | pamushana_ |
| 1 | 2018-12-22 19:11:33+00:00 | 1076555860766322688 | 0 | 0 | 0 | The South African Police Service has noted wit... | en | CPFWierda |
| 2 | 2018-12-21 08:00:37+00:00 | 1076024627892232192 | 0 | 0 | 0 | #AlShabaab, #ISIL and #BokoHaram all use socia... | en | RANDEurope |
| 3 | 2018-12-15 08:45:54+00:00 | 1073861695544668160 | 0 | 3 | 0 | @Zemedeneh @toluogunlesi @Heritage Quality con... | en | ilnana55 |
| 4 | 2018-12-11 03:58:18+00:00 | 1072339769009553408 | 14 | 36 | 39 | #ICYMI #IFP leader Mangosuthu Buthelezi expres... | en | SABCNews |
| 5 | 2018-12-09 07:40:52+00:00 | 1071671004664131584 | 0 | 0 | 0 | @IFPinParliament leader #MangosuthuButhelezi h... | en | KiratLalla |
| 6 | 2018-12-09 07:40:24+00:00 | 1071670887584329728 | 0 | 0 | 0 | @IFPinParliament leader #MangosuthuButhelezi h... | en | SAfmnews |
| 7 | 2018-12-09 07:25:48+00:00 | 1071667212321218560 | 0 | 0 | 1 | IFP leader Mangosuthu Buthelezi has expressed... | en | Radio2000_ZA |
| 8 | 2018-12-01 20:38:03+00:00 | 1068967486241075200 | 0 | 1 | 1 | @FrVonk @Kholofelo_Nubia @MerchantPickens @Bru... | en | grootdawid |
| 9 | 2018-12-01 02:40:32+00:00 | 1068696320473530368 | 0 | 0 | 0 | @ProfChalmers I am writing from Sierra Leone ... | en | ChildrenWelfare |
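
Because each keyword is searched separately, a tweet that contains both "concern" and "problem" can appear in more than one per-keyword result, and the combined frame does not record which keyword found a row. One possible refinement, sketched here on small made-up frames rather than scraped data, labels each row with its keyword during `pd.concat` and then drops repeated tweet IDs. The `Keyword` column and the sample rows are assumptions for illustration.

```python
import pandas as pd

columns = ["Tweet Id", "Text"]

# Hypothetical per-keyword results; tweet 2 matches both keywords
frames = {
    "concern": pd.DataFrame([[1, "a concern"], [2, "a concern and a problem"]], columns=columns),
    "problem": pd.DataFrame([[2, "a concern and a problem"], [3, "a problem"]], columns=columns),
}

# Passing a dict to concat uses its keys as an outer index level,
# which reset_index then turns into an ordinary 'Keyword' column
combined = pd.concat(frames, names=["Keyword", "row"]).reset_index(level="Keyword")

# Keep the first occurrence of each tweet ID and renumber the rows
deduped = combined.drop_duplicates(subset="Tweet Id").reset_index(drop=True)
print(deduped)
```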


In [13]:
# Export dataframe into a CSV
df_tweet.to_csv('concerns20182.csv', sep=',', index=False)