# SNScrape Twitter Scraper

In [2]:
import pandas as pd
from tqdm.auto import tqdm
import snscrape.modules.twitter as sntwitter



## Available Functions of Snscrape

In [3]:
print(dir(sntwitter))

['Coordinates', 'DescriptionURL', 'Gif', 'GuestTokenManager', 'Medium', 'Photo', 'Place', 'TextLink', 'Trend', 'Tweet', 'TwitterHashtagScraper', 'TwitterListPostsScraper', 'TwitterProfileScraper', 'TwitterSearchScraper', 'TwitterTrendsScraper', 'TwitterTweetScraper', 'TwitterTweetScraperMode', 'TwitterUserScraper', 'User', 'UserLabel', 'Video', 'VideoVariant']


We can use the scraper classes listed above (for example `TwitterSearchScraper` and `TwitterUserScraper`) to scrape tweets from Twitter. Unlike the Twitter API, snscrape does not require authentication. However, Twitter rate-limits requests to its servers, so if you are scraping a large number of tweets, you may need to wait a few minutes between requests.
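
One way to stay under the rate limit is to cap how many items are pulled from a scraper and to pause between successive queries. The sketch below is a minimal illustration, not part of snscrape: `take_first` and `scrape_politely` are hypothetical helper names, the pause length is an arbitrary assumption, and a plain generator with made-up items stands in for `get_items()` so the block runs without network access.

```python
import itertools
import time


def take_first(items, limit):
    """Return at most `limit` items from any iterable (e.g. a scraper's get_items())."""
    return list(itertools.islice(items, limit))


def scrape_politely(make_items, queries, limit, pause_seconds=0.0):
    """Collect up to `limit` items per query, sleeping between queries.

    `make_items` is any callable mapping a query string to an iterable of
    results; with snscrape it could be
    lambda q: sntwitter.TwitterSearchScraper(q).get_items()
    """
    results = {}
    for position, query in enumerate(queries):
        if position > 0:
            time.sleep(pause_seconds)  # wait before starting the next query
        results[query] = take_first(make_items(query), limit)
    return results


# Stand-in for a scraper: yields fake "tweets" forever (made-up data).
def fake_items(query):
    for n in itertools.count():
        yield f"{query} tweet {n}"


demo = scrape_politely(fake_items, ["#a", "#b"], limit=3)
print(demo)
```

Because `itertools.islice` stops pulling from the generator once the cap is reached, no extra requests are triggered beyond the ones needed for `limit` items.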

## Scraping Tweets

In [4]:
## Let's scrape tweets with the hashtag #MurdaughTrial (the first six results)
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('#MurdaughTrial').get_items()):
    if i > 5:
        break
    # `content` is deprecated; `rawContent` holds the tweet text
    print(tweet.rawContent)

I’m watching testimony that I had missed out on and in regards to PM and the boat crash, I’d just once like to hear that family say their sorry for ALL of the behavior/mistakes/deception. Like full on take responsibility. #MurdaughTrial
Well this family is some real upstanding Christian citizens I see. Apples 🍎 didn't fall to far from the tree at all. #murdaughtrial
LOL Waters fucked up 🤣🤣
#AlexMurdaughTrial what a freaking joke, just goes to show all they had was some BS motive and wouldn't allow him to confess to it so they could use it in this trial #MurdaughTrial

https://t.co/zUDAssBW7A
#MurdaughTrial Alex Murdaugh used the restroom before testifying and then had the severe dry mouth. It got me wondering...would performance anxiety meds cause dry mouth? Why yes!!! 🤔😳💊 #Drymouth #AlexMurdaughTrial https://t.co/FSDfO83a2v
Paul Paul or paw paw solved. #MurdaughTrial #AlexMurdaugh https://t.co/BJRyvdXJpX
Not feeling sorry for the brothers of AM anymore after watching that Netflix doc!



### Tweet data

If we examine the data structure of a tweet, we can see that it contains a lot of information. The following is a list of the attributes of a tweet object:

- `card`
- `cashtags`
- `content`
- `conversationId`
- `coordinates`
- `date`
- `hashtags`
- `id`
- `inReplyToTweetId`
- `inReplyToUser`
- `json`
- `lang`
- `likeCount`
- `links`
- `media`
- `mentionedUsers`
- `outlinks`
- `outlinksss`
- `place`
- `quoteCount`
- `quotedTweet`
- `rawContent`
- `renderedContent`
- `replyCount`
- `retweetCount`
- `retweetedTweet`
- `source`
- `sourceLabel`
- `sourceUrl`
- `tcooutlinks`
- `tcooutlinksss`
- `url`
- `user`
- `username`
- `vibe`
- `viewCount`
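
Most analyses need only a handful of these attributes. The sketch below copies selected fields into a flat dictionary with `getattr`, so a missing attribute becomes `None` instead of raising an error. `tweet_to_record` and the field selection are illustrative assumptions, and a `SimpleNamespace` with made-up values stands in for a real `Tweet` object.

```python
from types import SimpleNamespace

# Fields to keep; an arbitrary selection from the attribute list.
FIELDS = ("id", "date", "rawContent", "likeCount", "retweetCount", "viewCount")


def tweet_to_record(tweet, fields=FIELDS):
    """Copy the chosen attributes of a tweet-like object into a plain dict."""
    record = {name: getattr(tweet, name, None) for name in fields}
    # `user` is itself an object; keep only its username when present.
    user = getattr(tweet, "user", None)
    record["username"] = getattr(user, "username", None)
    return record


# Stand-in object with made-up values (not real scraped data).
fake_tweet = SimpleNamespace(
    id=1,
    date="2023-01-01",
    rawContent="hello",
    likeCount=3,
    retweetCount=0,
    user=SimpleNamespace(username="example_user"),
)

record = tweet_to_record(fake_tweet)
print(record)
```

A list of such records can be handed directly to `pd.DataFrame`, which yields one plain column per selected field.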

## Other Functions of Snscrape

In [5]:
dir(sntwitter)

['Coordinates',
 'DescriptionURL',
 'Gif',
 'GuestTokenManager',
 'Medium',
 'Photo',
 'Place',
 'TextLink',
 'Trend',
 'Tweet',
 'TwitterHashtagScraper',
 'TwitterListPostsScraper',
 'TwitterProfileScraper',
 'TwitterSearchScraper',
 'TwitterTrendsScraper',
 'TwitterTweetScraper',
 'TwitterTweetScraperMode',
 'TwitterUserScraper',
 'User',
 'UserLabel',
 'Video',
 'VideoVariant']

In [6]:
## We can use the notebook magic to explore the documentation
sntwitter.Trend?

Init signature:
sntwitter.Trend(
    name: str,
    domainContext: str,
    metaDescription: Optional[str] = None,
) -> None
Docstring:      Trend(name: str, domainContext: str, metaDescription: Optional[str] = None)
File:           /media/james/Projects/GitHub/DATA_340_NLP/Notebooks/venv/lib/python3.10/site-packages/snscrape/modules/twitter.py
Type:           type
Subclasses:     
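
Outside a notebook, where the `?` magic is unavailable, the standard library's `inspect` module gives similar information. The sketch below uses a hypothetical stand-in dataclass, `TrendLike`, that mirrors the fields in the signature above so it runs without snscrape installed; with the library available, the same calls could be applied to `sntwitter.Trend`.

```python
import inspect
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrendLike:
    """Stand-in mirroring the fields of the Trend signature (not snscrape's class)."""
    name: str
    domainContext: str
    metaDescription: Optional[str] = None


signature = inspect.signature(TrendLike)
param_names = list(signature.parameters)
print(param_names)

# Parameters without a default value are required.
required = [
    name for name, param in signature.parameters.items()
    if param.default is inspect.Parameter.empty
]
print(required)
```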

## Let's build a political data set: Tweets from leading politicians

In [7]:
## Our politicians

politicians_usernames = ['realDonaldTrump', 'JoeBiden', 'BernieSanders',
               'KamalaHarris', 'AndrewYang', 'PeteButtigieg', 
               'ewarren', 'Mike_Pence', 'MikeBloomberg', 'GovBillWeld',
               'SenGillibrand', 'amyklobuchar', 'TomSteyer',
               'TulsiGabbard', 'GovMikeHuckabee']

In [74]:
## Fetch the most recent tweets for each politician (up to 102 each)

political_tweets = []

for user_name in politicians_usernames:
    for i, tweet in enumerate(sntwitter.TwitterUserScraper(user_name).get_items()):
        political_tweets.append(tweet) 
        if i > 100:
            break
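
Because the loop appends each tweet before testing `i > 100`, it keeps up to 102 tweets per user. When an exact cap matters, `itertools.islice` is one alternative to the `enumerate`/`break` pattern. The sketch below is a minimal illustration: `collect_per_user` is a hypothetical helper, the cap of 50 is an arbitrary choice, and a generator of made-up items stands in for `TwitterUserScraper(...).get_items()` so the block runs offline.

```python
import itertools


def collect_per_user(usernames, make_items, limit):
    """Gather at most `limit` items for each username.

    `make_items` maps a username to an iterable; with snscrape it could be
    lambda u: sntwitter.TwitterUserScraper(u).get_items()
    """
    collected = []
    for username in usernames:
        collected.extend(itertools.islice(make_items(username), limit))
    return collected


# Stand-in scraper yielding made-up items, so the sketch runs offline.
def fake_user_items(username):
    return (f"{username}:{n}" for n in range(1000))


tweets = collect_per_user(["alice", "bob"], fake_user_items, limit=50)
print(len(tweets))  # 2 users x 50 items each
```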

In [76]:
## Convert our tweets list to a dataframe

df = pd.DataFrame([tweet.__dict__ for tweet in political_tweets])

In [77]:
# Examine the dataframe columns (which correspond to attributes of the Tweet object listed above)

df.columns

Index(['url', 'date', 'rawContent', 'renderedContent', 'id', 'user',
       'replyCount', 'retweetCount', 'likeCount', 'quoteCount',
       'conversationId', 'lang', 'source', 'sourceUrl', 'sourceLabel', 'links',
       'media', 'retweetedTweet', 'quotedTweet', 'inReplyToTweetId',
       'inReplyToUser', 'mentionedUsers', 'coordinates', 'place', 'hashtags',
       'cashtags', 'card', 'viewCount', 'vibe'],
      dtype='object')

In [78]:
# Check the data type of each column

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1428 entries, 0 to 1427
Data columns (total 29 columns):
 #   Column            Non-Null Count  Dtype              
---  ------            --------------  -----              
 0   url               1428 non-null   object             
 1   date              1428 non-null   datetime64[ns, UTC]
 2   rawContent        1428 non-null   object             
 3   renderedContent   1428 non-null   object             
 4   id                1428 non-null   int64              
 5   user              1428 non-null   object             
 6   replyCount        1428 non-null   int64              
 7   retweetCount      1428 non-null   int64              
 8   likeCount         1428 non-null   int64              
 9   quoteCount        1428 non-null   int64              
 10  conversationId    1428 non-null   int64              
 11  lang              1428 non-null   object             
 12  source            1428 non-null   object             
 13  sou

In [80]:
most_viewed_tweets = df.sort_values(by='viewCount', ascending=False).head(10).copy()
most_viewed_tweets

| | url | date | rawContent | renderedContent | id | user | replyCount | retweetCount | likeCount | quoteCount | ... | inReplyToTweetId | inReplyToUser | mentionedUsers | coordinates | place | hashtags | cashtags | card | viewCount | vibe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 39 | https://twitter.com/JoeBiden/status/1623157185... | 2023-02-08 03:10:04+00:00 | Make no mistake: If Congress passes a national... | Make no mistake: If Congress passes a national... | 1623157185868972036 | https://twitter.com/JoeBiden | 8495 | 13161 | 242289 | 2234 | ... | | | | | | | | | 21767682.0 | |
| 585 | https://twitter.com/ewarren/status/16200905490... | 2023-01-30 16:04:21+00:00 | D.C. should be a state. | D.C. should be a state. | 1620090549037588482 | https://twitter.com/ewarren | 31505 | 7409 | 79296 | 3561 | ... | | | | | | | | | 12806129.0 | |
| 84 | https://twitter.com/JoeBiden/status/1618420963... | 2023-01-26 01:30:01+00:00 | If Republicans try to cut Social Security or M... | If Republicans try to cut Social Security or M... | 1618420963942580225 | https://twitter.com/JoeBiden | 20275 | 11972 | 100846 | 1892 | ... | | | | | | | | | 9762133.0 | |
| 76 | https://twitter.com/JoeBiden/status/1619495290... | 2023-01-29 00:39:00+00:00 | You paid for your Social Security. Every singl... | You paid for your Social Security. Every singl... | 1619495290284740609 | https://twitter.com/JoeBiden | 19191 | 12244 | 93217 | 1573 | ... | | | | | | | | | 9276022.0 | |
| 23 | https://twitter.com/JoeBiden/status/1625188557... | 2023-02-13 17:42:01+00:00 | Let's give public school teachers a raise. | Let's give public school teachers a raise. | 1625188557668204570 | https://twitter.com/JoeBiden | 13675 | 7292 | 74354 | 1847 | ... | | | | | | | | | 8406298.0 | |
| 40 | https://twitter.com/JoeBiden/status/1623156251... | 2023-02-08 03:06:21+00:00 | Ban assault weapons now. Once and for all.\n\n... | Ban assault weapons now. Once and for all.\n\n... | 1623156251956838400 | https://twitter.com/JoeBiden | 13671 | 7228 | 74279 | 1456 | ... | | | | | | | | | 7165095.0 | |
| 216 | https://twitter.com/KamalaHarris/status/162734... | 2023-02-19 16:47:06+00:00 | My message to Black women and girls everywhere... | My message to Black women and girls everywhere... | 1627349063791255553 | https://twitter.com/KamalaHarris | 13432 | 6023 | 55101 | 1835 | ... | | | | | | | | | 6659359.0 | |
| 21 | https://twitter.com/JoeBiden/status/1625279676... | 2023-02-13 23:44:05+00:00 | America is back and we're leading the world ag... | America is back and we're leading the world ag... | 1625279676158554112 | https://twitter.com/JoeBiden | 14519 | 2413 | 21412 | 1140 | ... | | | | | | | | | 5764167.0 | |
| 30 | https://twitter.com/JoeBiden/status/1623854765... | 2023-02-10 01:22:00+00:00 | Let’s make it real simple:\n\nIf Republicans t... | Let’s make it real simple:\n\nIf Republicans t... | 1623854765519036417 | https://twitter.com/JoeBiden | 11791 | 6499 | 54320 | 784 | ... | | | | | | | | | 5514557.0 | |
| 10 | https://twitter.com/JoeBiden/status/1626391480... | 2023-02-17 01:22:00+00:00 | Let me tell you a secret about trickle-down ec... | Let me tell you a secret about trickle-down ec... | 1626391480888696833 | https://twitter.com/JoeBiden | 15565 | 4727 | 39510 | 1286 | ... | | | | | | | | | 5470390.0 | |
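
Since the top ten are dominated by a single account, a per-account view can be more informative. The sketch below pulls the account name out of the tweet URL and keeps each account's most-viewed tweet. It assumes pandas is installed and uses a small hand-made frame with invented numbers in place of the scraped `df`; the `username` column is introduced here purely for illustration.

```python
import pandas as pd

# Hand-made stand-in for the scraped frame (invented values, not real data).
sample = pd.DataFrame({
    "url": [
        "https://twitter.com/alice/status/1",
        "https://twitter.com/alice/status/2",
        "https://twitter.com/bob/status/3",
    ],
    "viewCount": [100.0, 250.0, 40.0],
})

# The account name is the path segment before "/status/".
sample["username"] = sample["url"].str.extract(
    r"twitter\.com/([^/]+)/status/", expand=False
)

# Row label of the most-viewed tweet per account, then select those rows.
top_idx = sample.groupby("username")["viewCount"].idxmax()
top_per_user = sample.loc[top_idx].sort_values("viewCount", ascending=False)
print(top_per_user[["username", "viewCount"]])
```

On the real `df`, rows with a missing `viewCount` would likely need to be dropped first, for example with `df.dropna(subset=['viewCount'])`, before calling `idxmax`.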
