<a href="https://www.kaggle.com/code/ankitkumar2635/scrape-tweets-without-twitter-s-api?scriptVersionId=113300432" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# This notebook shows how to scrape tweets without Twitter's API 
## Using "snscrape"
snscrape is a scraper for social networking services (SNS). It scrapes things like user profiles, hashtags, or searches and returns the discovered items, e.g. the relevant posts.

*  It can scrape historical tweets from any time period, whereas Twitter's free API only lets you search tweets from the past week.
*  snscrape is largely undocumented, so its usage can be confusing.


#### Author: Ankit Kumar

#### Created on: December 6, 2022

### Installing snscrape

```python
!pip install snscrape
```

```
Collecting snscrape
  Downloading snscrape-0.3.4-py3-none-any.whl (35 kB)
Installing collected packages: snscrape
Successfully installed snscrape-0.3.4
```

### Import the required libraries 

```python
import snscrape.modules.twitter as sntwitter
from tqdm.notebook import tqdm  # show a progress bar for loops
import pandas as pd
```

# 1. Scraping a user's tweets
## Let's scrape 500 tweets from Elon Musk (@elonmusk)

```python
# Create a list to hold the attributes of each scraped tweet
musk_tweet_list = []
```

```python
# Set up the query (what you wish to scrape)
query = 'from:elonmusk'

# Pass the query to TwitterSearchScraper; get_items() yields matching tweets one by one
get_tweets = sntwitter.TwitterSearchScraper(query).get_items()

# Loop over the tweets and append the desired attributes
# (date, tweet ID, text, username) to musk_tweet_list, stopping after 500
for i, tweet in enumerate(tqdm(get_tweets, total=500)):
    if i > 499:
        break
    musk_tweet_list.append([tweet.date, tweet.id, tweet.content, tweet.username])
```

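The `enumerate` plus `break` pattern above works, but `itertools.islice` expresses "take the first N items" directly and stops pulling from the generator once the limit is reached (the loop above requests a 501st tweet before it breaks). Below is a minimal sketch; `take_rows`, `FakeTweet` and `fake_stream` are hypothetical names, and the fake generator only stands in for `get_items()` so the example runs offline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import islice
from typing import Iterable, Iterator


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a scraped tweet (same four attributes used above)."""
    date: datetime
    id: int
    content: str
    username: str


def take_rows(tweets: Iterable, limit: int) -> list:
    """Collect [date, id, content, username] rows from at most `limit` tweets.

    islice stops after `limit` items, so no counter or `break` is needed.
    """
    return [[t.date, t.id, t.content, t.username] for t in islice(tweets, limit)]


def fake_stream() -> Iterator[FakeTweet]:
    """An endless generator, mimicking get_items() on a very active account."""
    n = 0
    while True:
        yield FakeTweet(datetime(2022, 12, 8, tzinfo=timezone.utc), n, f"tweet {n}", "elonmusk")
        n += 1


rows = take_rows(fake_stream(), 500)
print(len(rows))  # 500
```

In the notebook this would read `musk_tweet_list = take_rows(tqdm(get_tweets, total=500), 500)`.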

## Let's store the data in a DataFrame

```python
musk_tweets = pd.DataFrame(musk_tweet_list, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
musk_tweets.head()
```

|   | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2022-12-08 09:44:29+00:00 | 1600788394375753729 | Woke v Woke https://t.co/hmhC5eelik | elonmusk |
| 1 | 2022-12-08 09:25:59+00:00 | 1600783739633643521 | @EthanBitcoin @dergigi lol | elonmusk |
| 2 | 2022-12-08 07:20:25+00:00 | 1600752140711972865 | @EddieZipperer https://t.co/92voOIH5d8 | elonmusk |
| 3 | 2022-12-07 20:41:51+00:00 | 1600591440186179585 | @ChrisJBakke 🤣🤣 | elonmusk |
| 4 | 2022-12-07 20:35:39+00:00 | 1600589878982758400 | @jack Most important data was hidden (from you... | elonmusk |
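`head()` only shows five rows, so eyeballing the Username column checks just a sample. A minimal sketch of checking every scraped row instead; `unexpected_authors` is a hypothetical helper that works on rows shaped like the entries of `musk_tweet_list`, and it ignores case because Twitter handles are case-insensitive.

```python
def unexpected_authors(rows: list, expected_username: str) -> list:
    """Return the usernames (4th field of each row) that differ from expected_username."""
    expected = expected_username.lower()
    return sorted({row[3] for row in rows if row[3].lower() != expected})


# Rows shaped like musk_tweet_list entries: [date, id, text, username]
# (dates written as strings here only for brevity)
sample_rows = [
    ["2022-12-08 09:44:29+00:00", 1600788394375753729, "Woke v Woke https://t.co/hmhC5eelik", "elonmusk"],
    ["2022-12-08 09:25:59+00:00", 1600783739633643521, "@EthanBitcoin @dergigi lol", "elonmusk"],
]
print(unexpected_authors(sample_rows, "elonmusk"))  # []
```

In the notebook, `assert len(musk_tweet_list) == 500 and not unexpected_authors(musk_tweet_list, "elonmusk")` would check the whole scrape at once.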


We have scraped 500 tweets from @elonmusk; as a quick check, look at the Username column. Next, let's look at how to scrape mentions.

# 2. Scraping mentions
### Let's scrape mentions of Amazon India (@amazonIN)

```python
amazon_tweets_list = []

# Set up the query: tweets mentioning @amazonIN, English only
query = '@amazonIN lang:en'
get_tweets = sntwitter.TwitterSearchScraper(query).get_items()
for i, tweet in enumerate(tqdm(get_tweets, total=200)):
    if i > 199:
        break
    amazon_tweets_list.append([tweet.date, tweet.id, tweet.content,
                               tweet.username])
```


* The **"lang:en"** operator in the query keeps only tweets written in English. Other languages can be selected the same way by changing the language code.
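Because the query is just a string of search operators, switching languages is plain string composition. A minimal sketch with a hypothetical helper `build_query`; the language codes are two-letter codes such as `en` (English) or `hi` (Hindi).

```python
from typing import Optional


def build_query(*terms: str, lang: Optional[str] = None) -> str:
    """Join non-empty search terms with spaces and append an optional lang: operator."""
    parts = [t.strip() for t in terms if t and t.strip()]
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


print(build_query("@amazonIN", lang="en"))  # @amazonIN lang:en
print(build_query("@amazonIN", lang="hi"))  # @amazonIN lang:hi
```

The result can be passed straight to `sntwitter.TwitterSearchScraper(...)` in place of the hand-written query.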


```python
# Store the data in a pandas DataFrame
amzn_tweets = pd.DataFrame(amazon_tweets_list, columns=['Datetime', 'Tweet ID', 'Text', 'Username'])
amzn_tweets.head()
```

|   | Datetime | Tweet ID | Text | Username |
|---|---|---|---|---|
| 0 | 2022-12-08 16:02:32+00:00 | 1600883536525131776 | @AmazonHelp\n@amazonIN \nbought redmi k20 pro ... | rohitmeemroth |
| 1 | 2022-12-08 16:01:53+00:00 | 1600883371496079360 | @AmazonHelp Done… also uploading his video lea... | GyaniGovardhan |
| 2 | 2022-12-08 15:59:03+00:00 | 1600882657608765445 | @AmazonHelp @amazon @amazonIN @AmazonHelp stil... | subodh_1202 |
| 3 | 2022-12-08 15:58:32+00:00 | 1600882529203986432 | Here are the Top features of the all new Lava ... | 15Pawanaryan |
| 4 | 2022-12-08 15:57:33+00:00 | 1600882282122051591 | Hey @NokiamobileIN.. Why am I not able to TRAD... | PraveenParihar1 |
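The Elon Musk DataFrame names its ID column 'Tweet Id' while this one uses 'Tweet ID'. If the two were later combined with `pd.concat`, pandas would align columns by name and produce two half-empty ID columns. A minimal sketch that keeps one shared column list; `TWEET_COLUMNS` and `rows_to_frame` are hypothetical names, and the toy rows only stand in for the scraped lists.

```python
import pandas as pd

# One shared column list, so every scrape yields identically named columns
TWEET_COLUMNS = ["Datetime", "Tweet ID", "Text", "Username"]


def rows_to_frame(rows: list) -> pd.DataFrame:
    """Turn [date, id, text, username] rows into a DataFrame with TWEET_COLUMNS."""
    return pd.DataFrame(rows, columns=TWEET_COLUMNS)


# Toy rows standing in for musk_tweet_list and amazon_tweets_list
musk_part = rows_to_frame([["2022-12-08", 1, "Woke v Woke", "elonmusk"]])
amazon_part = rows_to_frame([["2022-12-08", 2, "@AmazonHelp ...", "rohitmeemroth"]])

# With matching names, concat lines the columns up one-to-one
combined = pd.concat([musk_part, amazon_part], ignore_index=True)
print(combined.columns.tolist())  # ['Datetime', 'Tweet ID', 'Text', 'Username']
```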


The same search also works for plain-text queries. For example, to find tweets that mention "crude prices", replace the query "@amazonIN lang:en" with "crude prices". Let's give it a run!

*  You can also restrict the search to a date range. In this example, let's scrape tweets posted on Dec. 5 and Dec. 6, 2022.

```python
crude_tweets_list = []

# Set up the query: text search plus language and date-range filters
query = 'crude prices lang:en since:2022-12-05 until:2022-12-07'
get_tweets = sntwitter.TwitterSearchScraper(query).get_items()
for i, tweet in enumerate(tqdm(get_tweets, total=500)):
    if i > 499:
        break
    crude_tweets_list.append([tweet.date, tweet.id, tweet.content,
                              tweet.username])
```


Notice that the end date is Dec. 7: the "until:" date is exclusive, so tweets posted on that day are not returned. To get tweets up to and including today, set "until:" to tomorrow's date.
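Since "until:" is exclusive, an inclusive date range needs its end date shifted by one day. A minimal sketch with a hypothetical helper `date_range_operators` that does the shift, so the range can be stated as "first day to last day".

```python
from datetime import date, timedelta


def date_range_operators(first_day: date, last_day: date) -> str:
    """Return 'since:... until:...' covering first_day through last_day inclusive.

    until: excludes its own date, so it is set to the day after last_day.
    """
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    until = last_day + timedelta(days=1)
    return f"since:{first_day.isoformat()} until:{until.isoformat()}"


# Reproduces the date part of the query above (Dec. 5 and Dec. 6, 2022)
print(date_range_operators(date(2022, 12, 5), date(2022, 12, 6)))
# since:2022-12-05 until:2022-12-07
```

To scrape up to and including today, pass `date.today()` as `last_day`.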

```python
# Store the data as a DataFrame
crude_tweets = pd.DataFrame(crude_tweets_list, columns=['Datetime', 'Tweet ID', 'Text', 'Username'])
crude_tweets.head()
```

|   | Datetime | Tweet ID | Text | Username |
|---|---|---|---|---|
| 0 | 2022-12-06 23:50:00+00:00 | 1600276401289994241 | Fuel prices still sky high whilst crude is poi... | TruthMinistryOz |
| 1 | 2022-12-06 23:48:55+00:00 | 1600276127150673921 | The price cap, a G7 idea, aims to reduce Russi... | ripper1fl |
| 2 | 2022-12-06 23:45:01+00:00 | 1600275149453828096 | Saudi Arabia Sets January Arab Light Crude Pri... | India24hoursliv |
| 3 | 2022-12-06 23:36:16+00:00 | 1600272944713695232 | @LogicalNumbers @GardinerIsland agree on crude... | mp4995491 |
| 4 | 2022-12-06 23:31:26+00:00 | 1600271731205099520 | I better see these gas prices go down!! They a... | DownandOut1489 |
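Row 0 ("Fuel prices still sky high whilst crude is poi...") shows that the unquoted query "crude prices" matches tweets containing both words anywhere, not only the phrase itself. Twitter's search syntax uses double quotes for an exact phrase. A minimal sketch with a hypothetical helper `exact_phrase` that builds such a query:

```python
def exact_phrase(text: str) -> str:
    """Wrap text in double quotes so the search matches it as an exact phrase."""
    cleaned = text.replace('"', "").strip()  # drop stray quotes that would break the phrase
    return f'"{cleaned}"'


all_words_query = "crude prices lang:en"                  # both words, anywhere in the tweet
phrase_query = f"{exact_phrase('crude prices')} lang:en"  # the words as one contiguous phrase
print(phrase_query)  # "crude prices" lang:en
```

Either string can be passed to `sntwitter.TwitterSearchScraper(...)`; which one fits depends on whether loosely related tweets are wanted.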


Let's check the maximum and minimum values of the "Datetime" column.

```python
print(crude_tweets.Datetime.max())
print(crude_tweets.Datetime.min())
```

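```
2022-12-06 23:50:00+00:00
2022-12-05 18:38:08+00:00
```

The newest tweet is from late on Dec. 6 and none is from Dec. 7, consistent with "until:" being exclusive. The oldest is from the evening of Dec. 5 rather than just after midnight: results arrive newest first, so the 500-tweet limit was most likely reached before the scraper got back to the start of Dec. 5.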
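Whether the row limit cut the date window short can be checked rather than inferred by eye. A minimal sketch, assuming the loop collected the full 500 rows and treating "since:2022-12-05" as midnight UTC (both assumptions); `window_possibly_truncated` is a hypothetical helper.

```python
from datetime import datetime, timezone


def window_possibly_truncated(oldest: datetime, since: datetime, n_rows: int, limit: int) -> bool:
    """Heuristic: the row limit was hit and the oldest tweet is still after `since`.

    Results come newest first, so in that case the start of the window was
    probably never reached. If exactly `limit` tweets exist, this is a false alarm.
    """
    return n_rows >= limit and oldest > since


# Oldest timestamp from the output above; midnight UTC for since: is an assumption
oldest = datetime(2022, 12, 5, 18, 38, 8, tzinfo=timezone.utc)
since = datetime(2022, 12, 5, tzinfo=timezone.utc)
print(window_possibly_truncated(oldest, since, n_rows=500, limit=500))  # True
```

In the notebook the inputs would be `crude_tweets.Datetime.min()`, `pd.Timestamp("2022-12-05", tz="UTC")`, `len(crude_tweets)` and `500`; if it returns True, raising the limit or narrowing the date range would cover the whole window.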


Let's read one of the tweets (row 15):

```python
crude_tweets.Text.values[15]
```

```
'Brent crude below $80 a barrel - lowest point this year. \n\nDespite fears of the cap on Russian oil roiling the market.\n\nhttps://t.co/JVmeAE4TYD\nhttps://t.co/m2AZVqBskM\n\nAnd forecourt prices falling too slowly:\n\nhttps://t.co/MbPQcuvV0r https://t.co/kQQ9hSpmm7'
```

**No surprise! Crude prices hit their lowest level this year while I was working on this notebook.**

Besides the four attributes used here (date, id, content and username), a scraped tweet carries several others. As mentioned at the beginning, snscrape is undocumented, so I could not work out the full list.
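One way to discover the remaining attributes is to inspect a tweet object directly. A minimal sketch, assuming snscrape's tweet objects are Python dataclass instances (check this against your installed version); `list_attributes` and `StubTweet` are hypothetical, and the stub only mimics a few fields so the example runs offline.

```python
import dataclasses
from datetime import datetime, timezone


def list_attributes(item) -> list:
    """Return the attribute names of a scraped item.

    Uses dataclasses.fields() for dataclass instances, else falls back to vars().
    """
    if dataclasses.is_dataclass(item) and not isinstance(item, type):
        return [f.name for f in dataclasses.fields(item)]
    return sorted(vars(item))


@dataclasses.dataclass
class StubTweet:
    """Hypothetical stand-in; a real scraped tweet has more fields."""
    url: str
    date: datetime
    content: str
    id: int
    username: str


stub = StubTweet("https://twitter.com/elonmusk/status/1600788394375753729",
                 datetime(2022, 12, 8, 9, 44, 29, tzinfo=timezone.utc),
                 "Woke v Woke https://t.co/hmhC5eelik", 1600788394375753729, "elonmusk")
print(list_attributes(stub))    # ['url', 'date', 'content', 'id', 'username']
row = dataclasses.asdict(stub)  # every field as a dict, usable as a DataFrame row
```

In the notebook you could pass it a real tweet, e.g. `list_attributes(next(iter(sntwitter.TwitterSearchScraper(query).get_items())))`, and build a DataFrame from `dataclasses.asdict(tweet)` to keep every attribute at once.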