In this IPython notebook we scrape tweets using different search modes depending on the use case, e.g., tweets from a particular user, the account details of a user, tweets matching a keyword, or replies to a given tweet.

We use the OSINT scraper `Twint` from Python code rather than from its command line. Check out [Twint's GitHub page](https://github.com/twintproject/twint) for more details.

Twint's configuration options are documented on its [Configuration wiki page](https://github.com/twintproject/twint/wiki/Configuration).

### 1 - Installs and imports

In [1]:
!pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining twint from git+https://github.com/twintproject/twint.git@origin/master#egg=twint
  Cloning https://github.com/twintproject/twint.git (to revision origin/master) to ./src/twint
  Running command git clone -q https://github.com/twintproject/twint.git /content/src/twint
  Running command git checkout -q origin/master
Collecting aiodns
  Downloading aiodns-3.0.0-py3-none-any.whl (5.0 kB)
Collecting cchardet
  Downloading cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (263 kB)
Collecting dataclasses
  Downloading dataclasses-0.6-py3-none-any.whl (14 kB)
Collecting elasticsearch
  Downloading elasticsearch-8.3.3-py3-none-any.whl (382 kB)
Collecting aiohttp_socks
  Downloading aiohttp_socks-0.7.1-py3-none-any.whl (9.3 kB)
Collecting schedule
  Downloading 

In [2]:
!pip install nest_asyncio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


**Restart the runtime** before continuing, so that the newly installed packages are picked up.

In [1]:
import twint
import pandas as pd
import nest_asyncio
from collections import Counter

In [2]:
nest_asyncio.apply()

### 2 - Declare parameters to acquire tweets/account info

In [24]:
# provide username
username = 'narendramodi' #'larrouturou'

# provide keyword or search string
keyword = 'zubair'

# provide conversation id for which we want to scrape the replies
conversation_id = '1553932362123452416'

# provide the number of most recent tweets you want to scrape
N = 1000

# provide the beginning date ( '%Y-%m-%d %H:%M:%S' format)
since_date = '2022-06-25 00:00:00'

# provide the end date
until_date = '2022-06-30 00:00:00'
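
Twint receives the date window as plain strings, so it can help to check them before running a search. The following is a minimal sketch that assumes the `'%Y-%m-%d %H:%M:%S'` format mentioned above; `check_window` is a hypothetical helper introduced here for illustration and is not part of Twint.

```python
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d %H:%M:%S'  # format used for since_date / until_date


def check_window(since_date, until_date):
    """Parse both dates and reject an empty or reversed window.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    since = datetime.strptime(since_date, DATE_FORMAT)  # ValueError if malformed
    until = datetime.strptime(until_date, DATE_FORMAT)
    if since >= until:
        raise ValueError(
            f"since_date {since_date!r} must be earlier than until_date {until_date!r}"
        )
    return since, until


since, until = check_window('2022-06-25 00:00:00', '2022-06-30 00:00:00')
print((until - since).days, "day window")
```

Running the check once, right after the parameters are declared, keeps a mistyped date from reaching any of the scraping functions.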

### 3 - Scrape tweets from a Twitter account

In [25]:
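def get_tweets(user_name, num, since_date, until_date):
    """Return tweets posted by `user_name` within the date window as a DataFrame."""
    print("======================================")
    print(":: Acquiring tweets of", user_name, "::")
    print("======================================")

    # Configure the search: one account, one date window, capped by `num`
    c = twint.Config()
    c.Username = user_name

    c.Since = since_date
    c.Until = until_date
    c.Limit = num

    # Keep results in memory and suppress the per-tweet console output
    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Have Twint store the results in twint.storage.panda.Tweets_df
    c.Pandas = True
    twint.run.Search(c)

    return pd.DataFrame(twint.storage.panda.Tweets_df)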

In [26]:
output = get_tweets(user_name=username, num=N, since_date=since_date, until_date=until_date)
output.head()

:: Acquiring tweets of narendramodi ::
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.


Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1542162870812790794,1542162870812790794,1656515000000.0,2022-06-29 15:07:35,0,,"At 10:30 AM tomorrow, 30th June, will be takin...",en,[],[],...,,,,,,[],,,,
1,1542162387767332869,1542162387767332869,1656515000000.0,2022-06-29 15:05:39,0,,"During this month’s #MannKiBaat, we covered di...",en,[mannkibaat],[],...,,,,,,[],,,,
2,1542162172976971776,1542162172976971776,1656515000000.0,2022-06-29 15:04:48,0,,Glad to see interest in the PM Sangrahalaya. I...,en,[],[],...,,,,,,[],,,,
3,1542161800430493697,1542161800430493697,1656515000000.0,2022-06-29 15:03:19,0,,Today’s Cabinet decision on computerization of...,en,[],[],...,,,,,,[],,,,
4,1541788005748584449,1541788005748584449,1656426000000.0,2022-06-28 14:18:00,0,,كان الشيخ خليفة بن زايد آل نهيان رجل دولة يحظى...,ar,[],[],...,,,,,,[],,,,
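
`Counter` is imported in section 1, and one natural use for it is tallying the `hashtags` column, which holds one list of hashtags per tweet. The sketch below works on a plain list of lists copied from the sample rows above; with the real DataFrame one would presumably pass `output['hashtags']`, assuming the column contains Python lists. `count_hashtags` is a hypothetical helper, not part of Twint.

```python
from collections import Counter


def count_hashtags(hashtag_lists):
    """Count how often each hashtag occurs across an iterable of hashtag lists."""
    counts = Counter()
    for tags in hashtag_lists:
        # Lower-case so that '#MannKiBaat' and '#mannkibaat' are counted together
        counts.update(tag.lower() for tag in tags)
    return counts


# Hashtag lists of the five sample rows shown above
sample = [[], ['mannkibaat'], [], [], []]
print(count_hashtags(sample).most_common(3))
```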


### 4 - Scrape the account details of a user

In [27]:
def get_account_info(user_name):
    """Return the profile details of `user_name` as a one-row DataFrame."""
    print("===========================================")
    print(":: Scraping account info of", user_name, "::")
    print("===========================================")

    # Configure the lookup for a single account
    c = twint.Config()
    c.Username = user_name

    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Have Twint store the result in twint.storage.panda.User_df
    c.Pandas = True
    twint.run.Lookup(c)

    return pd.DataFrame(twint.storage.panda.User_df)

In [28]:
result = get_account_info(user_name=username)
result.head()

:: Scraping account info of narendramodi ::


Unnamed: 0,id,name,username,bio,url,join_datetime,join_date,join_time,tweets,location,following,followers,likes,media,private,verified,avatar,background_image
0,18839785,Narendra Modi,narendramodi,Prime Minister of India,https://t.co/m2qxixtyKj,2009-01-10 17:18:56 UTC,2009-01-10,17:18:56 UTC,33611,India,2437,81275202,0,10998,False,True,https://pbs.twimg.com/profile_images/155429501...,https://pbs.twimg.com/profile_banners/18839785...
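
The `join_datetime` column is a string such as `2009-01-10 17:18:56 UTC`. A minimal sketch of turning it into a timezone-aware `datetime`, assuming every value follows exactly this layout; `parse_join_datetime` is a hypothetical helper and not part of Twint.

```python
from datetime import datetime, timezone


def parse_join_datetime(value):
    """Parse a string like '2009-01-10 17:18:56 UTC' into an aware datetime."""
    # The trailing 'UTC' is matched literally, then attached as real tzinfo
    naive = datetime.strptime(value, '%Y-%m-%d %H:%M:%S UTC')
    return naive.replace(tzinfo=timezone.utc)


joined = parse_join_datetime('2009-01-10 17:18:56 UTC')
print(joined.isoformat())
```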


### 5 - Scrape tweets using a keyword

In [29]:
def get_all_tweets(keyword, num, since_date, until_date):
    """Return tweets matching `keyword` within the date window as a DataFrame."""
    print("==============================================")
    print(":: Acquiring tweets with keyword", keyword, "::")
    print("==============================================")

    # Configure the search: free-text query instead of a username
    c = twint.Config()
    c.Search = keyword

    c.Since = since_date
    c.Until = until_date

    c.Limit = num

    # Optional: restrict the language and translate the results
    #c.Lang = 'nl'
    #c.Translate = True
    #c.TranslateDest = "en"

    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Have Twint store the results in twint.storage.panda.Tweets_df
    c.Pandas = True
    twint.run.Search(c)

    return pd.DataFrame(twint.storage.panda.Tweets_df)

In [30]:
result = get_all_tweets(keyword=keyword, num=N, since_date=since_date, until_date=until_date)
result.head()

:: Acquiring tweets with keyword zubair ::


Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1542296760806060035,1541673228254552066,1656547000000.0,2022-06-29 23:59:36,0,,@AminaaKausar @zoo_bear Inshallah,in,[],[],...,,,,,,"[{'screen_name': 'AminaaKausar', 'name': 'أمين...",,,,
1,1542296375681028096,1542080777881456641,1656547000000.0,2022-06-29 23:58:05,0,,@KhalidBaig85 @thehawkeyex @zoo_bear It is sus...,en,[],[],...,,,,,,"[{'screen_name': 'KhalidBaig85', 'name': 'Khal...",,,,
2,1542296164661329921,1542296164661329921,1656547000000.0,2022-06-29 23:57:14,0,,मोहम्मद ज़ुबैर और नुपुर शर्मा पर FIR में एक सी...,hi,[istandwithzubair],[],...,,,,,,[],,,,
3,1542296152992690177,1542296152992690177,1656547000000.0,2022-06-29 23:57:12,0,,Release @zoo_bear #ReleaseZubair,en,[releasezubair],[],...,,,,,,[],,,,
4,1542295829993533441,1542049884471119873,1656547000000.0,2022-06-29 23:55:54,0,,@ARanganathan72 आज ठंडक पड़ गयी होगी तुम्हारी ...,hi,[],[],...,,,,,,"[{'screen_name': 'ARanganathan72', 'name': 'An...",,,,
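
A keyword search returns conversation starters and replies side by side. In the sample rows above, a tweet whose `id` equals its `conversation_id` starts a conversation, while a differing `conversation_id` marks a reply inside another thread. The sketch below separates the two using plain dicts built from three of those rows; `split_roots_and_replies` is a hypothetical helper, not part of Twint.

```python
def split_roots_and_replies(rows):
    """Split tweet records into conversation starters and replies."""
    roots, replies = [], []
    for row in rows:
        # Compare as strings so int and str identifiers behave the same
        if str(row['id']) == str(row['conversation_id']):
            roots.append(row)
        else:
            replies.append(row)
    return roots, replies


# id / conversation_id pairs taken from rows 0, 2 and 3 of the output above
sample = [
    {'id': 1542296760806060035, 'conversation_id': 1541673228254552066},
    {'id': 1542296164661329921, 'conversation_id': 1542296164661329921},
    {'id': 1542296152992690177, 'conversation_id': 1542296152992690177},
]
roots, replies = split_roots_and_replies(sample)
print(len(roots), "conversation starters,", len(replies), "replies")
```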


### 6 - Scrape replies to a tweet

In [19]:
def get_replies(user_name, conversation_id, num, since_date, until_date):
    """Return the tweets addressed to `user_name` that belong to `conversation_id`."""
    print("=======================================================")
    print(":: Acquiring tweet replies to ", conversation_id, "::")
    print("=======================================================")

    # Configure the search: all tweets sent to the user within the date window
    replies = twint.Config()
    replies.Since = since_date
    replies.Until = until_date
    replies.Limit = num
    replies.To = user_name

    replies.Store_object = True
    replies.User_full = True
    replies.Profile_full = True
    replies.Hide_output = True

    replies.Pandas = True
    twint.run.Search(replies)
    df = twint.storage.panda.Tweets_df

    # Keep only the requested conversation. The limit applies before this
    # filter, so fewer than `num` replies may remain. Both sides are compared
    # as strings to avoid an int/str mismatch.
    df = df[df['conversation_id'].astype(str) == str(conversation_id)]

    return pd.DataFrame(df)
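
Selecting a conversation relies on comparing identifiers for equality, and such a comparison silently matches nothing when one side is an integer and the other a string. The sketch below illustrates the pitfall with plain dicts; `filter_conversation` is a hypothetical helper, not part of Twint.

```python
def filter_conversation(rows, conversation_id):
    """Keep the records whose conversation_id matches, regardless of int/str type."""
    wanted = str(conversation_id)
    return [row for row in rows if str(row['conversation_id']) == wanted]


rows = [
    {'id': '1554406705429037058', 'conversation_id': '1553932362123452416'},
    {'id': '1542296760806060035', 'conversation_id': '1541673228254552066'},
]

# A naive comparison of str against int finds nothing
naive = [row for row in rows if row['conversation_id'] == 1553932362123452416]
print(len(naive), "match without normalisation")

# Normalising both sides to str finds the reply
print(len(filter_conversation(rows, 1553932362123452416)), "match with normalisation")
```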

In [20]:
replies = get_replies(user_name=username, conversation_id=conversation_id, num=N, since_date=since_date, until_date=until_date)

replies.head()

:: Acquiring tweet replies to  1553932362123452416 ::


Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
46,1554406705429037058,1553932362123452416,1659434000000.0,2022-08-02 10:00:12,0,,@narendramodi Jay modi sir 🙏❤️🙏,ht,[],[],...,,,,,,"[{'screen_name': 'narendramodi', 'name': 'Nare...",,,,
1008,1554362168568467456,1553932362123452416,1659424000000.0,2022-08-02 07:03:14,0,,@narendramodi ये हैं नमो का नया भारत 💪💪,hi,[],[],...,,,,,,"[{'screen_name': 'narendramodi', 'name': 'Nare...",,,,
1045,1554360645046644741,1553932362123452416,1659423000000.0,2022-08-02 06:57:11,0,,@narendramodi Shri Modiji has spoken to the At...,en,[],[],...,,,,,,"[{'screen_name': 'narendramodi', 'name': 'Nare...",,,,
1179,1554353664730746880,1553932362123452416,1659422000000.0,2022-08-02 06:29:26,0,,@narendramodi Sir kendriya vidyalay me piche ...,hi,[],[],...,,,,,,"[{'screen_name': 'narendramodi', 'name': 'Nare...",,,,
1964,1554325096134627329,1553932362123452416,1659415000000.0,2022-08-02 04:35:55,0,,@narendramodi @smritiirani Indeed it gives imm...,en,[],[],...,,,,,,"[{'screen_name': 'narendramodi', 'name': 'Nare...",,,,
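
The replies shown above date from August 2022, so the `Since`/`Until` window has to cover the period in which the conversation took place. Tweet IDs from this period are Snowflake IDs: shifting the ID right by 22 bits yields the creation time in milliseconds relative to Twitter's epoch (1288834974657 ms after the Unix epoch). The approximate posting time of a conversation can therefore be recovered from `conversation_id` alone. The following is a sketch; `snowflake_to_datetime` is a hypothetical helper and not part of Twint.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, in ms since the Unix epoch


def snowflake_to_datetime(tweet_id):
    """Recover the UTC creation time encoded in a Snowflake tweet ID."""
    # The bits above the lowest 22 hold milliseconds since the Twitter epoch
    timestamp_ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# ID of the first tweet in the section 3 output, listed there as 2022-06-29 15:07:35
print(snowflake_to_datetime('1542162870812790794').strftime('%Y-%m-%d %H:%M:%S'))

# The conversation whose replies are scraped in this section
print(snowflake_to_datetime('1553932362123452416').strftime('%Y-%m-%d %H:%M:%S'))
```

Choosing `since_date` at or shortly before the decoded time, and `until_date` a few days after it, is one way to make sure the replies fall inside the search window.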
