In this IPython notebook we scrape tweets using different search modes, depending on the use case: for example, scraping tweets from a particular user or scraping followers' information.

We use the OSINT scraper `Twint` as a Python module. Check out [Twint's GitHub page](https://github.com/twintproject/twint) for more details.

For Twint's configuration options, see the [Configuration wiki page](https://github.com/twintproject/twint/wiki/Configuration).

### 1 - Installs and imports

In [40]:
!pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining twint from git+https://github.com/twintproject/twint.git@origin/master#egg=twint
  Updating ./src/twint clone (to revision origin/master)
  Running command git fetch -q --tags
  Running command git reset --hard -q origin/master
Installing collected packages: twint
  Attempting uninstall: twint
    Found existing installation: twint 2.1.21
    Can't uninstall 'twint'. No files were found to uninstall.
  Running setup.py develop for twint
Successfully installed twint-2.1.21


In [41]:
!pip install nest_asyncio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


**Restart the runtime before running the cells below, so that the newly installed packages can be imported.**

In [42]:
import twint
import pandas as pd
import nest_asyncio

In [43]:
nest_asyncio.apply()

### 2 - Declare parameters to acquire tweets/account info

In [71]:
# provide username
username = 'larrouturou'

# provide keyword or search string
keyword = 'zubair'

# provide the number of most recent tweets you want to scrape
N = 500

# provide the beginning date ('YYYY-MM-DD' format)
since_date = '2011-01-01'

# provide the end date
until_date = '2022-08-01'
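
The two date strings above must follow the 'YYYY-MM-DD' format, and the start date should not fall after the end date. A minimal sketch of a check that could be run before scraping is shown below; `check_date_range` is a hypothetical helper and is not part of Twint.

```python
from datetime import date
from typing import Tuple


def check_date_range(since: str, until: str) -> Tuple[date, date]:
    """Parse two 'YYYY-MM-DD' strings and check that they are in order."""
    # date.fromisoformat raises ValueError on malformed input
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    if start > end:
        raise ValueError(f"since_date {since} is after until_date {until}")
    return start, end


# Same values as the parameters declared above
start, end = check_date_range('2011-01-01', '2022-08-01')
print((end - start).days, "days in the search window")
```

Failing early on a malformed or reversed range avoids launching a scrape that would likely return nothing.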

### 3 - Scrape tweets from a Twitter account

In [70]:
def get_tweets(user_name, num, since_date, until_date):
    print("======================================")
    print(":: Acquiring tweets of", user_name, "::")
    print("======================================")

    # Configure the search
    c = twint.Config()

    # Account whose tweets are scraped
    c.Username = user_name

    # Date range ('YYYY-MM-DD') and maximum number of tweets
    c.Since = since_date
    c.Until = until_date
    c.Limit = num

    # Optional settings, disabled by default
    #c.Retweets = True
    #c.Lang = 'nl'
    #c.Translate = True
    #c.TranslateDest = "en"

    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Store the results in twint.storage.panda.Tweets_df
    c.Pandas = True
    twint.run.Search(c)

    return pd.DataFrame(twint.storage.panda.Tweets_df)

In [51]:
output = get_tweets(user_name=username, num=N, since_date=since_date, until_date=until_date)
output.head()

:: Acquiring tweets of larrouturou ::
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.


Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1552978165756215296,1552978165756215296,1659094000000.0,2022-07-29 11:23:42,0,,Ce qui se passe “équivaut à des incendies sous...,fr,[],[],...,,,,,,[],,,,
1,1548951688555397120,1548948519909728257,1658134000000.0,2022-07-18 08:43:55,0,,@mbrehin On le fait tous les jours! 💪 @Fresque...,fr,[],[],...,,,,,,"[{'screen_name': 'mbrehin', 'name': 'Maxime Br...",,,,
2,1548949157473357834,1548948519909728257,1658133000000.0,2022-07-18 08:33:51,0,,L'étude en question : https://t.co/itwSxY1BNN...,fr,[],[],...,,,,,,[],,,,
3,1548948519909728257,1548948519909728257,1658133000000.0,2022-07-18 08:31:19,0,,Projections de chercheurs français en 2017: ce...,fr,[],[],...,,,,,,[],,,,
4,1548377536404393986,1548377536404393986,1657997000000.0,2022-07-16 18:42:26,0,,"L’été dernier, c’était les pluies torrentielle...",fr,[],[],...,,,,,,[],,,,
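
The `created_at` column in the output above appears to hold the tweet time as milliseconds since the Unix epoch, while `date` holds the same moment as text. The sketch below converts one such value with the standard library; the millisecond interpretation is an assumption inferred from the displayed rows, and `created_at_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def created_at_to_utc(created_at_ms: float) -> datetime:
    """Convert a millisecond epoch value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_at_ms / 1000, tz=timezone.utc)


# Value copied from the first row above. The display rounds created_at,
# so the result will not match the `date` column to the second.
print(created_at_to_utc(1659094000000.0).isoformat())
```

On the full DataFrame, `pd.to_datetime(output['created_at'], unit='ms', utc=True)` should perform the same conversion column-wise.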


### 4 - Scrape the account details of a user

In [67]:
def get_account_info(user_name, num, since_date, until_date):
    print("===========================================")
    print(":: Scraping account info of", user_name, "::")
    print("===========================================")

    # Configure the lookup
    c = twint.Config()

    # Account whose profile is looked up
    c.Username = user_name

    c.Since = since_date
    c.Until = until_date
    c.Limit = num

    #c.Lang = 'nl'

    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Store the result in twint.storage.panda.User_df
    c.Pandas = True
    twint.run.Lookup(c)

    return pd.DataFrame(twint.storage.panda.User_df)

In [68]:
result = get_account_info(user_name=username, num=N, since_date=since_date, until_date=until_date)
result.head()

:: Scraping account info of larrouturou ::


Unnamed: 0,id,name,username,bio,url,join_datetime,join_date,join_time,tweets,location,following,followers,likes,media,private,verified,avatar,background_image
0,18578222,Pierre Larrouturou,larrouturou,Justice sociale & climat - Député européen @No...,https://t.co/EVf5dMr0pA,2009-01-03 12:10:04 UTC,2009-01-03,12:10:04 UTC,15051,,53295,56808,78723,1711,False,True,https://pbs.twimg.com/profile_images/144102932...,https://pbs.twimg.com/profile_banners/18578222...
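
The functions in sections 3 and 4 set the same storage and output flags on their config objects. One way to avoid repeating them is sketched below, assuming that Twint config objects accept plain attribute assignment (as the cells above do). `COMMON_OPTIONS` and `apply_options` are hypothetical names, and a `SimpleNamespace` stands in for `twint.Config()` so that the sketch runs without Twint installed.

```python
from types import SimpleNamespace

# Flags shared by the scraping functions in this notebook
COMMON_OPTIONS = {
    'Store_object': True,
    'User_full': True,
    'Profile_full': True,
    'Hide_output': True,
    'Pandas': True,
}


def apply_options(config, since_date, until_date, limit, **extra):
    """Set the shared flags, the date range, the limit and any extras."""
    for name, value in {**COMMON_OPTIONS, **extra}.items():
        setattr(config, name, value)
    config.Since = since_date
    config.Until = until_date
    config.Limit = limit
    return config


# Stand-in object; in the notebook this would be twint.Config()
c = apply_options(SimpleNamespace(), '2011-01-01', '2022-08-01', 500,
                  Username='larrouturou')
print(c.Username, c.Limit, c.Pandas)
```

Mode-specific settings such as `Username` are passed as keyword arguments, so each scraping function only states what differs.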


### 5 - Scrape tweets using a keyword

In [73]:
def get_all_tweets(keyword, num, since_date, until_date):
    print("==============================================")
    print(":: Acquiring tweets with keyword", keyword, "::")
    print("==============================================")

    # Configure the search
    c = twint.Config()

    # Keyword or search string to match
    c.Search = keyword

    # Date range ('YYYY-MM-DD') and maximum number of tweets
    c.Since = since_date
    c.Until = until_date
    c.Limit = num

    # Optional settings, disabled by default
    #c.Retweets = True
    #c.Lang = 'nl'
    #c.Translate = True
    #c.TranslateDest = "en"

    c.Store_object = True
    c.User_full = True
    c.Profile_full = True
    c.Hide_output = True

    # Store the results in twint.storage.panda.Tweets_df
    c.Pandas = True
    twint.run.Search(c)

    return pd.DataFrame(twint.storage.panda.Tweets_df)

In [74]:
result = get_all_tweets(keyword=keyword, num=N, since_date=since_date, until_date=until_date)
result.head()

:: Acquiring tweets with keyword zubair ::


Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1553893150514499584,1553650620380225537,1659312000000.0,2022-07-31 23:59:31,0,,@shutup349 @sakshijoshii @zoo_bear What is thi...,en,[],[],...,,,,,,"[{'screen_name': 'shutup349', 'name': 'Gup Sup...",,,,
1,1553893070977974273,1553796252906663937,1659312000000.0,2022-07-31 23:59:12,0,,@MahuaMoitra @zoo_bear @thewire_in Is this tru...,en,[],[],...,,,,,,"[{'screen_name': 'MahuaMoitra', 'name': 'Mahua...",,,,
2,1553893046743248896,1553719258189770755,1659312000000.0,2022-07-31 23:59:06,0,,@zoo_bear Dog returned,en,[],[],...,,,,,,"[{'screen_name': 'zoo_bear', 'name': 'Mohammed...",,,,
3,1553892341005529088,1553650620380225537,1659312000000.0,2022-07-31 23:56:18,0,,@zoo_bear And you @zoo_bear had restored peace...,en,[],[],...,,,,,,"[{'screen_name': 'zoo_bear', 'name': 'Mohammed...",,,,
4,1553892270952546304,1552693100165926912,1659312000000.0,2022-07-31 23:56:01,0,,@YSBayero @zubair_nasr Why so harsh brother?if...,en,[],[],...,,,,,,"[{'screen_name': 'YSBayero', 'name': 'YSB', 'i...",,,,
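
In the output above, each `reply_to` cell holds a list of dictionaries with a `screen_name` key. The sketch below counts how often each account is replied to. The sample rows keep only the first `screen_name` visible in each displayed row, since the real cells are truncated in the output and may contain more keys and more entries. `count_reply_targets` is a hypothetical helper.

```python
from collections import Counter


def count_reply_targets(reply_to_column):
    """Count how often each screen_name appears across reply_to lists."""
    counts = Counter()
    for entries in reply_to_column:
        for entry in entries:
            counts[entry['screen_name']] += 1
    return counts


# Reduced sample built from the five rows displayed above
sample = [
    [{'screen_name': 'shutup349'}],
    [{'screen_name': 'MahuaMoitra'}],
    [{'screen_name': 'zoo_bear'}],
    [{'screen_name': 'zoo_bear'}],
    [{'screen_name': 'YSBayero'}],
]
print(count_reply_targets(sample).most_common(1))  # [('zoo_bear', 2)]
```

With the real DataFrame, passing `result['reply_to']` should work the same way, provided the cells are lists of dictionaries as displayed.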
