Install from PyPI, or get the latest version directly from GitHub:
pip3 install mtweepy
pip3 install git+https://github.com/Souvic/mtweepy.git
The repo provides three functions: get_followers, get_timelines, and get_users.
All functions distribute requests across the supplied auth tokens for the fastest possible scraping.
Apart from the self-explanatory inputs:
- As auths, a list of Tweepy bearer tokens is expected if you want to use OAuth 2 rate limits for the Twitter API.
- As auths, a list of [oauth_consumer_key, oauth_consumer_secret, client_secret, oauth_token, oauth_token_secret] lists is expected if you want to use OAuth 1 rate limits for the Twitter API.
- The use_userid parameter defaults to False. If it is passed as True to get_followers, the screen_name_or_userid parameter is treated as the user ID whose followers are to be scraped.
- output_folder should be an empty folder where the output of get_timelines and get_users is saved.
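For reference, the two accepted shapes of auths might look like the sketch below (all token strings are placeholders, not real credentials):

```python
# OAuth 2: a flat list of bearer tokens (placeholder values)
auths_oauth2 = [
    "BEARER_TOKEN_1",
    "BEARER_TOKEN_2",
]

# OAuth 1: a list of 5-element credential lists, in the order
# [oauth_consumer_key, oauth_consumer_secret, client_secret,
#  oauth_token, oauth_token_secret] (placeholder values)
auths_oauth1 = [
    ["CONSUMER_KEY", "CONSUMER_SECRET", "CLIENT_SECRET",
     "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET"],
]
```

The more tokens you pass, the more parallel rate-limit windows the functions can use.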
Example usage:
from mtweepy import get_followers, get_users, get_timelines

list_followers = get_followers(auths, "INCIndia", max_num=500)
# Returns the list of followers, fetched in chunks of 5000.
# Note: if max_num < 5000, the last 5000 followers are still returned.

get_users(auths, list_followers, output_folder="./testfolder1")
# The output is saved in output_folder as multiple jsonl files (one file per
# access token). Each line contains the maximally extended user object for one user.

get_timelines(auths, list_followers, output_folder="./testfolder2")
# The output is saved in output_folder as multiple jsonl files (one file per
# access token). Each line contains the last 3200 tweets of one user.
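Since the output files are JSON Lines, they can be read back with the standard library alone. A minimal sketch (the helper name is my own, not part of mtweepy):

```python
import glob
import json

def load_jsonl_folder(folder):
    """Read every record from all .jsonl files in a folder."""
    records = []
    for path in glob.glob(f"{folder}/*.jsonl"):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records
```

For example, load_jsonl_folder("./testfolder1") would return one dict per collected user object.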
To monitor progress, run this in the command line at any point during data collection:
find ./testfolder1 -name '*.jsonl' | xargs wc -l
For get_users, each line contains approximately 100 users; for get_timelines, each line contains one user's timeline.
You can therefore use this command to compute an approximate collection rate and estimate when data collection will finish.
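That rate estimate can also be scripted. A minimal sketch in Python (the helper names and the two-sample approach are my own, not part of mtweepy):

```python
import glob
import time

def count_lines(folder):
    """Total number of lines across all .jsonl files in a folder."""
    total = 0
    for path in glob.glob(f"{folder}/*.jsonl"):
        with open(path, encoding="utf-8") as f:
            total += sum(1 for _ in f)
    return total

def estimate_remaining(folder, target_lines, interval=60):
    """Sample the line count twice, `interval` seconds apart, and
    estimate the seconds remaining until `target_lines` is reached."""
    before = count_lines(folder)
    time.sleep(interval)
    after = count_lines(folder)
    rate = (after - before) / interval  # lines written per second
    if rate <= 0:
        return float("inf")  # no progress observed in this window
    return (target_lines - after) / rate
```

For get_timelines, target_lines is simply the number of users you passed in; for get_users, divide the user count by roughly 100 per line.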