Twitter Data Extractor

This command-line tool extracts user and tweet data from Twitter and writes the results to CSV, Excel, or Google Sheets documents, or to MongoDB or SQLite databases.

Supported Features

  • Extract data for a single user or multiple users.
  • Extract data for a user’s friends or followers.
  • Extract tweet data for a user.
  • Extract tweet data for a search keyword.
  • Report results to CSV, Excel or Google Sheets documents.
  • Report results to MongoDB or SQLite databases.

Fields to extract for user data

  • User ID
  • Username
  • Name
  • Account creation date
  • Bio
  • URLs, Hashtags, Mentions
  • Location
  • Pinned Tweet ID
  • Pinned Tweet
  • Profile image URL
  • Account protected flag
  • Public metrics (followers/following/tweet/listed counts)
  • External URL
  • Verified flag

Fields to extract for tweet data

  • Tweet ID
  • Tweet text
  • Tweet creation date
  • Source
  • Language
  • Public metrics (retweet/reply/like/quote counts)
  • URLs, Hashtags, Mentions
  • Media (key, type, url, duration_ms (for videos), width, height, public_metrics)
  • Place (ID, full name, country, country code, place type, geo coordinates)
  • Author data (for search tweets)
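To illustrate how the tweet fields above could be flattened into one CSV row per tweet, here is a minimal sketch. It assumes the input is a Twitter API v2 tweet object (a dict with `id`, `text`, `created_at`, `source`, `lang`, and a nested `public_metrics` dict); the column names in `TWEET_COLUMNS` and the helper names `flatten_tweet` and `tweets_to_csv_text` are hypothetical, not the tool's actual output schema.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column layout; the tool's real CSV headers may differ.
BASE_FIELDS: List[str] = ["id", "text", "created_at", "source", "lang"]
METRIC_FIELDS: List[str] = ["retweet_count", "reply_count", "like_count", "quote_count"]
TWEET_COLUMNS: List[str] = BASE_FIELDS + METRIC_FIELDS


def flatten_tweet(tweet: Dict) -> Dict:
    """Turn one API v2-style tweet dict into a flat row, lifting public_metrics to the top level."""
    metrics = tweet.get("public_metrics", {})
    row = {key: tweet.get(key, "") for key in BASE_FIELDS}
    # Missing metric counts default to 0 so every row has the same columns.
    row.update({key: metrics.get(key, 0) for key in METRIC_FIELDS})
    return row


def tweets_to_csv_text(tweets: Iterable[Dict]) -> str:
    """Render tweets as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TWEET_COLUMNS)
    writer.writeheader()
    writer.writerows(flatten_tweet(tweet) for tweet in tweets)
    return buffer.getvalue()
```

Nested fields such as media, place, and entities would need their own columns (or a JSON-encoded cell); they are left out of this sketch to keep it short.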

You can see the user manual here.

How to set up

  • Run the following commands in the project directory to create a virtual environment and install the required packages.

    • python -m venv env
    • source env/bin/activate
    • python -m pip install -r requirements.txt
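The same setup can be scripted from Python, which sidesteps shell-specific activation (`source env/bin/activate` works on Linux and Mac, while Windows uses `env\Scripts\activate`). This minimal sketch uses the standard-library `venv` module and then calls the environment's own interpreter directly; the helper names `env_python` and `setup_env` are hypothetical and not part of the tool.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path, platform: str = sys.platform) -> Path:
    """Path of the interpreter inside a virtual environment (Scripts/ on Windows, bin/ elsewhere)."""
    if platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_env(project_dir: str = ".", env_name: str = "env") -> Path:
    """Create the virtual environment and install requirements.txt into it."""
    project = Path(project_dir)
    env_dir = project / env_name
    # Equivalent of `python -m venv env`; with_pip=True bootstraps pip via ensurepip.
    venv.create(env_dir, with_pip=True)
    python = env_python(env_dir)
    # Equivalent of `python -m pip install -r requirements.txt`, run with the
    # environment's interpreter so no activation step is needed.
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(project / "requirements.txt")],
        check=True,
    )
    return python
```

Calling `setup_env()` from the project directory creates `env` and installs the requirements; afterwards the tool can be run with the returned interpreter path instead of activating the environment.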

Setting Environment Variables

To use the Twitter API, set the TWITTER_BEARER_TOKEN_CODE environment variable to your bearer token. To use the tool on behalf of another user account, also set the TWITTER_CONSUMER_KEY_CODE and TWITTER_CONSUMER_SECRET_CODE environment variables to your consumer key and consumer secret.

You can see the instructions to set environment variables here for Linux, here for Windows, and here for Mac.
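A script that relies on these variables can check for them before making any API call. This minimal sketch assumes, based on the description above, that the bearer token is always required and that the consumer key and secret are needed only when the tool is not run with `--forme`; the function name `missing_credentials` is hypothetical.

```python
import os
from typing import List, Mapping, Optional

BEARER_VAR = "TWITTER_BEARER_TOKEN_CODE"
CONSUMER_VARS = ("TWITTER_CONSUMER_KEY_CODE", "TWITTER_CONSUMER_SECRET_CODE")


def missing_credentials(for_me: bool, environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required credential variables that are unset or empty.

    Assumption: the consumer key/secret are only required when acting on behalf of
    another account (that is, when --forme is not given).
    """
    env = os.environ if environ is None else environ
    required = [BEARER_VAR]
    if not for_me:
        required.extend(CONSUMER_VARS)
    return [name for name in required if not env.get(name)]
```

For example, `missing_credentials(for_me=True, environ={})` returns `['TWITTER_BEARER_TOKEN_CODE']`; passing no `environ` checks the real process environment.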

MongoDB Installation

If you plan to save user or tweet data to MongoDB, install it from here.

After installation, you can check whether the database server is running and start it on Linux with the following commands.

  • sudo service mongod status
  • sudo service mongod start
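From Python, a quick reachability check can complement `service mongod status`. This minimal sketch only tests whether something accepts TCP connections on `localhost:27017` (MongoDB's default port); it does not confirm that the listener is `mongod` or that authentication will succeed. The function name `mongod_listening` is hypothetical.

```python
import socket


def mongod_listening(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the host and raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `mongod_listening()` returns False, start the server with `sudo service mongod start` and check again.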

How to use

usage: python [-h] [-c] [-cf CONFIGFILE] [--forme] [-u USER] [-ul USERS] [-fr] [-fl] [-ut] [-s SEARCH]
                                        [-tc TWEET_COUNT] [-e EXCLUDES] [-ot OUTPUT_TYPE] [-of OUTPUT_FILE] [-sm SHARE_MAIL]

optional arguments:
  -h, --help                                  show this help message and exit
  -c, --useconfig                             Read configuration from config.json file
  -cf CONFIGFILE, --configfile CONFIGFILE     Read configuration from given file
  --forme                                     Determine API user(account owner or on behalf of a user)
  -u USER, --user USER                        Extract user data for the given username
  -ul USERS, --users USERS                    Extract user data for the given comma separated usernames
  -fr, --friends                              Extract friends data for the given username
  -fl, --followers                            Extract followers data for the given username
  -ut, --user_tweets                          Extract tweets of user with the given username
  -s SEARCH, --search SEARCH                  Extract latest tweets for the given search keyword
  -tc TWEET_COUNT, --tweet_count TWEET_COUNT  Limit the number of tweets gathered
  -e EXCLUDES, --excludes EXCLUDES            Fields to exclude from tweets queried as comma separated values (replies,retweets)
  -ot OUTPUT_TYPE, --output_type OUTPUT_TYPE  Output file type (csv, xlsx, gsheets, mongodb or sqlite)
  -of OUTPUT_FILE, --output_file OUTPUT_FILE  Output file name
  -sm SHARE_MAIL, --share_mail SHARE_MAIL     Mail address to share Google Sheets document
  • When parameters are read from a config file, boolean options such as --forme, --friends, --followers, and --user_tweets must still be passed on the command line.
  • The "user" and "users" fields must be empty for the "search" keyword to be used.
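For readers who want to script around the tool, the options listed above can be mirrored with the standard-library `argparse` module. This is a hypothetical reconstruction from the help text, not the tool's source: the defaults, the `choices` restriction on `--output_type`, and the integer type of `--tweet_count` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface described in the help text above (sketch)."""
    parser = argparse.ArgumentParser(description="Twitter Data Extractor (sketch)")
    # Configuration source.
    parser.add_argument("-c", "--useconfig", action="store_true",
                        help="Read configuration from config.json file")
    parser.add_argument("-cf", "--configfile", help="Read configuration from given file")
    # Boolean flags: per the notes above, these always come from the command line.
    parser.add_argument("--forme", action="store_true",
                        help="Determine API user (account owner or on behalf of a user)")
    parser.add_argument("-fr", "--friends", action="store_true")
    parser.add_argument("-fl", "--followers", action="store_true")
    parser.add_argument("-ut", "--user_tweets", action="store_true")
    # Targets: a single user, several users, or a search keyword.
    parser.add_argument("-u", "--user")
    parser.add_argument("-ul", "--users", help="Comma-separated usernames")
    parser.add_argument("-s", "--search")
    # Tweet query and output options.
    parser.add_argument("-tc", "--tweet_count", type=int)
    parser.add_argument("-e", "--excludes", help="Comma-separated: replies,retweets")
    parser.add_argument("-ot", "--output_type",
                        choices=["csv", "xlsx", "gsheets", "mongodb", "sqlite"])
    parser.add_argument("-of", "--output_file")
    parser.add_argument("-sm", "--share_mail")
    return parser
```

For example, `build_parser().parse_args(["-u", "gvanrossum", "-fr", "-ot", "csv"])` yields a namespace with `user='gvanrossum'`, `friends=True`, and `output_type='csv'`. Because argparse matches exact option strings first, `-cf` and `-ul` are not confused with `-c` and `-u`.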

The following is an example of config.json content.

    {
        "user": "gvanrossum",
        "users": "",
        "search": "",
        "excludes": "retweets",
        "tweet_count": 20,
        "output_type": "xlsx",
        "output_file": "results.xlsx",
        "share_mail": ""
    }
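The two configuration rules noted earlier (boolean flags stay on the command line; `search` applies only when `user` and `users` are empty) can be expressed as a small merge step. In this minimal sketch, values from the config file take precedence over command-line values for the same key; that precedence, the helper names `load_config` and `merge_config`, and the `use_search` key are assumptions rather than the tool's documented behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Value options that config.json may provide (keys as in the example above).
CONFIG_KEYS = ("user", "users", "search", "excludes", "tweet_count",
               "output_type", "output_file", "share_mail")
# Boolean options that must still be passed on the command line.
FLAG_KEYS = ("forme", "friends", "followers", "user_tweets")


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read a JSON config file such as the example above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def merge_config(cli: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Combine command-line values with config values (config wins for value options)."""
    merged = {key: config.get(key, cli.get(key)) for key in CONFIG_KEYS}
    # Flags are never read from the config file, only from the command line.
    merged.update({key: bool(cli.get(key)) for key in FLAG_KEYS})
    # The search keyword is used only when both user and users are empty.
    merged["use_search"] = bool(merged["search"]) and not merged["user"] and not merged["users"]
    return merged
```

With the example config and `--forme` on the command line, `merge_config({"forme": True}, load_config())` would target `gvanrossum`, keep `forme=True`, and leave `friends`, `followers`, and `user_tweets` off even if the file tried to set them.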

Basic Usage

The following commands are a few examples of getting user data, a user’s friends, a user’s tweets, or tweets for a given keyword.

  • python -u gvanrossum
  • python --forme -ul "gvanrossum,nedbat"
  • python -u gvanrossum -fr
  • python --forme -u gvanrossum -ut
  • python -s python

By default, results are written to the results.xlsx file. Logs are written to the tw_data_extractor.log file in the project directory.
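To check recent log entries without opening the whole file, a few lines of Python suffice. This minimal sketch returns the last lines of `tw_data_extractor.log` using `collections.deque`; it assumes nothing about the log format, expects to be run from the project directory, and the function names `last_lines` and `tail_log` are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def last_lines(lines: Iterable[str], count: int = 20) -> List[str]:
    """Keep only the final `count` lines, stripped of trailing newlines."""
    # A deque with maxlen discards older items as new ones arrive, so memory stays bounded.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def tail_log(path: str = "tw_data_extractor.log", count: int = 20) -> List[str]:
    """Return the last `count` lines of the log file."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return last_lines(log_file, count)
```

Printing `"\n".join(tail_log())` after a run shows the most recent entries, which is usually enough to spot an authentication or rate-limit error.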

Example Commands

  • Get user data for username gvanrossum on behalf of another account and save the results to results.xlsx.

    • python -u gvanrossum
  • Get user data for username gvanrossum using your own account and save the results to results.xlsx.

    • python --forme -u gvanrossum
  • Get user data for gvanrossum and write the results to results.xlsx, reading parameters from the default config file (config.json).

    • python --forme -c
  • Get user data for gvanrossum and write the results to results.xlsx, reading parameters from the given config file.

    • python --forme -c -cf /home/coskun/custom_config.json
  • Get user data for usernames gvanrossum and nedbat.

    • python -ul "gvanrossum,nedbat"
  • Get friends data for username gvanrossum and save the results to results.csv.

    • python -u gvanrossum -fr -ot csv -of results.csv
  • Get followers data for username gvanrossum.

    • python -u gvanrossum -fl
  • Get the latest tweets for username gvanrossum.

    • python --forme -u gvanrossum -ut
  • Get the latest 50 tweets for username gvanrossum, excluding both replies and retweets.

    • python --forme -u gvanrossum -ut -tc 50 -e "replies,retweets"
  • Get the latest 50 tweets for keyword python.

    • python -s python -tc 50
  • Get the latest tweets for keyword python, write the results to a Google Sheets document named last_tweets, and share it with the given email address.

    • python -s python -ot gsheets -of last_tweets -sm
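For orientation, the `-tc` and `-e` options correspond naturally to parameters of two Twitter API v2 endpoints: a user's tweets (`GET /2/users/:id/tweets`, which takes the numeric user ID rather than the username, accepts `max_results` from 5 to 100, and supports `exclude=replies,retweets`) and recent search (`GET /2/tweets/search/recent`, where `max_results` ranges from 10 to 100 and replies or retweets are filtered with the `-is:reply` and `-is:retweet` query operators). This sketch only builds the first-page request URL; whether the tool maps its options exactly this way is an assumption, the function names are hypothetical, and counts above 100 would require pagination, which is not shown.

```python
from typing import List
from urllib.parse import urlencode

API_BASE = "https://api.twitter.com/2"
# Assumed mapping from -e values to recent-search query operators.
SEARCH_OPERATORS = {"replies": "-is:reply", "retweets": "-is:retweet"}


def parse_excludes(excludes: str) -> List[str]:
    """Split a comma-separated -e value and reject anything other than replies/retweets."""
    values = [value.strip() for value in excludes.split(",") if value.strip()]
    unknown = [value for value in values if value not in SEARCH_OPERATORS]
    if unknown:
        raise ValueError(f"unsupported exclude values: {unknown}")
    return values


def user_tweets_url(user_id: str, tweet_count: int, excludes: str = "") -> str:
    """First-page URL for a user's tweets; max_results is clamped to the endpoint's 5-100 range."""
    params = {"max_results": max(5, min(tweet_count, 100))}
    values = parse_excludes(excludes)
    if values:
        params["exclude"] = ",".join(values)
    return f"{API_BASE}/users/{user_id}/tweets?{urlencode(params)}"


def recent_search_url(keyword: str, tweet_count: int, excludes: str = "") -> str:
    """First-page URL for recent search; exclusions become negated is: operators in the query."""
    query = " ".join([keyword] + [SEARCH_OPERATORS[v] for v in parse_excludes(excludes)])
    params = {"query": query, "max_results": max(10, min(tweet_count, 100))}
    return f"{API_BASE}/tweets/search/recent?{urlencode(params)}"
```

With this sketch, a value such as `replies-retweets` raises `ValueError`, since the documented separator for `-e` is a comma.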

Example Runs & Outputs

  • Search the latest 20 tweets for the keyword "python" and save them to the "results.xlsx" file.
    • python -s python -tc 20

Tweets Search

  • Get the latest 5 tweets for the username "gvanrossum", excluding replies and retweets, write the results to a Google Sheets document named "gvanrossum_last_tweets", and share it with the given email address.
    • python --forme -u gvanrossum -ut -tc 5 -e "replies,retweets" -ot gsheets -of gvanrossum_last_tweets -sm

User Tweets Run

User Tweets Output


