TwitterScraper

This bot can be used to scrape Twitter for email addresses, geolocations and phone numbers associated with user accounts. It can also use AI to build personality profiles of those users, including their values and needs. The results are stored in a SQLite database.

Usage

[joe@centos TwitterScraper]$ ./twitter-scraper.py --help
usage: twitter-scraper.py [-h] -d DICTIONARY [-a] [-q] [--geolocations]
                          [--verified] [--socks5 SOCKS5]

TwitterScraper searches Twitter for user profiles and scrapes any email
addresses and phone numbers in their bios. The username and any emails or
phone numbers are stored in the SQLite database 'TwitterScraper/users.db'

optional arguments:
  -h, --help            show this help message and exit
  -a, --analyse         Build personality profiles of the users based on their
                        last 200 tweets
  -q, --quiet           Quiet mode minimises console output
  --geolocations        Scrape geolocations from the user's tweets
  --verified            Only select Verified accounts
  --socks5 SOCKS5       Use a SOCKS5 proxy e.g. --socks5 127.0.0.1:9050

required arguments:
  -d DICTIONARY, --dictionary DICTIONARY
                        Specify path to a dictionary file to be used for the
                        search queries

The program requires a dictionary file for the search terms. This can be any plain-text file with one search term per line. I have provided an ordered list of over 20,000 first names in the namelist.dic file.
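
For example, the first few lines of a dictionary file might look like this (these entries are illustrative):

Aaron
Abigail
Adam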

You may also specify a SOCKS5 proxy. For example, with Tor:

./twitter-scraper.py -d namelist.dic --socks5 127.0.0.1:9050
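
(9050 is Tor's default SOCKS port; the Tor Browser bundle listens on 9150 instead.)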

Getting started

To access the Twitter API, you will need to obtain API keys for the account you want to use. Head over to the Twitter API site and create an app.

Next, you will need to add the keys to the source code.

vim twitter-scraper.py
...
CONSUMER_KEY = "your consumer key"
CONSUMER_SECRET = "your consumer secret"
ACCESS_TOKEN_KEY = "your access key"
ACCESS_TOKEN_SECRET = "your access secret"
...
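
These keys authenticate the script with Twitter via OAuth 1.0a. As a minimal sketch of how they are typically consumed, assuming the python-twitter client (the script's actual client may differ):

import twitter  # pip3 install python-twitter

api = twitter.Api(
    consumer_key="your consumer key",
    consumer_secret="your consumer secret",
    access_token_key="your access key",
    access_token_secret="your access secret",
)

# A successful call here confirms the credentials are valid
print(api.VerifyCredentials())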

If you haven't already, install the dependencies.

sudo pip3 install -r requirements.txt
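
If you would rather not install packages system-wide with sudo, a virtual environment works just as well:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt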

Then run the script, specifying a dictionary file to use.

python3 twitter-scraper.py -d namelist.dic

Optional: IBM Watson API

You can use the Watson Personality Insights API to build personality profiles of the users. You will need a Bluemix account. After creating an account, create a new resource from your dashboard and select Personality Insights from the Watson platform. Once you've created the new service, get your API credentials and add them to the source.

vim twitter-scraper.py
...
USERNAME = "username"
PASSWORD = "password"
...
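
The username/password pair is the legacy Bluemix basic-auth scheme. A rough sketch of a profile request using the old watson_developer_cloud SDK (the script's actual calls may differ, and the tweet list here is stand-in data):

from watson_developer_cloud import PersonalityInsightsV3

USERNAME = "username"
PASSWORD = "password"

personality_insights = PersonalityInsightsV3(
    version="2017-10-13",
    username=USERNAME,
    password=PASSWORD,
)

# Concatenate a user's recent tweets and request a personality profile
recent_tweets = ["example tweet one", "example tweet two"]  # stand-in data
profile = personality_insights.profile(" ".join(recent_tweets),
                                       content_type="text/plain")
print(profile)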

Running the web app

To browse the results database through the web interface, use the launcher script webapp.py

python3 webapp.py
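
You can also inspect the scraped data directly with Python's built-in sqlite3 module. This sketch only lists the tables, since the exact schema is defined by the scraper:

import sqlite3

conn = sqlite3.connect("users.db")

# List the tables the scraper has created; run a SELECT on any table
# name printed here to see its rows
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)

conn.close()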
