This bot can be used to scrape Twitter for email addresses, geolocations and phone numbers associated with user accounts. Additionally, it can use AI to build profiles about those users' personalities including their values and needs. The results are stored in a SQLite database.
[joe@centos TwitterScraper]$ ./twitter-scraper.py --help
usage: twitter-scraper.py [-h] -d DICTIONARY [-a] [-q] [--geolocations]
                          [--verified] [--socks5 SOCKS5]

TwitterScraper searches Twitter for user profiles and scrapes any email
addresses and phone numbers in their bios. The username and any emails or
phone numbers are stored in the SQLite database 'TwitterScraper/users.db'

optional arguments:
  -h, --help            show this help message and exit
  -a, --analyse         Build personality profiles of the users based on their
                        last 200 tweets
  -q, --quiet           Quiet mode minimises console output
  --geolocations        Scrape geolocations from the user's tweets
  --verified            Only select Verified accounts
  --socks5 SOCKS5       Use a SOCKS5 proxy e.g. --socks5 127.0.0.1:9050

required arguments:
  -d DICTIONARY, --dictionary DICTIONARY
                        Specify path to a dictionary file to be used for the
                        search queries
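To give a rough idea of what scraping a bio involves, here is a minimal sketch that pulls email addresses and phone numbers out of a bio string with regular expressions. The patterns and the function name extract_contacts are assumptions made for illustration; they are deliberately simple and are not taken from twitter-scraper.py.

```python
import re

# Assumed patterns: simple illustrations, not the ones twitter-scraper.py uses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An optional leading "+", then at least nine digits mixed with common separators.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(bio):
    """Return (emails, phones) found in a profile bio string."""
    emails = EMAIL_RE.findall(bio)
    phones = [match.strip() for match in PHONE_RE.findall(bio)]
    return emails, phones
```

A phone pattern this loose will also match other long digit runs, so real use would probably need stricter validation.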
The program requires a dictionary file for the search terms. This can be any file with each search term on its own line. I have provided an ordered list of over 20,000 first names in the namelist.dic file.
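If you want to check a dictionary file of your own before running the bot, the one-term-per-line format can be read with a few lines of Python. This is only a sketch of the format described above; load_terms is a hypothetical helper, and skipping blank lines is an assumption about what you would want, not documented behaviour of the script.

```python
def load_terms(path):
    """Read search terms from a dictionary file, one term per line."""
    with open(path, encoding="utf-8") as handle:
        # Strip surrounding whitespace and skip blank lines so they never become queries.
        return [line.strip() for line in handle if line.strip()]
```

For example, load_terms("namelist.dic") would return the provided first names as a list of strings.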
You may also specify a SOCKS5 proxy to use. For example, with Tor:
./twitter-scraper.py -d namelist.dic --socks5 127.0.0.1:9050
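How the script wires the proxy into its requests is not shown here. As a sketch of one common approach, the host:port value can be turned into the proxies mapping that the requests library accepts (SOCKS support in requests needs the optional requests[socks] extra). The helper name socks5_proxies is hypothetical.

```python
def socks5_proxies(address):
    """Build a requests-style proxies mapping from a 'host:port' string."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    # socks5h:// asks the proxy to resolve hostnames, which avoids DNS leaks over Tor.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}
```

The result could then be passed as requests.get(url, proxies=socks5_proxies("127.0.0.1:9050")).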
To access the Twitter API, you will need to obtain API keys for the account you want to use. Head over to the Twitter API site and create an app.
Next, enter the keys in the source code:
vim twitter-scraper.py
...
CONSUMER_KEY = "your consumer key"
CONSUMER_SECRET = "your consumer secret"
ACCESS_TOKEN_KEY = "your access key"
ACCESS_TOKEN_SECRET = "your access secret"
...
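If you would rather not keep secrets in the source file, one alternative is to read them from environment variables. This is not how the script ships; it is a sketch you would have to wire in yourself by replacing the assignments above. The environment variable names simply reuse the constant names, and load_keys is a hypothetical helper.

```python
import os

# Assumed variable names: they mirror the constants in twitter-scraper.py.
KEY_NAMES = ("CONSUMER_KEY", "CONSUMER_SECRET",
             "ACCESS_TOKEN_KEY", "ACCESS_TOKEN_SECRET")


def load_keys(environ=os.environ):
    """Return the four API keys from the environment, failing loudly if any is unset."""
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in KEY_NAMES}
```

With that in place, CONSUMER_KEY = load_keys()["CONSUMER_KEY"] (and likewise for the other three) would replace the hard-coded strings.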
If you haven't already, install the dependencies.
sudo pip3 install -r requirements.txt
Then run the script, specifying a dictionary file to use.
python3 twitter-scraper.py -d namelist.dic
You can use the Watson Personality Insights API to build personality profiles of the users (the -a option). You will need a Bluemix account. After creating one, create a new resource from your dashboard and select Personality Insights from the Watson platform. Once the service has been created, get your API credentials and add them to the source.
vim twitter-scraper.py
...
USERNAME = "username"
PASSWORD = "password"
...
To browse the database through the web app interface, run the launcher script webapp.py:
python3 webapp.py
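If you only want a quick look at what has been collected without starting the web app, the database can also be opened with Python's built-in sqlite3 module. The schema of users.db is not documented here, so this sketch makes no assumption about table names: it asks SQLite which tables exist and counts the rows in each. The function name table_row_counts is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in the SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Table names come from the database itself, so quoting them is enough here.
        return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}
    finally:
        conn.close()
```

Calling table_row_counts("TwitterScraper/users.db") uses the path given in the help text. Note that sqlite3.connect creates an empty file if the path does not exist, so check the path first.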