## TwitterCollector

### Set environment variables

To load this notebook's modules when running it from outside the repository folder, set **PATH** below to the repository location, e.g. ```C:/TwitterCollector```:

In [None]:
PATH = "/path/to/TwitterCollector" # <-- optional if running from the repository folder

In [None]:
import importlib.util, os

if not os.path.isdir(PATH):
    PATH = os.getcwd()
PATH = os.path.realpath(PATH)

spec = importlib.util.spec_from_file_location("__init__", PATH+'/__init__.py')
init = importlib.util.module_from_spec(spec)
spec.loader.exec_module(init)

%matplotlib inline
%load_ext autoreload
%autoreload 2

### Import functions

In [None]:
from collect import collect_twitter
from convert import convert_json_tweets
from hydrate import dehydrate_tweets
from hydrate import hydrate_tweets
from stream import stream_tweets
from trends import trending_topics
from woeid import WHERE_ON_EARTH

#### Import API credentials

In [None]:
from config import TWITTER_KEYS as APP_KEYS
from config import TWITTER_TOKENS as TOKENS

APP_KEY     = TOKENS[0][0]
APP_TOKEN   = TOKENS[0][1]
OAUTH_KEY   = TOKENS[0][2]
OAUTH_TOKEN = TOKENS[0][3]
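
For reference, the imports above imply a ```config.py``` shaped roughly as follows. This is only a sketch inferred from how `APP_KEYS` and `TOKENS[0][0]` through `TOKENS[0][3]` are used in this notebook; the placeholder strings are hypothetical and the actual file may hold more entries or differ in detail.

```python
# config.py (hypothetical layout, inferred from this notebook's usage)

# One [application key, application token] pair per app (REST API).
TWITTER_KEYS = [
    ["APP_KEY_1", "APP_TOKEN_1"],
]

# One [application key, application token, OAuth key, OAuth token] entry
# per app; the OAuth pair is only required for the Streaming API.
TWITTER_TOKENS = [
    ["APP_KEY_1", "APP_TOKEN_1", "OAUTH_KEY_1", "OAUTH_TOKEN_1"],
]
```

Keeping credentials in a separate file like this makes it easier to exclude them from version control.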

#### Override API credentials

User definitions stored in ```config.py``` make this step optional.

In [None]:
#APP_KEY = ""                         # <-- application key
#APP_TOKEN = ""                       # <-- application token
#OAUTH_KEY = ""                       # <-- required for Streaming API only
#OAUTH_TOKEN = ""                     # <-- required for Streaming API only
#APP_KEYS = [ [APP_KEY, APP_TOKEN] ]  # <-- list of application keys (REST API)

### Hydrate and dehydrate

In [None]:
FILENAME = "" # tweets.csv; tweets.txt

#### Hydrate tweets

Retrieve full tweet metadata for a list of dehydrated tweet IDs stored in a text file by querying the Twitter API. Uses the [REST API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html).

In [None]:
hydrate_tweets(FILENAME, app_keys=APP_KEYS, format='csv', output_file='tweets_hydrated.csv')
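
Hydration is typically done in batches, since the v1.1 `statuses/lookup` endpoint accepts up to 100 tweet IDs per request. The sketch below shows only that batching step; whether `hydrate_tweets` works this way internally is an assumption, and `batch_ids` is a hypothetical helper, not part of this repository.

```python
from typing import Iterator, List

def batch_ids(tweet_ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` tweet IDs."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]

ids = [str(n) for n in range(250)]  # stand-in IDs for illustration
batches = list(batch_ids(ids))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch would then be sent as a single lookup request, which keeps the number of API calls (and rate-limit usage) low.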

#### Dehydrate tweets

Sharing Twitter data sets requires dehydrating them first, i.e. keeping only the tweet IDs, according to the platform's [Terms of Service](https://developer.twitter.com/en/developer-terms/more-on-restricted-use-cases.html).

In [None]:
dehydrate_tweets(FILENAME, format='csv', output_file='tweets_dehydrated.txt')
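
Conceptually, dehydration reduces each row of a tweet table to its ID. The following is a minimal sketch of that idea, assuming the CSV has a column named `id`; the column name and the `dehydrate_rows` helper are assumptions for illustration and do not describe how `dehydrate_tweets` is actually implemented.

```python
import csv
import io

def dehydrate_rows(csv_text: str, id_column: str = "id") -> list:
    """Return the values of `id_column` from CSV text, skipping blank IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader if row.get(id_column)]

sample = "id,text\n1001,hello\n1002,world\n"
tweet_ids = dehydrate_rows(sample)
print("\n".join(tweet_ids))  # one tweet ID per line
```

Writing one ID per line yields a plain text file that can later be hydrated again.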

### Query tweet data

In [None]:
QUERY   = "" # keywords, #hashtags and/or @mentions
LANG    = "" # language code, e.g.: en/es/pt/fr/... (optional)
GEOCODE = "" # latitude,longitude,radius, e.g.: "-20.23,-40.43,100km" (optional)
LIMIT   = 0  # maximum number of tweets to capture (optional)
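
A malformed `GEOCODE` string is easy to miss until the API rejects it. The sketch below validates the `latitude,longitude,radius` format before querying; `parse_geocode` is a hypothetical helper that is not part of this repository, and it assumes the radius unit is `km` or `mi`, as accepted by the standard search API.

```python
def parse_geocode(geocode: str):
    """Split 'lat,lon,radius' into (float, float, str), validating ranges."""
    parts = [p.strip() for p in geocode.split(",")]
    if len(parts) != 3:
        raise ValueError("expected 'latitude,longitude,radius'")
    lat, lon, radius = float(parts[0]), float(parts[1]), parts[2]
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    if not radius.endswith(("km", "mi")):
        raise ValueError("radius must end in 'km' or 'mi'")
    return lat, lon, radius

print(parse_geocode("-20.23,-40.43,100km"))  # (-20.23, -40.43, '100km')
```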

#### Collect recent tweets

Collects tweets published within roughly the last 9 days. The input query accepts [Twitter operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators.html). Uses the [REST API](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets).

In [None]:
collect_twitter(QUERY, APP_KEYS, lang=LANG, geocode=GEOCODE, limit=LIMIT, output_folder='tweets')

#### Timeline from profiles

Get up to the 3,200 most recent tweets from a user's timeline. Uses the [REST API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html).

In [None]:
collect_twitter(QUERY, app_keys=APP_KEYS, limit=LIMIT, query_type='timeline', output_folder='timeline')

#### Stream tweets in real time

Keeps the script running in the foreground while it streams tweets. **Warning**: unless ```LIMIT``` is set, it runs indefinitely. Uses the [Stream API](https://developer.twitter.com/en/docs/tweets/filter-realtime/overview).

In [None]:
stream_tweets(APP_KEY, APP_TOKEN, OAUTH_KEY, OAUTH_TOKEN, query=QUERY,
              limit=LIMIT, ats=True, rts=True, output_folder='stream')

#### Sample of flowing data (1%)

Capture a sample of up to 1% of the tweets currently being published to Twitter. **Warning**: unless ```LIMIT``` is set, it runs indefinitely. Uses the [Stream API](https://developer.twitter.com/en/docs/tweets/sample-realtime/overview/GET_statuse_sample).

In [None]:
stream_tweets(APP_KEY, APP_TOKEN, OAUTH_KEY, OAUTH_TOKEN, stream_type='sample',
              limit=LIMIT, ats=False, rts=False, output_folder='sample')

#### Convert JSON streaming tweets

Streamed tweets are written to a JSON file by default to keep the footprint small. This function converts that file to a CSV table of the same name.

In [None]:
convert_json_tweets('tweets.json') # 'stream/tweets.json'; 'sample/tweets.json'
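
To illustrate the kind of conversion involved, here is a minimal sketch that turns newline-delimited JSON into CSV text. It assumes the file stores one JSON object per line and that only a fixed set of top-level fields is wanted; both are assumptions, `json_lines_to_csv` is a hypothetical helper, and `convert_json_tweets` may handle nested fields and file layout differently.

```python
import csv
import io
import json

def json_lines_to_csv(json_text: str, fields: list) -> str:
    """Convert newline-delimited JSON objects to CSV text with `fields` as columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields,
                            extrasaction="ignore",  # drop keys not listed in `fields`
                            lineterminator="\n")
    writer.writeheader()
    for line in json_text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(json.loads(line))  # missing keys become empty cells
    return out.getvalue()

sample = '{"id": 1, "text": "hello", "lang": "en"}\n\n{"id": 2, "text": "world"}\n'
print(json_lines_to_csv(sample, ["id", "text"]))
```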

### Expand retweets

Get retweets for the tweets already in a data set, expanding the data **in the file itself (!)**. **Supports resuming** an interrupted run. Uses the [REST](https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-retweets-id) [API](https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-retweeters-ids.html).

In [None]:
collect_twitter('tweets.txt', app_keys=APP_KEYS, query_type='rts', resume=False)

### Trending topics

List currently trending topics in a region or worldwide. Uses the [REST API](https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place.html).

In [None]:
trending_topics(query=1, app_keys=APP_KEYS, show_all_topics=False)

In [None]:
trending_topics('list') # WHERE_ON_EARTH

### Query user data

In [None]:
INPUT = "" # user profile(s) to query

#### Followers (API)

Get a list of user IDs currently following a profile. Uses the [REST API](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-followers-ids.html).

In [None]:
collect_twitter(INPUT, app_keys=APP_KEYS, query_type='followers', output='followers')

#### Friends (API)

Get a list of user IDs that a profile currently follows (its "friends"). Uses the [REST API](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-friends-ids).

In [None]:
collect_twitter(INPUT, app_keys=APP_KEYS, query_type='friends', output='friends')

#### User metadata (API)

Get user metadata for a list of profiles. Uses the [REST API](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-users-lookup.html).

In [None]:
collect_twitter(INPUT, app_keys=APP_KEYS, query_type='users', output='users')

#### Compress output → `output.zip`

In [None]:
!zip -r output.zip *csv *xls *xlsx *txt *json tweets timeline stream sample followers friends users
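
The `zip` command may not be available on every system (for instance on a default Windows install). As a portable alternative, the sketch below builds the archive with Python's standard `zipfile` module; `zip_outputs` is a hypothetical helper, not part of this repository, and it silently skips patterns and folders that match nothing.

```python
import glob
import os
import zipfile

def zip_outputs(archive, patterns, folders):
    """Write files matching `patterns` and everything under `folders` to `archive`."""
    added = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # Loose files in the working directory, e.g. "*.csv".
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                if os.path.isfile(path):
                    zf.write(path)
                    added.append(path)
        # Output folders are walked recursively; missing folders yield nothing.
        for folder in folders:
            for root, _dirs, files in os.walk(folder):
                for name in sorted(files):
                    path = os.path.join(root, name)
                    zf.write(path)
                    added.append(path)
    return added
```

For the outputs of this notebook, a call could look like `zip_outputs("output.zip", ["*.csv", "*.xls", "*.xlsx", "*.txt", "*.json"], ["tweets", "timeline", "stream", "sample", "followers", "friends", "users"])`.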

### [Download output files](output.zip)

___

### References

* Twython @ PyPI: https://pypi.org/project/twython/

* GWU datasets: https://tweetsets.library.gwu.edu/datasets

* GWU dataverse: https://dataverse.harvard.edu/dataverse/gwu-libraries

* Twitter Event Datasets: https://figshare.com/articles/Twitter_event_datasets_2012-2016_/5100460

* Twitter API Documentation: https://developer.twitter.com