# Twitter API Data Extraction

Connect to the Twitter API and extract tweets by keyword, hashtag, and other search criteria.

To extract data via the twitter API, follow these steps:

1. Sign up for Twitter (skip this step if you already have an account): https://twitter.com/
2. Sign in to the Twitter developer site: https://developer.twitter.com/en
3. Follow the site's instructions to locate your API key, API secret, and bearer token. Save these as environment variables on your computer, using the following names:

    1. API Key: api_key_twitter
    2. API Secret: api_secret_twitter
    3. Bearer Token: api_bearer_token_twitter

Once your environment variables have been created, follow the steps outlined in the code comments (within each cell) to pull tweets from the twitter API!
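Before running the cells, it can help to confirm that the three variables are actually visible to Python. The following is a minimal sketch, not part of the original notebook; `REQUIRED_VARS` and `missing_credentials` are hypothetical names introduced here for illustration.

```python
import os

# Environment variable names used in this notebook for the three credentials.
REQUIRED_VARS = ("api_key_twitter", "api_secret_twitter", "api_bearer_token_twitter")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in `env`.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Twitter credentials found.")
```

If a variable is reported as missing right after you created it, restarting the Jupyter server is often needed so that the new environment is picked up.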

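NOTE: Read the following API documentation for details regarding GET tweets API requests. The request used in this notebook is sent to the recent search endpoint (`GET /2/tweets/search/recent`), which accepts the same `tweet.fields`, `user.fields`, `place.fields`, and `expansions` parameters:
https://developer.twitter.com/en/docs/twitter-api/tweets/lookup/api-reference/get-tweets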

In [None]:
import requests
import json
import os
import pandas as pd

In [None]:
# twitter API credentials stored in environment variables
api_key = os.environ.get('api_key_twitter')
api_key_secret = os.environ.get('api_secret_twitter')
bearer_token = os.environ.get('api_bearer_token_twitter')

In [None]:
# create headers for requests
headers = {"Authorization": f"Bearer {bearer_token}"}

In [None]:
# function to send a GET request to the recent search endpoint

def tweet_search(query, header, next_token=None, max_results=100):
    '''
    Send one recent-search request and return the parsed JSON response.

    Parameters:
        query = the hashtag/keyword we're searching for --> example: #queensfuneral
        header = authorization header --> {"Authorization": f"Bearer {bearer_token}"}
        next_token = pagination token from a previous response (None for the first page)
        max_results = integer 10 to 100
    '''

    # requests omits parameters whose value is None, so next_token is only sent when set
    param = {'query': query,
             'max_results': max_results,
             'next_token': next_token,
             'place.fields': 'full_name,id,country,country_code,geo,name,place_type',
             'tweet.fields': 'id,text,author_id,in_reply_to_user_id,geo,conversation_id,created_at,lang,public_metrics,referenced_tweets,reply_settings,source',
             'expansions': 'author_id,geo.place_id',
             'user.fields': 'location'}

    response = requests.get("https://api.twitter.com/2/tweets/search/recent", headers=header, params=param)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)

    return response.json()
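The `query` parameter is not limited to a single hashtag. As a hedged sketch, the helper below combines several hashtags and keywords with `OR` and optionally appends the `-is:retweet` and `lang:` search operators. `build_query` is a hypothetical helper, not part of the original notebook, and which operators are accepted may depend on your API access level.

```python
def build_query(hashtags=(), keywords=(), exclude_retweets=True, lang=None):
    """Build a search query string from hashtags and keywords.

    Hashtags may be given with or without the leading '#'.
    Keywords containing spaces are quoted so they match as an exact phrase.
    """
    terms = [f"#{tag.lstrip('#')}" for tag in hashtags]
    terms += [f'"{kw}"' if " " in kw else kw for kw in keywords]
    if not terms:
        raise ValueError("at least one hashtag or keyword is required")

    # Group alternatives in parentheses so trailing operators apply to all of them.
    query = terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    if lang:
        query += f" lang:{lang}"
    return query


print(build_query(hashtags=["queenelizabeth", "#queensfuneral"], lang="en"))
# prints: (#queenelizabeth OR #queensfuneral) -is:retweet lang:en
```

The resulting string can be passed as the `query` argument of `tweet_search` in place of a single hashtag.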


# Retrieve Tweets using hashtag

Run the cells below to pull tweets containing a specified hashtag.

In [None]:
#### ENTER HASHTAG HERE #####

# enter hashtag in the 'tag' variable
# if you want to test out a new hashtag, change the 'tag' variable and re-run this cell, then run the next cell 

tag = 'queenelizabeth'
hashtag = '#' + tag

# determines the number of loops (API calls) to make. total number of tweets pulled per run is MAX 100*loops
loops = 10

# new_token is the pagination token: the API uses it to return the next page of results, so you aren't receiving
# duplicate tweets, i.e., tweets you already received with previous API calls.

##### DO NOT CHANGE #####
new_token = None
result = []
##### DO NOT CHANGE #####

In [None]:
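# Run this cell to pull some tweets
# NOTE: sometimes the API sends an error when you've made too many consecutive requests. when this happens, you can just
# re-run this cell
# The loop stops early when a response has no 'next_token', which means there are no more tweets matching the hashtag.
# To start a fresh pull afterwards, re-run the previous cell first to reset 'new_token' and 'result'.

for number in range(loops):

    data = tweet_search(query=hashtag, max_results=100, header=headers, next_token=new_token)
    result += data.get('data', [])               # 'data' is absent when a page has no matching tweets
    new_token = data['meta'].get('next_token')   # absent on the last page
    if new_token is None:
        break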
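The pagination pattern in this loop can also be wrapped in a reusable helper that retries when the API reports too many requests. This is a sketch under stated assumptions, not the notebook's method: `collect_pages` is a hypothetical name, `fetch_page` is any callable that takes a pagination token and returns a response dictionary, and rate-limit errors are assumed to be raised the way `tweet_search` raises them, as `Exception(status_code, text)` with status 429.

```python
import time


def collect_pages(fetch_page, max_pages=10, max_retries=3, wait_seconds=1.0):
    """Collect tweets across pages until max_pages is reached or pages run out.

    fetch_page(next_token) must return a dict shaped like the API response
    ({'data': [...], 'meta': {'next_token': ...}}) and raise
    Exception(status_code, text) on a non-200 response.
    """
    tweets, token = [], None
    for _ in range(max_pages):
        for attempt in range(max_retries):
            try:
                page = fetch_page(token)
                break
            except Exception as exc:
                rate_limited = bool(exc.args) and exc.args[0] == 429
                # Re-raise anything that is not a rate limit, or the final failed attempt.
                if not rate_limited or attempt == max_retries - 1:
                    raise
                time.sleep(wait_seconds * 2 ** attempt)  # simple exponential backoff
        tweets.extend(page.get("data", []))
        token = page.get("meta", {}).get("next_token")
        if token is None:  # last page reached
            break
    return tweets
```

With the notebook's names, a call could look like `collect_pages(lambda tok: tweet_search(hashtag, headers, next_token=tok), max_pages=loops)`. Injecting the fetch function keeps the helper testable without network access.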

In [None]:
# how many tweets were pulled?
print(len(result))

In [None]:
# sample of the tweets we received
tweets_df = pd.DataFrame(result)
display(tweets_df.head(2))

In [None]:
# how many unique authors?
print(tweets_df['author_id'].nunique())

In [None]:
# how many tweets with location data?
print(tweets_df['geo'].notna().sum())
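The tweet-level `geo` field is only one source of location. Because the request uses `expansions=author_id` with `user.fields=location`, each response page also carries an `includes` object whose `users` list may hold a free-text profile location. The sketch below shows one way to attach it to tweets; `author_locations` and `attach_author_location` are hypothetical helpers, and the notebook's loop would need to keep each full response page (it currently keeps only `data['data']`) for this to apply.

```python
def author_locations(page):
    """Map user id -> profile location for users in one response page."""
    users = page.get("includes", {}).get("users", [])
    return {user["id"]: user["location"] for user in users if user.get("location")}


def attach_author_location(tweets, locations):
    """Return copies of the tweets with an added 'author_location' key (None if unknown)."""
    return [{**tweet, "author_location": locations.get(tweet.get("author_id"))}
            for tweet in tweets]


# Hand-written sample page for illustration only (not real API output).
sample_page = {
    "data": [{"id": "10", "author_id": "1", "text": "first"},
             {"id": "11", "author_id": "2", "text": "second"}],
    "includes": {"users": [{"id": "1", "name": "A", "username": "a", "location": "London"},
                           {"id": "2", "name": "B", "username": "b"}]},
    "meta": {"result_count": 2},
}

enriched = attach_author_location(sample_page["data"], author_locations(sample_page))
print([tweet["author_location"] for tweet in enriched])
# prints: ['London', None]
```

Profile locations are user-entered text, so they would still need geocoding or cleaning before analysis.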

In [None]:
# save into json file (create the data folder first if it doesn't exist)
os.makedirs("data", exist_ok=True)
with open(f"data/twitter_data_{tag}.json", "w") as outfile:
    json.dump(result, outfile, indent=4)

## Next Steps

After the Twitter data was extracted, our team needed to enrich it by retrieving geolocation data and then pre-process it (cleaning, feature engineering, etc.). These activities are covered in subsequent notebooks.