# Twitter APIv2 Search with Pagination 📄

This notebook gives a quick example of querying the Twitter API v2 for a keyword of interest, **with a focus on retrieving historical results through pagination.**

> More info in the Twitter pagination docs [here](https://developer.twitter.com/en/docs/twitter-api/pagination)
____________________________


### 🌎 Load Modules

In [12]:
import json
import requests
import os
from time import sleep

### 🔑 Set access tokens / Bearer Token

> *Do not commit your token to GitHub!*

In [13]:
BEARER_TOKEN = 'AAAAAAAAAAAAAAAAAAAAANkNTwEAAAAA......KR7DbnKlt'
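
One way to keep the token out of the notebook entirely is to read it from an environment variable, which is presumably why `os` is imported. The sketch below assumes a variable named `TWITTER_BEARER_TOKEN`; that name and the helper `load_bearer_token` are hypothetical choices for illustration, not a Twitter convention.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in env_var, or raise if it is unset.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

With the variable exported in your shell, `BEARER_TOKEN = load_bearer_token()` could replace the hard-coded assignment.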

### 👉 Set Query for Keyword, Geolocation, & Pagination!

For your API call, you should:

#### #️⃣ Prepare your Query with your parameters & terms of interest:

Some query parameters you may consider for filtering on APIv2:

- **Hashtags**
- **Language**
- **Geolocated**
- **Historical**

> *Note: in the example below, we look at the French #metoo movement, geolocated around Montreal, going back to 2014.*
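
To make the pieces of the query easier to swap, the hashtag, language, and geolocation filters can be assembled from separate values. This is only a sketch: `build_query` is a hypothetical helper, and the historical part is not in the query string at all, since it is set through the `start_time` and `end_time` parameters.

```python
def build_query(hashtags, lang=None, point_radius=None):
    """Combine hashtags, a language code, and a point_radius filter into one query string.

    point_radius is a (longitude, latitude, radius) tuple whose values are
    inserted as given, e.g. ("-73.63", "45.50", "25mi").
    """
    # Hashtags are OR-ed together inside one parenthesised group
    parts = ["(" + " OR ".join("#" + tag.lstrip("#") for tag in hashtags) + ")"]
    if lang:
        parts.append(f"lang:{lang}")
    if point_radius:
        lon, lat, radius = point_radius
        parts.append(f"point_radius:[{lon} {lat} {radius}]")
    return " ".join(parts)
```

For instance, `build_query(["metoo", "moiaussi"], lang="fr")` gives `(#metoo OR #moiaussi) lang:fr`.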

#### #️⃣ Prepare your Tweet Expansions

Set your desired expanded tweet fields and add them as parameters to your query!

> Add these once you are confident in your testing! I believe the expanded fields count additionally against your API usage. More info on expansions here: https://developer.twitter.com/en/docs/twitter-api/expansions
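
Expanded objects come back in a separate `includes` section of the response rather than inside each tweet, so they usually need to be joined back by ID. A minimal sketch, assuming the request used the `author_id` expansion and that users therefore appear under `includes.users`; `attach_authors` is a hypothetical helper name.

```python
def attach_authors(response):
    """Return the tweets in response["data"], each with its author's user object attached.

    Assumes expansions=author_id was requested, so user objects are listed
    under response["includes"]["users"].
    """
    users = {u["id"]: u for u in response.get("includes", {}).get("users", [])}
    tweets = []
    for tweet in response.get("data", []):
        enriched = dict(tweet)  # shallow copy so the original page is untouched
        enriched["author"] = users.get(tweet.get("author_id"))
        tweets.append(enriched)
    return tweets
```

Tweets whose author is missing from `includes` get `author` set to `None`.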



In [14]:
##########################################
# Set your query parameters
##########################################

query = "(#metoo OR #moiaussi OR #balancetonporc) lang:fr point_radius:[-73.6380306004768 45.505845653789784 25mi]"

In [15]:
##########################################
# Set your query date range (standard API v2 search covers at most 1 week;
# a range this long needs access to the full-archive endpoint)
##########################################

start_time = "2014-01-01T00:00:00Z"
end_time = "2021-10-10T00:00:00Z"
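
If you only have the one-week window mentioned in the comment above, the timestamps can be computed rather than typed by hand. This is a sketch under assumptions: `recent_window` is a hypothetical helper, and the default of 6 days is just a margin chosen to stay inside the one-week limit.

```python
from datetime import datetime, timedelta, timezone


def recent_window(days=6, now=None):
    """Return (start_time, end_time) strings covering the last `days` days in UTC.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = end.replace(microsecond=0)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same timestamp format as used above
    return start.strftime(fmt), end.strftime(fmt)
```

Calling `start_time, end_time = recent_window()` would then replace the two fixed strings.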

In [16]:
##########################################
# Set your expanded fields
##########################################
EXPANSIONS = "author_id,referenced_tweets.id,referenced_tweets.id.author_id,in_reply_to_user_id,attachments.media_keys"

MEDIA_FIELDS = (
    "duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics"
)

TWEET_FIELDS = "created_at,author_id,public_metrics,source"

USER_FIELDS = (
    "description,name,username,created_at,location,url,verified,public_metrics"
)

### ⚙️ Setup APIv2 Endpoint

- Consider calling endpoint `2/tweets/counts/all` before searching on endpoint `2/tweets/search/all`.
    - The counts give you a better idea of how many Tweets to expect for a given query.
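
As a rough illustration of how the counts can guide a search, the sketch below sums the per-bucket counts and estimates how many search requests that implies. It assumes each counts response holds a `data` list of buckets with a `tweet_count` field, and that a search page returns 10 Tweets unless `max_results` is set (which the code in this notebook does not do). Both helper names are hypothetical.

```python
import math


def total_from_counts(pages):
    """Sum tweet_count over every bucket in an iterable of counts responses."""
    return sum(
        bucket["tweet_count"]
        for page in pages
        for bucket in page.get("data", [])
    )


def requests_needed(total, page_size=10):
    """Estimate how many search requests are needed to fetch `total` Tweets."""
    return math.ceil(total / page_size)
```

For example, 25 expected Tweets at 10 per page means `requests_needed(25)` is 3.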

In [17]:
##########################################
# Set your Twitter APIv2 endpoint URL
# -----------------------------------
# Optional params: start_time,end_time,since_id,until_id,next_token,granularity
##########################################

# FOR SEARCH API:
search_url = "https://api.twitter.com/2/tweets/search/all"
# Minimal parameters. Pass the query string itself: {query} would create a set.
query_params = {"query": query, "start_time": start_time, "end_time": end_time}
# WITH EXPANDED FIELDS (this overrides the minimal parameters above)
query_params = {
    "query": query,
    "start_time": start_time,
    "end_time": end_time,
    "expansions": EXPANSIONS,  # <-- Comment out if your expansions are empty...
    "media.fields": MEDIA_FIELDS,
    "tweet.fields": TWEET_FIELDS,
    "user.fields": USER_FIELDS,
}


# FOR COUNTS API (uncomment these two lines):
#search_url = "https://api.twitter.com/2/tweets/counts/all"
#query_params = {"query": query, "start_time": start_time, "end_time": end_time, "granularity": "day"}

### 🚀 Run your APIv2 call with Pagination

- In order to get all historical results, you must paginate using the `next_token` returned in the API call
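
The pagination idea can be sketched independently of the HTTP call. In the sketch below, `paginate` is a hypothetical helper, and `fetch_page` stands for any function that takes a parameter dict and returns the decoded JSON response.

```python
def paginate(fetch_page, params):
    """Yield each response page, following meta.next_token until it disappears.

    fetch_page is any callable that takes a params dict and returns decoded JSON.
    """
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        page = fetch_page(params)
        yield page
        token = page.get("meta", {}).get("next_token")
        if not token:
            break  # last page: the API returned no next_token
        params["next_token"] = token
```

Because it is a generator, each page can be saved or processed as soon as it arrives instead of holding the whole result set in memory.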

> *Note: here I implement rudimentary rate limiting, which simply pauses for 1.25 seconds between requests. The proper way would be to watch for a 429 rate-limit response and wait only as long as necessary... tbd*
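
A sketch of that 429-based approach follows. It assumes the response carries an `x-rate-limit-reset` header holding the epoch second at which the limit resets, and it falls back to the fixed 1.25-second pause otherwise. Both function names are hypothetical, and `send` stands for any zero-argument callable that performs the request and returns an object with `status_code` and `headers`.

```python
import time


def seconds_until_reset(headers, now=None, default=1.25):
    """Return how long to wait before retrying, based on x-rate-limit-reset.

    Falls back to `default` seconds when the header is missing or malformed.
    """
    now = time.time() if now is None else now
    try:
        reset_at = float(headers["x-rate-limit-reset"])
    except (KeyError, TypeError, ValueError):
        return default
    return max(reset_at - now, 0.0) + 1.0  # one extra second as a safety margin


def call_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send() and retry after each 429 until max_retries is used up."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        sleep(seconds_until_reset(response.headers))
```

With `requests`, `send` could be something like `lambda: requests.get(search_url, auth=bearer_oauth, params=query_params)`.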

#### Make sure you have an output directory to save results

- It should be `./data/`

In [18]:
mkdir -p ./data
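
The directory can also be created from Python, which avoids an error when it already exists and works the same way outside a notebook. A minimal sketch, where `save_page` is a hypothetical helper that follows the `result_<token>.json` naming used in this notebook:

```python
import json
import os


def save_page(page, out_dir="./data", suffix=""):
    """Write one response page to out_dir/result_<suffix>.json and return the path."""
    os.makedirs(out_dir, exist_ok=True)  # no error if the directory already exists
    path = os.path.join(out_dir, f"result_{suffix}.json")
    with open(path, "w", encoding="utf8") as file:
        json.dump(page, file)
    return path
```

Passing the page's `next_token` as `suffix` keeps one file per page.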


In [19]:
def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {BEARER_TOKEN}"
    r.headers["User-Agent"] = "v2FullArchiveSearchPython"
    return r


def connect_to_endpoint(url, params):
    # Use the url argument rather than the global search_url
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def main():
    json_response = connect_to_endpoint(search_url, query_params)
    # Create first outfile (pre-pagination)
    with open('./data/result_.json', 'w', encoding='utf8') as file:
        json.dump(json_response, file)
    #print(json.dumps(json_response, indent=4, sort_keys=True))
    
    # FOR PAGINATION LOOP
    sleep(1.25) # Sleep for at least one second to prevent 429 error on Rate Limiting
    while 'next_token' in json_response.get('meta', {}):
        next_token = json_response['meta']['next_token']
        # Reuse the original parameters (search or counts) and add the token for the next page
        json_response = connect_to_endpoint(search_url, {**query_params, "next_token": next_token})
        #print(json.dumps(json_response, indent=4, sort_keys=True))
        # Create all next outfiles (post-pagination)
        with open('./data/result_{}.json'.format(next_token), 'w', encoding='utf8') as file:
            json.dump(json_response, file)
        sleep(1.25)

if __name__ == "__main__":
    main()

401


Exception: (401, '{"title":"Unauthorized","detail":"Unauthorized","type":"about:blank","status":401}')

### The call above returns Unauthorized because the bearer token shown here is a redacted placeholder, not a valid token!
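
Once the call succeeds with a valid token, each page ends up in its own file under `./data/`. A minimal sketch for reading them back into a single list, assuming every file holds one response with its Tweets under `data`; `load_all_tweets` is a hypothetical helper.

```python
import glob
import json
import os


def load_all_tweets(out_dir="./data"):
    """Concatenate the "data" lists from every result_*.json file in out_dir."""
    tweets = []
    for path in sorted(glob.glob(os.path.join(out_dir, "result_*.json"))):
        with open(path, encoding="utf8") as file:
            page = json.load(file)
        tweets.extend(page.get("data", []))  # pages with no results have no "data" key
    return tweets
```

Files are read in filename order, which is not chronological, so sort the combined list on `created_at` if order matters.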