# Twitter API Tutorial


Today, we're going to look at how to interact with Twitter's API so that we can easily access some tweets.

Recall that API stands for Application Programming Interface, and is a way for some programmes to interact with other programmes. (An interface is a standard way to access some functionality.)

We'll use the requests library to make some API requests for past tweets, and the third-party twitter-stream.py library to get a real-time stream of tweets.

But first: A bit on access tokens

## API Access Tokens

Last week, we were able to use a weather API by going to the appropriate URL endpoint with the right queries. Some services, however, restrict access to all or some of their APIs behind access tokens.

Firstly, this lets them keep track of who is using which resources, so anyone overusing the service (intentionally or not) can be slowed down or have their access cut off. Capping how many requests each user can make in a given period is called 'rate limiting'.

Secondly, it lets them give different levels of access to different people. Advertisers on Twitter, as well as academic researchers, can get access to more powerful APIs than the rest of us!
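
To give a feel for what rate limiting looks like from our side of the API, here is a minimal sketch. It assumes the response carries the headers x-rate-limit-remaining and x-rate-limit-reset (the reset value being a Unix timestamp in seconds), which is how Twitter's v2 API reports limits as far as I know. The function name seconds_until_reset is made up for this example.

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request (0 if none needed)."""
    now = time.time() if now is None else now
    # Assumed header names; missing headers are treated as "no limit hit".
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    if remaining > 0:
        return 0
    return max(0, reset_at - now)

# Hand-made headers: no requests left, window resets 60 seconds after "now".
example_headers = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1700000060"}
print(seconds_until_reset(example_headers, now=1700000000))  # → 60
```

With the requests library you could pass response.headers to a helper like this and sleep for the returned number of seconds before trying again.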

To save time, I'll be giving you all access tokens that I've previously registered. There are about 400,000 tweets left on it for the month, but with a class of 80+ that can go fast, so please **remember to turn off any tweet streams!**

If you want to use the Twitter API for any projects/CAs, you'll need your own access tokens. Let me know and I can help you set it up!

## Using API tokens (or other authentication materials)

In general, you want to avoid sharing your access tokens.

Beware that version control tools like Git (and hosting services like GitHub) keep ALL of your previous commits. So if you leave access tokens in ANY commit, people can go back through the version history and find them!

Because of this, we are going to save our access tokens as **environment variables**.

These can then be read by Python into our programme, without ever appearing explicitly in the code.

Another reason to do this is for when you are sharing your code: other people can then just run it with their own access tokens.
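
As a minimal sketch of the idea, using only the standard library: os.environ behaves like a dictionary of the environment variables visible to Python. The variable name BEARER_TOKEN matches the one used later in this tutorial; the helper get_token and the placeholder value are hypothetical.

```python
import os

def get_token(name):
    """Read a secret from the environment, failing with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# For demonstration only: normally the variable is set outside the code,
# for example in your terminal or in a .env file.
os.environ["BEARER_TOKEN"] = "placeholder-not-a-real-token"
token = get_token("BEARER_TOKEN")
print(token == "placeholder-not-a-real-token")  # → True
```

Raising an error when the variable is missing makes it obvious to whoever runs your code that they need to supply their own token.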

We'll be using the python-dotenv library to handle this for us.

The access tokens will be saved in a file called .env (hence the library name).

Files like this are normally **hidden files**, so you may not be able to see it; you may have to change your view settings.
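
Since the .env file holds secrets, it should be kept out of Git too. A common way to do that is to list it in a .gitignore file. The helper below is a hypothetical sketch (the name ensure_ignored is an assumption, not part of any library) that appends .env to a .gitignore file if it is not already listed.

```python
from pathlib import Path

def ensure_ignored(gitignore_path, entry=".env"):
    """Append entry to the .gitignore file unless it is already listed."""
    path = Path(gitignore_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if entry in (line.strip() for line in lines):
        return False  # already ignored, nothing to do
    lines.append(entry)
    path.write_text("\n".join(lines) + "\n")
    return True
```

You would call it as ensure_ignored(".gitignore") from the top folder of your repository. Note that this only helps if the .env file has not already been committed.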

In [1]:
#!pip install python-dotenv

In [7]:
from dotenv import dotenv_values

config = dotenv_values(".env")

# your Twitter API key and API secret
# We won't be using these variables, they're just for demonstration.
my_api_key = config["API_KEY"]
my_api_key_secret = config["API_KEY_SECRET"]

In [8]:
print(config["API_KEY"])

MuLjaBIBiINqqAN7tEZFz7wHG


The twitter-stream library looks for the access tokens in a particular place on your computer: a file called .twitter-keys.yaml in your home directory. This section creates that file in the appropriate location. You can also do this manually.

In [10]:
from pathlib import Path

twitter_keys = f'''keys:
    access_token: {config["API_KEY"]}
    access_token_secret: {config["API_KEY_SECRET"]}
    bearer_token: {config["BEARER_TOKEN"]}
'''
# twitter-stream looks for ".twitter-keys.yaml" in your home directory
# (C:\Users\<you> on Windows, "~" on Mac), so we write the file there.
keys_path = Path.home() / ".twitter-keys.yaml"
with open(keys_path, "w") as file:
    file.write(twitter_keys)

Now that that is all done, we can let the fun begin!!

## Twitter Queries

In [None]:
#!pip install requests

Source: https://github.com/twitterdev/Twitter-API-v2-sample-code

We're using the recent-search functionality:
https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/main/Recent-Search/recent_search.py

For more on building tweet queries:
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
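
To make the query syntax a little more concrete, here is a small sketch that assembles a query string from its parts. The function build_query and its parameters are hypothetical and only cover the operators used in this tutorial (quoted phrases, plain keywords, from: and -is:retweet).

```python
def build_query(phrase=None, keywords=(), from_user=None, exclude_retweets=True):
    """Combine search terms into one query string (a space means AND)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # quotes ask for an exact phrase match
    parts.extend(keywords)
    if from_user:
        parts.append(f"from:{from_user}")
    if exclude_retweets:
        parts.append("-is:retweet")  # a leading minus negates an operator
    return " ".join(parts)

print(build_query(phrase="Butter", keywords=["food", "prices"]))
# → "Butter" food prices -is:retweet
```

The printed string is the same query used in the search cell in this section.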

In [11]:
import requests
import json

# As an alternative to the .env file, you can set an environment variable
# by running the following line in your terminal:
# export 'BEARER_TOKEN'='<your_bearer_token>'
# and then reading it here with os.environ.get("BEARER_TOKEN").
bearer_token = config["BEARER_TOKEN"]

search_url = "https://api.twitter.com/2/tweets/search/recent"

# Optional params: start_time,end_time,since_id,until_id,max_results,next_token,
# expansions,tweet.fields,media.fields,poll.fields,place.fields,user.fields
# query_params = {'query': '(from:twitterdev -is:retweet) OR #twitterdev','tweet.fields': 'author_id', "max_results":"10"}
query_params = {
#     'query': 'from:elonmusk -is:retweet is:verified',
    'query' : '"Butter" food prices -is:retweet',
    'tweet.fields': 'author_id', 
    'user.fields': 'name',
    "max_results":"25",
}

def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2RecentSearchPython"
    return r

def connect_to_endpoint(url, params):
    response = requests.get(url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

# Querying the API
json_response = connect_to_endpoint(search_url, query_params)

# Parsing the response
parsedRes = json.dumps(json_response, indent=4, sort_keys=True, ensure_ascii=False)
print(parsedRes)

200
{
    "data": [
        {
            "author_id": "1521086718711054337",
            "edit_history_tweet_ids": [
                "1603013939956260864"
            ],
            "id": "1603013939956260864",
            "text": "@WhiteHouse Eggs are up 47%, butter close behind, Meat we can No Longer afford, every other food Item Up, Up, Up. The only reason gas prices are down but still much higher then when President Trump was in office. So, stop blowing smoke up our backsides. https://t.co/jYn2UJi4sL"
        },
        {
            "author_id": "358415466",
            "edit_history_tweet_ids": [
                "1602997849607868416"
            ],
            "id": "1602997849607868416",
            "text": "@POTUS #presidentgaslight you can tell us that food inflation is down as much as you want. It doesn’t mean that it’s actually happening. Butter and eggs prices are $7 and $6 respectively.  YOU SUCK"
        },
        {
            "author_id": "1853004673",
            "ed

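The response is an ordinary Python dictionary, so we can pull out just the parts we care about. The sketch below assumes the shape shown in the output above (a "data" list of tweets with "id", "author_id" and "text"); the function extract_tweets and the sample response are made up for illustration. As far as I know, the "data" key is left out when nothing matches, so the code allows for that.

```python
def extract_tweets(response):
    """Return (id, author_id, text) tuples from a recent-search response dict."""
    return [
        (tweet["id"], tweet.get("author_id"), tweet["text"])
        for tweet in response.get("data", [])  # no "data" key means no tweets
    ]

# Tiny hand-made response with the same shape as the real output.
sample = {"data": [{"id": "1", "author_id": "42", "text": "Butter is expensive"}]}
for tweet_id, author_id, text in extract_tweets(sample):
    print(tweet_id, author_id, text)
```

With the real data you would call extract_tweets(json_response) instead of using the sample.
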
## Real Time Tweet Streams

We are not limited to just historical tweets: we can collect tweets as they are sent! 

(I don't know the actual delay between tweet and stream, but it is relatively fast)

In [None]:
#!pip install twitter-stream.py

In [16]:
from pathlib import Path

twitter_keys = f'''keys:
    access_token: {config["API_KEY"]}
    access_token_secret: {config["API_KEY_SECRET"]}
    bearer_token: {config["BEARER_TOKEN"]}
'''
# twitter-stream reads ".twitter-keys.yaml" from your home directory
# (on Mac this is "~/.twitter-keys.yaml").
with open(Path.home() / ".twitter-keys.yaml", "w") as file:
    file.write(twitter_keys)

In [17]:
import json
from twitter_stream import FilteredStream
from time import time

start = time()
stream = FilteredStream()
# One rule: tweets containing the exact phrase "World Cup", excluding retweets
rule = {
    "add": [
        {
            "value": '"World Cup" -is:retweet',
            "tag": "soccer"
        }
    ]
}

stream.add_rule(data=rule)
tweetList = []
for tweet in stream.connect():
    parsedTweet = json.dumps(tweet, indent=4, ensure_ascii=False)
    tweetList.append(parsedTweet)
    print(parsedTweet)
    print(f"There are: {len(tweetList)} tweets, about {len(tweetList)/(time()-start)} tweets per second.")

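FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\User/.twitter-keys.yaml'

This error means twitter-stream could not find .twitter-keys.yaml in the home directory. Make sure the keys file is saved there (and not in the project folder), with a hyphen rather than an underscore in its name, then run the cell again.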

## SampledStream API
Returns a continuous stream of a random sample of public tweets, along with whichever tweet and user fields you request.

As is, it will keep going until the notebook kernel is stopped (though you can add in time limits).

In [None]:
#Source: https://github.com/twitivity/twitter-stream.py
# https://github.com/twitivity/twitter-stream.py/blob/main/twitter_stream.py

import json
from twitter_stream import SampledStream

class Stream(SampledStream):
    user_fields = ['name', 'location', 'public_metrics']
    expansions = ['author_id']
    tweet_fields = ['created_at']

stream = Stream()
for tweet in stream.connect():
    print(json.dumps(tweet, indent=4,ensure_ascii=False))
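
Since a stream left running eats into the monthly tweet allowance, it is worth sketching how a time limit could be added. The generator below is a hypothetical helper (limited, max_seconds and max_items are names invented here) that wraps any iterable and stops after a number of seconds or items, whichever comes first. A made-up generator stands in for stream.connect() so the example runs without any access tokens.

```python
from time import monotonic

def limited(stream, max_seconds=30, max_items=100):
    """Yield items from stream until either limit is reached."""
    start = monotonic()
    for count, item in enumerate(stream, start=1):
        yield item
        if count >= max_items or monotonic() - start >= max_seconds:
            break

# Stand-in for stream.connect(): any iterable works.
fake_stream = ({"data": {"id": str(i)}} for i in range(1000))
collected = list(limited(fake_stream, max_seconds=5, max_items=3))
print(len(collected))  # → 3
```

In the cell above you would write for tweet in limited(stream.connect(), max_seconds=60): instead. One caveat: the clock is only checked when a tweet arrives, so a very quiet stream could run a little past the limit.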

## FilteredStream API
Returns a continuous stream of tweets that have been filtered by rules you define.

As is, it will keep going until the notebook kernel is stopped (though you can add in time limits).

In [None]:
#Source: https://github.com/twitivity/twitter-stream.py
# https://github.com/twitivity/twitter-stream.py/blob/main/twitter_stream.py

import json
from twitter_stream import FilteredStream
from time import time

start = time()
stream = FilteredStream()
rule = {
    "add" : [
        {"value": "\"World Cup\" -is:retweet", "tag":"soccer"}
    ]
}
stream.add_rule(data=rule)
tweetList = []
for tweet in stream.connect():
    parsedTweet = json.dumps(tweet, indent=4,ensure_ascii=False)
    tweetList.append(parsedTweet)
    print(parsedTweet)
    print(f"There are: {len(tweetList)} tweets, about {len(tweetList)/(time()-start)} tweets per second after {(time()-start)} seconds")

In [None]:
len(tweetList)

In [None]:
# Print every tweet collected by the stream above
for tweet in tweetList:
    print(tweet)

REST API URLs: in the example below, the endpoint is watch, the queries start at ? and are separated by &.


https://youtube.com/watch?v=dQw4w9WgXcQ&t=57
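
To see that structure from Python, here is a short sketch that splits the same URL into its endpoint and queries using the standard library's urllib.parse module.

```python
from urllib.parse import urlparse, parse_qs

url = "https://youtube.com/watch?v=dQw4w9WgXcQ&t=57"
parts = urlparse(url)

print(parts.path)             # → /watch
print(parse_qs(parts.query))  # → {'v': ['dQw4w9WgXcQ'], 't': ['57']}
```

This is the reverse of what requests does for us when we pass it a params dictionary: it builds the part after the ? from the keys and values.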

In [None]:
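# tweetList holds JSON strings, so parse them back into dicts before saving
with open("Midterms Tweets.json", "w", encoding="utf-8") as file:
    json.dump([json.loads(tweet) for tweet in tweetList], file, indent=4, ensure_ascii=False)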