# Twitter API Tutorial


Today, we're going to look at how to interact with Twitter's API so that we can easily access some tweets.

Recall that API stands for Application Programming Interface, and is a way for some programmes to interact with other programmes. (An interface is a standard way to access some functionality.)

We'll use the requests library to make some API requests for past tweets, and Twitter's twitter-stream library to get a real-time stream of tweets.

But first: A bit on access tokens

## API Access Tokens

Last week, we were able to use a weather API by going to the appropriate URL endpoint with the right queries. Some services restrict access to all or some of their APIs with access tokens.

Firstly, this lets them keep track of who is using which resources, so anyone overusing the service (intentionally or not) can be slowed down or have their access cut off. Capping how many requests each user can make is called 'rate limiting'.

Secondly, it lets them give different levels of access to different people. Advertisers on Twitter, as well as academic researchers, can get access to more powerful APIs than the rest of us!

To save time, I'll be giving you all access tokens that I've previously registered. There are about 400,000 tweets left on it for the month, but with a class of 80+ that can go fast, so please **remember to turn off any tweet streams!**

If you want to use the Twitter API for any projects/CAs, you'll need your own access tokens. Let me know and I can help you set it up!
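As a rough illustration of what rate limiting looks like from the client's side, here is a minimal sketch that works out how long to wait before sending another request. It assumes the response carries `x-rate-limit-remaining` and `x-rate-limit-reset` headers (the reset value being a Unix timestamp); check the headers you actually receive before relying on it. The function name `seconds_until_reset` is made up for this example.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request.

    `headers` is any dict-like object of response headers.
    The header names used here are an assumption, not taken from this tutorial.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    if remaining > 0:
        # There are still requests left in the current window.
        return 0.0
    # No requests left: wait until the window resets (never a negative time).
    return float(max(0, reset_at - now))
```

With the `requests` library, this could be called as `seconds_until_reset(response.headers)` and the result passed to `time.sleep`.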

## Using API tokens (or other authentication materials)

In general, you want to avoid sharing your access tokens.

Beware that version control tools like Git (and hosting sites like GitHub) keep ALL of your previous commits. So if you leave access tokens in ANY commit, people can go back through the version history and find them!

Because of this, we are going to save our access tokens as **environment variables**.

These can then be read by Python into our programme, without ever appearing explicitly in the code.

Another reason to do this is for when you are sharing your code: other people can then just run it with their own access tokens.
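For comparison, Python's standard library can read an environment variable on its own. The sketch below assumes a variable called `BEARER_TOKEN` has already been set in the terminal; the helper name `get_token` is made up for this example.

```python
import os


def get_token(name="BEARER_TOKEN"):
    """Read an access token from an environment variable.

    Raises a clear error instead of silently returning None,
    so a missing token is noticed straight away.
    """
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"Environment variable {name} is not set")
    return token
```

The code never contains the token itself, so it is safe to commit and share.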

We'll be using the python-dotenv library to handle this for us.

The access tokens will be saved in a file called .env (hence the library name).

Files whose names start with a dot are normally **hidden files**, so you may not be able to see it. If so, you'll have to change your view settings.
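Since the `.env` file holds the tokens, it should itself stay out of Git. The usual way is to list it in a `.gitignore` file. Here is a small sketch that adds the entry if it is missing; the helper name `ensure_ignored` is made up for this example, and it only checks for an exact matching line.

```python
from pathlib import Path


def ensure_ignored(gitignore_path, entry=".env"):
    """Add `entry` to a .gitignore file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(gitignore_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if entry in (line.strip() for line in lines):
        return False
    lines.append(entry)
    path.write_text("\n".join(lines) + "\n")
    return True
```

Calling `ensure_ignored(".gitignore")` in the project folder before the first commit keeps the tokens out of the history.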

In [1]:
#!pip install python-dotenv

In [5]:
from dotenv import dotenv_values

config = dotenv_values(".env")

# your Twitter API key and API secret
# We won't be using these variables, they're just for demonstration.
my_api_key = config["API_KEY"]
my_api_key_secret = config["API_KEY_SECRET"]

In [4]:
print(config["API_KEY"])

PbxLnKwqeEpOVE9i6GdmMxeCC


The twitter-stream library will look for the access tokens in a particular place on your computer. This section creates the correct file in the appropriate location. You can also do this manually.

In [6]:
twitter_keys = f'''keys:
    access_token: {config["API_KEY"]}
    access_token_secret: {config["API_KEY_SECRET"]}
    bearer_token: {config["BEARER_TOKEN"]}
'''
# Mac might be able to use "~/.twitter-keys.yaml"
# The file is written twice: once with a hyphen and once with an underscore in its name.
with open("C:/Users/User/Desktop/Masters 2022/Programming/Week 5th Dec/Week 12 Twitter API/Week 12 Twitter API/.twitter-keys.yaml", "w") as file:
    file.write(twitter_keys)
with open("C:/Users/User/Desktop/Masters 2022/Programming/Week 5th Dec/Week 12 Twitter API/Week 12 Twitter API/.twitter_keys.yaml", "w") as file:
    file.write(twitter_keys)
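The hard-coded path above only works on one machine. A more portable sketch, using `pathlib`, is shown below. The helper name `write_keys_file` is made up for this example, and whether twitter-stream reads the file from your home folder is an assumption based on the comment in the cell above.

```python
from pathlib import Path


def write_keys_file(directory, api_key, api_key_secret, bearer_token,
                    filename=".twitter-keys.yaml"):
    """Write the keys file into `directory` and return its path."""
    text = (
        "keys:\n"
        f"    access_token: {api_key}\n"
        f"    access_token_secret: {api_key_secret}\n"
        f"    bearer_token: {bearer_token}\n"
    )
    # expanduser() turns a leading "~" into the user's home folder.
    path = Path(directory).expanduser() / filename
    path.write_text(text)
    return path
```

For example, `write_keys_file(Path.home(), config["API_KEY"], config["API_KEY_SECRET"], config["BEARER_TOKEN"])` would write the file to the home folder on any operating system.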

Now that that is all done, we can let the fun begin!!

## Twitter Queries

In [None]:
#!pip install requests

Source: https://github.com/twitterdev/Twitter-API-v2-sample-code

We're using the recent-search functionality:
https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/main/Recent-Search/recent_search.py

For more on building tweet queries:
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
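To get a feel for how a query string is put together, here is a small sketch that assembles one from its parts. It only uses operators that appear in this notebook (a quoted phrase, plain keywords, `from:` and `-is:retweet`); the helper name `build_query` is made up for this example.

```python
def build_query(terms, phrase=None, from_user=None, exclude_retweets=True):
    """Assemble a search query string from its parts.

    Parts separated by spaces are combined with an implicit AND.
    """
    parts = []
    if phrase:
        # Quotes ask for an exact phrase match.
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    if from_user:
        parts.append(f"from:{from_user}")
    if exclude_retweets:
        # A leading minus negates an operator.
        parts.append("-is:retweet")
    return " ".join(parts)


print(build_query(["prices"], phrase="Butter"))
```

The printed string could then be used as the `'query'` value in the request parameters.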

In [14]:
import requests
import json
import pandas as pd

# As an alternative to the .env file, you can do this:
# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
bearer_token = config["BEARER_TOKEN"]

search_url = "https://api.twitter.com/2/tweets/search/recent"

# Optional params: start_time,end_time,since_id,until_id,max_results,next_token,
# expansions,tweet.fields,media.fields,poll.fields,place.fields,user.fields
# query_params = {'query': '(from:twitterdev -is:retweet) OR #twitterdev','tweet.fields': 'author_id', "max_results":"10"}
query_params = {
#     'query': 'from:elonmusk -is:retweet is:verified',
    'query' : '"Butter" prices -is:retweet',
    'tweet.fields': 'author_id', 
    'user.fields': 'name',
    "max_results":"78",
}

def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2RecentSearchPython"
    return r

def connect_to_endpoint(url, params):
    response = requests.get(url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

# Querying the API
json_response = connect_to_endpoint(search_url, query_params)

df = pd.DataFrame(json_response['data'])
df.to_csv('response2_python.csv')


# Parsing the response
#parsedRes = json.dumps(json_response, indent=4, sort_keys=True, ensure_ascii=False)
print(json_response)


200
{'data': [{'id': '1603432079596789772', 'edit_history_tweet_ids': ['1603432079596789772'], 'text': '@WSJ Let Russia give Africa Billions$. WeThePeople are dealing with the most expensive prices in basic consumer goods MILK BUTTER EGGS OLIVE OIL BREAD VEGETABLES FRUITS something the extreme far-left media(WSJ)knows nothing about.', 'author_id': '2881807216'}, {'id': '1603427003897626624', 'edit_history_tweet_ids': ['1603427003897626624'], 'text': '@Longshoreman912 @Packratmom @NoLieWithBTC Sure, things like jobs/wages, healthcare, prescription drug prices, housing, childcare, education, the solvency of social security and medicare, infrastructure, energy development, etc.\n\nBread and butter issues.', 'author_id': '4039664782'}, {'id': '1603408650688921601', 'edit_history_tweet_ids': ['1603408650688921601'], 'text': '@ChuckGrassley Fact:  19.2 oz of my every day coffee was $5 two years ago.  It is now $9.50.  Butter was $4.95.  It is now $7.95.  Consumer prices are way, way, WAY up.
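
A single request only returns one page of results. The sketch below shows one way to collect several pages, assuming the response includes a `meta` section with a `next_token` whenever more results are available, which is then sent back as the `next_token` parameter. The helper name `collect_pages` is made up for this example; `fetch_page` is any function that takes a parameter dictionary and returns the parsed JSON.

```python
def collect_pages(fetch_page, params, max_pages=3):
    """Fetch up to `max_pages` pages of results and return all tweets."""
    tweets = []
    params = dict(params)  # copy, so the caller's dictionary is not modified
    for _ in range(max_pages):
        page = fetch_page(params)
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break  # no more pages
        params["next_token"] = next_token
    return tweets
```

With the functions defined above, this could be called as `collect_pages(lambda p: connect_to_endpoint(search_url, p), query_params)`. Keep `max_pages` small, since every page counts towards the monthly tweet allowance.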

## Data Pre-processing for Sentiment Analysis

These steps are applied during data pre-processing:

- Normalizing words.
- Removing stop words.
- Tokenizing sentences.
- Vectorizing text.
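
The steps above can be sketched with the standard library alone. This is only an illustration: the tiny `STOP_WORDS` set is a stand-in for the NLTK stop word list, and `vectorize` is a simple word-count stand-in for scikit-learn's `CountVectorizer`. All function names are made up for this example. For simplicity, the text is split into tokens before the stop words are removed.

```python
import re
from collections import Counter

# A tiny stand-in stop word list (an assumption; NLTK's list is much longer).
STOP_WORDS = {"the", "a", "is", "are", "of", "and", "to", "in", "it", "was"}


def normalize(text):
    """Lower-case the text and strip URLs, @mentions and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)


def tokenize(text):
    """Split normalized text into a list of words."""
    return normalize(text).split()


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(docs):
    """Turn documents into word-count vectors over a shared vocabulary."""
    token_lists = [remove_stop_words(tokenize(d)) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors
```

Each tweet ends up as a list of numbers, one per word in the vocabulary, which is the kind of input a model can be trained on.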


In [None]:
# Import all required packages:
import pandas as pd
import numpy as np
import seaborn as sns
import re
import string
from string import punctuation
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.callbacks import EarlyStopping 


In [16]:
ButterPrices = pd.read_csv("response2_python.csv")
ButterPrices.head() 

| Unnamed: 0.1 | Unnamed: 0 | id | edit_history_tweet_ids | text | author_id |
|---|---|---|---|---|---|
| 0 | 0 | 1603432079596789772 | ['1603432079596789772'] | @WSJ Let Russia give Africa Billions$. WeThePe... | 2881807216 |
| 1 | 1 | 1603427003897626624 | ['1603427003897626624'] | @Longshoreman912 @Packratmom @NoLieWithBTC Sur... | 4039664782 |
| 2 | 2 | 1603408650688921601 | ['1603408650688921601'] | @ChuckGrassley Fact: 19.2 oz of my every day ... | 1516093917439139844 |
| 3 | 3 | 1603401158277595137 | ['1603401158277595137'] | @JanetBa44871836 Oh no. That stinks, nothing w... | 1686208430 |
| 4 | 4 | 1603399220811243520 | ['1603399220811243520'] | Eggs are nearly 50% more expensive than they w... | 34664420 |


In [17]:
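# Drop unnecessary columns, including the leftover index columns
# ("Unnamed: ...") created each time the DataFrame was saved to CSV.
ButterPrices = ButterPrices.drop(
    columns=['edit_history_tweet_ids', 'Unnamed: 0', 'Unnamed: 0.1'],
    errors='ignore',  # skip any of these columns that are not present
)
# Remove rows with missing values.
ButterPrices.dropna(inplace=True)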


In [1]:
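# Save the tweets to a second CSV file.
# pandas must be imported in the current session; without it, this cell
# fails with "NameError: name 'pd' is not defined".
import pandas as pd

# NOTE: tw_list is not defined anywhere in this notebook. It is assumed to be
# a list of tweet dictionaries (for example json_response['data']).
df = pd.DataFrame(tw_list)
df.to_csv('ButterPricesresponse_python.csv')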