# Tweepy tutorial 

Contact: Dr. Hickman (GU) for questions

**Modified from:** https://docs.tweepy.org/en/stable/getting_started.html

Install Tweepy with conda (`pip install tweepy` also works):

`conda install -c conda-forge tweepy`

### Authentication

**content modified from:** https://docs.tweepy.org/en/v3.5.0/auth_tutorial.html

Twitter requires all API requests to use OAuth for authentication. 

Tweepy supports OAuth authentication, which is handled by the tweepy.OAuthHandler class (renamed tweepy.OAuth1UserHandler in newer 4.x releases, where OAuthHandler is kept as an alias).

Tweepy tries to make OAuth as painless as possible for you. 

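To begin the process, register your client application with Twitter in the Twitter developer portal. 

Once the application is created, you will have its consumer key and consumer secret (the portal calls them the API key and API secret). 

Keep these two handy, you’ll need them. The code below also reads an access token, an access token secret, and a bearer token, which can be generated for the same application.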


**Read API keys from file**

It is a good idea to save your API keys to a file that you can load into your scripts.

* You can then "point" to that file anytime you need it

* This keeps all your keys organized in one place, instead of being pasted into lots of different code scripts

In [2]:
import json 

# READ FILE ("with" closes the file automatically; "keys" avoids shadowing Python's built-in input())
with open("api-keys.json") as f:
    keys = json.load(f)

# LOAD KEYS INTO API
consumer_key=keys["consumer_key"]
consumer_secret=keys["consumer_secret"]
access_token=keys["access_token"]
access_token_secret=keys["access_token_secret"]
bearer_token=keys["bearer_token"]
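The cell above assumes `api-keys.json` is a flat JSON object whose keys match the five names it reads. A slightly more defensive loader, sketched below, reports every missing key at once instead of stopping at the first `KeyError`. The helper `load_api_keys` and the `REQUIRED_KEYS` tuple are hypothetical names for illustration, not part of Tweepy.

```python
import json

# Key names the cells in this notebook read from api-keys.json
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token",
                 "access_token_secret", "bearer_token")

def load_api_keys(path="api-keys.json"):
    """Hypothetical helper: read the key file and check that no value is missing."""
    with open(path) as f:
        keys = json.load(f)
    # A key counts as missing if it is absent or has an empty value
    missing = [name for name in REQUIRED_KEYS if not keys.get(name)]
    if missing:
        raise KeyError(f"{path} is missing values for: {', '.join(missing)}")
    return keys
```

Usage: `keys = load_api_keys()` followed by `consumer_key = keys["consumer_key"]` and so on, exactly as in the cell above.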

The next step is creating an OAuthHandler instance by passing it our consumer key and consumer secret, then attaching our access token and access token secret with set_access_token. 

In [3]:
import tweepy
# Set up Connection
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)  

Now that our OAuthHandler is equipped with an access token, we are ready for business.

### Hello world example

This example downloads the most recent tweets on your home timeline (20 by default) and prints the text of each one to the console. 

The Authentication section above, adapted from Tweepy's authentication tutorial, goes into more detail about authentication.


In [4]:
import tweepy
print("tweepy version =",tweepy.__version__)

# Set up Connection
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

#print username
my_user_name=api.verify_credentials().screen_name
print("username=",my_user_name)

# Returns the 20 most recent statuses, including retweets, 
# posted by the authenticating user and that user’s friends. 
public_tweets = api.home_timeline()

# Print tweets to screen
for tweet in public_tweets:
    print(tweet.text)

tweepy version = 4.12.1
username= elliotzlii
How to survive the holiday season https://t.co/94sazD4coK  | opinion
The humbling of Xi Jinping https://t.co/zTOYIKBd1k
Some of China’s most severe lock­downs have been in border regions such as Xinjiang and Tibet, both of which have b… https://t.co/OkvU1lWIEo
This year's Best in Travel list has been sorted by the trip "type" — eat, journey, connect, learn, unwind — to help… https://t.co/UZFmutp1aU
President Joe Biden sought to boost Raphael Warnock’s campaign in Georgia, telling Democratic allies that Herschel… https://t.co/oLGZORZtpK
The tree-killing disease that decimated Florida orange groves is intensifying across Brazil, the biggest juice prod… https://t.co/ihL3Qg7cT2
Twitter's credit grade was withdrawn by S&amp;P because it lacks enough information to cover the platform https://t.co/mWGCGXAhFb
French President Emmanuel Macron says he wants to resolve a dispute over industrial subsidies with the US “in the f… https://t.co/7czACD29hO
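Two details in the output above are worth knowing. The v1.1 API returns tweet text with HTML entities escaped (note `S&amp;P`), and by default it shortens longer tweets with `…` and a link. A minimal sketch of a cleanup helper, assuming it receives a tweet's `_json` dictionary: it prefers the `full_text` field, which the v1.1 API returns when the request includes `tweet_mode="extended"`, and otherwise falls back to `text`. The function name `tweet_text` is hypothetical.

```python
import html

def tweet_text(tweet_json):
    """Hypothetical helper: return readable text from a v1.1 tweet dictionary."""
    # full_text is only present for requests made with tweet_mode="extended"
    raw = tweet_json.get("full_text", tweet_json.get("text", ""))
    # Turn escaped entities such as &amp; &lt; &gt; back into characters
    return html.unescape(raw)
```

Usage, assuming the `api` object from the cell above: `for tweet in api.home_timeline(tweet_mode="extended"): print(tweet_text(tweet._json))`.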

**API**

The API class (tweepy.API) provides access to the methods of Twitter's v1.1 RESTful API. 

Each method accepts endpoint-specific parameters and returns the response, usually parsed into a Tweepy model. 

For more information about these methods, please refer to the API reference:
https://docs.tweepy.org/en/stable/api.html#api-reference

**Models**

When we invoke an API method, most of the time it returns a Tweepy model class instance. 

This instance contains the data returned from Twitter, which we can then use inside our application. 

For example, `api.get_user(screen_name="POTUS")` returns a User model; the "Tweepy user objects" section below prints one in full.
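Conceptually, a model is a thin wrapper around the JSON that Twitter sends back: each top-level field becomes an attribute, and the raw dictionary stays available as `._json` (used throughout this notebook). The class below is a simplified, hypothetical stand-in written only to illustrate that idea; Tweepy's real model classes also convert some fields (for example nested user objects and timestamps) into richer types. The account name and count are made-up example values.

```python
class MiniModel:
    """Simplified stand-in for a Tweepy model (illustration only)."""

    def __init__(self, data):
        self._json = data              # raw dictionary, like a Tweepy model's ._json
        for key, value in data.items():
            setattr(self, key, value)  # each top-level field becomes an attribute


# Made-up example data, shaped like a small slice of a v1.1 user object
user = MiniModel({"screen_name": "example_user", "followers_count": 42})
print(user.screen_name)            # attribute access
print(user._json["screen_name"])   # same value through the raw dictionary
```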

### Twitter search

Returns a collection of relevant Tweets matching a specified query.

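**Option-1** 

This option calls the Twitter API v2 recent-search endpoint (tweets from roughly the last 7 days) directly with the `requests` library, authenticating with the bearer token.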

In [5]:
import requests 

# Set up Connection
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Define search twitter function
def search_twitter(query, tweet_fields, bearer_token = bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}

    url = "https://api.twitter.com/2/tweets/search/recent?query={}&{}".format(query, tweet_fields)
    
    print("--------------",url,"--------------")
    response = requests.request("GET", url, headers=headers)
    #print(response.status_code)
    # print(response.text)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

json_response = search_twitter(query="mental health", tweet_fields="tweet.fields=text,author_id,created_at", bearer_token=bearer_token)

print(json.dumps(json_response, indent=4, sort_keys=True))

-------------- https://api.twitter.com/2/tweets/search/recent?query=mental health&tweet.fields=text,author_id,created_at --------------
{
    "data": [
        {
            "author_id": "764466229360766979",
            "created_at": "2022-12-02T22:59:47.000Z",
            "edit_history_tweet_ids": [
                "1598814214168383488"
            ],
            "id": "1598814214168383488",
            "text": "RT @sarahswonderr: I think some of y\u2019all need to see it from Shawn\u2019s POV. He\u2019s always been in the spotlight, and deserves a break. He has t\u2026"
        },
        {
            "author_id": "2747525205",
            "created_at": "2022-12-02T22:59:47.000Z",
            "edit_history_tweet_ids": [
                "1598814213065121792"
            ],
            "id": "1598814213065121792",
            "text": "RT @RisenRabbit: one of the weirdest sets of beliefs you get on here is people being really pro-mental health awareness but also believing\u2026"
     
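The `search_twitter` function above inserts the query into the URL with `format`, relying on `requests` to percent-encode the space in `mental health`. That breaks for queries containing characters such as `#` or `&`, which would be read as a URL fragment or a parameter separator. A sketch of building the same recent-search URL explicitly with the standard library; the helper name is hypothetical, while `query` and `tweet.fields` are the same parameters used above.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_recent_search_url(query, tweet_fields=("text", "author_id", "created_at")):
    """Hypothetical helper: percent-encode the parameters for the v2 recent-search endpoint."""
    params = {"query": query, "tweet.fields": ",".join(tweet_fields)}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+")
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)

print(build_recent_search_url("#mentalhealth"))
```

Alternatively, passing the same dictionary as `requests.get(SEARCH_URL, headers=headers, params=params)` lets `requests` do the encoding itself.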

**Option-2** 

This option uses Tweepy's `api.search_tweets`, which wraps the v1.1 standard search endpoint.

Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. 

Not all Tweets will be indexed or made available via the search interface.

Twitter’s standard search API only “searches against a sampling of recent Tweets published in the past 7 days.”

https://docs.tweepy.org/en/stable/api.html#tweepy.API.search_tweets

In [6]:
# define pretty print function
def pretty_print_json(input):
    print(json.dumps(input, indent=4))

In [None]:
# Use api basic search to get 5 tweets given the query 
search_results = api.search_tweets("mental health", lang="en", count=5)
print(type(search_results))
print(search_results)


In [None]:
print("--------------First tweet meta JSON data-------------")
# index 0 is the first tweet in the results
pretty_print_json(search_results[0]._json)

### Search multiple tweets

In [6]:
number_of_tweets=100
search_results = api.search_tweets("mental health", lang="en", count=number_of_tweets)

# Loop over the tweets actually returned (there can be fewer than number_of_tweets)
for i, tweet in enumerate(search_results):
    data = tweet._json
    print("-------- tweet = ",i," -------")
    print("text = ",data["text"])
    # NOTE: "retweeted" refers to the authenticating user, not to whether this is a retweet
    print("retweeted = ",data["retweeted"])
    print("retweet_count = ",data["retweet_count"])
    print("created_at =" ,data["created_at"])
    print("id_str = ",data["id_str"])
    print("url = https://twitter.com/i/web/status/"+data["id_str"])

-------- tweet =  0  -------
text =  RT @THed2113: Are you stuck at 1 viewer and inactive chat? Join #KOOPATROOP for 24/7 support runs, mental health awareness and MASSIVE raid…
retweeted =  False
retweet_count =  1
created_at = Fri Dec 02 23:00:03 +0000 2022
id_str =  1598814281142865925
url = https://twitter.com/i/web/status/1598814281142865925
-------- tweet =  1  -------
text =  @cjoblonskiwicz @MuellerSheWrote Not everyone with a mental health condition is a danger to themselves and others b… https://t.co/K5GUvccUHR
retweeted =  False
retweet_count =  0
created_at = Fri Dec 02 23:00:03 +0000 2022
id_str =  1598814279851180036
url = https://twitter.com/i/web/status/1598814279851180036
-------- tweet =  2  -------
text =  Denver sheriff Elias Diggins, an #MSUDenver graduate, has a reputation for empathy. He has also made mental health… https://t.co/PpPFFB1Qlh
retweeted =  False
retweet_count =  0
created_at = Fri Dec 02 23:00:01 +0000 2022
id_str =  1598814271487455232
url = https:/
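In the output above, tweet 0 starts with `RT @...` yet shows `retweeted = False`. That is expected: in the v1.1 tweet object, `retweeted` says whether the authenticating user retweeted the tweet, while a retweet posted by any account carries the original tweet under a `retweeted_status` key. A minimal sketch of a per-tweet summary that uses this distinction and parses the `created_at` timestamp format seen above; the function name and the output fields are hypothetical choices.

```python
from datetime import datetime

# Timestamp format of v1.1 created_at fields, e.g. "Fri Dec 02 23:00:03 +0000 2022"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def summarize_tweet(tweet_json):
    """Hypothetical helper: pull a few useful fields out of a v1.1 tweet dictionary."""
    return {
        "id": tweet_json["id_str"],
        # a retweet embeds the original tweet; the "retweeted" flag means something else
        "is_retweet": "retweeted_status" in tweet_json,
        # timezone-aware datetime parsed from the v1.1 timestamp string
        "created_at": datetime.strptime(tweet_json["created_at"], TWITTER_TIME_FORMAT),
        "url": "https://twitter.com/i/web/status/" + tweet_json["id_str"],
    }
```

Usage: `summaries = [summarize_tweet(t._json) for t in search_results]`.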

### Cursor Tutorial

Source: https://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html

Pagination, also known as paging, is the process of dividing a document into discrete pages, either electronic pages or printed pages.

We use pagination a lot in Twitter API development, for example when iterating through timelines, user lists, and direct messages. 

In order to perform pagination, we must supply a page/cursor parameter with each of our requests.

The problem is that this requires a lot of boilerplate code just to manage the pagination loop. 

To make pagination easier and require less code, Tweepy has the Cursor object.

Cursor handles all the pagination work for us behind the scenes, so our code can focus entirely on processing the results.
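To see what Cursor saves us from, here is a minimal sketch of the `max_id` paging loop for a newest-first endpoint such as search: request a page, then ask only for items older than the oldest one seen, until a page comes back empty or enough items are collected. This is a hypothetical, library-free illustration (the `fetch` callback stands in for an API call), not Tweepy's implementation; the large-collection cell below does the same thing by hand with `max_id_next`.

```python
from typing import Callable, Iterator, List, Optional

def paginate_by_max_id(fetch: Callable[[Optional[int]], List[dict]], limit: int) -> Iterator[dict]:
    """Yield up to `limit` items from a newest-first API that supports max_id."""
    max_id = None      # first request: no upper bound on the id
    yielded = 0
    while yielded < limit:
        page = fetch(max_id)
        if not page:   # an empty page means nothing older is left
            break
        for item in page:
            yield item
            yielded += 1
            if yielded >= limit:
                return
        # max_id is inclusive, so subtract 1 to avoid fetching the oldest item twice
        max_id = min(item["id"] for item in page) - 1


# Demo with a fake endpoint holding ids 250..1, served 100 at a time (newest first)
def fake_fetch(max_id, page_size=100):
    ids = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:page_size]]

print(len(list(paginate_by_max_id(fake_fetch, limit=1000))))  # → 250
```

With Tweepy itself the whole loop collapses to one call, for example `tweepy.Cursor(api.search_tweets, "mental health", lang="en", count=100).items(500)`, which yields up to 500 tweets.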

### Collecting a large number of Tweets

Pass `wait_on_rate_limit=True` when initializing the API. This tells Tweepy to sleep until the Twitter rate-limit window resets, instead of raising an error when the limit is hit:

`api = tweepy.API(auth, wait_on_rate_limit=True)`
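The comments in the cell below estimate about 18,000 tweets every 15 minutes. At 100 tweets per request, that is 180 search requests per 15-minute window, which is also where the cell's 5-second pause between requests comes from (900 s / 180 requests = 5 s). A small sketch of that arithmetic; the defaults are assumptions taken from those comments, not guaranteed limits, since Twitter's limits depend on the endpoint and access level.

```python
import math

def collection_plan(n_tweets, per_request=100, requests_per_window=180, window_minutes=15):
    """Estimate request count, pacing, and duration for a search-based collection.

    Defaults are assumptions from the notebook's comments
    (about 18,000 tweets per 15 minutes at 100 tweets per request).
    """
    n_requests = math.ceil(n_tweets / per_request)
    # Spread requests evenly so one window's quota lasts the whole window
    pause_seconds = window_minutes * 60 / requests_per_window
    # Approximate wall-clock time when requests are paced at that interval
    minutes = n_requests * pause_seconds / 60
    return n_requests, pause_seconds, minutes

print(collection_plan(18000 * 2))  # → (360, 5.0, 30.0)
```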

In [7]:
# NOTE: THE REDUNDANT IMPORTS AND FUNCTION DEFINITIONS ARE INTENTIONAL, TO MAKE THE CODE CELLS SELF-CONTAINED 
import json
import os
import time
from datetime import datetime

import tweepy

# PRINT TWEEPY VERSION
print("tweepy version =",tweepy.__version__)

#----------------------
# READ API KEY FILE
#----------------------
with open("api-keys.json") as f:
    keys = json.load(f)

# LOAD KEYS INTO API
consumer_key=keys["consumer_key"]
consumer_secret=keys["consumer_secret"]
access_token=keys["access_token"]
access_token_secret=keys["access_token_secret"]
bearer_token=keys["bearer_token"]

#----------------------
# DEFINE USEFUL FUNCTIONS
#----------------------

# DEFINE PRETTY PRINT JSON FUNCTION
def pretty_print_json(input):
    print(json.dumps(input, indent=4))

# DEFINE FUNCTION TO SAVE TWEEPY SEARCH RESULTS
#   searches = list of tweepy search results (one entry per api.search_tweets call)
#   TODO: ADD "full and sparse" mode
#          full = save all tweet data (100 tweets ~ 1 MB  --> 100,000 ~ 1 GB)
#          sparse = only save most important info
def save_search_tweets_results(searches,info_str="",output_name="tweet-search.json"):
    if not isinstance(searches, list):
        raise RuntimeError("ERROR: Incorrect datatype")

    # COMBINE THE JSON FOR EVERY TWEET INTO ONE BIG JSON OBJECT CALLED "out"
    # (KEYS "0", "1", ... HOLD THE INDIVIDUAL TWEETS)
    out={"search_info": info_str}

    tweet_ids=set()  # A SET MAKES THE DUPLICATE CHECK FAST
    k=0  # RUNNING TWEET COUNTER
    for search in searches:      # LOOP OVER SEARCHES
        for tweet in search:     # LOOP OVER TWEETS IN SEARCH
            out[str(k)]=tweet._json
            tweet_id=tweet._json["id_str"]
            # CHECK FOR REDUNDANT TWEETS
            if tweet_id in tweet_ids:
                print("WARNING: REPEATED TWEETS IN SAVED FILE; ID = ",tweet_id)
            tweet_ids.add(tweet_id)
            k+=1

    # WRITE FILE ("w" MODE OVERWRITES ANY EXISTING FILE, SO NO NEED TO DELETE IT FIRST)
    with open(output_name, 'w') as f:
        json.dump(out, f)

#----------------------
# SET UP CONNECTION
#----------------------
#   Use special options when initializing the API. These tell
#   it to wait while the Twitter time-limit windows elapse
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True) 

#----------------------
# RUN SEARCH 
#----------------------

#SEARCH PARAM
query="mental"

# NUMBER OF TWEETS TO SEARCH 
number_of_tweets=100
# number_of_tweets=18000*2 
# ideally use multiples of 100 for number_of_tweets
# should be able to collect 18000 tweets every 15 minutes
start_time = time.time()
max_loop_time_hrs=5

# THIS WILL KEEP DOING SEARCHES FURTHER AND FURTHER BACK IN TIME
# USING THE MAX_ID TO THE TIMELINE 
num_tweets_collected=0
searches=[]
k=0
#KEEP SEARCHING UNTIL DESIRED NUMBER OF TWEETS IS COLLECTED OR THE TIME LIMIT IS REACHED
while num_tweets_collected<number_of_tweets and (time.time()-start_time)/60./60<max_loop_time_hrs: 
    try: 
        #FIRST SEARCH
        if len(searches)==0:
            search_results = api.search_tweets(query, lang="en", count=100)
        #ADDITIONAL SEARCHES (ONLY TWEETS OLDER THAN THE OLDEST ONE SEEN SO FAR)
        else:
            search_results = api.search_tweets(query, lang="en", count=100,max_id=max_id_next)

        #STOP WHEN NO OLDER TWEETS ARE LEFT (search_results[-1] BELOW WOULD FAIL ON AN EMPTY PAGE)
        if len(search_results)==0:
            print("NO MORE TWEETS FOUND FOR QUERY =",query)
            break

        #UPDATE PARAMETERS
        num_tweets_collected+=len(search_results)
        max_id_next=int(search_results[-1]._json["id_str"])-1

        #SAVE SEARCH RESULTS
        searches.append(search_results)

        #SAVE TEMPORARY CHECKPOINTS (DONT DO TOO OFTEN .. SLOWS CODE DOWN)
        if(k%10==0):
            print("SEARCH-"+str(k)+" COMPLETED;  TWEETS_COLLECTED=",num_tweets_collected,"; TIME (s) = ",time.time() - start_time)
        if(k%25==0):
            save_search_tweets_results(searches,output_name="tmp-snapshot.json")
            
        k+=1
    except Exception as e:
        print("WARNING: twitter search failed:", e)

    #SLEEP 5 SECONDS BEFORE NEXT REQUEST 
    if(number_of_tweets>18000):
        time.sleep(5)
    else:
        time.sleep(0.2)
        

# REPORT BASIC SEARCH INFO
print(num_tweets_collected,len(searches))
print("search time (s) =", (time.time() - start_time)) #/60.)

#TIMESTAMP SEARCH 
now = datetime.now()
dt_string = now.strftime("%Y-%m-%d-H%H-M%M-S%S")

#----------------------
# SAVE RESULTS
#----------------------
info_str="query = "+query+"; number_of_tweets = "+str(number_of_tweets)+"; date = "+str(dt_string)
out_name=str(dt_string)+"-twitter-search.json"
save_search_tweets_results(searches,info_str=info_str,output_name=out_name)

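#CLEAN-UP TEMP FILES (THE SNAPSHOT ONLY EXISTS IF AT LEAST ONE SEARCH SUCCEEDED)
if os.path.exists("tmp-snapshot.json"):
    os.remove("tmp-snapshot.json")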
# import glob
# list_to_delete=glob.glob("./*-tmp-snapshot.json")
# for file in list_to_delete:
#     os.remove(file)

tweepy version = 4.12.1
SEARCH-0 COMPLETED;  TWEETS_COLLECTED= 100 ; TIME (s) =  0.5594151020050049
100 1
search time (s) = 0.7864289283752441
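`save_search_tweets_results` stores everything in one JSON object: a `search_info` string plus one entry per tweet under the keys `"0"`, `"1"`, and so on. A minimal sketch of reading such a file back into an ordered list of tweet dictionaries, assuming exactly that layout; the function name is hypothetical.

```python
import json

def load_saved_tweets(path):
    """Hypothetical helper: read a file written by save_search_tweets_results."""
    with open(path) as f:
        data = json.load(f)
    info = data.pop("search_info", "")
    # Tweet keys are stringified counters; sort numerically so "10" comes after "9"
    tweets = [data[key] for key in sorted(data, key=int)]
    return info, tweets
```

Usage: `info, tweets = load_saved_tweets(out_name)`, after which `tweets[0]["text"]` is the text of the first saved tweet.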


### Tweepy user objects

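The User object in Tweepy contains information about a Twitter account. 

Commonly used attributes include id_str, name, screen_name, description, created_at, followers_count, friends_count, and statuses_count; the cell below prints the full set for one account.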

For more see: https://www.geeksforgeeks.org/python-user-object-in-tweepy/


In [8]:
# returns a user object 
user_info=api.get_user(screen_name="POTUS")

# print info about user
print(json.dumps(user_info._json, indent=4, sort_keys=True))


{
    "contributors_enabled": false,
    "created_at": "Wed Jan 13 00:37:08 +0000 2021",
    "default_profile": true,
    "default_profile_image": false,
    "description": "46th President of the United States, husband to @FLOTUS, proud dad & pop. Tweets may be archived: https://t.co/HDhBZBkKpU\nText me: (302) 404-0880",
    "entities": {
        "description": {
            "urls": [
                {
                    "display_url": "whitehouse.gov/privacy",
                    "expanded_url": "http://whitehouse.gov/privacy",
                    "indices": [
                        98,
                        121
                    ],
                    "url": "https://t.co/HDhBZBkKpU"
                }
            ]
        },
        "url": {
            "urls": [
                {
                    "display_url": "WhiteHouse.gov",
                    "expanded_url": "http://WhiteHouse.gov",
                    "indices": [
                        0,
                        2
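Because `user_info._json` is a plain dictionary, it is easy to keep only the fields you need, for example before saving many user records. A minimal sketch; the field list is an illustrative choice, and `.get` returns `None` for any field a particular response lacks. The function and constant names are hypothetical.

```python
# Illustrative choice of v1.1 user fields to keep
USER_FIELDS = ("id_str", "screen_name", "name", "description", "created_at",
               "followers_count", "friends_count", "statuses_count")

def summarize_user(user_json, fields=USER_FIELDS):
    """Hypothetical helper: keep a chosen subset of fields from a v1.1 user dictionary."""
    return {field: user_json.get(field) for field in fields}
```

Usage: `summarize_user(user_info._json)` returns a small dictionary that can be stored or turned into a table row.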

### XTRA CODE

In [11]:
# #CHECK IF REPEATED SEARCH WITH SAME QUERY RESULT IN SAME OUTPUT 

# def quick_print(search_results):
#     for i in range(0,number_of_tweets):
#         try:
#             print("-------- tweet = ",i," -------")
#             print("text = ",search_results[i]._json["text"])
#             print("url = https://twitter.com/i/web/status/"+search_results[i]._json["id_str"])
#         except:
#             print("ERROR")

# #SEARCH-1
# number_of_tweets=100
# search_results = api.search_tweets("texas OR maryland", lang="en", count=number_of_tweets)
# quick_print(search_results)

# #SEARCH-2
# number_of_tweets=100
# search_results = api.search_tweets("texas OR maryland", lang="en", count=number_of_tweets)
# quick_print(search_results)
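Printing both result sets and comparing them by eye is slow. A minimal sketch of a more direct check for the commented-out experiment above: compare the tweet IDs returned by two searches and report how much they overlap. The function name is hypothetical; in the notebook the ID lists would come from something like `[t.id_str for t in search_results]` for each search.

```python
def compare_searches(ids_a, ids_b):
    """Hypothetical helper: summarize how two lists of tweet IDs overlap."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_first": len(a - b),
        "only_second": len(b - a),
        # Jaccard similarity: 1.0 means the two searches returned identical ID sets
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

print(compare_searches(["1", "2", "3"], ["2", "3", "4"]))
```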