## Pulling Descriptions with Tweepy

**By:** _Jordan McNea_

In [1]:
import datetime
import tweepy

# I've put my API keys in a .py file called API_keys.py
from API_keys import key, secret_key, access, secret_access

In [2]:
# Authenticate the Tweepy API
auth = tweepy.OAuthHandler(key,secret_key)
auth.set_access_token(access, secret_access)
api = tweepy.API(auth,wait_on_rate_limit=True)

## Grab follower IDs

I had the WNBA Finals on in the background while creating this notebook, so I'll collect followers of the two finalists, the Seattle Storm and the Las Vegas Aces. Twitter rate-limits the follower-ID endpoint that Tweepy calls: it allows 15 requests per 15-minute window, and each request returns up to 5,000 IDs. Tweepy makes those requests as fast as it can and then sleeps until the window resets, rather than spreading them evenly over the 15 minutes. Before we start grabbing follower IDs, let's check roughly how long it will take. To do this, we'll read the followers_count attribute of each user.

In [3]:
# I'm putting the handles in a list to iterate through below
team_handles = ['seattlestorm', 'LVAces']


# This will iterate through each Twitter handle that we're collecting from
for screen_name in team_handles:
    
    # Ask Tweepy for the user object of the handle we're collecting from
    user = api.get_user(screen_name=screen_name)
    # Pull out the one field we need here: the number of followers
    followers_count = user.followers_count

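    # Let's see roughly how long it will take to grab all the follower IDs,
    # at about one page of 5,000 IDs per minute.
    # divmod splits the total minutes into whole hours plus the leftover minutes.
    hours, minutes = divmod(followers_count / 5000, 60)
    print(f'''
    @{screen_name} has {followers_count} followers. 
    That will take roughly {hours:.0f} hours and {minutes:.2f} minutes
    ''')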
    


    @seattlestorm has 71193 followers. 
    That will take roughly 0 hours and 14.24 minutes
    

    @LVAces has 42427 followers. 
    That will take roughly 0 hours and 8.49 minutes
    


It looks like there should only be one fifteen-minute break. It'll grab all of the Storm's followers, then some of the Aces' before sleeping for fifteen minutes. Let's run it and see how long it actually takes.
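
The number of pauses can also be worked out ahead of time. This is a minimal sketch, assuming the follower-ID endpoint allows 15 requests per 15-minute window and returns up to 5,000 IDs per request (the figures behind the estimate above). The function name estimate_collection and the constants are my own, not part of Tweepy.

```python
import math

IDS_PER_PAGE = 5000        # assumption: maximum IDs returned per request
REQUESTS_PER_WINDOW = 15   # assumption: requests allowed per rate-limit window
WINDOW_MINUTES = 15        # assumption: length of one rate-limit window


def estimate_collection(follower_counts):
    """Return (total_requests, pauses, minimum_wait_minutes) for the given accounts."""
    # Pages are fetched per account, so round up for each account before summing.
    total_requests = sum(
        max(1, math.ceil(count / IDS_PER_PAGE)) for count in follower_counts
    )
    # A pause happens each time a full window is used up and requests remain.
    pauses = max(0, math.ceil(total_requests / REQUESTS_PER_WINDOW) - 1)
    return total_requests, pauses, pauses * WINDOW_MINUTES


requests_needed, pauses, wait_minutes = estimate_collection([71193, 42427])
print(f"{requests_needed} requests, {pauses} pause(s), at least {wait_minutes} minutes of waiting")
```

With the two follower counts printed above, this gives 24 requests and a single pause, which matches the expectation of one fifteen-minute break.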

In [4]:
# This creates a dictionary containing a list for each Twitter handle we'll be grabbing follower IDs from
id_dict = {'seattlestorm' : [],
           'LVAces' : []}

# Grabs the time when we start making requests to the API
start_time = datetime.datetime.now()

# .keys() allows us to iterate through each key in the dictionary
for handle in id_dict.keys():
    
    # Each page holds up to 5,000 IDs. Both the Storm and the Aces have far more followers than that,
    # so we must iterate through the pages in order to get every follower ID.
    # followers_ids is the API method that returns them.
    for page in tweepy.Cursor(api.followers_ids,
                              # This is how we will get around the issue of not being able to grab all ids at once
                              # Once the rate limit is hit, we will be notified that we must wait 15 mins (900 secs)
                              wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True,
                              screen_name=handle).pages():

        # The page variable comes back as a list, so we have to use .extend rather than .append
        id_dict[handle].extend(page)
        

# Let's see how long it took to grab all follower IDs
end_time = datetime.datetime.now()
elapsed_time = end_time - start_time
print(elapsed_time)

Rate limit reached. Sleeping for: 890


0:15:04.720280
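
Since collecting the IDs costs a rate-limited wait of about fifteen minutes, it can be worth saving them to disk before moving on. The following is a minimal sketch using the standard library json module. The helper names save_ids and load_ids and the file name follower_ids.json are placeholders of mine, and the small dictionary stands in for the real id_dict.

```python
import json
import tempfile
from pathlib import Path


def save_ids(id_dict, path):
    """Write the {handle: [ids]} dictionary to a JSON file."""
    Path(path).write_text(json.dumps(id_dict), encoding="utf-8")


def load_ids(path):
    """Read the dictionary back; the IDs come back as Python ints."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Small stand-in for the real id_dict
example = {"seattlestorm": [84764937, 459860328, 1120330572059033605]}

# A temporary directory keeps this demo from leaving files behind;
# in the notebook you would pass an ordinary path instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = Path(tmp_dir) / "follower_ids.json"
    save_ids(example, cache_path)
    restored = load_ids(cache_path)

print(restored == example)  # True
```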


Let's look at some of the IDs we gathered.

In [5]:
id_dict['seattlestorm'][:10]

[84764937,
 459860328,
 1120330572059033605,
 1621249062,
 1189031172120240129,
 1286936349862494209,
 437252213,
 2310843745,
 1162598131135057920,
 628702031]

You'll notice they are all numbers. This is because IDs are different from screen names. To see the Twitter handles behind them, we have to look up each user and read the screen_name attribute.

In [6]:
user_ids = id_dict['seattlestorm'][:10]

for user_id in user_ids:
    
    # Look up the full user object for this ID, then print its handle
    user = api.get_user(user_id=user_id)
    print(user.screen_name)

paulfingmurphy
JulieMendoza206
BrenoGa01964440
nickbehnen19
Oswaldo10808143
AbelaBelay
rreesejr_1
ethanbmatt
Omar5paredes
kmzyv
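
Looking up one user per request is fine for ten IDs, but it spends a request on every follower. Twitter's bulk user lookup, which Tweepy exposes as api.lookup_users, accepts up to 100 IDs per request. The keyword argument it expects differs between Tweepy versions, so the call itself is left out here. This sketch only shows how the IDs could be split into batches of that size, and the chunked helper is my own.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


follower_ids = list(range(250))  # stand-in for id_dict['seattlestorm']
batches = list(chunked(follower_ids))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch could then be passed to a single bulk lookup call instead of 100 individual ones.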


## Grab descriptions based on the followers IDs

That looks much better. We can get all sorts of information from an ID. Screen names alone don't tell us much, though, so let's grab each follower's screen name and description and write them to a text file, one file per team account.

In [7]:
for team in id_dict.keys():
    
    # Descriptions with emoji or non-Roman letters can cause trouble. Encoding your .txt file in utf-8 will help
    with open(f'{team}_followers.txt','w', encoding='utf-8') as wf:
        wf.write('screen_name\tdescription\n')

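        for idx, user_id in enumerate(id_dict[team]):
            
            # Not every lookup succeeds (an account may have been suspended or deleted, for example).
            # Putting the call in a try/except statement lets us skip those and keep going.
            try:
                user = api.get_user(user_id=user_id)
                # Tabs and newlines inside a description would break the tab-separated layout
                description = str(user.description).replace('\t',' ').replace('\n',' ')
                wf.write(user.screen_name + '\t' + description + '\n')
                
            # TweepError is the Tweepy 3.x exception class (renamed TweepyException in 4.x)
            except tweepy.TweepError:
                continue
                
            # Stop after the first 101 followers (idx 0 through 100) to keep the example short
            if idx == 100:
                break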
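
The cleanup step above can be pulled into a small helper, and the resulting file can be read back with the standard library csv module. This is a sketch under my own assumptions: the helper names are hypothetical, the two profiles are made up for illustration, and an in-memory buffer stands in for the .txt file. Note that this version collapses every run of whitespace into a single space, which is slightly stricter than the two replace calls above.

```python
import csv
import io


def clean_field(value):
    """Collapse tabs and line breaks so the value fits in one tab-separated cell."""
    # str.split() with no arguments splits on any run of whitespace,
    # including tabs and newlines; None becomes an empty string.
    return " ".join(str(value or "").split())


def write_rows(rows, handle):
    """Write (screen_name, description) pairs as tab-separated lines."""
    handle.write("screen_name\tdescription\n")
    for screen_name, description in rows:
        handle.write(f"{clean_field(screen_name)}\t{clean_field(description)}\n")


# Hypothetical profiles, made up for illustration
rows = [("example_fan", "Storm fan\tsince 2004\nSeattle"), ("no_bio", None)]
buffer = io.StringIO()
write_rows(rows, buffer)

# Read the text back; QUOTE_NONE keeps quotation marks in bios literal
buffer.seek(0)
records = list(csv.DictReader(buffer, delimiter="\t", quoting=csv.QUOTE_NONE))
for record in records:
    print(record["screen_name"], "|", record["description"])
```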
                

## Grabbing Tweets by search terms

Tweepy also lets users grab tweets based on search terms. October 10th was World Mental Health Day, so let's look at tweets containing its official hashtag. Twitter search accepts the standard search operators (<a href="https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/overview/standard-operators">read more here</a>). We only want tweets posted on World Mental Health Day, hence the since and until operators, and -filter:retweets excludes retweets.
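
A query string like this can be assembled from a date rather than typed by hand. The sketch below assumes that since: is an inclusive lower bound and until: is an exclusive upper bound, as I understand the standard operators, so a single day runs from that date to the next one. The function name single_day_query is my own.

```python
from datetime import date, timedelta


def single_day_query(term, day, exclude_retweets=True):
    """Build a standard-search query string covering one calendar day."""
    parts = [
        term,
        f"since:{day.isoformat()}",                        # assumed inclusive
        f"until:{(day + timedelta(days=1)).isoformat()}",  # assumed exclusive
    ]
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


query = single_day_query("#WorldMentalHealthDay", date(2020, 10, 10))
print(query)
# #WorldMentalHealthDay since:2020-10-10 until:2020-10-11 -filter:retweets
```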


In [8]:
# since: is inclusive and until: is exclusive, so this covers October 10th (UTC) only
search_words = '#WorldMentalHealthDay since:2020-10-10 until:2020-10-11 -filter:retweets'

# Notice the differences from the follower lookup: we pass api.search to the Cursor
# and iterate over individual tweets with .items() rather than over pages.
for idx, item in enumerate(tweepy.Cursor(api.search,
                   # By default (compatibility mode), tweet text is truncated to 140 characters.
                   # tweet_mode='extended' returns the whole tweet in full_text.
                   tweet_mode='extended',
                   q=search_words,
                   lang='en').items()):
    
    # There are all sorts of information you can get from tweets
    # See the Tweet object reference for the available attributes: https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/tweet-object
    print(item.user.screen_name)
    print(item.created_at)
    print(item.full_text)
    print('-'*40)
    
    if idx == 50:
        break
    

regillett
2020-10-10 23:59:58
It’s been quite a journey for my friend and co-host @JulieAEller and I, living with arthritis for many years. It can be hard. A very different experience being on the other side for this podcast &amp; we hope sharing our stories can help others. #WeLiveYes #WorldMentalHealthDay https://t.co/fjxbAJHghT
----------------------------------------
NymphInTheWood
2020-10-10 23:59:57
The sad coincidence of my depression hitting hard on #WorldMentalHealthDay I only just realized that was today
----------------------------------------
ShibbuKumar16
2020-10-10 23:59:56
#WorldMentalHealthDay
Holy Bible Genesis 1:27 God created mankind in his own image, in the image of God he created them; male and female he created them.
God is not formless. This a baseless theory.
Holy Bible proves that God is in human form.
https://t.co/VsRktkm7y0
----------------------------------------
zamud_doudd
2020-10-10 23:59:55
@WHO @UnitedGMH @WHOSEARO @DrTedros @momgerm @childreninwar @UNH

FairTooWell1
2020-10-10 23:59:23
#WorldMentalHealthDay not alot of people care about us who had to endure. im recovering well and staying away from those who just dont understand.
----------------------------------------
KSh42981671
2020-10-10 23:59:21
#WorldMentalHealthDay
Salvation can only be attained by taking refuge in true Spiritual Leader Saint Rampal Ji Maharaj. He is the one who provides the true way to worship Eternal God Kabir. https://t.co/nvI7sFsakk
----------------------------------------
TyHoward_Mmag
2020-10-10 23:59:20
You've got this! Be kind to your mind. Do not hesitate to ask for help. ~ Ty Howard #mentalhealthday #worldmentalhealthday #startups #Entrepreneurship #business #sales #workplace #employees #quotes #quote https://t.co/IE2QfuBsOH
----------------------------------------
cakenewsnext
2020-10-10 23:59:19
When you side with Abusers
🤐🤐🤐🤐🤐🤐🤐🤐🤐🤐🤐
#WorldMentalHealthDay
----------------------------------------
Eva1112222
2020-10-10 23:59:18
The definition of heal
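
The first tweet in the output above contains &amp;amp; where the author typed an ampersand, because the text arrives with HTML entities escaped. A minimal sketch of undoing that with the standard library html module, using an excerpt of that tweet:

```python
import html

# Excerpt from the first tweet printed above
raw_text = ("A very different experience being on the other side for this podcast "
            "&amp; we hope sharing our stories can help others.")

# html.unescape turns entities such as &amp;, &lt; and &gt; back into characters
clean_text = html.unescape(raw_text)
print(clean_text)
```

Applying this to item.full_text before printing or saving would give text closer to what the author wrote.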

It's also possible to use this search feature to grab the mentions of a Twitter account. A mention is any tweet that includes another user's handle.

In [9]:
search_words = '@GovernorBullock -filter:retweets'


tweets_all = tweepy.Cursor(api.search,
                   tweet_mode='extended',
                   q=search_words,
                   lang='en').items()

# Put the text, timestamp, and author of each tweet into a tuple, and collect those tuples in a list
tweets = [(tweet.full_text,tweet.created_at,tweet.user.screen_name) for tweet in tweets_all]
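
Plain tuples are compact, but the fields can only be reached by position. One option, sketched here with the standard library, is to wrap each tuple in a namedtuple so the fields have names, which makes counting and sorting easier to read. The Mention type and the variable names are my own, and the two sample rows are copied from this notebook's output; in practice you would use the full tweets list.

```python
import datetime
from collections import Counter, namedtuple

# Field order matches the tuples built above: text, timestamp, author
Mention = namedtuple("Mention", ["text", "created_at", "screen_name"])

# Two rows copied from this notebook's output, standing in for the full list
sample_tweets = [
    ("@GovernorBullock Happy Columbus Day.",
     datetime.datetime(2020, 10, 12, 17, 7), "nelas_27"),
    ("@GovernorBullock Do you ask that indigenous people also honor the "
     "contributions made by the people who are here now?",
     datetime.datetime(2020, 10, 12, 17, 3, 11), "Revlucduck"),
]

mentions = [Mention(*row) for row in sample_tweets]

# Count tweets per author and find the most recent mention
per_author = Counter(m.screen_name for m in mentions)
latest = max(mentions, key=lambda m: m.created_at)

print(per_author.most_common(3))
print(latest.screen_name, latest.created_at)
```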


In [10]:
tweets[:10]

[('@GovernorBullock Happy Columbus Day.',
  datetime.datetime(2020, 10, 12, 17, 7),
  'nelas_27'),
 ('@GovernorBullock Do you ask that indigenous people also honor the contributions made by the people who are here now?',
  datetime.datetime(2020, 10, 12, 17, 3, 11),
  'Revlucduck'),
 ("@waiakoa @MTGOP @GovernorBullock @SteveDaines Except before the Democrats didn't control both the Senate and the Presidency.",
  datetime.datetime(2020, 10, 12, 16, 51, 15),
  'NickAllevato'),
 ("@MTGOP @GovernorBullock The only person 'packing' the Supreme Court right now is Trump.",
  datetime.datetime(2020, 10, 12, 16, 50, 46),
  'Reilly2020'),
 ('@GovernorBullock Thank you Governor Steve Bullock. Today gets to honor our indigenous people that have made our culture richer. Thank you.',
  datetime.datetime(2020, 10, 12, 16, 49, 18),
  'always_ike'),
 ('@MTGOP @GovernorBullock And @SteveDaines made his position clear too--the senate shouldn\'t confirm a SCOTUS nominee before a new president is elected. 