# Demo 4: APIs and Functions II 

## 4.1 Twitter API


**4.1.1 Installing and importing new modules:** Before we can interact with the Twitter API, we need to install the `tweepy` module. We would usually install modules outside our Jupyter Notebooks using the command line. However, we can actually also interact with the command line from within our Notebooks using the `!` operator. Now, uncomment the cell below and run it.

In [2]:
# # BEFORE WE CAN USE THE TWEEPY LIBRARY, WE NEED TO INSTALL IT
# # THAT IS, UNCOMMENT AND EXECUTE THIS CELL ONCE
# # need to use sys.prefix to install from within jupyter notebook
# # following: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
import sys
! conda install --yes --prefix {sys.prefix} -c conda-forge tweepy

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\JeNiE\Anaconda3

  added / updated specs:
    - tweepy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blinker-1.4                |             py_1          13 KB  conda-forge
    conda-4.8.2                |           py37_0         3.0 MB  conda-forge
    oauthlib-3.0.1             |             py_0          82 KB  conda-forge
    pyjwt-1.7.1                |             py_0          17 KB  conda-forge
    requests-oauthlib-1.2.0    |             py_0          19 KB  conda-forge
    tweepy-3.8.0               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be



  current version: 4.7.12
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base -c defaults conda




After you have run the above cell, make sure everything worked as expected and that you have successfully installed `tweepy` by importing the module and checking its version using the commands in the cell below.

In [1]:
import tweepy
print(tweepy.__version__)

3.8.0
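
If the import fails, it can help to check whether Python can find the module at all before importing it. The sketch below is one way to do that with the standard library; the helper name `is_installed` is made up for this illustration and is not part of `tweepy` or Jupyter.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

# `json` ships with Python, so this prints True on any installation
print(is_installed("json"))
# prints True only if `tweepy` is installed in the current environment
print(is_installed("tweepy"))
```

If the second line prints `False`, the install cell above most likely targeted a different environment than the one running the notebook.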


**4.1.2 Loading credentials and authenticating to the API**: Now that we have installed and imported the `tweepy` module, we can use it to authenticate ourselves to the Twitter API. To do this, we first need to access our credentials from the file _AppCred.py_ we set up earlier in class. Running the cell below will load your Twitter developer credentials and make them available in this session of your Jupyter Notebook.

In [2]:
from AppCred import CONSUMER_KEY, CONSUMER_SECRET
from AppCred import ACCESS_TOKEN, ACCESS_TOKEN_SECRET
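
For reference, a file like _AppCred.py_ only needs to define the four names imported above. The sketch below shows one possible layout with placeholder values (the strings are made up; your own file contains the keys from your Twitter developer account), followed by a small, hypothetical sanity check.

```python
# Possible contents of AppCred.py; every value here is a placeholder
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"

# hypothetical sanity check: every credential should be a non-empty string
credentials = [CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET]
all_present = all(isinstance(value, str) and value.strip() for value in credentials)
print(all_present)
```

Keeping the keys in a separate file means the notebook itself can be shared without exposing them.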

Now we can start the authentication process to access the Twitter API by passing your consumer details to the `OAuthHandler` function from the `tweepy` module.

In [3]:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)

Next, we add our access details to the `auth` variable we just created.

In [4]:
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

Finally, we pass our `auth` variable to the `API` function provided in the `tweepy` module to generate a variable that allows us to interact with the Twitter API.

In [5]:
api = tweepy.API(auth)

**4.1.3 Interacting with the Twitter API:** Now that we have authenticated ourselves to the Twitter API, we can use it to post and delete tweets from our own account, favorite and retweet tweets from other accounts, and collect information from other public Twitter accounts.

**4.1.3.1 Tweeting:** Let's try posting a tweet with our well-known example using the `update_status` function.

In [10]:
api.update_status("Hej")

Status(_api=<tweepy.api.API object at 0x000001E324D4B1C8>, _json={'created_at': 'Fri Feb 21 11:54:05 +0000 2020', 'id': 1230822999776952320, 'id_str': '1230822999776952320', 'text': 'Hej', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="https://data.com" rel="nofollow">Digimeth</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1099070485722251264, 'id_str': '1099070485722251264', 'name': 'Jens Pedersen', 'screen_name': 'JensPed98928586', 'location': '', 'description': '', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 0, 'friends_count': 33, 'listed_count': 0, 'created_at': 'Fri Feb 22 22:16:38 +0000 2019', 'favourites_count': 0, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 1, 'lang': Non

You just posted your first tweet using Python, how exciting! Now, go to your Twitter profile and see if you can find the tweet by going to _twitter.com/YOUR_USERNAME_.

**4.1.3.2 Deleting:** In addition to posting on Twitter, we can also delete our own tweets. To do that we need to find the _tweet id_ of our post. See if you can find your tweet's id, then pass it to the function `destroy_status` below and see what happens when you execute the cell and return to your Twitter profile. 

In [11]:
api.destroy_status("1230822999776952320")

TweepError: [{'code': 144, 'message': 'No status found with that ID.'}]

**4.1.3.3 Reading:** For many research purposes, you might be more interested in collecting information such as tweets from Twitter than in posting your own. We can also do this in Python using the Twitter API. Let's start with a simple example: accessing the timeline (by default, the most recent tweets) of an account I created for our class.

In [6]:
example_timeline = api.user_timeline("vicariousveblen")

In [73]:
print(example_timeline)

[Status(_api=<tweepy.api.API object at 0x00000251131CF708>, _json={'created_at': 'Tue Feb 18 16:49:23 +0000 2020', 'id': 1229810150904680452, 'id_str': '1229810150904680452', 'text': 'For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/x2O7ALCem3', 'expanded_url': 'https://twitter.com/i/web/status/1229810150904680452', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'source': '<a href="https://google.com" rel="nofollow">DigitalMethods2020</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1229786706393620480, 'id_str': '1229786706393620480', 'name': 'vicariousveblen', 'screen_name': 'vicariousveblen', 'location': 'Cato, Wisconsin',

In [24]:
example_user = api.get_user("vicariousveblen")

TypeError: 'User' object is not iterable

This creates a variable of type `tweepy.models.ResultSet` which basically behaves like a list of tweets with the text and a lot of metadata. Knowing that it behaves like a list, how can we see how many tweets we collected?
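
Because a `ResultSet` behaves like a list, the usual list tools apply to it. The sketch below uses a plain Python list of stand-in objects (built with `SimpleNamespace`, purely for illustration, so it runs without API access) in place of the real `example_timeline`.

```python
from types import SimpleNamespace

# stand-in for `example_timeline`: a list of objects with a `text` attribute
fake_timeline = [
    SimpleNamespace(text="Hello World!"),
    SimpleNamespace(text="A second example tweet."),
]

# `len` counts the elements, just as it does for any list
print(len(fake_timeline))
# indexing works as well
print(fake_timeline[0].text)
```

With real data, the same two calls would be applied to `example_timeline` directly.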

In [151]:
api.rate_limit_status()

{'rate_limit_context': {'access_token': '1099070485722251264-6Eh24zShbJKUvJDKV8hVeqfM1xrgr7'},
 'resources': {'lists': {'/lists/list': {'limit': 15,
    'remaining': 15,
    'reset': 1582389397},
   '/lists/memberships': {'limit': 75, 'remaining': 75, 'reset': 1582389397},
   '/lists/subscribers/show': {'limit': 15,
    'remaining': 15,
    'reset': 1582389397},
   '/lists/members': {'limit': 900, 'remaining': 900, 'reset': 1582389397},
   '/lists/subscriptions': {'limit': 15, 'remaining': 15, 'reset': 1582389397},
   '/lists/show': {'limit': 75, 'remaining': 75, 'reset': 1582389397},
   '/lists/ownerships': {'limit': 15, 'remaining': 15, 'reset': 1582389397},
   '/lists/subscribers': {'limit': 180, 'remaining': 180, 'reset': 1582389397},
   '/lists/members/show': {'limit': 15, 'remaining': 15, 'reset': 1582389397},
   '/lists/statuses': {'limit': 900, 'remaining': 900, 'reset': 1582389397}},
  'application': {'/application/rate_limit_status': {'limit': 180,
    'remaining': 178,
    '

In [52]:
data10 = []

for tweet in example_timeline:
    data10.append((tweet.id_str, tweet.created_at, tweet.truncated, tweet.text))

In [60]:
for entry in data10:
    print(entry[0])
    print(entry[1])
    print(entry[2])
    print(entry[3])
    print()

1229810150904680452
2020-02-18 16:49:23
True
For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3

1229809646287933441
2020-02-18 16:47:23
False
As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.

1229809141423669250
2020-02-18 16:45:22
False
High-bred manners and ways of living are items of conformity to the norm of conspicuous leisure and conspicuous consumption.

1229801942915874816
2020-02-18 16:16:46
False
Hello World!



In [62]:
data11 = []

for tweet in example_timeline:
    data11.append((tweet.text))

In [74]:
print(data11)

['For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3', 'As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.', 'High-bred manners and ways of living are items of conformity to the norm of conspicuous leisure and conspicuous consumption.', 'Hello World!']


In [22]:
print(data10)

[('1229810150904680452', datetime.datetime(2020, 2, 18, 16, 49, 23), True, 'For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3'), ('1229809646287933441', datetime.datetime(2020, 2, 18, 16, 47, 23), False, 'As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.'), ('1229809141423669250', datetime.datetime(2020, 2, 18, 16, 45, 22), False, 'High-bred manners and ways of living are items of conformity to the norm of conspicuous leisure and conspicuous consumption.'), ('1229801942915874816', datetime.datetime(2020, 2, 18, 16, 16, 46), False, 'Hello World!')]
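
Indexing into each tuple with `entry[0]`, `entry[1]`, and so on works, but the positions are easy to mix up. One alternative, sketched below with two entries copied by hand from the output above, is to unpack each tuple into named variables in the loop header. The variable names are chosen for this illustration.

```python
import datetime

# two entries copied from `data10`: (id, created_at, truncated, text)
rows = [
    ("1229809646287933441", datetime.datetime(2020, 2, 18, 16, 47, 23), False,
     "As has already been indicated, the distinction between exploit and "
     "drudgery is an invidious distinction between employments."),
    ("1229801942915874816", datetime.datetime(2020, 2, 18, 16, 16, 46), False,
     "Hello World!"),
]

summaries = []
# unpack each tuple into four descriptive names instead of entry[0] ... entry[3]
for tweet_id, created_at, truncated, text in rows:
    # keep the timestamp, the id, and the first 20 characters of the text
    summaries.append(f"{created_at:%Y-%m-%d %H:%M} {tweet_id}: {text[:20]}")

for line in summaries:
    print(line)
```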


So we collected a set of tweets and now want to look at the content, that is, the texts of these tweets. Remembering that you can work with the `example_timeline` variable as you would with a list, and that each list element has an attribute called `text` holding the content of the tweet, how would you access the first tweet in `example_timeline`?

In [29]:
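# index the first element of the timeline and read its `text` attribute
print(example_timeline[0].text)

For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3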

In [34]:
import json 

status = example_timeline[0]

# pretty-print the raw JSON of the tweet; `_json` is already a plain dictionary,
# so it can be passed to `json.dumps` directly
print(json.dumps(status._json, indent=4, sort_keys=True))


In [95]:
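for entry in data10:
    # entry[3] holds the tweet text; truncated texts contain the ellipsis character "…"
    if "…" in entry[3]:
        print(entry)
    else:
        print("u")

('1229810150904680452', datetime.datetime(2020, 2, 18, 16, 49, 23), True, 'For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3')
u
u
u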


In [107]:
for tweet in data11:
    if "http" in tweet:
        print(tweet + "    This tweet is truncated")
    else:
        print(tweet + "    This tweet is not truncated")

For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3    This tweet is truncated
As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.    This tweet is not truncated
High-bred manners and ways of living are items of conformity to the norm of conspicuous leisure and conspicuous consumption.    This tweet is not truncated
Hello World!    This tweet is not truncated


Look at this output and compare it to the original tweet [here](https://t.co/x2O7ALCem3). What do you notice? What does that mean for working with the Twitter API in practice?

The Twitter API truncates tweets above a certain length, but it records which tweets were cut in a key called `truncated`. Can you write a loop to look at which of the tweets we collected were cut short?

In [72]:
data12 = []
for tweet in example_timeline:
    data12.append(tweet.truncated)
    
print(data12)


[True, False, False, False]
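
A list of `True`/`False` values shows how many tweets were cut, but not which ones. The sketch below collects the ids of the truncated tweets instead. It uses stand-in objects carrying the same `id_str` and `truncated` attributes as two of the tweets above, so it runs without API access; with real data you would loop over `example_timeline` instead.

```python
from types import SimpleNamespace

# stand-ins mirroring two of the tweets collected above
fake_timeline = [
    SimpleNamespace(id_str="1229810150904680452", truncated=True),
    SimpleNamespace(id_str="1229801942915874816", truncated=False),
]

# keep only the ids of tweets whose `truncated` flag is True
truncated_ids = [tweet.id_str for tweet in fake_timeline if tweet.truncated]
print(truncated_ids)
```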


In [155]:
data12.created_at

AttributeError: 'list' object has no attribute 'created_at'

If we did not have this information, we could use the tools that we have learned already to provide us with the same information. How would you write a loop that does this? _Hint:_ You will want to look at what distinguishes the `text` in truncated tweets from those in untruncated tweets.

In [None]:
for tweet in data11:
    if "http" in tweet:
        print(tweet + "    This tweet is truncated")
    else:
        print(tweet + "    This tweet is not truncated")

In addition to the tweet content, the API provides us with a host of valuable metadata about the tweets such as how often they were retweeted, favorited, and when they were posted. Looking just at the second tweet using `example_timeline[1]`, can you find the right keys to identify 1) when the tweet was posted, 2) how often it has been retweeted, and 3) how often it has been favorited?

In [120]:
data13 = []

for tweet in example_timeline:
    data13.append((tweet.created_at, tweet.retweet_count))
    
data13[3]

(datetime.datetime(2020, 2, 18, 16, 16, 46), 0)

In [138]:
example_timeline[1].created_at

datetime.datetime(2020, 2, 18, 16, 47, 23)

In [139]:
example_timeline[1].retweet_count

0

In [None]:
# 3) number of times the tweet has been favorited
example_timeline[1].favorite_count

Besides allowing us to collect the tweets posted by public Twitter accounts, the Twitter API also allows us to access information about the accounts themselves. The function to do this in `tweepy` is called `get_user`.

In [121]:
example_user = api.get_user("vicariousveblen")

Once we have collected the user profile, we can look at things like their location, their description or "about me" section, how often they have posted, and how many accounts they follow and are followed by. The variable type returned by `get_user` is slightly easier to navigate when accessing this information, since it is not nested in tweets.

In [140]:
# where does our example account live?
example_user.location

'Cato, Wisconsin'

In [141]:
# what does the description say
example_user.description

'Living my best Veblen life, vicariously.'

In [142]:
# does the account have any followers, and does it follow anyone ("friends")?
print("The account " + str(example_user.name) + " has " + str(example_user.followers_count) + " accounts following it.")
print("The account " + str(example_user.name) + " is following " + str(example_user.friends_count) + " accounts.")

The account vicariousveblen has 0 accounts following it.
The account vicariousveblen is following 0 accounts.
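
The string concatenation above needs a `str()` call around every number. An f-string is a more compact alternative. The sketch below uses a stand-in object with the attribute values shown above rather than a live API response, and the helper name `describe_user` is made up for this example.

```python
from types import SimpleNamespace

# stand-in for the object returned by `get_user`
fake_user = SimpleNamespace(name="vicariousveblen", followers_count=0, friends_count=0)

def describe_user(user):
    """Summarise follower and following counts in one sentence."""
    return (f"The account {user.name} has {user.followers_count} followers "
            f"and is following {user.friends_count} accounts.")

print(describe_user(fake_user))
```

Passing the real `example_user` to `describe_user` should work the same way, since the function only reads these three attributes.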


Finally, the Twitter API lets us search all of Twitter for certain keywords. We can access this using `tweepy`'s `search` function. For example, if we wanted to look for tweets using the hashtag '#DigitalMethods', we could search like this.

In [143]:
digimeth_tweets = api.search("#DigitalMethods")

Then, we could look at who tweets about '#DigitalMethods' by parsing the returned data like this.

In [144]:
for tweet in digimeth_tweets:
    print(tweet._json['user']['name'])

Nicolo' Dell'Unto
mLab Geography Bern
DOS Research Group
Feministische Geographie Bern
Christian Ziegler
Gale Australia / NZ
K. White
DH i Norden
Raquel Recuero
Earvin Charles Cabalquinto, PhD
@DHUppsala
Marieta Autor-Caparas
GaleEMEA
Chris Houghton
Bonny Doon
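
Printing names works for a handful of results; for larger searches it is often more useful to count how many tweets each account contributed. The sketch below does this with `collections.Counter` on stand-in dictionaries shaped like the `_json` attribute used above. The repeated name is invented for illustration and does not reflect the real search results.

```python
from collections import Counter

# stand-ins for `tweet._json`; only the keys used below are included
fake_results = [
    {"user": {"name": "DH i Norden"}},
    {"user": {"name": "GaleEMEA"}},
    {"user": {"name": "DH i Norden"}},
]

# count tweets per account name
tweets_per_user = Counter(result["user"]["name"] for result in fake_results)

# `most_common` orders accounts from most to fewest tweets
for name, count in tweets_per_user.most_common():
    print(name, count)
```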


**4.1.3.4 Rate limiting:** While we are only working with a few tweets or a limited number of accounts, we will not run into any problems. However, it is good general practice to always keep an eye on the rate limits set on us by the Twitter API. The `tweepy` module provides the function `rate_limit_status` to do so.

In [150]:
# check our current rate limit status
current_limits = api.rate_limit_status()

`rate_limit_status` returns a dictionary, which means it is easy to index once we know the keys. The keys that will likely be most important to us relate to searches and users.

In [146]:
# rate limit on user lookups via the `/users/lookup` endpoint within a 15 minute window
current_limits['resources']['users']['/users/lookup']

{'limit': 900, 'remaining': 900, 'reset': 1582381536}

In [147]:
# rate limit on the number of times we can call `user_timeline` within a 15 minute window
current_limits['resources']['statuses']['/statuses/user_timeline']

{'limit': 900, 'remaining': 900, 'reset': 1582381536}
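
The `reset` value is a Unix timestamp, that is, seconds since 1 January 1970 (UTC), marking when the current window ends. The sketch below converts the value from the output above into a readable time and reports how much of the quota has been used; the dictionary is copied by hand so the code runs without API access.

```python
from datetime import datetime, timezone

# copied from the output above
limit_info = {"limit": 900, "remaining": 900, "reset": 1582381536}

# convert the Unix timestamp into a timezone-aware datetime in UTC
reset_time = datetime.fromtimestamp(limit_info["reset"], tz=timezone.utc)
calls_used = limit_info["limit"] - limit_info["remaining"]

print(f"{calls_used} of {limit_info['limit']} calls used")
print("Window resets at", reset_time.isoformat())
```

With live data, `limit_info` would be one of the inner dictionaries of `current_limits`, for example `current_limits['resources']['statuses']['/statuses/user_timeline']`.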

## 4.2 Using functions to process data from the Twitter API 
Now that we know about some of the information that we can gather from Twitter and the structure in which it is returned to us, we can see even more use for defining our own functions. For example, we can combine all the tweets we retrieved for our example account and see what the account is tweeting about most.

In [148]:
# define the function `user_gist` taking one argument
def user_gist(user_timeline):
    
    # set up empty containers we will need throughout the loop
    word_freq = {}
    word_list = []
    gist = []
    
    # FIRST, loop through tweets in the timeline
    for tweet in user_timeline:
        # split up tweets into lists of words
        tweet_words = tweet.text.split()
        # and combine into one big list using `extend` command
        word_list.extend(tweet_words)
    
    # SECOND, loop through list of words in tweets
    for w in word_list:
        # add each unique word and its `count` to the dictionary `word_freq`
        if w not in word_freq:
            word_freq[w] = word_list.count(w)

    # THIRD, loop through the dictionary and add each (count, word) pair to the list
    for key in word_freq:
        gist.append((word_freq[key], key))

    # sort the list (by count, then alphabetically)
    gist.sort()
    # reverse the sort so it runs from largest to smallest
    gist.reverse()

    # return the sorted list
    return gist
        

In [149]:
user_gist(example_timeline)

[(6, 'the'),
 (6, 'of'),
 (3, 'and'),
 (2, 'to'),
 (2, 'is'),
 (2, 'distinction'),
 (2, 'conspicuous'),
 (2, 'between'),
 (1, 'ways'),
 (1, 'vicarious'),
 (1, 'rep…'),
 (1, 'pecuniary'),
 (1, 'not'),
 (1, 'norm'),
 (1, 'manners'),
 (1, 'living'),
 (1, 'life'),
 (1, 'leisure'),
 (1, 'items'),
 (1, 'invidious'),
 (1, 'indicated,'),
 (1, 'https://t.co/x2O7ALCem3'),
 (1, 'has'),
 (1, 'fullness'),
 (1, 'exploit'),
 (1, 'enhance,'),
 (1, 'end'),
 (1, 'employments.'),
 (1, 'drudgery'),
 (1, 'consumption.'),
 (1, 'consumption'),
 (1, 'consumer,'),
 (1, 'conformity'),
 (1, 'but'),
 (1, 'been'),
 (1, 'are'),
 (1, 'an'),
 (1, 'already'),
 (1, 'World!'),
 (1, 'High-bred'),
 (1, 'Hello'),
 (1, 'For'),
 (1, 'As')]
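
The output shows that `consumption` and `consumption.` are counted separately, because splitting on whitespace leaves punctuation attached to words. One possible refinement, sketched below, lowercases each word and strips surrounding punctuation before counting with `collections.Counter`. The function name `word_counts` is made up here, and it takes a list of plain strings rather than a timeline so that it runs without API access.

```python
from collections import Counter
import string

def word_counts(texts):
    """Count words across texts, ignoring case and surrounding punctuation."""
    counts = Counter()
    for text in texts:
        for word in text.split():
            # strip ASCII punctuation plus the ellipsis character seen in truncated tweets
            cleaned = word.strip(string.punctuation + "…").lower()
            if cleaned:
                counts[cleaned] += 1
    return counts

sample_texts = [
    "conspicuous leisure and conspicuous consumption.",
    "Hello World!",
]
print(word_counts(sample_texts).most_common(2))
```

To apply it to collected tweets, the texts can be extracted first, for example with `word_counts([tweet.text for tweet in example_timeline])`.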