# Introduction to Data Science – Lecture 14 – APIs
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

In this lecture we will explore the Twitter API.
  
 * [Twitter](https://dev.twitter.com/rest/public)
 

## Libraries and Authentication

While we now have the skills to directly talk to an API, it's sometimes a little tedious. Popular APIs often have existing Python libraries that wrap around them. [Here](https://github.com/realpython/list-of-python-api-wrappers) is a long list of wrappers! 

Now we'll explore the Twitter API using the [twython library](https://github.com/ryanmcgrath/twython). Check out the [documentation](https://twython.readthedocs.io/en/latest/).

Unfortunately, most professional APIs require you to authenticate and limit what you can do – mostly they limit how much data you can retrieve in a given time window. To run the following code, you'll have to put in your own credentials (sorry – I can't share mine).

Install twython:
`pip install twython`

* First, you need to have a developer account https://developer.twitter.com/en
* Second, create an app 
* Third, save your API key and API key secret in a file `credentials.py` in the format:
```python
API_KEY = "KEY"
API_KEY_SECRET = "KEY"
```
* You will need at least an Elevated access level (free) for the following code to work
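
If you would rather not keep secrets in a file inside your project folder, one alternative is to read them from environment variables. The following is a minimal sketch and not part of the course setup: the variable names `TWITTER_API_KEY` and `TWITTER_API_KEY_SECRET` and the helper `load_credentials` are hypothetical choices made for illustration.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
REQUIRED = ("TWITTER_API_KEY", "TWITTER_API_KEY_SECRET")

def load_credentials(env=None):
    """Return (api_key, api_key_secret) read from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return env["TWITTER_API_KEY"], env["TWITTER_API_KEY_SECRET"]

# Demonstration with a plain dict standing in for os.environ.
demo_env = {"TWITTER_API_KEY": "KEY", "TWITTER_API_KEY_SECRET": "SECRET"}
API_KEY, API_KEY_SECRET = load_credentials(demo_env)
```

Calling `load_credentials()` without arguments reads from the real environment and fails early with a readable message if a variable is not set.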

But before we get started, let's check out what [a tweet looks like](https://dev.twitter.com/overview/api/tweets):

```python
{'created_at': 'Mon Mar 01 19:58:00 +0000 2021',
  'id': 1366477842213707784,
  'id_str': '1366477842213707784',
  'text': 'This afternoon, I’ll be meeting virtually with Mexican President Andrés Manuel López Obrador. The U.S.-Mexico relat… https://t.co/4M2OgsL7uX',
  'truncated': True,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [],
   'urls': [{'url': 'https://t.co/4M2OgsL7uX',
     'expanded_url': 'https://twitter.com/i/web/status/1366477842213707784',
     'display_url': 'twitter.com/i/web/status/1…',
     'indices': [117, 140]}]},
  'source': '<a href="https://www.sprinklr.com" rel="nofollow">The White House</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 1349149096909668363,
   'id_str': '1349149096909668363',
   'name': 'President Biden',
   'screen_name': 'POTUS',
   'location': '',
   'description': '46th President of the United States, husband to @FLOTUS, proud dad & pop. Tweets may be archived: https://t.co/IURuMIrzxb',
   'url': 'https://t.co/IxLjEB2zlE',
   'entities': {'url': {'urls': [{'url': 'https://t.co/IxLjEB2zlE',
       'expanded_url': 'http://WhiteHouse.gov',
       'display_url': 'WhiteHouse.gov',
       'indices': [0, 23]}]},
    'description': {'urls': [{'url': 'https://t.co/IURuMIrzxb',
       'expanded_url': 'http://whitehouse.gov/privacy',
       'display_url': 'whitehouse.gov/privacy',
       'indices': [98, 121]}]}},
   'protected': False,
   'followers_count': 8329657,
   'friends_count': 12,
   'listed_count': 9248,
   'created_at': 'Wed Jan 13 00:37:08 +0000 2021',
   'favourites_count': 0,
   'utc_offset': None,
   'time_zone': None,
   'geo_enabled': False,
   'verified': True,
   'statuses_count': 238,
   'lang': None,
   'contributors_enabled': False,
   'is_translator': False,
   'is_translation_enabled': False,
   'profile_background_color': 'F5F8FA',
   'profile_background_image_url': None,
   'profile_background_image_url_https': None,
   'profile_background_tile': False,
   'profile_image_url': 'http://pbs.twimg.com/profile_images/1349837426626330628/CRMNXzQJ_normal.jpg',
   'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1349837426626330628/CRMNXzQJ_normal.jpg',
   'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1349149096909668363/1614313035',
   'profile_link_color': '1DA1F2',
   'profile_sidebar_border_color': 'C0DEED',
   'profile_sidebar_fill_color': 'DDEEF6',
   'profile_text_color': '333333',
   'profile_use_background_image': True,
   'has_extended_profile': True,
   'default_profile': True,
   'default_profile_image': False,
   'following': None,
   'follow_request_sent': None,
   'notifications': None,
   'translator_type': 'none'},
  'geo': None,
  'coordinates': None,
  'place': None,
  'contributors': None,
  'is_quote_status': False,
  'retweet_count': 2155,
  'favorite_count': 19136,
  'favorited': False,
  'retweeted': False,
  'lang': 'en'}
```
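
To make this structure concrete, here is a small sketch that pulls a few fields out of a tweet. The `tweet` dictionary is a trimmed copy of the example above, and the helper name `summarize_tweet` is made up for illustration.

```python
from datetime import datetime

# A trimmed copy of the example tweet above (most fields omitted).
tweet = {
    "created_at": "Mon Mar 01 19:58:00 +0000 2021",
    "id": 1366477842213707784,
    "entities": {
        "hashtags": [],
        "urls": [{
            "url": "https://t.co/4M2OgsL7uX",
            "expanded_url": "https://twitter.com/i/web/status/1366477842213707784",
        }],
    },
    "user": {"screen_name": "POTUS", "followers_count": 8329657},
    "retweet_count": 2155,
    "favorite_count": 19136,
}

def summarize_tweet(tweet):
    """Pick out a few commonly used fields from a tweet dictionary."""
    # Timestamps look like 'Mon Mar 01 19:58:00 +0000 2021'.
    # %a and %b assume English day and month names (the default C locale).
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "author": tweet["user"]["screen_name"],   # nested under "user"
        "created": created,                       # timezone-aware datetime
        "links": [u["expanded_url"] for u in tweet["entities"]["urls"]],
        "favorites": tweet["favorite_count"],
        "retweets": tweet["retweet_count"],
    }

summary = summarize_tweet(tweet)
print(summary["author"], summary["created"].isoformat(), summary["favorites"])
```

Note that the author information lives in the nested `user` dictionary, and links are listed under `entities`, not in the top level of the tweet.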


In [None]:
from twython import Twython
# credentials.py is a local file holding your own API key and secret
# (see the setup steps above); you have to create it yourself.
from credentials import API_KEY, API_KEY_SECRET
twitter = Twython(API_KEY, API_KEY_SECRET)

Here, we have created the Twython client object and passed it our app credentials.

Now let's search for a hashtag:

In [None]:
tag = "#covid19"
result = twitter.search(q=tag, tweet_mode="extended")
result

In [None]:
tag = "#war"
result = twitter.search(q=tag, tweet_mode="extended")
result

The result is a dictionary whose `statuses` entry is a list of tweets. We can look at the text of a specific tweet:

In [None]:
result["statuses"][0]["full_text"]

Or print all the tweets:

In [None]:
for status in result["statuses"]:
    print(status["full_text"])
    print("----")
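
The `full_text` key is only present when the request was made with `tweet_mode="extended"`; otherwise tweets carry a `text` key, as in the example tweet shown earlier. As far as we know, retweets additionally keep the complete original tweet nested under `retweeted_status`, while their own text may be cut off. A small helper that tolerates all three cases might look like the sketch below; `get_full_text` is a hypothetical name, and the sample dictionaries are made up.

```python
def get_full_text(status):
    """Return the most complete text available for a tweet dictionary."""
    # For retweets, prefer the original tweet nested under "retweeted_status".
    source = status.get("retweeted_status", status)
    # Extended mode uses "full_text"; the default mode uses "text".
    return source.get("full_text") or source.get("text", "")

# Made-up examples covering the three cases.
extended = {"full_text": "A long tweet in extended mode"}
default_mode = {"text": "A tweet in default mode"}
retweet = {
    "full_text": "RT @someone: The original tw…",
    "retweeted_status": {"full_text": "The original tweet, untruncated"},
}

for status in (extended, default_mode, retweet):
    print(get_full_text(status))
    print("----")
```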

## Analysis of Twitter Popularity

Let's do a brief analysis of the Twitter popularity of two political figures: Joe Biden's official POTUS account and Alexandria Ocasio-Cortez. As you would expect, Biden's personal account has more Twitter followers (29 million) than AOC (12.6 million), though the POTUS account has only 8 million followers. We're using the POTUS account because Biden's personal account has mostly been used for retweeting POTUS since he took office. As an aside, Barack Obama has 130 million followers, and Justin Bieber has 114 million.

We can search for tweets based on usernames:

In [None]:
result = twitter.search(q="@joebiden",  tweet_mode="extended")
for status in result["statuses"]:
    print(status["full_text"])
    print("----")

This returns recent tweets that mention the username.

We can also explicitly get the tweets of a person. Let's download Biden's last 50 tweets. [Here](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) is the relevant API documentation, and [here](https://github.com/ryanmcgrath/twython/blob/master/twython/endpoints.py) are the definitions for twython.

Note that you get an error message if you try this with an account that is locked (like Trump's) or that doesn't exist.

In [None]:
#twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# The API only reaches back to a user's last 3200 tweets, with a max of 200 per request
biden_result = twitter.get_user_timeline(screen_name="potus", count=50)
biden_result
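
Since a single request returns at most 200 tweets, collecting more requires several requests. One common approach for this endpoint is to page backwards with the `max_id` parameter, which asks for tweets with an ID no greater than the given value. The sketch below shows the loop under stated assumptions: `fetch_timeline` and `fake_fetch_page` are hypothetical names, and the fake page function stands in for the real API call so that the block runs without credentials.

```python
def fetch_timeline(fetch_page, total, page_size=200):
    """Collect up to `total` tweets by paging backwards with max_id."""
    tweets = []
    max_id = None
    while len(tweets) < total:
        params = {"count": min(page_size, total - len(tweets))}
        if max_id is not None:
            params["max_id"] = max_id
        page = fetch_page(**params)
        if not page:
            break  # no older tweets available
        tweets.extend(page)
        # Next request: only tweets strictly older than the last one seen.
        max_id = page[-1]["id"] - 1
    return tweets

# Made-up timeline, newest first, standing in for the API.
FAKE_TWEETS = [{"id": i, "text": f"tweet {i}"} for i in range(500, 0, -1)]

def fake_fetch_page(count, max_id=None):
    """Mimic a timeline request: newest tweets with id <= max_id."""
    eligible = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return eligible[:count]

tweets = fetch_timeline(fake_fetch_page, total=450)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

With real credentials, `fetch_page` could be something like `lambda **params: twitter.get_user_timeline(screen_name="potus", **params)`; keep the rate limits in mind when requesting many pages.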

Let's collect the favorite and retweet counts of Biden's tweets.

In [None]:
biden_favorites = []
biden_retweets = []
biden_texts = []
for status in biden_result:
    print(status["text"])
    print(status["favorite_count"])
    print("----")
    biden_favorites.append(status["favorite_count"])
    biden_retweets.append(status["retweet_count"])
    biden_texts.append(status["text"])

Now let's do the same for Alexandria Ocasio-Cortez.

In [None]:
aoc_results = twitter.get_user_timeline(screen_name="aoc", count=50)
aoc_results

In [None]:
aoc_favorites = []
aoc_retweets = []
aoc_texts = []
for status in aoc_results:
    print(status["text"])
    print(status["favorite_count"])
    print("----")
    aoc_favorites.append(status["favorite_count"])
    aoc_retweets.append(status["retweet_count"])
    aoc_texts.append(status["text"])
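
The two loops above differ only in their variable names, so the collection step could be factored into a function. A sketch follows; `collect_stats` is a hypothetical name, and the two sample statuses are made up so that the block runs without credentials.

```python
def collect_stats(statuses, label):
    """Gather favorite counts, retweet counts and texts into labeled columns."""
    return {
        f"{label} Fav": [s["favorite_count"] for s in statuses],
        f"{label} RT": [s["retweet_count"] for s in statuses],
        f"{label} Text": [s["text"] for s in statuses],
    }

# Made-up statuses with only the fields the helper reads.
sample_statuses = [
    {"text": "first", "favorite_count": 10, "retweet_count": 2},
    {"text": "second", "favorite_count": 5, "retweet_count": 1},
]

columns = collect_stats(sample_statuses, "Sample")
print(columns["Sample Fav"])
print(columns["Sample RT"])
```

The returned dictionary maps column names to equal-length lists, which is a shape that `pd.DataFrame` accepts directly.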

And let's create DataFrames for both of them and explore their stats:

In [None]:
import pandas as pd 

biden_stats = pd.DataFrame({
        "Biden Fav":biden_favorites,
        "Biden RT":biden_retweets,
        "Biden Text":biden_texts
    })

aoc_stats = pd.DataFrame({
      "AOC Fav":aoc_favorites,
      "AOC RT":aoc_retweets, 
      "AOC Text":aoc_texts
        })

In [None]:
biden_stats.head()

In [None]:
biden_stats.describe()

We will plot the tweet data, but first we have to sort it so that the plot makes sense.

In [None]:
biden_stats = biden_stats.sort_values("Biden Fav", ascending=False)
biden_stats = biden_stats.reset_index(drop=True)
biden_stats.head(30)

In [None]:
biden_stats.tail(10)

In [None]:
aoc_stats = aoc_stats.sort_values("AOC Fav", ascending=False)
aoc_stats = aoc_stats.reset_index(drop=True)
aoc_stats.head(30)

In [None]:
combined = aoc_stats.copy()
combined["Biden Fav"] = biden_stats["Biden Fav"]
combined["Biden RT"] = biden_stats["Biden RT"]

In [None]:
combined.plot()
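
Because both DataFrames were sorted and re-indexed, row *i* of `combined` pairs the *i*-th most-favorited tweet of each account, so the x-axis of the plot is a popularity rank and not time. The same pairing can be sketched with plain lists. The numbers below are made up, `align_by_rank` is a hypothetical helper, and the padding behavior is only an analogy to how pandas handles columns of different lengths.

```python
from itertools import zip_longest

def align_by_rank(a, b):
    """Pair the k-th largest value of `a` with the k-th largest value of `b`."""
    ranked_a = sorted(a, reverse=True)
    ranked_b = sorted(b, reverse=True)
    # zip_longest pads the shorter list with None instead of dropping rows.
    return list(zip_longest(ranked_a, ranked_b))

aoc_fav = [1200, 45000, 800]   # made-up favorite counts
biden_fav = [30000, 9000]      # one tweet fewer, to show the padding

for rank, (aoc, biden) in enumerate(align_by_rank(aoc_fav, biden_fav)):
    print(rank, aoc, biden)
```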

In [None]:
russia_results = twitter.get_user_timeline(screen_name="KremlinRussia_E", count=50)
russia_results