# <font color='#1DA1F2'>Twitter API</font>

Twitter implements OAuth 1.0a as its authentication protocol. You'll need four credentials in order to use OAuth and make requests to Twitter's API.

## <font color='#1DccF2'>Getting Your Credentials</font>

You will need your own Twitter account to do this section. You can always delete the account afterwards. I can't share my credentials with you this time around for technical reasons.

You will need to obtain four credentials (i.e. API key, API secret, Access Token and Access Token secret) on the Twitter developer site to access the API. The steps are as follows:
- Go to your Twitter settings at https://twitter.com/settings/account (you need to log in if you have not already). Select 'Mobile' and add your mobile phone to your account. Twitter will send you a confirmation code via text. This is required to create an application.
- Go to https://apps.twitter.com and create an application. 
- Enter a name/description/website for your application. You can use https://www.google.com for the website. Agree to the TOS and create your Twitter application.
- Go to the *Keys and Access Tokens* tab and copy your **Consumer Key** and **Consumer Secret** to the section below.
- At the bottom of the page, click the button to create your own access token. Copy the **Access Token** and **Access Token Secret** to the section below.

**Note**: You can always regenerate/delete your credentials or delete the application.

In [1]:
import os
consumerKey = os.getenv('consumerkey')
consumerSecret = os.getenv('consumersecret')
oauthToken = os.getenv('accesstoken')
oauthTokenSecret = os.getenv('accesstokensecret')

**Note**: To reiterate, I do NOT recommend that you hard code the credentials in a working setting. In Python, you should save the credentials in environment variables and retrieve them using the `os.getenv` method, thus decoupling the credentials from your code.
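One catch with `os.getenv` is that it silently returns `None` when a variable is not set, and the failure only shows up later as an authentication error. Here is a minimal sketch of a check that fails early instead. The helper name `load_credentials` is my own invention, and the variable names are the ones used in the cell above.

```python
import os

# The four environment variable names used in the cell above.
CREDENTIAL_VARS = ('consumerkey', 'consumersecret', 'accesstoken', 'accesstokensecret')

def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is unset or empty."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: {}'.format(', '.join(missing)))
    return {name: environ[name] for name in CREDENTIAL_VARS}

# Demonstration with a made-up environment instead of the real one.
sample_env = {'consumerkey': 'ck', 'consumersecret': 'cs',
              'accesstoken': 'at', 'accesstokensecret': 'ats'}
creds = load_credentials(sample_env)
```

In the notebook you would call `load_credentials()` with no argument so that it reads the real environment.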

## <font color='#1DccF2'>Installing Python Twitter Tools Module</font>

We will be using the Python Twitter Tools module for interacting with the API. There are many available Twitter packages for Python. I chose this one because it is popular and seemed simple to use (although it lacks good documentation). Documentation is at https://github.com/sixohsix/twitter.

Install the module from the terminal or command prompt with `pip install twitter`. 

In [2]:
import twitter

Next, pass your credentials to `twitter.OAuth` and store the resulting authentication object in a variable.

In [3]:
auth = twitter.OAuth(oauthToken, oauthTokenSecret, consumerKey, consumerSecret)

# <font color='#1DA1F2'>Twitter Search API</font>

Use the Search API to look for historical tweets. The Twitter Search API searches against a __sampling__ of recent Tweets published in the past 7 days. We will see later how to get around the 7 day restriction by invoking other constraints.

**Note**: The Search API is not an exact replica of the Search feature available in Twitter mobile or web clients such as https://twitter.com/search. 

Let's start by creating a Search API handle using your authentication.

In [4]:
twtr = twitter.Twitter(auth=auth)

## <font color='#ffaaff'>Search using a query</font>

Details on the search API can be found here https://dev.twitter.com/rest/public/search.  
The documentation can be found here https://dev.twitter.com/rest/reference/get/search/tweets.

Create a query term and also specify the number of tweets you want (the default is 15 and the maximum is 100 per request). Use the `search.tweets` method to search for tweets.

In [6]:
query = 'from:SunshineDadBlog' 
limit = 2
results = twtr.search.tweets(q=query, count=limit)
results

{'search_metadata': {'completed_in': 0.051,
  'count': 2,
  'max_id': 861996177536892928,
  'max_id_str': '861996177536892928',
  'query': 'from%3ASunshineDadBlog',
  'refresh_url': '?since_id=861996177536892928&q=from%3ASunshineDadBlog&include_entities=1',
  'since_id': 0,
  'since_id_str': '0'},
 'statuses': []}

Apparently, there is a lot of data associated with a 140-character tweet. This is another reason why an API is better than HTML web scraping.

## <font color='#00aced'>User Info</font>

Here's how you would access some interesting fields about the Tweeter.

In [24]:
tweet

{'contributors': None,
 'coordinates': None,
 'created_at': 'Mon Apr 17 19:29:35 +0000 2017',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [{'display_url': 'ngdata.com/top-data-scien…',
    'expanded_url': 'http://www.ngdata.com/top-data-science-resources/',
    'indices': [117, 139],
    'url': 'http://t.co/FbBvPJW9tl'}],
  'user_mentions': [{'id': 132373965,
    'id_str': '132373965',
    'indices': [3, 19],
    'name': 'MiningTheSocialWeb',
    'screen_name': 'SocialWebMining'}]},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 854054251781292033,
 'id_str': '854054251781292033',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'en',
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'place': None,
 'possibly_sensitive': False,
 'retweet_count': 20,
 'retweeted': False,
 'retweeted_status

In [22]:
tweet = results['statuses'][1]

In [23]:
print('Name: {}'.format(tweet['user']['name']))
print('ScreenName: {}'.format(tweet['user']['screen_name']))
print('Description: {}'.format(tweet['user']['description']))
print('Location: {}'.format(tweet['user']['location']))
print('# Tweets: {}'.format(tweet['user']['statuses_count']))        
print('# Following: {}'.format(tweet['user']['friends_count']))        
print('# Followers: {}'.format(tweet['user']['followers_count']))
print('# Likes: {}'.format(tweet['user']['favourites_count']))
print('# Lists: {}'.format(tweet['user']['listed_count']))

Name: Joyce Y. Lee
ScreenName: joyceyeaeunlee
Description: PhD student at University of Michigan. Interested in fathers, children's social emotional development, child well-being and safety.
Location: Ann Arbor, MI
# Tweets: 8786
# Following: 871
# Followers: 367
# Likes: 981
# Lists: 8
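Indexing with `tweet['user'][...]` raises a `KeyError` if a field is absent. If you want to collect the same fields for many tweets, a small helper using `dict.get` is more forgiving. This is only a sketch: `summarize_user` and `USER_FIELDS` are names I made up, and the sample tweet is a trimmed-down copy of the output above.

```python
# The user fields printed in the cell above.
USER_FIELDS = ('name', 'screen_name', 'description', 'location',
               'statuses_count', 'friends_count', 'followers_count',
               'favourites_count', 'listed_count')

def summarize_user(tweet):
    """Return a flat dict of user fields; missing fields come back as None."""
    user = tweet.get('user', {})
    return {field: user.get(field) for field in USER_FIELDS}

# A trimmed-down tweet containing only a few of the user fields.
sample_tweet = {'user': {'name': 'Joyce Y. Lee',
                         'screen_name': 'joyceyeaeunlee',
                         'followers_count': 367}}
summary = summarize_user(sample_tweet)
print(summary['screen_name'], summary['followers_count'], summary['location'])
```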


## <font color='#00aced'>Tweet Info</font>

Here's how you would access info about the tweet itself.

In [16]:
print('Created at: {}'.format(tweet['created_at']))       
print('Text: {}'.format(tweet['text']))
print('Source: {}'.format(tweet['source']))
for hashtag in tweet['entities']['hashtags']:
    print('Hashtag: {}'.format(hashtag['text']))
print('Likes: {}'.format(tweet['favorite_count']))
print('Retweets: {}'.format(tweet['retweet_count']))        
print('Retweet: {}'.format(tweet['retweeted']))
print('Coordinates: {}'.format(tweet['coordinates']))
place = tweet['place']
if place is not None:
    print('Place Name: {}'.format(place['full_name']))
    print('Place Type: {}'.format(place['place_type']))
    print('Place Bounding Box: {}'.format(place['bounding_box']['coordinates']))

Created at: Wed Apr 19 18:24:03 +0000 2017
Text: RT @barbendnews: The tsunami bar: 6 weird (but effective) barbells you've never heard of: https://t.co/imyV2gLxgD #powerlifting #weightlift…
Source: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
Hashtag: powerlifting
Likes: 0
Retweets: 1
Retweet: False
Coordinates: None
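The `created_at` field is a plain string. To sort or filter tweets by time you will probably want a real `datetime` object. The sketch below parses the timestamp format seen in the output above; the function name `parse_created_at` is my own.

```python
from datetime import datetime, timezone

# Matches strings such as 'Wed Apr 19 18:24:03 +0000 2017'.
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

def parse_created_at(created_at):
    """Convert a tweet's created_at string into a timezone-aware datetime."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT)

when = parse_created_at('Wed Apr 19 18:24:03 +0000 2017')
print(when.isoformat())  # 2017-04-19T18:24:03+00:00
```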


## <font color='#00aced'>Search tweets by users located within a given radius of a GPS point.</font>

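Add the `geocode` argument to the `search.tweets` method. Its value is a single `latitude,longitude,radius` string with no spaces, where the radius ends in `km` or `mi`.

In [None]:
query = 'food' 
limit = 5
georesults = twtr.search.tweets(q=query, count=limit, geocode="42.7,-83.3,10km")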

Let's print out the tweet along with the datetime and coordinates information.

In [None]:
for i, tweet in enumerate(georesults['statuses']):
    print(i, tweet['text'])
    print(tweet['created_at'], tweet['coordinates'])
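The location string for the search endpoint is easy to get wrong (a stray space or a missing unit is enough). Here is a small sketch of a builder that produces the `latitude,longitude,radius` form with a `km` or `mi` suffix. The function `make_geocode` is hypothetical and not part of the `twitter` package.

```python
def make_geocode(latitude, longitude, radius, unit='km'):
    """Build a 'latitude,longitude,radius' string with no spaces."""
    if unit not in ('km', 'mi'):
        raise ValueError("unit must be 'km' or 'mi'")
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError('latitude or longitude out of range')
    return '{},{},{}{}'.format(latitude, longitude, radius, unit)

geocode = make_geocode(42.7, -83.3, 10)
print(geocode)  # 42.7,-83.3,10km
```

The resulting string is what you would pass as the location argument to `search.tweets`.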

Q: How do I know what methods and arguments to use?  
A: I read the API documentation.

This URL has a list of API endpoints for Twitter: https://dev.twitter.com/rest/reference. This is also a good jumping off point to get to the documentation. *Bookmark it!* So far, we've only used one API endpoint.

## <font color='#00aced'>Get a list of Followers</font>

Use the `followers.list` method to get a list of followers for a given user. It returns 20 results by default; you can request up to 200 per call with the `count` argument.

In [None]:
followers = twtr.followers.list(screen_name='arc_um', skip_status=True, include_user_entities=False)
followers

Print out the __screen name__ and the __description__ of the followers.

In [None]:
for i, user in enumerate(followers['users']):
    print(i, user['screen_name'], user['description'])

To iterate beyond 20 (or whatever you initially asked for), use the key `next_cursor` from the initial result along with the `cursor` argument. Think of it as Page 2. Repeat as necessary (see example below for complete acquisition).

In [None]:
followers = twtr.followers.list(screen_name='arc_um', cursor=followers['next_cursor'], skip_status=True, include_user_entities=False)
for i, user in enumerate(followers['users']):
    print(i, user['screen_name'], user['description'])
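The "Page 2" idea generalizes to a loop: keep asking for the next cursor until Twitter hands back `0`. Below is a sketch of that loop as a generator. `iter_users` is my own helper, and `fake_fetch` is a stand-in with made-up pages so the example runs without touching the API. A cursor of `-1` is the value Twitter treats as the first page.

```python
def iter_users(fetch_page, **kwargs):
    """Yield every user from a cursored endpoint, following next_cursor."""
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # 0 means there are no more pages
        page = fetch_page(cursor=cursor, **kwargs)
        for user in page['users']:
            yield user
        cursor = page['next_cursor']

def fake_fetch(cursor=-1, **kwargs):
    """Stand-in for an API call: two made-up pages keyed by cursor."""
    pages = {
        -1: {'users': [{'screen_name': 'a'}, {'screen_name': 'b'}], 'next_cursor': 7},
        7: {'users': [{'screen_name': 'c'}], 'next_cursor': 0},
    }
    return pages[cursor]

names = [user['screen_name'] for user in iter_users(fake_fetch, screen_name='arc_um')]
print(names)  # ['a', 'b', 'c']
```

With the real handle, something like `iter_users(twtr.followers.list, screen_name='arc_um', count=200)` should work the same way, keeping in mind that every page counts against the rate limit.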

## <font color='#00aced'>Get a list of Following</font>

Use the `friends.list` method to see who a given user is following. It returns 20 results by default; you can request up to 200 per call with the `count` argument.

In [None]:
following = twtr.friends.list(screen_name='arc_um', skip_status=True, include_user_entities=False)
for i, user in enumerate(following['users']):
    print(i, user['screen_name'], user['description'])

Same function argument to access "Page 2" of the results as in the followers example.

## <font color='#00aced'>Cross Reference Followers and Following</font>

Here is some example code to cross reference the two lists to see which relationships are reciprocated.

This function appends a list of users to an existing list.

In [7]:
def append_users(f, list_users):
    for user in f['users']:
        list_users.append(user['screen_name'])
    return list_users

Grab entire list of followers.

In [8]:
followers = twtr.followers.list(screen_name='SunshineDadBlog', count=200)
lemmings = []
lemmings = append_users(followers, lemmings)
while (followers['next_cursor'] != 0):
    followers = twtr.followers.list(screen_name='SunshineDadBlog', count=200, cursor=followers['next_cursor'])
    lemmings = append_users(followers, lemmings)
print('There are {} followers'.format(len(lemmings)))

There are 553 followers


In [12]:
lemmings[:10]

['InfiniteFleur',
 'SilviaMarsz',
 'RebeccaRuchR',
 'AnEvery_DayDad',
 'WealtUSAHealth',
 'FranksRedHot',
 'madame_rosette',
 'lyra_hall',
 'MatthewOluwole2',
 'breezyo']

Grab entire list of following.

In [None]:
following = twtr.friends.list(screen_name='SunshineDadBlog', count=200)
leaders = []
leaders = append_users(following, leaders)
while (following['next_cursor'] != 0):
    following = twtr.friends.list(screen_name='SunshineDadBlog', count=200, cursor=following['next_cursor'])
    leaders = append_users(following, leaders)
print('Following {} accounts.'.format(len(leaders)))

Find the intersection of the two groups using set operations.

In [None]:
lemmings = set(lemmings)
leaders = set(leaders)
relationship = leaders.intersection(lemmings)
print(len(relationship), relationship)
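The intersection gives the reciprocated relationships. The other set operators answer the one-sided questions. Here is a quick sketch with made-up screen names (none of these are real results):

```python
# Made-up screen names standing in for the two lists built above.
followers = {'alice', 'bob', 'carol'}
following = {'bob', 'carol', 'dave'}

mutual = followers & following             # follow each other
not_followed_back = following - followers  # we follow them, they do not follow us
fans = followers - following               # they follow us, we do not follow them

print(sorted(mutual), sorted(not_followed_back), sorted(fans))
```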

## <font color='#00aced'>Search for Trends by Where on Earth (WOE) ID</font>

Use the `trends.place` method to get a list of trending topics for a given location. The location is specified using the WOE ID.  
A WOE ID is a unique identifier for a place on Earth. 

Here is a dictionary of places and WOE IDs.

In [None]:
woeid = {'World':1, 'USA':23424977, 'San Francisco':2487956, 'Los Angeles':2442047,
         'Canada':23424775, 'Toronto':4118, 'Montreal':3534,
         'United Kingdom':23424975, 'Germany':23424829}

Searching for trends by WOE ID is analogous to searching for YouTube videos by Region Code.

In [None]:
woe_trends = twtr.trends.place(_id=woeid['Canada'])

In [None]:
woe_trends

Print out the top 10 trending topics with their tweet volumes.

In [None]:
trends = woe_trends[0]['trends']
for trend in trends[:10]:
    print(trend['name'], trend['tweet_volume'])
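Be aware that `tweet_volume` can come back as `None` for some trends, which breaks any attempt to sort on it directly. A minimal sketch of ranking only the trends that report a volume follows. `top_trends` is a name I made up, and the sample data is invented.

```python
def top_trends(trends, n=10):
    """Return up to n trends ordered by tweet_volume, skipping unknown volumes."""
    with_volume = [t for t in trends if t.get('tweet_volume') is not None]
    return sorted(with_volume, key=lambda t: t['tweet_volume'], reverse=True)[:n]

# Invented trends in the same shape as woe_trends[0]['trends'].
sample_trends = [
    {'name': '#a', 'tweet_volume': 12000},
    {'name': '#b', 'tweet_volume': None},
    {'name': '#c', 'tweet_volume': 45000},
]
ranked = [t['name'] for t in top_trends(sample_trends)]
print(ranked)  # ['#c', '#a']
```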

Here is a URL detailing the WOE IDs supported by Twitter.  
https://twittercommunity.com/t/what-are-the-list-of-woeids-supported-by-twitter/8493/2.

**Note**: Not all WOE IDs are supported by Twitter. Places like Ann Arbor or Michigan are not. Apparently not important enough :(

## <font color='#00aced'>Submit Your Own Tweet</font>

If you want to become an *evil trolling TwitterBot*, this is the first step towards darkness. Use the `statuses.update` method to start tweeting from Python.

In [None]:
#twtr.statuses.update(status="Python Tweeting from the Twitter API workshop. Thanks #CSCAR and @ARC_UM")

Tweeting parameters can be found at  
https://dev.twitter.com/rest/reference/post/statuses/update

If you've been counting, we've touched upon 5 API endpoints out of the 100+ endpoints available.

## <font color='#00aced'>Rate Limits</font>

Usage of the Twitter API is subject to rate limits, which vary based on the `GET` request or endpoint.  
Details can be found here at https://dev.twitter.com/rest/public/rate-limits.
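One simple way to stay under a limit is to space your calls evenly across the window. The sketch below assumes the limits are counted per 15-minute window; the figure of 15 calls is only an illustration, so look up the real number for each endpoint. `paced` is my own helper, and the example records the pauses instead of actually sleeping.

```python
import time

def paced(items, calls_per_window, window_seconds=15 * 60, sleep=time.sleep):
    """Yield items one by one, pausing between them to spread calls over the window."""
    delay = window_seconds / calls_per_window
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # wait before every item except the first
        yield item

# Record the requested pauses rather than really waiting.
pauses = []
names = list(paced(['a', 'b', 'c'], calls_per_window=15, sleep=pauses.append))
print(names, pauses)  # ['a', 'b', 'c'] [60.0, 60.0]
```

In real use you would drop the `sleep` argument and make one API call per item inside the loop.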

# <font color='#1DA1F2'>Twitter Streaming API</font>

The streaming API is used to collect tweets from the future. The streaming API provides a **sample** of the available Tweets. Polling and rate limits do NOT apply to the streaming API.

Details on the public data streaming API can be found here at 
https://dev.twitter.com/streaming/public

This code block is here so we can quickly restart the kernel if the streaming API is complaining about `Exceeded connection limit for user`.

In [None]:
consumerKey = ''
consumerSecret = ''
oauthToken = ''
oauthTokenSecret = ''

import twitter
auth = twitter.OAuth(oauthToken, oauthTokenSecret, consumerKey, consumerSecret)

We need to create a new API handle for streaming using our authentication. Only one standing connection per account is allowed to a public endpoint. We'll be using the public stream API, which is specified in the `domain` argument.

In [4]:
twtr_stream = twitter.TwitterStream(auth=auth, domain="stream.twitter.com")

There are two other streaming endpoints you can access: user streams and site streams. Details can be found at:  
https://dev.twitter.com/streaming/overview

**Note**: You can NOT request every future Tweet through this API. That is referred to as the Firehose. It costs a lot of `$$$$$$$$`.

## <font color='#1DA1F2'>Search by Filter</font>

Stream searches are done with a comma-delimited list of phrases. A phrase may consist of one or more terms separated by spaces. Term ordering is ignored and searches are not case sensitive.
 
spaces == logical ANDs (e.g. `"Alex twitter" == "alex AND twitter"`)  
commas == logical ORs (e.g. `"Alex, twitter" == "Alex OR twitter"`)
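To make the AND/OR rules concrete, here is a toy matcher that applies them to a piece of text locally. It is a deliberately simplified illustration: it only splits on whitespace, and it is NOT how Twitter tokenizes or matches tweets on its servers. The function name `matches_track` is made up.

```python
def matches_track(track, text):
    """Return True if any comma-separated phrase has all of its terms in text."""
    words = set(text.lower().split())
    for phrase in track.split(','):       # commas act as OR
        terms = phrase.lower().split()    # spaces act as AND
        if terms and all(term in words for term in terms):
            return True
    return False

print(matches_track('Alex twitter', 'Twitter hired alex'))  # True: both terms present, any order
print(matches_track('Alex twitter', 'alex is here'))        # False: 'twitter' is missing
print(matches_track('Alex, twitter', 'alex is here'))       # True: one phrase is enough
```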

The text of the Tweet and some entity fields are considered for matches. Specifically:
- the `text` attribute of the Tweet
- `expanded_url` and `display_url` for links and media
- `text` for hashtags
- and `screen_name` for user mentions

Use the `statuses.filter` method to create a streaming query.

In [5]:
iterator = twtr_stream.statuses.filter(track="#stayathomedad, #stayathomemom, #stayathomefather, #stayathomemother, #sahd, #sahm, #sahf")

Use a `for` loop to get the generator to yield future results as they come in. I'm printing the fields (where applicable) that are being searched except time. The `break` command is to prevent it going on indefinitely.

**Tip**: Use the stop button in the toolbar to prevent it from going to 100.

In [6]:
for i, tweet in enumerate(iterator):   
    print('{} Time: {}'.format(i, tweet['created_at']))
    print('Tweet: {}'.format(tweet['text']))
    try:
        print('Expanded URL: {}'.format(tweet['entities']['urls'][0]['expanded_url']))
        print('Display URL: {}'.format(tweet['entities']['urls'][0]['display_url']))
    except (IndexError, KeyError):
        pass  # this tweet has no URL entity, or a field is missing
    if len(tweet['entities']['hashtags']) > 0:
        print('Hashtags: {}'.format(tweet['entities']['hashtags'][0]['text']))
    if len(tweet['entities']['user_mentions']) > 0:
        print('Screen Name: {}'.format(tweet['entities']['user_mentions'][0]['screen_name']))
    if i > 100:
        break

0 Time: Wed Apr 19 18:17:10 +0000 2017
Tweet: RT @urfavmenaldi: Love conversations with my dad!!!1!!1!1!1!!!! https://t.co/fN3iinCHCB
Screen Name: urfavmenaldi
1 Time: Wed Apr 19 18:17:10 +0000 2017
Tweet: @iamblackbear But the east coast dad :(
Screen Name: iamblackbear
2 Time: Wed Apr 19 18:17:10 +0000 2017
Tweet: Is the dad Stefan? Or Damon? I think it's Matt.
3 Time: Wed Apr 19 18:17:11 +0000 2017
Tweet: @HamillHimself BEST DAD IN THE GALAXY
Screen Name: HamillHimself
4 Time: Wed Apr 19 18:17:11 +0000 2017
Tweet: RT @LuC4zNytMare: @tribelaw @WhitfordBradley @BorowitzReport Kim Jong-Un's father groomed him for absolute rule over N. Korea, Jared'… 
Expanded URL: None
Screen Name: LuC4zNytMare
5 Time: Wed Apr 19 18:17:10 +0000 2017
Tweet: Daddy Cream and Marc Cervantes ---&gt; https://t.co/BmdqtZKgKW https://t.co/8yVJ9Tayb1
Expanded URL: http://ift.tt/2oPtxym
Display URL: ift.tt/2oPtxym
6 Time: Wed Apr 19 18:17:11 +0000 2017
Tweet: RT @hottychix: #FOLLOW @NinaNorth19 Push me harder Da

Details on the `track` parameter can be found here  
https://dev.twitter.com/streaming/overview/request-parameters#track

## <font color='#1DA1F2'>Search by Location</font>

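Use the `locations` argument to specify a bounding box to search. A bounding box is written as `west_longitude,south_latitude,east_longitude,north_latitude`, i.e. the southwest corner followed by the northeast corner. The API returns all tweets whose location intersects the box; the example below uses a box around New York City.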

In [15]:
iterator = twtr_stream.statuses.filter(locations="-74,40,-73,41")
for i, tweet in enumerate(iterator, start=1):
    print('Time: {}'.format(tweet['created_at']))
    print('Tweet: {}'.format(tweet['text']))
    print('Coordinates: {}'.format(tweet['coordinates']))
    if tweet['place'] is not None:
        print('BoundingBox: {}'.format(tweet['place']['bounding_box']['coordinates']))
        print('Name: {}'.format(tweet['place']['full_name']))
        print('Type: {}'.format(tweet['place']['place_type']))
        print('ID: {}\n'.format(tweet['place']['id']))
    else:
        print('')
    if i > 4:
        break

Time: Sat Apr 01 19:27:53 +0000 2017
Tweet: #fatboyproblems good looking my brother a_king_amongst_men lunch @… https://t.co/lGqIeLPBfi
Coordinates: {'type': 'Point', 'coordinates': [-74.01928, 40.64522]}
BoundingBox: [[[-74.041878, 40.570842], [-74.041878, 40.739434], [-73.855673, 40.739434], [-73.855673, 40.570842]]]
Name: Brooklyn, NY
Type: city
ID: 011add077f4d2da3

Time: Sat Apr 01 19:27:54 +0000 2017
Tweet: Just like that https://t.co/OFEh4B84OQ
Coordinates: None
BoundingBox: [[[-74.026675, 40.683935], [-74.026675, 40.877483], [-73.910408, 40.877483], [-73.910408, 40.683935]]]
Name: Manhattan, NY
Type: city
ID: 01a9a39529b27f36

Time: Sat Apr 01 19:27:54 +0000 2017
Tweet: Think the point was him using a girl with body wave bundles in when talking about Richard Pryor fros.... https://t.co/sABq7onDhh
Coordinates: None
BoundingBox: [[[-73.039115, 40.837693], [-73.039115, 40.921065], [-72.972416, 40.921065], [-72.972416, 40.837693]]]
Name: Coram, NY
Type: city
ID: 92e1e697abf56722

T

Bounding boxes act like OR operators. They do not filter `track` parameters. So the following will return tweets that either mention football OR come from NYC.

In [None]:
iterator = twtr_stream.statuses.filter(track="football", locations="-74,40,-73,41")
for i, tweet in enumerate(iterator, start=1):
    print('Time: {}'.format(tweet['created_at']))
    print('Tweet: {}'.format(tweet['text']))
    print('Coordinates: {}'.format(tweet['coordinates']))
    if tweet['place'] is not None:
        print('BoundingBox: {}'.format(tweet['place']['bounding_box']['coordinates']))
        print('Name: {}'.format(tweet['place']['full_name']))
        print('Type: {}'.format(tweet['place']['place_type']))
        print('ID: {}\n'.format(tweet['place']['id']))
    else:
        print('')
    if i > 4:
        break

Details on the `locations` parameter can be found here  
https://dev.twitter.com/streaming/overview/request-parameters#locations

## <font color='#1DA1F2'>Saving the Tweets</font>

Once you have the tweets in hand, you can save them in JSON format to a:
1. text file
2. NoSQL database (MongoDB seems to be a popular choice)

We won't cover saving in detail here because it is non-trivial to set up a MongoDB database (and it requires admin privileges) within this workshop.

### <font color='#1DA1b2'>Text File</font>

To save a single tweet to a text file, use the `dumps` method of the `json` module together with standard Python file I/O.

In [None]:
import json
with open('tweet.txt','w') as fout:
    fout.write(json.dumps(tweet, indent=2))
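For more than one tweet, a common convention is to write one JSON object per line (often called JSON Lines), which lets you append tweets as they arrive and read them back one at a time. Here is a sketch with invented tweets. `save_tweets` and `load_tweets` are hypothetical helper names, and the file goes to a temporary directory just for the demonstration.

```python
import json
import os
import tempfile

def save_tweets(tweets, path):
    """Write each tweet as one line of JSON."""
    with open(path, 'w') as fout:
        for tweet in tweets:
            fout.write(json.dumps(tweet) + '\n')

def load_tweets(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path) as fin:
        return [json.loads(line) for line in fin if line.strip()]

# Invented tweets; real ones have many more fields.
sample_tweets = [{'id': 1, 'text': 'first'}, {'id': 2, 'text': 'second'}]
path = os.path.join(tempfile.mkdtemp(), 'tweets.jsonl')
save_tweets(sample_tweets, path)
restored = load_tweets(path)
print(len(restored), restored[0]['text'])
```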

### <font color='#1DA1b2'>MongoDB</font>

Below is a simple example of how to add a tweet to a MongoDB.

**Tip**: Make sure MongoDB is running before running this snippet.

In [None]:
import pymongo
client = pymongo.MongoClient("localhost", 27017)
db = client.example
db.my_collection

Insert a single tweet

In [None]:
db.my_collection.insert_one(tweet).inserted_id

Lookup the single tweet

In [None]:
db.my_collection.find_one()

**Note**: There are additional steps besides the code shown to get MongoDB working. 

## <font color='darkyellow'>Last Note: Library of Congress Twitter Archive</font>

The Library of Congress and Twitter teamed up in April 2010 to archive every single public tweet. This archive has not been made public yet. Like everything, it's behind schedule. Here's a journal article on the subject: http://firstmonday.org/article/view/5619/4653#p4