# Using Twitter API

In this course we use tweepy (pip install tweepy) to access the Twitter API from Python. Twitter offers two types of API:

1. Streaming API: used to listen to tweets in real time. For example, you can fetch all tweets containing a certain word as they are posted.
2. Search API: used to search for older tweets. In this course we also use it to fetch the tweets or followers of a user (strictly speaking, these calls belong to Twitter's REST API, of which search is one part).

Here, we are going to **listen** for tweets containing the words 'Hillary' and 'Trump' in order to carry out sentiment analysis. For more information on the Twitter API and the tweepy package, I recommend the following websites.

https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

http://www.sananalytics.com/lab/twitter-sentiment/

To use the Twitter API, you need to register an application and obtain credentials.
1. Create a Twitter account if you do not have one (or do not want to use your own account).
2. Log in to http://apps.twitter.com 
3. Click on "Create New App" and follow the instructions.
4. On the "Keys and Access Tokens" tab, generate the keys and access tokens.

Enter your own keys and tokens in the variables below.

In [3]:
#Import the necessary classes from the tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access Twitter API 
#Replace the placeholders with your own values and never publish real credentials
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
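
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch of an alternative, the four values can be read from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper `load_credentials` are hypothetical choices made for this illustration, not something tweepy or Twitter prescribes.

```python
import os

# Hypothetical environment variable names; pick any names you like.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [env for env in CREDENTIAL_VARS.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(sorted(missing)))
    return {name: environ[env] for name, env in CREDENTIAL_VARS.items()}


# Usage (after exporting the variables in your shell):
# creds = load_credentials()
# consumer_key = creds["consumer_key"]
```

The returned values can then be assigned to the same four variable names used in the cell above, so the rest of the code stays unchanged.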

Now it is time to create a stream listener. In our example, we write the tweets to a file called 'fetched_tweets_class.txt'. In the code below, the variable **data** holds one tweet together with its metadata.

In [2]:
#This is a basic listener that appends each received tweet to a file.
class StdOutListener(StreamListener):
    def on_data(self, data):
        #data is the raw JSON string of one tweet, including its metadata
        with open('fetched_tweets_class.txt','a') as tf:
            tf.write(data)
        return True

    def on_error(self, status):
        #Print the status code reported for the error
        print(status)


#This handles Twitter authentication and the connection to Twitter Streaming API
listener = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
#We create a stream using the authentication information we provided above
stream = Stream(auth, listener)

#This line filters Twitter Streams to capture data by the keywords: 'hillary', 'trump'
#It runs until you interrupt it (for example by stopping the cell)
# For more information you can check: http://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html
stream.filter(track=['hillary', 'trump'])

KeyboardInterrupt: 
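
The listener above writes everything it receives. A stream may also deliver messages that are not tweets (for example limit or delete notices), and such lines have no **text** field. The sketch below is one hedged way to check a raw message before saving or processing it. The helper name `extract_tweet` is made up for this illustration, and the blank-line check is only a defensive measure.

```python
import json


def extract_tweet(raw):
    """Return the parsed tweet as a dict, or None if `raw` is not a tweet."""
    raw = raw.strip()
    if not raw:
        return None  # blank line, nothing to parse
    try:
        message = json.loads(raw)
    except ValueError:
        return None  # malformed or incomplete JSON
    if not isinstance(message, dict) or "text" not in message:
        return None  # e.g. a limit or delete notice
    return message


print(extract_tweet('{"text": "hello", "id": 1}'))
print(extract_tweet('{"limit": {"track": 5}}'))
```

Inside **on_data**, one could write **data** to the file only when `extract_tweet(data)` is not None.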

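We saved the tweets in the text file 'fetched_tweets_class.txt'. The cell below reads '../data/fetched_tweets.txt', a file in the same format; point it at your own file if needed. Now let us read the first tweet and look at its format.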

In [4]:
with open('../data/fetched_tweets.txt', 'r') as tweetfile:
    line = tweetfile.readline()
print(line)

{"created_at":"Tue Jun 07 09:09:33 +0000 2016","id":740108425686876160,"id_str":"740108425686876160","text":"RT @HillaryClinton: \"We must stand against hate wherever it rears its ugly head.\" \u2014Hillary in 2000\nhttps:\/\/t.co\/qyhdZysMmH","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":28571004,"id_str":"28571004","name":"Serah Mwihaki","screen_name":"mwihakiwanjeri","location":"Nairobi, Kenya","url":null,"description":"A filmmaker with an insatiable passion for story telling. Creator. Writer.- Nairobi Half Life. Producer. Aspiring Oscar winner. Explorer.","protected":false,"verified":false,"followers_count":4458,"friends_count":3999,"listed_count":65,"favourites_count":313,"statuses_count":17531,"created_at":"Fri Apr 03 14:17

The tweet that we fetched is hard to read as it is. However, we can see that it is very similar to a data structure we have seen before. Now let us print the tweet in a prettier format using the json package. 

In [5]:
import json
line = json.loads(line)
print(json.dumps(line, indent=4))

{
    "contributors": null, 
    "truncated": false, 
    "text": "RT @HillaryClinton: \"We must stand against hate wherever it rears its ugly head.\" \u2014Hillary in 2000\nhttps://t.co/qyhdZysMmH", 
    "is_quote_status": false, 
    "in_reply_to_status_id": null, 
    "id": 740108425686876160, 
    "favorite_count": 0, 
    "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "timestamp_ms": "1465290573890", 
    "entities": {
        "user_mentions": [
            {
                "id": 1339835893, 
                "indices": [
                    3, 
                    18
                ], 
                "id_str": "1339835893", 
                "screen_name": "HillaryClinton", 
                "name": "Hillary Clinton"
            }
        ], 
        "symbols": [], 
        "hashtags": [], 
        "urls": [
            {
                "url": "https://t.co/qyhdZysMmH",

It looks exactly like a dictionary with several layers. The body of the tweet is stored under the **text** key. There is other information as well, such as the id, language and time of the tweet. Under the **entities** key, we can see that the tweet mentions a user called **HillaryClinton** with id **1339835893**.

The tweets are in JSON format, which follows a {"key": value} structure. A value can itself be a list of several entries, each with its own keys, such as:

    "numberChildren": 2,
    "children": [
        {
            "name": "XXXX YYYY",
            "age": 4
        },
        {
            "name": "YYYY XXXX",
            "age": 7
        }
    ],
    ...
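
To make the layered structure concrete, here is a small sketch that parses a trimmed-down tweet and reads a nested value. The JSON string keeps only a few fields from the tweet printed above; the variable names and the "unknown" default are choices made for this illustration.

```python
import json

# A trimmed-down version of the tweet shown above (most fields omitted).
raw = (
    '{"id": 740108425686876160, '
    '"entities": {"user_mentions": [{"id": 1339835893, '
    '"screen_name": "HillaryClinton", "name": "Hillary Clinton"}], '
    '"hashtags": []}}'
)

tweet = json.loads(raw)  # JSON object -> Python dict

# Walk down the layers: dict -> dict -> list of dicts.
mentions = [(m["screen_name"], m["id"]) for m in tweet["entities"]["user_mentions"]]
print(mentions)

# .get() avoids a KeyError when a key is absent from this trimmed example.
language = tweet.get("lang", "unknown")
print(language)
```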
  
Now let's read and format the tweets that we downloaded.

In [6]:
import string
import unicodedata
tweets = []
tweets_list = []
with open('../data/fetched_tweets.txt', 'r') as f:
    for line in f:
        tweet = json.loads(line) # load it as a Python dict
        tweet = tweet['text']
        # Decompose accented characters, then drop everything that is not ASCII
        tweet = unicodedata.normalize('NFKD', tweet).encode('ascii','ignore').decode('ascii')
        tweets.append(tweet)
        # Skip mentions, links and words shorter than 3 characters;
        # strip punctuation from the remaining words and lowercase them
        usedwords = [''.join(ch for ch in word if ch not in string.punctuation).lower()
                     for word in tweet.split()
                     if word[0] != '@' and len(word) >= 3 and word[0:4] != 'http']
        # Drop tokens that consist only of digits
        usedwords = [word for word in usedwords if not word.isdigit()]
        tweets_list.append(usedwords)
print(tweets[0])
print(tweets_list[0])

RT @HillaryClinton: "We must stand against hate wherever it rears its ugly head." Hillary in 2000
https://t.co/qyhdZysMmH
['we', 'must', 'stand', 'against', 'hate', 'wherever', 'rears', 'its', 'ugly', 'head', 'hillary']
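
The cleaning steps above can also be packaged as a reusable function. The following is a sketch in Python 3 that assumes the same rules as the cell above (drop mentions, links, short words and pure numbers; strip punctuation; lowercase). The name `clean_tweet` is hypothetical, and unlike the cell above it also discards words that become empty once punctuation is removed.

```python
import string
import unicodedata

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Return the list of cleaned, lowercased words in a tweet's text."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = []
    for word in ascii_text.split():
        if word.startswith("@") or word.startswith("http") or len(word) < 3:
            continue
        word = word.translate(_STRIP_PUNCT).lower()
        if word and not word.isdigit():
            words.append(word)
    return words


sample = ('RT @HillaryClinton: "We must stand against hate wherever it rears '
          'its ugly head." \u2014Hillary in 2000\nhttps://t.co/qyhdZysMmH')
print(clean_tweet(sample))
```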


# Search API

With the Search API we can get the followers and the timeline of a particular user. To get the followers of a user, we need the user's screen name (Twitter user name).

Note that Twitter imposes heavy restrictions on fetching such data: you can make at most 15 requests in 15 minutes. This is why you should pause between requests when you fetch many pages. In this example we fetch only one page (5000 users), so no long pause is needed.

In the Search API, an important feature is **Cursor**. A cursor lets us iterate through the information that we want by breaking it into pages (in order to respect the usage limits). This information can be the followers of a user (followers_ids) or the accounts a user is following (friends_ids).
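
To illustrate the idea of paging, here is a toy sketch with no network access. The function `fake_followers_ids` is a made-up stand-in for a cursored endpoint: it returns one page of ids plus the cursor for the next page, starting from -1 and ending when the next cursor is 0. Real cursors are opaque values rather than list offsets, and this is not how tweepy is implemented internally.

```python
FAKE_FOLLOWERS = list(range(1, 13))  # 12 made-up follower ids
PAGE_SIZE = 5


def fake_followers_ids(cursor=-1):
    """Toy endpoint: return (ids, next_cursor); next_cursor == 0 means no more pages."""
    start = 0 if cursor == -1 else cursor
    end = start + PAGE_SIZE
    next_cursor = end if end < len(FAKE_FOLLOWERS) else 0
    return FAKE_FOLLOWERS[start:end], next_cursor


def pages(endpoint):
    """Yield one page at a time until the endpoint reports the last page."""
    cursor = -1
    while cursor != 0:
        ids, cursor = endpoint(cursor)
        yield ids


all_pages = list(pages(fake_followers_ids))
print(all_pages)
```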

First we set up the connection.

In [7]:
import time
import tweepy

#Variables that contain the user credentials to access Twitter API 
#Replace the placeholders with your own values and never publish real credentials
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

Now let us find the followers of the account with **screen_name** "thebeatles". We use **Cursor** to fetch that information, as follows:

**for page in tweepy.Cursor(....).pages():**

Inside the parentheses we specify what to fetch. Here we want to fetch the **followers_ids** of the user with **screen_name** "thebeatles". Because of the time and usage restrictions we limit the fetching to one page (5000 users). If you want to carry on, you can pause for 60 seconds between requests (Twitter allows 15 requests in 15 minutes). 

In [8]:
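ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="thebeatles").pages():
    ids.extend(page)
    # Stop after the first page (up to 5000 ids). To fetch more pages,
    # remove the break and pause between requests, e.g. time.sleep(60)
    break

print(len(ids))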

5000
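
If you do want more than one page, the pause can be built into the loop. The sketch below works on any iterable of pages. The helper `collect_pages` and its parameters are hypothetical, and the `sleep` argument exists only so that the example can run instantly with a stand-in.

```python
import time


def collect_pages(page_iter, max_pages, pause_seconds=60, sleep=time.sleep):
    """Collect items from at most `max_pages` pages, pausing between pages."""
    collected = []
    for count, page in enumerate(page_iter, start=1):
        collected.extend(page)
        if count >= max_pages:
            break
        sleep(pause_seconds)  # wait before the iterator requests the next page
    return collected


# Demo with made-up pages and a stand-in for sleep that only records the pauses.
pauses = []
demo_ids = collect_pages([[1, 2], [3], [4, 5]], max_pages=2, sleep=pauses.append)
print(demo_ids, pauses)
```

With tweepy, `page_iter` would be the `tweepy.Cursor(api.followers_ids, screen_name="thebeatles").pages()` iterator from the cell above; 60 seconds corresponds to 15 requests spread over 15 minutes.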


Now let us see the ids of users that we fetched.

In [9]:
print(ids[:10])

[756804564737748992, 756804457980162048, 3313103249, 756802089247014912, 756789422071373824, 756801874867712000, 756801759482253312, 755806401658748929, 756800218486624256, 756799624858316800]


Of course, user ids are not very informative by themselves. Let us also look at the screen name and the latest tweet of each user, if any. For that purpose, we can look up a user's profile, including the screen_name, from the user id (**get_user**).

In [10]:
u = api.get_user(ids[0])
print(u)

User(follow_request_sent=False, has_extended_profile=False, profile_use_background_image=True, _json={u'follow_request_sent': False, u'has_extended_profile': False, u'profile_use_background_image': True, u'profile_text_color': u'333333', u'default_profile_image': True, u'id': 756804564737748992, u'profile_background_image_url_https': None, u'verified': False, u'profile_location': None, u'profile_image_url_https': u'https://abs.twimg.com/sticky/default_profile_images/default_profile_3_normal.png', u'profile_sidebar_fill_color': u'DDEEF6', u'entities': {u'description': {u'urls': []}}, u'followers_count': 1, u'profile_sidebar_border_color': u'C0DEED', u'id_str': u'756804564737748992', u'profile_background_color': u'F5F8FA', u'listed_count': 0, u'is_translation_enabled': False, u'utc_offset': None, u'statuses_count': 0, u'description': u'', u'friends_count': 54, u'location': u'', u'profile_link_color': u'2B7BB9', u'profile_image_url': u'http://abs.twimg.com/sticky/default_profile_images/de

Finally, we use the user ids or screen names to fetch each user's latest tweet, if any. We use **user_timeline** to fetch the tweets of a given user.

In [11]:
for user_id in ids[:15]:
    u = api.get_user(user_id)
    print(u.screen_name)
    # count=1 asks for the most recent tweet only
    timeline = api.user_timeline(screen_name=u.screen_name, count=1)
    if len(timeline) > 0:
        outtweets = [timeline[0].id_str, timeline[0].created_at, timeline[0].text, timeline[0].lang]
        print(outtweets)

realy198866
baba_shroof
cyn_noe96


TweepError: Not authorized.
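
The loop above stops at the first user whose timeline cannot be read; a "Not authorized" error typically appears when an account's tweets are protected. One hedged way to keep going is to catch the error and skip that user. The sketch below uses a made-up fetch function and exception so that it runs without network access; `latest_tweets`, `fake_fetch` and `NotAuthorized` are all hypothetical names.

```python
def latest_tweets(user_ids, fetch_latest, error_type=Exception):
    """Return ({user_id: tweet}, [skipped ids]), skipping users whose fetch fails."""
    results, skipped = {}, []
    for user_id in user_ids:
        try:
            tweet = fetch_latest(user_id)
        except error_type:
            skipped.append(user_id)  # e.g. a protected account
            continue
        if tweet is not None:  # None means the user has no tweets
            results[user_id] = tweet
    return results, skipped


class NotAuthorized(Exception):
    """Stand-in for the error raised when a timeline cannot be read."""


def fake_fetch(user_id):
    # Made-up behaviour: user 2 is protected, user 3 has never tweeted.
    if user_id == 2:
        raise NotAuthorized("Not authorized.")
    return "hello" if user_id == 1 else None


results, skipped = latest_tweets([1, 2, 3], fake_fetch, error_type=NotAuthorized)
print(results, skipped)
```

With the tweepy version used here, `error_type` would presumably be `tweepy.TweepError`, the error shown in the output above, and `fetch_latest` would wrap the **user_timeline** call.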