# Network Science


**GOAL OF THE SESSION**: Fetch data from Twitter APIs

**DATA SOURCE**: Twitter 

**DEVELOPMENT**: How to create a Python script to query Twitter APIs

**REQUIREMENTS**: 

    Twitter Developer Account
    tweepy 
    Python Pretty Print

### Pretty Print

Using **pprint** we can format in a nice way the print output. Here an example:

    import pprint
    pp = pprint.PrettyPrinter(indent=2)
    pp.pprint(OBJECT-TO-PRINT)


In [1]:
import pprint
#print help(pprint.PrettyPrinter)
pp = pprint.PrettyPrinter(indent=2)
names = {"name":"Alex", "Surname":"Comu", "list":[1,2,3],"address":{"street":"Via Maria Vittoria", "number":1}}
print "NON PRETTY\n", names, "\n"
print "PRETTY:"
pp.pprint(names)

NON PRETTY
{'list': [1, 2, 3], 'Surname': 'Comu', 'name': 'Alex', 'address': {'street': 'Via Maria Vittoria', 'number': 1}} 

PRETTY:
{ 'Surname': 'Comu',
  'address': { 'number': 1, 'street': 'Via Maria Vittoria'},
  'list': [1, 2, 3],
  'name': 'Alex'}


# Demo with Special Effects

Inside the folder **demo** you'll find a very cool demo, a super interaction between:

* Twitter API
* Python Web Server
* D3Js visualization

Read the file **Readme.md** to have more information about the example.

The goal of the demo is to create a Connector between my PC and Twitter. After the creation of this connector I want ro retrieve all the tweets that contains a specific **hashtags**.

At the end I'll represent the tweets in a dynamic data visualization with D3Js.

# Twitter Developer Account

Sign in @ [https://dev.twitter.com/](https://dev.twitter.com/) website and create an account if you need.

After the creation of the account we need to create a new Twitter APP to fetch the APIs, so go to [https://apps.twitter.com/](https://apps.twitter.com/) and create a new one.

To allow our APP to use the Twitter APIs we need to create an Access Token, so click on **Keys and Access Tokens** and create a new one.

And now we're ready to play with Twitter:)

## Twitter Documentation

[HERE](https://dev.twitter.com/overview/api) we can find a complete overview on Twitter API.

# Tweepy Installation

We need to install the package **tweepy**:

    pip install tweepy
    
We can find the documentation of the Library:

    http://tweepy.readthedocs.io/
    
## OAuth

First of all we need to save our credentials in variables. After that we can login on twitter and start use the APIs.


In [3]:
import tweepy
import pprint
pp = pprint.PrettyPrinter(indent=2)

CONSUMER_KEY = "yJJLtWLC1mFsCGqhHAGoOHvfx"
SECRET_KEY = "A8hqqyon6PhFX3KAIcHBN4VODdJXrp3FMLTeQvtHj2rQiStT7x"
ACCESS_TOKEN = "727548304633004032-JJTPdv7pFCBOGgGFj9eaYC9Lj7LEhYa"
SECRET_ACCESS_TOKEN = "LaG41nkJbkK0XifOFC6GLo3G7LTcFT7w3ZjH6qdTubCBV"

In [5]:
# Twitter Authentication
auth = tweepy.OAuthHandler(CONSUMER_KEY, SECRET_KEY)
auth.set_access_token(ACCESS_TOKEN, SECRET_ACCESS_TOKEN)

In [6]:
auth

<tweepy.auth.OAuthHandler at 0x7f45dec74690>

In [7]:
# Create the connection to the api
api = tweepy.API(auth)
print api


<tweepy.api.API object at 0x7f45deb48110>


In [None]:
help(api)

In [None]:
api.rate_limit_status() # quante chiamate posso fare per ogni api

## Tweet Stream

In [10]:
# download your home timeline tweets
my_tweets = api.home_timeline()

In [11]:
print "Tweets LEN: ", len(my_tweets), "\n" # sempre 20

Tweets LEN:  20 



In [13]:
my_tweets[0]._json

{u'contributors': None,
 u'coordinates': None,
 u'created_at': u'Fri Nov 18 14:04:47 +0000 2016',
 u'entities': {u'hashtags': [],
  u'media': [{u'display_url': u'pic.twitter.com/RMpwiZzuQi',
    u'expanded_url': u'http://twitter.com/nypl/status/799614331570647040/photo/1',
    u'id': 799614329125539840,
    u'id_str': u'799614329125539840',
    u'indices': [110, 133],
    u'media_url': u'http://pbs.twimg.com/media/CxjM_uzXEAA2Mv7.jpg',
    u'media_url_https': u'https://pbs.twimg.com/media/CxjM_uzXEAA2Mv7.jpg',
    u'sizes': {u'large': {u'h': 489, u'resize': u'fit', u'w': 760},
     u'medium': {u'h': 489, u'resize': u'fit', u'w': 760},
     u'small': {u'h': 438, u'resize': u'fit', u'w': 680},
     u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}},
    u'type': u'photo',
    u'url': u'https://t.co/RMpwiZzuQi'}],
  u'symbols': [],
  u'urls': [{u'display_url': u'on.nypl.org/2f5lhHx',
    u'expanded_url': u'http://on.nypl.org/2f5lhHx',
    u'indices': [86, 109],
    u'url': u'https://t.

In [12]:
# Dir Command on Tweet
print "TWEET DIR: ", dir(my_tweets[0]), "\n"
print help(my_tweets[0])

TWEET DIR:  ['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__getattribute__', '__getstate__', '__hash__', '__init__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_api', '_json', 'author', 'contributors', 'coordinates', 'created_at', 'destroy', 'entities', 'extended_entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'id', 'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'parse', 'parse_list', 'place', 'possibly_sensitive', 'possibly_sensitive_appealable', 'retweet', 'retweet_count', 'retweeted', 'retweets', 'source', 'source_url', 'text', 'truncated', 'user'] 

Help on Status in module tweepy.models object:

class Status(Model)
 |  Method resolution order:
 |      Status
 |      Model
 |      __builtin__.object
 |  
 |  Me

In [17]:
# USER of first Tweet
# print my_tweets[0].user
print my_tweets[0].user.screen_name

nypl


In [None]:
# First 3 tweets
for index, tw in enumerate(my_tweets):
    if index < 3:
        print tw.text, "\n"

## My Followers

In [18]:
## fetch follewer lists
my_followers = api.followers()
print "My_Followers LEN: ", len(my_followers)

My_Followers LEN:  2


In [20]:
my_followers_ids = api.followers_ids()
print "My_Followers IDS: ", my_followers_ids

My_Followers IDS:  [2576462520, 19590445]


In [None]:
my_followers[0]

In [None]:
print dir(my_followers[0])

In [None]:
print help(my_followers[0])

In [None]:
pp.pprint(my_followers[0]._json)

# Get External User

In [None]:
intesa = api.get_user("intesasanpaolo") # cerca un utente
intesa

In [171]:
help(intesa)

Help on User in module tweepy.models object:

class User(Model)
 |  Method resolution order:
 |      User
 |      Model
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  follow(self)
 |  
 |  followers(self, **kargs)
 |  
 |  followers_ids(self, *args, **kargs)
 |  
 |  friends(self, **kargs)
 |  
 |  lists(self, *args, **kargs)
 |  
 |  lists_memberships(self, *args, **kargs)
 |  
 |  lists_subscriptions(self, *args, **kargs)
 |  
 |  timeline(self, **kargs)
 |  
 |  unfollow(self)
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  parse(cls, api, json) from __builtin__.type
 |  
 |  parse_list(cls, api, json_list) from __builtin__.type
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Model:
 |  
 |  __getstate__(self)
 |  
 |  __init__(self, api=None)
 |  
 |  __repr__(self)
 |  
 |  ----------------------------------------------------------

In [23]:
print intesa.followers_count

4240


In [24]:
friends = api.friends_ids('intesasanpaolo')
print len(friends)

179


In [25]:
intesa.friends_count

179

In [27]:
likes = api.favorites('intesasanpaolo')
print len(likes)

20


In [26]:
likes[0]

Status(contributors=None, truncated=False, text=u'Nasce oggi PowerU Digital!  Grazie a @intesasanpaolo e @DeloitteItalia per credere con noi nel progetto @HumanAgeInsIT @ManpowerGroupIT', is_quote_status=False, in_reply_to_status_id=None, id=798107981736906753, favorite_count=4, _api=<tweepy.api.API object at 0x102cdf190>, author=User(follow_request_sent=False, has_extended_profile=False, profile_use_background_image=False, _json={u'follow_request_sent': False, u'has_extended_profile': False, u'profile_use_background_image': False, u'default_profile_image': False, u'id': 1166565846, u'profile_background_image_url_https': u'https://abs.twimg.com/images/themes/theme1/bg.png', u'verified': False, u'translator_type': u'none', u'profile_text_color': u'333333', u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/761270969730301952/X8-H-Adj_normal.jpg', u'profile_sidebar_fill_color': u'DDEEF6', u'entities': {u'url': {u'urls': [{u'url': u'https://t.co/HIf5VJBnyO', u'indices': [0

In [None]:
intesa_followers_count =  intesa.followers_ids()
print len(intesa_followers_count)

In [None]:
print intesa_followers_count[0]

In [None]:
api.get_user(intesa_followers_count[0])

# Cursor

In [None]:
print len(intesa_followers_count)

In [33]:
intesa_followers = intesa.followers()
print len(intesa_followers)

20


In [None]:
help(tweepy.Cursor)

In [34]:
intesa_cursor = tweepy.Cursor(api.followers, screen_name='intesasanpaolo')

In [None]:
print dir(intesa_cursor)

In [None]:
print intesa_cursor.items() # generatore, restituisce un solo elem per chiamata, con .next() scorre i risultati

In [None]:
for follower in intesa_cursor.items():
    print follower

In [35]:
intesa_cursor.pages().next() # lista di 20 elementi

[User(follow_request_sent=False, has_extended_profile=False, profile_use_background_image=True, profile_sidebar_fill_color=u'DDEEF6', live_following=False, time_zone=None, id=799610634899980291, description=u'', _api=<tweepy.api.API object at 0x7f45deb48110>, verified=False, blocked_by=False, profile_text_color=u'333333', muting=False, profile_image_url_https=u'https://abs.twimg.com/sticky/default_profile_images/default_profile_3_normal.png', _json={u'follow_request_sent': False, u'has_extended_profile': False, u'profile_use_background_image': True, u'live_following': False, u'default_profile_image': True, u'id': 799610634899980291, u'profile_background_image_url_https': None, u'translator_type': u'none', u'verified': False, u'blocked_by': False, u'profile_text_color': u'333333', u'muting': False, u'profile_image_url_https': u'https://abs.twimg.com/sticky/default_profile_images/default_profile_3_normal.png', u'profile_sidebar_fill_color': u'DDEEF6', u'entities': {u'description': {u'url

In [36]:
# in realtà get_user sulla lista di ids permette di fare piu' chiamate prima dell'interruzione di 15 min. 
mylist = []
for follower in intesa_cursor.pages(): 
    print mylist.extend(follower) # appende la lista a quelli gia' estratti
    break

None


In [37]:
len(mylist)

20

In [38]:
mylist[0]._json

{u'blocked_by': False,
 u'blocking': False,
 u'contributors_enabled': False,
 u'created_at': u'Wed May 05 21:47:13 +0000 2010',
 u'default_profile': False,
 u'default_profile_image': False,
 u'description': u'The official channel for Cisco Financial Services Industry news, updates and events.',
 u'entities': {u'description': {u'urls': []},
  u'url': {u'urls': [{u'display_url': u'cisco.com/web/strategy/f\u2026',
     u'expanded_url': u'http://cisco.com/web/strategy/financial/index.html',
     u'indices': [0, 22],
     u'url': u'http://t.co/3rQdG7xBjY'}]}},
 u'favourites_count': 690,
 u'follow_request_sent': False,
 u'followers_count': 1557,
 u'following': False,
 u'friends_count': 780,
 u'geo_enabled': True,
 u'has_extended_profile': False,
 u'id': 140578968,
 u'id_str': u'140578968',
 u'is_translation_enabled': False,
 u'is_translator': False,
 u'lang': u'en',
 u'listed_count': 164,
 u'live_following': False,
 u'location': u'Global via Cisco TelePresence',
 u'muting': False,
 u'name': 

In [39]:
for i, f in enumerate(mylist):
    print i, f.statuses_count

0 3567
1 1029
2 523
3 6161
4 100
5 0
6 32
7 0
8 303
9 978
10 290
11 0
12 909
13 3
14 5071
15 42
16 3045
17 235
18 231
19 6900


In [45]:
# funzione che gestisce l'interruzione di 15 min al raggiungimento del limit rate
import time
def limit_handler(cursor):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            print "15 sec Sleep..."
            time.sleep(15)
            print "Restart"

In [46]:
intesa_followers = []

for page in limit_handler(intesa_cursor.pages()):
    intesa_followers.extend(page)

15 sec Sleep...
Restart
15 sec Sleep...


KeyboardInterrupt: 

# Get Hashtags


In [None]:
tweets = []
for tweet in tweepy.Cursor(api.search, q='#trump').items(5):
    print tweet.text
    tweets.append(tweet)
print "\n-----\n"
print tweets[0]

# Avoid Rate Limit Exception

In [None]:
import time
def limit_handler(cursor):
    while True:
        try:
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Timeout Reached, I'm going to sleep for 15 Minutes"
            time.sleep(15*60)
            print "I'm going to try again!"

In [None]:
alexcomu_cursor = tweepy.Cursor(api.followers, screen_name='comualex')

alexcomu_followers = []
for followers in limit_handler(alexcomu_cursor.pages()):
    alexcomu_followers.extend(followers)
    

In [None]:
len(alexcomu_followers)

# Live Streaming

Check the complete example on the folder **esercitazione**.

In [47]:
class BDStreamingListener(tweepy.StreamListener):
    def __init__(self, count):
        super(BDStreamingListener, self).__init__()
        # Number of tweets we want to retrieve
        self.count = count

    def on_status(self, status):
        # automatic called when a new tweet is received
        # print dir(status)
        print dict(user=status.user.screen_name, text=status.text)

        self.count -= 1
        if self.count <= 0:
            return False

    def on_error(self, status_code):
        # automatic called when an error occures
        print "Error with status code: ", status_code
        return False

In [48]:
# Create an instand set the number of tweets we want ro retrieve
listener = BDStreamingListener(50)

# Create the stream fetching object with auth and listener
stream = tweepy.streaming.Stream(auth, listener)

# Tun the stream using filter
stream.filter(track=['#Trump'])


{'text': u'RT @ROCKONDUDE2: https://t.co/SLf3deENPy @HillaryClinton &amp; her thugs, may of gotten away with rigging the primaries, but not the gen\u2026 ', 'user': u's_chelf'}
{'text': u"RT @RealDLHughley: #KanyeWest says he didn't vote but if he did he woulda voted for #Trump, ain't this the cat that thought #GeorgeBush did\u2026", 'user': u'GarrieF'}
{'text': u'RT @JoanneFralin: #MAGA #Trump #PresidentElectTrump https://t.co/HLmAjl2UMs', 'user': u'woitekj'}
{'text': u'RT @Ryan9Caldwell: What happened to all the good shows? #WINan #tentoeschallenge #tentoesdownchallenge #trump #clinton @famouslos32 https:/\u2026', 'user': u'Randomshit48'}
{'text': u'RT @tagesschau: Wichtige Posten besetzt: Team Trump nimmt Form an https://t.co/2wbKBkAhGj #Trump', 'user': u'wehrs18'}
{'text': u'Could Trump be the catalyst for an all-American iPhone? | ZDNet #Trump  https://t.co/iz3ssWVzXz', 'user': u'danielhalseth'}
{'text': u'RT @ScottAdamsSays: When the press says no one expected Trump to win, they 

# Get INTESA Followers -- Version 1

In [169]:
# Ask for Followers using Cursor (20 followers per page, with a limit of 15 requests each 15 minutes) ~ 3 Hours
class IntesaFollowers(object):
    
    def __init__(self, auth):
        self.auth = auth
        self.api = tweepy.API(self.auth)
        self.intesa_cursor = tweepy.Cursor(self.api.followers, screen_name='intesasanpaolo')

    def get_followers(self):
        while True:
            try:
                yield self.intesa_cursor.pages().next()
            except tweepy.RateLimitError:
                print "[LOG %s] Timeout reached.. I'm going to sleep for 15 minutes.." % dt.now()
                time.sleep(15*60)
                print "[LOG %s] Try Again!" % dt.now()
            except Exception as e:
                # Generic Exception
                print "[LOG %s] Generic error " % dt.now(), e
                print "[LOG %s] Wait 60 seconds..." % dt.now()
                time.sleep(60)

In [None]:
intesa = IntesaFollowers(auth)
intesa_followers = []
for followers in intesa.get_followers():
    intesa_followers.extend(followers)

# Get INTESA Followers -- Version 2 (Faster)

In [49]:
# Ask for Followers_ids and ask data for each user -> Much Much Faster!  ~ 1.5 Hours

import time


class IntesaFollowers(object):

    def __init__(self, auth):
        self.auth = auth
        self.api = tweepy.API(self.auth)
        self.intesa = self.api.get_user('intesasanpaolo')

    def get_followers(self):
        for follower_id in self.intesa.followers_ids():
            try:
                yield self.api.get_user(follower_id)
            except tweepy.RateLimitError:
                print "[LOG %s] Timeout reached.. I'm going to sleep for 15 minutes.." % dt.now()
                time.sleep(15*60)
                print "[LOG %s] Try Again!" % dt.now() # datetime stampa un log
            except Exception as e:
                # Generic Exception
                print "[LOG %s] Generic error " % dt.now(), e
                print "[LOG %s] Wait 60 seconds..." % dt.now()
                time.sleep(60)

In [51]:
intesa = IntesaFollowers(auth)
intesa_followers = []
counter = 0
for follower in intesa.get_followers():
    print "Working...", counter
    counter +=1    
    intesa_followers.append(follower)

Working... 0
Working... 1
Working... 2
Working... 3
Working... 4
Working... 5
Working... 6
Working... 7
Working... 8
Working... 9
Working... 10
Working... 11
Working... 12
Working... 13
Working... 14
Working... 15
Working... 16
Working... 17
Working... 18
Working... 19
Working... 20
Working... 21
Working... 22
Working... 23
Working... 24
Working... 25
Working... 26
Working... 27
Working... 28
Working... 29
Working... 30
Working... 31
Working... 32
Working... 33
Working... 34
Working... 35
Working... 36
Working... 37
Working... 38
Working... 39
Working... 40
Working... 41
Working... 42
Working... 43
Working... 44
Working... 45
Working... 46
Working... 47
Working... 48
Working... 49
Working... 50
Working... 51
Working... 52
Working... 53
Working... 54
Working... 55
Working... 56
Working... 57
Working... 58
Working... 59
Working... 60
Working... 61
Working... 62
Working... 63
Working... 64
Working... 65
Working... 66
Working... 67
Working... 68
Working... 69
Working... 70
Working... 71
Wo

NameError: global name 'dt' is not defined

# Get INTESA tweets

In [7]:
class IntesaTweets(object):
    
    def __init__(self, auth):
        self.auth = auth
        self.api = tweepy.API(self.auth)
        self.intesa_cursor = tweepy.Cursor(self.api.user_timeline, screen_name='intesasanpaolo')

    def get_tweets(self):
        while True:
            try:
                yield self.intesa_cursor.pages().next()
            except tweepy.RateLimitError:
                print "[LOG %s] Timeout reached.. I'm going to sleep for 15 minutes.." % dt.now()
                time.sleep(15*60)
                print "[LOG %s] Try Again!" % dt.now()
            except Exception as e:
                # Generic Exception
                print "[LOG %s] Generic error " % dt.now(), e
                print "[LOG %s] Wait 60 seconds..." % dt.now()
                time.sleep(60)


In [8]:
intesa_timeline = IntesaTweets(auth)
intesa_tweets = []
for tweet in intesa_timeline.get_tweets():
    pp.pprint(tweet[0]._json)
    break

{ u'contributors': None,
  u'coordinates': None,
  u'created_at': u'Mon Nov 14 09:25:09 +0000 2016',
  u'entities': { u'hashtags': [ { u'indices': [0, 13],
                                  u'text': u'FlashMercati'}],
                 u'symbols': [],
                 u'urls': [ { u'display_url': u'twitter.com/i/web/status/7\u2026',
                              u'expanded_url': u'https://twitter.com/i/web/status/798094406406590464',
                              u'indices': [116, 139],
                              u'url': u'https://t.co/AOHmJBzUDk'}],
                 u'user_mentions': []},
  u'favorite_count': 0,
  u'favorited': False,
  u'geo': None,
  u'id': 798094406406590464,
  u'id_str': u'798094406406590464',
  u'in_reply_to_screen_name': None,
  u'in_reply_to_status_id': None,
  u'in_reply_to_status_id_str': None,
  u'in_reply_to_user_id': None,
  u'in_reply_to_user_id_str': None,
  u'is_quote_status': False,
  u'lang': u'it',
  u'place': None,
  u'possibly_sensitive': False,


# GET Intesa Favorites

# GET Intesa Friends

# GET Intesa Data