# 1.0 Introduction



***Copyright:*** *Parts of the contents of this Colab Notebook, unless otherwise indicated, are Copyright 2020 Filippo Menczer, Santo Fortunato and Clayton A. Davis, [A First Course in Network Science](https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience). All rights reserved.* 

***References***: getting started with the Twitter API v2 for academic research [here](https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research). 


# 2.0 Authenticating with Twitter's API



Twitter uses OAuth in order to allow third-party apps to access data on your behalf without requiring your Twitter login credentials -- note that none of the code in this notebook asks for your Twitter screen name or password.

The OAuth "dance" can be intimidating when you first use it, but it provides a far more secure way for software to make requests on your behalf than providing your username and password.

We'll make use of the
[Twython](https://twython.readthedocs.io/en/latest/usage/starting_out.html#authentication)
package to help us with authentication and querying Twitter's APIs.

In [None]:
!pip install Twython

Collecting Twython
  Downloading twython-3.9.1-py3-none-any.whl (33 kB)
Installing collected packages: Twython
Successfully installed Twython-3.9.1


In [None]:
from twython import Twython, TwythonError


## 2.1 Enter app info and get auth URL



To authenticate with Twitter, we'll provide the app details and ask for a one-time authorization URL that authenticates your user with this app.

Copy and paste the API key and secret from your Twitter app into a file named <font color="red">keys.txt</font>. The first line holds the API key (`API_KEY`) and the second line holds the API secret key (`API_SECRET_KEY`). For example, a template for <font color="red">keys.txt</font>:

```text
df6cf09894907b92f3ea749ef
d19c40cbb184f72055c806f107b5158d023a43eb7d8921a0d0
```

In [None]:
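# Read the API key and secret from keys.txt (line 1: key, line 2: secret).
# The "with" block closes the file automatically, even if reading fails.
with open("keys.txt", "r") as my_file:
    # splitlines() tolerates a trailing newline; strip() drops stray whitespace
    lines = [line.strip() for line in my_file.read().splitlines() if line.strip()]

# Only the first two non-empty lines are used
API_KEY, API_SECRET_KEY = lines[:2]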

Executing the next cell prints a clickable URL. This link is unique and works **exactly** once. <font color="red"> Visit this URL, log into Twitter, and then copy the verifier PIN that is given to you so that you can paste it in the next step</font>.

In [None]:
twitter = Twython(API_KEY, API_SECRET_KEY)

authentication_tokens = twitter.get_authentication_tokens()
print(authentication_tokens['auth_url'])

https://api.twitter.com/oauth/authenticate?oauth_token=GVT4jAAAAAABZKTtAAABfv15g_4


## 2.2 Authorize app using verifier PIN



That verifier PIN goes into the next cell. This will be different every time you run these steps. The `authentication_tokens` include temporary tokens that go with this verifier PIN; by submitting these together, we show Twitter that we are who we say we are.

In [None]:
# Replace the verifier with the PIN obtained in your web browser in the previous step
VERIFIER = '2887399'

twitter = Twython(API_KEY, API_SECRET_KEY,
                  authentication_tokens['oauth_token'],
                  authentication_tokens['oauth_token_secret'])

authorized_tokens = twitter.get_authorized_tokens(VERIFIER)


## 2.3 Use authorized tokens



Now we have a permanent token pair that we can use to make authenticated calls to the Twitter API. We'll create a new Twython object using these authenticated keys and verify the credentials of the logged-in user.

In [None]:
twitter = Twython(API_KEY, API_SECRET_KEY,
                  authorized_tokens['oauth_token'],
                  authorized_tokens['oauth_token_secret'])

twitter.verify_credentials()

{'contributors_enabled': False,
 'created_at': 'Fri Mar 23 20:52:47 +0000 2012',
 'default_profile': False,
 'default_profile_image': False,
 'description': '...',
 'entities': {'description': {'urls': []}},
 'favourites_count': 2,
 'follow_request_sent': False,
 'followers_count': 4,
 'following': False,
 'friends_count': 32,
 'geo_enabled': False,
 'has_extended_profile': True,
 'id': 534709384,
 'id_str': '534709384',
 'is_translation_enabled': False,
 'is_translator': False,
 'lang': None,
 'listed_count': 0,
 'location': 'Natal - RN',
 'name': 'Kaio Henrique',
 'needs_phone_verification': False,
 'notifications': False,
 'profile_background_color': 'C0DEED',
 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
 'profile_background_tile': True,
 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/534709384/1474312724',
 'profile_image_url': 'http://pb

If the previous cell ran without error and printed out a dict corresponding to a
[Twitter User](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object),
then you're good. The authorized token pair is like a username/password and should be protected as such.
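
Because the token pair works like a password, one option is to keep it out of the notebook and store it in a file that only your user can read. The following is a minimal sketch under stated assumptions: the file name `tokens.json` and the helpers `save_tokens` and `load_tokens` are hypothetical and are not part of Twython.

```python
import json
import os


def save_tokens(tokens, path="tokens.json"):
    """Write a token dict to `path` as JSON (hypothetical helper)."""
    # Mode 0o600 (owner read/write only) applies when the file is newly created;
    # an existing file keeps whatever permissions it already has.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f)


def load_tokens(path="tokens.json"):
    """Read the token dict back from `path` (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)
```

In this notebook, the call would look like `save_tokens(authorized_tokens)`, and a later session could rebuild the client from `load_tokens()` instead of repeating the PIN step. Keep such a file out of version control.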


# 3.0 Twitter retweet network



One fundamental interaction in the Twitter ecosystem is the "retweet" -- rebroadcasting another user's tweet to your followers. A tweet object returned by the API is a retweet if it includes a `'retweeted_status'`. We're going to fetch tweets matching a hashtag and create a retweet network of the conversation.
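
As a small illustration of that rule, the check below runs on two hypothetical, heavily trimmed tweet dicts. Real tweet objects carry many more fields; the `is_retweet` helper and the screen names are made up for this sketch.

```python
def is_retweet(tweet):
    """Return True if the tweet dict carries a 'retweeted_status' entry."""
    return 'retweeted_status' in tweet


# Hypothetical, trimmed tweet objects for illustration only
original = {'id': 1, 'user': {'screen_name': 'alice'}, 'text': 'Kickoff! #NFL'}
retweet = {
    'id': 2,
    'user': {'screen_name': 'bob'},
    'retweeted_status': original,  # the tweet being rebroadcast
}

print(is_retweet(original))  # False
print(is_retweet(retweet))   # True

# The retweeted user is found inside 'retweeted_status'
print(retweet['retweeted_status']['user']['screen_name'])  # alice
```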


## 3.1 Create DiGraph



Each tweet in this list of retweets represents an edge in our network. <font color='red'>We're going to draw these edges in the direction of information flow</font>: from the retweeted user to the retweeter, the user doing the retweeting. Since a user can retweet another user more than once, we want this graph to be weighted, with the number of retweets as the weight.

The edge addition logic here is to increase the edge weight by 1 if the edge exists, or else create the edge with weight 1 if it does not exist.

When writing code such as this that refers multiple times to the same directed edge, make sure to be consistent with the edge direction.
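
The increment-or-create rule can be illustrated without NetworkX by counting `(retweeted, retweeter)` pairs. This is only a sketch with made-up screen names, not the notebook's graph-building code.

```python
from collections import Counter

# Hypothetical (retweeted_user, retweeter) pairs, one per retweet
retweet_pairs = [
    ('alice', 'bob'),
    ('alice', 'bob'),    # bob retweets alice a second time
    ('alice', 'carol'),
    ('bob', 'alice'),    # opposite direction: a different directed edge
]

# A Counter treats missing keys as 0, so "+= 1" covers both cases:
# it creates the edge with weight 1 or increases an existing weight by 1.
edge_weights = Counter()
for retweeted, retweeter in retweet_pairs:
    edge_weights[(retweeted, retweeter)] += 1

for (source, target), weight in edge_weights.items():
    print(f"{source} -> {target}: weight {weight}")
```

Keeping the tuple order fixed as `(retweeted, retweeter)` everywhere is what keeps the edge direction consistent; swapping the order in one place would silently create edges in the opposite direction.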

In [None]:
import datetime
import pandas as pd
import networkx as nx
import time

D = nx.DiGraph()
dict_ = {'id_retweet': [], 'retweeted_screen_name': [], 'retweeted_location': [], 'retweeter_screen_name': [], 'retweeter_location' : []}

# Getting today's date
datestamp = datetime.datetime.now().strftime("%Y-%m-%d")


In [None]:
import itertools

NUM_TWEETS_TO_FETCH = 15000

cursor = twitter.cursor(twitter.search, q='#NFL', count=100, result_type='mixed')
search_tweets = []
#search_tweets = list(itertools.islice(cursor, NUM_TWEETS_TO_FETCH))
#len(search_tweets)

In [None]:
for ii in range(2): # loop to collect data every 16 minutes
    search_tweets.extend(list(itertools.islice(cursor, NUM_TWEETS_TO_FETCH)))
    time.sleep(16 * 60)

print(len(search_tweets))

60000


In [None]:
retweets = []
for tweet in search_tweets:
    if 'retweeted_status' in tweet:
        retweets.append(tweet)
print("filter ", len(retweets))

filter  28585


In [None]:
for retweet in retweets:
    retweeted_status = retweet['retweeted_status']

    retweeted_sn = retweeted_status['user']['screen_name']
    retweetedL = retweeted_status['user']['location']
    retweeter_sn = retweet['user']['screen_name']
    retweeterL = retweet['user']['location']

    dict_['id_retweet'].append(retweet["id"])
    dict_['retweeted_screen_name'].append(retweeted_sn)
    dict_['retweeted_location'].append(retweetedL)
    dict_['retweeter_screen_name'].append(retweeter_sn)
    dict_['retweeter_location'].append(retweeterL)
  
    # Edge direction: retweeted_sn -> retweeter_sn
    if D.has_edge(retweeted_sn, retweeter_sn):
        D.edges[retweeted_sn, retweeter_sn]['weight'] += 1
    else:
        D.add_edge(retweeted_sn, retweeter_sn, weight=1)
    

In [None]:
# Create a DataFrame of the #NFL retweets
df = pd.DataFrame(dict_)

df.head()

| | id_retweet | retweeted_screen_name | retweeted_location | retweeter_screen_name | retweeter_location |
|---|---|---|---|---|---|
| 0 | 1493582820241031170 | MackTightRadio | Worldwide | SapphireSteamy | 🙏🏽 LEVITATED 🙏🏽 |
| 1 | 1493582749919236100 | therealBeede | Orlando, FL | _0wayz | |
| 2 | 1493582683070472199 | DavidMTodd | | jaduke77 | Pittsburgh, PA |
| 3 | 1493582582298341382 | crypto_prince2 | Las Vegas, NV | AkmazRasim | |
| 4 | 1493582582264516610 | NFLBrasil | Brasil | IgorBSilva81 | Valinhos, Brasil |


In [None]:
# Save the data to CSV
df.to_csv("NFL-"+datestamp+".csv")

In [None]:
# Save the network as GraphML
nx.write_graphml(D, "NFL-"+datestamp+".graphml")

In [None]:
# Check the number of remaining requests
twitter.get_application_rate_limit_status()['resources']['search']

{'/search/tweets': {'limit': 180, 'remaining': 180, 'reset': 1644944119}}
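
The `reset` field in this output is a Unix timestamp in seconds, so it can be used to work out how long to wait before the search window refreshes. The sketch below reuses the dict shape shown above; the helper names `seconds_until_reset` and `can_search` are hypothetical.

```python
import time

# Same shape as the rate-limit output above
search_status = {'/search/tweets': {'limit': 180, 'remaining': 180, 'reset': 1644944119}}


def seconds_until_reset(status, now=None):
    """Seconds left until the search window resets (0 if it already has)."""
    if now is None:
        now = time.time()
    reset_epoch = status['/search/tweets']['reset']  # Unix timestamp in seconds
    return max(0, reset_epoch - now)


def can_search(status):
    """True while at least one request remains in the current window."""
    return status['/search/tweets']['remaining'] > 0


# One minute before the reset time, 60 seconds remain
print(seconds_until_reset(search_status, now=1644944119 - 60))  # 60
print(can_search(search_status))  # True
```

A collection loop could sleep for this computed interval rather than a fixed 16 minutes, although the fixed wait is simpler and leaves a safety margin.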

## 3.2 Analyze graph



Now that we have this graph, let's ask some questions about it.



### 3.2.1 Most retweeted user



Since the edges are in the direction of information flow, out-degree gives us the number of other users retweeting a given user. We can get the user with highest out-degree using the built-in `max` function:

In [None]:
max(D.nodes, key=D.out_degree)

'Brother_nfts'

We can get more context and information from the "top N" users:

In [None]:
from operator import itemgetter

sorted(D.out_degree(), key=itemgetter(1), reverse=True)[:5]

[('Brother_nfts', 1543),
 ('Endzone_Brasil', 1173),
 ('1218Sports', 519),
 ('jollenelevid', 488),
 ('OddsCheckerUS', 482)]

In [None]:
D.out_degree()

OutDegreeView({'CarlaZambelli38': 105, 'CarlosBolzan2': 0, 'RFransceschi': 0, 'JosCarl78233530': 0, 'lssposito': 0, 'ercio_santoss': 0, 'BrunoCr62058963': 0, 'cleide_ita': 0, 'JFH84343564': 0, 'rblondt': 0, 'LeiHigor': 0, 'ovasco71': 0, 'Eduardoegg2': 0, 'sissa155': 0, 'Claudinho_oa': 0, 'HackAlberto': 0, 'danilovsouza1': 0, 'EldriEldri': 0, 'MachadoPrudente': 0, 'belluccis': 0, 'MarcosDiaslogan': 0, 'PedroFe33848000': 0, 'NovaFriburgoRJ': 0, 'Lou_novak': 0, 'ReinaldoLuizCa2': 0, 'maceno_sueli': 0, 'Docilda1': 0, 'soniaalmeidafe': 0, 'JooBati47318744': 0, 'regisrpop': 0, 'PrRobsonAlencar': 0, 'Marcos08905454': 0, 'Washing41753473': 0, 'JuNascimentoGyn': 0, 'EzioDiasdoNasc1': 0, 'RobsonWiller3': 0, 'BrasilPtriaAma3': 0, 'almagnolima': 0, 'soniaTangari': 0, 'UiraitanReis': 0, 'Paiakkan': 0, 'angoneto': 0, 'Salvado89779435': 0, 'dudu_santana05': 0, 'Raimund39337518': 0, 'ClaudomiroSil18': 0, 'Renan1debora': 0, 'Arlindo71123942': 0, 'NellsBhor': 0, 'RicardoLipex': 0, 'WOLF_Lorn': 0, 'Richa

In the `sorted` call above, we take advantage of the fact that iterating over `D.out_degree()` yields `(name, degree)` 2-tuples; specifying `key=itemgetter(1)` tells the `sorted` function to sort these 2-tuples by their value at index 1. Giving `reverse=True` tells the `sorted` function that we want this in descending order, and the `[:5]` at the end slices the first 5 items from the resulting list.
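
The same sorting pattern can be tried on a plain list of pairs. The names and degrees below are hypothetical stand-ins for what the graph's degree view yields.

```python
from operator import itemgetter

# Hypothetical (name, degree) pairs standing in for D.out_degree()
degrees = [('alice', 3), ('bob', 7), ('carol', 1), ('dave', 5)]

# Sort by the value at index 1 (the degree), largest first, and keep the top 2
top_two = sorted(degrees, key=itemgetter(1), reverse=True)[:2]
print(top_two)  # [('bob', 7), ('dave', 5)]

# itemgetter(1) behaves like this lambda
same = sorted(degrees, key=lambda pair: pair[1], reverse=True)[:2]
print(same == top_two)  # True
```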

However, this is a weighted graph! By default, `out_degree()` ignores the edge weights. We can get out-strength by telling the `out_degree()` function to take into account the edge weight:

In [None]:
sorted(D.out_degree(weight='weight'), key=itemgetter(1), reverse=True)[:5]

[('Endzone_Brasil', 3305),
 ('Brother_nfts', 1673),
 ('1218Sports', 1013),
 ('nflextra', 843),
 ('jollenelevid', 765)]

In some cases these two results will be the same, namely if none of these users has been retweeted multiple times by the same user. Depending on your use case, you may or may not wish to take the weights into account.
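
To make the difference concrete, the sketch below computes both quantities by hand from a hypothetical set of weighted edges. It does not use NetworkX, and the screen names are made up.

```python
# Hypothetical weighted edges: (retweeted, retweeter) -> number of retweets
edge_weights = {
    ('alice', 'bob'): 3,    # bob retweeted alice three times
    ('alice', 'carol'): 1,
    ('dave', 'bob'): 1,
    ('dave', 'carol'): 1,
    ('dave', 'erin'): 1,
}

out_degree = {}    # number of distinct users retweeting each user
out_strength = {}  # total number of retweets each user received
for (source, _target), weight in edge_weights.items():
    out_degree[source] = out_degree.get(source, 0) + 1
    out_strength[source] = out_strength.get(source, 0) + weight

print(out_degree)    # {'alice': 2, 'dave': 3}
print(out_strength)  # {'alice': 4, 'dave': 3}
```

Here `dave` ranks first by out-degree while `alice` ranks first by out-strength, because one user retweeted `alice` repeatedly.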



### 3.2.2 Anomaly detection



One type of social media manipulation involves accounts that create very little original content, instead "spamming" retweets of any and all content in a particular conversation. Can we detect any users doing significantly more retweeting than others? Because edges point from the retweeted user to the retweeter, a user's in-strength (weighted in-degree) counts the retweets that user made. Let's look at the top N retweeters:

In [None]:
sorted(D.in_degree(weight='weight'), key=itemgetter(1), reverse=True)[:5]

[('theffrobot', 153),
 ('topfanscorner', 143),
 ('nflttbr', 111),
 ('touchdownbot', 95),
 ('iglen31', 47)]
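
One simple way to turn such a ranking into a flag is to compare each user's retweet count with a typical value. The sketch below is an assumption-laden heuristic rather than an established detection method: the counts are hypothetical, and both the helper `flag_heavy_retweeters` and the factor of 10 are arbitrary choices for illustration.

```python
from statistics import median

# Hypothetical in-strength values: user -> number of retweets made
retweets_made = {
    'bot_a': 150,
    'user_1': 2,
    'user_2': 1,
    'user_3': 3,
    'user_4': 1,
    'user_5': 2,
    'user_6': 1,
}


def flag_heavy_retweeters(counts, factor=10):
    """Return users whose count exceeds `factor` times the median count."""
    typical = median(counts.values())
    return sorted(user for user, n in counts.items() if n > factor * typical)


print(flag_heavy_retweeters(retweets_made))  # ['bot_a']
```

The median is used instead of the mean so that a few extreme accounts do not raise the threshold they are being compared against. A flagged account still needs manual inspection before drawing any conclusion.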

### 3.2.3 Connectivity



We can ask if the tweets obtained by the search represent one large conversation or many small conversations; broadly speaking, each weakly-connected component represents a conversation.

In [None]:
nx.is_weakly_connected(D)

False

In [None]:
nx.number_weakly_connected_components(D)

2453
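
To show what "weakly connected" means, the sketch below finds components by ignoring edge direction and running a breadth-first search. It is a plain-Python illustration on hypothetical edges, not how NetworkX implements it internally; the helper name `weak_components` is made up.

```python
from collections import defaultdict, deque


def weak_components(edges):
    """Group nodes into components, treating every directed edge as undirected."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)  # direction is ignored

    seen = set()
    components = []
    for start in list(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical retweet edges: two separate conversations
edges = [('alice', 'bob'), ('carol', 'bob'), ('dave', 'erin')]
components = weak_components(edges)
print(len(components))  # 2
```

Note that `alice` and `carol` end up in the same component even though no directed path connects them; both are linked to `bob`, which is enough once direction is ignored.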

### 3.2.4 Drawing



We can try to draw this graph with the nodes sized by their out-strength:

In [None]:
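%matplotlib inline

# Size each node by its out-strength; users who were never retweeted get size 0.
# The scale factor 50 may need to be reduced for graphs with very large weights.
node_sizes = [D.out_degree(n, weight='weight') * 50 for n in D.nodes]
nx.draw(D, node_size=node_sizes)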

Note that in this simplistic drawing, nodes with zero out-strength are not drawn on the diagram because their size is 0. This suits us fine; only the users who have been retweeted are drawn here.


Another Twitter interaction between users occurs when one user mentions another in a tweet by their @screen_name. As an example, consider the following hypothetical tweet from @osome_iu:

> Check out the new research from @IUSICE and @USC_ISI https://...

From this tweet we would create two edges:

    ('osome_iu', 'IUSICE')
    ('osome_iu', 'USC_ISI')

It's up to us which direction we draw these edges, but we should be consistent. In this example, we will draw edges in the direction of attention flow: @osome_iu is giving attention to @IUSICE and @USC_ISI.
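
A minimal sketch of extracting these edges from a tweet follows. It assumes the tweet dict lists mentions under `entities['user_mentions']`, each with a `screen_name`, as in the v1.1 tweet format used elsewhere in this notebook; the `mention_edges` helper and the trimmed tweet are hypothetical.

```python
def mention_edges(tweet):
    """Return (author, mentioned_user) pairs for one tweet dict."""
    author = tweet['user']['screen_name']
    # Tweets without mentions simply produce an empty list
    mentions = tweet.get('entities', {}).get('user_mentions', [])
    # Edge direction: author -> mentioned user (attention flow)
    return [(author, m['screen_name']) for m in mentions]


# Hypothetical, trimmed tweet mirroring the example above
tweet = {
    'user': {'screen_name': 'osome_iu'},
    'text': 'Check out the new research from @IUSICE and @USC_ISI https://...',
    'entities': {
        'user_mentions': [
            {'screen_name': 'IUSICE'},
            {'screen_name': 'USC_ISI'},
        ]
    },
}

print(mention_edges(tweet))  # [('osome_iu', 'IUSICE'), ('osome_iu', 'USC_ISI')]
```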
