# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism. To use it to make requests to Twitter's API, go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Keys and Access Tokens**, note the four identifiers you need for an OAuth 1.0A workflow:
* consumer key,
* consumer secret,
* access token, and
* access token secret (click **Create Access Token** to generate the access token and its secret).

Note that you need an ordinary Twitter account to log in, create an app, and get these credentials.
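To see why all four values are needed, here is a minimal sketch of the HMAC-SHA1 signing step that OAuth 1.0A (RFC 5849) uses: the consumer secret and the access token secret together form the signing key, while the consumer key and access token travel as `oauth_*` request parameters. The function names and sample parameters are hypothetical, and this is not the `twitter` package's implementation; the package signs requests for you.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # RFC 3986 encoding: only unreserved characters (letters, digits, - . _ ~) stay unescaped
    return quote(str(value), safe="~")


def signature_base_string(method, base_url, params):
    # Encode each key and value, sort the pairs, then join them as k=v&k=v
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # Base string layout: METHOD&encoded_url&encoded_params
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(param_string)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret):
    # Signing key: encoded consumer secret and encoded token secret joined by '&'
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Hypothetical request parameters; a real request also carries oauth_consumer_key,
# oauth_token, oauth_timestamp, oauth_signature_method, and oauth_version.
base = signature_base_string(
    "get",
    "https://api.twitter.com/1.1/trends/place.json",
    {"id": "1", "oauth_nonce": "abc"},
)
signature = hmac_sha1_signature(base, "consumer-secret", "token-secret")
print(base)
print(signature)
```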

The first time you run the notebook, fill in all four credentials in the code below so they are saved to the `pkl` file. After that, you can remove the secrets from the notebook, because they will be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account: **do not share it**.

In [2]:
import pickle
import os

In [3]:
if not os.path.exists('secret_twitter_credentials.pkl'):
    # First run: fill in your credentials here so they are saved to the pkl file
    Twitter = {}
    Twitter['Consumer Key'] = ''
    Twitter['Consumer Secret'] = ''
    Twitter['Access Token'] = ''
    Twitter['Access Token Secret'] = ''
    with open('secret_twitter_credentials.pkl', 'wb') as f:
        pickle.dump(Twitter, f)
else:
    # Later runs: load the saved credentials (the with block closes the file)
    with open('secret_twitter_credentials.pkl', 'rb') as f:
        Twitter = pickle.load(f)
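Because the pickle file stores secrets on disk, an alternative some people prefer is to read the credentials from environment variables. The sketch below assumes hypothetical variable names such as `TWITTER_CONSUMER_KEY` and builds a dictionary with the same keys as `Twitter` above, failing loudly if any value is missing or empty.

```python
import os

# Hypothetical environment-variable names; rename them to suit your setup.
ENV_VARS = {
    "Consumer Key": "TWITTER_CONSUMER_KEY",
    "Consumer Secret": "TWITTER_CONSUMER_SECRET",
    "Access Token": "TWITTER_ACCESS_TOKEN",
    "Access Token Secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials_from_env(environ=None):
    """Return a dict shaped like `Twitter`; raise KeyError if any credential is unset or empty."""
    environ = os.environ if environ is None else environ
    creds = {key: environ.get(var, "") for key, var in ENV_VARS.items()}
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return creds
```

Usage would be `Twitter = load_credentials_from_env()` in place of the pickle cell.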

Install the `twitter` package, which provides a Python interface to the Twitter API.

In [4]:
!pip install twitter

Collecting twitter
  Downloading https://files.pythonhosted.org/packages/85/e2/f602e3f584503f03e0389491b251464f8ecfe2596ac86e6b9068fe7419d3/twitter-1.18.0-py2.py3-none-any.whl (54kB)
Installing collected packages: twitter
Successfully installed twitter-1.18.0


## Example 1. Authorizing an application to access Twitter account data

In [5]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x107c0ca58>


## Example 2. Retrieving trends

Twitter identifies locations using Yahoo! Where On Earth IDs (WOEIDs); the WOEID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

You can also look up WOEIDs with the BOSS PlaceFinder: https://developer.yahoo.com/boss/placefinder/

In [6]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977

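Look up the WOEID for [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca), or for any other location, and assign it to `LOCAL_WOE_ID`.

Note that the value used below, 718345, is the WOEID for Milan, Italy, which is why the local trends in the outputs are Italian.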

In [10]:
LOCAL_WOE_ID=718345

# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)
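The comment above describes how the `twitter` package turns keyword arguments into a URL. The toy function below is a hypothetical model of that convention, not the package's actual code: a plain `id` is appended to the path, while `_id` loses its underscore and becomes a query-string parameter, which is what the `trends/place` endpoint expects. The base URL is an assumption based on the v1.1 documentation linked earlier.

```python
from urllib.parse import urlencode

API_BASE = "https://api.twitter.com/1.1"  # assumed v1.1 base URL


def toy_build_url(path, **kwargs):
    """Hypothetical model of the id/_id convention; not the twitter package's code."""
    if "id" in kwargs:
        # A bare `id` becomes part of the path itself
        path = f"{path}/{kwargs.pop('id')}"
    # Strip one leading underscore so `_id` becomes the query parameter `id`
    query = {(k[1:] if k.startswith("_") else k): v for k, v in kwargs.items()}
    url = f"{API_BASE}/{path}.json"
    return f"{url}?{urlencode(query)}" if query else url


print(toy_build_url("trends/place", _id=1))  # query-string form
print(toy_build_url("trends/place", id=1))   # path form
```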

In [11]:
world_trends[:2]

[{'as_of': '2019-01-12T21:12:43Z',
  'created_at': '2019-01-12T21:07:27Z',
  'locations': [{'name': 'Worldwide', 'woeid': 1}],
  'trends': [{'name': '#NXTUKTakeOver',
    'promoted_content': None,
    'query': '%23NXTUKTakeOver',
    'tweet_volume': 46351,
    'url': 'http://twitter.com/search?q=%23NXTUKTakeOver'},
   {'name': '#DestinationEurovision',
    'promoted_content': None,
    'query': '%23DestinationEurovision',
    'tweet_volume': 19711,
    'url': 'http://twitter.com/search?q=%23DestinationEurovision'},
   {'name': '#cepostaperte',
    'promoted_content': None,
    'query': '%23cepostaperte',
    'tweet_volume': 10506,
    'url': 'http://twitter.com/search?q=%23cepostaperte'},
   {'name': '#صندوق_السبع_Momax',
    'promoted_content': None,
    'query': '%23%D8%B5%D9%86%D8%AF%D9%88%D9%82_%D8%A7%D9%84%D8%B3%D8%A8%D8%B9_Momax',
    'tweet_volume': 79979,
    'url': 'http://twitter.com/search?q=%23%D8%B5%D9%86%D8%AF%D9%88%D9%82_%D8%A7%D9%84%D8%B3%D8%A8%D8%B9_Momax'},
   {'name'

In [12]:
trends=local_trends
print(type(trends))
print(list(trends[0].keys()))
print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>
['trends', 'as_of', 'created_at', 'locations']
[{'name': '#cepostaperte', 'url': 'http://twitter.com/search?q=%23cepostaperte', 'promoted_content': None, 'query': '%23cepostaperte', 'tweet_volume': 10506}, {'name': '#BolognaJuve', 'url': 'http://twitter.com/search?q=%23BolognaJuve', 'promoted_content': None, 'query': '%23BolognaJuve', 'tweet_volume': 10575}, {'name': '#SampdoriaMilan', 'url': 'http://twitter.com/search?q=%23SampdoriaMilan', 'promoted_content': None, 'query': '%23SampdoriaMilan', 'tweet_volume': 12078}, {'name': 'Ricky Martin', 'url': 'http://twitter.com/search?q=%22Ricky+Martin%22', 'promoted_content': None, 'query': '%22Ricky+Martin%22', 'tweet_volume': None}, {'name': '#Cutrone', 'url': 'http://twitter.com/search?q=%23Cutrone', 'promoted_content': None, 'query': '%23Cutrone', 'tweet_volume': None}, {'name': '#SampMilan', 'url': 'http://twitter.com/search?q=%23SampMilan', 'promoted_content': None, 'query': '%23SampMilan', 'twe
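Each trend carries a `tweet_volume` that is sometimes `None`, as in the output above. A minimal sketch for ranking trends by volume, treating `None` as 0 (an assumption; you could also drop those trends), using a few entries copied from the local-trends output:

```python
def top_trends_by_volume(trends, n=3):
    """Return up to n (name, tweet_volume) pairs, highest volume first; None counts as 0."""
    pairs = [(t["name"], t["tweet_volume"]) for t in trends]
    return sorted(pairs, key=lambda pair: pair[1] or 0, reverse=True)[:n]


# A few entries taken from the local trends printed above
sample_trends = [
    {"name": "#cepostaperte", "tweet_volume": 10506},
    {"name": "#BolognaJuve", "tweet_volume": 10575},
    {"name": "#SampdoriaMilan", "tweet_volume": 12078},
    {"name": "Ricky Martin", "tweet_volume": None},
    {"name": "#Cutrone", "tweet_volume": None},
]
print(top_trends_by_volume(sample_trends))
```

With live data, the call would be `top_trends_by_volume(local_trends[0]['trends'])`.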

## Example 3. Displaying API responses as pretty-printed JSON

In [13]:
import json

print(json.dumps(us_trends[:2], indent=1))

[
 {
  "trends": [
   {
    "name": "#NXTUKTakeOver",
    "url": "http://twitter.com/search?q=%23NXTUKTakeOver",
    "promoted_content": null,
    "query": "%23NXTUKTakeOver",
    "tweet_volume": 46351
   },
   {
    "name": "Duke",
    "url": "http://twitter.com/search?q=Duke",
    "promoted_content": null,
    "query": "Duke",
    "tweet_volume": 36194
   },
   {
    "name": "Cam Reddish",
    "url": "http://twitter.com/search?q=%22Cam+Reddish%22",
    "promoted_content": null,
    "query": "%22Cam+Reddish%22",
    "tweet_volume": null
   },
   {
    "name": "Dick Vitale",
    "url": "http://twitter.com/search?q=%22Dick+Vitale%22",
    "promoted_content": null,
    "query": "%22Dick+Vitale%22",
    "tweet_volume": null
   },
   {
    "name": "Chiefs",
    "url": "http://twitter.com/search?q=Chiefs",
    "promoted_content": null,
    "query": "Chiefs",
    "tweet_volume": 85860
   },
   {
    "name": "#INDvsKC",
    "url": "http://twitter.com/search?q=%23INDvsKC",
    "promoted_conten

## Example 4. Computing the intersection of two sets of trends

In [14]:
trends_set = {}
trends_set['world'] = {trend['name'] for trend in world_trends[0]['trends']}

trends_set['us'] = {trend['name'] for trend in us_trends[0]['trends']}

trends_set['san diego'] = {trend['name'] for trend in local_trends[0]['trends']}

In [15]:
for loc in ['world','us','san diego']:
    print(('-'*10,loc))
    print(','.join(trends_set[loc]))

('----------', 'world')
#ChiefsKingdom,#EnBüyükBelaSevgisizlik,Cam Reddish,#Bets10PokerFreerollBinTL,#ukwomenstitle,#CoppaItalia,#JinGaveMeWings,#السعوديه_لبنان,#INDvsKC,David Luiz,Toni Storm,Cutrone,#نتايج_الثانويه_العامه,#HalkEytDiyor,Willian,#LetsRoll,#widm,#ÇekeVarsaTümBorçlaraHapis,Rae Sremmurd,#صندوق_السبع_Momax,#Its2019andWeStillCant,#ToniTime,Julian Castro,Bilal,#TeslaTRTBelgeselde,#SamsunOrjinCafedeSu8TL,Dick Vitale,#1YearOfCamila,#DestinationEurovision,#EstoyConGuaido,Finn Balor,Duke,#بالوقت_هذا_احتاج,#SampdoriaMilan,#cepostaperte,Pete Dunne,#NXTUKTagTitles,RJ Barrett,Florida State,#BabyOneMoreTime20,#40BinÖğrtAtamasıHaktır,#NXTTakeOverBlackpool,#BolognaJuve,Newcastle,#L6Nacuerdos,#OGCNFCGB,#NXTUKTakeOver,Chimène Badi,#CHENEW,#moltalk
('----------', 'us')
#PokemonGOCommunityDay,Eddie Dennis,Roy Williams,#Julian2020,Cam Reddish,NXT UK,Chiefs,#SaturdayMorning,Dickie V,Tyler Bate,#INDvsKC,Toni Storm,Phil Cofer,Toni Time,#RussianAsset,#snowpocalypse2019,Willian,Dave Mastiff,#snow

In [16]:
print('=' * 10, 'intersection of world and us')
print(trends_set['world'].intersection(trends_set['us']))

print('=' * 10, 'intersection of us and san diego')
print(trends_set['san diego'].intersection(trends_set['us']))

{'Pete Dunne', '#NXTUKTagTitles', 'Willian', 'RJ Barrett', 'Florida State', 'Cam Reddish', '#BabyOneMoreTime20', '#NXTTakeOverBlackpool', 'Rae Sremmurd', 'Dick Vitale', '#Its2019andWeStillCant', '#NXTUKTakeOver', 'Finn Balor', '#CHENEW', '#INDvsKC', '#ToniTime', 'Toni Storm', 'Julian Castro', 'Duke'}
set()
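With more than two locations, computing every pairwise intersection by hand gets tedious. A sketch using `itertools.combinations` over the keys of a dict shaped like `trends_set`; the small sample sets below are drawn from the outputs above for illustration only.

```python
from itertools import combinations


def pairwise_overlaps(trend_sets):
    """Map each pair of location labels to the set of trend names they share."""
    return {
        (a, b): trend_sets[a] & trend_sets[b]
        for a, b in combinations(sorted(trend_sets), 2)
    }


sample_sets = {
    "world": {"Duke", "Willian", "#BolognaJuve"},
    "us": {"Duke", "Willian", "Chiefs"},
    "san diego": {"#BolognaJuve", "#SampdoriaMilan"},
}
for pair, shared in pairwise_overlaps(sample_sets).items():
    print(pair, shared or "(none)")
```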


## Example 5. Collecting search results

In [17]:
q = '#MTVAwards' 

number = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses']
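A single request returns at most `count` results. To fetch more, the v1.1 search response includes a `search_metadata` dictionary whose `next_results` field, when present, is a ready-made query string for the next page. The sketch below turns that string into keyword arguments; `next_page_kwargs` is a hypothetical helper and the sample `max_id` value is made up.

```python
from urllib.parse import parse_qsl


def next_page_kwargs(search_metadata):
    """Convert search_metadata['next_results'] into a kwargs dict, or None when there is no next page."""
    next_results = search_metadata.get("next_results")
    if not next_results:
        return None
    # Drop the leading '?' and decode percent-escapes such as %23 -> '#'
    return dict(parse_qsl(next_results.lstrip("?")))


# Hypothetical metadata; the max_id value is made up for illustration
sample_metadata = {"next_results": "?max_id=1084126323327549439&q=%23MTVAwards&count=100"}
print(next_page_kwargs(sample_metadata))
```

In a loop, you might pass the result to `twitter_api.search.tweets(**kwargs)` and stop once `next_page_kwargs(search_results['search_metadata'])` returns `None`.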

In [18]:
len(statuses)
print(statuses)

[{'created_at': 'Sat Jan 12 16:33:52 +0000 2019', 'id': 1084126323327549440, 'id_str': '1084126323327549440', 'text': 'RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7', 'truncated': False, 'entities': {'hashtags': [{'text': 'MTVAwards', 'indices': [42, 52]}], 'symbols': [], 'user_mentions': [{'screen_name': 'MTV', 'name': 'MTV', 'id': 2367911, 'id_str': '2367911', 'indices': [3, 7]}], 'urls': [], 'media': [{'id': 1008896054736052224, 'id_str': '1008896054736052224', 'indices': [53, 76], 'media_url': 'http://pbs.twimg.com/media/DgBRnc3V4AAPkpU.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DgBRnc3V4AAPkpU.jpg', 'url': 'https://t.co/y8lBEgD8V7', 'display_url': 'pic.twitter.com/y8lBEgD8V7', 'expanded_url': 'https://twitter.com/MTV/status/1008896124260716544/photo/1', 'type': 'photo', 'sizes': {'small': {'w': 680, 'h': 536, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 2048, 'h': 1613, 'resize': 'fit'}, 'medium': {'w':

In [19]:
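# Keep only the first status for each distinct text (e.g., repeated retweets);
# a set makes the membership test fast
seen_texts = set()
filtered_statuses = []
for s in statuses:
    if s["text"] not in seen_texts:
        filtered_statuses.append(s)
        seen_texts.add(s["text"])
statuses = filtered_statuses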

In [20]:
len(statuses)

65
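Deduplicating on `text` merges retweets of the same tweet only because their text happens to match. A sketch of an alternative that keys on the id of the original tweet (`retweeted_status['id']` for retweets, the status's own `id` otherwise); `dedupe_by_original` and the toy statuses are hypothetical.

```python
def dedupe_by_original(statuses):
    """Keep the first status seen for each original tweet id."""
    seen_ids = set()
    unique = []
    for s in statuses:
        # Retweets point at the tweet they retweet; other statuses stand for themselves
        original_id = s.get("retweeted_status", s)["id"]
        if original_id not in seen_ids:
            seen_ids.add(original_id)
            unique.append(s)
    return unique


# Toy statuses: 2 retweets tweet 1, while 3 and 4 both retweet tweet 10
toy_statuses = [
    {"id": 1},
    {"id": 2, "retweeted_status": {"id": 1}},
    {"id": 3, "retweeted_status": {"id": 10}},
    {"id": 4, "retweeted_status": {"id": 10}},
]
print([s["id"] for s in dedupe_by_original(toy_statuses)])
```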

In [21]:
[s['text'] for s in search_results['statuses']]

['RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7',
 'RT @MTV: My bb Katherine Langford just wanted to say hi from backstage at the #MTVAwards 💕 https://t.co/FnEqEGEClX',
 'Who is #RIGHTJUSTxREBEL ??? @rightjustgcoded  x @rebelpoc  \n🌐God•Goals•Grind🌐\nSubscribe/like/share/donate\nLink in b… https://t.co/i3l2EDWUS0',
 'RT @glossdaya: chloe and halle are angels beyoncé really did bless them #mtvawards  https://t.co/y2fDHMj3rs',
 'RT @PeytonList: Post rain #MTVAwards https://t.co/UnoizJRx8D',
 '#Andre3000 @1future \n🏃 |Culture Guardians |  link in bio \nSubscribe/like/share/\n #applemusic #spotify #tidal… https://t.co/KSWUfuly78',
 'RT @jaedenlieberher: Catch us on the #MTVAwards tonight! https://t.co/5mBLkKT11x',
 '🏃 |Culture Guardians |  link in bio \nSubscribe/like/share/\n #applemusic #spotify #tidal  #complex #xxlfreshmen… https://t.co/peZRYaOQuo',
 'RT @EmmaWatson: Thank you @MTV for a wonderful evening and thank you to everyone who voted for me! ❤️🍿 #M

In [22]:
# Show one sample search result by indexing into the list...
print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Sat Jan 12 16:33:52 +0000 2019",
 "id": 1084126323327549440,
 "id_str": "1084126323327549440",
 "text": "RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "MTVAwards",
    "indices": [
     42,
     52
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [
   {
    "screen_name": "MTV",
    "name": "MTV",
    "id": 2367911,
    "id_str": "2367911",
    "indices": [
     3,
     7
    ]
   }
  ],
  "urls": [],
  "media": [
   {
    "id": 1008896054736052224,
    "id_str": "1008896054736052224",
    "indices": [
     53,
     76
    ],
    "media_url": "http://pbs.twimg.com/media/DgBRnc3V4AAPkpU.jpg",
    "media_url_https": "https://pbs.twimg.com/media/DgBRnc3V4AAPkpU.jpg",
    "url": "https://t.co/y8lBEgD8V7",
    "display_url": "pic.twitter.com/y8lBEgD8V7",
    "expanded_url": "https://twitter.com/MTV/status/1008896124260716544/photo/1",
    "type": "photo",
    "sizes": {

In [23]:
# Take the first status as a sample tweet and store it in t. (Alternatively,
# select a specific tweet by id with the commented-out list comprehension below.)
t = statuses[0]
#[ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiar with the data structure...

print(t['retweet_count'])  # for a retweet, this is the original tweet's retweet count
print(t['retweeted'])      # whether the authenticating user has retweeted this tweet



1863
False
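The `indices` in each entity are start/end offsets into `text` (end exclusive), so slicing recovers the exact substring. A sketch using the sample status shown above, trimmed to the fields it needs; `entity_spans` is a hypothetical helper, and for text containing emoji it is worth checking that the offsets line up with Python's code-point indexing before relying on it.

```python
def entity_spans(status):
    """Yield (entity_type, substring) pairs by slicing status['text'] with each entity's indices."""
    text = status["text"]
    for entity_type, items in status["entities"].items():
        for item in items:
            start, end = item["indices"]
            yield entity_type, text[start:end]


# Trimmed copy of the sample status printed above
sample_status = {
    "text": "RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7",
    "entities": {
        "hashtags": [{"text": "MTVAwards", "indices": [42, 52]}],
        "symbols": [],
        "user_mentions": [{"screen_name": "MTV", "indices": [3, 7]}],
        "urls": [],
        "media": [{"url": "https://t.co/y8lBEgD8V7", "indices": [53, 76]}],
    },
}
for entity_type, span in entity_spans(sample_status):
    print(entity_type, span)
```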


## Example 6. Extracting text, screen names, and hashtags from tweets

In [24]:
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]
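Splitting on whitespace keeps punctuation and case, so `Look!!!` and `look` count as different words, and very common tokens such as `RT` and `the` dominate the counts. A rough normalization sketch; the tiny stopword set is a hypothetical placeholder, not a standard list.

```python
import string

# Hypothetical, deliberately tiny stopword set for illustration only
STOPWORDS = {"rt", "the", "at", "to", "for", "a", "and", "in"}
# Strip punctuation from word edges, but keep '#' and '@' so hashtags and mentions survive
EDGE_PUNCTUATION = string.punctuation.replace("#", "").replace("@", "")


def normalize_words(texts, stopwords=STOPWORDS):
    """Lowercase words, trim edge punctuation, and drop URLs and stopwords."""
    result = []
    for text in texts:
        for w in text.split():
            if w.startswith("http"):
                continue
            w = w.lower().strip(EDGE_PUNCTUATION)
            if w and w not in stopwords:
                result.append(w)
    return result


print(normalize_words(["RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7"]))
```

Applied to the whole collection, this would be `normalize_words(status_texts)`.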

In [25]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "RT @MTV: Look!!! At!!! These!!! Cuties!!! #MTVAwards https://t.co/y8lBEgD8V7",
 "RT @MTV: My bb Katherine Langford just wanted to say hi from backstage at the #MTVAwards \ud83d\udc95 https://t.co/FnEqEGEClX",
 "Who is #RIGHTJUSTxREBEL ??? @rightjustgcoded  x @rebelpoc  \n\ud83c\udf10God\u2022Goals\u2022Grind\ud83c\udf10\nSubscribe/like/share/donate\nLink in b\u2026 https://t.co/i3l2EDWUS0",
 "RT @glossdaya: chloe and halle are angels beyonc\u00e9 really did bless them #mtvawards  https://t.co/y2fDHMj3rs",
 "RT @PeytonList: Post rain #MTVAwards https://t.co/UnoizJRx8D"
]
[
 "MTV",
 "MTV",
 "RIGHTJUSTGcoded",
 "REBELPOC",
 "glossdaya"
]
[
 "MTVAwards",
 "MTVAwards",
 "RIGHTJUSTxREBEL",
 "mtvawards",
 "MTVAwards"
]
[
 "RT",
 "@MTV:",
 "Look!!!",
 "At!!!",
 "These!!!"
]
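The `\ud83d\udc95`-style sequences in the output above come from `json.dumps`, which escapes non-ASCII characters by default. Passing `ensure_ascii=False` keeps accented letters and emoji readable; a short sketch with text adapted from the tweets above:

```python
import json

# Text adapted from the tweets above: an accented letter and an emoji
texts = ["beyonc\u00e9 really did bless them", "hi from backstage \U0001F495"]

escaped = json.dumps(texts, indent=1)                       # default: ensure_ascii=True
readable = json.dumps(texts, indent=1, ensure_ascii=False)  # keep the characters as-is
print(escaped)
print(readable)
```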


## Example 7. Creating a basic frequency distribution from the words in tweets

In [26]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common()[:10]) # top 10
    print()

[('RT', 42), ('the', 29), ('#MTVAwards', 28), ('at', 19), ('to', 15), ('for', 15), ('@MTV:', 12), ('a', 12), ('and', 11), ('in', 10)]

[('MTV', 16), ('MTVAwards', 4), ('chloexhalle', 3), ('ladygaga', 3), ('EmmaWatson', 2), ('NubiaSoulGODdes', 2), ('lindsaylohan', 2), ('iamctrex', 2), ('nfrealmusic', 2), ('beamiller', 2)]

[('MTVAwards', 31), ('mtvawards', 10), ('applemusic', 5), ('spotify', 5), ('FirstTakeBoyz', 5), ('tidal', 4), ('complex', 3), ('soul', 3), ('fashion', 3), ('style', 3)]
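The counts above list `MTVAwards` and `mtvawards` separately. If you want hashtags that differ only in case to count as the same tag, one sketch is to case-fold them before counting; `casefolded_counts` is a hypothetical helper, and the sample list is the first five hashtags printed in Example 6.

```python
from collections import Counter


def casefolded_counts(tags):
    """Count hashtags after case-folding, so 'MTVAwards' and 'mtvawards' merge."""
    return Counter(tag.casefold() for tag in tags)


sample_hashtags = ["MTVAwards", "MTVAwards", "RIGHTJUSTxREBEL", "mtvawards", "MTVAwards"]
print(casefolded_counts(sample_hashtags).most_common())
```

On the full data, `casefolded_counts(hashtags).most_common(10)` gives the merged top 10.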



## Example 8. Creating a pretty-print function to display tuples in a tabular format

In [27]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^20} | {:^6}".format(label, "Count"))
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:20} | {:>6}".format(k,v))

In [28]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common()[:10])
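The `{:20}` field in `prettyprint_counts` only pads: a key longer than 20 characters, such as `#EnBüyükBelaSevgisizlik` from the world trends, would push the `|` column out of line. Adding a precision (`{:20.20}`) also truncates. A sketch of a row formatter that does this; `format_count_row` is a hypothetical helper, not part of the original notebook.

```python
def format_count_row(key, count, width=20):
    # '{:W.W}' pads short strings to W characters and truncates longer ones to W,
    # so the '|' separator always lands in the same column
    return f"{str(key):{width}.{width}} | {count:>6}"


for key, count in [("RT", 42), ("#EnBüyükBelaSevgisizlik", 1)]:
    print(format_count_row(key, count))
```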


        Word         | Count 
****************************************
RT                   |     42
the                  |     29
#MTVAwards           |     28
at                   |     19
to                   |     15
for                  |     15
@MTV:                |     12
a                    |     12
and                  |     11
in                   |     10

    Screen Name      | Count 
****************************************
MTV                  |     16
MTVAwards            |      4
chloexhalle          |      3
ladygaga             |      3
EmmaWatson           |      2
NubiaSoulGODdes      |      2
lindsaylohan         |      2
iamctrex             |      2
nfrealmusic          |      2
beamiller            |      2

      Hashtag        | Count 
****************************************
MTVAwards            |     31
mtvawards            |     10
applemusic           |      5
spotify              |      5
FirstTakeBoyz        |      5
tidal                |      4
comp

## Example 9. Finding the most popular retweets

In [29]:
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n","\\")) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]

In [30]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print()
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50]))
        if len(text) > 50:
            print(row_template.format("", "", text[50:100]))
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))
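`prettyprint_tweets` slices the text at fixed offsets, which can split words (`B`/`est` in the output below) and leaves anything past 100 characters on one long line. A sketch that wraps on word boundaries with the standard library's `textwrap.wrap` and returns the rows as strings so they are easy to check; `format_tweet_rows` is a hypothetical name.

```python
import textwrap

ROW_TEMPLATE = "{:^7} | {:^15} | {:50}"


def format_tweet_rows(list_of_tuples, width=50):
    """Return table rows for (count, screen_name, text) tuples, wrapping text on word boundaries."""
    rows = [ROW_TEMPLATE.format("Count", "Screen Name", "Text"), "*" * 60]
    for count, screen_name, text in list_of_tuples:
        chunks = textwrap.wrap(text, width) or [""]
        rows.append(ROW_TEMPLATE.format(count, screen_name, chunks[0]))
        # Continuation lines leave the count and screen-name columns blank
        rows.extend(ROW_TEMPLATE.format("", "", chunk) for chunk in chunks[1:])
    return rows


sample = [(14589, "getFANDOM",
           "RT @getFANDOM: Chris Pratt with all of the wisdom #MTVAwards https://t.co/eu5cXU7WcQ")]
print("\n".join(format_tweet_rows(sample)))
```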

In [31]:
# Slice off the first 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10])


 Count  |   Screen Name   | Text                                              
************************************************************
 14589  |    getFANDOM    | RT @getFANDOM: Chris Pratt with all of the wisdom 
        |                 | #MTVAwards https://t.co/eu5cXU7WcQ                
 9883   |    ladygaga     | RT @ladygaga: So happy that #GagaFiveFootTwo won B
        |                 | est Music Documentary at the #MTVAwards! Thank u L
        |                 | ittle Monsters &amp; @MTV!! 😘 https://t.co/…      
 8260   |       MTV       | RT @MTV: Let @dylanobrien guide you through a firs
        |                 | t look at Maze Runner: The Death Cure, exclusively
        |                 |  for the #MTVAwards tonight at 8/7c! 💥…           
 6099   |       MTV       | RT @MTV: 1 of 9 rules from @prattprattpratt "Don't
        |                 |  be a 💩" Congrats on receiving the Generation Awar
        |                 | d! #MTVAwards https://t.co/IFn87o8Kuk    