# Twitter data

# Twitter API Access

To make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.


Under **Keys and Access Tokens**, note the four identifiers required for an OAuth 1.0a workflow:
* consumer key,
* consumer secret,
* access token, and
* access token secret (click **Create Access Token** to generate the last two).

Enter all four credentials in the cell below and run it once so that they are saved to the `pkl` file. After that you can remove the secret keys from the notebook, because later runs load them from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it.**

In [1]:
import pickle
import os

In [2]:
if not os.path.exists('secret_twitter_credentials.pkl'):
    # First run: fill in the four values below, then run this cell to save them
    Twitter = {}
    Twitter['Consumer Key'] = ''
    Twitter['Consumer Secret'] = ''
    Twitter['Access Token'] = ''
    Twitter['Access Token Secret'] = ''
    with open('secret_twitter_credentials.pkl', 'wb') as f:
        pickle.dump(Twitter, f)
else:
    # Later runs: load the saved credentials and close the file afterwards
    with open('secret_twitter_credentials.pkl', 'rb') as f:
        Twitter = pickle.load(f)
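
Before making any request it can help to confirm that every credential has actually been filled in, since the cell above will pickle empty strings without complaint. A minimal sketch, assuming the same four dictionary keys; the helper name `missing_credentials` is hypothetical and not part of the `twitter` package:

```python
# Hypothetical helper: list the credential keys that are missing or blank.
REQUIRED_KEYS = ('Consumer Key', 'Consumer Secret',
                 'Access Token', 'Access Token Secret')

def missing_credentials(credentials):
    """Return the required keys whose value is absent, None, or blank."""
    return [key for key in REQUIRED_KEYS
            if not str(credentials.get(key) or '').strip()]

# Example with placeholder values (not real credentials)
example = {'Consumer Key': 'abc', 'Consumer Secret': '',
           'Access Token': 'def'}
print(missing_credentials(example))
```

In the notebook you would call `missing_credentials(Twitter)`; an empty list means all four values are set.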

Install the `twitter` package to interface with the Twitter API:

```python
pip install twitter
```

## 1. Authorizing an application to access Twitter account data

In [3]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x0000028A66E7F978>


## 2. Retrieving trends

Twitter identifies locations by their Yahoo! Where On Earth ID (WOEID).

In [4]:
WORLD_WOE_ID = 1
IN_WOE_ID = 23424848
LOCAL_WOE_ID = 2295386 # for my city

Look up the WOEID for [Kolkata](http://woeid.rosselliot.co.nz/lookup/kolkata), or substitute the WOEID of another location.
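
Instead of an external lookup page, the WOEID can also be searched for in the list of locations for which Twitter publishes trends. The sketch below assumes that list has the shape returned by the v1.1 `trends/available` endpoint (a list of dictionaries with at least `name`, `country`, and `woeid` keys), which the `twitter` package should expose as `twitter_api.trends.available()`. The helper `find_woeid` and the sample data are illustrative only:

```python
def find_woeid(locations, name, country=None):
    """Return the WOEID of the first location matching name (case-insensitive)."""
    for loc in locations:
        if loc['name'].lower() != name.lower():
            continue
        if country is not None and loc.get('country', '').lower() != country.lower():
            continue
        return loc['woeid']
    return None

# Hand-written sample in the assumed response shape
sample_locations = [
    {'name': 'Worldwide', 'country': '', 'woeid': 1},
    {'name': 'India', 'country': 'India', 'woeid': 23424848},
    {'name': 'Kolkata', 'country': 'India', 'woeid': 2295386},
]

print(find_woeid(sample_locations, 'kolkata'))
```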

In [5]:
# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID) # return top 50 trends
in_trends = twitter_api.trends.place(_id=IN_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

In [7]:
# world_trends[:2]

In [8]:
trends=local_trends
print(type(trends))
print('\n\n')

print(list(trends[0].keys()))
print('\n\n')

print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>



['trends', 'as_of', 'created_at', 'locations']



[{'name': '#RadhaKrishn', 'url': 'http://twitter.com/search?q=%23RadhaKrishn', 'promoted_content': None, 'query': '%23RadhaKrishn', 'tweet_volume': None}, {'name': 'Gita Gopinath', 'url': 'http://twitter.com/search?q=%22Gita+Gopinath%22', 'promoted_content': None, 'query': '%22Gita+Gopinath%22', 'tweet_volume': None}, {'name': '#AusOpenInIndia', 'url': 'http://twitter.com/search?q=%23AusOpenInIndia', 'promoted_content': None, 'query': '%23AusOpenInIndia', 'tweet_volume': None}, {'name': '#SarkarTracklist', 'url': 'http://twitter.com/search?q=%23SarkarTracklist', 'promoted_content': None, 'query': '%23SarkarTracklist', 'tweet_volume': 47579}, {'name': '#KORvIND', 'url': 'http://twitter.com/search?q=%23KORvIND', 'promoted_content': None, 'query': '%23KORvIND', 'tweet_volume': None}, {'name': '#NEUGOA', 'url': 'http://twitter.com/search?q=%23NEUGOA', 'promoted_content': None, 'query': '%23NEUGOA

## 3. Displaying API responses as pretty-printed JSON

In [10]:
import json

# print((json.dumps(in_trends[:2], indent=1)))
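
By default `json.dumps` escapes non-ASCII characters as `\uXXXX` sequences, which makes trend names in other scripts hard to read. A small sketch, using a hand-written sample rather than a live response, of how `ensure_ascii=False` keeps such characters readable:

```python
import json

# Hand-written sample shaped like one entry of a trends response
sample_trend = {'name': '#メガネの日', 'tweet_volume': None}

print(json.dumps(sample_trend, indent=1))                      # escaped
print(json.dumps(sample_trend, indent=1, ensure_ascii=False))  # readable
```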

## 4. Computing the intersection of two sets of trends

In [11]:
trends_set = {}
trends_set['world'] = set([trend['name'] for trend in world_trends[0]['trends']])

trends_set['in'] = set([trend['name'] for trend in in_trends[0]['trends']])

trends_set['kol'] = set([trend['name'] for trend in local_trends[0]['trends']])

In [12]:
for loc in ['world','in','kol']:
    print(('-'*10,loc))
    print((','.join(trends_set[loc])))

('----------', 'world')
#FelizLunes,#姫適性,#حازم_امام,Nilgün Bodur,#BuenLunes,#LaHaya,#MeuRacistaSecreto,#ورينا_تصويرك_بالجوال,#DelikanlıŞanlıYiğit,Heraldo Muñoz,#NCAAInclusion,#MICHELINSTAR19,Sam Gagner,Mastodon,#افضل_عطر_جربته,#USMCA,#ARMYSelcaDay,#AWNewYork,#TheProtector,#inktober,#OTDirecto1OCT,Charles Aznavour,#メガネの日,#GHVIP1O,Connor Carrick,#قريبا_ستمطر,#KceePsycho,#DosisMínima,Carlos Ezquerra,Cuauhtémoc Blanco,#JuventudAClases,#October1st,#あなたのIDの読み方を教えてください,#EXOComingSoon,#حريق_شمال_الرياض,有働さん,#جامعه_الامام,#AptCodingChallenge,#VegasStrong,#NASA60th,#BreastCancerAwarenessMonth,#Game163,#fsradiobrasil,#OutubroRosa,#DíaInternacionalDelCafé,#مهرجان_STC_للاجهزه,#DíaDelPeriodista,Shea Weber,#DünyaÇocukGünü,#MondayMotivaton
('----------', 'in')
#Monster,#Sensex,#BeVegetarianSaysStMSG,#NotaOnFriday,#AusOpenInIndia,#SarkarTracklist,#100PercentKadhal,#NETfromOct12th,Dipika,#Peniviti,#KORvIND,#4DaysToGoForRatsasan,Central Information Commission,#SarkarKondattam,Tata Sky,#MegaIcons,#GandhiA

In [13]:
print(( '='*10,'intersection of world and in'))
print((trends_set['world'].intersection(trends_set['in'])))

print(('='*10,'intersection of in and kol'))
print((trends_set['kol'].intersection(trends_set['in'])))

set()
{'#Sensex', '#Monster', '#BeVegetarianSaysStMSG', '#NotaOnFriday', '#AusOpenInIndia', '#SarkarTracklist', '#100PercentKadhal', '#NETfromOct12th', 'Dipika', '#Peniviti', '#KORvIND', '#4DaysToGoForRatsasan', 'Central Information Commission', '#SarkarKondattam', 'Tata Sky', '#MegaIcons', '#GandhiAt150', '#VivekTiwari', '#NiravModi', 'Nobel Prize', 'Uday Kotak', '#KrishnaRajKapoor', 'motherhood saved baby', '#TanushreeDutta', '#96TheMovie', '#SavyasachiTeaser', '#benche', '#RamNathKovind', '#MondayMotivation', '#RajKapoor', 'Gita Gopinath', 'Andaman Islands', '#NavigateToNevada', '#SivajiGanesan', '#शराफत_गई_तेल_लेने', 'Gautam Navlakha', 'Sharad Pawar', '#KaatrinMozhi', '#NEUGOA', 'jean-claude arnault', '#ShortcutOctober', '#TheVillain', '#RadhaKrishn', 'Maury Obstfeld', '#DipakMisra', '#Andaman', '#PChidambaram'}
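
Besides `intersection`, Python sets support other comparisons that are useful here, such as finding the trends unique to one location. A minimal sketch, with a few trend names borrowed from the output above standing in for the real `trends_set` values:

```python
# Small stand-ins for trends_set['in'] and trends_set['kol']
country_trends = {'#Sensex', '#KORvIND', 'Nobel Prize'}
city_trends = {'#KORvIND', 'Nobel Prize', '#NEUGOA'}

common = country_trends & city_trends        # same as .intersection()
only_city = city_trends - country_trends     # same as .difference()
either = country_trends | city_trends        # same as .union()

print(sorted(common))
print(sorted(only_city))
print(len(either))
```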


## 5. Collecting search results

Set the variable `q` to a trending topic, or to any other search query.
The example query below was a trending topic when this content
was developed and is used throughout the remainder of this chapter.

In [15]:
q = '#INDvWI' 

number = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses']

In [16]:
len(statuses)
# print(statuses)

100
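
A single request returns at most `count` statuses. To fetch more, the v1.1 search response is expected to carry a `search_metadata` dictionary whose `next_results` field is a query string for the following page. Treat that field as an assumption and check for it before relying on it, since it may be absent on the last page. The helper `next_page_kwargs` below is hypothetical and only parses that string into keyword arguments:

```python
from urllib.parse import parse_qsl

def next_page_kwargs(search_results):
    """Turn search_metadata['next_results'] into a dict of keyword arguments.

    Returns None when no further page is advertised.
    """
    next_results = search_results.get('search_metadata', {}).get('next_results')
    if not next_results:
        return None
    # Drop the leading '?' and decode the percent-encoded values
    return dict(parse_qsl(next_results.lstrip('?')))

# Hand-written sample in the assumed shape
sample = {'search_metadata': {
    'next_results': '?max_id=1046809877736304640&q=%23INDvWI&count=100'}}
print(next_page_kwargs(sample))
```

With a live client, the result could be passed back as `twitter_api.search.tweets(**kwargs)` in a loop until the helper returns `None`.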

Twitter often returns duplicate results, so we filter them out by checking for duplicate texts:

In [17]:
all_text = []
filtered_statuses = []
for s in statuses:
    # Keep only the first status seen for each distinct text
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
statuses = filtered_statuses

In [18]:
len(statuses)

71
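
Checking membership in a list gets slower as the list grows. For larger result sets, a set of already-seen values does the same job with constant-time lookups. A sketch of that variation; the function name and the `key` parameter are illustrative:

```python
def dedupe_statuses(statuses, key=lambda s: s['text']):
    """Keep the first status for each distinct key value, preserving order."""
    seen = set()
    unique = []
    for status in statuses:
        k = key(status)
        if k not in seen:
            seen.add(k)
            unique.append(status)
    return unique

# Minimal hand-written statuses
sample = [{'id_str': '1', 'text': 'RT @a: hello'},
          {'id_str': '2', 'text': 'RT @a: hello'},
          {'id_str': '3', 'text': 'something else'}]
print([s['id_str'] for s in dedupe_statuses(sample)])
```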

In [19]:
[s['text'] for s in search_results['statuses']]

['can someone enlighten me what is the credentials of MSK prasad to hold the chairman of selector post? He has got 23… https://t.co/nHPtc2Y66B',
 'Karun Nair should have given chance to play in test and ODIs. We know how many chances are given to Raina &amp; Rohit.… https://t.co/wFENkRRBAA',
 'RT @monicas004: I personally spoke to Karun soon after the selection of the Test Team against WI and also told the ways to make a comeback,…',
 'RT @monicas004: I personally spoke to Karun soon after the selection of the Test Team against WI and also told the ways to make a comeback,…',
 'RT @urvildesai999: Great to see a new talent in squad vs Wi series 🔥🤞🏻\n#Bcci @BCCI #INDvWI',
 "RT @IManishh10: If selectors want Rohit Sharma to score runs in red ball format to earn recall, then why didn't they selected him in A Seri…",
 "RT @StarSportsIndia: Menacing with the ball 🔥\nExceptional with the bat 💪\n\n@ashwinravi99 loves playing against West Indies, doesn't he? \U0001f9d0 #I…",
 'RT @yadavabneeshs

In [20]:
# Show one sample search result by indexing the list
print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Mon Oct 01 17:11:38 +0000 2018",
 "id": 1046809877736304641,
 "id_str": "1046809877736304641",
 "text": "can someone enlighten me what is the credentials of MSK prasad to hold the chairman of selector post? He has got 23\u2026 https://t.co/nHPtc2Y66B",
 "truncated": true,
 "entities": {
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [
   {
    "url": "https://t.co/nHPtc2Y66B",
    "expanded_url": "https://twitter.com/i/web/status/1046809877736304641",
    "display_url": "twitter.com/i/web/status/1\u2026",
    "indices": [
     117,
     140
    ]
   }
  ]
 },
 "metadata": {
  "iso_language_code": "en",
  "result_type": "recent"
 },
 "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
 "in_reply_to_status_id": null,
 "in_reply_to_status_id_str": null,
 "in_reply_to_user_id": null,
 "in_reply_to_user_id_str": null,
 "in_reply_to_screen_name": null,
 "user": {
  "id": 164939790,
  "id_str": "164939790",

In [21]:
t = statuses[0]

# Explore the variable t to get familiar with the data structure
print(t['retweet_count'])
print(t['retweeted'])


0
False
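
The sample status has `"truncated": true`, so its `text` ends with an ellipsis. The v1.1 API accepts a `tweet_mode='extended'` parameter, in which case statuses should carry a `full_text` field instead of `text`. That behaviour is stated here as an assumption and is not shown in the output above. A defensive sketch (the helper name `status_text` is hypothetical) that works with either shape:

```python
def status_text(status):
    """Return the most complete text available for a status dictionary."""
    # For retweets, prefer the text of the original status
    # (assumed to be less truncated than the "RT @user: ..." form)
    original = status.get('retweeted_status', status)
    return original.get('full_text') or original.get('text', '')

# Hand-written samples
print(status_text({'text': 'short form…'}))
print(status_text({'full_text': 'the complete tweet text'}))
print(status_text({'text': 'RT @a: trunc…',
                   'retweeted_status': {'text': 'the original tweet'}}))
```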


## 6. Extracting text, screen names, and hashtags from tweets

In [22]:
status_texts = [ status['text'] for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]
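
The same nested-comprehension pattern applies to any list under `entities`. As an illustration, the sketch below pulls expanded URLs from `entities['urls']`, the structure visible in the sample status printed earlier. The sample data is hand-written, and `.get` is used defensively in case a key is missing:

```python
# Hand-written statuses containing only the fields used here
sample_statuses = [
    {'entities': {'urls': [
        {'url': 'https://t.co/nHPtc2Y66B',
         'expanded_url': 'https://twitter.com/i/web/status/1046809877736304641'}]}},
    {'entities': {'urls': []}},
]

# Read as nested loops: for each status, for each URL entity in it
urls = [url['expanded_url']
        for status in sample_statuses
            for url in status.get('entities', {}).get('urls', [])]

print(urls)
```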

In [23]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "can someone enlighten me what is the credentials of MSK prasad to hold the chairman of selector post? He has got 23\u2026 https://t.co/nHPtc2Y66B",
 "Karun Nair should have given chance to play in test and ODIs. We know how many chances are given to Raina &amp; Rohit.\u2026 https://t.co/wFENkRRBAA",
 "RT @monicas004: I personally spoke to Karun soon after the selection of the Test Team against WI and also told the ways to make a comeback,\u2026",
 "RT @urvildesai999: Great to see a new talent in squad vs Wi series \ud83d\udd25\ud83e\udd1e\ud83c\udffb\n#Bcci @BCCI #INDvWI",
 "RT @IManishh10: If selectors want Rohit Sharma to score runs in red ball format to earn recall, then why didn't they selected him in A Seri\u2026"
]
[
 "monicas004",
 "urvildesai999",
 "BCCI",
 "IManishh10",
 "StarSportsIndia"
]
[
 "Bcci",
 "INDvWI",
 "\u0915\u093e\u092e_\u092c\u094b\u0932\u0924\u093e_\u0939\u0948",
 "\u0915\u094d\u0930\u093f\u0915\u0947\u091f_\u0938\u094d\u091f\u0947\u0921\u093f\u092f\u092e",


## 7. Creating a basic frequency distribution from the words in tweets

In [24]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common(10))  # top 10
    print()

[('RT', 38), ('in', 30), ('to', 29), ('the', 28), ('Test', 27), ('#INDvWI', 25), ('for', 25), ('series', 22), ('Last', 19), ('a', 18)]

[('BCCI', 5), ('ImRo45', 4), ('StarSportsIndia', 3), ('FirstCric', 3), ('mayankcricket', 3), ('imVkohli', 3), ('ashwinravi99', 2), ('ICC', 2), ('monicas004', 1), ('urvildesai999', 1)]

[('INDvWI', 26), ('ViratKohli', 5), ('RohitSharma', 3), ('Cricket', 3), ('BCCI', 2), ('TeamIndia', 2), ('ZIMvSA', 2), ('PAKvAUS', 2), ('ENGvSL', 2), ('Live', 2)]
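
The word counts are dominated by `RT` and common English words. One way to surface more informative terms is to lowercase the words and drop a stopword list before counting. A minimal sketch; the tiny `STOPWORDS` set is an arbitrary illustration, not a standard list:

```python
from collections import Counter

# Arbitrary illustrative stopword list; extend as needed
STOPWORDS = {'rt', 'in', 'to', 'the', 'for', 'a', 'of', 'and', 'is'}

def informative_counts(words, top=10):
    """Count lowercased words, skipping stopwords."""
    cleaned = (w.lower() for w in words)
    return Counter(w for w in cleaned if w not in STOPWORDS).most_common(top)

sample_words = 'RT the Test series in the Test squad for #INDvWI'.split()
print(informative_counts(sample_words))
```

Lowercasing merges variants such as `Test` and `test`, which may or may not be desirable for hashtags and names.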



## 8. Creating a `prettyprint_counts` function to display tuples in a tabular format

In [25]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^20} | {:^6}".format(label, "Count"))
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:20} | {:>6}".format(k,v))

In [26]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common(10))


        Word         | Count 
****************************************
RT                   |     38
in                   |     30
to                   |     29
the                  |     28
Test                 |     27
#INDvWI              |     25
for                  |     25
series               |     22
Last                 |     19
a                    |     18

    Screen Name      | Count 
****************************************
BCCI                 |      5
ImRo45               |      4
StarSportsIndia      |      3
FirstCric            |      3
mayankcricket        |      3
imVkohli             |      3
ashwinravi99         |      2
ICC                  |      2
monicas004           |      1
urvildesai999        |      1

      Hashtag        | Count 
****************************************
INDvWI               |     26
ViratKohli           |      5
RohitSharma          |      3
Cricket              |      3
BCCI                 |      2
TeamIndia            |      2
ZIMv

## 9. Finding the most popular retweets

In [27]:
retweets = [
            # Store a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n","\\")) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]

We can build another `prettyprint` function to print entire tweets with their retweet count.

We also want to split the text of the tweet into up to three lines, if needed.

In [28]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print()
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*80)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50]))
        if len(text) > 50:
            print(row_template.format("", "", text[50:100]))
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))
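
Slicing at fixed offsets can cut words in half. The standard library's `textwrap.wrap` can break at word boundaries instead and cap the number of lines. A sketch of that alternative (the function name `wrap_tweet` is illustrative), which changes the layout slightly compared with the function above:

```python
import textwrap

def wrap_tweet(text, width=50, max_lines=3):
    """Split text into at most max_lines lines, breaking between words."""
    return textwrap.wrap(text, width=width, max_lines=max_lines,
                         placeholder='…')

sample = ("RT @StarSportsIndia: Whites, the red cherry, three slips and a "
          "gully - who wouldn't fall in love with the five-day format?")
for line in wrap_tweet(sample):
    print(line)
```

Each returned line could then be passed to `row_template.format(...)` in place of the fixed slices.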

In [29]:
# Slice off the first 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10])


 Count  |   Screen Name   | Text                                              
********************************************************************************
  112   | StarSportsIndia | RT @StarSportsIndia: Whites, the red cherry, three
        |                 |  slips and a gully - who wouldn't fall in love wit
        |                 | h the five-day format? 😍\ \The world's…           
  92    |    riyaasrkk    | RT @riyaasrkk: Rohit Sharma 's @ICC ODI Rankings s
        |                 | ince he has become an opener:\\2013- 39th Rank\\20
        |                 | 14- 17th Rank\\2015- 16Th Rank\\2016- 7…          
  89    | SirIshantSharma | RT @SirIshantSharma: Just saw the Indian Squad for
        |                 |  West Indies and I didn't find my name. Can someon
        |                 | e please tell me I'm dropped or rested?…          
  85    | StarSportsIndia | RT @StarSportsIndia: Once a leader, always a leade
        |                 | r! 😇\ \Can MS Dhoni's