# Twitter data


# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism, and in order to use it to make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Keys and Access Tokens**, note the four identifiers required for an OAuth 1.0A workflow:
* consumer key,
* consumer secret,
* access token, and
* access token secret (click **Create Access Token** to generate the access token and its secret).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

In [21]:
import pickle
import os

In [22]:
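if not os.path.exists('secret_twitter_credentials.pkl'):
    # First run: write a template file; replace the blanks with your own credentials
    Twitter = {}
    Twitter['Consumer Key'] = ' '
    Twitter['Consumer Secret'] = ' '
    Twitter['Access Token'] = ' '
    Twitter['Access Token Secret'] = ' '
    with open('secret_twitter_credentials.pkl', 'wb') as f:
        pickle.dump(Twitter, f)
else:
    # Later runs: load the stored credentials, closing the file afterwards
    with open('secret_twitter_credentials.pkl', 'rb') as f:
        Twitter = pickle.load(f)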
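
The template written on the first run holds only blank placeholders, so API calls will fail until real values are filled in. A minimal sketch of a sanity check, assuming the same dictionary keys as above (the helper name `missing_credentials` is hypothetical and not part of any library):

```python
# Keys expected in the credentials dictionary (same names as in the cell above)
REQUIRED_KEYS = ('Consumer Key', 'Consumer Secret',
                 'Access Token', 'Access Token Secret')

def missing_credentials(credentials):
    """Return the required keys whose values are absent or blank."""
    return [key for key in REQUIRED_KEYS
            if not str(credentials.get(key, '')).strip()]

# Hand-made example: three placeholders are still blank
example = {'Consumer Key': ' ', 'Consumer Secret': ' ',
           'Access Token': 'abc', 'Access Token Secret': ' '}
print(missing_credentials(example))
```

Calling `missing_credentials(Twitter)` before authenticating would list any entries that still need to be filled in.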

Install the `twitter` package to interface with the Twitter API.

In [23]:
!pip install twitter



## Example 1. Authorizing an application to access Twitter account data

In [24]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x0000022C8279C048>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID.

The Yahoo! Where On Earth ID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

Also look at the BOSS PlaceFinder: https://developer.yahoo.com/boss/placefinder/

In [40]:
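# Yahoo! Where On Earth IDs: 1 is the entire world,
# 23424977 is the United States, and 2487889 is San Diego
WORLD_WOE_ID = 1
US_WOE_ID = 23424977
LOCAL_WOE_ID = 2487889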

# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

In [41]:
world_trends[:2]

[{'trends': [{'name': '#BTS_BBMAs',
    'url': 'http://twitter.com/search?q=%23BTS_BBMAs',
    'promoted_content': None,
    'query': '%23BTS_BBMAs',
    'tweet_volume': 1266536},
   {'name': '#InMyBloodVideo',
    'url': 'http://twitter.com/search?q=%23InMyBloodVideo',
    'promoted_content': None,
    'query': '%23InMyBloodVideo',
    'tweet_volume': 57695},
   {'name': '#تذاكر_السينما_ب_١٣٠',
    'url': 'http://twitter.com/search?q=%23%D8%AA%D8%B0%D8%A7%D9%83%D8%B1_%D8%A7%D9%84%D8%B3%D9%8A%D9%86%D9%85%D8%A7_%D8%A8_%D9%A1%D9%A3%D9%A0',
    'promoted_content': None,
    'query': '%23%D8%AA%D8%B0%D8%A7%D9%83%D8%B1_%D8%A7%D9%84%D8%B3%D9%8A%D9%86%D9%85%D8%A7_%D8%A8_%D9%A1%D9%A3%D9%A0',
    'tweet_volume': 53865},
   {'name': '#FelizMartes',
    'url': 'http://twitter.com/search?q=%23FelizMartes',
    'promoted_content': None,
    'query': '%23FelizMartes',
    'tweet_volume': 49602},
   {'name': '#حط_صورتك_وانت_صغير',
    'url': 'http://twitter.com/search?q=%23%D8%AD%D8%B7_%D8%B5%D9%88%D

In [42]:
trends=local_trends
print(type(trends))
print(list(trends[0].keys()))
print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>
['trends', 'as_of', 'created_at', 'locations']
[{'name': 'Russ', 'url': 'http://twitter.com/search?q=Russ', 'promoted_content': None, 'query': 'Russ', 'tweet_volume': 73417}, {'name': '#InfinityWar', 'url': 'http://twitter.com/search?q=%23InfinityWar', 'promoted_content': None, 'query': '%23InfinityWar', 'tweet_volume': 460716}, {'name': '#MyHandleExplained', 'url': 'http://twitter.com/search?q=%23MyHandleExplained', 'promoted_content': None, 'query': '%23MyHandleExplained', 'tweet_volume': 60964}, {'name': 'Toronto', 'url': 'http://twitter.com/search?q=Toronto', 'promoted_content': None, 'query': 'Toronto', 'tweet_volume': 739638}, {'name': '#KUSINews', 'url': 'http://twitter.com/search?q=%23KUSINews', 'promoted_content': None, 'query': '%23KUSINews', 'tweet_volume': None}, {'name': '#Trump', 'url': 'http://twitter.com/search?q=%23Trump', 'promoted_content': None, 'query': '%23Trump', 'tweet_volume': 77702}, {'name': 'Waffle House', 'url': 'ht
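
As the output shows, each trend is a dictionary and `tweet_volume` can be `None`, which makes a plain numeric sort fail. A minimal sketch of ranking trends by volume while skipping missing values follows; the sample list is hand-copied from the output above rather than fetched live, and `rank_by_volume` is a hypothetical helper name:

```python
# A few trend entries copied by hand from the output above (not a live API call)
sample_trends = [
    {'name': 'Russ', 'tweet_volume': 73417},
    {'name': '#InfinityWar', 'tweet_volume': 460716},
    {'name': '#KUSINews', 'tweet_volume': None},
    {'name': 'Toronto', 'tweet_volume': 739638},
]

def rank_by_volume(trend_list):
    """Return trends with a known tweet_volume, largest first."""
    with_volume = [t for t in trend_list if t['tweet_volume'] is not None]
    return sorted(with_volume, key=lambda t: t['tweet_volume'], reverse=True)

for trend in rank_by_volume(sample_trends):
    print("{:15} {:>8}".format(trend['name'], trend['tweet_volume']))
```

With a live response, the same helper could be applied to `local_trends[0]['trends']`.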

## Example 3. Displaying API responses as pretty-printed JSON

In [43]:
import json

print(json.dumps(us_trends[:2], indent=1))

[
 {
  "trends": [
   {
    "name": "#TuesdayThoughts",
    "url": "http://twitter.com/search?q=%23TuesdayThoughts",
    "promoted_content": null,
    "query": "%23TuesdayThoughts",
    "tweet_volume": 101867
   },
   {
    "name": "#DescribeTwitterBadly",
    "url": "http://twitter.com/search?q=%23DescribeTwitterBadly",
    "promoted_content": null,
    "query": "%23DescribeTwitterBadly",
    "tweet_volume": null
   },
   {
    "name": "#YouAreWinningWhen",
    "url": "http://twitter.com/search?q=%23YouAreWinningWhen",
    "promoted_content": null,
    "query": "%23YouAreWinningWhen",
    "tweet_volume": 16992
   },
   {
    "name": "#BTS_BBMAs",
    "url": "http://twitter.com/search?q=%23BTS_BBMAs",
    "promoted_content": null,
    "query": "%23BTS_BBMAs",
    "tweet_volume": 1266536
   },
   {
    "name": "#SMWNYC",
    "url": "http://twitter.com/search?q=%23SMWNYC",
    "promoted_content": null,
    "query": "%23SMWNYC",
    "tweet_volume": null
   },
   {
    "name": "Ronny Jacks

## Example 4. Computing the intersection of two sets of trends

In [44]:
trends_set = {}
trends_set['world'] = set([trend['name'] 
                        for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name'] 
                     for trend in us_trends[0]['trends']]) 

trends_set['san diego'] = set([trend['name'] 
                     for trend in local_trends[0]['trends']]) 

In [45]:
for loc in ['world','us','san diego']:
    print(('-'*10,loc))
    print((','.join(trends_set[loc])))

('----------', 'world')
#HowMyBodyTellsMeImOld,Henri Michel,#EsMolestoCuandoAlguien,#RahulDaresModi,#NoSomosTresSomosTodxs,#EmbraceAmbition,#MiSerieFavoritaEs,#حط_صورتك_وانت_صغير,#TravelTuesday,#AbortoNosotrasDecidimos,#MIvSRH,#PodemosEresTú,#ElDebateEconomico,#DuduestamoscomVocê,#TorontoTheGood,#Martina,#DescribeTwitterBadly,#SMTMThailand,Ronny Jackson,#kathyszaboenbv,#24Abr,Bülent Arınç,France Football,#Kreuz,#TDR2018,#TenhoTendênciaPra,#NaGodVideoByTizzy,#MillicentFawcett,#BTS_BBMAs,Mike Francesa,#تذاكر_السينما_ب_١٣٠,#กาหลมหรทึก,#myhandlexplained,#budayasebagaipanglima,#الحب_صدق_او_كذب,Aziz Yıldırım,#ليله_الخمسه_والهروب_الكبير,#SMWNYC,#cityMW,Lucho Jara,#내가_음료수가_된다면,#GenocidioArmenio,#خسران_يامعاند_السعوديه,#花のち晴れ,#AvengerInfinityWar,#TuesdayThoughts,#YouAreWinningWhen,#InMyBloodVideo,#GraciasAlfredyAmaia,#FelizMartes
('----------', 'us')
#HowMyBodyTellsMeImOld,#AM2DM,Marcus Smart,LEAH ON THE OFFBEAT,Bill Snyder,Tom Hardy,#GSMCON2018,Olivier Vernon,Marty Hurney,Pat Shurmur,#EmbraceA

In [46]:
print(( '='*10,'intersection of world and us'))
print((trends_set['world'].intersection(trends_set['us'])))

print(('='*10,'intersection of us and san-diego'))
print((trends_set['san diego'].intersection(trends_set['us'])))

{'#HowMyBodyTellsMeImOld', '#BTS_BBMAs', 'Mike Francesa', '#myhandlexplained', '#AvengerInfinityWar', '#DescribeTwitterBadly', '#TuesdayThoughts', '#EmbraceAmbition', '#YouAreWinningWhen', '#InMyBloodVideo', '#SMWNYC', 'Ronny Jackson', '#cityMW', '#FelizMartes'}
{'#HowMyBodyTellsMeImOld', 'Marcus Smart', 'Tom Hardy', '#GSMCON2018', 'Olivier Vernon', 'Marty Hurney', 'Pat Shurmur', '#EmbraceAmbition', '#CLOSErikers', 'Nick Mangold', 'Amazon Key', 'Oil States', '#Peace72', 'Alfie Evans', 'Sterling Shepard', 'Ronny Jackson', '#ruralprogressive', 'Janoris Jenkins', 'Webby', '#BTS_BBMAs', '#InMyBloodVideo', 'Stupid Question', 'Mike Francesa', '#GEOINT2018', 'My Bloody Valentine', 'Schoolhouse Rock', '#VenomTrailer', '#SMWNYC', '#cityMW', '#CUonStrike', '#WhenIWasSingle', '#contentwritingchat', 'When Jon Karl', '#BookBirthday', 'Bryon Hefner', '#TuesdayThoughts', '#LeadDevNewYork', '#protectcleanwater', '#YouAreWinningWhen', 'Astacio', '#BHDoF', '#TransformationTuesday'}
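
Intersection is only one of the available set operations. The sketch below shows set difference and union on two small, made-up sets; the topic names are placeholders, not real API results:

```python
# Made-up placeholder sets standing in for trends_set['world'] and trends_set['us']
sample_sets = {
    'world': {'#TopicA', '#TopicB', '#TopicC'},
    'us': {'#TopicB', '#TopicC', '#TopicD'},
}

# Trends present in one location but not the other
only_world = sample_sets['world'] - sample_sets['us']
only_us = sample_sets['us'] - sample_sets['world']

# Every distinct trend seen in either location
either = sample_sets['world'] | sample_sets['us']

print(sorted(only_world))
print(sorted(only_us))
print(sorted(either))
```

The same operators would work directly on the entries of `trends_set`, for example to list trends that appear in the US but not worldwide.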


## Example 5. Collecting search results

Set the variable `q` to a trending topic,
or anything else for that matter. The example query below
was a trending topic when this content was being developed
and is used throughout the remainder of this chapter.

In [47]:
q = '#Lietuva' 

number = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses']

In [49]:
len(statuses)
#print(statuses)

100
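
A single search call returns at most one page of results. In API v1.1 the response's `search_metadata` may carry a `next_results` query string pointing at the next page. The sketch below only shows how such a string could be turned into keyword arguments; the sample string and its `max_id` value are made up for illustration, and the follow-up request is left as a comment because it needs live credentials:

```python
from urllib.parse import parse_qsl

# Hypothetical value of search_results['search_metadata']['next_results']
next_results = '?max_id=123&q=%23Lietuva&count=100&include_entities=1'

# Drop the leading '?' and decode the query string into a dict of parameters
kwargs = dict(parse_qsl(next_results.lstrip('?')))
print(kwargs)

# With a live connection, the next page could then be requested with
# something like:
#     search_results = twitter_api.search.tweets(**kwargs)
#     statuses += search_results['statuses']
```

When the response has no `next_results` key, there are no further pages to fetch.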

Twitter often returns duplicate results; we can filter them out by checking for duplicate texts:

In [50]:
all_text = []
filtered_statuses = []
for s in statuses:
    # Keep only the first status seen with a given text
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
statuses = filtered_statuses

In [51]:
len(statuses)

53
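
Checking membership in a list rescans it for every status, which gets slow as the number of tweets grows. A minimal sketch of the same filtering with a set of already-seen texts is shown below; the function name `dedupe_by_text` and the sample statuses are hypothetical:

```python
def dedupe_by_text(status_list):
    """Keep the first status for each distinct text, preserving order."""
    seen = set()
    unique = []
    for status in status_list:
        if status['text'] not in seen:
            seen.add(status['text'])
            unique.append(status)
    return unique

# Made-up statuses: the first and third share the same text
sample = [{'id': 1, 'text': 'RT @a: hello'},
          {'id': 2, 'text': 'something else'},
          {'id': 3, 'text': 'RT @a: hello'}]
print([s['id'] for s in dedupe_by_text(sample)])
```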

In [52]:
[s['text'] for s in search_results['statuses']]

['#ArianaComeToLithuania #Vilnius #Kaunas #Lietuva #Lithuania Listen on @Spotify  #NoTearsLeftToCry by @ArianaGrande ♡ https://t.co/Jbzq0rYvja',
 '#like4like #palanga #vilnius #lithuanian #litva #lietuva #lucky #luxurylife #europe #mafia… https://t.co/0Nn8rkVK0g',
 '#Birstonas #Lithuania spring time #lietuva https://t.co/aQ9IdWaDG0',
 'RT @WeLoveLithuania: Sakurų žydėjimas Vilniuje 2018 – Vakarinė versija\n#Vilnius\nWe love Lithuania - Photo by: Simonas Rudaminas\n#Lithuania…',
 'Veiksme | in action 😎\n•\n•\n•\n•\n•\n•\n•\n#me #choreographer #choreografas #vilonas #lietuva… https://t.co/UMvp0ml9xf',
 'RT @NejauNet: Shadow | #Kaunas, #Lithuania #mavicpro #Lietuva #dronas #beautifullithuania #mavic #djimavic #djieurope #Kaunascity #Kaunasae…',
 'Morning | #Kaunas, #Lithuania #mavicpro #Lietuva #dronas #kaunasoldtown #beautifullithuania #mavic #djimavic… https://t.co/sb2sWAXJ1a',
 'RT @WeLoveLithuania: Sakurų žydėjimas Vilniuje 2018 – Vakarinė versija\n#Vilnius\nWe love Lithuania - Photo 

In [64]:
# Show one sample search result by indexing into the list...
print(json.dumps(statuses[4], indent=1))

{
 "created_at": "Tue Apr 24 09:30:39 +0000 2018",
 "id": 988711810751778816,
 "id_str": "988711810751778816",
 "text": "Veiksme | in action \ud83d\ude0e\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n#me #choreographer #choreografas #vilonas #lietuva\u2026 https://t.co/UMvp0ml9xf",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "me",
    "indices": [
     36,
     39
    ]
   },
   {
    "text": "choreographer",
    "indices": [
     40,
     54
    ]
   },
   {
    "text": "choreografas",
    "indices": [
     55,
     68
    ]
   },
   {
    "text": "vilonas",
    "indices": [
     69,
     77
    ]
   },
   {
    "text": "lietuva",
    "indices": [
     78,
     86
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [],
  "urls": [
   {
    "url": "https://t.co/UMvp0ml9xf",
    "expanded_url": "https://www.instagram.com/p/Bh8r7jHA6lu/",
    "display_url": "instagram.com/p/Bh8r7jHA6lu/",
    "indices": [
     88,
     111
    ]
   }
  ]
 },
 "metadata

In [76]:
# Take the first status as a sample tweet and bind it to t.
# To pick a specific tweet by ID instead, filter with a list comprehension
# and take its single element:
t = statuses[0]
#[ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiar with the data structure...

print(t['retweet_count'])
print(t['retweeted'])


4
False



## Example 6. Extracting text, screen names, and hashtags from tweets

In [77]:
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]

In [78]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "#ArianaComeToLithuania #Vilnius #Kaunas #Lietuva #Lithuania Listen on @Spotify  #NoTearsLeftToCry by @ArianaGrande \u2661 https://t.co/Jbzq0rYvja",
 "#like4like #palanga #vilnius #lithuanian #litva #lietuva #lucky #luxurylife #europe #mafia\u2026 https://t.co/0Nn8rkVK0g",
 "#Birstonas #Lithuania spring time #lietuva https://t.co/aQ9IdWaDG0",
 "RT @WeLoveLithuania: Sakur\u0173 \u017eyd\u0117jimas Vilniuje 2018 \u2013 Vakarin\u0117 versija\n#Vilnius\nWe love Lithuania - Photo by: Simonas Rudaminas\n#Lithuania\u2026",
 "Veiksme | in action \ud83d\ude0e\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n\u2022\n#me #choreographer #choreografas #vilonas #lietuva\u2026 https://t.co/UMvp0ml9xf"
]
[
 "Spotify",
 "ArianaGrande",
 "WeLoveLithuania",
 "NejauNet",
 "NejauNet"
]
[
 "ArianaComeToLithuania",
 "Vilnius",
 "Kaunas",
 "Lietuva",
 "Lithuania"
]
[
 "#ArianaComeToLithuania",
 "#Vilnius",
 "#Kaunas",
 "#Lietuva",
 "#Lithuania"
]
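
By default `json.dumps` escapes every non-ASCII character, which is why the Lithuanian text above appears as `\u0173`-style sequences. A small sketch of the `ensure_ascii=False` option, using a fragment of one of the tweets shown above as sample data:

```python
import json

# Text fragment taken from one of the tweets printed above
sample = ["Sakurų žydėjimas Vilniuje 2018"]

escaped = json.dumps(sample, indent=1)                       # default: \uXXXX escapes
readable = json.dumps(sample, indent=1, ensure_ascii=False)  # keeps the characters as-is

print(escaped)
print(readable)
```

Both strings decode to the same data; only the on-screen representation differs.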


## Example 7. Creating a basic frequency distribution from the words in tweets

In [79]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common(10)) # top 10
    print()

[('#Lietuva', 32), ('#Lithuania', 31), ('RT', 19), ('of', 12), ('@Confederation_M:', 11), ('the', 10), ('VADOVAS', 10), ('|', 9), ('#lietuva', 8), ('#mavicpro', 8)]

[('Confederation_M', 11), ('NejauNet', 4), ('Spotify', 2), ('ArianaGrande', 2), ('WeLoveLithuania', 1), ('ValstietisToday', 1), ('ArianaGrandeLT4', 1), ('IIHFHockey', 1), ('NJDevils', 1), ('NYRangers', 1)]

[('Lietuva', 33), ('Lithuania', 32), ('Kaunas', 15), ('Vilnius', 11), ('lietuva', 10), ('mavicpro', 8), ('dronas', 8), ('beautifullithuania', 8), ('mavic', 8), ('djimavic', 7)]
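
Note that `Lietuva` and `lietuva` are counted separately above because `Counter` treats strings as case-sensitive. If case-insensitive counts are wanted, one option is to lowercase the tags first. A minimal sketch with a hand-made sample list:

```python
from collections import Counter

# Hand-made sample; in the notebook this would be the hashtags list
sample_hashtags = ['Lietuva', 'lietuva', 'Lithuania', 'LIETUVA', 'Kaunas']

# Lowercase each tag so that spelling variants share one counter entry
folded_counts = Counter(tag.lower() for tag in sample_hashtags)
print(folded_counts.most_common(1))
```

Whether merging variants is appropriate depends on the analysis; the original casing is lost once the tags are lowercased.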



## Example 8. Create a prettyprint function to display tuples in a nice tabular format

In [80]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^20} | {:^6}".format(label, "Count"))
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:20} | {:>6}".format(k,v))

In [81]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common(10))


        Word         | Count 
****************************************
#Lietuva             |     32
#Lithuania           |     31
RT                   |     19
of                   |     12
@Confederation_M:    |     11
the                  |     10
VADOVAS              |     10
|                    |      9
#lietuva             |      8
#mavicpro            |      8

    Screen Name      | Count 
****************************************
Confederation_M      |     11
NejauNet             |      4
Spotify              |      2
ArianaGrande         |      2
WeLoveLithuania      |      1
ValstietisToday      |      1
ArianaGrandeLT4      |      1
IIHFHockey           |      1
NJDevils             |      1
NYRangers            |      1

      Hashtag        | Count 
****************************************
Lietuva              |     33
Lithuania            |     32
Kaunas               |     15
Vilnius              |     11
lietuva              |     10
mavicpro             |      8
dron

## Example 9. Finding the most popular retweets

In [82]:
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n","\\")) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]

In [83]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print()
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50]))
        if len(text) > 50:
            print(row_template.format("", "", text[50:100]))
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))

In [84]:
# Slice off the top 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10])


 Count  |   Screen Name   | Text                                              
************************************************************
  14    |  sharetrainKA2  | RT @sharetrainKA2: The agenda for 2nd transnationa
        |                 | l meeting  in @KaunoPkc KaunoPkc #Kaunas #Lithuani
        |                 | a #Lietuva  is ready  Looking forward t…          
   8    | Confederation_M | RT @Confederation_M: #Energetikos Departamento VAD
        |                 | OVAS (#Vilnius)\Head of the #Energy (#LPG, #oil) D
        |                 | epartment \#Lithuania #Lietuva #Литва #…          
   8    | Confederation_M | RT @Confederation_M: #Eksporto Departamento VADOVA
        |                 | S (#Vilnius, #Kaunas)\Head of the #Export Departme
        |                 | nt\#Lithuania #Lietuva #Литва @Confeder…          
   7    | Confederation_M | RT @Confederation_M: Vacancies: \#Nekilnojamojo tu
        |                 | rto Departamento VADOVAS (#Vilnius, #Kaun
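
Sorting the tuples directly compares the retweet count first and then falls back to the screen name and text to break ties. If only the count should decide the order, a sort key can be given explicitly. A minimal sketch with made-up tuples in the same (count, screen name, text) layout:

```python
from operator import itemgetter

# Made-up tuples in the same layout as the retweets list
sample_retweets = [(8, 'user_b', 'RT @user_b: second'),
                   (14, 'user_a', 'RT @user_a: first'),
                   (8, 'user_b', 'RT @user_b: another')]

# Sort on the count only; the sort is stable, so entries with equal
# counts keep their original relative order
top = sorted(sample_retweets, key=itemgetter(0), reverse=True)

for count, screen_name, text in top:
    print(count, screen_name, text)
```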