# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism, and in order to use it to make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.

Note: As of July 2018, you must create a developer account in addition to a regular Twitter account. Developer account URL: https://developer.twitter.com/en/apply/user

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Keys and Access Tokens**, there are four primary identifiers you'll need to note for an OAuth 1.0A workflow:
* consumer key, 
* consumer secret, 
* access token, and 
* access token secret (Click on Create Access Token to create those).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you execute the notebook, fill in all four credentials so that they are saved to the `pkl` file. After that, you can remove the secret keys from the notebook, because they will be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it.**

In [2]:
import pickle
import os

In [3]:
# Path of the pickle file that stores the credentials
CREDENTIALS_FILE = 'secret_twitter_credentials.pkl'

if not os.path.exists(CREDENTIALS_FILE):
    # First run: fill in the four values below before executing this cell
    Twitter = {}
    Twitter['Consumer Key'] = ''
    Twitter['Consumer Secret'] = ''
    Twitter['Access Token'] = ''
    Twitter['Access Token Secret'] = ''
    with open(CREDENTIALS_FILE, 'wb') as f:  # wb = write the file in binary mode
        pickle.dump(Twitter, f)
else:
    with open(CREDENTIALS_FILE, 'rb') as f:  # rb = read the file in binary mode
        Twitter = pickle.load(f)
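
Before authenticating, it can help to confirm that none of the four values is still an empty string. The following is a minimal sketch: `missing_credentials` and `REQUIRED_KEYS` are hypothetical names, not part of the `twitter` package, and the sketch assumes the dictionary uses the same keys as the cell above.

```python
# Hypothetical helper (not part of the twitter package): report which
# of the four OAuth 1.0A values are absent or still empty.
REQUIRED_KEYS = ['Consumer Key', 'Consumer Secret',
                 'Access Token', 'Access Token Secret']

def missing_credentials(credentials):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]

# Example with placeholder values only; never commit real secrets.
example = {'Consumer Key': 'abc', 'Consumer Secret': '',
           'Access Token': 'xyz'}
print(missing_credentials(example))
```

An empty list means all four values are present, so calling `missing_credentials(Twitter)` right after loading the file is one way to catch a `pkl` file that was saved with blank values.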

Install the `twitter` package to interface with the Twitter API.

In [3]:
!pip install twitter

Collecting twitter
  Downloading https://files.pythonhosted.org/packages/85/e2/f602e3f584503f03e0389491b251464f8ecfe2596ac86e6b9068fe7419d3/twitter-1.18.0-py2.py3-none-any.whl (54kB)
Installing collected packages: twitter
Successfully installed twitter-1.18.0


distributed 1.21.8 requires msgpack, which is not installed.
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.


## Example 1. Authorizing an application to access Twitter account data

In [16]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x00000220F2654080>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID (WOEID).

The Yahoo! Where On Earth ID for the entire world is 1.<br>
See https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place and http://woeid.rosselliot.co.nz/

In [13]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977

Look up the WOEID for your location, for example [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca).

The cell below uses Pakistan; you can change it to another location.

In [17]:
LOCAL_WOE_ID = 23424922  # Pakistan (San Diego would be 2487889)

# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

In [18]:
world_trends[0]

{'trends': [{'name': '#عطنا_حكمه_حلوه_من_عندك',
   'url': 'http://twitter.com/search?q=%23%D8%B9%D8%B7%D9%86%D8%A7_%D8%AD%D9%83%D9%85%D9%87_%D8%AD%D9%84%D9%88%D9%87_%D9%85%D9%86_%D8%B9%D9%86%D8%AF%D9%83',
   'promoted_content': None,
   'query': '%23%D8%B9%D8%B7%D9%86%D8%A7_%D8%AD%D9%83%D9%85%D9%87_%D8%AD%D9%84%D9%88%D9%87_%D9%85%D9%86_%D8%B9%D9%86%D8%AF%D9%83',
   'tweet_volume': 18464},
  {'name': '#اكله_مستحيل_تاكلها',
   'url': 'http://twitter.com/search?q=%23%D8%A7%D9%83%D9%84%D9%87_%D9%85%D8%B3%D8%AA%D8%AD%D9%8A%D9%84_%D8%AA%D8%A7%D9%83%D9%84%D9%87%D8%A7',
   'promoted_content': None,
   'query': '%23%D8%A7%D9%83%D9%84%D9%87_%D9%85%D8%B3%D8%AA%D8%AD%D9%8A%D9%84_%D8%AA%D8%A7%D9%83%D9%84%D9%87%D8%A7',
   'tweet_volume': None},
  {'name': '#FelizJueves',
   'url': 'http://twitter.com/search?q=%23FelizJueves',
   'promoted_content': None,
   'query': '%23FelizJueves',
   'tweet_volume': 19235},
  {'name': '#RoaldDahlDay2018',
   'url': 'http://twitter.com/search?q=%23RoaldDahlDay2018

In [19]:
trends=local_trends
#print(type(trends))
#print(list(trends[0].keys())) #keys/attributes from trend 
#print(trends[0]['trends'])
trends = trends[0]['trends']
print(trends[20])


{'name': '#Putin', 'url': 'http://twitter.com/search?q=%23Putin', 'promoted_content': None, 'query': '%23Putin', 'tweet_volume': None}
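
Because `tweet_volume` is `None` for many trends, ranking trends by volume needs a fallback value. The sketch below is one way to do it, using a small hand-made list in the same shape as `trends`. The names and volumes are copied from outputs in this notebook and mixed across locations purely for illustration, and `top_trends` is a hypothetical helper.

```python
def top_trends(trends, n=3):
    """Return the n trend names with the highest known tweet_volume."""
    # Treat a missing volume (None) as 0 so those trends sort last.
    ranked = sorted(trends, key=lambda t: t['tweet_volume'] or 0,
                    reverse=True)
    return [t['name'] for t in ranked[:n]]

# Hand-made sample in the shape of trends[0]['trends'].
sample_trends = [
    {'name': '#Putin', 'tweet_volume': None},
    {'name': '#FelizJueves', 'tweet_volume': 19235},
    {'name': '#ThursdayThoughts', 'tweet_volume': 31407},
    {'name': 'Category 2', 'tweet_volume': 11219},
]
print(top_trends(sample_trends))
```

With real data, the same function could be called as `top_trends(trends)`.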


## Example 3. Displaying API responses as pretty-printed JSON

In [20]:
import json

print(json.dumps(us_trends[:2], indent=1))

[
 {
  "trends": [
   {
    "name": "#ThursdayThoughts",
    "url": "http://twitter.com/search?q=%23ThursdayThoughts",
    "promoted_content": null,
    "query": "%23ThursdayThoughts",
    "tweet_volume": 31407
   },
   {
    "name": "#PrimaryDay",
    "url": "http://twitter.com/search?q=%23PrimaryDay",
    "promoted_content": null,
    "query": "%23PrimaryDay",
    "tweet_volume": null
   },
   {
    "name": "Category 2",
    "url": "http://twitter.com/search?q=%22Category+2%22",
    "promoted_content": null,
    "query": "%22Category+2%22",
    "tweet_volume": 11219
   },
   {
    "name": "#IPeekedOutMyWindow",
    "url": "http://twitter.com/search?q=%23IPeekedOutMyWindow",
    "promoted_content": null,
    "query": "%23IPeekedOutMyWindow",
    "tweet_volume": null
   },
   {
    "name": "#ThingsDuctTapeCantFix",
    "url": "http://twitter.com/search?q=%23ThingsDuctTapeCantFix",
    "promoted_content": null,
    "query": "%23ThingsDuctTapeCantFix",
    "tweet_volume": null
   },
   {

## Example 4. Computing the intersection of two sets of trends

In [21]:
trends_set = {}
trends_set['world'] = set([trend['name'] 
                        for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name'] 
                     for trend in us_trends[0]['trends']]) 

trends_set['pakistan'] = set([trend['name'] 
                     for trend in local_trends[0]['trends']]) 

In [22]:
trends_set

{'world': {'#13settembre',
  '#ArunJaitleyStepDown',
  '#DER18Madrid',
  '#DiaInternacionalDelChocolate',
  '#FMQs',
  '#FelizJueves',
  '#GaneshChaturthi',
  '#HambacherForst',
  '#IPeekedOutMyWindow',
  '#JeudiPhoto',
  '#MetinOktay',
  '#NSCKE2018',
  '#PSGxJordan',
  '#PersibDay',
  '#Perşembe',
  '#PlanPauvreté',
  '#QuintaDetremuraSDV',
  '#RoaldDahlDay2018',
  '#SakaryaMeydanMuharebesi',
  '#ThursdayThoughts',
  '#WorldSepsisDay',
  '#anipoke',
  '#quizjam_ko',
  '#اكله_مستحيل_تاكلها',
  '#الخميس',
  '#اليوم_العالمي_للقانون',
  '#تصريح_مدير_جامعه_شقراء',
  '#تكريم_اليسا',
  '#جواز_الامارات_تاسع_اقوي_جواز',
  '#رساله_توجهها_للحاسد',
  '#عتق_رقبه_حمدان_النفيعي',
  '#عطنا_حكمه_حلوه_من_عندك',
  '#وجهك_طويل_او_قصير_او_دايري',
  '#ぐるナイ',
  '#公式ナカノヒトの悩み',
  '#死ぬからの予測変換で生き返ってみせる',
  '#레진대표_미성년자착취',
  '#마음당_절대_안치이는_요소',
  '#서랍안에_들어있는것',
  'Franco del Valle',
  'GT-R',
  'Golden Sun',
  'John Lewis',
  'Merkez Bankası',
  'Ruth Beitia',
  'Salisbury Cathedral',
  'VRカレシ',
  'Солсбери',
  

In [23]:
for loc in ['world','us','pakistan']:
    print(('-'*10,loc))
    print((','.join(trends_set[loc])))

('----------', 'world')
#死ぬからの予測変換で生き返ってみせる,Солсбери,#GaneshChaturthi,#اليوم_العالمي_للقانون,#عتق_رقبه_حمدان_النفيعي,#DiaInternacionalDelChocolate,#anipoke,Franco del Valle,#PersibDay,#JeudiPhoto,アニポケ,#ぐるナイ,#NSCKE2018,#DER18Madrid,#레진대표_미성년자착취,VRカレシ,Merkez Bankası,#الخميس,#Perşembe,#QuintaDetremuraSDV,#جواز_الامارات_تاسع_اقوي_جواز,#公式ナカノヒトの悩み,#تكريم_اليسا,#FMQs,#PSGxJordan,#FelizJueves,#تصريح_مدير_جامعه_شقراء,#وجهك_طويل_او_قصير_او_دايري,#SakaryaMeydanMuharebesi,#ThursdayThoughts,#HambacherForst,GT-R,#PlanPauvreté,#마음당_절대_안치이는_요소,Salisbury Cathedral,Golden Sun,#MetinOktay,#13settembre,#quizjam_ko,#RoaldDahlDay2018,#رساله_توجهها_للحاسد,#서랍안에_들어있는것,#ArunJaitleyStepDown,#اكله_مستحيل_تاكلها,John Lewis,#WorldSepsisDay,#عطنا_حكمه_حلوه_من_عندك,#IPeekedOutMyWindow,Ruth Beitia,聚楽第
('----------', 'us')
#PrimaryDay,#GaneshChaturthi,#MAGARallyRules,#AppleEvent,#FridayEve,#AskACurator,Trump's FEMA,American Horror Story,#HRTechConf,#GCAS2018,60 Minutes,Norm Macdonald,#HappyBirthdayNiall,#AHSApocolyps

In [24]:
print(( '='*10,'intersection of world and us'))
print((trends_set['world'].intersection(trends_set['us'])))

print(('='*10,'intersection of us and pakistan'))
print((trends_set['pakistan'].intersection(trends_set['us'])))

{'#ThursdayThoughts', '#GaneshChaturthi', '#IPeekedOutMyWindow'}
{'#AppleEvent'}
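
With more than two locations, writing each intersection by hand becomes repetitive. The following sketch computes the intersection for every pair of locations. `pairwise_intersections` is a hypothetical helper, and the sets are tiny illustrative samples whose names are taken from the outputs above, not the full trend lists.

```python
from itertools import combinations

def pairwise_intersections(named_sets):
    """Map each pair of names to the intersection of their sets."""
    return {(a, b): named_sets[a] & named_sets[b]
            for a, b in combinations(sorted(named_sets), 2)}

# Tiny illustrative sets, not the full trend lists.
sample = {
    'world': {'#ThursdayThoughts', '#GaneshChaturthi', '#FelizJueves'},
    'us': {'#ThursdayThoughts', '#GaneshChaturthi', '#AppleEvent'},
    'pakistan': {'#AppleEvent', '#Putin'},
}
for pair, common in pairwise_intersections(sample).items():
    print(pair, sorted(common))
```

The same call should work on `trends_set` directly, since it is also a dictionary of sets.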


## Example 5. Collecting search results

Set the variable `q` to a trending topic, or anything else for that matter. The example query below was a trending topic when this content was being developed.

In [27]:
q = '#ThursdayThoughts' 

number = 100

#See https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html

search_results = twitter_api.search.tweets(q=q, count=number)

t_statuses = search_results['statuses']

In [28]:
print(len(t_statuses))
print(t_statuses)

100
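
A single call returns at most `count` tweets. One common way to fetch further pages is to read the `next_results` query string from `search_results['search_metadata']` and pass its parameters to the next call. The sketch below shows only the parsing step and runs without network access. It assumes the response includes a `next_results` field, which may not hold for every response or API version; the sample string is made up, and `next_page_kwargs` is a hypothetical helper.

```python
from urllib.parse import parse_qsl

def next_page_kwargs(next_results):
    """Turn a '?max_id=...&q=...' string into keyword arguments."""
    # Strip the leading '?' and decode percent-escapes such as %23 -> '#'.
    return dict(parse_qsl(next_results.lstrip('?')))

# Made-up example in the shape of a 'next_results' value.
sample = '?max_id=1040205284231340032&q=%23ThursdayThoughts&count=100'
kwargs = next_page_kwargs(sample)
print(kwargs)
# The next page could then be requested with:
#     twitter_api.search.tweets(**kwargs)
```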


Twitter often returns duplicate results. We can filter them out by checking for duplicate texts:

In [29]:
seen_texts = set()   # texts already encountered
statuses = []        # statuses with duplicate texts removed
for s in t_statuses:
    # Keep only the first status with a given text
    if s["text"] not in seen_texts:
        seen_texts.add(s["text"])
        statuses.append(s)

In [30]:
len(statuses)

83
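
Comparing texts removes only exact duplicates. An alternative sketch, shown here for comparison, groups retweets by the tweet they repeat. It assumes each retweet carries a `retweeted_status` object holding the original tweet's `id` (a field that appears among the tweet attributes listed in Example 7). This grouping rule is an assumption for illustration, not the method used in this notebook, and `dedupe_by_original` is a hypothetical helper.

```python
def dedupe_by_original(statuses):
    """Keep the first status seen for each original tweet id."""
    seen = set()
    unique = []
    for status in statuses:
        # For a retweet, use the tweet being retweeted; otherwise the
        # status itself is the original.
        original = status.get('retweeted_status', status)
        if original['id'] not in seen:
            seen.add(original['id'])
            unique.append(status)
    return unique

# Hand-made statuses with only the fields this sketch needs.
sample = [
    {'id': 10, 'retweeted_status': {'id': 1}},
    {'id': 11, 'retweeted_status': {'id': 1}},
    {'id': 1},
    {'id': 12},
]
print([s['id'] for s in dedupe_by_original(sample)])
```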

In [31]:
[s['text'] for s in search_results['statuses']]

['RT @TomHall: Patience of a Saint.\n\nOr should I say - a Mother! \n\n🐆\n\n#ThursdayThoughts #Mom #Motherhood \n\nhttps://t.co/7bNvRGYyUa',
 'RT @pinkk9lover: Lying just doesn’t stop for people like @HillaryClinton . Her latest twitter rant purposely mischaracterizes #JudgeKavenau…',
 'Happy #ThursdayThoughts ! :)💖👑 #ThursdayMotivation #BringItOn #YouveGotThis #September #wishes https://t.co/NgRJ2YS0TM',
 'RT @pinkk9lover: There are at least 65 million of us that love the USA and @POTUS ! Now let that show at the polls in  #NovemberMidTerms \nN…',
 'LATEST: @HighclereCastle Cigar Tasting Session Review. Fantastic night down at @Turmeaus Chester with @Tor_Imports… https://t.co/aArs8WmZcI',
 '#PrimaryDay time to elect the #Corrupt and #Crooked liberal Space Cadet Cuomo Since Nixon has zero chance of winnin… https://t.co/3woiMBZjlD',
 'Couple of us did a whole review of Jiggas library. Thoughts?\n#ThursdayThoughts #JayZ @S_C_ https://t.co/a1ohEq9IkF',
 '#ThursdayThoughts https://t.co/229

In [32]:
statuses[0]

{'created_at': 'Thu Sep 13 11:47:20 +0000 2018',
 'id': 1040205284231340033,
 'id_str': '1040205284231340033',
 'text': 'RT @TomHall: Patience of a Saint.\n\nOr should I say - a Mother! \n\n🐆\n\n#ThursdayThoughts #Mom #Motherhood \n\nhttps://t.co/7bNvRGYyUa',
 'truncated': False,
 'entities': {'hashtags': [{'text': 'ThursdayThoughts', 'indices': [68, 85]},
   {'text': 'Mom', 'indices': [86, 90]},
   {'text': 'Motherhood', 'indices': [91, 102]}],
  'symbols': [],
  'user_mentions': [{'screen_name': 'TomHall',
    'name': 'Tom Hall ☘',
    'id': 14993272,
    'id_str': '14993272',
    'indices': [3, 11]}],
  'urls': [],
  'media': [{'id': 918838873160953856,
    'id_str': '918838873160953856',
    'indices': [105, 128],
    'media_url': 'http://pbs.twimg.com/ext_tw_video_thumb/918838873160953856/pu/img/mEX_9Z2iNsovi4NA.jpg',
    'media_url_https': 'https://pbs.twimg.com/ext_tw_video_thumb/918838873160953856/pu/img/mEX_9Z2iNsovi4NA.jpg',
    'url': 'https://t.co/7bNvRGYyUa',
    'display_

In [33]:
# Show one sample search result by indexing into the list...
print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Thu Sep 13 11:47:20 +0000 2018",
 "id": 1040205284231340033,
 "id_str": "1040205284231340033",
 "text": "RT @TomHall: Patience of a Saint.\n\nOr should I say - a Mother! \n\n\ud83d\udc06\n\n#ThursdayThoughts #Mom #Motherhood \n\nhttps://t.co/7bNvRGYyUa",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "ThursdayThoughts",
    "indices": [
     68,
     85
    ]
   },
   {
    "text": "Mom",
    "indices": [
     86,
     90
    ]
   },
   {
    "text": "Motherhood",
    "indices": [
     91,
     102
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [
   {
    "screen_name": "TomHall",
    "name": "Tom Hall \u2618",
    "id": 14993272,
    "id_str": "14993272",
    "indices": [
     3,
     11
    ]
   }
  ],
  "urls": [],
  "media": [
   {
    "id": 918838873160953856,
    "id_str": "918838873160953856",
    "indices": [
     105,
     128
    ],
    "media_url": "http://pbs.twimg.com/ext_tw_video_thumb/918838873160953856/pu/img/mEX_9Z2iNsovi4NA

In [34]:
# Take the first status as a sample and assign it to the variable t.
t = statuses[0]
# Alternatively, select a status by its id with a list comprehension; the
# result is a list with only one element that can be accessed by its index:
#[ status for status in statuses 
#          if status['id'] == 1039794680861405184 ][0]

# Explore the variable t to get familiar with the data structure...

print(t['retweet_count'])
print(t['retweeted'])


281
False
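
Another field worth exploring is `created_at`, which is a plain string. The sketch below is one way to turn it into a timezone-aware `datetime`. The format string is inferred from the sample value shown in the outputs above, and `parse_created_at` is a hypothetical helper.

```python
from datetime import datetime

# Format inferred from values such as 'Thu Sep 13 11:47:20 +0000 2018'.
# %a and %b expect English day and month names (the default C locale).
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'

def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)

created = parse_created_at('Thu Sep 13 11:47:20 +0000 2018')
print(created.isoformat())
```

With a real status, the call would be `parse_created_at(t['created_at'])`.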


## Example 6. Extracting text, screen names, and hashtags from tweets

In [35]:
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]
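
Each entity also carries `indices`, the start and end offsets of the entity within `text`. The sketch below slices the text with those offsets. The sample status copies the text and hashtag indices from the first search result shown earlier, and `entity_spans` is a hypothetical helper.

```python
def entity_spans(status, kind):
    """Return the substrings of status['text'] covered by each entity."""
    text = status['text']
    return [text[start:end]
            for start, end in (e['indices']
                               for e in status['entities'][kind])]

# Text and indices copied from the first search result; \U0001F406 is
# the emoji that appears in the tweet.
sample_status = {
    'text': ('RT @TomHall: Patience of a Saint.\n\nOr should I say - a Mother! '
             '\n\n\U0001F406\n\n#ThursdayThoughts #Mom #Motherhood \n\n'
             'https://t.co/7bNvRGYyUa'),
    'entities': {
        'hashtags': [{'text': 'ThursdayThoughts', 'indices': [68, 85]},
                     {'text': 'Mom', 'indices': [86, 90]},
                     {'text': 'Motherhood', 'indices': [91, 102]}],
    },
}
print(entity_spans(sample_status, 'hashtags'))
```

Note that the slices include the leading `#`, while the entity's own `text` field does not.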

In [36]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "RT @TomHall: Patience of a Saint.\n\nOr should I say - a Mother! \n\n\ud83d\udc06\n\n#ThursdayThoughts #Mom #Motherhood \n\nhttps://t.co/7bNvRGYyUa",
 "RT @pinkk9lover: Lying just doesn\u2019t stop for people like @HillaryClinton . Her latest twitter rant purposely mischaracterizes #JudgeKavenau\u2026",
 "Happy #ThursdayThoughts ! :)\ud83d\udc96\ud83d\udc51 #ThursdayMotivation #BringItOn #YouveGotThis #September #wishes https://t.co/NgRJ2YS0TM",
 "RT @pinkk9lover: There are at least 65 million of us that love the USA and @POTUS ! Now let that show at the polls in  #NovemberMidTerms \nN\u2026",
 "LATEST: @HighclereCastle Cigar Tasting Session Review. Fantastic night down at @Turmeaus Chester with @Tor_Imports\u2026 https://t.co/aArs8WmZcI"
]
[
 "TomHall",
 "pinkk9lover",
 "HillaryClinton",
 "pinkk9lover",
 "POTUS"
]
[
 "ThursdayThoughts",
 "Mom",
 "Motherhood",
 "ThursdayThoughts",
 "ThursdayMotivation"
]
[
 "RT",
 "@TomHall:",
 "Patience",
 "of",
 "a"
]
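
A natural next step with these lists is to count how often each item occurs. The following is a minimal sketch using `collections.Counter` from the standard library; the sample list copies the five hashtags printed above rather than the full `hashtags` list.

```python
from collections import Counter

# Sample values copied from the hashtag output above.
sample_hashtags = ['ThursdayThoughts', 'Mom', 'Motherhood',
                   'ThursdayThoughts', 'ThursdayMotivation']

counts = Counter(sample_hashtags)
# most_common(n) returns the n most frequent items with their counts.
print(counts.most_common(1))
```

The same pattern should apply to `words` and `screen_names`, for example `Counter(words).most_common(10)`.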


## Example 7. Converting to DataFrame

In [37]:
import pandas as pd

In [38]:
tweet_df = pd.DataFrame(t_statuses,index=None)

In [39]:
tweet_df.keys()

Index(['contributors', 'coordinates', 'created_at', 'entities',
       'extended_entities', 'favorite_count', 'favorited', 'geo', 'id',
       'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'metadata',
       'place', 'possibly_sensitive', 'quoted_status', 'quoted_status_id',
       'quoted_status_id_str', 'retweet_count', 'retweeted',
       'retweeted_status', 'source', 'text', 'truncated', 'user'],
      dtype='object')

In [40]:
len(tweet_df)

100

In [41]:
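# Select the text column and drop duplicate texts; note that tweet_df is
# now a Series of texts rather than a DataFrame
tweet_df = tweet_df.text.drop_duplicates()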

In [42]:
len(tweet_df)

83

### Additional Links:

1. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html