# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism, and in order to use it to make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Key and Access Tokens**, there are four primary identifiers you'll need to note for an OAuth 1.0A workflow: 
* consumer key, 
* consumer secret, 
* access token, and 
* access token secret (Click on Create Access Token to create those).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you execute the notebook, add all credentials so that you can save them in the `pkl` file, then you can remove the secret keys from the notebook because they will just be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it.**

How do we use pickle? We import the `pickle` package, and also the `os` package to check whether the credentials file already exists.

What does pickle do? It is a Python standard-library module for saving Python objects and data structures to disk. Pickle performs serialization: it converts a Python object, in this case the `Twitter` dictionary, into a byte stream so that the object can be re-created later when we need it. Reconstructing the object from that stream is called deserialization. Pickle does this for your Twitter access credentials, and when we come back, it loads them into the `Twitter` object again.
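The round trip described above can be sketched in a few lines. This is a minimal illustration, assuming a dictionary with placeholder values (`sample_credentials` is a hypothetical name, not part of the notebook). It uses `pickle.dumps` and `pickle.loads`, the in-memory counterparts of `dump` and `load`, so nothing is written to disk.

```python
import pickle

# Hypothetical placeholder data; never put real secrets in an example.
sample_credentials = {'Consumer Key': 'abc', 'Consumer Secret': 'xyz'}

# Serialization: Python object -> byte stream.
stream = pickle.dumps(sample_credentials)
print(type(stream))                     # <class 'bytes'>

# Deserialization: byte stream -> an equal, but separate, Python object.
restored = pickle.loads(stream)
print(restored == sample_credentials)   # True
print(restored is sample_credentials)   # False
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.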

In [1]:
import pickle
import os

Let's review what's going on in the next code block. In the `if` branch, which runs when the pickle file does not exist yet, we create a dictionary called `Twitter` and use it to store our access credentials: the consumer key, consumer secret, access token, and access token secret. The dictionary is then saved to the pickle file. If the pickled credentials already exist, the `else` branch just loads them from the pickle file into the `Twitter` object.



In [2]:
if not os.path.exists('secret_twitter_credentials1.pkl'):
    Twitter={}
    Twitter['Consumer Key'] = ''
    Twitter['Consumer Secret'] = ''
    Twitter['Access Token'] = ''
    Twitter['Access Token Secret'] = ''
    with open('secret_twitter_credentials1.pkl','wb') as f:
        pickle.dump(Twitter, f)
else:
    # Use a context manager so the file is closed after loading.
    with open('secret_twitter_credentials1.pkl','rb') as f:
        Twitter = pickle.load(f)

Install the `twitter` package to interface with the Twitter API.

In [3]:
!pip install twitter



## Example 1. Authorizing an application to access Twitter account data

In [4]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x107655ef0>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID.

The Yahoo! Where On Earth ID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

You can also look up locations with the BOSS placefinder: https://developer.yahoo.com/boss/placefinder/

In [5]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977

Look up the WOEID for [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca). You can change it to another location.

Using `trends.place` on the Twitter API object we created, we can get the trends for any location. By default, `trends.place` returns the top 50 trends.


In [6]:
LOCAL_WOE_ID=2487889

# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

# top 50 trends for world, US, and local
world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

What we receive back is a response object whose content is JSON. We can look at the response for the world trends by slicing it. In the next code cell, `world_trends[:2]` asks for the first two records. The response is a list that holds a single dictionary, so the slice returns that one record, which carries the full `trends` list along with the `as_of`, `created_at`, and `locations` fields.
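To make the shape of the response concrete, here is a small sketch that works on a hand-written stand-in rather than a live API call. The name `sample_response` and its trimmed contents are assumptions for illustration, modeled on the fields that appear in the notebook output.

```python
# Hand-written stand-in for a trends.place response (trimmed, illustrative).
sample_response = [
    {
        'as_of': '2018-02-04T17:27:07Z',
        'locations': [{'name': 'Worldwide', 'woeid': 1}],
        'trends': [
            {'name': '#TheVoiceKids', 'tweet_volume': 140243},
            {'name': '#SuperBowlSunday', 'tweet_volume': 35337},
            {'name': 'Piqué', 'tweet_volume': 65926},
        ],
    }
]

# The outer list has one record, so slicing it cannot shorten the trends.
print(len(sample_response[:2]))            # 1

# To look at the first two trends, slice the inner 'trends' list instead.
first_two = sample_response[0]['trends'][:2]
print([trend['name'] for trend in first_two])
# ['#TheVoiceKids', '#SuperBowlSunday']
```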


In [7]:
world_trends[:2]

[{'as_of': '2018-02-04T17:27:07Z',
  'created_at': '2018-02-04T17:22:26Z',
  'locations': [{'name': 'Worldwide', 'woeid': 1}],
  'trends': [{'name': '#الباطن_الهلال',
    'promoted_content': None,
    'query': '%23%D8%A7%D9%84%D8%A8%D8%A7%D8%B7%D9%86_%D8%A7%D9%84%D9%87%D9%84%D8%A7%D9%84',
    'tweet_volume': 99970,
    'url': 'http://twitter.com/search?q=%23%D8%A7%D9%84%D8%A8%D8%A7%D8%B7%D9%86_%D8%A7%D9%84%D9%87%D9%84%D8%A7%D9%84'},
   {'name': '#TheVoiceKids',
    'promoted_content': None,
    'query': '%23TheVoiceKids',
    'tweet_volume': 140243,
    'url': 'http://twitter.com/search?q=%23TheVoiceKids'},
   {'name': '#SuperBowlSunday',
    'promoted_content': None,
    'query': '%23SuperBowlSunday',
    'tweet_volume': 35337,
    'url': 'http://twitter.com/search?q=%23SuperBowlSunday'},
   {'name': 'Piqué',
    'promoted_content': None,
    'query': 'Piqu%C3%A9',
    'tweet_volume': 65926,
    'url': 'http://twitter.com/search?q=Piqu%C3%A9'},
   {'name': '#LIVTOT',
    'promoted_con

In [8]:
trends=local_trends
print(type(trends))
print(list(trends[0].keys()))
print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>
['trends', 'as_of', 'created_at', 'locations']
[{'name': '#UFCBelem', 'url': 'http://twitter.com/search?q=%23UFCBelem', 'promoted_content': None, 'query': '%23UFCBelem', 'tweet_volume': 52049}, {'name': '#NFLHonors', 'url': 'http://twitter.com/search?q=%23NFLHonors', 'promoted_content': None, 'query': '%23NFLHonors', 'tweet_volume': 94658}, {'name': 'Air Force', 'url': 'http://twitter.com/search?q=%22Air+Force%22', 'promoted_content': None, 'query': '%22Air+Force%22', 'tweet_volume': 16061}, {'name': '#SuperBowl', 'url': 'http://twitter.com/search?q=%23SuperBowl', 'promoted_content': None, 'query': '%23SuperBowl', 'tweet_volume': 181153}, {'name': '#ChicanoPark', 'url': 'http://twitter.com/search?q=%23ChicanoPark', 'promoted_content': None, 'query': '%23ChicanoPark', 'tweet_volume': None}, {'name': '#UCSD', 'url': 'http://twitter.com/search?q=%23UCSD', 'promoted_content': None, 'query': '%23UCSD', 'tweet_volume': None}, {'name': 'Amtrak', 'url'

## Example 3. Displaying API responses as pretty-printed JSON

Let's import the `json` module and put it to use. In the next cell we call the `dumps` function of `json` to create a more readable version of the same output. The `indent=1` argument sets the indentation: every new nesting level in the JSON is indented by one more space. We can now see the trends.
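As a small self-contained sketch of what `indent` does, the snippet below formats a hand-written dictionary (`nested` is illustrative data, not an API response). It also shows the standard `ensure_ascii=False` option, which the notebook does not use, for keeping non-ASCII trend names readable.

```python
import json

nested = {'trends': [{'name': 'Piqué', 'tweet_volume': 65926}]}

# indent=1 adds one space per nesting level.
pretty = json.dumps(nested, indent=1)
print(pretty)

# By default non-ASCII characters are escaped (é becomes \u00e9);
# ensure_ascii=False writes them out as-is.
readable = json.dumps(nested, indent=1, ensure_ascii=False)
print(readable)
```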

In [9]:
import json

print((json.dumps(us_trends[:2], indent=1)))

[
 {
  "trends": [
   {
    "name": "#SuperBowlSunday",
    "url": "http://twitter.com/search?q=%23SuperBowlSunday",
    "promoted_content": null,
    "query": "%23SuperBowlSunday",
    "tweet_volume": 35337
   },
   {
    "name": "#JanetJacksonAppreciationDay",
    "url": "http://twitter.com/search?q=%23JanetJacksonAppreciationDay",
    "promoted_content": null,
    "query": "%23JanetJacksonAppreciationDay",
    "tweet_volume": 33841
   },
   {
    "name": "#WorldCancerDay",
    "url": "http://twitter.com/search?q=%23WorldCancerDay",
    "promoted_content": null,
    "query": "%23WorldCancerDay",
    "tweet_volume": 156977
   },
   {
    "name": "Amtrak",
    "url": "http://twitter.com/search?q=Amtrak",
    "promoted_content": null,
    "query": "Amtrak",
    "tweet_volume": 43850
   },
   {
    "name": "#LIVTOT",
    "url": "http://twitter.com/search?q=%23LIVTOT",
    "promoted_content": null,
    "query": "%23LIVTOT",
    "tweet_volume": 22445
   },
   {
    "name": "#KittenBowl",
 

## Example 4. Computing the intersection of two sets of trends

We will keep using the `json` module in the upcoming cells, but next let's create sets of these trends for each location and then find the commonalities between the trends for those locations. In other words, we'll find the intersections of these sets.

Here we collect the name of every trend with a comprehension: `trend['name'] for trend in world_trends[0]['trends']`. The first set is for the world trends, which we add to `trends_set` under the `'world'` key; the second is for the US; and the third is for San Diego. Let's run this one.

In [10]:
trends_set = {}
trends_set['world'] = set([trend['name'] 
                        for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name'] 
                     for trend in us_trends[0]['trends']]) 

trends_set['san diego'] = set([trend['name'] 
                     for trend in local_trends[0]['trends']]) 

We now have the `trends_set` dictionary with three sets in it: one for the world, one for the US, and one for San Diego. In the next cell we use a `for` loop that joins all the trends for a particular location into one comma-separated string and prints it. The loop handles the world first, then the US, then San Diego. Let's display the output of the cell.

In [11]:
for loc in ['world','us','san diego']:
    print(('-'*10,loc))
    print((','.join(trends_set[loc])))

('----------', 'world')
#ليفربول_توتنهام,Liverpool 1-0 Tottenham,#بقولك_شي_بيني_وبينك,#domenicalive,#乃木坂工事中,#SMCFCN,#Συλλαλητηριο,#İyiHekimliğeÖzgürlük,احمد اشرف,#HappyJisungDay,#SaldırGALATASARAY,#CRYNEW,Dier,#DateMyFamily,#tubalifekazanc,#EleccionesCR,#SuperBowlSunday,Jon Moss,#UdineseMilan,#Mesaza,#الشباب_الرايد,#LIVTOT,#برشلونه_اسبانيول,#ديربي_جده,#حياتك1,#JanetJacksonAppreciationDay,#OTDirecto4F,#SRFCEAG,#DiaMundialContraElCancer,#ConsultaPopular2018,HOSEOK DAY,#欅って書けない,#الباطن_الهلال,#vvvfey,#FCASGE,#PuppyBowl,Piqué,#MorfiTelefe,Digne,#FelizDomingo,Emre Belözoğlu,#4FRebeliónDeFuturo,#JuveSassuolo,Karius,포카 105종,#ITAvENG,#ajanac,Amtrak,#HSVH96,#TheVoiceKids
('----------', 'us')
Katie Roiphe,Pique,#Juventus,Evan Turner,Kam Williams,Go Viral,#WorldCancerDay,#AMJoy,#CNNSOTU,Pat Robertson,Anfield,#GuacWorld,#KittenBowl,#CapsKnights,Harry Kane,#sundaymotivation,Espanyol,#CRYNEW,Eric Dier,#SportyBaking,#cxworlds,#KraftEntry,#SuperBowlSunday,Reince Priebus,#tphonline,#LIVTOT,#THFC,#BestW

For each location, the output starts with a line containing ten dashes and the location name, followed by all the elements of that location's trend set. We see this first for the world, then for the US, and then for San Diego.

Now, how do we create intersections of these sets? We make use of the Python `set` objects we built in the cell above. A `set` provides an `intersection` method, so let's use that. The first call takes the set of world trends and intersects it with the US set; the second gives the intersection of San Diego with what's going on in the US. Let's print these out.
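Here is a minimal sketch of the intersection on two small, hand-picked subsets of trend names. The variables `world` and `us` are illustrative and do not hold the full API results.

```python
# Illustrative subsets of trend names (not the full API results).
world = {'#LIVTOT', '#SuperBowlSunday', 'Amtrak', '#TheVoiceKids'}
us = {'#LIVTOT', '#SuperBowlSunday', 'Amtrak', '#WorldCancerDay'}

# intersection() returns a new set with the elements common to both.
common = world.intersection(us)
print(sorted(common))           # ['#LIVTOT', '#SuperBowlSunday', 'Amtrak']

# The & operator is equivalent for two sets.
print(common == (world & us))   # True
```

Sets are unordered, so `sorted` is used here only to get a stable printout.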

In [12]:
print(( '='*10,'intersection of world and us'))
print((trends_set['world'].intersection(trends_set['us'])))

print(('='*10,'intersection of us and san-diego'))
print((trends_set['san diego'].intersection(trends_set['us'])))

{'#CRYNEW', '#JanetJacksonAppreciationDay', 'HOSEOK DAY', '#SuperBowlSunday', '#LIVTOT', 'Amtrak'}
{'Katie Roiphe', 'Pique', 'Evan Turner', 'Kam Williams', 'Go Viral', '#WorldCancerDay', '#AMJoy', '#CNNSOTU', 'Pat Robertson', 'Anfield', '#GuacWorld', '#KittenBowl', '#CapsKnights', 'Harry Kane', 'Espanyol', '#CRYNEW', 'Eric Dier', '#SportyBaking', '#cxworlds', '#tphonline', 'Reince Priebus', '#LIVTOT', '#THFC', '#BestWayToHelpOthers', 'Daniel Nations', '#RosaParks', '#JanetJacksonAppreciationDay', 'Leon Panetta', '#BlackMenSmilling', 'HOSEOK DAY', '#theboyzselcaday', 'Go Eagles', '#SundayFutures', '#4Feb', '#PeopleSkills', '#HackLearning', 'Umtiti', 'Amtrak', '#ThisWeek', 'Scot McCloughan', '#ThingsILearnedFromSports'}


The intersection of the world and the US is, as you see, a much smaller set. The intersection of San Diego and the US is larger, because nationwide trends also show up locally in San Diego. How about if we want to search Twitter for a particular topic or hashtag? Let's move on to the next section to find out.

## Example 5. Collecting search results

Set the variable `q` to a trending topic, or anything else for that matter. The example query below was a trending topic when this content was being developed and is used throughout the remainder of this chapter.

The Twitter API provides `search.tweets`, which lets us search for tweets. One of its arguments is the query, `q`. The other is `count`, the number of tweets we want the function to return, which we store in a variable called `number`. Let's see what we get back as statuses.

In [13]:
q = '#MTVAwards' 

number = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses']

In [14]:
len(statuses)
print(statuses)

[{'created_at': 'Sun Feb 04 15:24:51 +0000 2018', 'id': 960172304008339457, 'id_str': '960172304008339457', 'text': 'RT @MTV: Let @dylanobrien guide you through a first look at Maze Runner: The Death Cure, exclusively for the #MTVAwards tonight at 8/7c! 💥…', 'truncated': False, 'entities': {'hashtags': [{'text': 'MTVAwards', 'indices': [109, 119]}], 'symbols': [], 'user_mentions': [{'screen_name': 'MTV', 'name': 'MTV', 'id': 2367911, 'id_str': '2367911', 'indices': [3, 7]}, {'screen_name': 'dylanobrien', 'name': "Dylan O'Brien", 'id': 281766200, 'id_str': '281766200', 'indices': [13, 25]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 2589360386, 'id_str': '2589360386', 'name': 'hjyb

Twitter often returns duplicate results; we can filter them out by checking for duplicate texts. We use a `for` loop to clean the data and build a list of the unique statuses, called `filtered_statuses`. For each status `s`, we check whether its text, which is the tweet message, is already in `all_text`, the list of texts we have seen so far. If it is not, we append the status to `filtered_statuses` and its text to `all_text`. When the loop is done, we assign `filtered_statuses` back to `statuses`, the object that held the response from Twitter before.

In [15]:
all_text = []
filtered_statuses = []
for s in statuses:
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
statuses = filtered_statuses     
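
The list-based membership check above rescans `all_text` for every status. As an alternative sketch, a `set` of seen texts does the same job with constant-time lookups while keeping the original order. The function name `drop_duplicate_texts` and the sample data are hypothetical, not part of the notebook.

```python
def drop_duplicate_texts(statuses):
    """Return statuses with unique 'text' values, keeping first occurrences."""
    seen = set()
    unique = []
    for status in statuses:
        if status['text'] not in seen:
            seen.add(status['text'])
            unique.append(status)
    return unique

sample = [
    {'id': 1, 'text': 'RT @MTV: first look'},
    {'id': 2, 'text': 'Me pure years ago'},
    {'id': 3, 'text': 'RT @MTV: first look'},   # same text as id 1
]
print([s['id'] for s in drop_duplicate_texts(sample)])   # [1, 2]
```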

In [1]:
print(len(statuses))
print(len(all_text))

In [26]:
[s['text'] for s in search_results['statuses']]

['RT @MTV: Let @dylanobrien guide you through a first look at Maze Runner: The Death Cure, exclusively for the #MTVAwards tonight at 8/7c! 💥…',
 'Me pure years ago 🤣🤣🤣 #Mtvawards #ladyboss 🤣🤣🤣🤣 https://t.co/0JZB73ifg0',
 'RT @MTV: Let @dylanobrien guide you through a first look at Maze Runner: The Death Cure, exclusively for the #MTVAwards tonight at 8/7c! 💥…',
 'RT @LatinasWinning: "¡My gente latina stand up!" -  @Camila_Cabello #MTVAwards https://t.co/mIeB30c3ec',
 'RT @BellaTwins: My 40 year old ❤ #MTVAwards https://t.co/M3MwwHaN9a',
 'RT @EW: #HiddenFigures wins Best Fight Against the System at the #MTVAwards! https://t.co/UKX49xAMag',
 'RT @Kabeerisgod: #MTVAwards\nStop dances and songs and pray for complete salvation\nhttps://t.co/RPLANoFnu6 https://t.co/n5cVHVBBcn',
 '@RCARecords Thanks For Finally Putting The Proper Time &amp; Money Into @Tinashe Now @mike_nazzaro Make Sure @Tinashe P… https://t.co/F1WrsjHDus',
 "RT @tylergposey: I'm taking over @mtv's snapchat account from the

In [27]:
# Show one sample search result by indexing into the list...
print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Sun Feb 04 15:24:51 +0000 2018",
 "id": 960172304008339457,
 "id_str": "960172304008339457",
 "text": "RT @MTV: Let @dylanobrien guide you through a first look at Maze Runner: The Death Cure, exclusively for the #MTVAwards tonight at 8/7c! \ud83d\udca5\u2026",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "MTVAwards",
    "indices": [
     109,
     119
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [
   {
    "screen_name": "MTV",
    "name": "MTV",
    "id": 2367911,
    "id_str": "2367911",
    "indices": [
     3,
     7
    ]
   },
   {
    "screen_name": "dylanobrien",
    "name": "Dylan O'Brien",
    "id": 281766200,
    "id_str": "281766200",
    "indices": [
     13,
     25
    ]
   }
  ],
  "urls": []
 },
 "metadata": {
  "iso_language_code": "en",
  "result_type": "recent"
 },
 "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
 "in_reply_to_status_id": null,
 "in_reply_to_statu

In [37]:
# Take the first status and assign it to the variable t.
t = statuses[0]
# Alternatively, select a status by its ID with a list comprehension; the
# result is a list with only one element that can be accessed by its index:
#[ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiarized with the data structure...

print(t['retweet_count'])
print(t['retweeted'])
print(t['created_at'])


8594
False
Sun Feb 04 15:24:51 +0000 2018


## Example 6. Extracting text, screen names, and hashtags from tweets


Next we'll extract the text, screen names, and hashtags from all these records and assign them to lists. We'll call the first list `status_texts`.

In [29]:
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets by using split() function
words = [ w 
          for t in status_texts 
              for w in t.split() ]
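
The nested comprehensions above read left to right like nested `for` loops. The following sketch spells one of them out on hand-written data; `sample_statuses` and the other `sample_` names are illustrative and are not API responses.

```python
sample_statuses = [
    {'text': 'RT @MTV: hello #MTVAwards',
     'entities': {'hashtags': [{'text': 'MTVAwards'}],
                  'user_mentions': [{'screen_name': 'MTV'}]}},
    {'text': 'no tags here',
     'entities': {'hashtags': [], 'user_mentions': []}},
]

# Comprehension form, as used in the notebook.
sample_hashtags = [hashtag['text']
                   for status in sample_statuses
                       for hashtag in status['entities']['hashtags']]

# Equivalent explicit loops: outer loop first, inner loop second.
sample_hashtags_loop = []
for status in sample_statuses:
    for hashtag in status['entities']['hashtags']:
        sample_hashtags_loop.append(hashtag['text'])

print(sample_hashtags)                           # ['MTVAwards']
print(sample_hashtags == sample_hashtags_loop)   # True
```

A status with no hashtags simply contributes nothing to the result.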

In [30]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "RT @MTV: Let @dylanobrien guide you through a first look at Maze Runner: The Death Cure, exclusively for the #MTVAwards tonight at 8/7c! \ud83d\udca5\u2026",
 "Me pure years ago \ud83e\udd23\ud83e\udd23\ud83e\udd23 #Mtvawards #ladyboss \ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23 https://t.co/0JZB73ifg0",
 "RT @LatinasWinning: \"\u00a1My gente latina stand up!\" -  @Camila_Cabello #MTVAwards https://t.co/mIeB30c3ec",
 "RT @BellaTwins: My 40 year old \u2764 #MTVAwards https://t.co/M3MwwHaN9a",
 "RT @EW: #HiddenFigures wins Best Fight Against the System at the #MTVAwards! https://t.co/UKX49xAMag"
]
[
 "MTV",
 "dylanobrien",
 "LatinasWinning",
 "Camila_Cabello",
 "BellaTwins"
]
[
 "MTVAwards",
 "Mtvawards",
 "ladyboss",
 "MTVAwards",
 "MTVAwards"
]
[
 "RT",
 "@MTV:",
 "Let",
 "@dylanobrien",
 "guide"
]


In the code cell above, we use `json.dumps` to display the first five items of each list. Please check the output to get familiar with these lists. The first one gives us the status texts. The second gives us the screen names that are mentioned. The third gives us the hashtags; remember that we searched for `#MTVAwards`, but the tweets carry other hashtags as well, so we also get those. The last one gives us the words in these tweets. Next we'll find the frequencies of these words.


## Example 7. Creating a basic frequency distribution from the words in tweets

Python offers a `Counter` class for counting the number of occurrences of each item in a collection. We will use it to count the frequencies of all the words in the tweets, which gives us the most commonly used words. In the cell below, `for item in [words, screen_names, hashtags]` iterates over the lists we created. For each one we build a `Counter`, first for the words, then the screen names, then the hashtags, and print its 10 most common entries. Let's display this.

In [31]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common(10)) # top 10
    print()

[('RT', 39), ('#MTVAwards', 32), ('the', 21), ('&amp;', 12), ('de', 12), ('a', 10), ('Movie', 10), ('TV', 10), ('at', 9), ('and', 9)]

[('SGNewsSpain', 9), ('MTV', 6), ('Camila_Cabello', 4), ('EmmaWatson', 4), ('billboard', 4), ('Tinashe', 2), ('tylergposey', 2), ('hsmnews', 2), ('_franciscompany', 2), ('katherinelchile', 2)]

[('MTVAwards', 36), ('SelenaBBMAs', 9), ('mtvawards', 6), ('ToyotaCHR', 2), ('Mtvawards', 1), ('ladyboss', 1), ('HiddenFigures', 1), ('pitbull', 1), ('topshot', 1), ('MTVAWARDS', 1)]



In the first list, for the words, `RT` and `#MTVAwards` come out on top. The second list is for screen names; these are the most commonly mentioned users. In the third, `MTVAwards` and other hashtags are represented. You can also see that `MTVAwards` appears several times with different capitalization: upper case, lower case, and combinations. These variants are counted separately because we didn't clean the data, for example by turning all text into upper or lower case.

A `Counter` behaves like a dictionary, so we can also look up the count for a single word by indexing it with that word. So the counter works fine, and `most_common` gives us the top entries, which we just print. But the output is still hard to read: it is a list of tuples, so it carries a lot of syntax such as parentheses, square brackets, and quotes. There's a better way to do it in Python, using advanced string formatting, and we'll use that in the next example.
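
The two points above, dictionary-style lookup and the effect of capitalization, can be sketched with the hashtag counts from the output of Example 7. The list `sample_hashtags` is rebuilt by hand from those counts, and lower-casing is one possible cleaning step, assumed here for illustration.

```python
from collections import Counter

# Hashtag counts from the Example 7 output, expanded back into a list.
sample_hashtags = (['MTVAwards'] * 36 + ['mtvawards'] * 6 +
                   ['Mtvawards'] + ['MTVAWARDS'] + ['SelenaBBMAs'] * 9)

raw = Counter(sample_hashtags)
print(raw['MTVAwards'])        # 36
print(raw['not-a-hashtag'])    # 0 (missing keys count as zero)

# Lower-case every hashtag before counting to merge the case variants.
normalized = Counter(tag.lower() for tag in sample_hashtags)
print(normalized.most_common(2))
# [('mtvawards', 44), ('selenabbmas', 9)]
```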

## Example 8. Create a prettyprint function to display tuples in a nice tabular format

We are using the padding and alignment options of advanced string formatting. You can pad a string to a specific width with a format specification such as `{:20}`, which always prints at least 20 characters. Strings are left-aligned by default, `>` gives right alignment, and `^` centers the value. Let's go through this: we are creating a pretty-print function, and in its format strings we specify whether each column is centered, right-aligned, or left-aligned. The technical word for this is prettyprinting.
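
The alignment options can be tried in isolation before reading the function. This is a small sketch with illustrative values; `repr` is used only to make the padding spaces visible.

```python
# < left-aligns, > right-aligns, ^ centers; the number is the minimum width.
left = "{:<20}".format("RT")
right = "{:>6}".format(39)
centered = "{:^20}".format("Word")

print(len(left))        # 20
print(repr(right))      # '    39'
print(repr(centered))   # '        Word        '

# The width is a minimum: longer values are not truncated.
print("{:5}".format("#MTVAwards"))   # #MTVAwards
```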


In [32]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^20} | {:^6}".format(label, "Count"))
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:20} | {:>6}".format(k,v))

In [33]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common()[:10])


        Word         | Count 
****************************************
RT                   |     39
#MTVAwards           |     32
the                  |     21
&amp;                |     12
de                   |     12
a                    |     10
Movie                |     10
TV                   |     10
at                   |      9
and                  |      9

    Screen Name      | Count 
****************************************
SGNewsSpain          |      9
MTV                  |      6
Camila_Cabello       |      4
EmmaWatson           |      4
billboard            |      4
Tinashe              |      2
tylergposey          |      2
hsmnews              |      2
_franciscompany      |      2
katherinelchile      |      2

      Hashtag        | Count 
****************************************
MTVAwards            |     36
SelenaBBMAs          |      9
mtvawards            |      6
ToyotaCHR            |      2
Mtvawards            |      1
ladyboss             |      1
Hidd

## Example 9. Finding the most popular retweets

In [34]:
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n","\\")) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]

We can build another `prettyprint` function to print entire tweets with their retweet count.

We also want to split the text of the tweet in up to 3 lines, if needed.

In [35]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print()
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50]))
        if len(text) > 50:
            print(row_template.format("", "", text[50:100]))
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))
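
The function above puts everything past the first 100 characters on the third line, so a long tweet can overflow the 50-character column. As a sketch of one alternative, a small helper can cut the text into as many fixed-width pieces as needed. `chunk_text` is a hypothetical name, not part of the notebook.

```python
def chunk_text(text, width=50):
    """Split text into consecutive pieces of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]

pieces = chunk_text("x" * 120, width=50)
print([len(piece) for piece in pieces])   # [50, 50, 20]
print(chunk_text("", width=50))           # []
```

Each piece could then be passed to `row_template.format` in a loop, with the count and screen name filled in only for the first piece.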

In [36]:
# Slice off the first 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10])


 Count  |   Screen Name   | Text                                              
************************************************************
 14044  |       MTV       | RT @MTV: Don’t turn off the lights during this bra
        |                 | nd new clip of "IT" from the #MTVAwards airing rig
        |                 | ht now! https://t.co/2rTf7HcATO                   
 8594   |       MTV       | RT @MTV: Let @dylanobrien guide you through a firs
        |                 | t look at Maze Runner: The Death Cure, exclusively
        |                 |  for the #MTVAwards tonight at 8/7c! 💥…           
 6389   |   EmmaWatson    | RT @EmmaWatson: Thank you @MTV for a wonderful eve
        |                 | ning and thank you to everyone who voted for me! ❤
        |                 | ️🍿 #MTVAwards @beourguest                         
 4686   |       MTV       | RT @MTV: Thank you for your beautiful Best Actor i
        |                 | n a Movie acceptance speech at the #MTVAw

To get more familiar with the Twitter API, try executing the whole notebook again, but this time change the location of the local trends to something other than San Diego, maybe your own location. Also change the hashtag in Example 5 to another topic. If it's a topic you're interested in, you'll likely find some interesting scenarios to analyze. You could then use some of this data to build a bag-of-words model and identify further topics or sentiments related to that topic, or apply other classification algorithms.