<DIV ALIGN=CENTER>

# Introduction to Twitter data processing
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Introduction

When looking for data to use for text data processing, one of the more
popular data sources is [Twitter][tw]. In this Notebook, we introduce
the Twitter API, and demonstrate how to use the Twitter API from within
a Python program to acquire and process tweets, or Twitter messages.

-----
[tw]: https://www.twitter.com

## Python and Twitter

To work with the Twitter API from within a Python program, we need a
Python library that wraps the official [Twitter API][twapi]. A number of
Python libraries provide this capability; we will use the [tweepy][tpy]
library, which is popular and provides a fairly complete interface.

The full Twitter API is large and robust (and continues to evolve). For
this course we will restrict our attention to several basic concepts,
namely authenticating to Twitter, searching for Tweets, and digesting
the messages.

----
[twapi]: https://dev.twitter.com
[tpy]: http://www.tweepy.org

In [1]:
import tweepy as tw

-----

## Reading Twitter Data

To read Twitter data, you first need to be a registered Twitter user,
and you need to create a new _Twitter Application_ in order to obtain
credentials for connecting to Twitter and querying the Twitter data. You
create (and later manage) Twitter applications by visiting the
[Twitter Application Management](https://apps.twitter.com) website.

![Twitter App Sign-in](images/twitter-app-signin.png)

At this point you need to authenticate with Twitter. If you are already
logged in to Twitter on your computer (for instance by using the Twitter
website), you should already be authenticated. If you are not
authenticated, click the _sign in_ link to be directed to the Twitter
sign-in page, where you can enter your credentials (if you do not have
Twitter credentials, you will need to obtain a Twitter account to
proceed).

![Twitter Sign-in](images/twitter-signin.png)

After you have been authenticated, you will be redirected to the Twitter
apps page. If you have never created a Twitter application, you will
have nothing listed. To create a new application, press the _Create New
App_ button, as shown in the following screenshot.

![Twitter Create App](images/twitter-create.png)

This will open up the Twitter _Create an application_ webpage, where you
need to supply some basic information for your Twitter application such
as an application name, description, and website.

![Twitter Application details](images/twitter-appdetails.png)

Scroll to the bottom of this webpage, where the **Developer Agreement**
is located. Below this agreement is a check box that you should click to
signify that you agree to be bound by the agreement (of course, you
should read it first to be sure you do _agree_ with it). Then press the
_Create your Twitter application_ button, as shown in the following
screenshot.

![Twitter Agree](images/twitter-agree.png)

This will create your new application, and provide you with your
application webpage, which will be similar to the following screenshot.

![Twitter Apppage](images/twitter-apppage.png)

While you can control a number of application features from this
webpage, the most important tasks to complete include:

1. Change your application to _read-only_ in case it is set to
read-write.

2. Obtain the application **Consumer Key** and **Consumer Secret**.

3. Obtain your personal **Access Token** and **Access Token Secret**.

You should change your application to read-only to ensure you don't
accidentally send data out to Twitter. You change this by selecting the
_Permissions_ tab and selecting _Read only_, as shown in the following
screenshot. To save this setting, scroll down this webpage and click the
_Update Settings_ button at the bottom of the page.

![Twitter Read Only Setting](images/twitter-ro.png)

The consumer and access credentials can be found by selecting the _Keys
and Access Tokens_ tab and scrolling down as needed, as shown in the
following two screenshots.

![Twitter Consumer Application Credentials](images/twitter-consume.png)

![Twitter User Credentials](images/twitter-access.png)

<font color='red'>Warning: Never share these credentials with others or
they will be able to fully impersonate you on Twitter!</font>

You can directly copy these credentials into your Notebook, or,
alternatively, save them into a file (for example, by opening a terminal
window and using `vim` to create a text file). In the rest of this
Notebook, I demonstrate this functionality by using my credentials,
which I have saved into a file called `twitter.cred`. This file, an
empty copy of which is in your GitHub repository, holds the following
four credentials, one per line, in order:

1. Access Token
2. Access Token Secret
3. Consumer Key
4. Consumer Secret
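
As a minimal sketch of this layout, assuming one credential per line and
that lines starting with `#` are comments, the file could be parsed into
named fields as follows. The placeholder values and the helper name
`read_credentials` are hypothetical and are not part of Tweepy.

```python
import os
import tempfile

# Field names in the order the credentials appear in the file.
FIELDS = ("access_token", "access_token_secret",
          "consumer_key", "consumer_secret")


def read_credentials(path):
    """Return a dict of the four credentials, skipping comments and blank lines."""
    with open(path, "r") as fin:
        values = [line.strip() for line in fin
                  if line.strip() and not line.startswith("#")]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d credentials, found %d"
                         % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


# Demonstration with placeholder values (never commit real credentials).
sample = "# Order: Access Token, Access Token Secret, Consumer Key, Consumer Secret\nAAAA\nBBBB\nCCCC\nDDDD\n"
with tempfile.NamedTemporaryFile("w", suffix=".cred", delete=False) as tmp:
    tmp.write(sample)

creds = read_credentials(tmp.name)
os.remove(tmp.name)
print(creds["consumer_key"])
```

Returning a dictionary rather than a positional list makes it harder to
swap the consumer and access values by accident when they are passed on
for authentication.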

The following code cell demonstrates how these credentials are read from
the file and used to properly authenticate our application with Twitter.

-----



In [2]:
tokens = []

# Order: Access Token, Access Token Secret, Consumer Key, Consumer Secret

with open("twitter.cred", 'r') as fin:
    for line in fin:
        if line[0] != '#': # Not a comment line
            tokens.append(line.rstrip('\n'))

auth = tw.OAuthHandler(tokens[2], tokens[3])
auth.set_access_token(tokens[0], tokens[1])

api = tw.API(auth)

user = api.me()

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

print("\nThis user follows:\n--------------")
for friend in user.friends():
    print(friend.screen_name)

Twitter Screen Name:  ProfBrunner
Twitter Follower Count:  138

This user follows:
--------------
LauraFrerichs
googleresearch
jakevdp
powersoffour
NateSilver538
flowingdata
twiecki
EdwardTufte
wesmckinn
fonnesbeck
aebrunn
jarrettebrunner


-----

If the previous code cell runs without an error, you have successfully
connected to Twitter. If you are new to Twitter and are not following
anyone, you can instead display the user information for a different
Twitter user. For example, the following code would retrieve my Twitter
information.

```python
user = api.get_user('ProfBrunner')
```

Replacing `ProfBrunner` with any valid Twitter screen name will retrieve
that user's information. You can find examples by looking at those
Twitter users you (or `ProfBrunner`) follow.

At any point, you can return to your Twitter application management
webpage to view your new application. You can now view and manage your
existing application, or create a new application as shown in the
following screenshot.

![Twitter new app management](images/twitter-manage.png)


-----

## Breakout Session

During this breakout, you should work through the previous Twitter
application setup in order to better learn how Twitter, and in
particular the Tweepy Python library, work. In addition, this will
guarantee you can follow along with the rest of this Notebook. Specific
problems you can attempt include the following:

1. Create a New Twitter application.

2. Save your Twitter credentials and Application credentials into the
provided `twitter.cred` file.

3. Run the _tweepy_ sample code to connect to Twitter and display your
Twitter user information.

Additional, more advanced problems:

1. Run the Twitter example above but for `ProfBrunner` instead.

2. Find the Twitter username for someone else (perhaps someone famous,
or someone else you know) and run the example code using their name.

-----

### Obtaining Tweets

Once you have authenticated with Twitter, you can begin to [search the
Twitter stream][stw] for tweets of interest. The easiest way to get
started is to begin with your own (or another specific Twitter user's)
Twitter feed. To access your own Twitter feed, you can simply use your
`home_timeline` to retrieve your own Tweets or Tweets from those whom
you follow. This is demonstrated in the following code cell, where we
display the `text` values from the ten most recent Tweets from our
timeline.

-----
[stw]: https://dev.twitter.com/rest/public/search

In [3]:
for status in tw.Cursor(api.home_timeline).items(10):
    # Process a single status
    print(status.text) 

Welcome to Buffalo for @AURP Research Park conference kicks off @DigBuffalo coworking space, healthcare tech booming http://t.co/x165NzWxp7
It suddenly got really cold. Not happy about this.
4.4% =
a) Patriots' chances of going undefeated? -or-
b) Trump's chances of winning the GOP nomination?
http://t.co/LAld48d4eD
RT @PyTennessee: The CFP opens tomorrow! Learn more at https://t.co/IBFogCMTZn #pytn2016
RT @cjam: Whoa, man #superbloodmoon http://t.co/oH5eneEm0O http://t.co/xnj1RLOL8h
RT @twitter: Today @Snowden joined Twitter, and here's the world's response. http://t.co/d6HgVvdRsf
"The day we buried Brady"
Excellent debunking of 
sports media panic and
microscopic attention span.
By . . . ESPN!
http://t.co/5zn1jtx2MQ
EFF, Privacy ratings for major companies.
Telcos flunk.
Apple, DropBox, Adobe doing well.
https://t.co/dWEUuezqGS http://t.co/saIKmaxi9I
Huge lineup for @tlipcon talk on @getkudu #StrataHadoop http://t.co/DvGZqwwMda
Former intern Alessandro Epasto shares his experiences w

-----

### Searching

Twitter also provides the capability to search for specific tweets by
using the Tweepy [`search` method][twse]. With this method, you supply a
query string (and optional arguments) and are returned a list of Tweets.
The query string should follow the [Twitter Search API][tsa]. In brief,
you can search for specific text by using the text of interest, for a
person by using the `@` character followed by their Twitter username,
and for a hashtag by using the `#` character followed by the tag text.
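
As a small illustration of these three query forms, the following sketch
builds query strings from optional pieces. The helper `build_query` is
hypothetical (it is not part of Tweepy), and the search terms are only
examples; the resulting string is what would be passed as the query
argument. To my understanding, space-separated terms are treated so that
all of them must match, but check the Twitter Search API documentation
before relying on that.

```python
def build_query(text=None, mention=None, hashtag=None):
    """Combine optional search pieces into one query string (hypothetical helper)."""
    parts = []
    if text:
        parts.append(text)
    if mention:
        # Accept the username with or without a leading '@'
        parts.append("@" + mention.lstrip("@"))
    if hashtag:
        # Accept the tag with or without a leading '#'
        parts.append("#" + hashtag.lstrip("#"))
    return " ".join(parts)


print(build_query(text="analytics"))
print(build_query(mention="UIResearchPark"))
print(build_query(hashtag="networking"))
print(build_query(mention="UIResearchPark", hashtag="#networking"))
```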

-----

[twse]: http://docs.tweepy.org/en/stable/api.html#API.search
[tsa]: https://dev.twitter.com/rest/public/search

In [4]:
term ='UIResearchPark'

messages = []

current_page = 1
max_pages = 2

while(current_page <= max_pages):
    tweets = api.search(term, rpp=5)
    for tweet in tweets:
        messages.append(tweet)
    current_page += 1

for message in messages:
    print("Tweet ID:", message.id)
    print('Tweeted by ', message.user.screen_name)
    print("Created at ",message.created_at)
    print("Location: ",message.source)  # Note: source is the posting client, not a place
    print('Tweet Text: ', message.text)
    print('-------------------------')


Tweet ID: 649308367060619264
Tweeted by  UIResearchPark
Created at  2015-09-30 19:42:32
Location:  Hootsuite
Tweet Text:  Join us on October 6 for Data Analytics After Hours: #networking event w/local companies http://t.co/fJfOa0U3tS http://t.co/23F0sDYhQN
-------------------------
Tweet ID: 649297680917639168
Tweeted by  kolegraffvclink
Created at  2015-09-30 19:00:05
Location:  Twuffer
Tweet Text:  Are you an Entrepreneur at the University of Illinois in Champaign ? If so, you will want to look at this. See=&gt; https://t.co/TVMxlEdo9d
-------------------------
Tweet ID: 649296121949347840
Tweeted by  SusanEyman
Created at  2015-09-30 18:53:53
Location:  Twitter Web Client
Tweet Text:  RT @HealthTechForum: Oct 1 Did you know? @vprillinois @UICOUResearch @ILinnovations @UIResearchPark about the new HTF Chicago chapter?   ht…
-------------------------
Tweet ID: 649293399238012928
Tweeted by  HealthTechForum
Created at  2015-09-30 18:43:04
Location:  Hootsuite
Tweet Text:  Oct 1 Did you
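
Note that the loop in the cell above passes the same arguments to
`api.search` on every pass, so each pass may return the same tweets and
`messages` can hold duplicates. A minimal sketch of removing duplicates
by tweet `id` follows; it uses a stand-in `FakeTweet` type (an
assumption, used only so the example runs without a Twitter connection)
rather than real Tweepy status objects.

```python
from collections import namedtuple

# Stand-in for a Tweepy status object; only the fields used here.
FakeTweet = namedtuple("FakeTweet", ["id", "text"])


def unique_by_id(tweets):
    """Return tweets with duplicate ids removed, keeping first-seen order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet.id not in seen:
            seen.add(tweet.id)
            unique.append(tweet)
    return unique


page = [FakeTweet(1, "first"), FakeTweet(2, "second")]
fetched = page + page  # the same page retrieved twice
print(len(fetched), len(unique_by_id(fetched)))
```

The same function should work on real status objects, since it only
reads the `id` attribute.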

-----

We can view the attributes available on a Tweet by using Python's
built-in `dir` function to perform introspection. In the following code
cell we explicitly remove the special (double-underscore) names to
shorten the list and focus on the items of interest. After this, we
display the Tweet in its raw JSON format by accessing the `_json`
attribute.

-----

In [5]:
[att for att in dir(message) if '__' not in att]

['_api',
 '_json',
 'author',
 'contributors',
 'coordinates',
 'created_at',
 'destroy',
 'entities',
 'favorite',
 'favorite_count',
 'favorited',
 'geo',
 'id',
 'id_str',
 'in_reply_to_screen_name',
 'in_reply_to_status_id',
 'in_reply_to_status_id_str',
 'in_reply_to_user_id',
 'in_reply_to_user_id_str',
 'is_quote_status',
 'lang',
 'metadata',
 'parse',
 'parse_list',
 'place',
 'possibly_sensitive',
 'retweet',
 'retweet_count',
 'retweeted',
 'retweets',
 'source',
 'source_url',
 'text',
 'truncated',
 'user']

In [6]:
# We can display the message data in JSON format
message._json

{'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Sep 29 19:00:04 +0000 2015',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [{'display_url': 'twitter.com/UIResearchPark',
    'expanded_url': 'https://twitter.com/UIResearchPark',
    'indices': [116, 139],
    'url': 'https://t.co/TVMxlEdo9d'}],
  'user_mentions': []},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 648935289855983616,
 'id_str': '648935289855983616',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'en',
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'place': None,
 'possibly_sensitive': False,
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://twuffer.com" rel="nofollow">Twuffer</a>',
 'text': 'Are you an Entrepreneur at the University of Illinois in Champaign ? If so, you will want to 

## Persisting Tweets

Since Twitter enforces rate limits, any large analysis of tweets will
likely need to persist the Twitter search results. We can build on the
[Introduction to MongoDB][i2mdb] Notebook to persist Twitter data in our
MongoDB. First, we will need to establish a connection to our MongoDB.
Next, we will create a new database and collection to hold our tweets.
Finally, we will iterate through the tweets we retrieved in our earlier
search to populate the new collection.

After building our tweet collection, we perform several simple queries
to demonstrate the power of combining MongoDB and Twitter.


-----
[i2mdb]: intro2mongodb.ipynb

In [7]:
from pymongo import MongoClient

# Establish a connection to MongoDB (uncomment only one of these lines)

# For remote course server use
#client = MongoClient("mongodb://10.0.3.126:27017")

# For local Docker server use
client = MongoClient("mongodb://localhost:27017")

In [8]:
# We will delete our working database if it exists before recreating it.

dbname = 'tweet-database'
if dbname in client.list_database_names():
    client.drop_database(dbname)

db = client[dbname]
tweets = db['tweets']

# Insert the raw JSON dictionary of each tweet as a MongoDB document
for message in messages:
    result = tweets.insert_one(message._json)


In [9]:
print("Number of tweets = ", tweets.count_documents({}))

Number of tweets =  30


In [10]:
tweet = tweets.find_one()

print("Tweet ID:", tweet['id'])
print('Tweeted by ', tweet['user']['screen_name'])
print("Created at ", tweet['created_at'])
print("Location: ", tweet['source'])
print('Tweet Text: ', tweet['text'])

Tweet ID: 649308367060619264
Tweeted by  UIResearchPark
Created at  Wed Sep 30 19:42:32 +0000 2015
Location:  <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
Tweet Text:  Join us on October 6 for Data Analytics After Hours: #networking event w/local companies http://t.co/fJfOa0U3tS http://t.co/23F0sDYhQN
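
The `created_at` value stored in MongoDB is a plain string in the format
shown above, so time-based comparisons are easier after converting it to
a `datetime`. The following is a minimal sketch using only the standard
library; the format string is inferred from the sample output, and the
helper name `parse_created_at` is hypothetical.

```python
from datetime import datetime

# Format inferred from the created_at strings in the stored tweet documents.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


stamp = parse_created_at("Wed Sep 30 19:42:32 +0000 2015")
print(stamp.isoformat())
print("Hour of day (UTC):", stamp.hour)
```

Note that the offset in these strings is `+0000`, so the parsed hour is
in UTC rather than local time.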


In [11]:
for tweet in tweets.find({"retweet_count": {'$gte': 5}}).sort('_id'):
    print("Tweet ID:", tweet['id'])
    print(tweet['text'])
    print('-------------------------')

Tweet ID: 649203518574870528
RT @LauraFrerichs: Tim Hassinger, CEO of @DowAgro at grand opening of new Innovation Lab @UIResearchPark "it's a special relationship" http…
-------------------------
Tweet ID: 649203518574870528
RT @LauraFrerichs: Tim Hassinger, CEO of @DowAgro at grand opening of new Innovation Lab @UIResearchPark "it's a special relationship" http…
-------------------------


-----
## Breakout Session

During this breakout, you should work to integrate your Twitter
application with a MongoDB (you probably want to create your own MongoDB
database and collection). Specific problems you can attempt include the
following:

1. Pick a Twitter user, obtain one hundred of their tweets, and insert
these into your MongoDB database.

2. Query your Twitter database to retrieve any tweets that were sent
between 12:00 pm and 6:00 pm on any day.

Additional, more advanced problems:

1. Write a Twitter application that retrieves chunks of tweets for a
particular user or hashtag. In order to stay within the rate limits,
keep the chunk retrieval below your rate limit by employing `sleep` in
your code. Insert these tweets continuously into MongoDB.

-----

-----
### Additional References

1. [Twitter Developer][twd] Documentation
2. Twitter [REST API][twdr] Documentation
3. [Tweepy][twpy] Documentation

-----
 
[twd]: https://dev.twitter.com/overview/documentation
[twdr]: https://dev.twitter.com/rest/public
[twpy]: http://docs.tweepy.org/en/stable/index.html

### Return to the [Week Three](index.ipynb) index.

-----