<DIV ALIGN=CENTER>

# Introduction to Twitter data processing
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Introduction

Previously in this course, we have discussed .

In this IPython Notebook, we explore using .

-----

## Python and Twitter

Use library to interact with Twitter API from within a Python program. Many to choose from, we will use tweepy.

http://docs.tweepy.org/en/stable/

First step is to import the library.

----

In [1]:
import tweepy as tw

-----

## Reading Twitter Data

To read twitter data, you need to first need to be a registered Twitter
user and you need to create a new _Twitter Application_ in order to
obtain credentials for connecting to Twitter and querying to the
Twitter data. You create (and later manage) Twitter applications by
visting the [Twitter Application Management](https://apps.twitter.com)
website.

![Twitter App Sign-in](images/twitter-app-signin.png)

At this point you need to authenticate with Twitter, if you are already
logged in to Twitter on your computer (for instance by using the Twitter
website) you should already be authenticated. If you are not
authenticated, click the _sign in_ link to be directed to the Twitter
signin page where you can enter your credentials (if you do not have
Twitter credentials, you will need to obtain a Twitter account to
proceed).

![Twitter Sign-in](images/twitter-signin.png)

After you have been authenticated, you will be redirected to the Twitter
apps page. If you have never created a Twitter application, you will
have nothing listed. To create a new application, press the _Create New
App_ button, as shown in the following screenshot.

![Twitter Create App](images/twitter-create.png)

This will open up the Twitter _Create an application_ webpage, where you
need to supply some basic information for your Twitter application such
as an application name, description, and website.

![Twitter Application details](images/twitter-appdetails.png)

Scroll to the bottom of this webpage where the **Developer Agreement**
is located. Following this agreement, is a check box that you should
click to signify you agree to be bound by the agreement (of course you
should read this to be sure you do _agree_ with it first). Following
this, press the _Create your Twitter application_ button as shown in the
following screenshot.

![Twitter Agree](images/twitter-agree.png)

This will create your new application, and provide you with your
application webpage, which will be similar to the following screenshot.

![Twitter Apppage](images/twitter-apppage.png)

While you can control a number of application features from this
webpage, the most important tasks to complete include:

1. Change your application to _read-only_ in case it is set to
read-write.

2. Obtain the application **Consumer Key** and **Consumer Secret**.

3. Obtain your personal **Access Token** and **Access Token Secret**.

You should change your application read-only to ensure you don't
accidentally send data out to Twitter. You change this by selecting the
_Permissions_ tab and selecting _Read only_, shown in the following
screenshot. To save this setting, scroll down this webpage and click the
_Update Settings_ button at the bottom of the page.

![Twitter Read Only Setting](images/twitter-ro.png)

These credentials can be found by selecting the _Keys and Access Tokens_
tab, and scrolling down appropriately as shown in the following two
screenshots.

![Twitter Consumer Application Credentials](images/twitter-consume.png)

![Twitter User Credentials](images/twitter-access.png)

<font color='red'>Warning: Never share these credentials with others or
they will be able to fully impersonate you on Twitter!</font>

You can directly copy these credentials into your Notebook, or,
alternatively, save them into a file (for example by opening a terminal
window and using `vim` to create a text file. In the rest of this
Notebook, I demonstrate this functionality by using my credentials,
which I have saved into a file called`twitter.cred'. In this empty file,
which is in your github repository, I have saved the following four
credentials in order:

1. Access Token
2. Access Token Secret
3. Consumer Key
4. Consumer Secret

The following code cell demonstrates how these credentials are read from
the file and used to properly authenticate our application with Twitter.

-----



In [2]:
tokens = []

# Order: Access Token, Access Token Secret, Consumer Key, Consumer SecretAccess

with open("twitter.cred", 'r') as fin:
    for line in fin:
        if line[0] != '#': # Not a comment line
            tokens.append(line.rstrip('\n'))

auth = tw.OAuthHandler(tokens[2], tokens[3])
auth.set_access_token(tokens[0], tokens[1])

api = tw.API(auth)

user = api.me()

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

print("\nThis user follows:\n--------------")
for friend in user.friends():
    print(friend.screen_name)

Twitter Screen Name:  ProfBrunner
Twitter Follower Count:  138

This user follows:
--------------
LauraFrerichs
googleresearch
jakevdp
powersoffour
NateSilver538
flowingdata
twiecki
EdwardTufte
wesmckinn
fonnesbeck
aebrunn
jarrettebrunner


-----

If the following code cell runs without an error, you have successfully
connected to twitter. If you are new to twitter and are not following
anyone, you can instead display the user information for a different
Twitter user. For example, the following code would display my Twitter
information.

```python
user = api.get_user('ProfBrunner')
```

Replacing `ProfBrunner` with any valid Twitter user id will display
their information. You can find examples by looking at those Twitter
users you (or `ProfBrunner`) follow.

At any point, you can return to your Twitter application management
webpage to view your new application. You can now view and manage your
existing application, or create a new application as shown in the
following screenshot.

![Twitter new app management](images/twitter-manage.png)


-----

## Breakout Session

During this breakout, you should work through the previous Twitter
application setup in order to better learn how Twitter and in particular
the Tweepy Python library works. In addition, this will guarantee you
can follow along with the rest of this Notebook. Specific problems you
can attempt include the following:

1. Create a New Twitter application.

2. Save your Twitter credentials and Application credentials into the
provided `twitter.cred` file.

3. Run the _tweepy_ sample code to connect to Twitter and display your
Twitter user information.

Additional, more advanced problems:

1. Run the Twitter example above but for `ProfBrunner` instead.

2. Find the Twitter username for someone else (perhaps someone famous,
or someone else you know) and run the example code using their name.

-----

### Obtaining Tweets

-----

In [3]:
for status in tw.Cursor(api.home_timeline).items(10):
    # Process a single status
    print(status.text) 

RT @JR_Newt: The Malcolm Butler Interception | Do Your Job: Bill Belichick &amp; the 2014... https://t.co/tEXo2kvVvP
RT @StanfordGreg: A simple coin toss may prove that basketball players really can get the hot hand http://t.co/FBmjuIACj9 //this came up in…
Open data in use. Predicting areas in need of working fire alarms to prevent deaths http://t.co/wAwusiSo3g http://t.co/Ka7ynTZlJe
RT @ClouderaEvents: Learn about scaling #python analytics on @RideImpala w/ @wesmckinn #StrataHadoop NY today at 11:20AM rm: 1 E8/1 E9
Garrison Keillor, reading one's own #writing aloud reduces BS
http://t.co/jaGiA0OvwI #teaching #designthinking http://t.co/9OfJ4huh4E
RT @ChicagoBlueSky: Incubators aren't just for big cities. Downstate Streator has started one. via @KateMacArthur http://t.co/OSJ6o8nDMX ht…
How much has the email scandal hurt Hillary Clinton? (Not an easy question but we got a good discussion going.) http://t.co/03CrZpC3ln
RT @ClevelandClinic: Why we share our clinical outcomes with the 🌎 

-----

Now lets lookup something

-----

In [4]:
term ='UIResearchPark'

messages = []

current_page = 1
max_pages = 2

while(current_page <= max_pages):
    tweets = api.search(term, rpp=5)
    for tweet in tweets:
        messages.append(tweet)
    current_page += 1

for message in messages:
    print("Tweet ID:", message.id)
    print('Tweeted by ', message.user.screen_name)
    print("Created at ",message.created_at)
    print("Location: ",message.source)
    print('Tweet Text: ', message.text)
    print('-------------------------')


Tweet ID: 649228565637152768
Tweeted by  UIResearchPark
Created at  2015-09-30 14:25:26
Location:  Hootsuite
Tweet Text:  RT @Inc: 5 Surprising Applications for Virtual Reality @Tess_Townsend http://t.co/B6YMN1yC1U
-------------------------
Tweet ID: 649222718206910464
Tweeted by  jmitch171
Created at  2015-09-30 14:02:12
Location:  Twitter for iPhone
Tweet Text:  @uiucbusiness @iVenture_UofI @TECenter @UIResearchPark @DardenMBA @UIMakerLab @IlliniBizDean  You're on your own, bootstrappers...😥😕
-------------------------
Tweet ID: 649216033354555392
Tweeted by  KateMacArthur
Created at  2015-09-30 13:35:38
Location:  Twitter for iPad
Tweet Text:  @LauraFrerichs @ChicagoBlueSky @StreatorInc @UIResearchPark Great to see incubators helping each other.
-------------------------
Tweet ID: 649213128648052736
Tweeted by  vprillinois
Created at  2015-09-30 13:24:06
Location:  Twitter Web Client
Tweet Text:  @StreatorInc credits @Illinois_Alma @UIResearchPark for helping launch startup incubator

-----

We can view the available attributes to display by using Python `dir`
method to perform introspection. In the following code cell we
explicitly remove _class_ methods to minimize the display list and focus
on the items of interest.

-----

In [5]:
[att for att in dir(message) if '__' not in att]

['_api',
 '_json',
 'author',
 'contributors',
 'coordinates',
 'created_at',
 'destroy',
 'entities',
 'favorite',
 'favorite_count',
 'favorited',
 'geo',
 'id',
 'id_str',
 'in_reply_to_screen_name',
 'in_reply_to_status_id',
 'in_reply_to_status_id_str',
 'in_reply_to_user_id',
 'in_reply_to_user_id_str',
 'is_quote_status',
 'lang',
 'metadata',
 'parse',
 'parse_list',
 'place',
 'possibly_sensitive',
 'retweet',
 'retweet_count',
 'retweeted',
 'retweets',
 'source',
 'source_url',
 'text',
 'truncated',
 'user']

In [6]:
# We can display the message data in JSON format
message._json

{'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Sep 29 17:35:13 +0000 2015',
 'entities': {'hashtags': [],
  'media': [{'display_url': 'pic.twitter.com/Nfd6zLLgAO',
    'expanded_url': 'http://twitter.com/LauraFrerichs/status/648913938298601472/photo/1',
    'id': 648913931537289216,
    'id_str': '648913931537289216',
    'indices': [116, 138],
    'media_url': 'http://pbs.twimg.com/media/CQFnyFsVEAA_Oy2.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/CQFnyFsVEAA_Oy2.jpg',
    'sizes': {'large': {'h': 768, 'resize': 'fit', 'w': 1024},
     'medium': {'h': 450, 'resize': 'fit', 'w': 600},
     'small': {'h': 255, 'resize': 'fit', 'w': 340},
     'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
    'type': 'photo',
    'url': 'http://t.co/Nfd6zLLgAO'}],
  'symbols': [],
  'urls': [],
  'user_mentions': [{'id': 175415640,
    'id_str': '175415640',
    'indices': [100, 115],
    'name': 'UIUC Research Park',
    'screen_name': 'UIResearchPark'}]},
 'favorite_c

## Saving Tweets

We can save tweets into MongoDB.

-----

In [7]:
from pymongo import MongoClient

# Establish a connection to MongoDB (uncomment only one of these lines)

# For remote course server use
#client = MongoClient("mongodb://10.0.3.126:27017")

# For local Docker server use
client = MongoClient("mongodb://localhost:27017")

In [8]:
# We will delete our working directory if it exists before recreating.

dbname = 'tweet-database'
if  dbname in client.database_names():
    client.drop_database(dbname)

db = client[dbname]
tweets = db['tweets']

for message in messages:
    result = tweets.insert(message.__dict__['_json'])


In [9]:
print("Number of tweets = ", tweets.count())

Number of tweets =  30


In [10]:
tweet = tweets.find_one()

print("Tweet ID:", tweet['id'])
print('Tweeted by ', tweet['user']['screen_name'])
print("Created at ", tweet['created_at'])
print("Location: ", tweet['source'])
print('Tweet Text: ', tweet['text'])

Tweet ID: 649228565637152768
Tweeted by  UIResearchPark
Created at  Wed Sep 30 14:25:26 +0000 2015
Location:  <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
Tweet Text:  RT @Inc: 5 Surprising Applications for Virtual Reality @Tess_Townsend http://t.co/B6YMN1yC1U


In [11]:
for tweet in tweets.find({"retweet_count": {'$gte': 5}}).sort('_id'):
    print("Tweet ID:", tweet['id'])
    print(tweet['text'])
    print('-------------------------')

Tweet ID: 649203518574870528
RT @LauraFrerichs: Tim Hassinger, CEO of @DowAgro at grand opening of new Innovation Lab @UIResearchPark "it's a special relationship" http…
-------------------------
Tweet ID: 649203518574870528
RT @LauraFrerichs: Tim Hassinger, CEO of @DowAgro at grand opening of new Innovation Lab @UIResearchPark "it's a special relationship" http…
-------------------------


-----
## Breakout Session

During this breakout, you should work to integrate your Twitter application with a MongoDB (you probably want to create your own MongoDB database and collection). Specific problems you can attempt include the following:

1. 

2. 

Additional, more advanced problems:

1.

-----

-----
### Additional References

2. 

-----
 

### Return to the [Week Three](index.ipynb) index.

-----