# Twitter Example   
## ACE Cluster
### School of Psychology, Massey University

**Twitter API Setup:**
To use the Twitter API you need to register as an app developer, for which all you need is a Twitter account. When you register an app you are given four OAuth keys: two public and two private. One pair identifies your app to the Twitter server. The other allows someone using your app to give it access to their data without having to share their private credentials with it.

In [None]:
import twitter

consumer_key, consumer_secret = twitter.read_token_file("consumer.txt")
oauth_token, oauth_secret = twitter.read_token_file("oauth.txt") 
auth = twitter.oauth.OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret)
twitter_api = twitter.Twitter(auth=auth)
print(twitter_api)

The **twitter_api** object exists, which means we are good to go.
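The two token files read above are not shown in this notebook. As a rough sketch, assuming each file holds the key on its first line and the secret on its second (my understanding of the layout `twitter.read_token_file` expects, not something confirmed here), a stand-alone equivalent might look like the following. The helper name `read_key_pair`, the file name and the placeholder values are all hypothetical.

```python
import os
import tempfile


def read_key_pair(path):
    """Return (key, secret) from a file with the key on line 1 and the secret on line 2.

    Hypothetical helper: the two-line layout is an assumption.
    """
    with open(path) as handle:
        key = handle.readline().strip()
        secret = handle.readline().strip()
    return key, secret


# Write a throwaway file with placeholder (not real) credentials and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "consumer_demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("PLACEHOLDER_KEY\nPLACEHOLDER_SECRET\n")
    demo_key, demo_secret = read_key_pair(demo_path)

print(demo_key, demo_secret)
```

Keeping the keys in separate files rather than in the notebook means the notebook can be shared without exposing the private halves.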

Let's find out what we know about @MasseyUni:

In [None]:
massey_info = twitter_api.users.show(screen_name = "MasseyUni")
print(massey_info.keys())

Lots of goodies. Let's check out the profile image:

In [None]:
print(massey_info['profile_image_url'])

How many followers does @MasseyUni have?

In [None]:
print(massey_info["followers_count"])

How many Tweets (including retweets) has @MasseyUni issued?

In [None]:
print(massey_info["statuses_count"])

Let's grab some Tweets from @MasseyUni (which Twitter also calls **statuses** or **status updates**).

In [None]:
screen_name = "MasseyUni"  # The account whose timeline we want to read
count = 5  # The maximum number of tweets to request
results = twitter_api.statuses.user_timeline(screen_name=screen_name, count=count)

Like most web services, Twitter returns data in **JSON** format (JSON = JavaScript Object Notation), which is very similar in structure to a Python dictionary containing other nested dictionaries and lists. To print JSON in a readable manner I will use the `json.dumps` ("dump string") function, aliased to `dump`.

A tweet is only 140 characters but the information Twitter provides for each tweet is around 5kB.
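To make the JSON-to-dictionary correspondence concrete without calling Twitter, here is a small sketch using a made-up, heavily trimmed record. The field values are invented for illustration, and a real tweet carries far more fields than this.

```python
import json

# A made-up, heavily trimmed record with the same kind of nesting a tweet has.
sample_tweet = {
    "text": "Hello from Palmerston North",
    "user": {"screen_name": "example_user", "followers_count": 42},
    "entities": {"hashtags": [{"text": "demo"}], "urls": []},
}

# Serialise the dictionary to an indented JSON string for reading.
pretty = json.dumps(sample_tweet, indent=2)
print(pretty)

# The string round-trips back to an equal Python dictionary.
restored = json.loads(pretty)
print(restored["user"]["screen_name"])
```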

In [None]:
from json import dumps as dump

print(dump(results, indent=2))

That is just 5 tweets!

The timeline data is returned as a list with one dictionary per tweet. (A search request, by contrast, returns a dictionary at the topmost level: its **statuses** item holds the actual tweets and its **search_metadata** item describes the search.)

In [None]:
print(len(results))

Let's take a look at what is inside the first tweet (aka `results[0]`):

In [None]:
print(results[0].keys())

**"text"** is the actual text of the tweet. Let's print them:

In [None]:
# Iterate over the tweets directly: fewer than `count` may have been returned
for tweet in results:
    print(tweet["text"])

In [None]:
# "source" records the client application each tweet was posted from
for tweet in results:
    print(tweet["source"])

Let's see what is inside a single tweet.

**"entities"** are things like hashtags, users and urls mentioned in a tweet. Let's check out the entities for the first tweet: 

In [None]:
t1 = results[0]  # t1 = tweet 1, saves writing results[0] all the time

print(dump(t1["entities"], indent=1))
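As a sketch of how the entities might be pulled apart, the example below runs on a made-up dictionary rather than a live tweet. It assumes the layout I understand the API to use: lists under `hashtags`, `user_mentions` and `urls`, with `text`, `screen_name` and `expanded_url` fields respectively. The helper name `summarise_entities` is hypothetical.

```python
# Made-up entities dictionary; the field names follow the assumed layout.
sample_entities = {
    "hashtags": [{"text": "research", "indices": [10, 19]}],
    "user_mentions": [{"screen_name": "example_user", "id": 1}],
    "urls": [{"url": "https://t.co/xyz", "expanded_url": "https://example.org/story"}],
}


def summarise_entities(entities):
    """Collect hashtag texts, mentioned screen names and expanded URLs."""
    return {
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        "mentions": [user["screen_name"] for user in entities.get("user_mentions", [])],
        "urls": [link["expanded_url"] for link in entities.get("urls", [])],
    }


summary = summarise_entities(sample_entities)
print(summary)
```

Using `.get(..., [])` keeps the helper from failing on a tweet that lacks one of the entity types.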

**"user"** contains information about the original tweeter:

In [None]:
print(t1["user"]["screen_name"])

In this case the original tweeter is MasseyUni, but if it were a retweet we could get the original tweeter's user profile, followers, friends and tweets.
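A minimal sketch of that retweet case, assuming a retweet embeds the original tweet under a `retweeted_status` key (my understanding of the tweet format, not something shown in the output above). The sample tweets and the helper name `original_tweeter` are made up.

```python
def original_tweeter(tweet):
    """Return the screen name of whoever wrote the tweet first."""
    # Assumption: retweets embed the original tweet under "retweeted_status".
    source_tweet = tweet.get("retweeted_status", tweet)
    return source_tweet["user"]["screen_name"]


plain_tweet = {"text": "Enrolments are open", "user": {"screen_name": "MasseyUni"}}
retweet = {
    "text": "RT @example_user: Great open day",
    "user": {"screen_name": "MasseyUni"},
    "retweeted_status": {
        "text": "Great open day",
        "user": {"screen_name": "example_user"},
    },
}

print(original_tweeter(plain_tweet))
print(original_tweeter(retweet))
```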

How many users is @MasseyUni following (i.e. @MasseyUni's friends)?

In [None]:
print(massey_info["friends_count"])

Show the latest 20 users @MasseyUni follows:

In [None]:
friends = twitter_api.friends.list(screen_name="MasseyUni", count = 20)
for friend in friends["users"]:
    print(friend["screen_name"])

Show the latest 20 followers of @MasseyUni:

In [None]:
followers = twitter_api.followers.list(screen_name="MasseyUni", count = 20)
for follower in followers["users"]:
    print(follower["screen_name"])

**Note:** Twitter places limits on how many friends and followers can be downloaded at once. If obtaining full user data, 200 users can be returned in one request (no more than 15 requests in 15 minutes are allowed). If friends/followers are obtained by ID number only, Twitter allows 5000 user IDs to be returned in one request.
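Because of these limits, a long friend or follower list has to be collected page by page. The sketch below shows the general shape of such a loop against a stand-in fetch function rather than the live API. It assumes cursor-style paging in which `-1` requests the first page and a `next_cursor` of `0` marks the last one, and that each page carries `ids` and `next_cursor` keys. The names `collect_ids`, `fake_fetch` and `FAKE_PAGES` are hypothetical.

```python
def collect_ids(fetch_page, max_requests=15):
    """Follow cursors until the service reports no further page.

    fetch_page(cursor) must return a dict with "ids" and "next_cursor".
    max_requests caps the number of calls so a rate limit is not exceeded.
    """
    ids = []
    cursor = -1  # assumed convention: -1 asks for the first page
    for _ in range(max_requests):
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
        if cursor == 0:  # assumed convention: 0 means there are no more pages
            break
    return ids


# Stand-in for the network call: three fake pages keyed by cursor value.
FAKE_PAGES = {
    -1: {"ids": [1, 2, 3], "next_cursor": 10},
    10: {"ids": [4, 5], "next_cursor": 20},
    20: {"ids": [6], "next_cursor": 0},
}


def fake_fetch(cursor):
    """Return the fake page stored for this cursor."""
    return FAKE_PAGES[cursor]


all_ids = collect_ids(fake_fetch)
print(all_ids)
```

With the real client, `fetch_page` would wrap the request for one page of IDs. That wiring is left out here because it is untested in this notebook.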

---

How about what is trending in NZ? First we need to find the Yahoo! WOE (Where On Earth) code for NZ. There is a simple lookup page at [http://woeid.rosselliot.co.nz/](http://woeid.rosselliot.co.nz/) 

It turns out the code for NZ is 23424916.

In [None]:
WOE_NZ = 23424916
nz_trends = twitter_api.trends.place(_id=WOE_NZ) 
# the underscore on _id is needed because of a quirk in the python twitter API ("id" is reserved for another purpose)
# It turns out trends is a list of dictionaries with only one element. Not much of a list really :)
print(nz_trends[0].keys())

The interesting stuff is in the 'trends' key. Let's take a look at the first 10:

In [None]:
for trend in nz_trends[0]["trends"][0:10]:
    print(dump(trend, indent=2))
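If each trend carries a `tweet_volume` field that may be empty (an assumption about the response, so check it against the output above), the trends could be ranked by volume. This sketch uses made-up trend records, and the helper name `by_volume` is hypothetical.

```python
# Made-up trend records; "tweet_volume" may be None when no figure is given.
sample_trends = [
    {"name": "#ExampleOne", "tweet_volume": 1200},
    {"name": "Example Two", "tweet_volume": None},
    {"name": "#ExampleThree", "tweet_volume": 56000},
]


def by_volume(trends):
    """Sort trends from busiest to quietest, treating a missing volume as zero."""
    return sorted(
        trends,
        key=lambda trend: trend.get("tweet_volume") or 0,
        reverse=True,
    )


ranked_names = [trend["name"] for trend in by_volume(sample_trends)]
print(ranked_names)
```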

Now I will try something more complicated. Let's compare the **lexical diversity** of Massey tweets with Victoria tweets. Lexical diversity will be crudely defined as **the number of unique words divided by the total number of words** in a list of tweets, N = 100 say.
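Before working with real tweets, the definition can be checked on a toy word list. This is only a sketch of the ratio described above. The function name `lexical_diversity` and the sample words are made up, and returning 0.0 for an empty list is my own choice to avoid dividing by zero.

```python
def lexical_diversity(words):
    """Unique words divided by total words; 0.0 for an empty list."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# "the" appears twice, so there are 5 unique words out of 6.
sample_words = ["the", "cat", "sat", "on", "the", "mat"]
print(lexical_diversity(sample_words))
```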

First get 100 tweets from Massey and Victoria:

In [None]:
count = 100

# Read each account's own timeline. The commented-out lines show the
# alternative of searching for tweets that mention the account instead.
#massey_tweets = twitter_api.search.tweets(q="@MasseyUni", count=count)["statuses"]
massey_tweets = twitter_api.statuses.user_timeline(screen_name="MasseyUni", count=count)
#victoria_tweets = twitter_api.search.tweets(q="@VicUniWgtn", count=count)["statuses"]
victoria_tweets = twitter_api.statuses.user_timeline(screen_name="VicUniWgtn", count=count)

Now extract the text of each tweet for both Massey and Victoria (using a Python *list comprehension*):

In [None]:
massey_texts = [tweet["text"] for tweet in massey_tweets]
victoria_texts = [tweet["text"] for tweet in victoria_tweets]

Now we need to break down each text into individual words and add the words to a list. I will exclude 'words' that include punctuation (like hashtags, screen names etc.) by means of Python's isalpha() method. We need to iterate over each text and then over each word in the text:

In [None]:
massey_words = [word 
                    for text in massey_texts
                        for word in text.split() if word.isalpha()]
victoria_words = [word 
                    for text in victoria_texts
                        for word in text.split() if word.isalpha()]

print("50 Massey words:\n\n", massey_words[0:50])
print("\n50 Victoria words:\n\n", victoria_words[0:50])
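To see what the isalpha() filter keeps and drops, here is a small sketch on a made-up sentence. Note that it also drops ordinary words that have punctuation attached, such as a word followed by a full stop, which is part of why this measure is only a crude one.

```python
sample_text = (
    "Well done to our #graduates, see https://example.org "
    "and thank @example_user today."
)

tokens = sample_text.split()

# Keep only tokens made up entirely of letters.
kept = [word for word in tokens if word.isalpha()]
# Everything else: hashtags, URLs, screen names, words with punctuation.
dropped = [word for word in tokens if not word.isalpha()]

print("kept:   ", kept)
print("dropped:", dropped)
```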

Notice that the words are not unique. We have just split up all the texts and pulled out strings of alpha characters. It's easy to convert a list of words into a *set* of unique words by using Python's set() function:

In [None]:
unique_massey_words = set(massey_words)
unique_victoria_words = set(victoria_words)

Python makes it that easy! Now we have everything we need to compare the lexical diversity.

In [None]:
print("Massey:", len(unique_massey_words), "unique words out of", len(massey_words), "=",
      len(unique_massey_words) / len(massey_words),"\n")
print(unique_massey_words, "\n\n")
print("\nVictoria:", len(unique_victoria_words), "unique words out of", len(victoria_words), "=",
      len(unique_victoria_words) / len(victoria_words),"\n")
print(unique_victoria_words)

Get the intersection of Massey and Victoria words:

In [None]:
print(unique_victoria_words & unique_massey_words)

The difference:

In [None]:
print(unique_victoria_words - unique_massey_words)

In [None]:
print(unique_massey_words - unique_victoria_words)

Get the words that are in one or the other but not in both:
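A sketch with toy sets, assuming Python's symmetric-difference operator `^` is what is wanted here. The set names and their contents are made up for illustration.

```python
massey_demo = {"study", "campus", "research"}
victoria_demo = {"study", "wellington", "research"}

# Symmetric difference: members of exactly one of the two sets.
only_in_one = massey_demo ^ victoria_demo
print(sorted(only_in_one))

# The same result built from the two one-sided differences.
combined = (massey_demo - victoria_demo) | (victoria_demo - massey_demo)
print(only_in_one == combined)
```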