
# Part 1 - Hand Calculated Graph Centrality

In this part, we load the kite network data and compute graph centrality measures by hand. First, we load the data again:

In [4]:
# Silence noisy HTTP logging from requests/urllib3
import logging
logging.getLogger("requests").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)

# Import everything we need
import graphlab as gl

# Load Data
kite_vertices = gl.SFrame.read_csv('../Week1/kite_vertices.csv')
kite_edges = gl.SFrame.read_csv('../Week1/kite_edges.csv')

# Create graph
g_kite = gl.SGraph()
g_kite = g_kite.add_vertices(vertices=kite_vertices, vid_field='name')
g_kite = g_kite.add_edges(edges=kite_edges, src_field='src', dst_field='dst')

# Visualize the graph inline in the notebook, labelling vertices by id
gl.canvas.set_target('ipynb')
g_kite.show(vlabel="id")

PROGRESS: Finished parsing file /home/james/Development/Masters/IndependentStudy/Week1/kite_vertices.csv
PROGRESS: Parsing completed. Parsed 10 lines in 0.018072 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file /home/james/Development/Masters/IndependentStudy/Week1/kite_vertices.csv
PROGRESS: Parsing completed. Parsed 10 lines in 0.017847 secs.
PROGRESS: Finished parsing file /home/james/Development/Masters/IndependentStudy/Week1/kite_edges.csv
PROGRESS: Parsing completed. Parsed 18 lines in 0.019082 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str]
If parsing fails due to incorrect types, you can

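Since the point of this part is to compute centrality by hand, here is a minimal plain-Python sketch of degree and closeness centrality. It rests on two assumptions: the edge list is the standard Krackhardt kite (10 actors, 18 ties, which appears consistent with the parse counts above) rather than the actual contents of `kite_edges.csv`, and ties are treated as undirected even though `SGraph` edges are directed. With the SFrame loaded above, `zip(kite_edges['src'], kite_edges['dst'])` should supply the real pairs instead. Closeness is computed as (n - 1) divided by the sum of breadth-first-search distances, which assumes the graph is connected.

```python
from collections import deque

# ASSUMPTION: the standard Krackhardt kite, not read from kite_edges.csv.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from (src, dst) pairs."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def degree_centrality(adj):
    """Degree = number of neighbours of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) / sum of shortest-path distances, using breadth-first search.

    Assumes the graph is connected, so every vertex is reached from every source.
    """
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (n - 1) / float(sum(dist.values()))
    return result


adj = build_adjacency(KITE_EDGES)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)

# Print vertices from most to least central by closeness
for v in sorted(adj, key=lambda v: -closeness[v]):
    print("%-9s degree=%d closeness=%.3f" % (v, degree[v], closeness[v]))
```

On this edge list the sketch gives Diane the highest degree (6), while Fernando and Garth tie for the highest closeness (9/14, about 0.643) and Diane drops to 0.6: different centrality measures single out different actors.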

# Part 2 - Crawling Social Data

In this part, I will use the Twitter / Facebook / LinkedIn APIs to download graph data for my portion of the social graph and attempt to visualize it in Gephi or Neo4j.

## 2.1 - Facebook API

In [6]:
import os
import json
import requests

# Read the user access token from the environment instead of hard-coding it
# (tokens are secrets and expire); FB_ACCESS_TOKEN is an arbitrary variable name
ACCESS_TOKEN = os.environ['FB_ACCESS_TOKEN']

base_url = 'https://graph.facebook.com/me'

# Request my id and name, plus up to 10 likes for each friend
fields = 'id,name,friends.fields(likes.limit(10))'
params = {'fields': fields, 'access_token': ACCESS_TOKEN}

# Interpret the response as JSON and convert back to Python data structures
content = requests.get(base_url, params=params).json()

# Pretty-print the JSON and display it
print json.dumps(content, indent=1)

{
 "friends": {
  "data": [], 
  "summary": {
   "total_count": 214
  }
 }, 
 "id": "684051972564", 
 "name": "James Quacinella"
}


What? No friends?! Researching this led me to https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/issues/191, which explains that the API now only returns friends who have also granted permission to the OAuth app I created, even though `summary.total_count` above still reports all 214 of them. Many of the API calls and the Graph Explorer have changed since the book was written, so much of the book no longer applies.
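An empty `data` list next to a non-zero `total_count` is the signature of this permission model. A small helper that pulls the two numbers apart makes the situation explicit. This is a sketch that assumes the response shape printed above (`friends.data` and `friends.summary.total_count`); `summarize_friends` is a hypothetical name, and the sample dict is the `friends` part of the response shown above.

```python
def summarize_friends(content):
    """Return (visible_friends, total_friends) from a /me?fields=friends... response.

    visible_friends counts entries in friends.data (only friends who authorized
    the app); total_friends is friends.summary.total_count, or None if absent.
    """
    friends = content.get("friends", {})
    visible = friends.get("data", [])
    total = friends.get("summary", {}).get("total_count")
    return len(visible), total


# The friends portion of the response printed above
content = {"friends": {"data": [], "summary": {"total_count": 214}}}

visible, total = summarize_friends(content)
if visible == 0 and total:
    print("Only %d of %d friends visible: the rest have not authorized this app"
          % (visible, total))
```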

Moving on to LinkedIn ...

## 2.2 - LinkedIn

Sadly, the [same issue](https://github.com/ozgur/python-linkedin/issues/78) has arisen with LinkedIn. LinkedIn no longer issues OAuth 1.0 tokens and has substantially altered its API. Even the book's GitHub repository has an [open issue](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/issues/274) about this.

# Part 3 - Textual Analysis of Tweets from Political Think Tanks

In this part, I will download the tweet streams of several political 'think tanks' and perform a simple word-frequency analysis to see whether it reveals anything about the political leanings of these institutions. The tweet streams I will parse are:

- https://twitter.com/fairmediawatch - Fairness and Accuracy In Reporting
- https://twitter.com/AccuracyInMedia - Accuracy In Media
- https://twitter.com/ips_dc - Institute for Policy Studies
- https://twitter.com/heritage - Heritage Foundation
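The frequency analysis itself can be sketched with `collections.Counter`. This is a minimal sketch under stated assumptions: the tokenizer (lowercase, drop URLs, keep runs of letters and apostrophes), the three-letter minimum word length, and the small `STOPWORDS` set are illustrative choices, and `word_frequencies` is a hypothetical helper. The sample input is the two FAIR tweets printed in the table in section 3.1, re-joined from their wrapped cells.

```python
import re
from collections import Counter

# ASSUMPTION: a deliberately tiny stopword list; a real analysis would use a
# fuller list and might also strip @mentions and hashtags.
STOPWORDS = set("a an and are as at be by for from has have if in is it its "
                "may most not of on or that the this to was were with you".split())


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count words across tweet texts, ignoring URLs, stopwords and short words."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop t.co links
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts


sample = [
    'That most US terrorists aren\'t Muslim "may come as a surprise"--especially '
    'if you rely on corporate media. http://t.co/J5bn1tQzRY',
    'Baltimore "gang threat" swallowed by media was found to be "non-credible" '
    'by FBI. @Vice @AdamJohnsonNYC http://t.co/4kZSXwnRka',
]

counts = word_frequencies(sample)
print(counts.most_common(5))
```

With live data, something like `word_frequencies(s.text for s in allStatuses['heritage']).most_common(20)` would give one account's most frequent words, which can then be compared across accounts.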

## 3.1 - Prep Work

In [9]:
# Load the Twitter API client (python-twitter) and prettytable
import os
import twitter
import prettytable

# Create our API object with OAuth credentials read from the environment
# (the TWITTER_* variable names are arbitrary; never commit real keys)
api = twitter.Api(consumer_key=os.environ['TWITTER_CONSUMER_KEY'],
                  consumer_secret=os.environ['TWITTER_CONSUMER_SECRET'],
                  access_token_key=os.environ['TWITTER_ACCESS_TOKEN_KEY'],
                  access_token_secret=os.environ['TWITTER_ACCESS_TOKEN_SECRET'])

# Grab FAIR's tweet stream
#
# NOTE: do not include retweets, there are too many duplicates (though for text
#       analysis, retweets could be a way to weight more heavily the text of
#       tweets the account chose to amplify).
#       The API returns at most 200 tweets per request, so ask for 200.
statuses = api.GetUserTimeline(screen_name='fairmediawatch', count=200, include_rts=False)

# Create a pretty table of tweet contents and any expanded URLs
pt = prettytable.PrettyTable(["Tweet Status", "Expanded URLs"])
pt.align["Tweet Status"] = "l"   # Left-align tweet text
pt.align["Expanded URLs"] = "l"  # Left-align URLs
pt.max_width = 60                # Wrap cell contents at 60 characters
pt.padding_width = 1  # One space between column edges and contents (default)

# Add one row per tweet, space-separating multiple expanded URLs
for status in statuses:
    pt.add_row([status.text, " ".join(url.expanded_url for url in status.urls)])

# Let's see the results!
print pt

+--------------------------------------------------------------+----------------------------------------------+
| Tweet Status                                                 | Expanded URLs                                |
+--------------------------------------------------------------+----------------------------------------------+
| That most US terrorists aren't Muslim "may come as a         | http://bit.ly/1J7XVYL                        |
| surprise"--especially if you rely on corporate media.        |                                              |
| http://t.co/J5bn1tQzRY                                       |                                              |
| Baltimore "gang threat" swallowed by media was found to be   | http://bit.ly/1RxKvr6                        |
| "non-credible" by FBI. @Vice @AdamJohnsonNYC                 |                                              |
| http://t.co/4kZSXwnRka                                       |                                        
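Because each user_timeline request returns at most 200 tweets, getting more of an account's history means paging backwards with the `max_id` parameter of python-twitter's `Api.GetUserTimeline`. The sketch below assumes the python-twitter `api` object created above; `get_full_timeline` is a hypothetical helper, and the default cap of 3,200 reflects Twitter's documented limit on how far back user_timeline can reach.

```python
def get_full_timeline(api, screen_name, max_tweets=3200, include_rts=False):
    """Page backwards through a user's timeline, up to 200 tweets per request.

    Each call asks only for tweets at or below max_id, which is set just below
    the oldest tweet id seen so far, so pages do not overlap.
    """
    statuses = []
    max_id = None
    while len(statuses) < max_tweets:
        page = api.GetUserTimeline(screen_name=screen_name, count=200,
                                   include_rts=include_rts, max_id=max_id)
        # Stop when nothing older comes back. Caveat: with include_rts=False,
        # retweets still count toward the 200, so a page consisting only of
        # retweets would come back empty and end paging early.
        if not page:
            break
        statuses.extend(page)
        # max_id is inclusive, so step one below the oldest id already collected
        max_id = min(s.id for s in page) - 1
    return statuses[:max_tweets]
```

For example, `statuses = get_full_timeline(api, 'fairmediawatch')` could replace the single `GetUserTimeline` call above; each extra page costs one request against the rate limit.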

## 3.2 - Getting Recent Tweets from All Accounts

Let's get the tweets for all the accounts and store them in a dictionary keyed by screen name:

In [10]:
# List of accounts to process, and our results dict
accounts = ['fairmediawatch', 'AccuracyInMedia', 'ips_dc', 'heritage']
allStatuses = {}

# For each account, query Twitter for its most recent tweets (max 200 per request)
for account in accounts:
    allStatuses[account] = api.GetUserTimeline(screen_name=account, count=200, include_rts=False)

# Save results to disk so later cells can reload them without re-querying Twitter
import pickle
with open("allStatuses", "wb") as f:
    pickle.dump(allStatuses, f)
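Pickling the `Status` objects works, but the file can then only be loaded where a compatible python-twitter is installed, since pickle stores references to the library's classes. If the later analysis only needs the tweet text, one alternative is to keep a few plain fields and store them as JSON. The helper names below are hypothetical, and keeping only `id`, `created_at` and `text` is an assumption about what the analysis needs.

```python
import json


def statuses_to_records(all_statuses):
    """Reduce {account: [Status, ...]} to plain dicts that json can serialize.

    Keeps only id, created_at and text; add other Status attributes as needed.
    """
    return {
        account: [{"id": s.id, "created_at": s.created_at, "text": s.text}
                  for s in statuses]
        for account, statuses in all_statuses.items()
    }


def save_records(records, path):
    """Write the records as indented JSON."""
    with open(path, "w") as f:
        json.dump(records, f, indent=1)


def load_records(path):
    """Read records written by save_records."""
    with open(path) as f:
        return json.load(f)
```

Usage would be `save_records(statuses_to_records(allStatuses), 'allStatuses.json')`, and later `load_records('allStatuses.json')` returns ordinary dicts and lists that need no Twitter library to read.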