In [91]:
import networkx as nx
from pickle import load  # the only pickle function used below
import numpy as np
import pandas as pd
import json
import yaml

## Overview: Members of Congress on Twitter

C-SPAN maintains [a list](https://twitter.com/cspan/lists/members-of-congress?lang=en) of the Twitter accounts of Members of Congress (MOC). By scraping information about those accounts, we can produce a dataset consisting of:

* basic account information for each MOC
* the network of who follows whom among MOC

### Twitter account info on the legislators

The first part of these data consists of publicly available information about each of these Twitter accounts. Most of the fields are self-explanatory; the rest are described in the documentation for Twitter's [`users/lookup` API endpoint](https://dev.twitter.com/rest/reference/get/users/lookup).

If you have questions or figure something interesting out about the dataset, tell us all about it on Piazza!

In [3]:
# Load the pickled list of account objects; the with-block closes the file afterwards
with open('data/congress_list_members.pickle', 'rb') as f:
    congress_members = load(f)

In [136]:
# Each object's _json attribute is the raw Twitter user object as a dict; one row per account
congress_member_data = pd.DataFrame([x._json for x in congress_members])

In [137]:
congress_member_data.head()

| | contributors_enabled | created_at | default_profile | default_profile_image | description | entities | favourites_count | follow_request_sent | followers_count | following | ... | profile_text_color | profile_use_background_image | protected | screen_name | status | statuses_count | time_zone | url | utc_offset | verified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | Tue Jun 28 14:57:49 +0000 2016 | True | False | Representing MA's 7th District in the U.S. Hou... | {'url': {'urls': [{'url': 'https://t.co/3ljoXJ... | 9 | False | 1621 | False | ... | 333333 | True | False | RepMikeCapuano | {'lang': 'en', 'id': 787971427538460672, 'coor... | 35 | Eastern Time (US & Canada) | https://t.co/3ljoXJRbEf | -14400.0 | True |
| 1 | False | Thu Feb 19 20:08:54 +0000 2015 | True | False | Representing California's 33rd District in Con... | {'url': {'urls': [{'url': 'https://t.co/SFF1Hj... | 100 | False | 6692 | False | ... | 333333 | True | False | RepTedLieu | {'lang': 'en', 'id': 788163021860380672, 'coor... | 1924 | Eastern Time (US & Canada) | https://t.co/SFF1Hjj9ww | -14400.0 | True |
| 2 | False | Mon Feb 09 16:00:36 +0000 2015 | False | False | Congresswoman Representing American Samoa | {'url': {'urls': [{'url': 'http://t.co/L7B6x0y... | 0 | False | 494 | False | ... | 0 | False | False | RepAmata | {'lang': 'en', 'id': 564820451508383746, 'coor... | 1 | | http://t.co/L7B6x0yzTS | | True |
| 3 | False | Wed Feb 04 21:01:23 +0000 2015 | True | False | Pete Aguilar represents CA’s 31st Congressiona... | {'url': {'urls': [{'url': 'http://t.co/o4deRoU... | 1 | False | 2894 | False | ... | 333333 | True | False | RepPeteAguilar | {'lang': 'en', 'id': 788120934544703489, 'coor... | 731 | | http://t.co/o4deRoUnuK | | True |
| 4 | False | Tue Jan 13 15:08:48 +0000 2015 | False | False | U.S. Congressman (WI-06) | {'url': {'urls': [{'url': 'http://t.co/MY3WTta... | 13 | False | 1752 | False | ... | 0 | False | False | RepGrothman | {'lang': 'en', 'id': 788387900505387009, 'coor... | 422 | | http://t.co/MY3WTtaAeb | | True |
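
The `created_at` column holds timestamps as plain strings. Here is a minimal sketch of parsing them with `pd.to_datetime`, using two values copied from the table above; the format string is our reading of that layout (weekday, month, day, time, UTC offset, year), inferred from those values rather than documented in this notebook.

```python
import pandas as pd

# Two created_at strings copied from the output above
created = pd.Series(["Tue Jun 28 14:57:49 +0000 2016",
                     "Thu Feb 19 20:08:54 +0000 2015"])

# Assumed layout: weekday, month, day, HH:MM:SS, UTC offset, year
parsed = pd.to_datetime(created, format="%a %b %d %H:%M:%S %z %Y")

print(parsed.dt.year.tolist())  # [2016, 2015]
```

Applied to the real frame, the same call would be `pd.to_datetime(congress_member_data["created_at"], format=...)`.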


We'll save these data to a file so that they can be opened up for further analysis in other notebooks.

In [90]:
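# index=False keeps pandas' row index out of the file, so reading it back
# does not produce an extra "Unnamed: 0" column
congress_member_data.to_csv("congress_member_data.csv", index=False)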
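
One caveat when reloading this file: `to_csv` writes nested columns such as `entities` and `status` as the Python `repr` of each dict, so `pd.read_csv` returns them as strings. Because that repr uses single quotes, `json.loads` will not parse it, but `ast.literal_eval` will. Below is a small self-contained sketch with a made-up one-row frame; the account name and URL are hypothetical.

```python
import ast
import io

import pandas as pd

# Hypothetical one-row stand-in for congress_member_data
df = pd.DataFrame({
    "screen_name": ["RepExample"],
    "followers_count": [1621],
    "entities": [{"url": {"urls": [{"url": "https://t.co/example"}]}}],
})

buf = io.StringIO()
df.to_csv(buf, index=False)  # the dict cell is written as its Python repr
buf.seek(0)

loaded = pd.read_csv(buf)
print(type(loaded.loc[0, "entities"]))  # <class 'str'>

# literal_eval turns the single-quoted repr back into a nested dict
loaded["entities"] = loaded["entities"].apply(ast.literal_eval)
print(loaded.loc[0, "entities"]["url"]["urls"][0]["url"])  # https://t.co/example
```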

### Who follows whom network among the Members of Congress

By scraping the list of all of the people each Member of Congress publicly follows, we can build a network of who follows whom among MOC.

The scraped friend lists were saved to a pickle file, which we load here.

In [131]:
# Dict mapping each MOC's Twitter ID to the IDs of the accounts that MOC follows
with open('data/congress_friends_all.pickle', 'rb') as f:
    congress_friends = load(f)

In [132]:
cong_ids = congress_friends.keys()

In [134]:
# Each edge (u, v) means "MOC u follows MOC v"
congress_edges = []

for cong_id, alters in congress_friends.items():
    # Keep only the accounts this MOC follows that are themselves MOC
    for a in alters:
        if a in cong_ids:
            congress_edges.append((cong_id, a))
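
The same filtering can be written as a single comprehension. This sketch runs on a hypothetical toy `congress_friends` dict (made-up integer IDs) shaped the way the loop above uses it: keys are MOC IDs and values are the IDs each MOC follows.

```python
# Hypothetical toy data: MOC ID -> IDs of the accounts that MOC follows
congress_friends = {
    1: [2, 3, 900],  # 900 is not an MOC, so it is dropped
    2: [1],
    3: [901],        # follows no other MOC
}

cong_ids = set(congress_friends)  # set of MOC IDs for fast membership tests

# (u, v) means "u follows v"; keep only follows that point at another MOC
congress_edges = [
    (cong_id, alter)
    for cong_id, alters in congress_friends.items()
    for alter in alters
    if alter in cong_ids
]

print(congress_edges)  # [(1, 2), (1, 3), (2, 1)]
```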
    

In [135]:
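# nx.Graph is undirected: if two MOC follow each other, their two directed edges
# collapse into one, and MOC with no follow ties to other MOC are not added as nodes
cong_twitter_net = nx.Graph(congress_edges)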
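
Because Twitter follows are one-directional, an `nx.DiGraph` keeps information that `nx.Graph` discards. This sketch compares the two on hypothetical toy edges, using networkx's `reciprocity` and its weak/strong component counts. The notebook's own results come from the undirected graph, so these directed measures would give different numbers on the real data.

```python
import networkx as nx

# Hypothetical toy follow edges: (u, v) means "u follows v"
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"), ("d", "a")]

G_und = nx.Graph(edges)    # direction dropped; a->b and b->a become one edge
G_dir = nx.DiGraph(edges)  # keeps who follows whom

print(G_und.number_of_edges(), G_dir.number_of_edges())  # 4 5

# Share of directed edges whose reverse edge also exists
print(nx.reciprocity(G_dir))  # 0.4

# Weak components ignore direction; strong components require following arrows
print(nx.number_weakly_connected_components(G_dir))    # 1
print(nx.number_strongly_connected_components(G_dir))  # 2 ({a, b, c} and {d})
```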

In [138]:
nx.average_clustering(cong_twitter_net)
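
For an unweighted graph, `nx.average_clustering` averages each node's local clustering coefficient: the fraction of pairs of that node's neighbors that are themselves connected, with nodes of degree below 2 counted as 0 by default. A by-hand version on a hypothetical toy graph, checked against networkx, may help in reading the value of about 0.66 reported below.

```python
from itertools import combinations

import networkx as nx


def local_clustering(G, v):
    """Fraction of pairs of v's neighbors that are linked to each other."""
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; networkx counts it as 0 by default
    linked = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    return linked / (k * (k - 1) / 2)


def average_clustering_by_hand(G):
    """Mean of the local coefficients over all nodes, zeros included."""
    return sum(local_clustering(G, v) for v in G) / G.number_of_nodes()


# Hypothetical toy graph: triangle a-b-c plus d hanging off a
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])

# a: 1 of 3 neighbor pairs linked; b and c: 1 of 1; d: degree 1 -> 0
print(average_clustering_by_hand(G))  # about 0.5833 (= 7/12)
print(nx.average_clustering(G))       # same value
```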

0.6634751613749071

In [139]:
nx.number_connected_components(cong_twitter_net)
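
A single connected component here describes only the MOC that appear in `congress_edges`: a graph built from an edge list never contains accounts with no follow ties to other MOC. This toy sketch (hypothetical IDs) shows how adding every MOC as a node first can change the count.

```python
import networkx as nx

# Hypothetical toy data: four MOC IDs, but "d" has no follow ties to other MOC
moc_ids = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]

G = nx.Graph(edges)  # nodes come only from the edges
print(G.number_of_nodes(), nx.number_connected_components(G))  # 3 1

G.add_nodes_from(moc_ids)  # include every MOC, isolated or not
print(G.number_of_nodes(), nx.number_connected_components(G))  # 4 2
print(list(nx.isolates(G)))  # ['d']
```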

1

Finally, we'll save the Congress Twitter edge list so that you can easily load it in other notebooks.

Each row is one directed edge, in the format:

Twitter ID of the MOC who follows, Twitter ID of the MOC being followed

In [127]:
import csv

# One row per directed edge: follower's Twitter ID, then the followed MOC's ID
with open("congress_twitter_edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["twitter_id1", "twitter_id2"])
    writer.writerows(congress_edges)
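
One way to load the saved edge list in another notebook as a directed graph is `pd.read_csv` followed by `nx.from_pandas_edgelist`. This sketch uses a small in-memory CSV with hypothetical IDs. Passing `skipinitialspace=True` strips the blank after each comma, so the same call works whether or not the file was written with `", "` separators.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical edge list in the same two-column layout, with ", " separators
csv_text = "twitter_id1, twitter_id2\n101, 202\n202, 101\n303, 101\n"

# skipinitialspace=True yields columns "twitter_id1"/"twitter_id2", not " twitter_id2"
edges = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Each row becomes a directed edge: follower -> followed
G = nx.from_pandas_edgelist(edges, source="twitter_id1", target="twitter_id2",
                            create_using=nx.DiGraph)

print(G.number_of_nodes(), G.number_of_edges())  # 3 3
```

For the real file, replace `io.StringIO(csv_text)` with `"congress_twitter_edges.csv"`.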