In [1]:
import networkx as nx
from pickle import *
import numpy as np
import pandas as pd
from pandas import json_normalize  # top-level import since pandas 1.0
import json
import yaml
from datascience import *

## Overview: Data on Members of Congress

Both the Twitter data and the bill co-sponsorship data describe connections between Members of Congress (MOCs). The repository [https://github.com/unitedstates/congress-legislators](https://github.com/unitedstates/congress-legislators) collects additional information about each MOC, such as biographical details and social media accounts. Here, we'll show how to open a couple of its data files.

The important files are stored in YAML format (the name originally stood for "Yet Another Markup Language" and now stands for "YAML Ain't Markup Language"). You can use the `yaml` package (which we imported above) to read YAML data. There are a couple of examples below; note that the repository has additional files that we won't explore here.
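To get a feel for how YAML maps onto Python objects, here is a minimal sketch that parses a small inline fragment instead of a file. The fragment is shaped like one entry of the social media file explored later in this notebook; `doc` and `records` are names chosen only for this illustration.

```python
import yaml

# A small YAML fragment: a list ("-") containing one mapping with nested mappings.
# The quoted "02222" stays a string; the unquoted 412664 is parsed as an integer.
doc = """
- id:
    bioguide: R000600
    thomas: "02222"
    govtrack: 412664
  social:
    twitter: RepAmata
"""

# safe_load builds only plain Python objects (lists, dicts, strings, numbers)
records = yaml.safe_load(doc)

print(type(records))          # a list ...
print(type(records[0]))       # ... of dictionaries
print(records[0]["social"]["twitter"])
```

A YAML list becomes a Python list and each YAML mapping becomes a dictionary, which is why the loaded files can be indexed with `[0]` and then by key.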

If you have questions or figure something interesting out about the dataset, tell us all about it on Piazza!

### Biographical information

In [2]:
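# information on current legislators
# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
with open('congress-legislators/legislators-current.yaml', 'r') as f:
    leg_current = yaml.safe_load(f)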

In [3]:
leg_current_df = pd.DataFrame(leg_current)

In [4]:
leg_current_df.head()

Unnamed: 0,bio,family,id,leadership_roles,name,other_names,terms
0,"{'gender': 'M', 'birthday': '1952-11-09', 'rel...",,"{'ballotpedia': 'Sherrod Brown', 'govtrack': 4...",,"{'last': 'Brown', 'first': 'Sherrod', 'officia...",,"[{'type': 'rep', 'state': 'OH', 'party': 'Demo..."
1,"{'gender': 'F', 'birthday': '1958-10-13', 'rel...",,"{'ballotpedia': 'Maria Cantwell', 'govtrack': ...",,"{'last': 'Cantwell', 'first': 'Maria', 'offici...",,"[{'type': 'rep', 'state': 'WA', 'party': 'Demo..."
2,"{'gender': 'M', 'birthday': '1943-10-05', 'rel...",,"{'ballotpedia': 'Ben Cardin', 'govtrack': 4000...",,"{'last': 'Cardin', 'first': 'Benjamin', 'middl...",,"[{'type': 'rep', 'state': 'MD', 'party': 'Demo..."
3,"{'gender': 'M', 'birthday': '1947-01-23', 'rel...",,"{'ballotpedia': 'Tom Carper', 'govtrack': 3000...",,"{'last': 'Carper', 'first': 'Thomas', 'middle'...",,"[{'type': 'rep', 'state': 'DE', 'party': 'Demo..."
4,"{'gender': 'M', 'birthday': '1960-04-13'}",,"{'ballotpedia': 'Bob Casey, Jr.', 'govtrack': ...",,"{'last': 'Casey', 'middle': 'P.', 'nickname': ...",,"[{'phone': '202-224-6324', 'state': 'PA', 'cla..."
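Each cell in the table above still holds a nested dictionary (or, for `terms`, a list of dictionaries), which is awkward to filter or sort on. One possible way to flatten the nested fields is `pd.json_normalize`. The sketch below runs on two abbreviated records that use only values visible in the output above; `sample`, `flat`, and `first_term_state` are names invented for this illustration.

```python
import pandas as pd

# Abbreviated records shaped like entries of legislators-current.yaml.
# Only a few of the fields visible in the output above are reproduced here.
sample = [
    {"name": {"first": "Sherrod", "last": "Brown"},
     "bio": {"gender": "M", "birthday": "1952-11-09"},
     "terms": [{"type": "rep", "state": "OH"}]},
    {"name": {"first": "Maria", "last": "Cantwell"},
     "bio": {"gender": "F", "birthday": "1958-10-13"},
     "terms": [{"type": "rep", "state": "WA"}]},
]

# Nested dictionary keys become dotted column names such as "name.last";
# the list in "terms" is left as-is in a single column.
flat = pd.json_normalize(sample)

# Pull one field out of the first term of each legislator
flat["first_term_state"] = flat["terms"].apply(lambda terms: terms[0]["state"])

print(flat[["name.first", "name.last", "bio.birthday", "first_term_state"]])
```

The same call should work on the full `leg_current` list, though the real records have more keys and some legislators lack fields that others have, so expect missing values in the flattened columns.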


### Social media information

NOTE: the curators of these data say that they are careful to only include official (and not campaign-related) social media accounts. So there may be some difference between the Twitter accounts listed here and the Twitter accounts that we grabbed using CSPAN's list.
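To check how much the two sources differ, one option is to compare the handles as sets. This is only a sketch: apart from `RepAmata`, which appears in the data below, every handle here is made up, and `official_handles` and `cspan_handles` are placeholder names for lists you would build from the two datasets.

```python
# Placeholder lists; handles other than RepAmata are hypothetical.
official_handles = ["RepAmata", "SenExampleOne", "RepExampleTwo"]
cspan_handles = ["repamata", "RepExampleTwo", "ExampleCampaign"]


def normalize(handles):
    """Lowercase handles, since Twitter treats them as case-insensitive."""
    return {h.lower() for h in handles}


official = normalize(official_handles)
cspan = normalize(cspan_handles)

in_both = official & cspan          # accounts listed in both sources
only_official = official - cspan    # listed here but missing from the C-SPAN list
only_cspan = cspan - official       # possibly campaign or personal accounts

print(sorted(in_both))
print(sorted(only_official))
print(sorted(only_cspan))
```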

In [5]:
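# information on social media accounts of legislators
# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
with open('congress-legislators/legislators-social-media.yaml', 'r') as f:
    leg_socmedia = yaml.safe_load(f)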

The loaded data is a list with one dictionary per legislator. Here is the first entry:

In [6]:
leg_socmedia[0]

{'id': {'bioguide': 'R000600', 'govtrack': 412664, 'thomas': '02222'},
 'social': {'facebook': 'congresswomanaumuaamata',
  'facebook_id': '1537155909907320',
  'twitter': 'RepAmata',
  'twitter_id': 3026622545,
  'youtube_id': 'UCGdrLQbt1PYDTPsampx4t1A'}}

We can convert the list of dictionaries to a pandas DataFrame like so:

In [7]:
leg_socmedia_df = pd.DataFrame(leg_socmedia)

In [8]:
leg_socmedia_df.head()

Unnamed: 0,id,social
0,"{'bioguide': 'R000600', 'thomas': '02222', 'go...","{'youtube_id': 'UCGdrLQbt1PYDTPsampx4t1A', 'fa..."
1,"{'bioguide': 'H001070', 'thomas': '02260', 'go...","{'youtube_id': 'UCc8E6NWCdgrXjBVI2NNPYdA', 'yo..."
2,"{'bioguide': 'Y000064', 'thomas': '02019', 'go...","{'youtube_id': 'UCuknj4PGn91gHDNAfboZEgQ', 'yo..."
3,"{'bioguide': 'E000295', 'thomas': '02283', 'go...","{'youtube_id': 'UCLwrmtF_84FIcK3TyMs4MIw', 'in..."
4,"{'bioguide': 'T000476', 'thomas': '02291', 'go...","{'youtube_id': 'UCUD9VGV4SSGWjGdbn37Ea2w', 'fa..."


And we can convert the pandas DataFrame into a datascience Table like this:

In [9]:
leg_socmedia_table = Table.from_df(leg_socmedia_df)

In [10]:
leg_socmedia_table

id,social
"{'bioguide': 'R000600', 'thomas': '02222', 'govtrack': 4 ...","{'youtube_id': 'UCGdrLQbt1PYDTPsampx4t1A', 'facebook': ' ..."
"{'bioguide': 'H001070', 'thomas': '02260', 'govtrack': 4 ...","{'youtube_id': 'UCc8E6NWCdgrXjBVI2NNPYdA', 'youtube': 'R ..."
"{'bioguide': 'Y000064', 'thomas': '02019', 'govtrack': 4 ...","{'youtube_id': 'UCuknj4PGn91gHDNAfboZEgQ', 'youtube': 'R ..."
"{'bioguide': 'E000295', 'thomas': '02283', 'govtrack': 4 ...","{'youtube_id': 'UCLwrmtF_84FIcK3TyMs4MIw', 'instagram_id ..."
"{'bioguide': 'T000476', 'thomas': '02291', 'govtrack': 4 ...","{'youtube_id': 'UCUD9VGV4SSGWjGdbn37Ea2w', 'facebook': ' ..."
"{'bioguide': 'Y000063', 'thomas': '02021', 'govtrack': 4 ...","{'youtube_id': 'UCCeYmn4A8kZEHCcAfeUW9lQ', 'youtube': 'R ..."
"{'bioguide': 'Y000062', 'thomas': '01853', 'govtrack': 4 ...","{'youtube_id': 'UCy5KW4yrEfEiyZRX45Eoxkg', 'youtube': 'R ..."
"{'bioguide': 'Y000033', 'thomas': '01256', 'govtrack': 4 ...","{'youtube_id': 'UCg5ZIR5-82EbJiNeI1bqT-A', 'youtube': 'R ..."
"{'bioguide': 'W000809', 'thomas': '01991', 'govtrack': 4 ...","{'youtube_id': 'UCXJbUDLYX-wGIhRuN66hqZw', 'youtube': 'C ..."
"{'bioguide': 'W000808', 'thomas': '02004', 'govtrack': 4 ...","{'youtube_id': 'UCP5QBhng_lHv-vJgE_h7lpA', 'youtube': 'r ..."

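Because the `id` and `social` columns still contain dictionaries, linking these records to the Twitter data takes one more step. A minimal sketch, assuming you want a lookup from bioguide ID to Twitter handle: the first record below is copied from the `leg_socmedia[0]` output above, the second is hypothetical (it has no Twitter account), and `twitter_by_bioguide` is a helper name invented here.

```python
sample_socmedia = [
    # copied from the leg_socmedia[0] output above
    {"id": {"bioguide": "R000600", "govtrack": 412664, "thomas": "02222"},
     "social": {"facebook": "congresswomanaumuaamata",
                "twitter": "RepAmata",
                "twitter_id": 3026622545}},
    # hypothetical record without a Twitter account
    {"id": {"bioguide": "X000000"},
     "social": {"youtube": "ExampleChannel"}},
]


def twitter_by_bioguide(records):
    """Map bioguide ID -> Twitter handle, skipping legislators without one."""
    lookup = {}
    for rec in records:
        # .get avoids a KeyError when a legislator has no "twitter" entry
        handle = rec.get("social", {}).get("twitter")
        if handle is not None:
            lookup[rec["id"]["bioguide"]] = handle
    return lookup


handles = twitter_by_bioguide(sample_socmedia)
print(handles)
```

Using `.get` matters because the set of keys under `social` varies from legislator to legislator, as the truncated rows above suggest.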