IS 620 Project 1 
David Stern

For this assignment, I looked at all campaign contributions from residents of New York State to candidates in the 2016 Presidential election. I downloaded this data as a CSV file from the most recent quarterly report on the Federal Election Commission website. I then split the data into two CSV files: one for nodes (all contributors and candidates, deduplicated) and one for edges (all campaign contributions).

In [697]:
import networkx as nx
import matplotlib.pylab as plt
import os
import csv

In [698]:
os.chdir('/Users/davidstern/Downloads/')

In [699]:
reader = csv.reader(open("NYedges.csv", 'rU'), dialect=csv.excel_tab)

In [700]:
G = nx.Graph()

In [701]:
for row in reader:
    row = row[0].split(',')
    contributor = row[0]
    politician = row[1]
    G.add_edge(contributor, politician)
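
The loop above opens the file with a tab dialect and then splits each line on commas by hand. A minimal sketch of an alternative, assuming the file is plain comma-separated text: let `csv.reader` split the fields and strip the stray whitespace that shows up as a leading space in the names printed later in this notebook. The sample rows are made up and only stand in for NYedges.csv.

```python
import csv
import io

# Hypothetical sample standing in for NYedges.csv (contributor, candidate).
sample = io.StringIO(
    u" DONOR A, Candidate X\n"
    u" DONOR B, Candidate X\n"
    u" DONOR A, Candidate Y\n"
)

# The default dialect already splits on commas, so no manual split is needed.
edges = []
for row in csv.reader(sample):
    # Strip surrounding blanks so ' DONOR A' and 'DONOR A' become one node.
    contributor, politician = (field.strip() for field in row[:2])
    edges.append((contributor, politician))

print(edges)
```

Each pair in `edges` could then be passed to `G.add_edge` exactly as in the cell above.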

Here we see that the graph has over 6,000 nodes and only slightly more edges. Because a Graph keeps a single edge per contributor-candidate pair, this means that very few contributors gave to more than one candidate.

In [702]:
print "Nodes: %d" %nx.number_of_nodes(G)
print "Edges: %d" %nx.number_of_edges(G)

Nodes: 6386
Edges: 6457
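
Because `nx.Graph` keeps at most one edge between any pair of nodes, repeat contributions from one person to the same candidate collapse into a single edge. A rough sketch of that effect with plain Python sets, using made-up contribution records rather than the FEC data:

```python
from collections import Counter

# Hypothetical contribution records (contributor, candidate); not real data.
contributions = [
    ("DONOR A", "Candidate X"),
    ("DONOR A", "Candidate X"),  # repeat gift to the same candidate
    ("DONOR B", "Candidate X"),
    ("DONOR C", "Candidate Y"),
    ("DONOR C", "Candidate X"),  # same donor, second candidate
]

# An undirected simple graph stores each pair once, like a set of pairs.
edges = set(contributions)
nodes = {name for pair in edges for name in pair}

# Count how many distinct candidates each contributor is linked to.
candidates_per_donor = Counter(donor for donor, _ in edges)
multi_candidate_donors = [d for d, n in candidates_per_donor.items() if n > 1]

print(len(nodes), len(edges), multi_candidate_donors)
```

If every individual gift should count, a `nx.MultiGraph` or an edge weight would be needed instead of a plain `Graph`.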


In this next part, we create a dictionary for each attribute we want to attach to the nodes, keyed by node name, and then set them as node attributes.

In [703]:
reader2 = csv.reader(open("NYvertices.csv", 'rU'), dialect=csv.excel_tab)

In [704]:
cities = {}
employer = {}
occupation = {}
for row in reader2:
    row = row[0].split(',')
    cities[row[0]] = row[2]
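    employer[row[0]] = row[5]
    # NOTE: this reads column 5 again, so 'occupation' ends up identical to
    # 'employer' (visible in the output below); it should read the occupation column.
    occupation[row[0]] = row[5]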

In [705]:
nx.set_node_attributes(G, 'city', cities)
nx.set_node_attributes(G, 'employer', employer)
nx.set_node_attributes(G, 'occupation', occupation)

Now we will see if all of the keys loaded properly. Disclaimer: I am not the David J. Stern listed below.

In [706]:
{k: G.node[k] for k in G.node.keys()[:5]}

{' BLANCA PICON': {'city': 'NEW YORK',
  'employer': 'NYC DEPARTMENT OF EDUCATION',
  'occupation': 'NYC DEPARTMENT OF EDUCATION'},
 ' DAVID J STERN': {'city': 'BROOKLYN',
  'employer': 'LURIA ACADEMY OF BROOKLYN',
  'occupation': 'LURIA ACADEMY OF BROOKLYN'},
 ' GREGORY REED': {'city': 'SUNNYSIDE',
  'employer': 'JPMORGAN CHASE & CO.',
  'occupation': 'JPMORGAN CHASE & CO.'},
 ' KATHERINE SAILER': {'city': 'NEW YORK',
  'employer': 'SELF',
  'occupation': 'SELF'},
 ' MATTHEW LEVINE': {'city': 'NEW YORK',
  'employer': 'THE LEGAL AID SOCIETY',
  'occupation': 'THE LEGAL AID SOCIETY'}}

Now we will calculate degree centrality:

In [707]:
degree_centrality = nx.degree_centrality(G)

In [708]:
degree_centrality_sorted = sorted(degree_centrality.iteritems(),key=lambda(k,v):(-v,k))

In [709]:
degree_centrality_sorted[0:18]

[(' Hillary Rodham Clinton', 0.6021926389976507),
 (' Bernard Sanders', 0.09835552075176195),
 (' Jeb Bush', 0.08081440877055598),
 (' Marco Rubio', 0.05763508222396241),
 (" Rafael Edward 'Ted' Cruz", 0.03946750195771339),
 (' Rand Paul', 0.030540328895849646),
 (' Benjamin S. Carson', 0.02678151918559123),
 (" Martin Joseph O'Malley", 0.01597494126859828),
 (' George E. Pataki', 0.01456538762725137),
 (' Lindsey O. Graham', 0.012216131558339859),
 (' RAFAEL EDWARD TED CRUZ', 0.010806577916992952),
 (' Carly Fiorina', 0.009083790133124511),
 (' Mike Huckabee', 0.00845732184808144),
 (' Richard J. Santorum', 0.001879404855129209),
 (' James R. (Rick) Perry', 0.0015661707126076742),
 (' LEWIS VAN AMERONGEN', 0.0006264682850430697),
 (' WILLIAM B. MR. HOTALING', 0.0006264682850430697),
 (' Bobby Jindal', 0.00046985121378230227)]

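With degree centrality, we see that almost all of the top vertices are the candidates themselves. Ted Cruz appears twice under two different spellings, so his contributors are split across two nodes.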
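
For reference, networkx defines degree centrality as a node's degree divided by n - 1, where n is the number of nodes. The sketch below recomputes it by hand on a small made-up graph, then inverts the same formula to turn one of the reported scores back into a neighbor count. The toy names are illustrative only.

```python
# Toy adjacency list (hypothetical names), undirected.
graph = {
    "Candidate X": {"DONOR A", "DONOR B", "DONOR C"},
    "Candidate Y": {"DONOR C"},
    "DONOR A": {"Candidate X"},
    "DONOR B": {"Candidate X"},
    "DONOR C": {"Candidate X", "Candidate Y"},
}

n = len(graph)
# Degree centrality: fraction of the other n - 1 nodes a node is linked to.
degree_centrality = {
    node: len(nbrs) / float(n - 1) for node, nbrs in graph.items()
}

for node, score in sorted(degree_centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(node, round(score, 3))

# Inverting the formula for the notebook's graph (6386 nodes): the top
# score of 0.6021926389976507 corresponds to this many neighbors.
clinton_neighbors = int(round(0.6021926389976507 * (6386 - 1)))
print(clinton_neighbors)
```

By this arithmetic, Clinton is linked to 3,845 of the 6,385 other nodes in the graph.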

Now we will calculate eigenvector centrality:

In [710]:
eigen_centrality = nx.eigenvector_centrality_numpy(G)

In [711]:
eigen_centrality_sorted = sorted(eigen_centrality.iteritems(),key=lambda(k,v):(-v,k))

In [712]:
eigen_centrality_sorted[0:15]

[(' Hillary Rodham Clinton', 0.7070997401310243),
 (' DAVID LEVINE', 0.011459242607468131),
 (' LYNN MANGUM', 0.011449343928887908),
 (' ALEXANDRA KORRY', 0.011449343928887907),
 (' DAVID ROTH', 0.011449343928887907),
 (' GINA BOONSHOFT', 0.011449343928887907),
 (' PHILLIP DONAHUE', 0.011449343928887907),
 (' DONALD RUBIN', 0.011449343928887905),
 (' FRIEDRIKE MERCK', 0.011449343928887905),
 (' PATRICK NOLAN', 0.011449343928887905),
 (' RENEE FEINBERG', 0.011449343928887905),
 (' GISELA GAMPER', 0.011449343928887903),
 (' MICHAEL LEVINE', 0.011449343928887903),
 (' MICHELLE LAVITT', 0.011449343928887901),
 (' JOHN CATSIMATIDIS', 0.011419214434426408)]

For eigenvector centrality, we see that Hillary Clinton is still at the top, but the other individuals are donors, not candidates. These are likely all high-profile donors or bundlers. John Catsimatidis, for example, is the CEO of Gristedes and entered the Republican primary for Mayor of NYC in 2013.
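
One structural reason donors can outrank candidates here: eigenvector centrality scores a node by the scores of its neighbors, so being attached to the dominant hub can count for more than having a moderate number of low-scoring neighbors. The sketch below is a plain power iteration on a made-up toy graph, not the networkx routine used above. Adding each node's own score at every step (iterating on A + I) is an assumption made so that the iteration settles on a bipartite graph; it leaves the eigenvectors unchanged.

```python
import math

# Toy graph (hypothetical names): big hub H, small hub S, one shared donor.
edges = [("H", "h%d" % i) for i in range(1, 6)]
edges += [("H", "shared"), ("S", "shared"), ("S", "s1")]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

# Power iteration on (A + I): each node's new score is its own score plus
# the sum of its neighbors' scores, then the vector is rescaled to length 1.
scores = {node: 1.0 for node in neighbors}
for _ in range(500):
    new = {
        node: scores[node] + sum(scores[m] for m in neighbors[node])
        for node in neighbors
    }
    norm = math.sqrt(sum(v * v for v in new.values()))
    scores = {node: v / norm for node, v in new.items()}

ranking = sorted(scores, key=lambda node: -scores[node])
print(ranking[:3])
```

In this toy graph the donors attached only to H end up with a higher score than the smaller hub S, which mirrors the pattern in the output above.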

In this next part, I tried to subset the graph to only the nodes with a particular attribute value, such as a given occupation, but was not able to get it working.

In [713]:
attorneys = {key:value for key,value in G.node.values() if value == 'attorney'}

ValueError: too many values to unpack
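
The ValueError comes from `G.node.values()`: it yields one attribute dictionary per node, and each of those has three keys, so it cannot be unpacked into `key, value`. Here is a sketch of one way the filter could be written, iterating over (name, attributes) pairs and testing a single attribute. A plain dictionary stands in for `G.node`, and the names and the 'ATTORNEY' label are made up. The attribute values loaded in this notebook are upper case, so a lower-case 'attorney' would likely match nothing.

```python
# Stand-in for G.node: node name -> attribute dictionary (hypothetical data).
node_attributes = {
    "DONOR A": {"city": "NEW YORK", "employer": "FIRM ONE", "occupation": "ATTORNEY"},
    "DONOR B": {"city": "BROOKLYN", "employer": "SELF", "occupation": "WRITER"},
    "Candidate X": {},  # a node with no attributes loaded
}

# Iterate over (name, attrs) pairs; .get avoids a KeyError for nodes that
# lack the attribute.
attorneys = {
    name: attrs
    for name, attrs in node_attributes.items()
    if attrs.get("occupation") == "ATTORNEY"
}

print(sorted(attorneys))
```

With the graph itself, the same comprehension would iterate over `G.node.items()`, and `G.subgraph(attorneys)` could then give the induced subgraph on those nodes.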