### Assignment 3: Graph Visualization

#### Summer 2021
**Authors:** GOAT Team (Estaban Aramayo, Ethan Haley, Claire Meyer, and Tyler Frankenburg) 

This assignment examines a CSV of donor and recipient data from OpenSecrets, which tracks political donations. The data is available [here](https://docs.google.com/spreadsheets/d/1PPjz-U1LueQYHaVCU8iCYf3O4lc-OYN7uOf3OknhYxo/edit#gid=1325242852).

In [7]:
import networkx as nx
import pandas
import matplotlib.pyplot as plt

First, we import the CSV and run a couple of quick checks on the shape and form of the data.

In [8]:
df = pandas.read_csv('donor_members.csv')
df.head()

| | PAC | CID | CRPName | Distid | Total | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | American Medical Assn | N00025219 | Burgess, Michael | TX26 | $20,000 | | |
| 1 | American Medical Assn | N00028152 | McCarthy, Kevin | CA23 | $20,000 | | Direct contributions data covers the 2020 elec... |
| 2 | American Dental Assn | N00005736 | Babin, Brian | TX36 | $20,000 | | |
| 3 | American Dental Assn | N00025219 | Burgess, Michael | TX26 | $20,000 | | |
| 4 | American Dental Assn | N00035346 | Carter, Buddy | GA01 | $17,500 | | |


In [9]:
df.shape

(2686, 7)
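The `Total` column arrives as text such as `$20,000`, so it cannot be used directly as a numeric edge weight. A minimal sketch of one way to convert it follows; the helper name `parse_dollars` is hypothetical (it is not part of the original notebook), and it assumes every value is a whole-dollar amount with no cents or missing entries.

```python
def parse_dollars(text):
    """Convert a string such as "$20,000" to the integer 20000.

    Assumes whole-dollar amounts; strings with cents or blanks would raise ValueError.
    """
    return int(text.replace("$", "").replace(",", "").strip())


# Two values copied from the df.head() output above.
sample_totals = ["$20,000", "$17,500"]
parsed = [parse_dollars(t) for t in sample_totals]
print(parsed)
```

In the notebook itself, something like `df["Total"].map(parse_dollars)` could apply the same conversion to the whole column, provided the assumptions above hold for every row.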

Next, we use the `from_pandas_dataframe` function to create a NetworkX graph from the dataframe. Each PAC and each recipient becomes a node, and the `Total` column is stored as an edge attribute. [Source](https://networkx.org/documentation/networkx-1.10/reference/generated/networkx.convert_matrix.from_pandas_dataframe.html).

In [10]:
test_graph = nx.from_pandas_dataframe(df, source="PAC", target="CRPName",
                                  edge_attr=["Total"])
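This notebook runs NetworkX 1.11, where the constructor is called `from_pandas_dataframe`. In NetworkX 2.0 and later the same function is named `from_pandas_edgelist`. The sketch below picks whichever name is available and builds a graph from a toy dataframe; `toy_df` and `toy_graph` are illustrative names, and the three rows are copied from the `df.head()` output rather than loaded from the CSV.

```python
import networkx as nx
import pandas as pd

# Three rows copied from the df.head() output; the notebook loads the full CSV.
toy_df = pd.DataFrame({
    "PAC": ["American Medical Assn", "American Medical Assn", "American Dental Assn"],
    "CRPName": ["Burgess, Michael", "McCarthy, Kevin", "Burgess, Michael"],
    "Total": ["$20,000", "$20,000", "$20,000"],
})

# Pick whichever constructor this NetworkX version provides.
if hasattr(nx, "from_pandas_edgelist"):
    build = nx.from_pandas_edgelist    # NetworkX 2.0 and later
else:
    build = nx.from_pandas_dataframe   # NetworkX 1.x, as used in this notebook

toy_graph = build(toy_df, source="PAC", target="CRPName", edge_attr=["Total"])
print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
```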

In [11]:
print(nx.info(test_graph))

Name: 
Type: Graph
Number of nodes: 712
Number of edges: 2675
Average degree:   7.5140
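As a quick sanity check on the summary above: in an undirected graph each edge contributes to the degree of two nodes, so the average degree is twice the edge count divided by the node count. The snippet below only redoes that arithmetic with the reported numbers. Note also that the graph has 2675 edges while the dataframe has 2686 rows; one possible explanation, not verified here, is that repeated PAC and recipient pairs collapse into a single edge in a simple `Graph`.

```python
nodes = 712
edges = 2675

# Each undirected edge adds 1 to the degree of both of its endpoints.
average_degree = 2 * edges / nodes
print(round(average_degree, 4))
```

The result agrees with the `Average degree` line reported by `nx.info`.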


For this assignment we want to explore diameter, which is only defined for a connected graph. First, let's check whether this graph is connected, using the `is_connected` function. [Source](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.components.is_connected.html#networkx.algorithms.components.is_connected).

In [12]:
print(nx.is_connected(test_graph))

False


This graph is not connected. We can, however, look for connected subgraphs and focus our measurements there. The `connected_component_subgraphs` function generates one subgraph for each connected component. [Source](https://networkx.org/documentation/networkx-1.9.1/reference/generated/networkx.algorithms.components.connected.connected_component_subgraphs.html).

In [13]:
graphs = list(nx.connected_component_subgraphs(test_graph))
print("There are ", len(graphs), " connected subgraphs in this graph.")

There are  2  connected subgraphs in this graph.
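`connected_component_subgraphs` is available in the NetworkX 1.11 release used here but was removed in NetworkX 2.4. A sketch of an equivalent that should run on both old and new versions is shown below, using `nx.connected_components` together with `Graph.subgraph`. The toy graph and the name `components` are illustrative only. Sorting by size makes the largest component come first explicitly, rather than relying on the order in which components happen to be generated.

```python
import networkx as nx

# Toy graph with two components: a-b-c and x-y.
toy = nx.Graph()
toy.add_edges_from([("a", "b"), ("b", "c"), ("x", "y")])

# Build one independent subgraph per connected component.
components = [toy.subgraph(nodes).copy() for nodes in nx.connected_components(toy)]

# Largest component first; len(graph) is its number of nodes.
components.sort(key=len, reverse=True)
sizes = [len(g) for g in components]
print(sizes)
```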


Let's compare the size of these subgraphs by the number of nodes.

In [14]:
print("The first subgraph has ",len(graphs[0].nodes())," nodes.")
print("The second subgraph has ",len(graphs[1].nodes())," nodes.")

The first subgraph has  710  nodes.
The second subgraph has  2  nodes.


Let's select the larger of the two, and explore further.

In [15]:
subgraph_test = graphs[0]

In [39]:
#nx.draw_networkx(subgraph_test)

We can use the built-in `diameter` function to determine the diameter of this subgraph, that is, the length of the longest shortest path between any two of its nodes.

In [20]:
diameter_test = nx.diameter(subgraph_test)

In [21]:
print("The diameter is: ", diameter_test)

The diameter is:  6
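To make the definition concrete, here is a small dependency-free sketch of what a diameter computation does: run a breadth-first search from every node, record the farthest distance reached, and take the maximum. This is an illustration on a toy path graph, not NetworkX's actual implementation; the function names and the adjacency dictionary are made up for the example, and the graph is assumed to be connected.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Return shortest-path lengths (in edges) from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter(adjacency):
    """Largest shortest-path length over all node pairs (graph must be connected)."""
    return max(max(bfs_distances(adjacency, n).values()) for n in adjacency)


# Path graph a - b - c - d: the farthest pair (a, d) is three edges apart.
toy_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_diameter = diameter(toy_adjacency)
print(toy_diameter)
```

A diameter of 6 for the donation subgraph therefore means that no two nodes in it are more than six edges apart.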


We can also rank nodes by several centrality measures: degree, closeness, and betweenness. We start by pulling the `sorted_map` function from [the textbook's repo](https://www.oreilly.com/library/view/social-network-analysis/9781449311377/), then apply NetworkX's built-in centrality functions.

In [22]:
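def sorted_map(dd: dict) -> list:
    """
    Sorts dict items by value (descending), breaking ties by key (ascending)

    :param dd: dictionary with numeric values
    :return: list of (key, value) tuples ordered by value, largest first
    """
    # Negating the value sorts descending; the key breaks ties alphabetically
    sorted_items = sorted(dd.items(), key=lambda x: (-x[1], x[0]))
    return sorted_items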

In [23]:
d = nx.degree_centrality(subgraph_test)
ds = sorted_map(d)
ds[:10]

[('McCarthy, Kevin', 0.22284908321579688),
 ('Majority Cmte PAC', 0.1706629055007052),
 ('Scalise, Steve', 0.1466854724964739),
 ('Luetkemeyer, Blaine', 0.12552891396332863),
 ('Hudson, Richard', 0.10155148095909731),
 ('American Bankers Assn', 0.09449929478138222),
 ('Stefanik, Elise', 0.09449929478138222),
 ('Marshall, Roger', 0.09026798307475317),
 ('Graves, Sam', 0.08603667136812412),
 ('National Auto Dealers Assn', 0.0846262341325811)]
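NetworkX's `degree_centrality` divides each node's degree by `n - 1`, where `n` is the number of nodes in the graph. Working backwards from the top score gives a more tangible number. The sketch below only redoes that arithmetic, using the 710-node size of the subgraph and the score printed above.

```python
n_nodes = 710                      # nodes in the largest connected subgraph
centrality = 0.22284908321579688   # score reported for 'McCarthy, Kevin'

# degree_centrality = degree / (n - 1), so degree = centrality * (n - 1)
implied_degree = centrality * (n_nodes - 1)
print(round(implied_degree))
```

Rounded to the nearest integer, this is the number of distinct neighbors the top-ranked node has in the subgraph.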

In [24]:
c = nx.closeness_centrality(subgraph_test)
cs = sorted_map(c)
cs[:10]

[('Majority Cmte PAC', 0.47583892617449663),
 ('American Bankers Assn', 0.4425717852684145),
 ('National Auto Dealers Assn', 0.43443627450980393),
 ('McCarthy, Kevin', 0.4180424528301887),
 ('American Crystal Sugar', 0.4141355140186916),
 ('National Assn of Realtors', 0.4122093023255814),
 ('Comcast Corp', 0.4065366972477064),
 ('AT&T Inc', 0.40101809954751133),
 ('Koch Industries', 0.3978675645342312),
 ('Scalise, Steve', 0.39214601769911506)]

In [25]:
b = nx.betweenness_centrality(subgraph_test)
bs = sorted_map(b)
bs[:10]

[('Majority Cmte PAC', 0.17282877117345324),
 ('McCarthy, Kevin', 0.16931791360004345),
 ('Luetkemeyer, Blaine', 0.09260890526740499),
 ('Scalise, Steve', 0.07678017910891104),
 ('Hyde-Smith, Cindy', 0.06477478319404831),
 ('American Bankers Assn', 0.059420459128166014),
 ('Hudson, Richard', 0.05398386178323799),
 ('Marshall, Roger', 0.053670599744799354),
 ('National Auto Dealers Assn', 0.05243269238376074),
 ('Graves, Sam', 0.051492943847402814)]

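Five names appear in all three top-10 lists: Majority Cmte PAC, Kevin McCarthy, Steve Scalise, American Bankers Assn, and National Auto Dealers Assn.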
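The overlap between the three rankings can be checked with a set intersection. The sketch below hard-codes the names from the three outputs above so that it runs on its own; the variable names are illustrative.

```python
# Top-10 names copied from the degree, closeness, and betweenness outputs.
top_degree = [
    "McCarthy, Kevin", "Majority Cmte PAC", "Scalise, Steve",
    "Luetkemeyer, Blaine", "Hudson, Richard", "American Bankers Assn",
    "Stefanik, Elise", "Marshall, Roger", "Graves, Sam",
    "National Auto Dealers Assn",
]
top_closeness = [
    "Majority Cmte PAC", "American Bankers Assn", "National Auto Dealers Assn",
    "McCarthy, Kevin", "American Crystal Sugar", "National Assn of Realtors",
    "Comcast Corp", "AT&T Inc", "Koch Industries", "Scalise, Steve",
]
top_betweenness = [
    "Majority Cmte PAC", "McCarthy, Kevin", "Luetkemeyer, Blaine",
    "Scalise, Steve", "Hyde-Smith, Cindy", "American Bankers Assn",
    "Hudson, Richard", "Marshall, Roger", "National Auto Dealers Assn",
    "Graves, Sam",
]

# Names present in every one of the three lists.
in_all_three = set(top_degree) & set(top_closeness) & set(top_betweenness)
print(sorted(in_all_three))
```

Inside the notebook, the same sets could be built from the existing results, for example `{name for name, _ in ds[:10]}`.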

In [26]:
nx.__version__

'1.11'

In [27]:
nx.write_graphml(subgraph_test, "subgraph_test.graphml")

In [28]:
from IPython.display import IFrame

In [38]:
IFrame('https://ebhtra.github.io/gory-graph/network/#McCarthy,%20Kevin', width=1000, height=1000)