# $\color{purple}{\text{Network Analysis of Senate Voting Patterns}}$

____

__By Alexander Ng & Philip Tanofsky__

**Data 620 Web Analytics**

**Submitted March 30, 2022**

____

We analyze United States Senate voting patterns in the 113th Congress from a social network perspective, covering two types of roll calls: Senate-initiated bills and presidential nominations.  Our goal is to find, visualize and interpret patterns of coordinated voting behavior.  Is voting always split along party lines, or do other factors emerge?

In Part I, we discuss Senate-initiated legislation and analyze the network patterns.  We cover the background of Senate voting, state the questions to be addressed, and describe the dataset, which is common to Parts I and II. We find that overall polarization does exist in the Senate of the 113th Congress based on the voting patterns on Senate bills. We locate smaller factions of Senators within each party with highly similar voting records, and we identify one Senator whose voting pattern on a subset of bills is more in line with the opposition party.

In Part II, we discuss Senate votes on nominations to high official positions in the Cabinet, judiciary and various agencies.  We use network projection methods to find clusters within both senators and nominations.   The results can be explained by variations in liberal-conservative ideology, duration of Senatorial service, branch of government and other factors.

Both Parts use a social network analysis approach to study this dataset.   By modeling Senate rollcall votes as a 2-mode network (a bipartite graph) of Senators and rollcalls, we can apply the arsenal of graph theory.

<img src="https://www.lifeandnews.com/articles/wp-content/uploads/senate-floor.jpg" width=800 height=500 />
____

# $\color{blue}{\text{Part One:  Senate Initiated Legislation}}$

## Background

US Senate voting patterns are well documented public data.   Our data source is the voteview website [www.voteview.com], which publishes Congressional voting data for both chambers from the establishment of the Republic to the current Congress.   Extensive prior work has been done to model voting patterns using statistical methods.   The most famous model is the DW-NOMINATE model developed by political scientists Keith Poole and Howard Rosenthal  https://en.wikipedia.org/wiki/NOMINATE_(scaling_method).  Poole and Rosenthal also co-created the voteview.com website.

The analysis attempts to answer three questions:

1. What is the level of polarization within the Senate of the 113th Congress?
2. Do smaller factions of Senators exist with similar voting patterns within each party?
3. Does any Senator appear to vote more in line with the opposition party?

___




## Data


We use data for the 113th Congress which convened from January 3, 2013 - January 3, 2015 during the fifth and sixth years of the Barack Obama presidential administration.   The 113th Congress refers to both chambers:  House of Representatives and the Senate.   We focus on the Senate which normally has 100 active voting members (2 per State).

The dataset consists of 3 files obtained on https://voteview.com/data

   * `S113_members.csv` contains the list of 105 members of the US Senate.  
   * `S113_rollcalls.json` contains the set of legislative roll calls including bills, procedural motions, nominations, cloture votes.  It contains metadata as well.
   * `S113_votes.csv` contains a dataframe of votes made by each Senator for each rollcall item from `S113_rollcalls.json` discussed earlier.  It also refers to the members by a foreign key in `S113_members.csv`.
   
The legislative rollcalls are identified within the scope of the dataset by a `bill code` and a `rollcall number`.  For example, the nomination of Jacob Lew as Secretary of the Treasury is identified by `bill code` `PN40` and by `rollcall number` `26`.   The `rollcall number` is unique to each item voted on, while the `bill code` is shared by the legislation or nomination and any procedural votes on it, such as cloture.  Each Senate member is uniquely assigned a 5 digit `icpsr` numeric code.  For example, Barbara Boxer has icpsr code 15011.

### Nuances of the Data

#### Changes of Membership

The dataset contains 105 senators because of changes in membership due to death or resignation.   While the Senate normally has 100 active voting members, during the 2 year period of the 113th Congress, changes arose.  These changes were:

*  Death of Frank Lautenberg, NJ - Democrat on June 3, 2013 - replaced by gubernatorial appointment of Jeffrey Chiesa, NJ - Republican on June 6, 2013.
*  Replacement of Jeffrey Chiesa, NJ - Republican on Oct 31, 2013 by Cory Booker, NJ - Democrat due to a special election following the death of Frank Lautenberg.
*  Replacement of John Kerry, MA - Democrat, who became US Secretary of State, on Feb 1, 2013 by Mo Cowan, MA - Democrat by gubernatorial appointment.
*  Replacement of Mo Cowan, MA - Democrat on July 16, 2013 by Ed Markey, MA - Democrat by special election.
*  Replacement of Max Baucus, MT - Democrat by John Walsh, MT - Democrat on February 11, 2014.

Source:  https://en.wikipedia.org/wiki/113th_United_States_Congress#Changes_in_membership

#### How Votes are Cast

Another nuance concerns the outcome of each Senator's vote.  It is not binary.   Formally, there are 10 different results that could be recorded for each vote.   For our purposes, we collapse them down to 3 outcomes:

*  `Yea` - a Yes vote in the affirmative on the motion, bill or nomination.
*  `Nay` - a No vote
*  `Present` - equivalent to choosing not to vote, which can happen for multiple reasons.  We also place absent votes in the `Present` category.  It favors neither Yea nor Nay.
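
The collapse into three outcomes can be sketched as a small lookup. This is a minimal sketch that mirrors the edge coloring used later in this notebook, where a `cast_code` of '1' is treated as Yea and '6' as Nay. Mapping every other code to `Present` is a simplifying assumption, and the helper name `collapse_cast_code` is hypothetical.

```python
def collapse_cast_code(cast_code):
    """Collapse a voteview cast_code (stored as a string) into one of three outcomes."""
    if cast_code == '1':   # Yea
        return 'Yea'
    if cast_code == '6':   # Nay
        return 'Nay'
    # Assumption: every remaining code (present, absent, not voting) counts as Present
    return 'Present'

# Example codes only; they are not taken from the dataset
outcomes = [collapse_cast_code(code) for code in ['1', '6', '7', '9']]
print(outcomes)
```

Keeping the codes as strings matches the `astype(str)` conversion applied to the votes dataframe in the Methods section.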


#### Data Quality

Due to the importance of this dataset, the quality of rollcall voting data is very high.   Highly accurate recordkeeping is essential for both the passage of legislation and the operation of government.
While some data transformations were made for this analysis, no data corrections were required to our knowledge.


_____


## Methods

All data is retrieved from the site voteview.com in JSON format.

### Senator Votes
Read in the individual Senator votes by rollnumber.
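- congress: Congress number; a value of 113 confirms we have the correct dataset
- icpsr: unique identifier for each Senator
- cast_code: numeric code for the vote cast (in this analysis, 1 is 'Yea' and 6 is 'Nay'; all other codes are treated as neither)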
- rollnumber: vote for the Senate of that Congress. Number increments with each vote.



In [None]:
import pandas as pd

# https://voteview.com/static/data/out/votes/S113_votes.json
s113_votes_raw_df = pd.read_json('S113_votes.json')

In [None]:
s113_votes_raw_df.head()

Next, we trim the dataframe to only contain $icpsr$, $cast\_code$, and $rollnumber$. The JSON from voteview.com contains entries for Barack Obama on the Senate rollnumbers. We remove them by filtering out $icpsr$ "99911", as he was President and not a Senator at the time.

In [None]:
# Select only the columns I want to use later
s113_votes_sel_df = s113_votes_raw_df[['icpsr', 'cast_code', 'rollnumber']]
# Convert all to string
s113_votes_sel_df = s113_votes_sel_df.astype(str)
# Remove Barack Obama from the list (icpsr==99911)
s113_votes_sel_df = s113_votes_sel_df.drop(s113_votes_sel_df[s113_votes_sel_df.icpsr == '99911'].index)
s113_votes_sel_df = s113_votes_sel_df.reset_index(drop=True)
s113_votes_sel_df.head()

We retrieve all the unique IDs from the $icpsr$ column, which is the unique identifier of each Senator. We confirm the overall Senator count is 105. The list of Senator IDs will be needed later to define the Senator graph nodes.

In [None]:
senator_ids = s113_votes_sel_df['icpsr'].unique()

In [None]:
senator_ids

In [None]:
len(senator_ids) # 105

### Senator Metadata

Read in the Senate members JSON object which contains the metadata for each Senator including $bioname$ and $party\_code$.

In [None]:
# https://voteview.com/static/data/out/members/S113_members.json
s113_members_raw_df = pd.read_json('S113_members.json')

In [None]:
s113_members_raw_df.head()

We trim the dataframe to contain only the necessary data.

- icpsr: Unique identifier
- state_abbrev: State the Senator represents
- party_code: Political party of Senator
- bioname: Full name of Senator
- nominate_dim1: NOMINATE statistic defining the Senator's ideology on a scale of -1 to 1, with -1 being most liberal and 1 being most conservative.

Again, we remove Barack Obama ("99911").

In [None]:
# icpsr, state_abbrev, party_code, bioname
s113_members_sel_df = s113_members_raw_df[['icpsr','state_abbrev','party_code','bioname','nominate_dim1']]
s113_members_sel_df = s113_members_sel_df.astype(str)
s113_members_sel_df[['nominate_dim1']] = s113_members_sel_df[['nominate_dim1']].astype(float) 
# Remove Barack Obama
s113_members_sel_df = s113_members_sel_df.drop(s113_members_sel_df[s113_members_sel_df.icpsr == '99911'].index)
s113_members_sel_df = s113_members_sel_df.reset_index(drop=True)
s113_members_sel_df.head()

Confirm Senator count is 105 to match previous input file.

In [None]:
len(s113_members_sel_df) # 105

In [None]:
s113_members_sel_df.head()
# icpsr, state_abbrev, party_code, bioname

In [None]:
# Get unique party codes
party_codes = s113_members_sel_df['party_code'].unique()
party_codes

The $party\_code$ values are:

- 100: Democrat
- 200: Republican
- 328: Independent (for the 113th Congress, both independents align ideologically with Democrats)

### Rollcall Metadata
Read in rollcall votes JSON which contains the metadata for each rollnumber vote. Trim dataframe to columns:
- date: Date of vote
- rollnumber: Numerical identifier of vote for the Senate chamber of the 113th Congress
- bill_number: Formal bill identifier
- vote_result: Vote outcome
- vote_desc: Vote description
- clausen_codes: Clausen category codes describing the content of the vote

In [None]:
# https://voteview.com/static/data/out/rollcalls/S113_rollcalls.json
s113_rollcalls_raw_df = pd.read_json('S113_rollcalls.json')

In [None]:
# 657 rows, one per rollcall vote
len(s113_rollcalls_raw_df)
# .copy() avoids a SettingWithCopyWarning when columns are reassigned below
s113_rollcalls_sel_df = s113_rollcalls_raw_df[['date', 'rollnumber', 'bill_number', 'vote_result', 'vote_desc', 'clausen_codes']].copy()
s113_rollcalls_sel_df['rollnumber'] = s113_rollcalls_sel_df['rollnumber'].astype(str)
s113_rollcalls_sel_df['date'] = s113_rollcalls_sel_df['date'].astype(str)
s113_rollcalls_sel_df.head()
# date, rollnumber, bill_number, vote_result, vote_desc, clausen_codes

Convert Clausen codes list to dummy variables and append to dataframe

In [None]:
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
claus_codes = pd.get_dummies(s113_rollcalls_sel_df['clausen_codes'].explode()).groupby(level=0).sum()
claus_codes.columns = ['cc_agr','cc_civ_lib','cc_for_def','cc_gov_mgmt','cc_misc','cc_soc_wel']
claus_codes

In [None]:
s113_rollcalls_sel_df = s113_rollcalls_sel_df.reset_index(drop=True).join(claus_codes)
s113_rollcalls_sel_df = s113_rollcalls_sel_df.drop(columns=['clausen_codes'])
s113_rollcalls_sel_df.head()
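
To illustrate the dummy-variable conversion above, here is a minimal sketch on a toy dataframe. The three rows and their category lists are made up for illustration; only the `explode` / `get_dummies` / `groupby(level=0).sum()` pattern corresponds to the cell above.

```python
import pandas as pd

# Toy rollcalls: each row carries a list of Clausen categories (values are illustrative)
toy_rollcalls = pd.DataFrame({
    'rollnumber': ['1', '2', '3'],
    'clausen_codes': [
        ['Agriculture'],
        ['Social Welfare', 'Government Management'],
        ['Agriculture'],
    ],
})

# explode() gives one row per (rollcall, category) pair and repeats the index;
# get_dummies() one-hot encodes the category; grouping on the index collapses
# the pairs back to one row per rollcall.
toy_dummies = pd.get_dummies(toy_rollcalls['clausen_codes'].explode()).groupby(level=0).sum()
print(toy_dummies)
```

A rollcall tagged with two categories, like the second row here, ends up with a 1 in both columns.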

### Clausen Categories

The Clausen codes broadly define each bill according to the categories defined below.

- Government Management: Environmental control; government regulation of business; natural resource management; government ownership of business; government control of the economy; budget balancing; tax policy; interest rates; management of the bureaucracy; etc.
- Social Welfare: Social security; public housing; urban renewal; labor regulation; education; urban affairs; employment opportunities and rewards; welfare; medicare; unemployment; minimum wage; legal services; immigration, etc.
- Agriculture: Price supports and subsidies; commodity control; acreage limitations; etc.
- Civil Liberties: Civil rights; equality; criminal procedure; privacy; guarantees of the Bill of Rights; slavery; Hatch Act; etc.
- Foreign and Defense Policy: International policy; foreign aid; aid to international organizations; armament policy; defense procurement; international trade; military pensions; etc.
- Miscellaneous Policy: Unclassifiable or unidentifiable votes; all votes concerned with internal organization of Congress; procedural motions.

From: https://voteview.com/articles/issue_codes

### Senate Bills
We focus only on the Senate bills, so we keep the votes whose $bill\_number$ matches the pattern 'S' followed by a number.

In [None]:
bill_numbers = s113_rollcalls_sel_df['bill_number']
bill_numbers = list(bill_numbers)
bill_numbers = [str(x) for x in bill_numbers]
#bill_numbers

In [None]:
# Regex to get those starting with S and a Number
import re
regex_sen_bill = re.compile(r'^S\d')
senate_bills = list(filter(regex_sen_bill.match, bill_numbers)) # Read Note below
#(senate_bills)
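
As a quick check of what the pattern keeps and drops, the sketch below applies the same regular expression to a handful of sample identifiers. Apart from `PN40` and `S1003`, which appear elsewhere in this notebook, the sample values are hypothetical.

```python
import re

regex_sen_bill = re.compile(r'^S\d')

# Sample bill codes: Senate bills, resolutions, a House bill and a nomination
sample_bill_numbers = ['S47', 'SRES15', 'SJRES12', 'HR325', 'PN40', 'S1003']

# Only codes where 'S' is immediately followed by a digit are kept
sample_senate_bills = [b for b in sample_bill_numbers if regex_sen_bill.match(b)]
print(sample_senate_bills)
```

Senate resolutions begin with 'S' as well, but the required digit in second position excludes them.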

Create dictionary of key $bill\_number$ and value $rollnumber$.

In [None]:
senate_rollcall_dict = dict(zip(s113_rollcalls_sel_df.bill_number, s113_rollcalls_sel_df.rollnumber))
#senate_rollcall_dict

Create dictionary of $bill\_number$ and $vote\_result$.

In [None]:
senate_vote_result_dict = dict(zip(s113_rollcalls_raw_df.bill_number, s113_rollcalls_raw_df.vote_result))

Create dictionary of key $bill\_number$ and $vote\_result$ for just the Senate bills.

In [None]:
senate_bills_result_dict = dict((k, senate_vote_result_dict[k]) for k in senate_bills if k in senate_vote_result_dict)
#senate_bills_result_dict
# dict() keeps the last value seen for a repeated key, so each bill maps to its
# last listed rollcall (the final vote, assuming the rollcalls are in chronological order)
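
The last-entry behavior relied on here is standard for Python dictionaries and can be seen in a short sketch. The bill and rollcall values below are made up for illustration.

```python
# Two rollcalls on the same bill, listed in chronological order (illustrative values)
toy_bill_numbers = ['S47', 'S47', 'S649']
toy_rollnumbers = ['10', '19', '97']

# When a key repeats, the later value overwrites the earlier one
toy_lookup = dict(zip(toy_bill_numbers, toy_rollnumbers))
print(toy_lookup)
```

Because only one rollcall per bill survives, each Senate bill is represented in the graph by a single vote.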

Create a dictionary of key $bill\_number$ and $rollnumber$ for just the Senate bills.

In [None]:
senate_bills_dict = dict((k, senate_rollcall_dict[k]) for k in senate_bills if k in senate_rollcall_dict)
#senate_bills_dict

Convert dictionary into array of $rollnumber$ for Senate Bills

In [None]:
rollnumber_list = senate_bills_dict.values()
rollnumber_list = list(rollnumber_list)
#rollnumber_list

Trim dataframe to just Senate bills with metadata based on list of $rollnumber$ for Senate bills

In [None]:
# .copy() so that the 'passed' column can be added later without a SettingWithCopyWarning
senate_bills_df = s113_rollcalls_sel_df[s113_rollcalls_sel_df['rollnumber'].isin(rollnumber_list)].copy()
senate_bills_df.head()

Check language of all possible $vote\_result$ values to be able to identify affirmation or rejection.

In [None]:
s113_rollcalls_sel_df['vote_result'].unique()

Votes resulting in affirmation will contain one of the following:
- 'Agreed'
- 'Passed'
- 'Confirmed'
- 'Decision of Chair Sustained'

Votes resulting in rejection will contain one of the following:
- 'Rejected'
- 'Failed'
- 'Defeated'
- 'Not Well Taken'
- 'Decision of Chair Not Sustained'

Create Boolean column $passed$ based on the $vote\_result$ column values.

In [None]:
passed_list = ['Agreed','Passed','Confirmed', 'Decision of Chair Sustained']
def bill_passed(row):
    result = 0
    if any(word in row['vote_result'] for word in passed_list):
        result=1
    return result
senate_bills_df['passed'] = senate_bills_df.apply(bill_passed, axis=1)
senate_bills_df.head()
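
The same flag can also be computed without a row-wise `apply`. The following is a sketch of a vectorized alternative using `str.contains`, not the method used in this notebook; the toy `vote_result` strings are illustrative.

```python
import re
import pandas as pd

passed_list = ['Agreed', 'Passed', 'Confirmed', 'Decision of Chair Sustained']

# Illustrative vote_result strings
toy_results_df = pd.DataFrame({
    'vote_result': ['Bill Passed', 'Cloture Motion Rejected',
                    'Amendment Agreed to', 'Motion to Table Failed'],
})

# Build one regular expression that matches any of the affirmative phrases
pattern = '|'.join(re.escape(word) for word in passed_list)
toy_results_df['passed'] = toy_results_df['vote_result'].str.contains(pattern).astype(int)
print(toy_results_df)
```

Both versions rely on substring matching, so they agree as long as no rejection phrase contains one of the affirmative phrases.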

Trim the dataframe of individual votes cast to just the Senate bills by $rollnumber$.

In [None]:
senate_bill_votes_df = s113_votes_sel_df[s113_votes_sel_df['rollnumber'].isin(rollnumber_list)]
senate_bill_votes_df

Senator 14920 does not appear in the above dataframe because this Senator never voted on a Senate bill of the 113th Congress. Senator 14920 is John Kerry, who became Secretary of State early in 2013 at the start of Obama's second Presidential term.

### Bipartite Graph Construction

Create a bipartite graph of Senator nodes and Senate bill nodes.

First, create Senator nodes with $icpsr$ as the key value, bipartite value as '0' along with name, party, state, and NOMINATE dimension.

In [None]:
import networkx as nx

G = nx.Graph()

for index, sen in s113_members_sel_df.iterrows():
    G.add_node(sen.icpsr, bipartite=0, 
               name=sen.bioname,
               party=sen.party_code,
               state=sen.state_abbrev,
               nom_dim=sen.nominate_dim1)
    
len(G.nodes())

Create the Senate bill nodes with rollnumber as the key value, bipartite value as '1' along with bill date, formal bill number, passed result, description, and dummy variables for the Clausen codes.

In [None]:
for index, bill in senate_bills_df.iterrows():
    G.add_node(bill.rollnumber, bipartite=1,
               date=bill.date,
               bill_number=bill.bill_number,
               passed=bill.passed,
               desc=bill.vote_desc,
               cc_agr=bill.cc_agr,
               cc_civ_lib=bill.cc_civ_lib, 
               cc_for_def=bill.cc_for_def,
               cc_gov_mgmt=bill.cc_gov_mgmt,
               cc_misc=bill.cc_misc,
               cc_soc_wel=bill.cc_soc_wel)
len(G.nodes())

Separate the nodes into two unique sets: Senators and Senate bill votes.

In [None]:
senator_nodes = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
vote_nodes = set(G) - senator_nodes

In [None]:
len(senator_nodes)

In [None]:
len(vote_nodes)

Create the edges for each $cast\_code$ between a Senator and a bill.

In [None]:
for index, row in senate_bill_votes_df.iterrows():
    G.add_edge(row.icpsr, row.rollnumber, vote=row.cast_code)

In [None]:
(len(G.nodes()), len(G.edges()))

Check if the graph is connected.

In [None]:
nx.is_connected(G)

The graph is not connected because $icpsr$ of 14920 (John Kerry) did not vote on any Senate bills as noted above.

Retrieve the largest connected component subgraph, which contains the other 104 senators and the 36 Senate bills.

In [None]:
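# connected_component_subgraphs() was removed in NetworkX 2.4, so select the
# largest connected component explicitly
largest_cc = max(nx.connected_components(G), key=len)
g_s113 = G.subgraph(largest_cc).copy()
len(g_s113.nodes())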
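
To show why a Senator with no votes falls outside the main component, here is a minimal sketch on a toy graph. The node names are hypothetical; `nx.connected_components` and `Graph.subgraph` are the standard NetworkX calls.

```python
import networkx as nx

toy_graph = nx.Graph()
toy_graph.add_nodes_from(['sen_a', 'sen_b', 'sen_c'], bipartite=0)
toy_graph.add_nodes_from(['bill_1'], bipartite=1)

# sen_a and sen_b vote on bill_1; sen_c never votes and stays isolated
toy_graph.add_edge('sen_a', 'bill_1', vote='1')
toy_graph.add_edge('sen_b', 'bill_1', vote='6')

# Pick the component with the most nodes and copy it into a standalone graph
largest_nodes = max(nx.connected_components(toy_graph), key=len)
toy_main = toy_graph.subgraph(largest_nodes).copy()
print(sorted(toy_main.nodes()))
```

Selecting by size rather than by list position avoids depending on the order in which components are returned.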

Define a function to create a label map for the Networkx graph object. The name is used to label the Senator nodes, and the bill number is used to label the bill nodes.

In [None]:
def generateLabelMapForNodes(graph):
    # Senators are labeled by name, bills by bill number.
    # Iterating over (node, data) pairs works on current NetworkX versions,
    # where graph.nodes() can no longer be indexed by position.
    return { n : (d['name'] if d['bipartite']==0 else d['bill_number'])
             for n, d in graph.nodes(data=True) }

label_map = generateLabelMapForNodes(g_s113)
len(label_map)

Define a function to create a color map for the nodes of the Networkx graph object. Nodes are colored according to the following rules.

- Democrats and Independents: Blue
- Republicans: Red
- Bills with final vote in the affirmative: Green
- Bills with final vote in rejection: Orange

In [None]:
# Assign node colors

def generateColorMapForNodes(graph):
    my_node_color = [ ]

    for node in graph.nodes(data=True):
        if node[1]['bipartite'] == 0: # Senator
            if node[1]['party'] == '100' or node[1]['party'] == '328': # Democrat or Independent
                my_node_color.append('blue')
            else: # Republican
                my_node_color.append('red')
        else: # Bill
            if node[1]['passed'] == 1: # Passed
                my_node_color.append('green')
            else: # Not Passed
                my_node_color.append('orange')
    return my_node_color

my_node_color = generateColorMapForNodes(g_s113)

Define function to create a color map for the edges of the Networkx graph object. Vote edges are colored according to the following rules.

- 'Yea': Green
- 'Nay': Orange
- Other vote: Gray

In [None]:
# Assign edge colors based on vote attribute.

def generateColorMapForEdges(graph):
    my_edge_color = [ ]

    for ed in graph.edges(data=True):
        if ed[2]['vote'] == '1': # Yea
            my_edge_color.append('green')
        elif ed[2]['vote'] == '6': # Nay
            my_edge_color.append('orange')
        else: # Any other vote
            my_edge_color.append('gray')
    return my_edge_color

my_edge_color = generateColorMapForEdges(g_s113)

Separate the nodes into three sets: Republican nodes, Democrat nodes, and Senate bill nodes in order to properly position the nodes for display of the bipartite graph in three columns.

In [None]:
# Assign the position

# Separate into 3 sets
senator_nodes = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 0}
vote_nodes = set(g_s113) - senator_nodes
rep_nodes = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 0 and d["party"] == '200'}
dem_nodes = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 0 and (d["party"] == '100' or d["party"] == '328')}

Assign the x-coordinate of each node's position based on the set it belongs to.

In [None]:
# Separate by group
pos = {}

# Update position for node from each group
pos.update((node, (1, index)) for index, node in enumerate(dem_nodes))
pos.update((node, (2, index)) for index, node in enumerate(vote_nodes))
pos.update((node, (3, index)) for index, node in enumerate(rep_nodes))

As an alternative layout, assign the x-coordinate of each Senator node based on its NOMINATE dimension.

In [None]:
# Separate by group
pos_nom_dim = {}

# Update position for node from each group
pos_nom_dim.update((node, (s113_members_sel_df.loc[s113_members_sel_df['icpsr'] == node, 'nominate_dim1'].item(), index)) for index, node in enumerate(dem_nodes))
pos_nom_dim.update((node, (0, index+11)) for index, node in enumerate(vote_nodes))
pos_nom_dim.update((node, (s113_members_sel_df.loc[s113_members_sel_df['icpsr'] == node, 'nominate_dim1'].item(), index+6)) for index, node in enumerate(rep_nodes))

Confirm the graph is connected.

In [None]:
# The largest connected component is connected by construction
nx.is_connected(g_s113)

### Bipartite Graph: All Senators and Bills
Plot the graph with all nodes and all edges, with the nodes arranged in three columns by group.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113, 
                 node_color = my_node_color,
                 edge_color = my_edge_color,
                 labels = label_map,
                 pos = pos)

The above plot shows the Democrats on the left, Republicans on the right, and Senate bills in the middle.
- Final bill votes in the affirmative are colored green
- Final bill votes in rejection are colored orange
- Edges represent the votes by the Senators

As expected for a Senate with a Democratic majority, the bills that come to a vote (out of committee) are likely to be Democrat-friendly, and thus most of the voting reflects the partisan divide, with green 'Yea' votes by Democrats and orange 'Nay' votes by Republicans.

Write graph object to .graphml for inspection with Gephi.

In [None]:
graphml_file = "S113_voting.graphml"
nx.write_graphml( g_s113, graphml_file )

Next, we remove all 'Nay' votes to see if a clearer pattern emerges with just the 'Yea' votes.

In [None]:
dict_edge = nx.get_edge_attributes(g_s113, "vote")
dict_edge_remove = {key:val for key, val in dict_edge.items() if val == '6'}

In [None]:
# Copy the graph so that removing edges does not also modify g_s113
g_s113_yes_edges = g_s113.copy()
g_s113_yes_edges.remove_edges_from(list(dict_edge_remove))

In [None]:
# Update map of edges
edge_map = generateColorMapForEdges(g_s113_yes_edges)

### Bipartite Graph: Only Yea Votes

In [None]:
plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113_yes_edges, 
                 node_color = my_node_color,
                 edge_color = edge_map,
                 labels = label_map,
                 pos = pos)

By removing the 'Nay' votes, we do see some bills with predominantly 'Yea' votes by Republicans. The left side of the plot does indicate a much higher volume of 'Yea' votes by Democrats.

### Bipartite Graph: Only Nay Votes
Now, let's invert the above and display only the 'Nay' votes.

In [None]:
dict_edge = nx.get_edge_attributes(g_s113, "vote")
dict_edge_remove = {key:val for key, val in dict_edge.items() if val == '1'}

In [None]:
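# Reset g_s113 graph to the largest connected component of G
g_s113 = G.subgraph(max(nx.connected_components(G), key=len)).copy()
# Copy so that removing edges does not also modify g_s113
g_s113_no_edges = g_s113.copy()
g_s113_no_edges.remove_edges_from(list(dict_edge_remove))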

In [None]:
# Update map of edges
edge_map = generateColorMapForEdges(g_s113_no_edges)

In [None]:
plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113_no_edges, 
                 node_color = my_node_color,
                 edge_color = edge_map,
                 labels = label_map,
                 pos = pos)

We see that the majority of 'Nay' votes are cast by Republicans, while three final bill votes (S16, S2280, S1003) receive a high number of 'Nay' votes from Democrats. The voting falls predominantly along party lines, consistent with a Democratic majority, but some bill votes are exceptions to that pattern.

### Bipartite Graph: Bills with Affirmative Final Vote

For the 113th Congress, only 10 Senate bills were passed by the Senate. A final affirmative vote does not by itself indicate that the bill was passed and signed into law.

Now, let's show just the bills with final votes in the affirmative.

In [None]:
# Reset g_s113 graph to the largest connected component of G
g_s113 = G.subgraph(max(nx.connected_components(G), key=len)).copy()
# Get all the bill nodes with passed == 0
bills_not_passed_nodes = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 1 and d['passed'] == 0}
# Work on a copy so g_s113 itself is left unchanged
g_s113_passed_bills = g_s113.copy()
# Removing a node also removes every edge attached to it
g_s113_passed_bills.remove_nodes_from(bills_not_passed_nodes)
# Color and label maps
node_map = generateColorMapForNodes(g_s113_passed_bills)
edge_map = generateColorMapForEdges(g_s113_passed_bills)
label_map = generateLabelMapForNodes(g_s113_passed_bills)

In [None]:
# Plot
plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113_passed_bills, 
                 node_color = node_map,
                 edge_color = edge_map,
                 labels = label_map,
                 pos = pos)

Of the 36 Senate bills in the 113th Congress, only 16 bills received an affirmative final vote. For those with a final affirmative vote, each one garnered high Democratic support.

### Bipartite Graph: Bills with Negative Final Vote

Now we show only the bills that received a negative final vote.

In [None]:
# Reset g_s113 graph to the largest connected component of G
g_s113 = G.subgraph(max(nx.connected_components(G), key=len)).copy()
# Get all the bill nodes with passed == 1
bills_passed_nodes = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 1 and d['passed'] == 1}
# Work on a copy so g_s113 itself is left unchanged
g_s113_not_passed_bills = g_s113.copy()
# Removing a node also removes every edge attached to it
g_s113_not_passed_bills.remove_nodes_from(bills_passed_nodes)
# Color and label maps
node_map = generateColorMapForNodes(g_s113_not_passed_bills)
edge_map = generateColorMapForEdges(g_s113_not_passed_bills)
label_map = generateLabelMapForNodes(g_s113_not_passed_bills)

In [None]:
# Plot
plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113_not_passed_bills, 
                 node_color = node_map,
                 edge_color = edge_map,
                 labels = label_map,
                 pos = pos)

Of the 36 Senate bills, 20 bills received a negative final bill vote. Again, we note that three bills indicate an inverse of expectations: high Republican support and low Democratic support (S16, S1003, S2280).

### Clustering Method on Foreign and Defense Policy Bills

To focus the voting analysis even further, we've chosen to separate the Senate bills by Clausen category and then apply the island method on the bills categorized as foreign and defense policy. We wanted to assess the level of polarization on bills often receiving a high level of bipartisan support.

The plot below shows all the votes for just the bills categorized as foreign and defense policy. Of the eight bills, only half receive a final affirmative vote. Of the 10 Senate bills passed during the 113th Congress, only one is categorized as foreign and defense policy (S1917). Three bills (S1917, S25, S1963) appear to receive near unanimous approval.

In [None]:
# Reset g_s113 graph to the largest connected component of G
g_s113 = G.subgraph(max(nx.connected_components(G), key=len)).copy()
# Get all the bill nodes with cc_for_def == 0; removing them leaves only the Foreign and Defense Policy bills
nodes_to_remove = {n for n, d in g_s113.nodes(data=True) if d["bipartite"] == 1 and d['cc_for_def'] == 0}
# Work on a copy so g_s113 itself is left unchanged
g_s113_bills_cc_agr = g_s113.copy()
# Removing a node also removes every edge attached to it
g_s113_bills_cc_agr.remove_nodes_from(nodes_to_remove)
# EdgeColorMap
node_map = generateColorMapForNodes(g_s113_bills_cc_agr)
edge_map = generateColorMapForEdges(g_s113_bills_cc_agr)
label_map = generateLabelMapForNodes(g_s113_bills_cc_agr)

# Plot
plt.figure(figsize = (16, 40))
plt.tight_layout()
plt.axis("off")
nx.draw_networkx(g_s113_bills_cc_agr, 
                 node_color = node_map,
                 edge_color = edge_map,
                 labels = label_map,
                 pos = pos)

### Graph Construction

In order to evaluate the Senator voting patterns and level of polarization based on the foreign and defense policy Senate bills, we separate each bill vote node into two nodes, one node for 'Yea' votes and another for 'Nay' votes. By separating the positive and negative votes, we can assess which Senators share a higher number of common neighbors. Without the split, nearly all Senators would share the same neighbors, because almost every Senator votes on almost every Senate bill.
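
The effect of splitting each bill into two nodes can be sketched on a toy graph. This is an illustrative example under assumed node names, using NetworkX's `bipartite.weighted_projected_graph`, whose edge weights count shared neighbors; it is not necessarily the exact procedure applied later in this notebook.

```python
import networkx as nx
from networkx.algorithms import bipartite

toy_split = nx.Graph()
toy_senators = ['sen_a', 'sen_b', 'sen_c']
toy_split.add_nodes_from(toy_senators, bipartite=0)
# Each bill becomes two nodes: '-P' for Yea votes and '-NP' for Nay votes
toy_split.add_nodes_from(['1-P', '1-NP', '2-P', '2-NP'], bipartite=1)

# sen_a and sen_b vote Yea on both bills; sen_c votes Nay on both
toy_split.add_edges_from([
    ('sen_a', '1-P'), ('sen_a', '2-P'),
    ('sen_b', '1-P'), ('sen_b', '2-P'),
    ('sen_c', '1-NP'), ('sen_c', '2-NP'),
])

# Project onto the Senators: the edge weight is the number of shared vote nodes
toy_shared = bipartite.weighted_projected_graph(toy_split, toy_senators)
print(list(toy_shared.edges(data=True)))
```

With unsplit bill nodes, all three Senators would share both bills as neighbors; after the split, only the two Senators who voted the same way remain linked.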

First, add the nodes for each Senator to the graph object. As expected, the result is 105 nodes, one for each Senator.

In [None]:
G_bills_split = nx.Graph()

for index, sen in s113_members_sel_df.iterrows():
    G_bills_split.add_node(sen.icpsr, bipartite=0, 
                           name=sen.bioname,
                           party=sen.party_code,
                           state=sen.state_abbrev,
                           nom_dim=sen.nominate_dim1)
    
len(G_bills_split.nodes())

Next, we add two nodes for each foreign and defense policy bill, appending the suffix '-P' or '-NP' to the $rollnumber$ value to indicate a positive or not positive vote. This approach results in the addition of 16 nodes, two each for the 8 bills.

In [None]:
# Loop through the bills and only add defense bills twice
for index, bill in senate_bills_df.iterrows():
    if bill.cc_for_def == 1:
        node_id = bill.rollnumber + '-P'
        G_bills_split.add_node(node_id, bipartite=1,
                   date=bill.date,
                   bill_number=bill.bill_number,
                   #passed=bill.passed,
                   desc=bill.vote_desc,
                   passed=1)
        node_id = bill.rollnumber + '-NP'
        G_bills_split.add_node(node_id, bipartite=1,
                   date=bill.date,
                   bill_number=bill.bill_number,
                   #passed=bill.passed,
                   desc=bill.vote_desc,
                   passed=0)
len(G_bills_split.nodes())

Now, we add the appropriate edge for each Senator vote to the positive or not positive node of each bill.

In [None]:
for index, row in senate_bill_votes_df.iterrows():
    # Only consider the def bill roll numbers
    if row.rollnumber in [ '245','317','326','337','350','353','370','573']:
        if row.cast_code == '1':
            G_bills_split.add_edge(row.icpsr, row.rollnumber+'-P')
        elif row.cast_code == '6':
            G_bills_split.add_edge(row.icpsr, row.rollnumber+'-NP')

In [None]:
#(len(G_bills_split.nodes()), len(G_bills_split.edges()))

The resulting graph object is not connected. Six nodes are not connected because two bills received only positive votes and four Senators did not vote on any of the foreign and defense policy bills.

In [None]:
nx.is_connected(G_bills_split)

In [None]:
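# connected_component_subgraphs was removed in networkx 2.4; build the subgraphs explicitly, largest first
graphs = [G_bills_split.subgraph(c).copy() for c in sorted(nx.connected_components(G_bills_split), key=len, reverse=True)]
g_s113_bill_split = graphs[0]
len(g_s113_bill_split.nodes()) #121 to 115, 6 were removed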

As in the section above, we generate the label and color of the nodes.

In [None]:
my_labels = [ v[1]['name'] if v[1]['bipartite']==0 else v[1]['bill_number'] for v in g_s113_bill_split.nodes(data=True)]
#len(my_labels)
# Pair each node with its label; zip avoids positional indexing into nodes(), which newer networkx versions do not support
label_map = dict(zip(g_s113_bill_split.nodes(), my_labels))

In [None]:
# Assign node colors

my_node_color = [ ]

for node in g_s113_bill_split.nodes(data=True):
    if node[1]['bipartite'] == 0: # Senator
        if node[1]['party'] == '100' or node[1]['party'] == '328': # Democrat (100) or Independent (328)
            my_node_color.append('blue')
        else: # Republican
            my_node_color.append('red')
    else: # Bill
        if node[1]['passed'] == 1: # Passed
            my_node_color.append('green')
        else: # Not Passed
            my_node_color.append('orange')

In [None]:
# Assign the position

# Separate into 3 sets
senator_nodes = {n for n, d in g_s113_bill_split.nodes(data=True) if d["bipartite"] == 0}
vote_nodes = set(g_s113_bill_split) - senator_nodes
rep_nodes = {n for n, d in g_s113_bill_split.nodes(data=True) if d["bipartite"] == 0 and d["party"] == '200'}
dem_nodes = {n for n, d in g_s113_bill_split.nodes(data=True) if d["bipartite"] == 0 and (d["party"] == '100' or d["party"] == '328')}

In [None]:
# Separate by group
pos = {}

# Update position for node from each group
pos.update((node, (1, index)) for index, node in enumerate(dem_nodes))
pos.update((node, (2, index)) for index, node in enumerate(vote_nodes))
pos.update((node, (3, index)) for index, node in enumerate(rep_nodes))

In [None]:
# Separate by group
pos_nom_dim = {}

# Update position for node from each group
pos_nom_dim.update((node, (s113_members_sel_df.loc[s113_members_sel_df['icpsr'] == node, 'nominate_dim1'].item(), index)) for index, node in enumerate(dem_nodes))
pos_nom_dim.update((node, (0, index+11)) for index, node in enumerate(vote_nodes))
pos_nom_dim.update((node, (s113_members_sel_df.loc[s113_members_sel_df['icpsr'] == node, 'nominate_dim1'].item(), index+6)) for index, node in enumerate(rep_nodes))

In [None]:
# graph[0] has all the nodes and edges
# graph[1] contains just a single node 14920, who did not vote on anything
nx.is_connected(g_s113_bill_split)

In [None]:
# Set to True for plot
create_plot = False
if create_plot:
    plt.figure(figsize = (16, 40))
    plt.tight_layout()
    plt.axis("off")
    nx.draw_networkx(g_s113_bill_split, 
                     node_color = my_node_color,
                     #edge_color = my_edge_color,
                     labels = label_map,
                     pos = pos)

### Weighted Projected Graph

Following the textbook approach, we apply the weighted projected graph to the Senator nodes of the bipartite graph. The weighted degree of each Senator node in the projected graph is the total number of shared contacts, that is, vote nodes shared with other Senators. The degrees of the nodes and the weights of the edges provide an additional level of information within the whole group.
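
As a minimal sketch of what the projection computes, the toy graph below uses three hypothetical senators and two hypothetical roll calls that have already been split into `-P` and `-NP` nodes. The edge weight between two senators is the number of vote nodes they share, and the weighted degree is the sum of those weights.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy bipartite graph (names are illustrative only)
toy_B = nx.Graph()
toy_senators = ["S1", "S2", "S3"]
toy_B.add_nodes_from(toy_senators, bipartite=0)
toy_B.add_nodes_from(["v1-P", "v1-NP", "v2-P", "v2-NP"], bipartite=1)
toy_B.add_edges_from([
    ("S1", "v1-P"), ("S1", "v2-P"),
    ("S2", "v1-P"), ("S2", "v2-P"),
    ("S3", "v1-NP"), ("S3", "v2-P"),
])

# Project onto the senators; weight = number of shared vote nodes
toy_P = bipartite.weighted_projected_graph(toy_B, toy_senators)

for u, v, data in toy_P.edges(data=True):
    print(u, v, data["weight"])
print("weighted degree of S1:", toy_P.degree("S1", weight="weight"))
```

Here `S1` and `S2` agree on both roll calls, so their edge has weight 2, while `S3` agrees with each of them only on `v2`.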

In [None]:
# apply island method to just this graph
import math
import seaborn as sns

# Reset figure size for plots
#plt.rcParams.update(plt.rcParamsDefault)

In [None]:
# project bipartite graph onto Senator nodes
W = bipartite.weighted_projected_graph(g_s113_bill_split, senator_nodes, ratio=False)

In [None]:
weights=[math.log(edata['weight']) for f,t,edata in W.edges(data=True)]

The plot based on the weighted projected graph of Senator nodes shows the connections between the individuals with the thicker, lighter colored edges indicating a greater strength in the relationship. The yellow lines below indicate the higher number of shared votes. Also note, we take the logarithm of the weight values to decrease the range of values.

In [None]:
nx.draw_networkx(W, width=weights, edge_color=weights)

In [None]:
senators_wgt_dict = {}
for s in senator_nodes:
        senators_wgt_dict[s] = (W.degree(s,weight='weight'))
dt={k: v for k, v in sorted(senators_wgt_dict.items(), key=lambda item: item[1], reverse=True)}
df=pd.DataFrame.from_dict(dt, orient='index').reset_index()
df.columns = ['Senator','Degree']
#df

In order to prune the plot of the weighted projected graph, we need to select an edge weight threshold that keeps only the stronger relationships. The histogram below indicates that most log edge weights are less than 2.0. We will start with a threshold of 2.0 to focus the initial graph.

In [None]:
# Need to create histogram of the weights
plt.rcParams.update(plt.rcParamsDefault)
sns.histplot(data=weights).set(title='Network of Senators: Edge Weight Histogram', xlabel='Log of Edge Weight', ylabel='Count');

Following the textbook example, we create a function to trim the graph edges based on a weight threshold.

In [None]:
# From textbook
def trim_edges(g, weight=1):
    # Copy all nodes, then keep only the edges whose weight exceeds the threshold
    g2 = nx.Graph()
    g2.add_nodes_from(g.nodes(data=True))
    for f, to, edata in g.edges(data=True):
        if edata['weight'] > weight:
            # Pass the attributes as keywords; this form works across networkx versions
            g2.add_edge(f, to, **edata)
    return g2

In [None]:
## The weights histogram is logarithmic;
## we should compute the original weight = e^log_weight
Wnet_trim=trim_edges(W, weight=math.exp(2.0))
# Remove node isolates
Wnet_trim.remove_nodes_from(list(nx.isolates(Wnet_trim)))
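
Because the thresholds are chosen on the log scale, it helps to translate them back into raw edge weights. The sketch below does that arithmetic for the three log thresholds used in this section. It assumes the raw weights are integer counts of shared vote nodes and that an edge is kept only when its weight is strictly greater than `exp(threshold)`, as in `trim_edges`.

```python
import math

# Log-scale thresholds used in this section
log_thresholds = [2.0, 1.9, 1.77]

# Assumption: weights are integer counts and the comparison is strict (>),
# so the smallest surviving weight is floor(exp(t)) + 1.
min_kept_weight = {t: math.floor(math.exp(t)) + 1 for t in log_thresholds}

for t in log_thresholds:
    print("log threshold", t,
          "-> raw threshold", round(math.exp(t), 2),
          "-> smallest kept weight", min_kept_weight[t])
```

With 8 split bills, two Senators can share at most 8 vote nodes, provided each Senator casts at most one vote per roll call. Under that assumption a log threshold of 2.0 keeps only pairs who voted identically on all eight bills, 1.9 allows one disagreement or absence, and 1.77 allows two.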

______

## Results

We plot the network graph connecting Senator nodes whose log edge weight exceeds 2.0. A second plot, constructed in Gephi from a graphml file, provides a clearer depiction of the clusters with the Senator nodes labeled by name.

In [None]:
## re-calculate weights based on the new graph
weights=[edata['weight'] for f,t,edata in Wnet_trim.edges(data=True)]
plt.rcParams['figure.figsize'] = [10, 10]
nx.draw_networkx(Wnet_trim,width=weights, edge_color=weights)

<img src="s113_wgt2-0.png" width="1200" />

In [None]:
Wnet_file = "S113_Island_2-0.graphml"
nx.write_graphml( Wnet_trim, Wnet_file )

In [None]:
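# connected_component_subgraphs was removed in networkx 2.4; build the subgraphs explicitly
g_islands = [Wnet_trim.subgraph(c).copy() for c in nx.connected_components(Wnet_trim)]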
len(g_islands)

for g in g_islands:
    df1 = (s113_members_sel_df[s113_members_sel_df['icpsr'].isin(g.nodes())][['bioname','party_code', 'nominate_dim1']])
    #print(df1)
    #print(len(df1))

The resulting graph plot indicates 8 islands or clusters. The make-up of those 8 clusters is as follows:

- 37 Democrats
- 7 Democrats
- 2 Democrats
- 12 Republicans
- 9 Republicans
- 5 Republicans
- 2 Republicans
- 2 Republicans
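
The island counts above come from trimming weak edges and then listing the connected components that remain. The following sketch illustrates that idea on a hypothetical weighted graph (nodes `a` to `f` and their weights are made up). It is not the notebook's `trim_edges` function, although it applies the same strict threshold.

```python
import networkx as nx

# Hypothetical weighted graph: two tight groups joined by a weak edge
toy_W = nx.Graph()
toy_W.add_weighted_edges_from([
    ("a", "b", 8), ("b", "c", 8),   # first tight group
    ("c", "d", 3),                  # weak bridge
    ("d", "e", 8),                  # second tight group
    ("e", "f", 2),                  # weak attachment
])

def islands(g, threshold):
    # Keep edges strictly above the threshold; isolated nodes are dropped
    kept = nx.Graph()
    kept.add_edges_from(
        (u, v, d) for u, v, d in g.edges(data=True) if d["weight"] > threshold
    )
    components = (sorted(c) for c in nx.connected_components(kept))
    return sorted(components, key=len, reverse=True)

print(islands(toy_W, 7))  # high threshold: separate islands
print(islands(toy_W, 1))  # low threshold: everything merges
```

Lowering the threshold lets the weak bridge back in, which is why the number of islands shrinks as the threshold decreases.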

Now that we've established a number of distinct clusters within each party, we decrease the edge weight threshold to 1.9 in an attempt to broaden the affiliation networks and decrease the number of clusters. We expect to find two distinct clusters that fall along party lines.

In [None]:
## The weights histogram is logarithmic;
## we should compute the original weight = e^log_weight
Wnet_trim=trim_edges(W, weight=math.exp(1.9))
Wnet_trim.remove_nodes_from(list(nx.isolates(Wnet_trim)))

In [None]:
## re-calculate weights based on the new graph
weights=[edata['weight'] for f,t,edata in Wnet_trim.edges(data=True)]
nx.draw_networkx(Wnet_trim,width=weights, edge_color=weights)

In [None]:
Wnet_file = "S113_Island_1-9.graphml"
nx.write_graphml( Wnet_trim, Wnet_file )

By decreasing the threshold, we find an expected result of two distinct clusters. Let's confirm the clusters follow party lines.

In [None]:
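# connected_component_subgraphs was removed in networkx 2.4; build the subgraphs explicitly
g_islands = [Wnet_trim.subgraph(c).copy() for c in nx.connected_components(Wnet_trim)]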
len(g_islands)

for g in g_islands:
    df1 = (s113_members_sel_df[s113_members_sel_df['icpsr'].isin(g.nodes())][['bioname','party_code']])
    #print(df1)
    #print(len(df1))

Loading the graphml file of the graph object into Gephi shows that the two clusters split along party lines, with one exception: the Republican Senator from Alaska, Lisa Murkowski, falls in the cluster with the Democrats (bottom left of the plot).

<img src="s113_wgt_1-9.png" width="1200" />

### Boundary Spanners
In order to find boundary spanners of the two clusters, we lower the edge weight threshold to 1.77 so that the resulting graph is connected.

In [None]:
## The weights histogram is logarithmic;
## we should compute the original weight = e^log_weight
Wnet_trim=trim_edges(W, weight=math.exp(1.77))
Wnet_trim.remove_nodes_from(list(nx.isolates(Wnet_trim)))

In [None]:
## re-calculate weights based on the new graph
plt.rcParams['figure.figsize'] = [10, 10]
weights=[edata['weight'] for f,t,edata in Wnet_trim.edges(data=True)]
#nx.draw_networkx(Wnet_trim,width=weights, edge_color=weights)

<img src="s113_wgt_1-77.png" width="1200" />

In [None]:
Wnet_file = "S113_Island_1-77.graphml"
nx.write_graphml( Wnet_trim, Wnet_file )

With a connected graph based on the weighted projected graph, we identify 11 primary boundary spanners connecting the Democratic and Republican clusters. On the Democratic side, three individuals connect to eight individuals on the Republican side.

The Democratic side consists of:

- Harry Reid
- Ronald Lee Wyden
- Bill Nelson

The Republican side consists of:

- Lisa Murkowski
- Chuck Grassley
- Jerry Moran
- Mitch McConnell
- David Vitter
- Susan Collins
- Mike Johanns
- Dean Heller

Interestingly, as before, Murkowski shares more common neighbors with Democrats for the foreign and defense policy bill votes despite being a Republican.
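
The boundary spanners listed above were read off the Gephi plot. One possible programmatic check, sketched below on a hypothetical six-node graph, is to flag every node that has at least one neighbor from the other party. The node names and the `party` attribute values are illustrative assumptions.

```python
import networkx as nx

# Hypothetical trimmed projection: two party clusters joined by one edge
toy_g = nx.Graph()
toy_g.add_nodes_from(["d1", "d2", "d3"], party="D")
toy_g.add_nodes_from(["r1", "r2", "r3"], party="R")
toy_g.add_edges_from([
    ("d1", "d2"), ("d2", "d3"),   # Democratic cluster
    ("r1", "r2"), ("r2", "r3"),   # Republican cluster
    ("d3", "r1"),                 # the only cross-party edge
])

toy_party = nx.get_node_attributes(toy_g, "party")

# A node spans the boundary if any neighbor belongs to a different party
toy_spanners = sorted(
    n for n in toy_g
    if any(toy_party[m] != toy_party[n] for m in toy_g[n])
)
print(toy_spanners)
```

This groups nodes by party label rather than by cluster membership, so a Senator who sits in the other party's cluster would also be flagged.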

______

## Discussion

The use of network analysis reaffirmed assumptions about the polarization of the voting patterns for a recent Senate while also providing insights into instances of unanimous votes and individual Senators crossing party lines.

* By plotting the Senators according to party, we were able to clearly see the polarization in the voting patterns on Senate bills between Republicans and Democrats. Also, as we narrowed the focus of the analysis to foreign and defense policy bills, we identified three unanimous votes, which indicates that some topics do galvanize the support of all Senators no matter the party.
* The island clustering method did highlight small factions within each party. The Republicans had 5 distinct clusters with a maximum count of 12 Senators while the Democrats had one primary cluster of 37 Senators and two smaller clusters.
* In the focused analysis, we identified that Republican Senator Lisa Murkowski shares a closer voting pattern with Democrats for the small sample of foreign and defense policy bills.
__________________________________

# $\color{blue}{\text{Part Two:  Presidential Nominations}}$

## Background

The President of the United States nominates officials to certain high positions with the advice and consent of the Senate.   Such positions include federal judges, ambassadors, Cabinet level officials, directors or heads of regulatory or law enforcement agencies. During the 113th Congress, the Senate considered and approved 188 nominations.  105 Senators participated in some or all of the nomination rollcalls, but only 100 were active members at any one time. Nominations are typically not rejected, because a nominee is not brought to a full Senate vote unless the administration and Senate leaders have the required votes to proceed.

During the 113th Congress, the Senate was controlled by the Democratic Party with 53 members in the Democratic Party and 2 independents who caucused with the Democrats for a total of 55 votes.  The Republican side had 45 members.  The 10-vote advantage enabled the Democratic Party to pass legislation, which became more difficult in the 114th Congress when control shifted to the Republican Party.

Our analysis considers two questions:

1.  Did Senators vote on presidential nominations in a coordinated manner based on a network analysis of the rollcall voting?

2.  Did nominations show clustering due to coordination of Senatorial rollcall voting?   If so, is the clustering related to the department of the nomination?

Our dataset consisted of 105 senators, 188 nominations (rollcalls) and 18800 member votes on those nominations.

___




## Methods

To analyze both questions above, we build and evaluate 2 weighted projections of bipartite graphs in which the two sets of nodes are $A$, the senators, and $B$, the rollcalls of nominations.

However, we take a slightly different approach to defining the rollcall votes as follows:

*  The set $A$ of senators is encoded by their `icpsr` code.  There are 105 senators as stated earlier.

*  The set $B$ of rollcalls is split into two sets based on `yea` or `nay` outcomes.   $B_y$ is the set of nodes representing `yea` votes on a nomination.   $B_n$ is the set of nodes representing `nay` votes on a nomination.
Since $\lvert  B \rvert = 188$, there are twice as many nodes:

$$\lvert B_y \rvert + \lvert B_n \rvert = 2 \cdot \lvert B \rvert = 376$$


This approach has an advantage: it constructs an undirected, unweighted graph that encodes all the voting information, instead of a weighted graph that requires assigning numerical scores to yea, nay and absent votes.


Let $M = 105$ denote the number of Senators.   Let $N = 188$ denote the number of nominations.


### Weighted projection of the Senate Voting

To analyze the coordinated voting of Senators, we will examine the weighted projection of the above bipartite graph $G$ on its senators $A$.
We construct the biadjacency matrix $\mathbf{W}$ of size $M \times 2N$ with 0-1 entries. $W[i,j1] = 1$ means senator $i$ cast a vote on nominee $j$: a yea vote if column $j1 = j$ lies in the yea group (the first $N$ columns), or a nay vote if column $j1 = N + j$ lies in the nay group (the last $N$ columns).

The weighted projected graph of Senators based on their common votes has an adjacency matrix with self-loops of $\mathbf{PW}$ defined as:

$$\mathbf{ PW } = \mathbf{W} \cdot \mathbf{W^T}$$

The weighted projected graph of Senators with no self-loops (the graph which we will visualize below) has an adjacency matrix:

$$\mathbf{ PW^{*}} = \mathbf{ W \cdot W^{T} - diag( W \cdot W^{T} ) }$$

Our approach is to build the weighted projection of the Senator bipartite graph with no self-loops from the matrix $\mathbf{ PW^{*}}$ rather than use the networkx functions for projection.
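
To make the matrix formulas concrete, here is a small worked example with hypothetical numbers: $M = 3$ senators and $N = 2$ nominations, with columns ordered as yea-1, yea-2, nay-1, nay-2. The `toy_` names are illustrative and do not appear in the notebook.

```python
import numpy as np

# Rows are senators; columns are [yea-1, yea-2, nay-1, nay-2]
toy_W = np.array([
    [1, 1, 0, 0],   # senator 0: yea, yea
    [1, 0, 0, 1],   # senator 1: yea, nay
    [0, 0, 1, 1],   # senator 2: nay, nay
])

# PW = W . W^T counts concurring votes; the diagonal counts votes cast
toy_PW = toy_W @ toy_W.T

# PW* removes the self-loops by subtracting the diagonal
toy_PW_star = toy_PW - np.diag(np.diag(toy_PW))

print(toy_PW)
print(toy_PW_star)
```

Senators 0 and 2 never agree, so their entry in $\mathbf{PW^{*}}$ is 0, while senator 1 agrees with each of the others on exactly one nomination.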


### Weighted projection of Nomination Support

We are also interested in whether nominations garner the same support and resistance from Senators.

To analyze the similarity of nomination support, we will examine the weighted projection of a slightly different bipartite graph $H$.
The bipartite graph $H$ has two sets of senators and one set of nominations.

We construct two nodes $s_y$ and $s_n$ representing `yea` votes or `nay` votes for each senator $s$.  These nodes are contained in the two sets $A_y$ and $A_n$ respectively corresponding to the set of senators $A$.

We construct the biadjacency matrix $\mathbf{Y}$ of size $N  \times 2M $ with 0-1 entries. $Y[i,j1] = 1$ means nomination $i$ got a vote from Senator $j$: a yea vote if column $j1 = j$ lies in the yea group (the first $M$ columns), or a nay vote if column $j1 = M + j$ lies in the nay group (the last $M$ columns).

The weighted projected graph of nominations based on their common votes with self-loops has an adjacency matrix $\mathbf{PY}$ defined as:

$$\mathbf{ PY } = \mathbf{Y} \cdot \mathbf{Y^T}$$

The weighted projected graph of nominations with no self-loops (the graph which we will visualize below) has an adjacency matrix:

$$\mathbf{ PY^{*}} = \mathbf{ Y \cdot Y^{T} - diag( Y \cdot Y^{T} ) }$$

Our approach is to build the weighted projection of the nomination bipartite graph with no self-loops from the matrix $\mathbf{ PY^{*}}$ rather than use the networkx functions for projection.
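
Since both biadjacency matrices encode the same votes, $\mathbf{Y}$ can be obtained from the senator matrix by transposing its yea and nay blocks separately. The sketch below checks this on a hypothetical $3 \times 4$ senator matrix (3 senators, 2 nominations). All `toy_` names and values are illustrative.

```python
import numpy as np

toy_N = 2  # hypothetical number of nominations
# Senator biadjacency matrix: columns are [yea-1, yea-2, nay-1, nay-2]
toy_W = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])

# Y = [W_y^T | W_n^T]: rows are nominations, columns are yea-senators then nay-senators
toy_Y = np.hstack([toy_W[:, :toy_N].T, toy_W[:, toy_N:].T])

# Projection on nominations, with and without self-loops
toy_PY = toy_Y @ toy_Y.T
toy_PY_star = toy_PY - np.diag(np.diag(toy_PY))

print(toy_Y)
print(toy_PY_star)
```

The off-diagonal entry of $\mathbf{PY^{*}}$ is 2 because two of the three senators voted the same way on both nominations.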

In [None]:
import warnings
warnings.filterwarnings('ignore')


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import networkx as nx
import networkx.algorithms.bipartite as bipartite
import json
import os
import re
import operator

We load the rollcall data from json or csv into a dataframe in the code below.   It is necessary to split certain text strings to separate the nominee from the position, creating three derived fields: `State` (of residence of the nominee), `Name` of the nominee and `Position` (e.g. Secretary of Treasury).   The data is stored in the dataframe `nominations`.  

In [None]:
with open("S113_rollcalls.json", 'r') as f:
    rollcall_data = json.load(f)

raw_rollcall = pd.DataFrame(rollcall_data)

stage_nomination_rollcalls = raw_rollcall[ raw_rollcall["bill_number"].notnull() ]

stage_nomination_rollcalls = stage_nomination_rollcalls[ stage_nomination_rollcalls["bill_number"].str.startswith('PN') ]

# Select only useful columns
nomination_rollcalls = stage_nomination_rollcalls[['rollnumber', 'date', 'yea_count', 'nay_count' ,
                                                   'nominate_mid_1' , 'bill_number' , 'vote_result' , 'vote_desc' ,
                                                   'vote_question' , 'clausen_codes' , 'peltzman_codes', 'issue_codes'
                                                   ]]

# Extract the set of nominees based solely on the vote question.
# The name is always the first string in Vote Desc before , of

vote_questions = nomination_rollcalls[ nomination_rollcalls['vote_question'] == 'On the Nomination' ].copy().reset_index(drop=True)

# n must be passed by keyword in pandas 2.0 and later
df_nominations = vote_questions['vote_desc'].apply(str.strip).str.split(', of', n = 1, expand = True).rename(
     columns = {0: 'Name', 1: 'StatePosition'})

df_nominations['StatePosition'] = df_nominations['StatePosition'].apply(str.strip)

state_position = df_nominations['StatePosition'].apply( lambda x: re.split(', to be |, for the rank of ',x ))

nomination_state_pos = pd.DataFrame( state_position.tolist(), columns = ['State', 'Position'] )

issue_codes = vote_questions["issue_codes"].apply(pd.Series).rename( columns = {0: 'issue_code1', 1: 'issue_code2'} )
clausen_codes = vote_questions["clausen_codes"].apply(pd.Series).rename( columns = {0: 'clausen_code1'} )

nominations = pd.concat( [ df_nominations, nomination_state_pos, issue_codes["issue_code1"], clausen_codes, 
                vote_questions[['rollnumber', 'date', 'yea_count', 'nay_count', 'nominate_mid_1', 'bill_number', 'vote_result']]], axis = 1)
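
The string splitting above can be traced on a single description. The sketch below uses a made-up `vote_desc` value in the "Name, of State, to be Position" shape that the code assumes. The nominee, state and position are hypothetical.

```python
import re

# Hypothetical description; real values come from the rollcall file
toy_vote_desc = "Jane Q. Public, of Ohio, to be Ambassador to Freedonia"

# Step 1: split once on ', of' to separate the name from the rest
toy_name, toy_state_position = toy_vote_desc.strip().split(", of", 1)

# Step 2: split the rest into state and position
toy_state, toy_position = re.split(
    ", to be |, for the rank of ", toy_state_position.strip()
)

print(toy_name, "|", toy_state, "|", toy_position)
```

A description that lacks both `, to be` and `, for the rank of` would not unpack into two parts, so rows with an unexpected shape are worth inspecting separately.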


We also need to identify the broad department of each candidate.  We assign a custom category based on the nominee's agency or department, `Position` and `issue_code`.

In [None]:
#
# This function allows us to convert each nomination to its branch
# based on the department associated with the position.
#
def translate_dept(row):
    
    iss1 = row['issue_code1']
    pos = row['Position']
    
    if iss1 in [ 'Judiciary',  'Banking and Finance' ,  'Energy' ]:
        return iss1
    if operator.contains(pos, "Internal Revenue") or operator.contains(pos, "Bank"):
        return "Banking and Finance"
    
    if operator.contains( pos, "Ambassador") or operator.contains(pos, "Secretary of State"):
        return "State"
    
    if pos in ['Secretary of Defense', 
               'Secretary of the Air Force' 
               ] or operator.contains(pos, "Central Intelligence"):
        return "Defense"
    if operator.contains(pos, "Office of Management and Budget") or \
       operator.contains(pos, "Personnel Management" ):
       return "Executive"
    
    if operator.contains(pos, "Homeland Security"):
        return "Homeland Security"
    if operator.contains(pos, "National Labor Relations") or \
        operator.contains(pos, "Privacy and Civil Liberties") or \
        operator.contains(pos, "Environmental Protection") or \
        operator.contains(pos, "Federal Trade Commission") or \
        operator.contains(pos, "Equal Employment Opportunity Commission") or \
        operator.contains(pos, "Nuclear Regulatory") or \
        operator.contains(pos, "Tennessee Valley") or \
        operator.contains(pos, "Consumer Product Safety"):
        return "Regulatory"
    
    if operator.contains(pos, "Secretary of the Interior") or \
        operator.contains(pos, "Land Management") or \
        operator.contains(pos, "Commerce") or \
        operator.contains(pos, "Transportation") or \
        operator.contains(pos, "of Labor"):
        return "Domestic, Commerce"
    if operator.contains(pos, "Health and Human Services") or \
        operator.contains(pos, "Social Security") or \
       operator.contains(pos, "Housing and Urban Development") or \
       operator.contains(pos, "Veterans") or \
         operator.contains(pos, "Public Health") or \
       operator.contains(pos, "Medicare"):
        return "Human Services"
    if operator.contains(pos, "Federal Bureau of Investigation") or \
       operator.contains(pos, "Attorney General") or \
       operator.contains(pos, "Alcohol, Tobacco, Firearms"):
        return   "Justice"
    
    return 'Other'

nominations["dept"] = nominations.apply(lambda row: translate_dept(row), axis = 1)

# Keep the important columns
nominations = nominations[["Name", "State", "dept", "Position", "clausen_code1", "rollnumber", "date", "yea_count", 
                           "nay_count", "nominate_mid_1", "bill_number", "vote_result"]]



Next, we load the Senators and their key attributes into a dataframe `members`.   For simplicity, the two independent Senators (Angus King and Bernie Sanders) are treated as Democrats because they caucus with the Democratic Party.  We construct a derived field `short_name` to combine party, state abbreviation and last name.

In [None]:
# Load the Senate Members

members_raw = pd.read_csv("S113_members.csv")

#
#  Need to drop Obama.  The president is in the file.
#
# Copy so that the derived columns below are added to an independent dataframe
df_members = members_raw[ members_raw["chamber"] == "Senate" ].copy()

def convert_party_code(row):
    
    party_code = row["party_code"]
    
    if party_code == 100:
        return "Democrat"
    if party_code == 200:
        return "Republican"
    if party_code == 328:
        return "Independent"
    return "Unknown"

df_members["party"] = df_members.apply( lambda row:  convert_party_code(row), axis = 1 )

def convert_bioname(row):
    
    name = row["bioname"]
    state = row["state_abbrev"]
    party_code = row["party_code"]
    
    X = name.split(", ")
    
    last_name_init =  X[0] 

    if party_code == 200:
        short_code = "R"
    else:
        short_code =  "D"

    return short_code + "-" + state + "-" + last_name_init

df_members["short_name"] = df_members.apply(lambda row: convert_bioname(row), axis = 1)

# Use this dataframe
# --------------------
members = df_members[["icpsr", "state_abbrev", "party", "occupancy", "bioname", "short_name",  "born", "nominate_dim1", "state_icpsr"]]  

members.reset_index(inplace=True)  # Zero out the index


The final dataset is the member votes.  We need to translate the votes to `yea`, `nay` or abstention.  This is done using the `convert_castcode` function, which labels them `Yes`, `No` and `Present` respectively.
The last step is to inner join our members, rollcalls and member votes into a single wide dataframe which will be used to build our matrices and bipartite graphs.
The resulting dataframe is `join_select` below.

In [None]:
## Finally, let's load the votes for each nomination
# Need to drop the votes of the president.
votes_raw = pd.read_csv("S113_votes.csv")

head_votes = votes_raw.head()  # for testing

head_nominations = nominations.head()  # for testing

# Join with nominations on the raw votes.  But this will include the President's vote which needs to be omitted.
#
join_votes_nominations = pd.merge( nominations, votes_raw , how = 'inner', left_on = "rollnumber", right_on = "rollnumber")

head_join_votes_nominations = join_votes_nominations.head()  # for testing

join_all = pd.merge( join_votes_nominations, members, how = "inner", left_on = "icpsr", right_on = "icpsr")

def convert_castcode(row):
    
    cast_code = row["cast_code"]

    if cast_code == 1:
        return "Yes"
    if cast_code == 6:
        return "No"
    return "Present"

join_all["cast_value"] = join_all.apply( lambda row :  convert_castcode(row), axis = 1 )

head_join_all = join_all.head() # Test

#
# Contains the nominee, roll call result, senator vote and party.
# Useful for building the network graph.
# -------------------------------------------------------------------
join_select = join_all[["Name", "dept", "Position", "clausen_code1", "rollnumber", "date", \
                        "yea_count", "nay_count", "bill_number", "icpsr", \
                        "cast_value", "state_abbrev", "party", "short_name", "bioname" ]]


In the next step, we use the `join_select` dataframe to build the senator covoting biadjacency matrix which we denoted as $\mathbf{W}$ above.   We save a csv file version for reference.
We initialize $\mathbf{W}$ as a matrix of zeroes and populate its non-zero entries based on the voting records.    The matrix $\mathbf{W}$ is partitioned into two submatrices $\mathbf{W_y}$ on the left side and $\mathbf{W_n}$ on the right side.

In [None]:
#  Build a biadjacency matrix using the M senators and 2 * N nominations
#  Assign an explicit senator order ex ante to a list
#  Assign an explicit yes-nomination and no-nomination list.  2*N
#  The entries are 1 for vote of Yea or Nay in appropriate column.
#  Abstentions, present, no voting are treated as 0. 
# --------------------------------------------------------------------------
s_rollnumber =  nominations["rollnumber"]  # length is number of columns N  
s_icpsr = members["icpsr"]   # length is number of rows M

M = len(s_icpsr)  # rows
N = len(s_rollnumber)  # columns

m_bia = np.zeros( shape=( M, 2*N ) , dtype = np.int8)  # for senator covoting


for r, d in join_select.iterrows():
    
    row_i = s_icpsr[ s_icpsr == d["icpsr"]  ].index[0]
    
    cast_value = d["cast_value"]
    
    rollnumber = d["rollnumber"]
    
    rollnumber_index = s_rollnumber[ s_rollnumber == rollnumber ].index[0]
    
    if cast_value == "Yes":
        col_j = rollnumber_index
    elif cast_value == "No":
        col_j = N + rollnumber_index
    else:
        # Abstentions and "Present" stay 0; without this, a stale col_j would be reused
        continue
    
    m_bia[ row_i, col_j ] = 1

np.savetxt("Nomination_Biadjacency_Matrix.csv", m_bia, delimiter = ",")  
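
The construction can be followed on a small example. The sketch below builds a senator biadjacency matrix from six hypothetical vote records, using dictionary lookups for the row and column positions. The icpsr codes, roll numbers and `toy_` names are illustrative, and the dictionary lookup is a substitute for the index search used in the cell above.

```python
import numpy as np

toy_senators = [101, 102, 103]   # hypothetical icpsr codes
toy_rollcalls = [7, 9]           # hypothetical roll numbers
toy_row = {s: i for i, s in enumerate(toy_senators)}
toy_col = {r: j for j, r in enumerate(toy_rollcalls)}
toy_M, toy_N = len(toy_senators), len(toy_rollcalls)

# (icpsr, rollnumber, cast_value) records
toy_records = [
    (101, 7, "Yes"), (101, 9, "Yes"),
    (102, 7, "Yes"), (102, 9, "No"),
    (103, 7, "Present"), (103, 9, "No"),
]

toy_W = np.zeros((toy_M, 2 * toy_N), dtype=np.int8)
for icpsr, rollnumber, cast_value in toy_records:
    if cast_value == "Yes":
        toy_W[toy_row[icpsr], toy_col[rollnumber]] = 1
    elif cast_value == "No":
        toy_W[toy_row[icpsr], toy_N + toy_col[rollnumber]] = 1
    # any other value (e.g. "Present") leaves both entries at 0

print(toy_W)
```

Senator 103's "Present" vote on roll call 7 leaves zeros in both the yea and the nay column for that roll call.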

In the next step, we use the `join_select` dataframe to build the nomination cosupport biadjacency matrix which we denoted as $\mathbf{Y}$ above.   We save a csv file version for reference.
We initialize $\mathbf{Y}$ as a matrix of zeroes and populate its non-zero entries based on the voting records.    The matrix $\mathbf{Y}$ is partitioned into two submatrices $\mathbf{Y_y}$ on the left side and $\mathbf{Y_n}$ on the right side.  Each row of $\mathbf{Y}$ is indexed by a list of its rollnumber values.
Each column of $\mathbf{Y}$ is indexed by a list of the Senator `icpsr` codes for `yea` and then `nay` respectively.

In [None]:

m_bia2 = np.zeros( shape=(N , 2 * M ), dtype = np.int8 )  # for nomination co-support

for r, d in join_select.iterrows():
    
    cast_value = d["cast_value"]
    rollnumber = d["rollnumber"]

    
    row_i = s_rollnumber[ s_rollnumber == rollnumber ].index[0]
    
    icpsr_index =  s_icpsr[ s_icpsr == d["icpsr"]  ].index[0]
    
    
    if cast_value == "Yes":
        col_j = icpsr_index
    elif cast_value == "No":
        col_j = M + icpsr_index
    else:
        # Abstentions and "Present" stay 0; without this, a stale col_j would be reused
        continue
    
    m_bia2[ row_i, col_j ] = 1

np.savetxt("Senator_Biadjacency_Matrix.csv", m_bia2, delimiter = ",")  

Next, we do the matrix multiplications to obtain the adjacency matrix of the projected graph for Senator covoting.  As previously noted, the diagonal entries represent the number of times a Senator votes in the 113th Congress for nominations.  These could be thought of as self-loops which are not of interest.   We will construct a graph based on the upper triangular half of the matrix using `networkx` and save the result to `graphml` file format.

The graph construction algorithm in `make_covoting_graph` is parameterized by a threshold weight $\lambda$: an edge is added only when the percentage of total nominations on which senators $i$ and $j$ concur in their votes exceeds $\lambda$.
While scaling of the edge weights is not necessary to derive the clusters of nodes in the projected graph, the percentages are more interpretable than raw counts.

Clearly, when $\lambda = 0$, every pair of senators with at least one concurring vote is joined by an edge.  However, when $\lambda = 55$, we only include edges between senators if they concur in over 55 percent of all nominations.

The results below are exported to `graphml` format for interactive visualization in the tool Gephi, which has better plotting capabilities than `networkx`.
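
The role of $\lambda$ can be illustrated on a tiny agreement matrix. The sketch below is a simplified stand-in for `make_covoting_graph`, assuming 3 hypothetical senators and 4 nominations. The function name `toy_threshold_graph` and all values are illustrative.

```python
import numpy as np
import networkx as nx

# Hypothetical agreement counts between 3 senators over 4 nominations
toy_counts = np.array([
    [0, 4, 1],
    [4, 0, 2],
    [1, 2, 0],
])
toy_total = 4

def toy_threshold_graph(counts, total, threshold_pct):
    # Add an edge only when the share of concurring votes exceeds the threshold
    g = nx.Graph()
    g.add_nodes_from(range(len(counts)))
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            share = counts[i, j] / total
            if share > threshold_pct / 100.0:
                g.add_edge(i, j, weight=float(share))
    return g

for pct in (0, 25, 50):
    g = toy_threshold_graph(toy_counts, toy_total, pct)
    print(pct, sorted(g.edges()))
```

Because the comparison is strict, a pair that agrees on exactly 50 percent of nominations is dropped at $\lambda = 50$.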


In [None]:

#
# Now we compute the matrix equivalent of the projection of
# The senators based on their common voting patterns for nominations.
# by multiplying the biadjacency matrix by its own transpose
# The resulting M X M matrix compares the number of common votes: yeas or nays where the senators concurred.
#
# Note that we only want to use the edges in the upper triangular half of the matrix.
# since the product is symmetric.
# -------------------------------------------------------------------------------
senator_covoting =  np.matmul( m_bia, m_bia.transpose())

#
#
# Now we draw the projected graph for the senators as an undirected
# weighted graph.  Note that we scale each senator covoting score
# by the number of nominations:  188
# to normalize the covoting from counts to fractions of votes.
# 
# Each threshold weight is entered in integer from 0-100
#
# Only include edges where covoting exceeds the threshold.
# -----------------------------------------------------------------
def make_covoting_graph( threshold_weight):

    G_covoting = nx.Graph()

    for r, d in members.iterrows():
    
        senator_id = d["icpsr"]
    
        G_covoting.add_node( senator_id, 
                 short_name = d["short_name"] ,
                 bioname = d["bioname"],
                 party = d["party"] ,
                 state_abbrev = d["state_abbrev"]
               )
        
    for i in range(M):
        for j in range(M):
            if i < j:            
                w_ij = senator_covoting[i,j]/188
            
                if w_ij > threshold_weight/100.0 :
                    G_covoting.add_edge( s_icpsr[i]  , s_icpsr[j]   ,weight= float(w_ij ) )


    graphml_file = "Senate_Covoting_K{}.graphml".format(threshold_weight) 

    nx.write_graphml( G_covoting , graphml_file)
    
    return G_covoting


In [None]:
G_covoting_55 = make_covoting_graph( 55  )

G_covoting_60 = make_covoting_graph( 60  )

G_covoting_65 = make_covoting_graph( 65  )


def convert_party_color(row):
    
    party = row["party"]
    
    if party == "Democrat":
        return "blue"
    if party  == "Republican":
        return "red"
    if party == "Independent":
        return "blue"
    return "Unknown"


covoting_node_colors = members.apply( lambda row :  convert_party_color(row), axis = 1 )

covoting_node_labels = { row["icpsr"] : row["short_name"] for i, row in members.iterrows()}

def plot_covoting(G_covoting, threshold):

    plt.figure(figsize = (12,10))
    plt.tight_layout()
    plt.axis("off")

    plt.suptitle("Senator Covoting {}".format(threshold), y = 1.05 ,fontsize= 18)

    plt.title("""
113 Congress - Nominations
""", fontsize=12)
    
    
    pos = nx.spring_layout(G_covoting)

    # Use the graph passed in, not the global G_covoting_55, so widths match the edges drawn
    weights = [ G_covoting[u][v]['weight'] for u,v in G_covoting.edges() ]

    nx.draw_networkx_labels(G_covoting , pos = pos , labels = covoting_node_labels , font_size = 11 )

    nx.draw_networkx_edges(G_covoting , pos = pos , width =  weights, edge_color = 'gray', alpha = 0.1)

    nx.draw( G_covoting , pos = pos, with_labels = False, alpha = 0.6,  node_size = 100, node_color = covoting_node_colors )




The 3 plots below show the Senator weighted projection for three thresholds $\lambda = 55, 60, 65$. 
The node colors correspond to party affiliation with $\color{blue}{\text{Democrats in blue}}$ and $\color{red}{\text{Republicans in red}}$.

As $\lambda$ increases, the graph density decreases.  We see a cluster of like-minded voting behavior and groups of mostly Republican outliers.
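The relationship between $\lambda$ and density can be checked numerically.  The sketch below is only an illustration: `toy_covoting` is a made-up 4-senator matrix of co-voting fractions (not the Senate data), and `density_at` is a hypothetical helper that reports the share of senator pairs whose weight exceeds the threshold, which is the density of the corresponding undirected island graph.

```python
import numpy as np

# Hypothetical co-voting fractions for 4 senators (symmetric, values in [0, 1]).
toy_covoting = np.array([
    [1.00, 0.90, 0.62, 0.30],
    [0.90, 1.00, 0.58, 0.40],
    [0.62, 0.58, 1.00, 0.70],
    [0.30, 0.40, 0.70, 1.00],
])

# Each unordered pair (i < j) appears once in the strict upper triangle.
upper = np.triu_indices_from(toy_covoting, k=1)
pair_weights = toy_covoting[upper]

def density_at(threshold_pct):
    """Fraction of senator pairs whose weight exceeds threshold_pct / 100."""
    return float((pair_weights > threshold_pct / 100.0).mean())

densities = {k: density_at(k) for k in (55, 60, 65)}
print(densities)
```

On this toy matrix the number of surviving edges drops from 4 to 3 to 2 out of 6 possible pairs as the threshold rises from 55 to 65.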


In [None]:
plot_covoting(G_covoting_55, 55)

plot_covoting(G_covoting_60, 60)

plot_covoting(G_covoting_65, 65)

Using the exported `graphml` file for the $\lambda = 55$ covoting graph, we used the interactive tool *Gephi* to render a more detailed graphic, shown below, than we can make using `networkx`.

The resulting Gephi project file is [Senator_Covote_K55.gephi](Senator_Covote_K55.gephi).

<img src = "Senate_Covoting_K55_v1.png" width = "1200"/>



The above graphic shows how Senators co-voted in the 113th Congress on nominations.  An edge is shown only if the two Senators voted the same way on more than $\lambda = 55$ percent of nominations.  

*  It was constructed using the Fruchterman-Reingold layout algorithm.  
*  $\color{blue}{\text{Democratic nodes are blue.}}$  $\color{red}{\text{Republican nodes are red.}}$
*  Node size depends on degree.

**Findings**

*  We observe that Republicans form a central cluster, indicating coordinated voting.  
*  Democrats were more dispersed in their overall voting.
*  Bipartisan links exist but are less frequent than intra-party concurrence.

   +  Senator Blumenthal (Democrat) voted in concert with Republicans more often than with fellow Democrats.
   +  Senator Collins (Republican) often voted with Democrats, especially Senators Rockefeller (WV) and Markey (MA).
   

*  Some Senators on the periphery of the diagram did not follow the party line because:

   +  They had different opinions:  Senator Barrasso
   +  They resigned or died in office or were not elected in the usual fashion.
      


Next, we examine the co-voting subgraphs restricted by party.   How does Democratic voting behavior compare to Republican voting behavior?   We construct these subgraphs by using the `networkx` `subgraph` function, which takes a list of node identifiers as a parameter.   We examine the party restricted subgraphs for $\lambda = 55$.   The resulting subgraphs are exported in `graphml` format and visualized using `Gephi`.

We link to the Gephi files for [Democrats](Senate_Democrat_Covote_K55.gephi) and [Republicans](Senate_Republican_Covote_K55.gephi).


In [None]:
#
# Let's examine the subgraphs by party to understand
# their covoting behavior.
# -----------------------------------------------------
# Independents are grouped with the Democrats here, matching their blue node color.
members_democrat = [  row["icpsr"] for _, row in members.iterrows() if row["party"] != "Republican"]

members_republican = [  row["icpsr"] for _, row in members.iterrows() if row["party"] == "Republican"]

G_democrat_covoting_55 = G_covoting_55.subgraph(members_democrat)

nx.write_graphml( G_democrat_covoting_55 , "Senate_Democrat_Covoting_K{}.graphml".format(55) )


G_republican_covoting_55 = G_covoting_55.subgraph(members_republican)

nx.write_graphml( G_republican_covoting_55 , "Senate_Republican_Covoting_K{}.graphml".format(55) )


<H3>The Democratic Senator network at $\lambda=55$ concurrent voting</H3>

<img src="Senate_Democrat_Covote_K55_V2.png" width="1200" />


<H3>The Republican Senator network at $\lambda=55$ concurrent voting</H3>

<img src="Senate_Republican_Covote_K55_v2.png" width="1200" />

**Findings**

There are significant differences in the concurrent voting behavior by party line.  

*  Democratic Senators seem to belong to 3 clusters:  

   + A large cluster whose center is Montana Senator John Walsh.   We interpret this cluster as a centrist group that approved nearly all of the Presidential nominations.   John Walsh approved all but 4 nominees and was never absent from voting.
   
   + A smaller cluster whose center is New Jersey Senator Cory Booker.  It includes West Virginia Senator Joe Manchin, one of the most conservative Democrats.  This cluster might better be interpreted as disagreeing somewhat with Presidential nominees when the Senator raises policy or experience issues.
   
   + A smaller cluster of 3 senators: Mary Landrieu of Louisiana (conservative), Jay Rockefeller of West Virginia (moderate) and Ed Markey of Massachusetts (liberal).  They are clustered because of **lower attendance** on votes, not ideology.
       

* Republican Senators have a cluster centered on Senators Tom Coburn (Oklahoma) and Thad Cochran (Mississippi), but also more mavericks.  

   +  The center cluster is characterized by not voting for the majority of appointees.  Senator Coburn, for example, approved 45% of them.

   +  Mavericks like Senator Grassley (Iowa) are characterized by a higher approval rate of 61%.
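The approval and attendance figures quoted above are simple row statistics of a senator-by-nomination vote matrix.  The following is a minimal sketch on made-up data, assuming a coding of 1 for Yea, -1 for Nay and 0 for absent; the array `toy_votes` and that coding are illustrative assumptions and may differ from how the notebook's own matrices are encoded.

```python
import numpy as np

# Hypothetical votes: rows = senators, columns = nominations.
# Assumed coding: 1 = Yea, -1 = Nay, 0 = absent / not voting.
toy_votes = np.array([
    [ 1,  1,  1,  1, -1],
    [ 1, -1, -1,  0, -1],
    [ 1,  1,  0,  0,  1],
])

n_nominations = toy_votes.shape[1]
yea = (toy_votes == 1).sum(axis=1)    # Yea votes per senator
cast = (toy_votes != 0).sum(axis=1)   # votes actually cast per senator

approval_of_all = yea / n_nominations  # Yea share of all roll calls
approval_of_cast = yea / cast          # Yea share of votes cast
attendance = cast / n_nominations      # share of roll calls attended

print(approval_of_all, approval_of_cast, attendance)
```

The third toy senator approves every nomination voted on yet has only a 60% approval share of all roll calls, which illustrates how low attendance alone can pull a senator away from an ideologically similar cluster.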

## Nomination Concurrent Support

To analyze the nominations, we construct the projected graph $H$ and then use the island method to adjust the threshold $\lambda$ at which edges are included in the graph.
Our threshold-filtered projected graphs are saved to graphml files.  

We report and visualize projected graphs with choices of $\lambda = 60, 92$.  

These two thresholds were chosen to illustrate the change in density and cluster formation.  

These graphs are exported in graphml format and visualized in Gephi below.
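As a small illustration of the projection step, the sketch below builds a toy nomination-by-senator incidence matrix `B` (made up for this example), projects it onto nominations with a matrix product, and keeps only edges above a threshold.  The helper name `island` and the normalization by the number of toy senators are assumptions made for the illustration.

```python
import numpy as np
import networkx as nx

# Hypothetical incidence matrix: rows = 3 nominations, columns = 5 senators.
# An entry is 1 when that senator voted Yea on that nomination.
B = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])

# Entry (i, j) counts the senators who supported both nomination i and j.
cosupport = B @ B.T
share = cosupport / B.shape[1]   # fraction of all toy senators

def island(share, threshold_pct):
    """Undirected graph keeping pairs whose share exceeds threshold_pct / 100."""
    G = nx.Graph()
    G.add_nodes_from(range(share.shape[0]))
    rows, cols = np.triu_indices(share.shape[0], k=1)
    for i, j in zip(rows, cols):
        if share[i, j] > threshold_pct / 100.0:
            G.add_edge(int(i), int(j), weight=float(share[i, j]))
    return G

G_low = island(share, 10)
G_high = island(share, 50)
print(G_low.number_of_edges(), G_high.number_of_edges())
```

Raising the threshold from 10 to 50 removes the weak link between nominations 0 and 2 and leaves only the pair with strongly overlapping support.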

In [None]:
#
#  Next we consider how nominations are considered.
#  Taking the dual approach to the senators, we ask if two nominations
#  are alike when the senators voting for their appointments are 
#  the same or nearly so.
# --------------------------------------------------------
nominee_cosupport =  np.matmul( m_bia2 , m_bia2.transpose())


In [None]:
# Now we draw the projected graph for the nominations as an undirected
# weighted graph.  The function arguments:  
#
#    threshold_weight:  takes a positive integer argument between 0-99
def make_cosupport_graph(threshold_weight):
    G_cosupport = nx.Graph()

    for r, d in nominations.iterrows():
    
        rollnumber = d["rollnumber"]
    
        G_cosupport.add_node( rollnumber, 
                 name = d["Name"] ,
                 dept = d["dept"],
                 position = d["Position"] ,
                 bill_number = d["bill_number"] ,
                 yea_count = d["yea_count"],
                 nay_count = d["nay_count"],
                 margin = d["yea_count"] - d["nay_count"]
               )

    for i in range(N):
        for j in range(N):
            if i < j:
            
                w_ij = nominee_cosupport[i,j]/100
            
                if w_ij > threshold_weight/100.0 :
                    G_cosupport.add_edge( s_rollnumber[i]  , s_rollnumber[j]   ,weight= float(w_ij ) )
    
    graphml_file = "Nomination_Cosupport_K{}.graphml".format(threshold_weight) 
    nx.write_graphml( G_cosupport , graphml_file)

    return G_cosupport




In [None]:
G_cosupport_60 = make_cosupport_graph( 60 )

G_cosupport_92 = make_cosupport_graph( 92 )

In [None]:
G_cosupport_60.order(), G_cosupport_60.number_of_edges()

In [None]:
G_cosupport_92.order(), G_cosupport_92.number_of_edges()

The nomination co-support graphs for these thresholds were loaded into Gephi.  

<H3>Nomination Concurrent Support by Senators with 60% Threshold</H3>

Number of vertices:  188

Number of edges:  13,321

A link to the $\lambda = 60$ gephi file is [here](Nomination_Cosupport_K60.gephi).

Plot conventions:

*  Node size is proportional to number of Yea votes.  More senators in favor means larger node diameter.

*  Edge width is proportional to weight.   More agreement between two nominations means thicker line.

*  Node label is the name of the nominee.

*  Node color shows the custom group of departments of each position.   E.g Judiciary, State, Regulatory.  A legend is shown below.

<img src = "Cosupport_Dept_Legend.png" width="200" />

<img src = "Nomination_Cosupport_K60_V2.png" width="1500" />

Overall, the plot at threshold $\lambda = 60$ is too dense to be useful.  However, we can see two clusters forming, with some nodes lying between them.

We reserve our comments for the higher threshold $\lambda = 92$ graph below.  

<H3>Nomination Concurrent Support by Senators with 92% Threshold</H3>

Number of vertices: 188

Number of edges:  4,365

The color scheme and layout algorithm are the same as above.

A link to the $\lambda = 92$ gephi file is [here](Nomination_Cosupport_K92.gephi).

Plot conventions:

*  Node size is proportional to number of Yea votes.  More senators in favor means larger node diameter.

*  Edge width is proportional to weight.   More agreement between two nominations means thicker line.

*  Node label is the name of the nominee.

*  Node color shows the custom group of departments of each position.   E.g Judiciary, State, Regulatory.  A legend is shown below.

<img src = "Cosupport_Dept_Legend.png" width="200" />

<img src = "Nomination_Cosupport_K92_V1.png" width="1800" />

**Findings**

*  There are 2 clusters of nominations.   The first is a large dense cluster of mostly $\color{violet}{\text{judicial}}$ appointments that were confirmed with large positive vote margins.

*  The second is a smaller, less dense cluster of nominations with small node sizes.  They often pass on narrow party line votes due to ideological issues:

    +  Antony Blinken for Deputy Secretary of State, confirmed by 55/38.
    +  Tom Perez for Secretary of Labor, confirmed by 54/46.


*  $\color{green}{\text{State Department nominations in green}}$ (ambassadors, United Nations, Secretaries) are also usually confirmed.  

*  $\color{black}{\text{Banking and Finance nominations in black}}$ are often challenged and not consistent with other votes. 

*  Competence of the nominee does not appear to translate into more positive votes or concurrent support.  

   +  Example: Janet Yellen and Stanley Fischer, well regarded central bankers, faced moderate opposition.  They are also isolates in the $\lambda = 92$ island graph.
   

*  Regulatory nominations are often challenged by Republicans (too liberal) or Democrats (too conservative).

   +  Example:  Sharon Bowen - Commodity Futures Trading Commission (CFTC) by 48/46.
   
   +  Example:  All 4 National Labor Relations Board members:   54/44,  59/38,  54/44, 54/40.
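Isolates and clusters in an island graph can also be listed programmatically rather than read off the Gephi plot.  The sketch below uses a small made-up graph with placeholder node names (not the actual nominees) and the `networkx` functions `isolates`, `connected_components` and `density`.

```python
import networkx as nx

# Hypothetical island graph: nodes stand in for nominations.
G = nx.Graph()
G.add_nodes_from(["A", "B", "C", "D", "E", "F"])
G.add_weighted_edges_from([
    ("A", "B", 0.95),
    ("B", "C", 0.93),
    ("A", "C", 0.94),
    ("D", "E", 0.93),
])

# Nodes with no edge above the threshold.
isolates = sorted(nx.isolates(G))

# Connected components with at least 2 nodes, largest first.
clusters = sorted(
    (sorted(c) for c in nx.connected_components(G) if len(c) > 1),
    key=len,
    reverse=True,
)

# Density of each cluster taken as its own subgraph.
cluster_density = [nx.density(G.subgraph(c)) for c in clusters]

print(isolates, clusters, cluster_density)
```

Connected components are a coarse notion of cluster; in a dense graph such as the $\lambda = 60$ projection they would likely merge into a single component, so a community detection method may be more informative there.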

## Conclusion

Senate confirmation of Presidential appointees exhibits properties of social networks.   

These appointments can define a bipartite network which yields interesting insights when applying the island method on edge weights of the bipartite projection.

When we project the graph on the senators, the traditional party division is visible but differences do exist between parties.

*  For the Democrats, clusters seem to form around a centrist ideology for most Senators with another cluster formed around liberal and conservative factions.

*  For Republicans, the main cluster forms around approval of popular non-ideological appointments but also disapproval of ideologically unacceptable candidates.   

*  Republicans seem to have a larger number of maverick senators.

When we project the graph on the nominations, different clusters emerge:

*  one large cluster centers around the judiciary 

*  another cluster around contested appointees.

Why do these clusters occur?  We can only speculate, as the data does not speak for itself in this case.  Senatorial politics is governed by coalitions seeking to pass or deny appointments.  A great deal of bargaining takes place before the confirmation vote.  Even the occurrence of a vote signals progress.  For example, nominations can be stalled by opposition for months.   Cloture votes may be necessary to silence opposition or filibusters.

The clusters represent categories of approval and disapproval of appointments for high ranking government positions.  

The value of social network analysis is to

*  quantify the proportion of each category of nominations.

*  quantify the density of clusters of similar nominations.

*  quantify future changes in polarization of the parties as Senate margins shrink or grow.

Lastly, we observe that the 113th Congress was unusually contentious and unproductive, and nominations were the most substantive action it took.  (https://www.vox.com/mischiefs-of-faction/2016/5/17/11693956/113th-congress-unusual)  In hindsight, it is hard to see the current 117th Congress being much more productive as the Democrats' margin is now zero.
