# California Legislation Co-Sponsorship Network
## I. Introduction
Central to many theories of the policy-making process, such as the multiple-streams framework (MSF), the advocacy coalition framework (ACF), and policy innovation, is an actor or set of actors that pushes policies forward. In MSF these actors are known as policy entrepreneurs because they commit themselves to advancing a specific policy in which they are invested financially or ideologically. In ACF the central actors are known as policy brokers, key individuals in the negotiation process between competing coalitions. In policy innovation the actor invents new solutions to policy problems.

While there are many studies of policy actors, only a handful use social network analysis as a methodology. Furthermore, the majority of studies are interested in federal policy rather than state policy, or are focused on the broad frameworks rather than the specifics of the actor in relation to the network. Fewer still use co-sponsorship as the edge definition. In this project, I use legislative data to look at co-sponsorship of bills in the California state legislature for the legislative sessions 2009-2010, 2011-2012, 2013-2014, 2015-2016, and 2017-2018.

My long-term research question is: can state legislation and legislative topics be explained and clarified by looking at the network structure of state legislative bodies? I am interested in resilience policy, so I am also interested in applying the MSF and ACF frameworks to see whether policy entrepreneurs are needed to pass certain resilience and disaster policies, or whether a focusing event is enough. I think that looking at legislative networks can reveal underlying policy processes that are not evident from bill passages alone.

I was not able to address all of my research goals within the time limits of this project, so my research questions for this limited project are:
1. What does the California legislature look like from a network perspective?
2. Who are the most central legislators in the California state legislature?
3. What is the average clustering coefficient in each of the five legislative sessions I have data for, and how has it changed? I think the explanation for changes in the clustering coefficient is found in electoral changes (and topic foci, but that will require my full analysis goals).

## II. Data
The data I use is from Legiscan.com, a nonpartisan legislative tracking and reporting service that tracks legislation from all 50 states as well as the federal government and the District of Columbia. Legiscan has a database and an API for acquiring data, and I have downloaded the CSVs for the legislative sessions in question. The data includes three components: a spreadsheet of bills and their progress, a spreadsheet of legislators, and a spreadsheet of bill sponsors.

The first step is to read files as pandas dataframes and lists. The functions below read in the data.

In [None]:
# Function to read files as pandas dataframes
import pandas
def filereader(file):
    file = pandas.read_csv(file, encoding = 'utf-8', sep = ',')
    return file

# Function to read files as lists of rows (the header row is read and discarded)
def listreader(file):
    with open (file, 'r', encoding = 'utf-8') as f:
        header = f.readline().strip().split(',')
        # Plain comma splitting assumes that no field contains a quoted comma
        data = [line.strip().split(',') for line in f.readlines()]
    return data

bills1718 = filereader('data/CA1718/bills.csv')
people1718 = filereader('data/CA1718/people.csv')
sponsors1718 = filereader('data/CA1718/sponsors.csv')
sponsorslist1718 = listreader('data/CA1718/sponsors.csv')

bills1516 = filereader('data/CA1516/bills.csv')
people1516 = filereader('data/CA1516/people.csv')
sponsors1516 = filereader('data/CA1516/sponsors.csv')
sponsorslist1516 = listreader('data/CA1516/sponsors.csv')

bills1314 = filereader('data/CA1314/bills.csv')
people1314 = filereader('data/CA1314/people.csv')
sponsors1314 = filereader('data/CA1314/sponsors.csv')
sponsorslist1314 = listreader('data/CA1314/sponsors.csv')

bills1112 = filereader('data/CA1112/bills.csv')
people1112 = filereader('data/CA1112/people.csv')
sponsors1112 = filereader('data/CA1112/sponsors.csv')
sponsorslist1112 = listreader('data/CA1112/sponsors.csv')

bills0910 = filereader('data/CA0910/bills.csv')
people0910 = filereader('data/CA0910/people.csv')
sponsors0910 = filereader('data/CA0910/sponsors.csv')
sponsorslist0910 = listreader('data/CA0910/sponsors.csv')
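
Because 'listreader' splits each line on every comma, a field that itself contains a comma inside quotes would be broken into two columns. Whether the Legiscan files contain such fields is not checked here, so the following is only a hedged sketch of an alternative reader built on the standard 'csv' module. The function name 'read_rows' and the sample names are hypothetical.

```python
import csv
import io

def read_rows(handle):
    """Return the data rows of an open CSV file, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)                      # discard the header row
    return [row for row in reader if row]   # drop blank lines

# Hypothetical sample: the first name contains a comma inside quotes
sample = io.StringIO('people_id,name\n10,"Doe, Jane"\n11,John Roe\n')
rows = read_rows(sample)
print(rows)
```

With a real file, the handle would come from `open(path, 'r', encoding='utf-8', newline='')`.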

The bills dataframe looks like this (I did not use it within the scope of this project, but plan to use it later):

In [None]:
bills1718[:6]

The legislators dataframe is shown next. Note that each name is paired with a 'people_id':

In [None]:
people1718[:6]

Finally, the sponsors dataframe is below. Note that a bill with multiple sponsors has one row per sponsor, each with its own position, meaning that the bill is being co-sponsored.

In [None]:
sponsors1718[:21]

## III. Steps Toward Analysis

With the bill, legislator, and sponsor data loaded into memory, the next steps are:
1. Match names with the sponsors list.
2. Create an edgelist that counts the number of co-sponsorships. The number of bills co-sponsored by a dyad of legislators will be the edge weight.
3. Create network visualizations and calculate network statistics (centrality and clustering).

First, the analysis needs functions that create a dictionary of only the 'name' and the 'people_id', convert the dictionary to a list, and then convert the list to a dataframe. We will run these functions for every legislative session.

In [None]:
# Function to create dictionary of two items
def dictcreator(input1, input2):
    newdict = dict(zip(input1, input2))
    return newdict
peopledict1718 = dictcreator(people1718['name'], people1718['people_id'])
peopledict1516 = dictcreator(people1516['name'], people1516['people_id'])
peopledict1314 = dictcreator(people1314['name'], people1314['people_id'])
peopledict1112 = dictcreator(people1112['name'], people1112['people_id'])
peopledict0910 = dictcreator(people0910['name'], people0910['people_id'])

# Function to convert dictionaries to lists
def dicttolist(inputdict):
    newlist = []
    for key, value in inputdict.items():
        newlist.append([key,value])
    return newlist
peoplelist1718 = dicttolist(peopledict1718)
peoplelist1516 = dicttolist(peopledict1516)
peoplelist1314 = dicttolist(peopledict1314)
peoplelist1112 = dicttolist(peopledict1112)
peoplelist0910 = dicttolist(peopledict0910)

# Function to convert list to dataframe
def dfconverter(inputlist, colname1, colname2):
    newdf = pandas.DataFrame(inputlist, columns=(colname1, colname2))
    return newdf
peopledf1718 = dfconverter(peoplelist1718, 'name', 'pid')
peopledf1516 = dfconverter(peoplelist1516, 'name', 'pid')
peopledf1314 = dfconverter(peoplelist1314, 'name', 'pid')
peopledf1112 = dfconverter(peoplelist1112, 'name', 'pid')
peopledf0910 = dfconverter(peoplelist0910, 'name', 'pid')

Second, the sponsors list needs to be matched with the list of legislators.

In [None]:
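# Function to match sponsors list with list of legislators - returns a list and a pandas dataframe.
# Note: the rows of list1 are modified in place (the sponsor name is appended to each matched row).
def matcher(list1, list2):
    for items in list1:
        for name, pid in list2:
            if int(items[1]) == int(pid):
                items.append(name)
                break  # stop at the first match so each row gets one name
    return list1, pandas.DataFrame(list1, columns=('bill','pid','position','sponsor'))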
sponsoredlist1718, sponsoreddf1718 = matcher(sponsorslist1718, peoplelist1718)
sponsoredlist1516, sponsoreddf1516 = matcher(sponsorslist1516, peoplelist1516)
sponsoredlist1314, sponsoreddf1314 = matcher(sponsorslist1314, peoplelist1314)
sponsoredlist1112, sponsoreddf1112 = matcher(sponsorslist1112, peoplelist1112)
sponsoredlist0910, sponsoreddf0910 = matcher(sponsorslist0910, peoplelist0910)

sponsoredlist1718[:21]
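
The same matching can also be expressed as a pandas join, which avoids the nested loop. This is only a sketch on toy data: the column names 'bill_id', 'people_id', 'position', and 'name' are assumed from the dataframes previewed above, and the legislator names are placeholders.

```python
import pandas

# Toy stand-ins for the sponsors and people dataframes (assumed column names)
sponsors = pandas.DataFrame({'bill_id': [1, 1, 2],
                             'people_id': [10, 11, 10],
                             'position': [1, 2, 1]})
people = pandas.DataFrame({'people_id': [10, 11],
                           'name': ['Legislator A', 'Legislator B']})

# A left join keeps every sponsor row and attaches the matching name
matched = sponsors.merge(people[['people_id', 'name']], on='people_id', how='left')
print(matched)
```

With `how='left'`, a sponsor row whose 'people_id' has no match keeps a missing value in 'name' instead of being dropped.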

The third step is to create an edgelist out of the above list. The edgelist will match co-sponsors into dyads. The function below takes a long time, especially for all 5 legislative sessions. The function returns both a list and a dictionary.

In [None]:
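# Function to create an edgelist - takes a very long time because every row is compared with every other row
def edges(list1, list2):
    edgelist = []
    edgedict = {}
    for bills in list1:
        for names in list2:
            try:
                # Same bill, different sponsor: record the co-sponsorship dyad
                if bills[0] == names[0] and bills[3] != names[3]:
                    edgelist.append([bills[3], names[3]])
                    # The dictionary keeps only the most recent partner of each sponsor
                    edgedict[bills[3]] = names[3]
            except IndexError:
                # Rows that were never matched to a name have no fourth element
                continue
    return edgelist, edgedict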

edgelist1718, edgedict1718 = edges(sponsoredlist1718, sponsoredlist1718)
edgelist1516, edgedict1516 = edges(sponsoredlist1516, sponsoredlist1516)
edgelist1314, edgedict1314 = edges(sponsoredlist1314, sponsoredlist1314)
edgelist1112, edgedict1112 = edges(sponsoredlist1112, sponsoredlist1112)
edgelist0910, edgedict0910 = edges(sponsoredlist0910, sponsoredlist0910)
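
The pairwise comparison above is slow because every sponsor row is checked against every other row. One possible speed-up, sketched here under the assumption that rows have the form [bill, pid, position, sponsor], is to group sponsors by bill first and then enumerate the dyads within each bill. The function name 'cosponsor_pairs' and the toy rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cosponsor_pairs(rows):
    """Return one (name, name) tuple per co-sponsoring dyad per bill."""
    sponsors_by_bill = defaultdict(set)
    for row in rows:
        if len(row) >= 4:                       # skip rows without a matched name
            sponsors_by_bill[row[0]].add(row[3])
    pairs = []
    for names in sponsors_by_bill.values():
        # sorted() gives each dyad a single, consistent ordering
        pairs.extend(combinations(sorted(names), 2))
    return pairs

toy = [['1', '10', '1', 'A'], ['1', '11', '2', 'B'], ['1', '12', '3', 'C'],
       ['2', '10', '1', 'A'], ['2', '11', '2', 'B'], ['3', '12', '1', 'C']]
pairs = cosponsor_pairs(toy)
print(pairs)
```

Unlike 'edges', this sketch emits each dyad once per bill rather than in both orderings, so the two outputs are not interchangeable without adjustment.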

Below is what our edgelist looks like:

In [None]:
edgelist1718[:11]

The next cell converts the list into a pandas dataframe with a 'source' and a 'target' column.

In [None]:
edgedf1718 = pandas.DataFrame(edgelist1718, columns=('source','target'))
edgedf1516 = pandas.DataFrame(edgelist1516, columns=('source','target'))
edgedf1314 = pandas.DataFrame(edgelist1314, columns=('source','target'))
edgedf1112 = pandas.DataFrame(edgelist1112, columns=('source','target'))
edgedf0910 = pandas.DataFrame(edgelist0910, columns=('source','target'))
edgedf1718[:6]

The current edgelist is unweighted, so when it is converted into a network, repeated co-sponsorships between the same two legislators will count as only one tie. Therefore, the number of co-sponsorships in a dyad needs to be counted and assigned to the dyad as its edge weight. I use the 'Counter' class from the 'collections' library.

Below, first the edgelist is converted into tuples. Then the 'Counter' function is used to count repeated edges. Finally, the weighted list is converted back into a list of our network parameters.

In [None]:
# Code to weight edges
import collections
edgetuple1718 = tuple(tuple(x) for x in edgelist1718)
edgetuple1516 = tuple(tuple(x) for x in edgelist1516)
edgetuple1314 = tuple(tuple(x) for x in edgelist1314)
edgetuple1112 = tuple(tuple(x) for x in edgelist1112)
edgetuple0910 = tuple(tuple(x) for x in edgelist0910)

weightededges1718 = list(collections.Counter(edgetuple1718).items())
weightededges1516 = list(collections.Counter(edgetuple1516).items())
weightededges1314 = list(collections.Counter(edgetuple1314).items())
weightededges1112 = list(collections.Counter(edgetuple1112).items())
weightededges0910 = list(collections.Counter(edgetuple0910).items())


params1718 = list((x, y, int(v)) for (x,y), v in weightededges1718)
params1516 = list((x, y, int(v)) for (x,y), v in weightededges1516)
params1314 = list((x, y, int(v)) for (x,y), v in weightededges1314)
params1112 = list((x, y, int(v)) for (x,y), v in weightededges1112)
params0910 = list((x, y, int(v)) for (x,y), v in weightededges0910)

The final parameters variable looks like this:

In [None]:
params1112[:11]
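
Because the edgelist records each dyad in both orderings, 'Counter' stores the same count twice, once for (A, B) and once for (B, A). An undirected graph simply overwrites one with the other, so the result is unaffected, but the list can be halved beforehand. A toy sketch with placeholder names:

```python
from collections import Counter

# Toy edgelist in the same shape as above: both orderings of every dyad
directed = [('A', 'B'), ('B', 'A'), ('A', 'B'), ('B', 'A'), ('A', 'C'), ('C', 'A')]
counts = Counter(directed)

# Each ordering already carries the full count, so keep one ordering per dyad
undirected = {pair: w for pair, w in counts.items() if pair[0] < pair[1]}
params = [(a, b, w) for (a, b), w in undirected.items()]
print(params)
```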

## IV. Network Visualization and Statistics

Now that we have converted our co-sponsorship data into a weighted edgelist, we can produce network visualizations and calculate network statistics. The first step is to import the python network analysis library 'networkx'.

In [None]:
import networkx as nx

To produce network visualizations, the following steps must be followed:
1. Create empty network graphs.
2. Assign nodes and node names.
3. Assign weighted edges.
4. Remove isolated nodes (otherwise our networks will not visualize correctly).

In [None]:
# Step 1. Create empty network graphs
G1718 = nx.Graph()
G1516 = nx.Graph()
G1314 = nx.Graph()
G1112 = nx.Graph()
G0910 = nx.Graph()

# Step 2. Assign nodes and node names
node_names1718 = [n[0] for n in peoplelist1718]
node_names1516 = [n[0] for n in peoplelist1516]
node_names1314 = [n[0] for n in peoplelist1314]
node_names1112 = [n[0] for n in peoplelist1112]
node_names0910 = [n[0] for n in peoplelist0910]
G1718.add_nodes_from(node_names1718)
G1516.add_nodes_from(node_names1516)
G1314.add_nodes_from(node_names1314)
G1112.add_nodes_from(node_names1112)
G0910.add_nodes_from(node_names0910)

# Step 3. Assign weighted edges
G1718.add_weighted_edges_from(params1718)
G1516.add_weighted_edges_from(params1516)
G1314.add_weighted_edges_from(params1314)
G1112.add_weighted_edges_from(params1112)
G0910.add_weighted_edges_from(params0910)
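
Step 4 in the list above, removing isolated nodes, is not shown in the cell. A minimal sketch of how it could be done with 'networkx' on a toy graph follows; the same two lines would be repeated for each session's graph.

```python
import networkx as nx

# Toy graph: node 'D' is added but never receives an edge
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D'])
G.add_weighted_edges_from([('A', 'B', 3), ('B', 'C', 1)])

# Copy the isolates into a list first, because the graph cannot be
# modified while nx.isolates() is still iterating over it
isolated = list(nx.isolates(G))
G.remove_nodes_from(isolated)
print(isolated, sorted(G.nodes()))
```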

Next, the new network data variables can be summarized.

In [None]:
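# Creates network information variables
# (nx.info() is no longer available in networkx 3.0 and later, so the summary
# is built from the node and edge counts with a small helper)
def graphinfo(G):
    return 'Graph with {} nodes and {} edges'.format(G.number_of_nodes(), G.number_of_edges())
G1718info = graphinfo(G1718)
G1516info = graphinfo(G1516)
G1314info = graphinfo(G1314)
G1112info = graphinfo(G1112)
G0910info = graphinfo(G0910)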

In [None]:
G1718info

The network data can also be visualized. Visualization requires the 'pyplot' module of the 'matplotlib' library, so it is imported.

In [None]:
import matplotlib.pyplot as plt

The first plot is for the 2017-2018 legislative session:

In [None]:
# Splits edges into heavy and light ties by weight
elarge1718 = [(u, v) for (u, v, d) in G1718.edges(data=True) if d['weight'] > 300]
esmall1718 = [(u, v) for (u, v, d) in G1718.edges(data=True) if d['weight'] <= 300]

# Kamada Kawai is the kind of network visualization layout we will use, but there are many others
pos = nx.kamada_kawai_layout(G1718)

# The figure size is set before drawing so that it applies to this plot
plt.figure(figsize=(50, 50))

# Next the network is drawn by using the weights.
nx.draw_networkx_nodes(G1718, pos, node_size=4)
nx.draw_networkx_edges(G1718, pos, edgelist=elarge1718,width=.5)
nx.draw_networkx_edges(G1718, pos, edgelist=esmall1718,width=.1, alpha=.75, edge_color='b', style='dashed')
nx.draw_networkx_labels(G1718, pos, font_size=25, font_family='sans-serif')
plt.axis('off')
plt.show()

Second is the 2015-2016 session:

In [None]:
# Splits edges into heavy and light ties by weight
elarge1516 = [(u, v) for (u, v, d) in G1516.edges(data=True) if d['weight'] > 300]
esmall1516 = [(u, v) for (u, v, d) in G1516.edges(data=True) if d['weight'] <= 300]

# Kamada Kawai is the kind of network visualization layout we will use, but there are many others
pos = nx.kamada_kawai_layout(G1516)

# The figure size is set before drawing so that it applies to this plot
plt.figure(figsize=(50, 50))

# Next the network is drawn by using the weights.
nx.draw_networkx_nodes(G1516, pos, node_size=4)
nx.draw_networkx_edges(G1516, pos, edgelist=elarge1516,width=.5)
nx.draw_networkx_edges(G1516, pos, edgelist=esmall1516,width=.1, alpha=.75, edge_color='b', style='dashed')
nx.draw_networkx_labels(G1516, pos, font_size=25, font_family='sans-serif')
plt.axis('off')
plt.show()

Next is the 2013-2014 session. Note that for this session the weight threshold separating large from small edges has been lowered.

In [None]:
# Splits edges into heavy and light ties by weight
elarge1314 = [(u, v) for (u, v, d) in G1314.edges(data=True) if d['weight'] > 250]
esmall1314 = [(u, v) for (u, v, d) in G1314.edges(data=True) if d['weight'] <= 250]

# Kamada Kawai is the kind of network visualization layout we will use, but there are many others
pos = nx.kamada_kawai_layout(G1314)

# The figure size is set before drawing so that it applies to this plot
plt.figure(figsize=(50, 50))

# Next the network is drawn by using the weights.
nx.draw_networkx_nodes(G1314, pos, node_size=4)
nx.draw_networkx_edges(G1314, pos, edgelist=elarge1314,width=.5)
nx.draw_networkx_edges(G1314, pos, edgelist=esmall1314,width=.1, alpha=.75, edge_color='b', style='dashed')
nx.draw_networkx_labels(G1314, pos, font_size=25, font_family='sans-serif')
plt.axis('off')
plt.show()

The 2011-2012 session is next. Again, the weight threshold separating large from small edges is changed, this time to a value even lower than for 2013-2014.

In [None]:
# Splits edges into heavy and light ties by weight
elarge1112 = [(u, v) for (u, v, d) in G1112.edges(data=True) if d['weight'] > 200]
esmall1112 = [(u, v) for (u, v, d) in G1112.edges(data=True) if d['weight'] <= 200]

# Kamada Kawai is the kind of network visualization layout we will use, but there are many others
pos = nx.kamada_kawai_layout(G1112)

# The figure size is set before drawing so that it applies to this plot
plt.figure(figsize=(50, 50))

# Next the network is drawn by using the weights.
nx.draw_networkx_nodes(G1112, pos, node_size=4)
nx.draw_networkx_edges(G1112, pos, edgelist=elarge1112,width=.5)
nx.draw_networkx_edges(G1112, pos, edgelist=esmall1112,width=.1, alpha=.75, edge_color='b', style='dashed')
nx.draw_networkx_labels(G1112, pos, font_size=25, font_family='sans-serif')
plt.axis('off')
plt.show()

Finally, the 2009-2010 session is plotted with the same lowered weight threshold as 2011-2012.

In [None]:
# Splits edges into heavy and light ties by weight
elarge0910 = [(u, v) for (u, v, d) in G0910.edges(data=True) if d['weight'] > 200]
esmall0910 = [(u, v) for (u, v, d) in G0910.edges(data=True) if d['weight'] <= 200]

# Kamada Kawai is the kind of network visualization layout we will use, but there are many others
pos = nx.kamada_kawai_layout(G0910)

# The figure size is set before drawing so that it applies to this plot
plt.figure(figsize=(50, 50))

# Next the network is drawn by using the weights.
nx.draw_networkx_nodes(G0910, pos, node_size=4)
nx.draw_networkx_edges(G0910, pos, edgelist=elarge0910,width=.5)
nx.draw_networkx_edges(G0910, pos, edgelist=esmall0910,width=.1, alpha=.75, edge_color='b', style='dashed')
nx.draw_networkx_labels(G0910, pos, font_size=25, font_family='sans-serif')
plt.axis('off')
plt.show()
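
The thresholds separating large from small edges were chosen by hand for each session. One alternative, offered only as a sketch and not used for the plots above, is to derive the cut-off from each session's own weight distribution, for example its 90th percentile. The function name 'split_by_percentile' and the toy edges are hypothetical.

```python
import statistics

def split_by_percentile(weighted_edges, pct=90):
    """Split (u, v, weight) triples at the pct-th percentile of the weights."""
    weights = [w for _, _, w in weighted_edges]
    # quantiles(n=100) returns the 99 percentile cut points; 'inclusive'
    # keeps every cut point within the observed range of the data
    cut = statistics.quantiles(weights, n=100, method='inclusive')[pct - 1]
    large = [(u, v) for u, v, w in weighted_edges if w > cut]
    small = [(u, v) for u, v, w in weighted_edges if w <= cut]
    return large, small

# Toy edges with weights 1 through 10
toy_edges = [('n%d' % i, 'n%d' % (i + 1), i) for i in range(1, 11)]
large, small = split_by_percentile(toy_edges, pct=90)
print(len(large), len(small))
```

The two lists could then be passed as the 'edgelist' arguments of 'nx.draw_networkx_edges' in place of the hand-set ones.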

For statistics, there are two commonly used network measures built into 'networkx'. First, the eigenvector centrality of a node indicates how central a legislator is in terms of not only their own ties but also the centrality of their neighbors. Second, the clustering coefficient measures how often a legislator's co-sponsors also co-sponsor with each other; averaged over all nodes, it indicates how much the network works together as a single unit. Network transitivity is an alternative measure of overall clustering, but it does not take edge weights into account.
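
To make these measures concrete, here is a small sketch on a toy weighted graph with placeholder node names; the weights are invented for illustration and are not taken from the Legiscan data.

```python
import networkx as nx

# Toy co-sponsorship graph: A, B and C form a triangle, D is tied only to C
G = nx.Graph()
G.add_weighted_edges_from([('A', 'B', 5), ('A', 'C', 4), ('B', 'C', 3), ('C', 'D', 1)])

# Eigenvector centrality, using the co-sponsorship counts as edge weights
centrality = nx.eigenvector_centrality(G, weight='weight', max_iter=1000)
most_central = max(centrality, key=centrality.get)

# Weighted average clustering versus unweighted transitivity
avg_clustering = nx.average_clustering(G, weight='weight')
transitivity = nx.transitivity(G)

print(most_central, avg_clustering, transitivity)
```

In this toy graph D lowers both clustering measures, because its only neighbor's other ties do not connect back to it.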

The first lines below calculate eigenvector centrality and output a dictionary. The dictionary is then converted into a list of (centrality, name) pairs sorted from highest to lowest centrality. The values are small in part because 'networkx' scales the centrality vector to unit length across all legislators in the network.

In [None]:
# Calculates Eigenvector centrality.
G1718eigenvectorcentrality = nx.algorithms.eigenvector_centrality(G1718, weight='weight')
G1516eigenvectorcentrality = nx.algorithms.eigenvector_centrality(G1516, weight='weight')
G1314eigenvectorcentrality = nx.algorithms.eigenvector_centrality(G1314, weight='weight')
G1112eigenvectorcentrality = nx.algorithms.eigenvector_centrality(G1112, weight='weight')
G0910eigenvectorcentrality = nx.algorithms.eigenvector_centrality(G0910, weight='weight')

# Function that sorts dictionaries into (value, key) pairs, highest value first
def sorter(inputdict):
    newlist = []
    for key, value in list(inputdict.items()):
        newlist.append((value,key))
    newlist.sort(reverse=True)
    return newlist

# Lines that produce sorted Eigenvector centrality lists
G1718ECSorted = sorter(G1718eigenvectorcentrality)
G1516ECSorted = sorter(G1516eigenvectorcentrality)
G1314ECSorted = sorter(G1314eigenvectorcentrality)
G1112ECSorted = sorter(G1112eigenvectorcentrality)
G0910ECSorted = sorter(G0910eigenvectorcentrality)

In [None]:
print('2017-2018:', G1718ECSorted[:11],'\n\n' + '2015-2016:', G1516ECSorted[:11],'\n\n' +'2013-2014:', G1314ECSorted[:11],'\n\n' + '2011-2012:', G1112ECSorted[:11],'\n\n' + '2009-2010:', G0910ECSorted[:11])
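
To check which legislators recur among the most central across sessions, the sorted lists can be tallied. This is a sketch on toy data: the function name 'recurring_top' and the session keys are hypothetical, and the input is assumed to be lists of (centrality, name) pairs like those produced by 'sorter'.

```python
from collections import Counter

def recurring_top(sorted_lists, k=10):
    """Count in how many sessions each name appears among the top k."""
    appearances = Counter()
    for ranked in sorted_lists.values():
        # A set ensures each name is counted at most once per session
        appearances.update({name for _, name in ranked[:k]})
    return {name: n for name, n in appearances.items() if n > 1}

toy = {'s1': [(0.5, 'A'), (0.4, 'B'), (0.1, 'C')],
       's2': [(0.6, 'B'), (0.3, 'D'), (0.2, 'A')]}
repeat = recurring_top(toy, k=2)
print(repeat)
```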

The next statistic of interest is the average clustering coefficient of the network, taken over all nodes.

In [None]:
G1718clustering = nx.algorithms.cluster.average_clustering(G1718, weight='weight')
G1516clustering = nx.algorithms.cluster.average_clustering(G1516, weight='weight')
G1314clustering = nx.algorithms.cluster.average_clustering(G1314, weight='weight')
G1112clustering = nx.algorithms.cluster.average_clustering(G1112, weight='weight')
G0910clustering = nx.algorithms.cluster.average_clustering(G0910, weight='weight')

In [None]:
print('2017-2018:', G1718clustering,'\n' + '2015-2016:', G1516clustering, '\n' + '2013-2014:', G1314clustering, '\n' + '2011-2012:', G1112clustering, '\n' + '2009-2010:', G0910clustering)

Below, the average clustering coefficients are visualized as a line graph.

In [None]:
session = ['2009-2010', '2011-2012', '2013-2014', '2015-2016', '2017-2018']
avecluster = [G0910clustering, G1112clustering, G1314clustering, G1516clustering, G1718clustering]
# The figure size is set before plotting so that it applies to this plot
plt.figure(figsize=(10, 10))
plt.plot(session, avecluster, color='green')
plt.xlabel('Legislative Session')
plt.ylabel('Coefficient')
plt.title('Average Clustering Coefficient 2009-2018')
plt.show()

## V. Conclusions and Future Work

The visualizations and descriptive statistics provide informative findings. Surprisingly, we do not see distinct clustering among groups within the California legislature. Instead, in 2017-2018 and 2013-2014 we see a core-periphery structure, and the other legislative sessions are more dispersed, without any clear core or clusters. The lack of multiple clusters suggests two ideas worth investigating further. First, there is no distinct partisan clustering; although I did not associate partisan labels with nodes, it appears that legislators are sponsoring bills across the aisle. Second, the California legislature is bicameral, with an assembly and a senate, so co-sponsorship appears to be occurring across the houses of the legislature.

From the statistics, we can see that the 2015-2016 legislative session has the highest average clustering coefficient, meaning that its legislators worked together the most, on average, to co-sponsor bills. The 2009-2010 legislative session, on the other hand, has the lowest average clustering coefficient. From the eigenvector centrality, we can identify the most central legislators for each legislative session, and some legislators appear among the top ten most central legislators in more than one session.

Much of this project was spent converting the data into edgelists, and there is still a lot to be done to fulfill the overall objectives of the project. Next, I would like to incorporate the bill data to associate sponsorships with the bills themselves. I can then see whether sponsorship numbers, centrality, or clustering correlate with the passage of a bill. Furthermore, I would like to perform a text analysis of the bills themselves to see whether certain legislators are associated with different kinds of bills outside of the typical committee structure. I can also use the network data separated by bill topic to see which topics are polarized. The network data and statistics are only the first step in what can branch out into many different parts of a larger project on state legislation. For example, other states could be included to compare sponsorship networks across states.