##### Liam Byrne
##### DATA 620 - Web Analytics
##### Fall - 2017

# Week 6

***

## Introduction
A simple two-mode network, which records the attendance of 18 Southern Women at 14 social events, is used in this network analysis. To analyze it, the network must be transformed into a one-mode network containing either the relationships between women who attended the same events or the relationships between events attended by the same women. We accomplish this by representing the two-mode network as a bipartite graph and projecting it onto each set of nodes.

We are asked what we can infer about the relationships between **(1)** the women and **(2)** the social events. This is done by examining the closeness centrality and eigenvector centrality of the respective one-mode networks.
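Each one-mode network can also be read off the women-by-events incidence matrix directly. If ```B``` has a 1 wherever a woman attended an event, the off-diagonal entries of ```B @ B.T``` count the events two women share, and those of ```B.T @ B``` count the women two events share. A minimal NumPy sketch on a hypothetical 3-woman, 3-event matrix (not the Davis data):

```python
import numpy as np

# Hypothetical incidence matrix: rows = women (W1..W3), columns = events (E1..E3).
# B[i, j] = 1 if woman i attended event j.
B = np.array([
    [1, 1, 0],   # W1 attended E1, E2
    [1, 1, 1],   # W2 attended E1, E2, E3
    [0, 0, 1],   # W3 attended E3
])

# Women-by-women co-attendance counts; the diagonal holds each woman's event count.
women_counts = B @ B.T
# Event-by-event shared-attendee counts; the diagonal holds each event's size.
event_counts = B.T @ B

# Zero the diagonals so only pairwise ties (the projected edge weights) remain.
np.fill_diagonal(women_counts, 0)
np.fill_diagonal(event_counts, 0)

print(women_counts)  # W1-W2 share 2 events, W2-W3 share 1, W1-W3 share none
```

These counts are exactly the co-occurrence weights a weighted projection assigns to each pair.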

## Loading the Data
We first load the data and create a graph from the women-by-events (biadjacency) matrix obtained from [UCINET IV Datasets](http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm#davis), which was converted to a CSV file because the original is in UCINET DL format:

```python
import pandas as pd
import networkx as nx
from networkx.algorithms import bipartite
from IPython.display import display

# Source data saved as a CSV and pushed to the repo
url = "https://raw.githubusercontent.com/Liam-O/Data620/master/Wk6/davis.csv"

# Women-by-events attendance matrix stored as a pandas DataFrame
# (rows = women, columns = events, non-zero = attended)
davis = pd.read_csv(url, index_col=0)
davis_g = nx.Graph()
```
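As a sanity check, NetworkX ships its own copy of the Davis data as ```nx.davis_southern_women_graph()```. The sketch below only confirms the expected dimensions (18 women, 14 events, 89 attendances); it assumes, but does not verify, that this bundled copy matches the UCINET file loaded above. Note that the bundled copy labels women by full name rather than by upper-case first name.

```python
import networkx as nx

# NetworkX bundles the Davis Southern Women affiliation data as a bipartite
# graph, using the same node-attribute convention as this notebook:
# bipartite=0 for women, bipartite=1 for events.
G = nx.davis_southern_women_graph()

women = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
events = {n for n, d in G.nodes(data=True) if d["bipartite"] == 1}

# Each edge is one woman attending one event.
print(len(women), len(events), G.number_of_edges())  # 18 14 89
```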

## Methodology Used to Create One-Mode Projections
*NetworkX* has no dedicated two-mode graph class; a bipartite graph is represented as an ordinary ```Graph``` whose nodes carry a ```bipartite``` attribute. The following methodology was therefore used:
+ Create a bipartite graph using the node attribute ```bipartite = 0``` to mark the women nodes and ```bipartite = 1``` to mark the event nodes.
+ Using *NetworkX's* algorithm package, *bipartite*, create a node set for each one-mode network.
+ Using *NetworkX's* algorithm package, *bipartite*, create a weighted projected graph for each one-mode network, where the weights are co-occurrence counts (i.e. the number of events two women co-attended, or the number of women two events share).
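The projection step can be seen on a small scale with ```bipartite.weighted_projected_graph```. The graph below is a hypothetical toy affiliation network, not the Davis data; it only illustrates how the projected edge weight counts shared events.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy affiliation network (not the Davis data).
B = nx.Graph()
B.add_nodes_from(["Ann", "Bea", "Cat"], bipartite=0)   # women
B.add_nodes_from(["E1", "E2"], bipartite=1)            # events
B.add_edges_from([("Ann", "E1"), ("Ann", "E2"),
                  ("Bea", "E1"), ("Bea", "E2"),
                  ("Cat", "E2")])

# Project onto the women; each edge weight = number of shared events.
women = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
P = bipartite.weighted_projected_graph(B, women)

print(P["Ann"]["Bea"]["weight"])  # 2: both attended E1 and E2
print(P["Ann"]["Cat"]["weight"])  # 1: only E2 in common
```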

```python
# Create a bipartite graph with:
# Women --> bipartite = 0; Events --> bipartite = 1
davis_g.add_nodes_from(davis.index, bipartite=0)
davis_g.add_nodes_from(davis.columns, bipartite=1)

# Loop through the attendance matrix to build the (woman, event) edge list
davis_edges = list()
for i in davis:                                  # i: an event (column)
    for j in davis[i][davis[i] != 0].index:      # j: each woman who attended event i
        davis_edges.append((j, i))

# Insert edge list into graph
davis_g.add_edges_from(davis_edges)

# Create node sets for the women and event nodes
women_nodes, event_nodes = bipartite.sets(davis_g)

# Create projected, weighted one-mode networks for the women and events
women_projected = bipartite.weighted_projected_graph(davis_g, women_nodes)
event_projected = bipartite.weighted_projected_graph(davis_g, event_nodes)
```
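The nested loop above builds the edge list one event column at a time. A loop-free alternative, sketched here as an option rather than the method used in this notebook, finds the non-zero cells with ```numpy.nonzero```. The tiny ```davis``` frame below is hypothetical and stands in for the CSV; the result is the same set of (woman, event) pairs, in row-major rather than column-major order.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the women-by-events matrix read from davis.csv.
davis = pd.DataFrame(
    [[1, 0, 1],
     [0, 1, 1]],
    index=["ANN", "BEA"],
    columns=["E1", "E2", "E3"],
)

# Row and column positions of every non-zero (attended) cell.
rows, cols = np.nonzero(davis.to_numpy())

# Map positions back to labels to get (woman, event) edge tuples.
davis_edges = list(zip(davis.index[rows], davis.columns[cols]))

print(davis_edges)  # [('ANN', 'E1'), ('ANN', 'E3'), ('BEA', 'E2'), ('BEA', 'E3')]
```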

## Looking at Centrality
In each projected graph, two nodes share an edge if the pair of women attended the same event, or if the pair of events was attended by the same woman. Before computing the centrality measures, it helps to define what each one means for each mode.
### Centrality Measurements for Women
#### Closeness Centrality
Looking at the minimum social distance between the women indicates how close each woman could be to the others in the network. Closeness centrality is computed by taking the inverse of the sum of the shortest-path distances from a woman *u* to all other women *v*, multiplied by *n-1* (the smallest possible sum of distances in an unweighted, connected graph of *n* nodes). It therefore measures a woman's closeness to everyone else relative to the rest of the network: whoever has the largest closeness centrality can reach every other woman over the shortest total distance, and so could most quickly relay information through a connected network. We use the inverse of the edge weight as the distance, so that two women who co-attended more events are treated as being closer together.
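Using distance = 1/weight also explains why the closeness values reported below can exceed 1: pairs who shared several events sit less than one unit apart, so the sum of distances can be smaller than *n-1*. The sketch below uses a hypothetical three-woman projection and checks ```nx.closeness_centrality``` against the definition (n - 1) / (sum of shortest-path distances), computed with Dijkstra's algorithm.

```python
import networkx as nx

# Hypothetical toy projection: weight = number of shared events.
P = nx.Graph()
P.add_edge("A", "B", weight=2)
P.add_edge("B", "C", weight=1)

# Same transformation as in the notebook: distance = 1 / weight.
for u, v, d in P.edges(data=True):
    d["inverse_weight"] = 1.0 / d["weight"]

nx_closeness = nx.closeness_centrality(P, distance="inverse_weight")

# Manual check for a connected graph: (n - 1) / sum of weighted distances.
n = P.number_of_nodes()
manual = {}
for u in P:
    dist = nx.single_source_dijkstra_path_length(P, u, weight="inverse_weight")
    manual[u] = (n - 1) / sum(dist.values())

print(nx_closeness)  # A: 1.0, B: about 1.333, C: 0.8
```

Here B sits 0.5 from A and 1.0 from C, so its closeness is 2 / 1.5, which is greater than 1.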
#### Eigenvector Centrality
Women who are directly connected to other highly connected women have a degree of influence and, possibly, status within the social network. Each connection is weighted by the edge weight, i.e. the number of events the two women both attended, so sharing many events with well-connected women raises a woman's eigenvector centrality the most.
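Formally, a woman's eigenvector score is proportional to the weighted sum of her neighbours' scores, x_i = (1/λ) Σ_j w_ij x_j, where w_ij is the number of events women i and j shared and λ is the largest eigenvalue of the weighted adjacency matrix. The sketch below checks this on a hypothetical four-node graph and also shows that recent NetworkX releases ignore edge weights unless ```weight="weight"``` is passed (their default is ```weight=None```).

```python
import networkx as nx
import numpy as np

# Hypothetical weighted projection; weight = number of shared events.
P = nx.Graph()
P.add_weighted_edges_from([("A", "B", 3), ("A", "C", 1), ("B", "C", 1), ("C", "D", 1)])

weighted = nx.eigenvector_centrality(P, weight="weight", tol=1e-10, max_iter=1000)
unweighted = nx.eigenvector_centrality(P, weight=None, tol=1e-10, max_iter=1000)

# Cross-check: principal eigenvector of the weighted adjacency matrix,
# scaled to unit Euclidean norm as NetworkX does.
nodes = list(P)
W = nx.to_numpy_array(P, nodelist=nodes, weight="weight")
eigvals, eigvecs = np.linalg.eigh(W)   # eigenvalues in ascending order
x = np.abs(eigvecs[:, -1])             # eigenvector of the largest eigenvalue
x /= np.linalg.norm(x)

for node, value in zip(nodes, x):
    print(node, round(weighted[node], 4), round(value, 4), round(unweighted[node], 4))
```

With weights, the strongly tied pair A and B score highest; ignoring weights, C (the node with the most neighbours) scores highest instead.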

```python
# Dict comprehension: distance = inverse of the co-attendance weight
inverse_weight = {(u, v): 1.0 / d["weight"] for (u, v, d) in women_projected.edges(data=True)}
# Set inverse weight as an edge attribute to use as the distance
nx.set_edge_attributes(women_projected, values=inverse_weight, name="inverse_weight")
# Create a DataFrame to display the centrality measures
women_central = pd.DataFrame({
        "closeness_centrality": nx.closeness_centrality(women_projected, distance="inverse_weight"),
        "eigenvector_centrality": nx.eigenvector_centrality(women_projected, weight="weight")})
print("Women Centrality:")
display(women_central)
```

Women Centrality:

| Woman | closeness_centrality | eigenvector_centrality |
|---|---|---|
| BRENDA | 2.103093 | 0.284342 |
| CHARLOTTE | 1.672131 | 0.160557 |
| DOROTHY | 1.888889 | 0.206621 |
| ELEANOR | 1.906542 | 0.225475 |
| EVELYN | 2.241758 | 0.299136 |
| FLORA | 1.171644 | 0.077523 |
| FRANCES | 1.74359 | 0.203194 |
| HELEN | 2.079208 | 0.249641 |
| KATHERINE | 2.048193 | 0.242635 |
| LAURA | 2.081633 | 0.280929 |


The top three women for closeness centrality are:
1. THERESA: 2.62
2. SYLVIA: 2.47
3. NORA: 2.43

The top three women for eigenvector centrality are:
1. THERESA: 0.334
2. EVELYN: 0.299
3. SYLVIA: 0.291
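Rankings like these can be read straight off a centrality DataFrame. A minimal sketch with ```Series.nlargest```, using a hypothetical four-row table with invented values rather than the actual results:

```python
import pandas as pd

# Hypothetical centrality table shaped like women_central (values invented).
central = pd.DataFrame(
    {"closeness_centrality": [1.2, 2.5, 1.9, 2.1],
     "eigenvector_centrality": [0.10, 0.33, 0.25, 0.29]},
    index=["W1", "W2", "W3", "W4"],
)

# nlargest returns the n highest values of a column, sorted in descending order.
for column in central.columns:
    top3 = central[column].nlargest(3)
    print(column, top3.round(3).to_dict())
```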

From the centrality measures above, ```THERESA``` appears to have the shortest weighted distance to the other women and to hold the most influence.

### Centrality Measurements for Events
#### Closeness Centrality
An event with high closeness centrality shares attendees, directly or through a short chain of other events, with most of the other events in the network. It would be a good event to attend to meet people associated with many other events, increasing the variety of events one could be invited to.
#### Eigenvector Centrality
An event with a large eigenvector centrality is tied, through shared attendees, to other well-connected events; attending it could introduce a newcomer to the largest circle of women in the social network.

```python
# Dict comprehension: distance = inverse of the shared-attendee weight
inverse_weight = {(u, v): 1.0 / d["weight"] for (u, v, d) in event_projected.edges(data=True)}
# Set inverse weight as an edge attribute to use as the distance
nx.set_edge_attributes(event_projected, values=inverse_weight, name="inverse_weight")

event_central = pd.DataFrame({
        "closeness_centrality": nx.closeness_centrality(event_projected, distance="inverse_weight"),
        "eigenvector_centrality": nx.eigenvector_centrality(event_projected, weight="weight")})

print("Event Centrality:")
display(event_central)
```

Event Centrality:

| Event | closeness_centrality | eigenvector_centrality |
|---|---|---|
| E1 | 2.03744 | 0.15211 |
| E10 | 2.840052 | 0.228261 |
| E11 | 1.750561 | 0.112344 |
| E12 | 3.06914 | 0.25512 |
| E13 | 2.266501 | 0.177435 |
| E14 | 2.266501 | 0.177435 |
| E2 | 2.03744 | 0.161062 |
| E3 | 2.874693 | 0.251833 |
| E4 | 2.135036 | 0.184287 |
| E5 | 3.245492 | 0.304622 |


The top three events for closeness centrality are:
1. Event E8: 4.06
2. Event E9: 3.4
3. Event E7: 3.31

The top three events for eigenvector centrality are:
1. Event E8: 0.452
2. Event E7: 0.369
3. Event E9: 0.357

From the centrality measures above, ```Event E8``` appears to expose an attendee to the most individuals, making it the best place to expand one's social footprint.