## IS 620 Week 3 Assignment (Parts 1-4)
### Brian Chu || Sep. 20, 2015

**This dataset describes the food-web ecosystem of the Florida Bay in the dry season.**    
  
Source: http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm  
Paper: http://www.cbl.umces.edu/~atlss/FBay701.html

**If graphs are not visible, please view <a href="http://nbviewer.ipython.org/github/bchugit/IS620_WebAnalytics/blob/master/Week%203%20-%20Graph%20Theory%20and%20Definitions/bchu_wk3_graphviz_full.ipynb">here</a>. The <a href="http://blog.jupyter.org/2015/05/07/rendering-notebooks-on-github/">GitHub-Jupyter</a> rendering doesn't seem to display GraphLab Canvas output properly.**

In [1]:
import networkx as nx
import matplotlib.pyplot as plt

In [2]:
eco = nx.read_weighted_edgelist("eco-foodweb-baydry.edges", comments='%', encoding='utf-8')
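
A food web is directed (who eats whom), but `read_weighted_edgelist` builds an undirected `Graph` unless told otherwise. The sketch below illustrates the difference on a made-up three-row edge list; the rows, weights, and the `weight_spec` name are assumptions for illustration, and `nx.parse_edgelist` is used only so the example needs no file on disk.

```python
import networkx as nx

# Hypothetical edge list in "source target weight" layout,
# with a '%' comment line like the dataset's header.
lines = [
    "% toy food web (made-up values)",
    "1 2 0.5",
    "2 1 0.25",
    "2 3 1.0",
]

# Same edge-attribute layout that read_weighted_edgelist uses.
weight_spec = (("weight", float),)

# Default: an undirected Graph, so 1->2 and 2->1 collapse into one edge.
undirected = nx.parse_edgelist(lines, comments="%", data=weight_spec)

# Passing a DiGraph instance keeps each direction as its own edge.
directed = nx.parse_edgelist(
    lines, comments="%", create_using=nx.DiGraph(), data=weight_spec
)

print(undirected.number_of_edges())  # 2
print(directed.number_of_edges())    # 3
```

If direction matters for the analysis, passing `create_using=nx.DiGraph()` to `read_weighted_edgelist` should preserve it. Merged reciprocal edges may also be why an undirected edge count comes out lower than the number of rows in the file, though that has not been verified for this dataset.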

**Provide basic description and metrics about the dataset**  

In [3]:
print "nodes: %d"  %len(eco)
print "edges: %d"  %nx.number_of_edges(eco)

nodes: 128
edges: 2106


In [4]:
print "min degrees: %s" %min(eco.degree())
print "max degrees: %s" %max(eco.degree())

min degrees: 1
max degrees: 99


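**Note that in this NetworkX version `eco.degree()` returns a dict keyed by node, so `min()` and `max()` above compare the keys: `1` and `99` are node labels (the smallest and largest when compared as strings), not degrees. Use `eco.degree().values()` to get the actual degree range before drawing conclusions about centrality.**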
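
Applied to a dict, `min()` and `max()` compare keys rather than values. Here is a minimal sketch of reading the degree range from the values instead, on a toy graph with string labels (the graph and its labels are made up for illustration). `dict(G.degree())` is used so the same lines should work whether `degree()` returns a dict, as in NetworkX 1.x, or a view, as in later releases.

```python
import networkx as nx

# Toy star graph (hypothetical, not the food-web data):
# hub "9" connects to "1", "10" and "2".
G = nx.Graph()
G.add_edges_from([("9", "1"), ("9", "10"), ("9", "2")])

# {node: degree} regardless of NetworkX version.
degrees = dict(G.degree())

# Comparing the dict itself compares keys, i.e. node labels as strings.
print(min(degrees), max(degrees))

# Comparing the values gives the actual degree range.
print(min(degrees.values()), max(degrees.values()))

# The node that has the largest degree.
hub = max(degrees, key=degrees.get)
print(hub)
```

Here the first `print` shows the labels `1` and `9`, while the second shows the degrees 1 and 3.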

**The eccentricity of a node is the greatest shortest-path distance from that node to any other node. This is just a snippet of the dataset's eccentricity values.**  

In [5]:
print nx.eccentricity(eco).items()[:10]

[(u'41', 3), (u'24', 3), (u'25', 3), (u'26', 3), (u'27', 3), (u'20', 3), (u'21', 2), (u'22', 3), (u'23', 3), (u'28', 3)]


**Diameter is the maximum eccentricity over all nodes in the graph.   
Radius is the minimum eccentricity over all nodes.**

In [6]:
print "diameter: %d"  %nx.diameter(eco)
print "radius: %d"  %nx.radius(eco)

diameter: 3
radius: 2


**A diameter of 3 means every pair of nodes is at most three hops apart, and a radius of 2 means some nodes reach every other node within two hops, so the network is tightly connected.**
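
To make the definitions of eccentricity, diameter and radius concrete, here is a small pure-Python sketch that computes them by breadth-first search. It assumes a connected, undirected graph and counts hops, ignoring edge weights; the `eccentricities` helper and the four-node path are hypothetical and are not how NetworkX implements these functions.

```python
from collections import deque


def eccentricities(adj):
    """Return {node: eccentricity} for a connected, undirected graph
    given as {node: set_of_neighbors}. Distances are hop counts."""
    result = {}
    for source in adj:
        # Breadth-first search from source records shortest hop distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        # Eccentricity: distance to the farthest node.
        result[source] = max(dist.values())
    return result


# Hypothetical path graph a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

ecc = eccentricities(adj)
diameter = max(ecc.values())  # largest eccentricity
radius = min(ecc.values())    # smallest eccentricity

print(diameter, radius)  # 3 2
```

On this path the end nodes have eccentricity 3 and the inner nodes 2, giving the same diameter and radius pair as the food web.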

In [8]:
import graphlab as gl
gl.canvas.set_target('ipynb')

In [10]:
geco = gl.SFrame.read_csv("eco-foodweb-baydry.csv", column_type_hints={'orig':str, 'dest':str, 'wt':float})

PROGRESS: Finished parsing file /Users/bc/Github/IS620_WebAnalytics/Week 3 - Graph Theory and Definitions/eco-foodweb-baydry.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.025068 secs.
PROGRESS: Finished parsing file /Users/bc/Github/IS620_WebAnalytics/Week 3 - Graph Theory and Definitions/eco-foodweb-baydry.csv
PROGRESS: Parsing completed. Parsed 2137 lines in 0.021741 secs.


In [11]:
g = gl.SGraph()
g = g.add_edges(geco, 'orig', 'dest')

In [12]:
g.show()

**Since GraphLab Canvas cannot display the full graph, we display the neighborhood of a small, arbitrarily chosen set of nodes instead.**

In [13]:
subset_ids = ['1','2','4','8','16','32','64']
subgraph = g.get_neighborhood(ids=subset_ids, radius=1, full_subgraph=True)
subgraph.show(vlabel='id', highlight=subset_ids)

**Indeed there is a closely connected network. There is also a noticeable outlier in Node 13, which I'm not sure what to do with at this point.**

**We can use a function to identify all center points and their graph location.  
NetworkX defines a center point as the node(s) with eccentricity equal to radius.**

In [14]:
center_points = nx.center(eco)
print center_points

[u'21', u'128', u'55', u'57', u'88', u'85', u'69', u'11', u'10', u'12', u'15', u'14', u'17', u'16', u'18', u'72', u'71', u'78']
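
The center definition can be checked on a small example. The sketch below uses a four-node path graph (a toy stand-in, not the food web) and compares `nx.center` with the nodes whose eccentricity equals the radius; `by_definition` is a name introduced here for illustration.

```python
import networkx as nx

# Toy path graph 0 - 1 - 2 - 3.
G = nx.path_graph(4)

ecc = nx.eccentricity(G)  # {node: eccentricity}
radius = nx.radius(G)

# Nodes whose eccentricity equals the radius.
by_definition = sorted(n for n in G.nodes() if ecc[n] == radius)

print(by_definition)         # [1, 2]
print(sorted(nx.center(G)))  # [1, 2]
```

A graph can have several center nodes, which is consistent with the 18 returned for this dataset.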


In [15]:
subgraph_center = g.get_neighborhood(ids=center_points, radius=1, full_subgraph=False)
subgraph_center.show(highlight=center_points, vlabel='id', vlabel_hover=True)

*Note: Not all nodes and edges shown due to size limitation.*

**A basic summary of the dataset also helps identify high frequency nodes.**

In [16]:
geco.show()
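
The same frequency information can be tallied without the Canvas summary. This is a rough sketch using `collections.Counter` on a handful of made-up `(orig, dest, wt)` rows; the rows are hypothetical, and only the column names come from the CSV above.

```python
from collections import Counter

# Hypothetical (orig, dest, wt) rows, not taken from the dataset.
rows = [
    ("85", "3", 0.1),
    ("85", "4", 0.2),
    ("3", "57", 0.3),
    ("4", "57", 0.1),
    ("85", "57", 0.5),
]

# How often each node appears as an origin and as a destination.
orig_counts = Counter(orig for orig, dest, wt in rows)
dest_counts = Counter(dest for orig, dest, wt in rows)

print(orig_counts.most_common(1))  # [('85', 3)]
print(dest_counts.most_common(1))  # [('57', 3)]
```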

**Node 85 seems to be a central origin point. Let's look at the connections originating from this vertex.**

In [17]:
subgraph85 = g.get_neighborhood(ids=['85'], radius=1, full_subgraph=True)
subgraph85.show(highlight=['85'], vlabel='id', vlabel_hover=True)

**Similarly, Node 57 seems to be a central destination point. In fact, so central that its full network cannot be displayed.**

In [18]:
subgraph57 = g.get_neighborhood(ids=['57'], radius=1, full_subgraph=True)
subgraph57.show(highlight=['57'], vlabel='id', vlabel_hover=True)

**Interestingly, there is not much intersection between the edges of Nodes 85 and 57.**

In [19]:
subgraph_center = g.get_neighborhood(ids=['57', '85'], radius=1, full_subgraph=False)
subgraph_center.show(highlight=['57', '85'], vlabel='id', arrows=True)
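
The overlap between the two neighborhoods can also be quantified rather than judged by eye. Below is a minimal sketch on a toy directed graph, assuming the data were loaded as a NetworkX `DiGraph`; the edges and the single-letter node names are invented for illustration.

```python
import networkx as nx

# Toy directed graph: "85" feeds three nodes, three nodes feed "57".
D = nx.DiGraph()
D.add_edges_from([
    ("85", "a"), ("85", "b"), ("85", "c"),
    ("c", "57"), ("d", "57"), ("e", "57"),
])

# Nodes reached directly from 85, and nodes pointing directly at 57.
targets_of_85 = set(D.successors("85"))
sources_of_57 = set(D.predecessors("57"))

# Shared nodes, plus a Jaccard ratio as a rough overlap score.
shared = targets_of_85 & sources_of_57
jaccard = len(shared) / float(len(targets_of_85 | sources_of_57))

print(sorted(shared), jaccard)
```

A ratio near 0 would support the visual impression of little intersection, while a ratio near 1 would contradict it.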

**Let's take another look at the outlier Node 13**

In [28]:
subgraph_center = g.get_neighborhood(ids=['13'], radius=1, full_subgraph=True)
subgraph_center.show(highlight=['13'], vlabel='id', arrows=True)

**Questions to consider at this point:**  
* Why do nodes that originate from 85 not connect to 57?
* Why do nodes that connect to 57 not originate from 85?
* Based on the patterns of nodes 57 and 85, what is the significance of nodes that are both origin and destination points?
* Is 13 an important outlier in this system? 
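
One possible starting point for these questions is to split nodes by in-degree and out-degree. The sketch below does this on a toy directed graph, again assuming a NetworkX `DiGraph`; the node names and the three category lists are hypothetical.

```python
import networkx as nx

# Toy directed graph (made-up edges).
D = nx.DiGraph()
D.add_edges_from([("s", "m"), ("m", "t"), ("s", "t"), ("x", "t")])

# dict(...) works whether the degree methods return a dict or a view.
in_deg = dict(D.in_degree())
out_deg = dict(D.out_degree())

# Nodes that only originate edges, only receive edges, or do both.
only_origin = sorted(n for n in D if in_deg[n] == 0 and out_deg[n] > 0)
only_dest = sorted(n for n in D if out_deg[n] == 0 and in_deg[n] > 0)
both = sorted(n for n in D if in_deg[n] > 0 and out_deg[n] > 0)

print(only_origin, only_dest, both)
```

Looking up `in_deg['13']` and `out_deg['13']` on the real graph would show whether Node 13 is weakly connected in both directions or only one.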