# Using Graphs to Model Linked Data
#### &copy; Brian E. Chapman, PhD

In this module we will learn how to model graph data with [NetworkX](http://networkx.readthedocs.io/en/latest/).

In [None]:
%matplotlib inline

In [None]:
import os
DATADIR = os.path.join(os.path.expanduser('~'), "DATA")

import networkx as nx
import csv
import imaplib
import getpass
import email
from collections import defaultdict
from IPython.display import Image


# Graphs


* Graphs are a data representation consisting of **nodes** and **edges**
* Nodes are entities
* Edges are relationships
* Examples
    * Text:
        * Nodes are words in a sentence (e.g. findings, modifiers, conjunctions)
        * Edges are relationships between the words
    * Images:
        * Nodes are anatomic features (e.g. bifurcations)
        * Edges are adjacency/paths between features (e.g. vessels)
    * Social Networks
        * Nodes are people
        * Edges are relationships (e.g. friendship, coauthorship)
    * Physiology
        * Brain connectivity
        * Metabolic pathways
    * Ontologies
    
## Example Graphs
### Word Relationships
![word relationships](./Resources/case005.png)

### An *undirected* graph based on e-mails
![email graph](./Resources/mainMail0075.png)

### A *directed* graph from the human disease ontology
![example disease graph](./Resources/disease_graphs.png)
    
## Python Graph Packages

* [NetworkX:](http://networkx.github.io/) this is a very popular, easy-to-use package. Its advantage and disadvantage is that it is pure Python. Consequently, it is easy to use but relatively slow.
* [graph-tool:](https://graph-tool.skewed.de/) "Despite its nice, soft outer appearance of a regular python module, the core algorithms and data structures of graph-tool are written in C++, with performance in mind. Most of the time, you can expect the algorithms to run just as fast as if graph-tool were a pure C/C++ library."
* [python-igraph:](http://igraph.org/python/) "igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of use. igraph is open source and free. igraph can be programmed in R, Python and C/C++."

# [NetworkX](http://networkx.github.io/)
* Graphs (networkx.Graph())
    * Edges (relationships) have no directionality
* Directed Graphs (networkx.DiGraph())
    * Edges (relationships) have directionality
* MultiGraphs (networkx.MultiGraph(), networkx.MultiDiGraph() )
    * There can be multiple edges between nodes 
* Graphs, nodes, and edges can all have attributes (dictionaries)
    * Each node has a label
    * Each node also has a dictionary (possibly empty) of attributes
    * Each edge also has a label (the node labels defining the beginning and ending of the edge) 
    * Each edge also has a dictionary (possibly empty) of attributes
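
The graph types and attribute dictionaries listed above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the node labels `"a"` and `"b"` and the attribute names (`kind`, `role`, `weight`, `name`) are made up for the example, and the `nodes[...]`/`edges[...]` lookups assume NetworkX 2.0 or later.

```python
import networkx as nx

# Undirected graph: the edge (a, b) is the same edge as (b, a).
g = nx.Graph()
g.add_edge("a", "b")
undirected_has_reverse = g.has_edge("b", "a")   # True

# Directed graph: the edge a -> b does not imply b -> a.
d = nx.DiGraph()
d.add_edge("a", "b")
directed_has_reverse = d.has_edge("b", "a")     # False

# MultiGraph: several parallel edges may connect the same pair of nodes.
m = nx.MultiGraph()
m.add_edge("a", "b", kind="email")   # "kind" is a hypothetical attribute
m.add_edge("a", "b", kind="phone")
parallel_edges = m.number_of_edges("a", "b")    # 2

# Graphs, nodes, and edges each carry a (possibly empty) attribute dictionary.
d.graph["name"] = "toy example"      # graph-level attribute
d.add_node("a", role="sender")       # node attribute on an existing node
d.add_edge("a", "b", weight=3)       # edge attribute on an existing edge
node_attrs = d.nodes["a"]            # attribute dict of node "a"
edge_attrs = d.edges["a", "b"]       # attribute dict of edge a -> b
```

Calling `add_node` or `add_edge` again on something that already exists does not create a duplicate in a `Graph` or `DiGraph`; it only updates the attribute dictionary.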
    

## Creating graphs is a matter of adding nodes and edges

* Adding an edge automatically adds any of its endpoint nodes that are not already in the graph.

In [None]:
import networkx as nx

# Build a directed graph; each edge points from its first node to its second
informatics = nx.DiGraph()

# Nodes can be added explicitly...
informatics.add_node("Homer Warner")
informatics.add_node("Paul Clayton")

# ...or implicitly: add_edge creates any endpoint that does not exist yet
informatics.add_edge("Homer Warner", "Reed Gardner")
informatics.add_edge("Homer Warner", "Al Pryor")
informatics.add_edge("Al Pryor", "Dennis Parker")
informatics.add_edge("Dennis Parker", "Brian Chapman")
informatics.add_edge("Brian Chapman", "Holly Perry")
informatics.add_edge("Peter Haug", "Wendy Chapman")
informatics.add_edge("Wendy Chapman", "Jeannie Irwin")

# Draw the graph using a spring (force-directed) layout
nx.draw_spring(informatics, with_labels=True, alpha=0.3)
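
Once a graph is built, it can be inspected without drawing it. The sketch below rebuilds a small subset of the graph above (the variable name `toy` is ours, not from the notebook) and shows that nodes appear both from explicit `add_node` calls and as a side effect of `add_edge`.

```python
import networkx as nx

toy = nx.DiGraph()
toy.add_node("Paul Clayton")                  # added explicitly, has no edges
toy.add_edge("Homer Warner", "Al Pryor")      # both endpoints are created here
toy.add_edge("Al Pryor", "Dennis Parker")     # only "Dennis Parker" is new

node_count = toy.number_of_nodes()            # 4
edge_count = toy.number_of_edges()            # 2

# Membership test: "Al Pryor" was never passed to add_node
implicit = "Al Pryor" in toy                  # True

# In a DiGraph, successors follow outgoing edges, predecessors incoming ones
successors = list(toy.successors("Homer Warner"))
predecessors = list(toy.predecessors("Dennis Parker"))

# Nodes with no edges at all
isolated = list(nx.isolates(toy))

print(successors, predecessors, isolated)
```

Wrapping `successors` and `predecessors` in `list()` keeps the example working whether the installed NetworkX version returns a list or an iterator.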

## Some NetworkX Notebooks
* [From *Learning IPython for Interactive Computing and Data Visualization*](http://nbviewer.ipython.org/github/ipython-books/minibook-code/blob/master/chapter2/203-networkx.ipynb)
* [Twitter Data](http://nbviewer.ipython.org/gist/ellisonbg/3837783/TwitterNetworkX.ipynb)
* [NetworkX Basics](https://www.wakari.io/sharing/bundle/nvikram/Basics%20of%20Networkx?has_login=False)

## Further Reading
[Here is a brief course on graphs and Python](http://www.python-course.eu/graphs_python.php)


<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">University of Utah Data Science for Health</span> by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Brian E. Chapman</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.