# Network Analysis of RDF Graphs

This notebook provides basic facilities for running network analyses of RDF graphs in Python with [rdflib](https://github.com/RDFLib/rdflib) and [networkx](https://networkx.github.io/).

We do this in four steps:
- Load an arbitrary RDF graph into rdflib
- Get a subgraph of relevance
- Convert the rdflib Graph into a networkx Graph, as shown [here](https://github.com/RDFLib/rdflib/blob/master/rdflib/extras/external_graph_libs.py)
- Run networkx's algorithms on that data structure

In [27]:
# Install required packages in the current Jupyter kernel
import sys
!{sys.executable} -m pip install rdflib networkx matplotlib

Collecting rdflib
  Using cached https://files.pythonhosted.org/packages/3c/fe/630bacb652680f6d481b9febbb3e2c3869194a1a5fc3401a4a41195a2f8f/rdflib-4.2.2-py3-none-any.whl
Collecting networkx
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/1e/f8/4aba1144dad8c67db060049d1a8bc740ad9fa35288d21b82bb85de69ff15/matplotlib-3.0.1-cp36-cp36m-manylinux1_x86_64.whl (12.9MB)
Collecting pyparsing (from rdflib)
  Using cached https://files.pythonhosted.org/packages/71/e8/6777f6624681c8b9701a8a0a5654f3eb56919a01a78e12bf3c73f5a3c714/pyparsing-2.3.0-py2.py3-none-any.whl
Collecting isodate (from rdflib)
  Using cached https://files.pythonhosted.org/packages/9b/9f/b36f7774ff5ea8e428fdcfc4bb332c39ee5b9362ddd3d40d9516a55221b2/isodate-0.6.0-py2.py3-none-any.whl
Collectin

In [31]:
# Imports
from rdflib import Graph as RDFGraph
from rdflib.extras.external_graph_libs import rdflib_to_networkx_graph
import networkx as nx
import matplotlib.pyplot as plt

In [36]:
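# Load the Turtle file into an rdflib Graph
rg = RDFGraph()
rg.parse('ghostbusters-uri.ttl', format='turtle')
print("rdflib Graph loaded successfully with {} triples".format(len(rg)))

# Convert the whole graph (no subgraph selection here) into an undirected
# networkx Graph; len(G) is its number of nodes
G = rdflib_to_networkx_graph(rg)
print("networkx Graph loaded successfully with length {}".format(len(G)))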

rdflib Graph loaded successfully with 60462 triples
networkx Graph loaded successfully with length 9023


In [37]:
# Connected components, each returned as a set of nodes
list(nx.connected_components(G))

[{rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track02/event2141'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event2853'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track02/event1068'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track02/event1194'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track03/event0211'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track02/event1488'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event0328'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event2520'),
  rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track07/event0007'),
  rdflib.term.URIRef('http://purl.org/midi-ld/
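
Printing every component is hard to read for a graph with thousands of nodes. A more compact view is to report how many components there are and how large each one is. This is a minimal sketch on a small stand-in graph with made-up node names; in the notebook, `G` would be the converted RDF graph.

```python
import networkx as nx

# Small stand-in graph; node names are hypothetical
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("d", "e")])
G.add_node("f")  # an isolated node forms its own component

# Number of components and their sizes, largest first
n_components = nx.number_connected_components(G)
sizes = sorted((len(c) for c in nx.connected_components(G)), reverse=True)
print(n_components, sizes)

# Subgraph induced by the largest component, copied so it can be modified
largest = max(nx.connected_components(G), key=len)
G_largest = G.subgraph(largest).copy()
print(G_largest.number_of_nodes(), G_largest.number_of_edges())
```

Working on the largest component alone is often convenient for algorithms that assume a connected graph.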

In [38]:
# Clustering coefficient of every node, as a {node: coefficient} dict
nx.clustering(G)

{rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event1672'): 0,
 rdflib.term.Literal('115', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')): 0,
 rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track02/event1162'): 0,
 rdflib.term.Literal('0', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')): 0,
 rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event2256'): 0,
 rdflib.term.Literal('1', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')): 0,
 rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track01'): 0,
 rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track01/event0213'): 0,
 rdflib.term.URIRef('http://purl.org/midi-ld/piece/2eb43ce7edf27b505bcc0dfb6c283784/track09/event1024'): 0,
 rdflib.term.Literal('9', datatype=rdflib.term.URIR
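
A node has a clustering coefficient of 0 when no two of its neighbours are connected to each other, or when it has fewer than two neighbours. Rather than scanning the full dictionary, one option is to summarise it. The sketch below uses a small stand-in graph with made-up node names; in the notebook, `G` would be the converted RDF graph.

```python
import networkx as nx

# Stand-in graph: a triangle (a, b, c) with one extra node hanging off c
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

clustering = nx.clustering(G)

# Keep only nodes that take part in at least one triangle
nonzero = {node: coeff for node, coeff in clustering.items() if coeff > 0}

# Up to ten nodes with the highest coefficient
top = sorted(nonzero.items(), key=lambda item: item[1], reverse=True)[:10]

print(len(nonzero), "of", len(clustering), "nodes have non-zero clustering")
print("average clustering:", nx.average_clustering(G))
print(top)
```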

In [None]:
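# Side-by-side layout example, left commented out: the nlist ranges assume
# integer node labels 0-9, which this graph does not have.
# plt.subplot(121)
# nx.draw(G, with_labels=True, font_weight='bold')
# plt.subplot(122)
# nx.draw_shell(G, nlist=[range(5, 10), range(5)], with_labels=True, font_weight='bold')

# Draw the full graph with the default layout (slow for thousands of nodes)
plt.figure()
nx.draw(G)
plt.show()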
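
Drawing all nodes at once tends to produce a dense, unreadable plot and can take a long time. One possible alternative is to draw only the neighbourhood of a single node. This is a sketch under assumptions: the stand-in graph and its node names are made up, and picking the highest-degree node as the centre is just one heuristic.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Stand-in graph with hypothetical node names; in the notebook, G would be
# the converted RDF graph
G = nx.Graph()
G.add_edges_from([("hub", "n{}".format(i)) for i in range(5)])
G.add_edges_from([("n0", "far1"), ("far1", "far2")])

# Pick the highest-degree node and keep everything within one hop of it
hub = max(G.degree, key=lambda pair: pair[1])[0]
sample = nx.ego_graph(G, hub, radius=1)

# A fixed seed makes the layout reproducible between runs
pos = nx.spring_layout(sample, seed=42)
fig, ax = plt.subplots(figsize=(6, 6))
nx.draw(sample, pos=pos, ax=ax, node_size=50, with_labels=True, font_size=8)
```

In Jupyter the figure is rendered at the end of the cell; in a plain script, call `plt.show()` or `fig.savefig(...)` afterwards.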