# Graph Databases

## - Knowledge Graphs
## - Labeled Property Graphs

<img src="../img/graph_mails.png" width="400">

- Graph database: a system that represents and stores data as a graph and allows the execution of semantic queries that retrieve related data directly.
- Vertices = nodes
- Edges = relationships
- Part of the NoSQL database family
- Enables further analyses such as community detection, pattern recognition, and centrality measures
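The idea of "directly retrieving related data" can be illustrated with a plain adjacency list: related nodes are found by following stored links instead of joining tables. This is a minimal sketch on a hypothetical e-mail graph (the names and edges are made up), showing the data model only, not how Neo4j or any other database stores graphs internally.

```python
from collections import defaultdict

# Hypothetical e-mail graph: each pair means "sender wrote to recipient"
emails = [("ana", "ben"), ("ana", "carl"), ("ben", "dora"),
          ("carl", "dora"), ("dora", "eve")]

# Adjacency list: every node maps straight to the nodes it points to
adjacency = defaultdict(set)
for sender, recipient in emails:
    adjacency[sender].add(recipient)


def neighbors(node):
    """Nodes one hop away: a single dictionary lookup, no join required."""
    return adjacency.get(node, set())


def two_hops(node):
    """Nodes at the end of a two-hop path, excluding the start node."""
    return {m for n in neighbors(node) for m in neighbors(n)} - {node}


print(sorted(neighbors("ana")))  # ['ben', 'carl']
print(sorted(two_hops("ana")))   # ['dora']
```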


<img src="../img/trend_graphs.png" width="1000">

###### https://db-engines.com/en/ranking_trend/graph+dbms

## Neo4j

- Provides an ACID-compliant transactional backend for applications.
- Publicly available since 2007; source code written in Java and Scala.
- The most popular graph database according to the DB-Engines ranking.
- Uses the Cypher query language.
- Largest ecosystem among graph databases.


### LPG Characteristics

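- Nodes are labeled, relationships are typed, and both can store properties (key-value pairs).
- Nodes can carry one or more labels.
- Relationships always have exactly one type (name) and a direction, from a start node to an end node.
- The graph model should read like natural language, e.g. `(:Person)-[:KNOWS]->(:Person)` reads as "a person knows a person".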
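The LPG rules above can be mirrored in a small data structure: a node holds a set of labels plus a property map, and a relationship holds exactly one type, a start node, an end node, and its own property map. This is a hypothetical sketch for illustration only (the `Node`/`Relationship` classes, the `Person` label and the weight value are made up); it is not py2neo's or Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                           # one or more labels
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str                             # exactly one type
    start: Node                           # direction runs start -> end
    end: Node
    properties: dict = field(default_factory=dict)

    def as_pattern(self) -> str:
        """Readable ASCII-art form, loosely modeled on Cypher patterns."""
        start = self.start.properties.get("name", "?")
        end = self.end.properties.get("name", "?")
        return f"({start})-[:{self.type}]->({end})"


# Hypothetical data: Arya carries two labels, the relationship has a weight
arya = Node({"Character", "Person"}, {"name": "Arya-Stark"})
catelyn = Node({"Character"}, {"name": "Catelyn-Stark"})
rel = Relationship("INTERACTS", arya, catelyn, {"weight": 25})
print(rel.as_pattern())  # (Arya-Stark)-[:INTERACTS]->(Catelyn-Stark)
```

Reading the printed pattern left to right ("Arya-Stark interacts with Catelyn-Stark") is the natural-language property the last bullet asks for.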

<img src="../img/lpg.png" width="1000">

<img src="../img/syntax.png" width="1000">

### Data Representation

<img src="../img/data_rep.png" width="950">

<img src="../img/na7.png" width="600">
<img src="../img/na8.png" width="600">

## Graph Query Language

- Retrieving data from a graph database requires a query language.
- No single language has been universally adopted the way SQL was for relational databases.
- Prominent candidates include Gremlin (Apache TinkerPop), SPARQL (W3C, for RDF graphs), and Cypher.

### Cypher (.cql, .cyp, .cypher)
<img src="../img/na10.png" width="1000">

# Neo4j Sandbox

A free hosted Neo4j instance can be created at neo4j.com/sandbox-v2.

# py2neo

`py2neo` is a community-maintained Python client library for Neo4j (the official driver is the `neo4j` package). It offers a fully featured interface for working with your data in Neo4j. Install it with:

`!pip install py2neo`

Connect to Neo4j with the `Graph` class.

```python
from py2neo import Graph
import pandas as pd
```

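```python
# Connection details of one specific sandbox instance; replace them with
# the host, port and credentials shown for your own instance
graph = Graph(host="18.234.106.185",
              password="november-uncertainties-defeat",
              port=37113,
              scheme="http",
              user="neo4j")
```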

```python
graph
```

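```python
def query2table(graph, query):
    """Run a Cypher query and return the result records as a DataFrame."""
    # .data() returns one dict per record, keyed by the RETURN aliases
    return pd.DataFrame(graph.run(query).data())
```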

```python
# Out-degree statistics: how many outgoing relationships each character has
query_1 = """
MATCH (c:Character)-->()
WITH c, count(*) AS num
RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_characters, stdev(num) AS stdev
"""
```

```python
query2table(graph, query_1)
```
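To make the two aggregation steps of `query_1` explicit, here is a pure-Python sketch on a hypothetical edge list (not the real dataset). `WITH c, count(*)` yields one count per source node, and the `RETURN` aggregates those counts; characters without outgoing relationships never match `(c)-->()`, so they are excluded rather than counted as 0. Cypher's `stDev()` uses the sample (N - 1) formula, which matches `statistics.stdev`.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical directed relationships: (source character, target character)
sample_edges = [("A", "B"), ("A", "C"), ("A", "D"),
                ("B", "C"), ("C", "A"), ("C", "D")]

# WITH c, count(*) AS num: outgoing relationships per source node.
# "D" has no outgoing edge, so it never appears here.
out_degree = Counter(source for source, _ in sample_edges)
nums = list(out_degree.values())

# RETURN min/max/avg/stdev over those per-node counts
summary = {
    "min": min(nums),
    "max": max(nums),
    "avg_characters": mean(nums),
    "stdev": stdev(nums),  # sample standard deviation, like Cypher's stDev()
}
print(summary)
```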

```python
# Pairs of characters whose relationship weight is above 20
query_2 = """
MATCH (c:Character)-[r]->(s:Character)
WHERE r.weight > 20
RETURN c.name AS source, s.name AS target
"""
```

```python
query2table(graph, query_2).head()
```
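Two details of `WHERE r.weight > 20` are easy to miss: the comparison is strict, and a relationship without a `weight` property yields `null`, which `WHERE` treats as not matching. A small sketch of the same filter on hypothetical property maps (the helper `heavier_than` is made up for illustration):

```python
# Hypothetical relationships, each with its properties as a dict
rels = [
    {"source": "A", "target": "B", "weight": 31},
    {"source": "A", "target": "C", "weight": 5},
    {"source": "B", "target": "C", "weight": 20},
    {"source": "C", "target": "D"},          # no weight property at all
]


def heavier_than(rel, threshold=20):
    """Mimic WHERE r.weight > threshold: strict '>', and a missing
    property (null in Cypher) never passes the filter."""
    weight = rel.get("weight")
    return weight is not None and weight > threshold


strong = [(r["source"], r["target"]) for r in rels if heavier_than(r)]
print(strong)  # [('A', 'B')]
```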

```python
# Per book: statistics of how many outgoing relationships each character has
query_3 = """
MATCH (c:Character)-[r]->()
WITH r.book AS book, c, count(*) AS num
RETURN book, min(num) AS min, max(num) AS max, avg(num) AS avg_characters, stdev(num) AS stdev
ORDER BY book
"""
```

```python
query2table(graph, query_3).head()
```
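`query_3` groups twice: `WITH r.book AS book, c, count(*)` counts relationships per (book, character) pair, and the `RETURN` then aggregates those counts per book. A stdlib sketch of that two-level grouping on hypothetical data (the edge list and book numbers are made up; `stdev` is left out to keep it short):

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical relationships: (source character, target character, book)
book_edges = [
    ("A", "B", 1), ("A", "C", 1), ("B", "C", 1),
    ("A", "C", 2), ("B", "D", 2), ("C", "A", 2), ("C", "D", 2),
]

# WITH r.book AS book, c, count(*) AS num: one count per (book, character)
num_per_pair = Counter((book, source) for source, _, book in book_edges)

# RETURN book, min(num), max(num), avg(num): aggregate again, per book
nums_by_book = defaultdict(list)
for (book, _), count in num_per_pair.items():
    nums_by_book[book].append(count)

per_book = {
    book: {"min": min(nums), "max": max(nums), "avg_characters": mean(nums)}
    for book, nums in sorted(nums_by_book.items())  # ORDER BY book
}
print(per_book)
```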

```python
# Top 50 characters by degree: relationships counted in either direction
query_4 = """
MATCH (c:Character)-[]-()
RETURN c.name AS character, count(*) AS degree ORDER BY degree DESC LIMIT 50
"""
```

```python
query2table(graph, query_4).head()
```
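Because `(c:Character)-[]-()` has no arrow, every relationship touching a character counts toward its degree, whichever way it points. A sketch of that counting on a hypothetical edge list; note that the two relationships between A and C (one in each direction) both count:

```python
from collections import Counter

# Hypothetical directed relationships between characters
directed = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]

# Undirected pattern: each relationship adds 1 to both of its endpoints
degree = Counter()
for source, target in directed:
    degree[source] += 1
    degree[target] += 1

# ORDER BY degree DESC LIMIT 3
print(degree.most_common(3))  # [('C', 4), ('A', 3), ('B', 2)]
```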

## jgraph

```python
import jgraph as jg
```

```python
def query2tuples(graph, query):
    """Run a Cypher query and return each result record as a tuple."""
    return [tuple(record) for record in graph.run(query)]
```

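```python
# `:INTERACTS1` filters on the relationship type; without the colon,
# INTERACTS1 would only be a variable name matching any relationship
query = """MATCH (n)-[:INTERACTS1]->(m) RETURN n.name, m.name LIMIT 50"""
```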

```python
query_tuples = query2tuples(graph, query)
```

```python
jg.draw(query_tuples, directed=False, shader="lambert",
        default_node_color=0x383294, z=200, size=(800, 600))
```

```python
# Generate the node/edge data once so node attributes can be edited before drawing
generated = jg.generate(query_tuples)
```

```python
# Characters to highlight in the drawing
colors = ['Arianne-Martell', 'Aegon-V-Targaryen', 'Catelyn-Stark', 'Arya-Stark']
```

```python
# Light red for the highlighted characters, blue for everyone else
for name, node in generated['nodes'].items():
    node['color'] = 0xffaaaa if name in colors else 0x2222ff
```

```python
jg.draw(generated, directed=False,
        shader="lambert", default_node_color=0x383294, z=200, size=(800, 600))
```

### Integration with NetworkX

```python
import networkx as nx
import matplotlib.pyplot as plt
```

```python
# Build an undirected NetworkX graph from the (source, target) pairs
g = nx.Graph()
g.add_edges_from(query_tuples)
```
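`nx.Graph` is undirected and keeps at most one edge per pair of nodes, so building `g` from directed query results can merge relationships: a reversed pair or a repeated pair ends up as the same edge. The stdlib sketch below models that effect on hypothetical pairs (it does not show how NetworkX stores edges internally). This is why degrees computed on `g` can be lower than the Cypher degree from `query_4`; `nx.MultiDiGraph` would keep every relationship instead.

```python
# Hypothetical (source, target) pairs, as query2tuples might return them
pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]

# An undirected simple graph keeps one edge per unordered pair of nodes,
# so the reversed pair and the repeated pair collapse into the same edge
undirected_edges = {frozenset(pair) for pair in pairs}
print(len(pairs), len(undirected_edges))  # 4 2
```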

```python
def measure2table(measure):
    """Turn a {node: score} dict into a DataFrame sorted by descending score."""
    table = pd.DataFrame.from_dict(measure, orient='index').reset_index()
    table.columns = ['nodes', 'score']
    return table.sort_values('score', ascending=False)
```

```python
dc = nx.degree_centrality(g)
```

```python
centrality = measure2table(dc)
```

```python
centrality.head()
```
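Degree centrality as computed by `nx.degree_centrality` is a node's degree divided by `n - 1`, the largest degree possible in a simple graph with `n` nodes, so a node linked to every other node scores 1.0. A stdlib re-implementation on hypothetical edges, assuming an undirected graph without self-loops (the helper name `degree_centrality_sketch` is made up):

```python
from collections import defaultdict


def degree_centrality_sketch(edge_pairs):
    """Degree / (n - 1) per node; assumes no self-loops."""
    # Undirected simple graph: store distinct neighbours for each node
    neighbours = defaultdict(set)
    for u, v in edge_pairs:
        neighbours[u].add(v)
        neighbours[v].add(u)

    n = len(neighbours)
    if n <= 1:
        # With at most one node there is nothing to normalise by
        return {node: 1 for node in neighbours}
    scale = 1 / (n - 1)
    return {node: len(nbrs) * scale for node, nbrs in neighbours.items()}


# Hypothetical edges: "A" is connected to every other node
scores = degree_centrality_sketch([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(scores)
```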