# Hands-On

<img src="images/neo4j-logo.svg" width=100px height=100px /> 

In [41]:
!pip install -q py2neo pandas matplotlib scikit-learn

In [21]:
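from py2neo import Graph

# Connect to the local Neo4j instance over Bolt (port and credentials are specific to this setup)
graph = Graph("bolt://localhost:11003", auth=("neo4j", "got"))

import pandas as pd

# Show query results in full instead of truncating rows and columns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)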

---

## Neo4j & Graph Data Science Library & Game of Thrones

The Game of Thrones network is a **monopartite** graph containing character nodes and their interactions in the TV series.
**Interactions between characters are grouped by season.**


For example, an **INTERACTS_SEASON1** relationship represents an interaction between two characters in the first season,
**INTERACTS_SEASON2** an interaction in the second season, and so on.  
The relationship weight represents the strength of the interaction and, because two characters can interact in more than one season, we are dealing with a **weighted multigraph.**
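
To make the "weighted multigraph" idea concrete, here is a minimal pure-Python sketch that keeps one weighted edge per season between the same pair of characters and then collapses them into a single total. Only the Ned/Robert season 1 weight (192) comes from a query result in this notebook; the Jon/Sam rows are made-up values for illustration.

```python
from collections import defaultdict

# One entry per relationship: (source, target, season, weight).
interactions = [
    ("Ned", "Robert", 1, 192),  # weight taken from the season 1 query in this notebook
    ("Jon", "Sam", 1, 10),      # hypothetical weight
    ("Jon", "Sam", 2, 25),      # hypothetical: same pair, another season
]


def relationship_type(season):
    """Relationship type name used for a given season."""
    return f"INTERACTS_SEASON{season}"


# Multigraph view: parallel edges between the same pair stay separate, keyed by type.
parallel_edges = defaultdict(dict)
for source, target, season, weight in interactions:
    parallel_edges[(source, target)][relationship_type(season)] = weight

# Collapsed view: a single weighted edge per pair, summing over seasons.
total_weight = {pair: sum(by_type.values()) for pair, by_type in parallel_edges.items()}

print(dict(parallel_edges[("Jon", "Sam")]))
print(total_weight[("Jon", "Sam")])
```

The algorithms later in this notebook project only `INTERACTS_SEASON1`, so they work on a single slice of the multigraph rather than the collapsed view.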

<img src="images/got.png" width=1200px height=1200px /> 


---

### Creating the database

``` cypher
CREATE CONSTRAINT ON (c:Character) ASSERT c.id IS UNIQUE;

UNWIND range(1,7) AS season
LOAD CSV WITH HEADERS FROM "https://github.com/neo4j-apps/neuler/raw/master/sample-data/got/got-s" + season + "-nodes.csv" AS row
MERGE (c:Character {id: row.Id})
ON CREATE SET c.name = row.Label;

UNWIND range(1,7) AS season
LOAD CSV WITH HEADERS FROM "https://github.com/neo4j-apps/neuler/raw/master/sample-data/got/got-s" + season + "-edges.csv" AS row
MATCH (source:Character {id: row.Source})
MATCH (target:Character {id: row.Target})
CALL apoc.merge.relationship(source, "INTERACTS_SEASON" + season, {}, {}, target) YIELD rel
SET rel.weight = toInteger(row.Weight);
```
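
The load script builds one node file URL, one edge file URL, and one relationship type per season. As a sanity check, the sketch below reproduces that string construction in Python without touching the database. `season_files` is a hypothetical helper, not part of py2neo or Neo4j; note that Cypher's `range(1,7)` includes both endpoints.

```python
BASE_URL = "https://github.com/neo4j-apps/neuler/raw/master/sample-data/got/got-s"


def season_files(first=1, last=7):
    """Yield (season, nodes_url, edges_url, relationship_type) for each season."""
    # Cypher's range(1,7) is inclusive on both ends, hence last + 1 here.
    for season in range(first, last + 1):
        yield (
            season,
            f"{BASE_URL}{season}-nodes.csv",
            f"{BASE_URL}{season}-edges.csv",
            f"INTERACTS_SEASON{season}",
        )


files = list(season_files())
for season, nodes_url, edges_url, rel_type in files:
    print(season, rel_type, nodes_url)
```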

---

## Cypher

In [100]:
query = """
MATCH (c:Character {id: 'NED'})-[r:INTERACTS_SEASON1]->(c1:Character)
RETURN c.name AS character1,
       r.weight AS nr_interactions,
       c1.name AS character2
ORDER BY nr_interactions DESC
LIMIT 10
"""

relationships = graph.run(query).to_data_frame()

In [101]:
relationships

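|   | character1 | nr_interactions | character2 |
|---|---|---|---|
| 0 | Ned | 192 | Robert |
| 1 | Ned | 96 | Varys |
| 2 | Ned | 68 | Pycelle |
| 3 | Ned | 49 | Sansa |
| 4 | Ned | 30 | Renly |
| 5 | Ned | 23 | Robb |
| 6 | Ned | 15 | Yoren |
| 7 | Ned | 13 | Theon |
| 8 | Ned | 11 | Tywin |
| 9 | Ned | 11 | Tyrion |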


---

## Graph Data Science Library

In [122]:
query = """
CALL gds.graph.create(
   'got',
   'Character',
   'INTERACTS_SEASON1'
)
YIELD graphName, nodeCount, relationshipCount, createMillis
"""

graph.run(query).to_data_frame()

|   | graphName | nodeCount | relationshipCount | createMillis |
|---|---|---|---|---|
| 0 | got | 400 | 550 | 6 |


In [120]:
query = """
CALL gds.graph.list('got')
YIELD graphName, nodeQuery, relationshipQuery, nodeCount, relationshipCount, schema, creationTime, modificationTime, memoryUsage
"""

graph.run(query).to_data_frame()

|   | graphName | nodeQuery | relationshipQuery | nodeCount | relationshipCount | schema | creationTime | modificationTime | memoryUsage |
|---|---|---|---|---|---|---|---|---|---|
| 0 | got |  |  | 400 | 550 | {'relationships': {'INTERACTS_SEASON1': {}}, '... | 2021-06-08T10:53:34.603848000-03:00 | 2021-06-08T10:53:34.606970000-03:00 | 310 KiB |


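In [121]:
# Dropping removes the in-memory graph from the catalog. The algorithm cells
# below read 'got', so run the gds.graph.create cell again before them.
query = """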
CALL gds.graph.drop('got')
YIELD graphName, nodeCount, relationshipCount
"""

graph.run(query).to_data_frame()

|   | graphName | nodeCount | relationshipCount |
|---|---|---|---|
| 0 | got | 400 | 550 |


### Graph Algorithms

``` cypher
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
  graphName: String,
  configuration: Map
)

// tier: omitted for production-quality algorithms, otherwise beta or alpha
// algorithm: the algorithm to run
// execution-mode: stream, stats, mutate, or write
// estimate: optional suffix that estimates the memory the algorithm needs
```
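
The naming pattern above can be checked with a few lines of Python. The sketch below only assembles procedure names as strings; `procedure_name` is a hypothetical helper written for this notebook, not a GDS or py2neo function, and it does not verify that a given algorithm actually exists in a given tier.

```python
VALID_TIERS = {None, "beta", "alpha"}
VALID_MODES = {"stream", "stats", "mutate", "write"}


def procedure_name(algorithm, mode, tier=None, estimate=False):
    """Assemble gds[.<tier>].<algorithm>.<execution-mode>[.estimate]."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown execution mode: {mode!r}")
    parts = ["gds"]
    if tier is not None:
        parts.append(tier)  # production-quality algorithms carry no tier segment
    parts.extend([algorithm, mode])
    if estimate:
        parts.append("estimate")
    return ".".join(parts)


print(procedure_name("pageRank", "stream"))
print(procedure_name("degree", "stream", tier="alpha"))
print(procedure_name("wcc", "write", estimate=True))
```

The first two names match the calls used later in this notebook (`gds.pageRank.stream` and `gds.alpha.degree.stream`).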

### 1 - Path finding

#### Shortest Path

In [123]:
query = """
MATCH (c:Character {id: 'NED'}),
      (c1:Character {id: 'BRONN'}),
      p = shortestPath((c)-[:INTERACTS_SEASON1*..]-(c1))
WITH nodes(p) AS nds
RETURN [n IN nds | n.name] AS path
"""

path = graph.run(query).to_data_frame()

In [124]:
path

|   | path |
|---|---|
| 0 | [Ned, Rodrik Cassel, Bronn] |
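
The query above ignores relationship direction and weight and looks for the path with the fewest hops. A breadth-first search does the same on a small in-memory graph. This is a sketch of the idea, not of how Neo4j implements `shortestPath`; the Ned/Rodrik Cassel/Bronn and Ned/Robert edges come from results in this notebook, while the remaining edges are hypothetical.

```python
from collections import deque

edges = [
    ("Ned", "Rodrik Cassel"),
    ("Rodrik Cassel", "Bronn"),
    ("Ned", "Robert"),
    ("Robert", "Varys"),   # hypothetical
    ("Varys", "Tyrion"),   # hypothetical
    ("Tyrion", "Bronn"),   # hypothetical
]


def shortest_path(edges, start, goal):
    """Return one fewest-hops path from start to goal, or None if unreachable."""
    # Build an undirected adjacency list, mirroring the undirected Cypher pattern.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(edges, "Ned", "Bronn"))
```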


### 2 - Clustering

#### Weakly Connected Components 

In [147]:
query = """
CALL gds.wcc.stream('got') 
YIELD nodeId, componentId AS community
WITH gds.util.asNode(nodeId) AS node, community
WITH collect(node) AS allNodes, community
RETURN community, allNodes[0..10] AS nodes, size(allNodes) AS size
ORDER BY size DESC
LIMIT 10"""

wcc = graph.run(query).to_data_frame()

In [148]:
wcc

|   | community | nodes | size |
|---|---|---|---|
| 0 | 0 | [{'name': 'Addam', 'id': 'ADDAM_MARBRAND', 'pa... | 127 |
| 1 | 128 | [{'name': 'Amory', 'id': 'AMORY', 'pagerank': ... | 1 |
| 2 | 133 | [{'name': 'Boros', 'id': 'BOROS', 'pagerank': ... | 1 |
| 3 | 130 | [{'name': 'Billy', 'id': 'BILLY', 'pagerank': ... | 1 |
| 4 | 131 | [{'name': 'Biter', 'id': 'BITER', 'pagerank': ... | 1 |
| 5 | 132 | [{'name': 'Black Lorren', 'id': 'BLACK_LORREN'... | 1 |
| 6 | 129 | [{'name': 'Barra', 'id': 'BARRA', 'pagerank': ... | 1 |
| 7 | 134 | [{'name': 'Brienne', 'id': 'BRIENNE', 'pageran... | 1 |
| 8 | 135 | [{'name': 'Captain's Daughter', 'id': 'CAPTAIN... | 1 |
| 9 | 127 | [{'name': 'Alton', 'id': 'ALTON', 'pagerank': ... | 1 |


In [162]:
query = """
CALL gds.wcc.write('got', { writeProperty: 'component' })
YIELD componentCount, nodePropertiesWritten
"""

graph.run(query).to_data_frame()

|   | componentCount | nodePropertiesWritten |
|---|---|---|
| 0 | 274 | 400 |
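
The numbers above are consistent with each other: the largest component has 127 of the 400 nodes, which leaves 273 nodes for the other 273 components, so every other character is isolated in season 1. The sketch below shows how weakly connected components can be computed with a union-find structure. It illustrates the technique only and is not the GDS implementation; the two edges are treated as undirected, and Brienne and Biter are left without edges as in the season 1 result.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the chain along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for node in nodes:
        components.setdefault(find(node), []).append(node)
    # Largest component first, as in the query above.
    return sorted(components.values(), key=len, reverse=True)


nodes = ["Ned", "Robert", "Varys", "Brienne", "Biter"]
edges = [("Ned", "Robert"), ("Varys", "Ned")]

components = weakly_connected_components(nodes, edges)
for members in components:
    print(len(members), members)
```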


---

### 3 - Centrality

<img src="images/centalityalgos.png" width=1000px height=1000px /> Representative centrality algorithms (Source: [Needham & Hodler, 2019](https://neo4j.com/blog/graph-algorithms-community-detection-recommendations/))

#### Degree Centrality

In [149]:
query = """
CALL gds.alpha.degree.stream('got') YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name as name, score
ORDER BY score DESC
LIMIT 10
"""

centrality_degree_df = graph.run(query).to_data_frame()

In [150]:
centrality_degree_df

|   | name | score |
|---|---|---|
| 0 | Catelyn | 30.0 |
| 1 | Arya | 28.0 |
| 2 | Ned | 23.0 |
| 3 | Cersei | 23.0 |
| 4 | Joffrey | 18.0 |
| 5 | Bran | 18.0 |
| 6 | Daenerys | 17.0 |
| 7 | Jaime | 16.0 |
| 8 | Petyr | 15.0 |
| 9 | Jon | 14.0 |
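
Degree centrality simply counts relationships per node. The sketch below assumes, as I understand the default for a projection created without an explicit orientation, that the score counts outgoing relationships; it also shows the in-degree and the direction-agnostic degree for comparison. The relationship list is hypothetical toy data, not an extract of the dataset.

```python
from collections import Counter

# Directed toy relationships as (source, target) pairs; hypothetical data.
relationships = [
    ("Catelyn", "Ned"),
    ("Catelyn", "Robb"),
    ("Arya", "Ned"),
    ("Ned", "Robert"),
]

out_degree = Counter(source for source, _ in relationships)
in_degree = Counter(target for _, target in relationships)
nodes = {name for pair in relationships for name in pair}

# Degree when direction is ignored: incoming plus outgoing relationships.
undirected_degree = {name: out_degree[name] + in_degree[name] for name in nodes}

# Rank by out-degree, breaking ties alphabetically.
top = sorted(nodes, key=lambda name: (-out_degree[name], name))
for name in top:
    print(name, out_degree[name], in_degree[name], undirected_degree[name])
```

Which of the three counts is the right one depends on whether the direction stored in the data carries meaning for the question being asked.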


#### Betweenness Centrality

In [151]:
query = """
CALL gds.betweenness.stream('got') YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name as name, score
ORDER BY score DESC
LIMIT 10
"""

betweenness = graph.run(query).to_data_frame()

In [152]:
betweenness

|   | name | score |
|---|---|---|
| 0 | Ned | 780.10928 |
| 1 | Catelyn | 332.132143 |
| 2 | Robert | 185.064835 |
| 3 | Jon | 167.066056 |
| 4 | Tyrion | 162.265934 |
| 5 | Jorah | 155.767857 |
| 6 | Robb | 142.23956 |
| 7 | Joffrey | 140.90583 |
| 8 | Jeor | 140.665385 |
| 9 | Jaime | 137.284524 |
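
Betweenness centrality scores a node by the share of shortest paths between other pairs of nodes that pass through it. A minimal sketch of Brandes' algorithm for an unweighted, directed graph follows; it illustrates the definition and is not the GDS implementation. The toy adjacency map is hypothetical, and every node must appear as a key, even if it has no outgoing edges.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm on an unweighted directed graph {node: [targets]}."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Phase 1: breadth-first search counting shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adjacency[node]:
                if distance[nxt] < 0:  # first time nxt is reached
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[node] + 1:  # node lies on a shortest path
                    paths[nxt] += paths[node]
                    predecessors[nxt].append(node)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    return score


toy = {
    "Arya": ["Catelyn", "Jon"],
    "Catelyn": ["Ned"],
    "Jon": ["Ned"],
    "Ned": ["Robert"],
    "Robert": [],
}
scores = betweenness(toy)
print(scores)
```

In the toy graph, Catelyn and Jon each carry half of the shortest paths that leave Arya, while every path to Robert has to go through Ned.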


#### PageRank

In [153]:
query = """
CALL gds.pageRank.stream('got') YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name as name, score
ORDER BY score DESC
LIMIT 10
"""

pagerank = graph.run(query).to_data_frame()

In [154]:
pagerank

|   | name | score |
|---|---|---|
| 0 | Tyrion | 3.233113 |
| 1 | Yoren | 2.721321 |
| 2 | Varys | 1.659855 |
| 3 | Tywin | 1.605978 |
| 4 | Ned | 1.316527 |
| 5 | Sam | 1.281997 |
| 6 | Robert | 1.092738 |
| 7 | Walder | 1.088475 |
| 8 | Robb | 1.064555 |
| 9 | Theon | 1.021642 |


#### Write PageRank

In [155]:
query = """
CALL gds.pageRank.write('got', {
  maxIterations: 20,
  dampingFactor: 0.85,
  writeProperty: 'pagerank'
})
YIELD nodePropertiesWritten, ranIterations
"""

pagerank = graph.run(query).to_data_frame()
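
The `dampingFactor` and `maxIterations` settings map onto the classic PageRank iteration. Below is a minimal pure-Python sketch of the unnormalized form, score(v) = (1 - d) + d * sum of score(u) / out_degree(u) over the nodes u that point to v. It is not the GDS implementation, which may differ in initialization, convergence checks, and the treatment of nodes without outgoing relationships. The toy adjacency map is hypothetical.

```python
def pagerank(adjacency, damping=0.85, max_iterations=20):
    """Unnormalized PageRank on a directed graph given as {node: [targets]}."""
    score = {node: 1.0 - damping for node in adjacency}  # assumed starting value
    for _ in range(max_iterations):
        incoming = {node: 0.0 for node in adjacency}
        for node, targets in adjacency.items():
            if targets:
                share = score[node] / len(targets)  # split the score over outgoing edges
                for target in targets:
                    incoming[target] += share
        score = {
            node: (1.0 - damping) + damping * incoming[node]
            for node in adjacency
        }
    return score


toy = {
    "Arya": ["Catelyn", "Jon"],
    "Catelyn": ["Ned"],
    "Jon": ["Ned"],
    "Ned": ["Robert"],
    "Robert": [],
}
ranks = pagerank(toy)
for name, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.4f}")
```

Note that a node with no incoming relationships stays at the minimum score of 1 - d, so on a directed projection the ranking depends on which character each relationship points to.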

#### Top 10 PageRank, Season 1

In [178]:
query = """
MATCH (c:Character)
RETURN c.name AS character, c.pagerank AS pagerank
ORDER BY pagerank DESC
LIMIT 10
"""

top10_pagerank = graph.run(query).to_data_frame()

In [179]:
top10_pagerank

|   | character | pagerank |
|---|---|---|
| 0 | Tyrion | 3.233113 |
| 1 | Yoren | 2.721321 |
| 2 | Varys | 1.659855 |
| 3 | Tywin | 1.605978 |
| 4 | Ned | 1.316527 |
| 5 | Sam | 1.281997 |
| 6 | Robert | 1.092738 |
| 7 | Walder | 1.088475 |
| 8 | Robb | 1.064555 |
| 9 | Theon | 1.021642 |
