In [1]:
!pip install ipython-cypher -q

In [2]:
%load_ext cypher

## The Graph of Thrones

<center><img src="img/got.jpeg" align="center"/></center>

#### Clean the database

In [3]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (n) DETACH DELETE n

0 rows affected.


#### **Load Data**

In [6]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CREATE CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE

0 rows affected.


In [7]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book1-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS1]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book=1

187 nodes created.
1555 properties set.
684 relationships created.
187 labels added.


In [8]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book2-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS2]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book=2

169 nodes created.
1719 properties set.
775 relationships created.
169 labels added.


In [9]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book3-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS3]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book=3

142 nodes created.
2158 properties set.
1008 relationships created.
142 labels added.


In [10]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book45-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS45]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book=45

298 nodes created.
2956 properties set.
1329 relationships created.
298 labels added.


### How many characters do we have?

In [12]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character)
RETURN count(c) as nr_personagens

1 rows affected.


nr_personagens
796


### How many interactions are there in each book?

In [13]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH ()-[r]->()
RETURN r.book as book, count(r) as interacoes
ORDER BY book

4 rows affected.


book,interacoes
1,684
2,775
3,1008
45,1329


### Network diameter

> The diameter of a network is the length of the longest shortest path (geodesic) between any two nodes in the network.

<center><img src="img/network-diameter.gif" align="center"/></center>

The following query finds the five longest shortest paths in the network for the second book:

In [16]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p = shortestPath((a)-[:INTERACTS2*]-(b))
WITH length(p) AS len, p
ORDER BY len DESC
LIMIT 5
RETURN nodes(p) AS path, len

5 rows affected.


path,len
"[{'name': 'Steffon-Varner'}, {'name': 'Eldon-Estermont'}, {'name': 'Bryce-Caron'}, {'name': 'Stannis-Baratheon'}, {'name': 'Tywin-Lannister'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Tarber'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Roose-Bolton'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Kurz'}]",8
"[{'name': 'Steffon-Varner'}, {'name': 'Eldon-Estermont'}, {'name': 'Bryce-Caron'}, {'name': 'Stannis-Baratheon'}, {'name': 'Tywin-Lannister'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Kurz'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Robb-Stark'}, {'name': 'Stannis-Baratheon'}, {'name': 'Bryce-Caron'}, {'name': 'Eldon-Estermont'}, {'name': 'Steffon-Varner'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Roose-Bolton'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Tarber'}]",8

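As a rough illustration of what "longest shortest path" means, here is a minimal Python sketch that runs a breadth-first search from every node of a small graph and keeps the largest hop count. The graph `toy_graph` and the helper names are made up for this example; they are not part of the Game of Thrones data or of any Neo4j API.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return the hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(graph):
    """Longest shortest path over all pairs that can reach each other."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)

# Hypothetical toy network: a path A-B-C-D with an extra leaf E attached to C.
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
toy_diameter = diameter(toy_graph)
print(toy_diameter)
```

Unreachable pairs are simply ignored here, which mirrors how `shortestPath` returns no row for disconnected characters.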

### Pivotal nodes

> A node is said to be pivotal if it lies on all shortest paths between two other nodes in the network. We can find all pivotal nodes in the network of the first book.

In [17]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p = allShortestPaths((a)-[:INTERACTS1*]-(b))
WITH collect(p) AS paths, a, b
UNWIND nodes(head(paths)) as c // first path
WITH *
WHERE NOT c IN [a,b]
AND all(path IN tail(paths) WHERE c IN nodes(path))
RETURN a.name, b.name, c.name AS PivotalNode, length(head(paths)) AS pathLength, size(paths) AS pathCount
SKIP 490
LIMIT 10

10 rows affected.


a.name,b.name,PivotalNode,pathLength,pathCount
Galbart-Glover,Dolf,Tyrion-Lannister,4,3
Galbart-Glover,Dolf,Shagga,4,3
Galbart-Glover,Donal-Noye,Jon-Snow,3,2
Galbart-Glover,Doreah,Eddard-Stark,3,1
Galbart-Glover,Doreah,Daenerys-Targaryen,3,1
Galbart-Glover,Dywen,Jon-Snow,3,2
Galbart-Glover,Fogo,Drogo,5,4
Galbart-Glover,Fogo,Ogo,5,4
Gared,Bowen-Marsh,Jeor-Mormont,2,1
Gared,Bran-Stark,Jeor-Mormont,2,1
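
The same idea can be sketched outside the database. The Python below is only an illustration on a made-up graph (`toy_graph` and both function names are assumptions for this example): it counts shortest paths with a breadth-first search and reports a node as pivotal for a pair when every shortest path between the pair goes through it.

```python
from collections import deque

def count_shortest_paths(graph, source):
    """Return (dist, sigma): hop counts and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma

def pivotal_nodes(graph, a, b):
    """Nodes other than a and b that lie on every shortest path from a to b."""
    dist_a, sigma_a = count_shortest_paths(graph, a)
    dist_b, sigma_b = count_shortest_paths(graph, b)
    if b not in dist_a:
        return []
    return sorted(
        c for c in graph
        if c not in (a, b)
        and c in dist_a and c in dist_b
        and dist_a[c] + dist_b[c] == dist_a[b]      # c is on some shortest path
        and sigma_a[c] * sigma_b[c] == sigma_a[b]   # ...and on all of them
    )

# Hypothetical toy network: A-B, then a diamond B-C-E / B-D-E, then E-F.
toy_graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D", "F"],
    "F": ["E"],
}
toy_pivotal = pivotal_nodes(toy_graph, "A", "F")
print(toy_pivotal)
```

In this toy graph B and E are pivotal for the pair (A, F), while C and D are not, because each of them sits on only one of the two shortest paths.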


### Centrality measures

<center><img src="img/measures.png" align="center"/></center>

### Degree Centrality

> The Degree Centrality algorithm **measures the number of incoming and outgoing relationships** from a node, and helps us find the **most popular nodes** in a graph.

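The following query finds the most popular characters in the 1st book. Because `weightProperty` is set, each score is the sum of the weights of a character's interactions rather than a count of distinct relationships: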

In [50]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.degree.stream("Character", "INTERACTS1", {
  direction: "BOTH",
  weightProperty: "weight"
})
YIELD nodeId, score
RETURN algo.asNode(nodeId).name AS personagem, score
ORDER BY score DESC
LIMIT 10

10 rows affected.


personagem,score
Eddard-Stark,1284.0
Robert-Baratheon,941.0
Jon-Snow,784.0
Tyrion-Lannister,650.0
Sansa-Stark,545.0
Bran-Stark,531.0
Catelyn-Stark,520.0
Robb-Stark,516.0
Daenerys-Targaryen,443.0
Arya-Stark,430.0
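
To make the weighted score concrete, here is a small Python sketch that sums edge weights per node, counting each edge for both endpoints (the equivalent of `direction: "BOTH"`). The edge list `toy_edges` and its weights are invented for illustration and do not come from the book data.

```python
from collections import defaultdict

def weighted_degree(edges):
    """Sum the weights of all edges touching each node, in either direction."""
    score = defaultdict(float)
    for source, target, weight in edges:
        score[source] += weight
        score[target] += weight
    return dict(score)

# Hypothetical (source, target, weight) triples.
toy_edges = [
    ("A", "B", 10),
    ("A", "C", 4),
    ("B", "D", 3),
]
toy_scores = weighted_degree(toy_edges)
toy_ranking = sorted(toy_scores.items(), key=lambda item: item[1], reverse=True)
print(toy_ranking)
```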


---

### Betweenness Centrality 

<center><img src="img/betweenness-centrality.png" align="center"/></center>

> Betweenness centrality **identifies nodes that are strategically positioned in the network**, meaning that information will often travel through that person.  
Such an intermediary position gives that person **power and influence**.

Betweenness centrality counts the shortest paths that pass through a given node. For example, a node located on a bottleneck between two large communities will have high betweenness.

In [22]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.betweenness.stream("Character", "INTERACTS1", {
  direction: "BOTH"
})
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).name, centrality
ORDER BY centrality DESC
LIMIT 10

10 rows affected.


algo.asNode(nodeId).name,centrality
Eddard-Stark,4638.534951255041
Robert-Baratheon,3682.3910357678137
Tyrion-Lannister,3272.606015526037
Jon-Snow,2952.057281565675
Catelyn-Stark,2604.755646755592
Daenerys-Targaryen,1484.2780232288706
Robb-Stark,1255.6896562838217
Drogo,1115.094639245037
Bran-Stark,960.0319135675138
Sansa-Stark,639.0769144474223
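
The bottleneck effect can be reproduced on a toy graph. The sketch below is a plain-Python version of Brandes' algorithm for unweighted, undirected graphs; it is an illustration only, not the library's implementation. The graph `toy_graph` is made up, and halving the totals so each pair is counted once is a convention chosen for this example.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                       # BFS that also counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:                       # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}

# Hypothetical toy network: two triangles joined by the bridge C-D.
toy_graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
toy_betweenness = betweenness(toy_graph)
print(toy_betweenness)
```

Only the two bridge endpoints C and D get a non-zero score, since every path between the two triangles has to cross them.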


### Closeness Centrality

Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph.  
The closeness centrality of a node is the inverse of its average distance (its farness) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.

In [25]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.closeness.stream("Character", "INTERACTS1", {
  direction: "BOTH"
})
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).name  as personagem, centrality
ORDER BY centrality DESC
LIMIT 10

10 rows affected.


personagem,centrality
Eddard-Stark,0.5636363636363636
Robert-Baratheon,0.5454545454545454
Tyrion-Lannister,0.510989010989011
Catelyn-Stark,0.5054347826086957
Robb-Stark,0.4973262032085561
Jon-Snow,0.493368700265252
Sansa-Stark,0.4894736842105263
Bran-Stark,0.4869109947643979
Cersei-Lannister,0.484375
Joffrey-Baratheon,0.4806201550387597
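
As a worked example of "inverse of the average distance", the Python sketch below computes closeness as (number of reachable nodes) / (sum of distances to them) on a made-up star graph. The graph `toy_graph` is hypothetical, and this normalization is an assumption for illustration; the library's exact formula may differ.

```python
from collections import deque

def closeness(graph):
    """Closeness = (reachable nodes) / (sum of hop distances to them)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        # len(dist) - 1 excludes the source itself; isolated nodes score 0.
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

# Hypothetical toy network: hub H connected to three leaves.
toy_graph = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
toy_closeness = closeness(toy_graph)
print(toy_closeness)
```

The hub is one hop from everyone, so it scores 3/3; each leaf is one hop from the hub and two hops from the other leaves, so it scores 3/5.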


### PageRank

<center><img src="img/pagerank.svg.png" width="360px" height="360px" align="center"/></center>

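> PageRank captures how effectively you are taking advantage of your network contacts. In our context, PageRank centrality nicely captures narrative tension. Indeed, major developments occur when two important characters interact.

The next two cells write the book 1 betweenness and PageRank scores to node properties, so that the final query can return them side by side.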

In [60]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.betweenness("Character", "INTERACTS1", {direction: "BOTH", writeProperty: "book1BetweennessCentrality"})

1 rows affected.


loadMillis,computeMillis,writeMillis,nodes,minCentrality,maxCentrality,sumCentrality
4,8,10,796,-1.0,-1.0,-1.0


In [61]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.pageRank("Character", "INTERACTS1", {direction: "BOTH", writeProperty:'book1PageRank'})

1 rows affected.


nodes,iterations,loadMillis,computeMillis,writeMillis,dampingFactor,write,writeProperty
796,20,4,5,3,0.85,True,book1PageRank


In [62]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character)
WITH c, [(c)-[r:INTERACTS1]-(other) | {character: other.name, weight: r.weight}] AS interactions
RETURN c.name, c.book1PageRank, c.book1BetweennessCentrality,
       apoc.coll.sum([i in interactions | i.weight]) AS totalInteractions,
       [i in apoc.coll.reverse(apoc.coll.sortMaps(interactions, 'weight'))[..5] | i.character] as charactersInteractedWith
ORDER BY c.book1PageRank DESC
LIMIT 5

5 rows affected.


c.name,c.book1PageRank,c.book1BetweennessCentrality,totalInteractions,charactersInteractedWith
Eddard-Stark,8.216258445568382,4638.534951255041,1284.0,"['Beric-Dondarrion', 'Balon-Greyjoy', 'Daryn-Hornwood', 'Wyl-(guard)', 'Wylla']"
Tyrion-Lannister,5.966885147057473,3272.606015526037,650.0,"['Timett', 'Sansa-Stark', 'Theon-Greyjoy', 'Tommen-Baratheon', 'Varys']"
Catelyn-Stark,5.452866427600384,2604.755646755592,520.0,"['Colemon', 'Mychel-Redfort', 'Robert-Arryn', 'Eon-Hunter', 'Nestor-Royce']"
Robert-Baratheon,5.354372273758053,3682.3910357678137,941.0,"['Lancel-Lannister', 'Meryn-Trant', 'Benjen-Stark', 'Jorah-Mormont', 'Hoster-Tully']"
Jon-Snow,4.851433810405431,2952.057281565675,784.0,"['Jory-Cassel', 'Rodrik-Cassel', 'Matthar', 'Chett', 'Cersei-Lannister']"
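
For intuition about how the scores arise, here is a minimal power-iteration sketch in Python. It borrows the damping factor 0.85 and the 20 iterations reported in the output above, and uses the unnormalized update `(1 - d) + d * sum(rank / degree)`; that choice of formula, the function name, and the star graph `toy_graph` are assumptions for this example rather than a description of the library's internals.

```python
def pagerank(graph, damping=0.85, iterations=20):
    """Power iteration on an undirected graph given as an adjacency dict."""
    ranks = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_ranks = {}
        for node in graph:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(ranks[nb] / len(graph[nb]) for nb in graph[node])
            new_ranks[node] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks

# Hypothetical toy network: hub H connected to three leaves.
toy_graph = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
toy_ranks = pagerank(toy_graph)
print(toy_ranks)
```

The hub ends up with the highest rank because all three leaves pass their full score to it, while the hub splits its own score three ways.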


### Community Detection

In [63]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.labelPropagation(
  'MATCH (c:Character) RETURN id(c) as id',
  'MATCH (c:Character)-[rel]-(c2) RETURN id(c) as source, id(c2) as target, SUM(rel.weight) as weight',
  {graph:'cypher', partitionProperty: 'community'})

1 rows affected.


loadMillis,computeMillis,writeMillis,postProcessingMillis,nodes,communityCount,iterations,didConverge,p1,p5,p10,p25,p50,p75,p90,p95,p99,p100,weightProperty,write,partitionProperty,writeProperty
90,3,1,7,796,131,1,False,1,1,1,1,2,3,14,29,67,77,weight,True,community,community
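
Label propagation can be sketched in a few lines of Python. In this illustration every node starts with its own label and repeatedly adopts the label carrying the largest total edge weight among its neighbors. The update order (sorted node names), the tie-breaking rule (keep the current label if it ties, otherwise take the smallest), and the edge list `toy_edges` are all assumptions made to keep the example deterministic; real implementations usually break ties differently.

```python
from collections import defaultdict

def label_propagation(edges, max_iterations=10):
    """Weighted label propagation with deterministic order and tie-breaking."""
    neighbors = defaultdict(dict)
    for source, target, weight in edges:
        neighbors[source][target] = neighbors[source].get(target, 0) + weight
        neighbors[target][source] = neighbors[target].get(source, 0) + weight
    labels = {node: node for node in neighbors}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbors):
            votes = defaultdict(float)
            for neighbor, weight in neighbors[node].items():
                votes[labels[neighbor]] += weight
            best = max(votes.values())
            candidates = sorted(lab for lab, vote in votes.items() if vote == best)
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:   # converged: a full pass changed nothing
            break
    return labels

# Hypothetical toy network: two tightly linked triangles and a weak bridge C-D.
toy_edges = [
    ("A", "B", 5), ("A", "C", 5), ("B", "C", 5),
    ("D", "E", 5), ("D", "F", 5), ("E", "F", 5),
    ("C", "D", 1),
]
toy_labels = label_propagation(toy_edges)
print(toy_labels)
```

With these weights the two triangles settle into separate communities, because the single weak bridge is outvoted by the heavier edges inside each triangle.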


### Querying Communities

In [64]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character)
WHERE exists(c.community)
RETURN c.community, count(*) AS count
ORDER BY count DESC limit 10

10 rows affected.


c.community,count
62,77
7,67
65,65
119,62
114,47
289,45
182,38
1,29
238,25
25,22


> It’d be good to know who the influential people are in each community. To do that we’ll need to calculate a PageRank score for each character across all the books:

In [68]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
CALL algo.pageRank('MATCH (c:Character) RETURN id(c) as id', 'MATCH (c:Character)-[rel]-(c2) RETURN id(c) as source, id(c2) as target, SUM(rel.weight) as weight', {graph:'cypher', writeProperty: 'pageRank'})

1 rows affected.


nodes,iterations,loadMillis,computeMillis,writeMillis,dampingFactor,write,writeProperty
796,20,219,18,9,0.85,True,pageRank


In [67]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character)
WHERE exists(c.community)
WITH c ORDER BY c.pageRank DESC
RETURN c.community as cluster, count(*) AS count, collect(c.name)[..10] as influencers
ORDER BY count DESC limit 10

10 rows affected.


cluster,count,influencers
62,77,"['Jon-Snow', 'Samwell-Tarly', 'Mance-Rayder', 'Jeor-Mormont', 'Aemon-Targaryen-(Maester-Aemon)', 'Janos-Slynt', 'Eddison-Tollett', 'Tormund', 'Bowen-Marsh', 'Qhorin-Halfhand']"
7,67,"['Jaime-Lannister', 'Cersei-Lannister', 'Catelyn-Stark', 'Sansa-Stark', 'Joffrey-Baratheon', 'Brienne-of-Tarth', 'Tommen-Baratheon', 'Margaery-Tyrell', 'Loras-Tyrell', 'Myrcella-Baratheon']"
65,65,"['Daenerys-Targaryen', 'Jorah-Mormont', 'Hizdahr-zo-Loraq', 'Quentyn-Martell', 'Drogo', 'Daario-Naharis', 'Rhaegar-Targaryen', 'Irri', 'Viserys-Targaryen', 'Belwas']"
119,62,"['Tyrion-Lannister', 'Tywin-Lannister', 'Varys', 'Bronn', 'Oberyn-Martell', 'Jon-Connington', 'Aegon-Targaryen-(son-of-Rhaegar)', 'Podrick-Payne', 'Shae', 'Haldon']"
114,47,"['Stannis-Baratheon', 'Davos-Seaworth', 'Renly-Baratheon', 'Melisandre', 'Selyse-Florent', 'Wyman-Manderly', 'Shireen-Baratheon', 'Axell-Florent', 'Cressen', 'Mathis-Rowan']"
289,45,"['Theon-Greyjoy', 'Asha-Greyjoy', 'Ramsay-Snow', 'Lorren', 'Jeyne-Poole', 'Hagen', 'Aggar', 'Galbart-Glover', 'Sour-Alyn', 'Skinner']"
182,38,"['Arya-Stark', 'Gendry', 'Yoren', 'Beric-Dondarrion', 'Lem', 'Hot-Pie', 'Harwin', 'Tom-of-Sevenstreams', 'Thoros-of-Myr', 'Weese']"
1,29,"['Robb-Stark', 'Bran-Stark', 'Rodrik-Cassel', 'Luwin', 'Rickon-Stark', 'Meera-Reed', 'Hodor', 'Jojen-Reed', 'Osha', 'Nan']"
238,25,"['Victarion-Greyjoy', 'Aeron-Greyjoy', 'Euron-Greyjoy', 'Balon-Greyjoy', 'Hotho-Harlaw', 'Red-Oarsman', 'Lucas-Codd', 'Baelor-Blacktyde', 'Dunstan-Drumm', 'Ralf-Stonehouse']"
25,22,"['Petyr-Baelish', 'Pycelle', 'Lysa-Arryn', 'Robert-Arryn', 'Jory-Cassel', 'Jon-Arryn', 'Yohn-Royce', 'Marillion', 'Brandon-Stark', 'Vayon-Poole']"
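
The "sort by score, then collect per community" pattern used in the query can be mirrored in Python. The sketch below is illustrative only: the rows in `toy_characters` are invented (name, community, score) triples, and `top_influencers` is a hypothetical helper.

```python
from collections import defaultdict

def top_influencers(characters, k=2):
    """Group names by community in descending score order, largest community first."""
    groups = defaultdict(list)
    for name, community, _score in sorted(characters, key=lambda row: row[2], reverse=True):
        groups[community].append(name)
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return [(community, len(names), names[:k]) for community, names in ranked]

# Hypothetical (name, community, score) rows.
toy_characters = [
    ("a", 1, 0.9), ("b", 1, 0.4), ("c", 1, 0.7),
    ("d", 2, 0.8), ("e", 2, 0.2),
]
toy_top = top_influencers(toy_characters)
print(toy_top)
```

Sorting before grouping is what makes the first `k` names in each list the highest-scoring ones, just as `WITH c ORDER BY c.pageRank DESC` does ahead of `collect(c.name)[..10]`.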


### Intra community PageRank

> We can also calculate the PageRank within communities.
Run the following query to calculate the PageRank for the second-largest community:

In [44]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character) WHERE EXISTS(c.community)
WITH c.community AS communityId, COUNT(*) AS count
ORDER BY count DESC
SKIP 1 LIMIT 1
CALL apoc.cypher.doIt(
  "CALL algo.pageRank(
    'MATCH (c:Character) WHERE c.community =" + communityId + " RETURN id(c) as id',
    'MATCH (c:Character)-[rel]->(c2) WHERE c.community =" + communityId + " AND c2.community =" + communityId + " RETURN id(c) as source,id(c2) as target, sum(rel.weight) as weight',
    {graph:'cypher', writeProperty: 'communityPageRank'}) YIELD nodes RETURN count(*)", {})
YIELD value
RETURN value

1 rows affected.


value
{'count(*)': 1}


In [45]:
%%cypher http://neo4j:dextra2020@172.19.0.2:7474/db/data
MATCH (c:Character) WHERE exists(c.community)
WITH c.community AS communityId, COUNT(*) AS count
ORDER BY count DESC
SKIP 1 LIMIT 1
MATCH (c:Character) WHERE c.community = communityId
RETURN c.name, c.communityPageRank
ORDER BY c.communityPageRank DESC
LIMIT 10

10 rows affected.


c.name,c.communityPageRank
Jon-Snow,2.2560885736795018
Ygritte,1.6987942064295838
Samwell-Tarly,1.4899245301607456
Tormund,1.203153118578281
Spare-Boot,1.1767029713863306
Small-Paul,0.8839200183293655
Weeper,0.8328483511290595
Pypar,0.8043644331609571
Theobald,0.7832179276515601
Satin,0.7807708142214981
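
The key step in the intra-community query is restricting the graph to relationships whose two endpoints share the same community. A minimal Python sketch of that filter follows; the edge list, the `toy_community_of` mapping, and the helper name are hypothetical.

```python
def induced_edges(edges, community_of, community_id):
    """Keep only edges whose source and target both belong to community_id."""
    return [
        (source, target, weight)
        for source, target, weight in edges
        if community_of.get(source) == community_id
        and community_of.get(target) == community_id
    ]

# Hypothetical data: nodes A, B, C in community 1; D in community 2.
toy_edges = [("A", "B", 3), ("B", "C", 2), ("C", "D", 7), ("D", "A", 1)]
toy_community_of = {"A": 1, "B": 1, "C": 1, "D": 2}
toy_subgraph = induced_edges(toy_edges, toy_community_of, 1)
print(toy_subgraph)
```

Any ranking computed on the filtered edges then reflects influence inside the community only, since links that leave it no longer contribute.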


### References

http://guides.neo4j.com/data_science/01_eda.html

https://guides.neo4j.com/sandbox/graph-algorithms/

https://github.com/mathbeveridge/asoiaf/tree/master/data

In the Neo4j Browser, run `:play http://guides.neo4j.com/sandbox/legis-graph`