In [None]:
!pip install ipython-cypher -q

In [2]:
%load_ext cypher

## The Graph of Thrones

<center><img src="img/got-community.jpeg" align="center"/></center>

#### Clean the database

In [3]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (n) DETACH DELETE n

187 nodes deleted.
684 relationship deleted.


#### **Load Data**

In [4]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CREATE CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE

0 rows affected.


##### **Book I**

In [5]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book1-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS1]->(tgt)
ON CREATE SET r.weight = toInt(row.weight), r.book=1

187 nodes created.
1555 properties set.
684 relationships created.
187 labels added.


##### **Book II**

In [7]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book2-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS2]->(tgt)
ON CREATE SET r.weight = toInt(row.weight), r.book=2

169 nodes created.
1719 properties set.
775 relationships created.
169 labels added.


##### **Book III**

In [8]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book3-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS3]->(tgt)
ON CREATE SET r.weight = toInt(row.weight), r.book=3

142 nodes created.
2158 properties set.
1008 relationships created.
142 labels added.


##### **Books IV and V**

In [9]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book45-edges.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS45]->(tgt)
ON CREATE SET r.weight = toInt(row.weight), r.book=45

298 nodes created.
2956 properties set.
1329 relationships created.
298 labels added.
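The four load cells rely on `MERGE` behaving as "get or create": a character already loaded from an earlier book is matched instead of being created again, which is why the per-book "nodes created" counts (187, 169, 142, 298) add up to the 796 characters counted below. The following is a minimal Python sketch of that behaviour, not the database's implementation; the rows and the `merge_edges` helper are hypothetical.

```python
# Toy stand-in for the graph: a set of character names and a dict of
# (source, target, rel_type) -> properties. All rows below are made up.
def merge_edges(nodes, rels, rows, rel_type, book):
    """Mimic MERGE ... ON CREATE SET: create only what is missing."""
    created_nodes = 0
    created_rels = 0
    for row in rows:
        for name in (row["Source"], row["Target"]):
            if name not in nodes:  # MERGE on the node
                nodes.add(name)
                created_nodes += 1
        key = (row["Source"], row["Target"], rel_type)
        if key not in rels:  # MERGE on the relationship
            # ON CREATE SET: properties are written only for new relationships
            rels[key] = {"weight": int(row["weight"]), "book": book}
            created_rels += 1
    return created_nodes, created_rels


nodes, rels = set(), {}
book1 = [{"Source": "A", "Target": "B", "weight": "3"},
         {"Source": "B", "Target": "C", "weight": "1"}]
book2 = [{"Source": "A", "Target": "C", "weight": "2"},
         {"Source": "C", "Target": "D", "weight": "5"}]

first = merge_edges(nodes, rels, book1, "INTERACTS1", 1)
second = merge_edges(nodes, rels, book2, "INTERACTS2", 2)
print(first, second, len(nodes))
```

In the toy data the second load creates only one new node, because `A` and `C` already exist from the first load.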


#### **DB SCHEMA**

<center><img src="img/graph-schema.png" align="center"/></center>

### How many characters do we have?

In [10]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character)
RETURN count(c) as nr_personagens

1 rows affected.


nr_personagens
796


### How many interactions are there in each book?

In [11]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH ()-[r]->()
RETURN r.book as book, count(r) as interacoes
ORDER BY book

4 rows affected.


book,interacoes
1,684
2,775
3,1008
45,1329


### Number of Arya-Stark's interactions across the books

In [47]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character {name:'Arya-Stark'})-[r]->()
RETURN r.book as book, count(r) as interacoes
ORDER BY book

4 rows affected.


book,interacoes
1,27
2,37
3,36
45,24


--- 

### Network diameter

> The diameter (or geodesic) of a network is defined as the longest shortest path in the network.

<center><img src="img/network-diameter.gif" align="center"/></center>

The longest shortest path in the network for the second book.

In [12]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p = shortestPath((a)-[:INTERACTS2*]-(b))
WITH length(p) AS len, p
ORDER BY len DESC
LIMIT 5
RETURN nodes(p) AS path, len

5 rows affected.


path,len
"[{'name': 'Steffon-Varner'}, {'name': 'Eldon-Estermont'}, {'name': 'Bryce-Caron'}, {'name': 'Renly-Baratheon'}, {'name': 'Tywin-Lannister'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Tarber'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Bran-Stark'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Kurz'}]",8
"[{'name': 'Steffon-Varner'}, {'name': 'Eldon-Estermont'}, {'name': 'Bryce-Caron'}, {'name': 'Renly-Baratheon'}, {'name': 'Tywin-Lannister'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Kurz'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Robb-Stark'}, {'name': 'Renly-Baratheon'}, {'name': 'Bryce-Caron'}, {'name': 'Eldon-Estermont'}, {'name': 'Steffon-Varner'}]",8
"[{'name': 'Murch'}, {'name': 'Gariss'}, {'name': 'Aggar'}, {'name': 'Ramsay-Snow'}, {'name': 'Bran-Stark'}, {'name': 'Arya-Stark'}, {'name': 'Gendry'}, {'name': 'Cutjack'}, {'name': 'Tarber'}]",8

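To make the "longest shortest path" definition concrete, here is a minimal pure-Python sketch that computes the diameter of a small unweighted, undirected graph with one breadth-first search per node. The toy graph and the function names are assumptions for illustration; this is not what `shortestPath` does internally.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop counts from `start` to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(adj):
    """Length of the longest shortest path over all reachable pairs."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Hypothetical toy graph: a path a-b-c-d with an extra branch c-e.
toy = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
print(diameter(toy))
```

The Cypher query above uses `id(a) > id(b)` so that each unordered pair is examined once; the sketch simply searches from every node, which gives the same maximum.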

--- 

### Centrality Measures

<center><img src="img/measures.png" align="center"/></center>

--- 

### Degree Centrality

> The Degree Centrality algorithm **measures the number of incoming and outgoing relationships** from a node, and helps us find the **most popular nodes** in a graph.

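The following query finds the most popular characters in the 1st book. Because it sets `weightProperty`, each character's score is the sum of the weights of their interactions rather than a plain count of relationships: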

In [14]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.degree.stream("Character", "INTERACTS1", {
  direction: "BOTH",
  weightProperty: "weight"
})
YIELD nodeId, score
RETURN algo.asNode(nodeId).name AS personagem, score
ORDER BY score DESC
LIMIT 10

10 rows affected.


personagem,score
Eddard-Stark,1284.0
Robert-Baratheon,941.0
Jon-Snow,784.0
Tyrion-Lannister,650.0
Sansa-Stark,545.0
Bran-Stark,531.0
Catelyn-Stark,520.0
Robb-Stark,516.0
Daenerys-Targaryen,443.0
Arya-Stark,430.0
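As a rough illustration of what a weighted degree score with `direction: "BOTH"` amounts to, the sketch below sums edge weights at both endpoints of every edge. The edge list and the `weighted_degree` helper are hypothetical, and the sketch makes no claim about how `algo.degree` is implemented.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (both directions)."""
    score = defaultdict(float)
    for src, tgt, weight in edges:
        score[src] += weight  # outgoing side
        score[tgt] += weight  # incoming side
    return dict(score)


# Made-up edges: (source, target, weight)
toy_edges = [("A", "B", 5), ("A", "C", 2), ("C", "B", 1)]
scores = weighted_degree(toy_edges)
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, value)
```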


---

### Betweenness Centrality 

<center><img src="img/betweenness-centrality.png" align="center"/></center>

> Betweenness centrality **identifies nodes that are strategically positioned in the network**, meaning that information will often travel through that person.  
Such an intermediary position gives that person **power and influence**.

Betweenness centrality is a raw count of the number of shortest paths that go through a given node. For example, if a node is located on a bottleneck between two large communities, then it will have high betweenness.

In [20]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.betweenness.stream("Character", "INTERACTS1", {
  direction: "BOTH"
})
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).name as personagem, centrality
ORDER BY centrality DESC
LIMIT 10

10 rows affected.


personagem,centrality
Eddard-Stark,4638.534951255039
Robert-Baratheon,3682.391035767813
Tyrion-Lannister,3272.606015526035
Jon-Snow,2952.057281565677
Catelyn-Stark,2604.7556467555924
Daenerys-Targaryen,1484.2780232288708
Robb-Stark,1255.6896562838226
Drogo,1115.0946392450378
Bran-Stark,960.0319135675136
Sansa-Stark,639.0769144474225


In [27]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.betweenness("Character", "INTERACTS1", {direction: "BOTH", writeProperty: "book1BetweennessCentrality"})

1 rows affected.


loadMillis,computeMillis,writeMillis,nodes,minCentrality,maxCentrality,sumCentrality
3,6,4,796,-1.0,-1.0,-1.0
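The bottleneck intuition can be checked on a toy graph. The sketch below follows Brandes' algorithm for unweighted, undirected graphs; the graph is made up, and the final halving (each unordered pair is visited from both ends) is a convention chosen here, so absolute values may differ from what `algo.betweenness` reports.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness for an unweighted, undirected graph."""
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical bottleneck: clusters {a, b} and {d, e} joined only through c.
toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
print(betweenness(toy))
```

Only `c` gets a non-zero score, because every shortest path between the two clusters passes through it.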


--- 

### Closeness Centrality

Closeness centrality is a **way of detecting nodes that are able to spread information very efficiently through a graph.**  
The closeness centrality of a node **is the inverse of its average distance (farness) to all other nodes**. 

**Nodes with a high closeness score have the shortest distances to all other nodes.**

In [21]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.closeness.stream("Character", "INTERACTS1", {
  direction: "BOTH"
})
YIELD nodeId, centrality
RETURN algo.asNode(nodeId).name  as personagem, centrality
ORDER BY centrality DESC
LIMIT 10

10 rows affected.


personagem,centrality
Eddard-Stark,0.5636363636363636
Robert-Baratheon,0.5454545454545454
Tyrion-Lannister,0.510989010989011
Catelyn-Stark,0.5054347826086957
Robb-Stark,0.4973262032085561
Jon-Snow,0.493368700265252
Sansa-Stark,0.4894736842105263
Bran-Stark,0.4869109947643979
Cersei-Lannister,0.484375
Joffrey-Baratheon,0.4806201550387597
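The scores above are consistent with the formula (number of reachable nodes − 1) / (sum of shortest-path distances): for example 186 / 330 ≈ 0.5636 matches the top score, given the 187 characters loaded for book 1. This reading is an inference from the output, not a statement about the library's internals. A minimal sketch under that assumption, on a made-up star graph:

```python
from collections import deque


def closeness(adj, node):
    """(reachable nodes - 1) / (sum of hop distances from `node`)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in adj[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical star graph: centre c connected to leaves a, b and d.
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
print(closeness(star, "c"), closeness(star, "a"))
```

The centre is one hop from everyone and gets the maximum score; each leaf needs two hops to reach the other leaves and scores lower.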


--- 

### PageRank

<center><img src="img/pagerank.svg.png" width="360px" height="360px" align="center"/></center>

> PageRank captures how effectively you are taking advantage of your network contacts. 

In our context, PageRank centrality nicely captures narrative tension.   
Indeed, major developments occur when two important characters interact.

In [26]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.pageRank.stream("Character", "INTERACTS1", {iterations:20, dampingFactor:0.85})
YIELD nodeId, score
RETURN algo.asNode(nodeId).name  as personagem, score
ORDER BY score DESC
LIMIT 10

10 rows affected.


personagem,score
Tyrion-Lannister,4.369831129789961
Varys,3.544865461588527
Tywin-Lannister,2.984199425279815
Robert-Baratheon,2.07448304103602
Sansa-Stark,1.9331457486114971
Walder-Frey,1.883857900665032
Robb-Stark,1.3012971131399649
Willis-Wode,1.209997296821769
Jon-Snow,1.1871823732254736
Vardis-Egen,1.1814906569102148


In [29]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.pageRank("Character", "INTERACTS1", {direction: "BOTH", writeProperty:'book1PageRank'})

1 rows affected.


nodes,iterations,loadMillis,computeMillis,writeMillis,dampingFactor,write,writeProperty
796,20,3,23,3,0.85,True,book1PageRank
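For intuition, PageRank can be sketched as a simple power iteration. The version below assumes the non-normalised update `score = (1 - d) + d * sum(score(u) / out_degree(u))` with every node starting at 1.0; scores above 1 in the output are consistent with such a variant, but the exact formula used by `algo.pageRank` is an assumption here. The toy graph is made up.

```python
def pagerank(out_links, damping=0.85, iterations=20):
    """Power iteration over a dict of node -> list of outgoing targets."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, targets in out_links.items():
            if targets:
                share = score[src] / len(targets)  # split score over out-links
                for tgt in targets:
                    incoming[tgt] += share
        score = {n: (1 - damping) + damping * incoming[n] for n in nodes}
    return score


# Hypothetical directed graph: a and b both point to c, c points back to a.
toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
for name, value in sorted(pagerank(toy).items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(value, 3))
```

Node `b` has no incoming links, so it settles at the baseline `1 - damping`, while `c` collects score from both other nodes.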


---

### Community Detection

<center><img src="img/community.png" width="360px" height="360px" align="center"/></center>

In [33]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.labelPropagation(
  'MATCH (c:Character) RETURN id(c) as id',
  'MATCH (c:Character)-[rel]-(c2) RETURN id(c) as source, id(c2) as target, SUM(rel.weight) as weight',
  {graph:'cypher', partitionProperty: 'community'})

1 rows affected.


loadMillis,computeMillis,writeMillis,postProcessingMillis,nodes,communityCount,iterations,didConverge,p1,p5,p10,p25,p50,p75,p90,p95,p99,p100,weightProperty,write,partitionProperty,writeProperty
152,8,2,2,796,148,1,False,1,1,1,1,2,3,9,28,85,98,weight,True,community,community
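Note that the output above reports `iterations` = 1 and `didConverge` = False, so the communities are a snapshot after a single pass. The idea behind label propagation can be sketched as follows: every node starts with its own label and repeatedly adopts the label carrying the most weight among its neighbours. The deterministic update order, the tie-breaking rule (smallest label wins) and the toy graph are choices made for this sketch, not the behaviour of `algo.labelPropagation`.

```python
from collections import defaultdict


def label_propagation(adj, max_iterations=10):
    """adj maps node -> {neighbour: weight}; returns node -> community label."""
    labels = {node: node for node in adj}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adj):
            votes = defaultdict(float)
            for neighbour, weight in adj[node].items():
                votes[labels[neighbour]] += weight
            if not votes:
                continue  # isolated node keeps its own label
            # Heaviest label wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # converged: a full pass changed nothing
    return labels


# Hypothetical graph: two tight triangles joined by one weak edge (2-3).
toy = {
    0: {1: 5, 2: 5},
    1: {0: 5, 2: 5},
    2: {0: 5, 1: 5, 3: 1},
    3: {2: 1, 4: 5, 5: 5},
    4: {3: 5, 5: 5},
    5: {3: 5, 4: 5},
}
print(label_propagation(toy))
```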


### Querying Communities

In [43]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character)
WHERE exists(c.community)
RETURN c.community as community, count(c) AS count
ORDER BY count DESC limit 10

10 rows affected.


community,count
336,98
340,85
218,72
232,67
331,38
269,33
333,30
328,28
520,18
242,13


> It’d be good to know who are the influential people in each community. To do that we’ll need to calculate a PageRank score for each character across all the books:

In [35]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
CALL algo.pageRank('MATCH (c:Character) RETURN id(c) as id', 'MATCH (c:Character)-[rel]-(c2) RETURN id(c) as source, id(c2) as target, SUM(rel.weight) as weight', {graph:'cypher', writeProperty: 'pageRank'})

1 rows affected.


nodes,iterations,loadMillis,computeMillis,writeMillis,dampingFactor,write,writeProperty
796,20,55,10,2,0.85,True,pageRank


In [36]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character)
WHERE exists(c.community)
WITH c ORDER BY c.pageRank DESC
RETURN c.community as cluster, count(*) AS count, collect(c.name)[..10] as influencers
ORDER BY count DESC limit 10

10 rows affected.


cluster,count,influencers
336,98,"['Tyrion-Lannister', 'Jaime-Lannister', 'Cersei-Lannister', 'Joffrey-Baratheon', 'Robert-Baratheon', 'Tywin-Lannister', 'Petyr-Baelish', 'Renly-Baratheon', 'Sandor-Clegane', 'Varys']"
340,85,"['Jon-Snow', 'Samwell-Tarly', 'Mance-Rayder', 'Jeor-Mormont', 'Aemon-Targaryen-(Maester-Aemon)', 'Janos-Slynt', 'Eddison-Tollett', 'Tormund', 'Bowen-Marsh', 'Craster']"
218,72,"['Catelyn-Stark', 'Robb-Stark', 'Bran-Stark', 'Rodrik-Cassel', 'Roose-Bolton', 'Edmure-Tully', 'Luwin', 'Walder-Frey', 'Brynden-Tully', 'Rickon-Stark']"
232,67,"['Daenerys-Targaryen', 'Barristan-Selmy', 'Jorah-Mormont', 'Hizdahr-zo-Loraq', 'Quentyn-Martell', 'Drogo', 'Daario-Naharis', 'Rhaegar-Targaryen', 'Irri', 'Viserys-Targaryen']"
331,38,"['Stannis-Baratheon', 'Davos-Seaworth', 'Melisandre', 'Selyse-Florent', 'Wyman-Manderly', 'Shireen-Baratheon', 'Axell-Florent', 'Cressen', 'Godry-Farring', 'Narbert-Grandison']"
269,33,"['Brienne-of-Tarth', 'Addam-Marbrand', 'Vargo-Hoat', 'Sybell-Spicer', 'Hyle-Hunt', 'Cleos-Frey', 'Shagwell', 'Dick-Crabb', 'Meribald', 'Walton']"
333,30,"['Theon-Greyjoy', 'Asha-Greyjoy', 'Ramsay-Snow', 'Balon-Greyjoy', 'Lorren', 'Aggar', 'Sour-Alyn', 'Wex-Pyke', 'Skinner', 'Barbrey-Dustin']"
328,28,"['Arya-Stark', 'Gendry', 'Hot-Pie', 'Harwin', 'Rorge', 'Tom-of-Sevenstreams', 'Weese', 'Lommy-Greenhands', 'Jaqen-Hghar', 'Lothor-Brune']"
520,18,"['Victarion-Greyjoy', 'Euron-Greyjoy', 'Rodrik-Harlaw', 'Moqorro', 'Nute', 'Hotho-Harlaw', 'Baelor-Blacktyde', 'Dunstan-Drumm', 'Ralf-Stonehouse', 'Humfrey-Hewett']"
242,13,"['Jory-Cassel', 'Jon-Arryn', 'Brandon-Stark', 'Tomard', 'Alyn', 'Lyanna-Stark', 'Vayon-Poole', 'Desmond', 'Willam-Dustin', 'Jacks']"
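The query above sorts characters by PageRank, groups them by community, and keeps the first ten names of each group. The same shape of computation in plain Python, on made-up records with a hypothetical `top_influencers` helper, might look like this:

```python
from collections import defaultdict


def top_influencers(characters, top_n=10, max_clusters=10):
    """Return (community, size, top names) for the largest communities."""
    by_community = defaultdict(list)
    # Sort first so that each community's list is already in PageRank order.
    for c in sorted(characters, key=lambda c: c["pageRank"], reverse=True):
        by_community[c["community"]].append(c["name"])
    ranked = sorted(by_community.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(cid, len(names), names[:top_n]) for cid, names in ranked[:max_clusters]]


# Made-up records standing in for Character nodes.
toy = [
    {"name": "A", "community": 1, "pageRank": 0.5},
    {"name": "B", "community": 1, "pageRank": 2.0},
    {"name": "C", "community": 2, "pageRank": 1.0},
]
print(top_influencers(toy))
```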


### Intra-community PageRank

> We can also calculate the PageRank within communities.  
Run the following query to calculate the PageRank for the second-largest community:

In [37]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character) WHERE EXISTS(c.community)
WITH c.community AS communityId, COUNT(*) AS count
ORDER BY count DESC
SKIP 1 LIMIT 1
CALL apoc.cypher.doIt(
  "CALL algo.pageRank(
    'MATCH (c:Character) WHERE c.community =" + communityId + " RETURN id(c) as id',
    'MATCH (c:Character)-[rel]->(c2) WHERE c.community =" + communityId + " AND c2.community =" + communityId + " RETURN id(c) as source,id(c2) as target, sum(rel.weight) as weight',
    {graph:'cypher', writeProperty: 'communityPageRank'}) YIELD nodes RETURN count(*)", {})
YIELD value
RETURN value

1 rows affected.


value
{'count(*)': 1}


In [40]:
%%cypher http://neo4j:dextra2020@172.19.0.3:7474/db/data
MATCH (c:Character) WHERE exists(c.community)
WITH c.community AS communityId, COUNT(*) AS count
ORDER BY count DESC
SKIP 1 LIMIT 1
MATCH (c:Character) WHERE c.community = communityId
RETURN c.name as personagem, c.communityPageRank as communityPageRank
ORDER BY c.communityPageRank DESC
LIMIT 10

10 rows affected.


personagem,communityPageRank
Jon-Snow,1.883386923642025
Samwell-Tarly,1.5686776117596155
Ygritte,1.4361659325170315
Spare-Boot,0.9189390271020416
Weeper,0.9016901271414542
Small-Paul,0.8404659449293452
Tormund,0.8194725042838826
Varamyr,0.6442103193062304
Xhondo,0.5944586642742257
Theobald,0.5944586642742257
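The key step in the intra-community query is that both endpoints of a relationship must belong to the chosen community, so PageRank only sees the induced subgraph. A minimal sketch of that filter, with hypothetical data and a hypothetical `induced_edges` helper:

```python
def induced_edges(edges, community_of, community_id):
    """Keep only edges whose source AND target are in `community_id`."""
    return [
        (src, tgt, weight)
        for src, tgt, weight in edges
        if community_of.get(src) == community_id
        and community_of.get(tgt) == community_id
    ]


# Made-up edges and community assignments.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3)]
community_of = {"a": 1, "b": 1, "c": 2, "d": 2}
print(induced_edges(edges, community_of, 2))
```

The edge `b`-`c` crosses the community boundary and is dropped, so it cannot pass score into or out of the community.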


### References

[Exploratory Data Analysis](http://guides.neo4j.com/data_science/01_eda.html)  
[Graph Algorithms](https://guides.neo4j.com/sandbox/graph-algorithms/)  
[A Song of Ice and Fire - Dataset](https://github.com/mathbeveridge/asoiaf/tree/master/data)  
[The Neo4j Graph Algorithms User Guide v3.5](https://neo4j.com/docs/graph-algorithms/current/)  
[Personalized Product Recommendations with Neo4j](http://guides.neo4j.com/sandbox/recommendations)  
[cypher-refcard](https://neo4j.com/docs/cypher-refcard/current/?ref=browser-guide)  
[Papers With Code](https://paperswithcode.com/area/graphs)  

---

Play on Neo4j: `:play http://guides.neo4j.com/sandbox/legis-graph`