# Neo4j Lab: Character Relationship Analysis of *Game of Thrones*

**Data source:** *Network of Thrones*, Andrew Beveridge and Jie Shan
- https://www.maa.org/sites/default/files/pdf/Mathhorizons/NetworkofThrones.pdf
- stormofswords.csv

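**Data model:** (:Character {name})-[:INTERACTS {weight}]->(:Character {name})
- Nodes labeled Character represent characters in the novel
- The directed relationship type INTERACTS indicates that two characters have interacted
- The node property name stores the character's name
- The number of interactions between two characters is stored as the relationship property weight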

## Outline

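1. Import the data and print the number of records. Also create a uniqueness constraint so that each word node's name is unique.
2. Visualize a subset of the data.
3. Count the word nodes in the knowledge graph.
4. For each word, count the other words related to it; report the minimum, maximum, mean, and standard deviation.
5. Find the network diameter path and its length.
6. Pick two word nodes and find one shortest path and all shortest paths between them.
7. Find the pivotal nodes in the network and verify them visually.
8. Analyze the centrality of the word nodes (degree, betweenness, and closeness centrality).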

In [1]:
!pip install py2neo

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple


In [2]:
from py2neo import Graph

# Connect to the Neo4j database: server URL, username, password
graph = Graph('http://localhost:7474', auth=('neo4j', 'neo4j'))

### 1. Import the data and print the number of records. Also create a uniqueness constraint so that each word node's name is unique.

In [3]:
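# Delete any existing graph so that we start from an empty database
graph.run('MATCH (n) DETACH DELETE n')
# Drop the constraint left by a previous run (Neo4j 4.x syntax; this raises an error if the constraint does not exist yet)
graph.run('DROP CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE')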

In [4]:
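# Create a uniqueness constraint on Character.name so that every name appears on only one node, keeping the schema consistent
graph.run('CREATE CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE')
# Creating the constraint also creates a backing index, which speeds up lookups by character name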

In [5]:
# Load the locally downloaded CSV file into Neo4j
# MERGE matches the pattern if it already exists and creates it otherwise
for record in graph.run('''
LOAD CSV WITH HEADERS FROM "file:/wordnet-valid.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS]->(tgt)
SET r.weight = toInteger(row.Weight)
RETURN count(*) AS paths_created
'''):
    print(record)

5000
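To make the `LOAD CSV` + `MERGE` semantics concrete, here is a minimal plain-Python sketch, assuming a CSV with the `Source,Target,Weight` header used in the Cypher above. The rows are made up for illustration and `load_interactions` is a hypothetical helper. Repeated rows collapse onto the same node and relationship, a missing `Weight` column leaves the weight as `None` (like `toInteger(null)`), and the row counter corresponds to `count(*)`, which counts input rows rather than distinct relationships.

```python
import csv
import io

# Made-up rows in the Source,Target,Weight layout; the duplicate row is intentional
SAMPLE_CSV = """Source,Target,Weight
Aemon,Grenn,5
Aemon,Samwell,31
Aemon,Grenn,5
"""


def load_interactions(text):
    """Mimic LOAD CSV WITH HEADERS + MERGE on a CSV string (hypothetical helper)."""
    characters = set()    # one entry per MERGE (:Character {name})
    interactions = {}     # one entry per MERGE (src)-[:INTERACTS]->(tgt)
    rows = 0              # what RETURN count(*) reports
    for row in csv.DictReader(io.StringIO(text)):
        rows += 1
        src, tgt = row["Source"], row["Target"]
        characters.update((src, tgt))
        raw = row.get("Weight")  # None when the column is absent
        interactions[(src, tgt)] = int(raw) if raw not in (None, "") else None
    return rows, characters, interactions


rows, characters, interactions = load_interactions(SAMPLE_CSV)
print(rows, len(characters), len(interactions))  # 3 3 2
```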


### 2. Visualize a subset of the data.

In [6]:
# Show a sample of the graph: the first 30 INTERACTS paths and their weights
for record in graph.run('''
MATCH p=(:Character)-[r:INTERACTS]-(:Character)
RETURN p, r.weight LIMIT 30
'''):
    print(record['p'], 'weight=', record['r.weight'])

(2174461)-[:INTERACTS {}]->(02176268) weight= None
(5074057)-[:INTERACTS {}]->(02310895) weight= None
(8390511)-[:INTERACTS {}]->(08198398) weight= None
(8390511)-[:INTERACTS {}]->(08199025) weight= None
(2045024)-[:INTERACTS {}]->(02046321) weight= None
(4758181)-[:INTERACTS {}]->(04757864) weight= None
(9419536)-[:INTERACTS {}]->(09411430) weight= None
(12165384)-[:INTERACTS {}]->(12163824) weight= None
(9384921)-[:INTERACTS {}]->(08853741) weight= None
(4881998)-[:INTERACTS {}]->(04881829) weight= None
(4881998)-[:INTERACTS {}]->(01299888) weight= None
(612652)-[:INTERACTS {}]->(01004072) weight= None
(2400139)-[:INTERACTS {}]->(02422860) weight= None
(2400139)-[:INTERACTS {}]->(02426339) weight= None
(2400139)-[:INTERACTS {}]->(02420389) weight= None
(5846932)-[:INTERACTS {}]->(01633173) weight= None
(8847694)-[:INTERACTS {}]->(08975106) weight= None
(12815925)-[:INTERACTS {}]->(12822650) weight= None
(1036194)-[:INTERACTS {}]->(01035853) weight= None
(15214840)-[:INTERACTS {}]->(1

### 3. Count the word nodes in the knowledge graph.

In [7]:
# Number of characters
for record in graph.run('''
MATCH (c:Character) 
RETURN count(c)
'''):
    print(record)

8428


### 4. For each word, count the other words related to it; report the minimum, maximum, mean, and standard deviation.

In [8]:
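# Count how many other characters each character interacts with (only outgoing INTERACTS relationships are counted, so characters without outgoing relationships are not included)
# Report the minimum, maximum, mean, and standard deviation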
for record in graph.run('''
MATCH (c:Character)-[:INTERACTS]->()
WITH c, count(*) AS num
RETURN min(num) AS min, max(num) AS max, avg(num) AS avg, stdev(num) AS stdev
'''):
    print('min=', record['min'], 'max=', record['max'], 'avg=', record['avg'], 'stdev=', record['stdev'])

min= 1 max= 15 avg= 1.1731581417175057 stdev= 0.7948746418005416
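The same statistics can be reproduced outside Neo4j. The sketch below assumes the relationships are available as a list of `(source, target)` pairs (a hypothetical input format). It counts outgoing relationships per source node, so, as in the Cypher query, nodes with no outgoing edge never enter the statistics. `statistics.stdev` is used because Cypher's `stdev()` is the sample standard deviation (`stdevp()` is the population one).

```python
import statistics
from collections import Counter


def out_degree_stats(edges):
    """Min/max/mean/sample stdev of out-degree, mirroring
    MATCH (c)-[:INTERACTS]->() WITH c, count(*) AS num."""
    out_degree = Counter(src for src, _ in edges)  # only nodes with outgoing edges appear
    values = list(out_degree.values())
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample stdev, like Cypher stdev()
    }


print(out_degree_stats([("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]))
```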


### 5. Find the network diameter path and its length.

In [9]:
# Network diameter: the longest of all shortest paths between two nodes (id(a) > id(b) visits each pair only once)
for record in graph.run('''
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p=shortestPath((a)-[:INTERACTS*]-(b))
RETURN length(p) AS len, [x IN nodes(p) | x.name] AS path
ORDER BY len DESC LIMIT 20
'''):
    print('len=', record['len'], 'path=', record['path'])

len= 10 path= ['02429810', '792471', '01629958', '1617192', '00015388', '1313093', '07940552', '12992464', '11590783', '13063784', '13063046']
len= 9 path= ['02429810', '792471', '01629958', '1617192', '00015388', '1313093', '07940552', '12992464', '11590783', '12974457']
len= 9 path= ['13031690', '11590783', '12992464', '07940552', '1313093', '00015388', '1617192', '01629958', '792471', '02429810']
len= 9 path= ['13046512', '11590783', '12992464', '07940552', '1313093', '00015388', '1617192', '01629958', '792471', '02429810']
len= 9 path= ['02429810', '792471', '01629958', '1617192', '00015388', '1313093', '07940552', '12992464', '11590783', '13015040']
len= 9 path= ['02429810', '792471', '01629958', '1617192', '00015388', '1313093', '07940552', '12992464', '11590783', '12964130']
len= 9 path= ['02429810', '792471', '01629958', '1617192', '00015388', '1313093', '07940552', '12992464', '11590783', '13063784']
len= 9 path= ['13063046', '13063784', '11590783', '12992464', '07940552', '13
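Conceptually, the diameter query takes the shortest-path distance of every pair and keeps the largest. A minimal sketch, assuming an unweighted list of `(source, target)` pairs treated as undirected (as the `-[:INTERACTS*]-` pattern does): run a breadth-first search from every node and keep the largest distance found. Pairs in different components have no path and are simply skipped, just as `shortestPath` returns no row for them.

```python
from collections import deque


def build_undirected(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(edges):
    """Largest shortest-path length over all connected pairs."""
    adj = build_undirected(edges)
    return max(max(bfs_distances(adj, node).values()) for node in adj)


print(diameter([("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]))  # 3
```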

### 6. Pick two word nodes and find one shortest path and all shortest paths between them.

In [10]:
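# One shortest path between two characters: the shortestPath function
# The pattern needs the colon, [:INTERACTS*]; without it, INTERACTS is only a variable name and any relationship type matches
for record in graph.run('''
MATCH (catelyn:Character {name:"Catelyn"}), (drogo:Character {name:"Drogo"})
MATCH p=shortestPath((catelyn)-[:INTERACTS*]-(drogo))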
RETURN p
'''):
    print(record['p'])

In [11]:
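# All shortest paths between two characters: the allShortestPaths function
# As above, [:INTERACTS*] restricts the traversal to INTERACTS relationships
for record in graph.run('''
MATCH (catelyn:Character {name:"Catelyn"}), (drogo:Character {name:"Drogo"})
MATCH p=allShortestPaths((catelyn)-[:INTERACTS*]-(drogo))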
RETURN p
'''):
    print(record['p'])
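The difference between `shortestPath` and `allShortestPaths` can be illustrated with a breadth-first search that records, for every node, each predecessor lying on a shortest path. This is a sketch on a hypothetical undirected edge list, not how Neo4j implements either function: `all_shortest_paths` walks the predecessor lists back from the target to enumerate every shortest path, and any single element of its result is a valid answer for `shortestPath`.

```python
from collections import deque


def all_shortest_paths(edges, start, goal):
    """Every shortest path from start to goal in an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist, preds = {start: 0}, {start: []}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:   # another shortest way into v
                preds[v].append(u)
    if goal not in dist:
        return []                          # no path, like an empty result in Cypher
    paths = []

    def backtrack(node, suffix):
        if node == start:
            paths.append([start] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(goal, [])
    return paths


diamond = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")]
print(sorted(all_shortest_paths(diamond, "a", "d")))  # [['a', 'b', 'd'], ['a', 'c', 'd']]
```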

### 7. Find the pivotal nodes in the network and verify them visually.

In [12]:
# Pivotal nodes
# Definition: a node is pivotal for two other nodes if it lies on every shortest path between them

# collect() gathers all the values into a single list
for record in graph.run('''
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p=allShortestPaths((a)-[:INTERACTS*]-(b)) WITH collect(p) AS paths, a, b  
MATCH (c:Character) WHERE all(x IN paths WHERE c IN nodes(x)) AND NOT c IN [a,b]
RETURN a.name, b.name, c.name AS PivotalNode SKIP 300 LIMIT 30
'''):
    print('a.name:', record['a.name'], 'b.name:', record['b.name'], 'PivotalNode:', record['PivotalNode'])

a.name: 12376740 b.name: 12602262 PivotalNode: 13118707
a.name: 685638 b.name: 698732 PivotalNode: 00697589
a.name: 8774374 b.name: 9151216 PivotalNode: 08524735
a.name: 8774374 b.name: 8899149 PivotalNode: 08524735
a.name: 114837 b.name: 2641378 PivotalNode: 06084469
a.name: 5502556 b.name: 5480794 PivotalNode: 05481095
a.name: 8038379 b.name: 8016385 PivotalNode: 00759694
a.name: 8038379 b.name: 8031386 PivotalNode: 00759694
a.name: 8715110 b.name: 8980300 PivotalNode: 08981244
a.name: 8715110 b.name: 8544813 PivotalNode: 08981244
a.name: 278221 b.name: 717748 PivotalNode: 01374767
a.name: 13518963 b.name: 6128570 PivotalNode: 13555915
a.name: 13518963 b.name: 6128570 PivotalNode: 06125041
a.name: 1886220 b.name: 1936219 PivotalNode: 08103777
a.name: 10201535 b.name: 3318438 PivotalNode: 00839526
a.name: 12642734 b.name: 12602262 PivotalNode: 13118707
a.name: 12642734 b.name: 12376740 PivotalNode: 13118707
a.name: 139729 b.name: 218475 PivotalNode: 00126264
a.name: 12697883 b.name: 1
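Collecting every shortest path for every pair is expensive. An equivalent test, sketched below on a hypothetical undirected edge list, uses path counts instead: a node `c` lies on every shortest path between `a` and `b` exactly when `d(a,c) + d(c,b) = d(a,b)` and the number of shortest paths through `c`, `σ(a,c)·σ(c,b)`, equals the total `σ(a,b)`. The counts `σ` come from a breadth-first search that accumulates the number of shortest paths reaching each node.

```python
from collections import deque


def bfs_counts(adj, source):
    """Distances and shortest-path counts (sigma) from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def pivotal_nodes(edges, a, b):
    """Nodes (other than a and b) lying on every shortest a-b path."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    dist_a, sigma_a = bfs_counts(adj, a)
    dist_b, sigma_b = bfs_counts(adj, b)
    if b not in dist_a:
        return set()
    return {
        c for c in adj
        if c not in (a, b) and c in dist_a and c in dist_b
        and dist_a[c] + dist_b[c] == dist_a[b]        # c is on some shortest path
        and sigma_a[c] * sigma_b[c] == sigma_a[b]     # ...and on all of them
    }


print(pivotal_nodes([("a", "x"), ("x", "b"), ("x", "y"), ("y", "b")], "a", "b"))  # {'x'}
```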

In [13]:
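# If Robert is reported as a pivotal node for Jojen and Daenerys, every shortest path between Jojen and Daenerys passes through Robert
# List all shortest paths between Jojen and Daenerys to verify this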
for record in graph.run('''
MATCH (jojen:Character {name:"Jojen"}), (daenerys:Character {name:"Daenerys"})
MATCH p=allShortestPaths((jojen)-[:INTERACTS*]-(daenerys))
RETURN p
'''):
    print(record['p'])

### 8. Analyze the centrality of the word nodes (degree, betweenness, and closeness centrality).

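**Centrality Measures**

Centrality measures give a relative measure of how important a node is within the network.

There are many different ways to measure centrality, and each one captures a different kind of "importance".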
- 度中心性(Degree Centrality)
- 加权度中心性(Weighted Degree Centrality)
- 介数中心性(Betweenness Centrality)
- 紧度中心性(Closeness Centrality)

In [14]:
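# Degree centrality
# Degree centrality is the simplest measure: the number of connections a node has in the network

# Here, a character's degree centrality is the number of other characters it has interacted with (INTERACTS relationships in either direction)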
for record in graph.run('''
MATCH (c:Character)-[:INTERACTS]-()
RETURN c.name AS character, count(*) AS degree 
ORDER BY degree DESC
'''):
    print(record['character'], record['degree'])

11585340 23
12205694 21
08860123 21
00007846 19
11579418 17
11556857 17
8860123 15
8524735 15
8441203 14
08524735 14
11575425 13
7846 13
1507175 13
11567411 12
08199025 12
1864707 11
11911591 11
13104059 11
13112664 11
01507175 11
8199025 10
1342529 10
01342529 10
08441203 10
00126264 10
1759182 9
12501745 9
06845599 9
08766988 8
00759694 8
146138 7
13920835 7
586262 7
759694 7
13518963 7
11590783 7
08392137 7
06084469 7
1862557 6
8574314 6
15113229 6
1762525 6
10467395 6
10630188 6
12838027 6
09044862 6
01759182 6
08103777 6
01864707 6
06295235 6
01762525 6
00015388 6
6084469 5
1312096 5
8392137 5
1831531 5
1432517 5
11744859 5
9641757 5
1657723 5
9977660 5
13121544 5
6845599 5
13526110 5
11592146 5
6295235 5
9765278 5
11545714 5
01432517 5
00586262 5
00243918 5
6128570 4
9411430 4
2958343 4
8633957 4
8103777 4
2327200 4
9044862 4
13167078 4
12226322 4
2566528 4
2553196 4
10677713 4
13100677 4
6037666 4
1429349 4
13460568 4
11554175 4
126264 4
17222 4
471613 4
10197967 4
10453533 4
11
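Counting degree by hand is a single pass over the relationships. A minimal sketch on a hypothetical `(source, target)` edge list: each relationship adds one to both endpoints, which is what the direction-agnostic pattern `(c)-[:INTERACTS]-()` counts.

```python
from collections import Counter


def degree_centrality(edges):
    """Number of relationships touching each node, ignoring direction."""
    degree = Counter()
    for src, tgt in edges:
        degree[src] += 1
        degree[tgt] += 1
    return degree.most_common()  # sorted by degree, highest first


print(degree_centrality([("a", "b"), ("a", "c"), ("d", "a")]))
```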

In [15]:
# Weighted degree centrality

# Sum the weight of every INTERACTS relationship of a character (sum() skips missing weights, so they contribute 0)
for record in graph.run('''
MATCH (c:Character)-[r:INTERACTS]-()
RETURN c.name AS character, sum(r.weight) AS weightedDegree
ORDER BY weightedDegree DESC
'''):
    print(record['character'], record['weightedDegree'])

2174461 0
5074057 0
8390511 0
2045024 0
4758181 0
9419536 0
12165384 0
9384921 0
4881998 0
612652 0
2400139 0
5846932 0
8847694 0
12815925 0
1036194 0
15214840 0
20090 0
7372959 0
11871916 0
12822284 0
2257141 0
5703429 0
9827683 0
13028611 0
8015731 0
264366 0
4188064 0
1862557 0
9929577 0
1652850 0
5922949 0
1949435 0
2118242 0
698732 0
11651259 0
12490671 0
1909397 0
2482425 0
2466670 0
12506784 0
6636806 0
15196746 0
6734467 0
3608870 0
7371293 0
1689226 0
6791372 0
2138766 0
3335600 0
5051896 0
858742 0
7950418 0
1575675 0
7439883 0
5248667 0
1247413 0
877083 0
14828683 0
13150741 0
8910668 0
1864707 0
13716084 0
9319456 0
7261300 0
649887 0
15158816 0
6018465 0
9444100 0
9075842 0
7283608 0
1201089 0
1675963 0
2468261 0
2616251 0
4565375 0
1930874 0
12626030 0
2290340 0
8143163 0
775156 0
189669 0
3574816 0
12532886 0
9151216 0
2147824 0
489837 0
9767197 0
11575425 0
14647235 0
2280845 0
7436986 0
1275389 0
9403734 0
629738 0
155797 0
10561861 0
10170989 0
12489815 0
5486510 0
23
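The weighted version sums weights instead of counting relationships. In this sketch (hypothetical `(source, target, weight)` triples with made-up values), a `None` weight contributes 0, mirroring how Cypher's `sum()` skips nulls; that is why characters whose relationships carry no `weight` property end up with a weighted degree of 0.

```python
from collections import Counter


def weighted_degree(edges):
    """Sum of relationship weights touching each node, ignoring direction."""
    totals = Counter()
    for src, tgt, weight in edges:
        w = weight if weight is not None else 0  # missing weights add nothing
        totals[src] += w
        totals[tgt] += w
    return totals.most_common()


print(weighted_degree([("Aemon", "Grenn", 5), ("Aemon", "Samwell", 31), ("Grenn", "Samwell", None)]))
```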

In [16]:
# Drop the existing graph projection (the second argument, false, means: do not fail if it does not exist)
for record in graph.run('''
CALL gds.graph.drop('myGraph', false) YIELD graphName
'''):
    print(record)

'myGraph'


In [17]:
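# Betweenness centrality
# A node's betweenness centrality sums, over every pair of other nodes, the fraction of their shortest paths that pass through that node

# Use the Neo4j Graph Data Science (GDS) library
# First create a graph projection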
for record in graph.run('''
CALL gds.graph.project('myGraph', 'Character', 'INTERACTS')
'''):
    print(record)

{'Character': {'label': 'Character', 'properties': {}}}	{'INTERACTS': {'orientation': 'NATURAL', 'aggregation': 'DEFAULT', 'type': 'INTERACTS', 'properties': {}}}	'myGraph'	8428	5000	19


In [18]:
# Call the GDS betweenness procedure
for record in graph.run('''
CALL gds.betweenness.stream('myGraph') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
'''):
    print(record['name'], record['score'])

11585340 208.0
12205694 140.0
11556857 77.0
11579418 70.0
12697883 56.0
11575425 48.0
12694707 45.0
13112664 42.0
11567411 32.0
12684640 32.0
13104059 27.0
11911591 18.0
12521847 17.0
12570126 17.0
12690388 17.0
13518963 15.0
11747468 15.0
13083586 15.0
11590783 15.0
12226932 14.0
12501745 14.0
12476036 12.0
13920835 12.0
11592146 12.0
11556187 12.0
12050766 12.0
12387633 11.0
12226322 10.0
12411084 10.0
11744859 9.0
13167078 9.0
15113229 9.0
11571907 9.0
13206001 8.0
12200747 8.0
12838027 8.0
11545714 8.0
13016749 8.0
11704401 7.0
13121544 6.0
10630188 6.0
11555413 6.0
10177150 6.0
13016457 5.0
13205482 5.0
10467395 5.0
11703386 5.0
12480677 5.0
13178284 5.0
11529603 4.0
13063784 4.0
13467916 4.0
11092292 4.0
10197967 4.0
11562747 4.0
10560637 4.0
12626030 3.0
10677713 3.0
11766609 3.0
11554175 3.0
10488865 3.0
10428004 3.0
10453533 3.0
12612913 3.0
11707109 3.0
14299637 3.0
10088390 3.0
10840769 3.0
10650162 2.0
12930044 2.0
14778019 2.0
13460568 2.0
10890637 2.0
12423565 2.0
1308511
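The standard way to compute exact betweenness without enumerating paths is Brandes' algorithm: one breadth-first search per source counts shortest paths, then dependencies are accumulated backwards from the farthest nodes. The sketch below is a plain-Python version for an unweighted graph, assuming a hypothetical `(source, target)` edge list. With `directed=True` it follows relationship direction, like the `NATURAL` projection above. It illustrates the technique and is not GDS's implementation; for undirected graphs it halves the scores so each unordered pair counts once, a common convention that other tools may not share.

```python
from collections import deque


def betweenness(edges, directed=True):
    """Exact betweenness centrality via Brandes' algorithm (unweighted)."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set())
        if not directed:
            adj[t].add(s)
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Forward phase: BFS counting shortest paths (sigma) and predecessors
        dist = {source: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []  # nodes in non-decreasing distance from source
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies from the farthest nodes inward
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    if not directed:
        # Each unordered pair was counted once from each end
        score = {v: c / 2 for v, c in score.items()}
    return score


print(betweenness([("a", "b"), ("b", "c")]))  # b lies on the only a->c path
```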

In [19]:
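# Closeness centrality
# Closeness centrality is the reciprocal of a node's average distance to all other characters in the network

# Use the Neo4j Graph Data Science (GDS) library
# Reuse the graph projection created above

# Call the GDS closeness procedure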
for record in graph.run('''
CALL gds.beta.closeness.stream('myGraph') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
'''):
    print(record['name'], record['score'])

12626030 1.0
11565385 1.0
11529603 1.0
13166338 1.0
13308999 1.0
13796779 1.0
10719395 1.0
12200747 1.0
13582013 1.0
11911591 1.0
13063784 1.0
13920835 1.0
10650162 1.0
11567411 1.0
10257221 1.0
13104059 1.0
12930044 1.0
13278375 1.0
11579418 1.0
13368052 1.0
13016457 1.0
14285662 1.0
13329641 1.0
11665781 1.0
12226322 1.0
14857897 1.0
13367070 1.0
10140314 1.0
15113229 1.0
14287113 1.0
13205482 1.0
10677713 1.0
10001647 1.0
13440063 1.0
13125117 1.0
12521847 1.0
14778019 1.0
12387633 1.0
13121544 1.0
13460568 1.0
10890637 1.0
13518963 1.0
10201535 1.0
12293723 1.0
11554175 1.0
10407310 1.0
10058155 1.0
10467395 1.0
12423565 1.0
12287388 1.0
15266911 1.0
13085113 1.0
10630188 1.0
10287213 1.0
10488865 1.0
10741821 1.0
14004572 1.0
13129165 1.0
13534608 1.0
11692265 1.0
10409752 1.0
10404242 1.0
12570126 1.0
14034177 1.0
12143676 1.0
14465048 1.0
15094294 1.0
12838027 1.0
12737383 1.0
13192025 1.0
10197967 1.0
12132502 1.0
15101854 1.0
10453533 1.0
13809207 1.0
11562747 1.0
10207831 1.0
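Many nodes above score exactly 1.0. That is consistent with the average distance being taken only over the nodes each node can actually reach: in the directed projection, a node whose single outgoing relationship points at a node with no outgoing relationships reaches one node at distance 1. The sketch below implements that reachable-only variant on a hypothetical directed `(source, target)` edge list; treat it as an assumption about how GDS handles disconnected graphs, not as its documented formula.

```python
from collections import deque


def closeness(edges):
    """Reachable-only closeness: (#reachable nodes) / (sum of their distances)."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set())
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = [d for node, d in dist.items() if node != source]
        # Reciprocal of the average distance; nodes that reach nothing score 0
        scores[source] = len(reached) / sum(reached) if reached else 0.0
    return scores


print(closeness([("a", "b"), ("b", "c")]))
```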

## Using python-igraph

In [20]:
!pip install python-igraph

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple


In [21]:
from igraph import Graph as IGraph

In [22]:
# Build an igraph instance from Neo4j
# Pass the py2neo query result to igraph's TupleList constructor; weights=True reads the last column as the edge weight

query = '''
MATCH (c1:Character)-[r:INTERACTS]->(c2:Character)
RETURN c1.name, c2.name, r.weight AS weight
'''
ig = IGraph.TupleList(graph.run(query), weights=True)
print(ig)

IGRAPH UNW- 8428 5000 --
+ attr: name (v), weight (e)
+ edges (vertex names):
2174461--02176268, 5074057--02310895, 8390511--08198398, 8390511--08199025,
2045024--02046321, 4758181--04757864, 9419536--09411430, 12165384--12163824,
9384921--08853741, 4881998--04881829, 4881998--01299888, 612652--01004072,
2400139--02422860, 2400139--02426339, 2400139--02420389, 5846932--01633173,
8847694--08975106, 12815925--12822650, 1036194--01035853, 15214840--15199033,
20090--14616939, 20090--00021265, 20090--14925198, 7372959--00748282,
11871916--11872473, 12822284--11744859, 2257141--04652930, 5703429--02118476,
9827683--10353016, 13028611--13028337, 8015731--08941895, 264366--00264529,
4188064--08860123, 1862557--01873007, 1862557--02373601, 1862557--02366702,
1862557--02491590, 1862557--01889328, 1862557--02348405, 9929577--06136258,
1652850--01653223, 5922949--00931467, 1949435--01106587, 1949435--01953810,
2118242--05768553, 698732--00697589, 11651259--11656974, 12490671--12487647,
1909397--00

In [23]:
# PageRank, a variant of eigenvector centrality
# Run PageRank on the igraph instance, then write the results back to Neo4j as a pagerank property on each Character node

pg = ig.pagerank()
pgvs = []
# ig.vs: the graph's vertex sequence
for p in zip(ig.vs, pg):
    print(p)
    pgvs.append({'name':p[0]['name'], 'pg':p[1]})
print(pgvs)

(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 0, {'name': '2174461'}), 0.00011865211200764532)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 1, {'name': '02176268'}), 0.00011865211200755468)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 2, {'name': '5074057'}), 0.00011865211200764532)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 3, {'name': '02310895'}), 0.00011865211200755468)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 4, {'name': '8390511'}), 0.00013020732953936592)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 5, {'name': '08198398'}), 7.313593185537108e-05)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 6, {'name': '08199025'}), 0.0007093266446269644)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 7, {'name': '2045024'}), 0.00011865211200764532)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d8640>, 8, {'name': '02046321'}), 0.00011865211200755468)
(igraph.Vertex(<igraph.Graph object at 0xfffed52d864
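`ig.pagerank()` does the actual computation in igraph. To show what is being computed, here is a minimal power-iteration sketch of PageRank on an undirected, unweighted graph (the kind built by `TupleList` above), assuming damping factor 0.85 and a hypothetical `(source, target)` edge list. Each node keeps a teleport share `(1 - d)/N` and receives `d` times each neighbour's rank divided by that neighbour's degree, so the scores always sum to 1. For example, in a component of two nodes joined by one edge, each node's rank solves `r = (1 - d)/N + d·r`, giving `r = 1/N`; with N = 8428 that is about 0.000119, the value printed above for such pairs. The iteration count is fixed for simplicity instead of testing for convergence.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    rank = dict.fromkeys(adj, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(adj, (1.0 - damping) / n)  # teleport share
        # Every node comes from an edge, so it has at least one neighbour (no dangling nodes)
        for node, neighbours in adj.items():
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                new_rank[nb] += share
        rank = new_rank
    return rank


print(pagerank([("x", "1"), ("x", "2"), ("x", "3"), ("p", "q")]))
```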

In [24]:
# UNWIND iterates over the list parameter and sets the pagerank property on each matching node
write_pagerank_query = '''
UNWIND $nodes AS n
MATCH (c:Character) WHERE c.name = n.name
SET c.pagerank = n.pg
'''
graph.run(write_pagerank_query, nodes=pgvs)

In [25]:
# Query the nodes with the highest PageRank in the Neo4j graph
for record in graph.run('''
MATCH (n:Character)
RETURN n.name AS name, n.pagerank AS pagerank
ORDER BY pagerank DESC LIMIT 30
'''):
    print(record['name'], record['pagerank'])

11585340 0.0012619258875393414
08860123 0.001201053632151163
12205694 0.0011864066502595017
00007846 0.0010661464058013322
11579418 0.0009826455009177347
11556857 0.0009826455009177347
8524735 0.0008818738054616463
8860123 0.0008581045945945451
8441203 0.0008273579702149319
08524735 0.0008053797200665181
1507175 0.0007728421349682173
7846 0.0007371731619622118
11567411 0.0007183262997215029
08199025 0.0007093266446269644
11575425 0.0007017870401409824
1864707 0.0006638104644747885
11911591 0.0006545861268696379
01507175 0.0006545861268696301
13112664 0.0006396347103608002
1342529 0.0006092946292280744
8199025 0.0006092946292280744
08441203 0.000599803077990492
13104059 0.0005912167760622011
00126264 0.0005800632556140185
01342529 0.0005775554637477789
06845599 0.0005547787939813498
1759182 0.0005362453054679728
12501745 0.0005232593217925074
00759694 0.0005002629587346342
08766988 0.0004900498024854673


In [26]:
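# Community detection: find clusters of nodes in the graph
# igraph's random-walk-based walktrap algorithm finds communities of characters that interact frequently with each other but rarely with characters outside their community
# The result is then written to Neo4j, with each character's community stored as an integer

clusters = ig.community_walktrap().as_clustering()
# membership[i] is the community index of vertex i, in the same order as ig.vs
nodes = [{'name': v['name'], 'community': c} for v, c in zip(ig.vs, clusters.membership)]
print(nodes)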

[{'name': '2174461', 'community': 0}, {'name': '02176268', 'community': 0}, {'name': '5074057', 'community': 1}, {'name': '02310895', 'community': 1}, {'name': '8390511', 'community': 2}, {'name': '08198398', 'community': 2}, {'name': '08199025', 'community': 2}, {'name': '2045024', 'community': 3}, {'name': '02046321', 'community': 3}, {'name': '4758181', 'community': 4}, {'name': '04757864', 'community': 4}, {'name': '9419536', 'community': 5}, {'name': '09411430', 'community': 5}, {'name': '12165384', 'community': 6}, {'name': '12163824', 'community': 6}, {'name': '9384921', 'community': 7}, {'name': '08853741', 'community': 7}, {'name': '4881998', 'community': 8}, {'name': '04881829', 'community': 8}, {'name': '01299888', 'community': 8}, {'name': '612652', 'community': 9}, {'name': '01004072', 'community': 9}, {'name': '2400139', 'community': 10}, {'name': '02422860', 'community': 10}, {'name': '02426339', 'community': 10}, {'name': '02420389', 'community': 10}, {'name': '5846932'
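The membership list maps each vertex to a community index. Grouping it by community, which is what the Neo4j query with `collect(c.name)` does once the communities are written back, amounts to the small sketch below; `group_by_community` is a hypothetical helper, and the names and membership values in the example are made up.

```python
from collections import defaultdict


def group_by_community(names, membership):
    """Map community index -> list of member names, ordered by index."""
    groups = defaultdict(list)
    for name, community in zip(names, membership):
        groups[community].append(name)
    return dict(sorted(groups.items()))


print(group_by_community(["a", "b", "c", "d", "e"], [0, 0, 1, 1, 1]))
# {0: ['a', 'b'], 1: ['c', 'd', 'e']}
```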

In [27]:
write_clusters_query = '''
UNWIND $nodes AS n
MATCH (c:Character) WHERE c.name = n.name
SET c.community = toInteger(n.community)
'''
graph.run(write_clusters_query, nodes=nodes)

In [28]:
# Query Neo4j for the communities and the members of each community
for record in graph.run('''
MATCH (c:Character)
WITH c.community AS cluster, collect(c.name) AS members
RETURN cluster, members
ORDER BY cluster ASC
'''):
    print('cluster:', record['cluster'], 'members:', record['members'])

cluster: 0 members: ['2174461', '02176268']
cluster: 1 members: ['5074057', '02310895']
cluster: 2 members: ['8390511', '971463', '7339808', '7453063', '3550420', '8688779', '2937336', '291004', '1028381', '10317500', '10226060', '5035264', '08199025', '08198398']
cluster: 3 members: ['2045024', '02046321']
cluster: 4 members: ['4758181', '04757864']
cluster: 5 members: ['9419536', '9244972', '9246660', '9341145', '09411430', '09140148']
cluster: 6 members: ['12165384', '12163824']
cluster: 7 members: ['9384921', '8856266', '08853741']
cluster: 8 members: ['4881998', '01299888', '04881829']
cluster: 9 members: ['612652', '01004072']
cluster: 10 members: ['2400139', '02420389', '02422860', '02426339']
cluster: 11 members: ['5846932', '01633173']
cluster: 12 members: ['8847694', '08975106']
cluster: 13 members: ['12815925', '12822650']
cluster: 14 members: ['1036194', '01035853']
cluster: 15 members: ['15214840', '15199033']
cluster: 16 members: ['20090', '14925198', '00021265', '1461693