# Community Detection and Relationship Analysis for Glossary Terms Using the Leiden Algorithm

Description:

This notebook demonstrates how to use the Leiden algorithm to detect communities within a glossary of legal terms and analyze relationships between them. The glossary terms are grouped into distinct communities based on their interconnections, providing insights into how terms are related and clustered around specific themes or legal processes.

The notebook is structured as follows:

- Leiden Community Detection: We apply the Leiden algorithm to the glossary terms to uncover communities of closely related terms. Each term is assigned to a community, helping users understand which terms are more related to each other based on their relationships.

- Exploration of Communities: Users can explore which terms belong to the same community, compare different communities, and identify key terms within each group. This helps in clustering glossary terms that share similar legal contexts or processes.

- Relationship Queries: We provide pre-built Cypher queries to explore direct relationships between terms, such as finding related terms, counting relationships, and identifying central terms in a community.

- Graph Algorithms: Additional graph algorithms, such as PageRank, are used to highlight the most important terms based on their connections within the graph.

Benefits:

- Thematic Clustering: The Leiden algorithm reveals which glossary terms naturally group together, helping users better understand legal topics and their relationships.
- Graph Exploration: The notebook provides tools to navigate the graph of glossary terms, discover related terms, and explore the structure of legal concepts.
- Community Detection: By detecting communities, the user can identify clusters of terms, which may represent stages in legal processes or thematic groupings of legal terms.

Together, these steps make it easier to understand how glossary terms relate to one another, navigate complex legal concepts, and identify the most important terms.

In [11]:
import os
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

In [48]:
NEO4J_URI = "bolt://localhost:7687"

In [49]:
NEO4J_URI = 'bolt://' + os.getenv('NEO4J_HOST', 'localhost') + ':7687'  # fall back to localhost if NEO4J_HOST is unset
NEO4J_USERNAME = os.getenv('NEO4J_USER')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
NEO4J_DATABASE = 'neo4j' #os.getenv('NEO4J_DB')
print(NEO4J_URI)
print(NEO4J_DATABASE)

bolt://neo4j:7687
neo4j


In [50]:
kg = Neo4jGraph(
    url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE
)

In [51]:
cypher_get_glossary = """
MATCH (n:UpdatedChunk) WHERE n.type = 'glossary' RETURN n.glossaryTerm AS glossaryTerm, n.text AS Text
"""

In [37]:
glossaries = kg.query(cypher_get_glossary)

In [38]:
print(len(glossaries))

67


In [39]:
for glossary in glossaries:
    print(glossary['glossaryTerm'])

Act
Adjournment
Amendment
Amended Bill
Amending Bill
Amending Regulation
Annual Bound Statutes (Buckram Bound, hardcover)
Bill
Bill Stages
Cabinet
Chapter Numbers
Citation
Coming into force
Commencement Section
Committee
Committee of the Whole House
Consequential Amendment
Consolidated Provisions in Force
Consolidated Regulations of British Columbia
Consolidated Statutes of British Columbia ('the Looseleaf')
Consolidation Period
Corporate Registry
Current Unofficial Consolidation
Dissolution
Enact
Executive Council
First Reading
First Reading Bill
Gazette
Hansard
Historical Note
Historical Supreme Court Rules
House
Legislative Assembly
Legislature
Lieutenant Governor
Lieutenant Governor in Council
Looseleaf
Minister of Finance full-text Directives and Resumes
MLA
Motion
Official Version
Order, Order in Council, Ministerial Order
Parliament
Point in Time (PIT)
Proclamation
Prorogation
Provision
Provisions in Force
Regulation
Regulations Bulletins
Repealed
Report Bill
Revision
Royal Asse

In [88]:
check_gds = """
RETURN gds.version();
"""

In [89]:
kg.query(check_gds)

[{'gds.version()': '2.9.0'}]

## Project the graph into memory

In [94]:
drop_projected_graph = """
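// The second argument (failIfMissing = false) avoids an error when 'myGraph' does not exist yet
CALL gds.graph.drop('myGraph', false)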
"""

In [95]:
kg.query(drop_projected_graph)



[{'graphName': 'myGraph',
  'database': 'neo4j',
  'databaseLocation': 'local',
  'memoryUsage': '',
  'sizeInBytes': -1,
  'nodeCount': 81611,
  'relationshipCount': 134,
  'configuration': {'relationshipProjection': {'RELATED_TERMS': {'aggregation': 'DEFAULT',
     'orientation': 'UNDIRECTED',
     'indexInverse': False,
     'properties': {},
     'type': 'RELATED_TERMS'}},
   'readConcurrency': 4,
   'relationshipProperties': {},
   'nodeProperties': {},
   'jobId': '2ba0ff4f-1cdc-427f-b5b0-23fce48b7a99',
   'nodeProjection': {'UpdatedChunk': {'label': 'UpdatedChunk',
     'properties': {}}},
   'logProgress': True,
   'creationTime': neo4j.time.DateTime(2024, 10, 22, 15, 27, 18, 611556457, tzinfo=<UTC>),
   'validateRelationships': False,
   'sudo': False},
  'density': 2.0119293265501548e-08,
  'creationTime': neo4j.time.DateTime(2024, 10, 22, 15, 27, 18, 611556457, tzinfo=<UTC>),
  'modificationTime': neo4j.time.DateTime(2024, 10, 22, 15, 27, 18, 635131836, tzinfo=<UTC>),
  'sch
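
The `density` value in the output above can be reproduced by hand. A minimal sketch, assuming density is computed as relationshipCount / (nodeCount × (nodeCount − 1)); this formula is inferred from the numbers shown rather than taken from the GDS source, and `graph_density` is a hypothetical helper name:

```python
import math

def graph_density(node_count: int, relationship_count: int) -> float:
    """Fraction of possible ordered node pairs that are connected."""
    if node_count < 2:
        return 0.0
    return relationship_count / (node_count * (node_count - 1))

# Node and relationship counts reported for the dropped projection above.
old_density = graph_density(81611, 134)
print(old_density)
print(math.isclose(old_density, 2.0119293265501548e-08, rel_tol=1e-6))
```

The old projection loaded every `UpdatedChunk` node (81,611 of them) but only 134 relationships, which is why its density is so close to zero.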

In [96]:
myGraph = """
MATCH (source:UpdatedChunk)-[r:RELATED_TERMS]->(target:UpdatedChunk)
RETURN gds.graph.project(
  'myGraph',
  source,
  target,
  {

  },
  { undirectedRelationshipTypes: ['*'] }
)
  """

In [97]:
kg.query(myGraph)

[{"gds.graph.project(\n  'myGraph',\n  source,\n  target,\n  {\n\n  },\n  { undirectedRelationshipTypes: ['*'] }\n)": {'relationshipCount': 134,
   'graphName': 'myGraph',
   'query': "\nMATCH (source:UpdatedChunk)-[r:RELATED_TERMS]->(target:UpdatedChunk)\nRETURN gds.graph.project(\n  'myGraph',\n  source,\n  target,\n  {\n\n  },\n  { undirectedRelationshipTypes: ['*'] }\n)\n  ",
   'projectMillis': 6,
   'configuration': {'readConcurrency': 4,
    'undirectedRelationshipTypes': ['*'],
    'jobId': 'd5b930bd-7e0f-4ae7-82ef-c80124b51f37',
    'logProgress': True,
    'query': "\nMATCH (source:UpdatedChunk)-[r:RELATED_TERMS]->(target:UpdatedChunk)\nRETURN gds.graph.project(\n  'myGraph',\n  source,\n  target,\n  {\n\n  },\n  { undirectedRelationshipTypes: ['*'] }\n)\n  ",
    'inverseIndexedRelationshipTypes': [],
    'creationTime': neo4j.time.DateTime(2024, 10, 22, 15, 27, 48, 715904403, tzinfo=<UTC>)},
   'nodeCount': 42}}]
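
The projection reports 42 nodes even though the glossary query returned 67 terms, because the Cypher projection only sees nodes that appear in a matched `(source)-[r]->(target)` pair. The sketch below mimics that bookkeeping in plain Python on a made-up edge list. It assumes an undirected projection stores each relationship once per direction, which would be consistent with the reported `relationshipCount` of 134; the function name and sample data are illustrative only.

```python
def projection_counts(directed_edges):
    """Return (node_count, relationship_count) for an undirected projection.

    Only nodes that occur in at least one edge are counted, and every
    edge is counted once per direction.
    """
    nodes = set()
    for source, target in directed_edges:
        nodes.add(source)
        nodes.add(target)
    return len(nodes), 2 * len(directed_edges)

# Hypothetical sample; a term without relationships never appears in `edges`.
edges = [("Bill", "Bill Stages"), ("Bill Stages", "First Reading")]
print(projection_counts(edges))  # (3, 4)
```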

In [98]:
graph_memory_estimate = """
CALL gds.leiden.write.estimate('myGraph', {writeProperty: 'communityId', randomSeed: 19})
YIELD nodeCount, relationshipCount, requiredMemory
"""

In [99]:
kg.query(graph_memory_estimate)

[{'nodeCount': 42,
  'relationshipCount': 134,
  'requiredMemory': '[559 KiB ... 560 KiB]'}]

### Show the communities after the algorithm has completed

In [157]:
stream_leiden = """
CALL gds.leiden.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).glossaryTerm AS name, communityId
ORDER BY name ASC
"""

In [158]:
kg.query(stream_leiden)

[{'name': 'Act', 'communityId': 11},
 {'name': 'Adjournment', 'communityId': 12},
 {'name': 'Amended Bill', 'communityId': 10},
 {'name': 'Amending Bill', 'communityId': 11},
 {'name': 'Amending Regulation', 'communityId': 11},
 {'name': 'Amendment', 'communityId': 4},
 {'name': 'Annual Bound Statutes (Buckram Bound, hardcover)',
  'communityId': 2},
 {'name': 'Bill', 'communityId': 6},
 {'name': 'Bill Stages', 'communityId': 6},
 {'name': 'Cabinet', 'communityId': 8},
 {'name': 'Chapter Numbers', 'communityId': 11},
 {'name': 'Coming into force', 'communityId': 5},
 {'name': 'Commencement Section', 'communityId': 5},
 {'name': 'Committee', 'communityId': 6},
 {'name': 'Committee of the Whole House', 'communityId': 6},
 {'name': 'Consequential Amendment', 'communityId': 4},
 {'name': 'Consolidated Provisions in Force', 'communityId': 9},
 {'name': 'Consolidated Regulations of British Columbia', 'communityId': 2},
 {'name': "Consolidated Statutes of British Columbia ('the Looseleaf')",
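
The streamed rows are easier to read once grouped by community. A minimal sketch in plain Python, using a few rows copied from the output above; `group_by_community` is a hypothetical helper, and in the notebook the `rows` list would be the return value of `kg.query(stream_leiden)`:

```python
from collections import defaultdict

def group_by_community(rows):
    """Map each communityId to the sorted list of term names in it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["communityId"]].append(row["name"])
    return {cid: sorted(names) for cid, names in sorted(groups.items())}

rows = [
    {"name": "Act", "communityId": 11},
    {"name": "Amending Bill", "communityId": 11},
    {"name": "Bill", "communityId": 6},
    {"name": "Bill Stages", "communityId": 6},
    {"name": "Cabinet", "communityId": 8},
]
for cid, names in group_by_community(rows).items():
    print(cid, names)
```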


In [161]:
test = """
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).glossaryTerm AS name, score
ORDER BY score DESC
LIMIT 10
"""
kg.query(test)

[{'name': 'Official Version', 'score': 2.569884481094245},
 {'name': 'Bill Stages', 'score': 1.7573752355218164},
 {'name': 'Act', 'score': 1.6551578717726698},
 {'name': 'Consolidated Regulations of British Columbia',
  'score': 1.5920024447829655},
 {'name': 'Bill', 'score': 1.515237809744991},
 {'name': 'Royal Assent', 'score': 1.402891495173952},
 {'name': 'First Reading', 'score': 1.2780644314435003},
 {'name': 'Second Reading', 'score': 1.2780644314435003},
 {'name': 'Third Reading', 'score': 1.2780644314435003},
 {'name': 'Report Bill', 'score': 1.1804109542486736}]
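
To make the scores above less of a black box, here is a small pure-Python PageRank on a toy undirected graph. It sketches the textbook update PR(v) = (1 − d) + d · Σ PR(u)/deg(u) with damping factor d = 0.85 and is not a reimplementation of `gds.pageRank`; the function name, iteration count, and sample graph are assumptions. Nodes in symmetric positions end up with identical scores, which would be consistent with the identical values for the three reading stages in the output.

```python
def pagerank(neighbors, damping=0.85, iterations=50):
    """Iterate PR(v) = (1 - d) + d * sum(PR(u) / deg(u)) over neighbors u."""
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            incoming = sum(scores[u] / len(neighbors[u]) for u in neighbors[node])
            updated[node] = (1 - damping) + damping * incoming
        scores = updated
    return scores

# Toy star graph: one hub linked to three leaves (undirected adjacency).
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
for name, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```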

In [162]:
test = """
CALL gds.leiden.stats('myGraph', { randomSeed: 19 })
YIELD communityCount
"""
kg.query(test)

[{'communityCount': 11}]

## Intermediate Communities

In [163]:
test = """
CALL gds.leiden.stream('myGraph', {
  randomSeed: 23,
  includeIntermediateCommunities: true,
  concurrency: 1
})
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).glossaryTerm AS name, communityId, intermediateCommunityIds
ORDER BY name ASC
"""
kg.query(test)

[{'name': 'Act', 'communityId': 12, 'intermediateCommunityIds': [41, 12]},
 {'name': 'Adjournment',
  'communityId': 13,
  'intermediateCommunityIds': [4, 13]},
 {'name': 'Amended Bill',
  'communityId': 8,
  'intermediateCommunityIds': [32, 8]},
 {'name': 'Amending Bill',
  'communityId': 12,
  'intermediateCommunityIds': [41, 12]},
 {'name': 'Amending Regulation',
  'communityId': 12,
  'intermediateCommunityIds': [9, 12]},
 {'name': 'Amendment', 'communityId': 2, 'intermediateCommunityIds': [25, 2]},
 {'name': 'Annual Bound Statutes (Buckram Bound, hardcover)',
  'communityId': 4,
  'intermediateCommunityIds': [11, 4]},
 {'name': 'Bill', 'communityId': 10, 'intermediateCommunityIds': [12, 10]},
 {'name': 'Bill Stages',
  'communityId': 10,
  'intermediateCommunityIds': [12, 10]},
 {'name': 'Cabinet', 'communityId': 11, 'intermediateCommunityIds': [17, 11]},
 {'name': 'Chapter Numbers',
  'communityId': 12,
  'intermediateCommunityIds': [41, 12]},
 {'name': 'Coming into force',
  'co
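
With `includeIntermediateCommunities: true`, each row carries one community ID per level of the algorithm, and in the output above the last entry equals the final `communityId`. A minimal sketch, using rows copied from that output, that checks this and groups terms by their first-level community; `first_level_groups` is a hypothetical helper:

```python
from collections import defaultdict

rows = [
    {"name": "Act", "communityId": 12, "intermediateCommunityIds": [41, 12]},
    {"name": "Amending Bill", "communityId": 12, "intermediateCommunityIds": [41, 12]},
    {"name": "Amending Regulation", "communityId": 12, "intermediateCommunityIds": [9, 12]},
    {"name": "Bill", "communityId": 10, "intermediateCommunityIds": [12, 10]},
]

def first_level_groups(rows):
    """Group term names by the community ID assigned at the first level."""
    groups = defaultdict(list)
    for row in rows:
        levels = row["intermediateCommunityIds"]
        # The final level should agree with the reported communityId.
        assert levels[-1] == row["communityId"]
        groups[levels[0]].append(row["name"])
    return dict(groups)

print(first_level_groups(rows))
```

In this sample, "Amending Regulation" starts in its own first-level community (9) and only joins "Act" and "Amending Bill" at the second level.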

## Write the community data back to the original nodes

In [128]:
write_community = """
CALL gds.leiden.write('myGraph', {
  writeProperty: 'intermediateCommunities',
  randomSeed: 19,
  includeIntermediateCommunities: true,
  concurrency: 1
})
YIELD communityCount, modularity, modularities
"""

In [129]:
kg.query(write_community)

[{'communityCount': 11,
  'modularity': 0.715860993539764,
  'modularities': [0.6697482735575851, 0.715860993539764]}]
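
The `modularity` value reported above scores how well the partition separates densely connected groups, and `modularities` lists it per level, with the last entry matching the final value. As a worked illustration, the sketch below computes the standard modularity Q = Σ_c [L_c/m − (d_c/2m)²] for an undirected toy graph, where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The function and the toy data are assumptions for illustration (resolution fixed at 1), not GDS internals.

```python
from collections import Counter

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = Counter()  # edges with both endpoints in the same community
    degree = Counter()    # summed node degree per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two disjoint triangles, each treated as its own community.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
community_of = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
print(modularity(edges, community_of))  # 0.5
```

Putting all six nodes into a single community gives a modularity of 0, so higher values indicate a partition that captures more structure than a trivial grouping.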

## Now let's try to understand what kinds of questions this graph can help answer
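
The queries in this section filter on an integer `communityId` property, whereas the write call above stores a list under `intermediateCommunities`. A minimal sketch, assuming the intent is to persist the final community as `communityId`: the helper below only builds the Cypher text (`build_leiden_write` is a hypothetical name) and leaves out `includeIntermediateCommunities` so that a single ID is written per node.

```python
def build_leiden_write(graph_name, write_property="communityId", random_seed=19):
    """Return a Cypher string that writes one final community ID per node."""
    return (
        f"CALL gds.leiden.write('{graph_name}', {{\n"
        f"  writeProperty: '{write_property}',\n"
        f"  randomSeed: {random_seed},\n"
        f"  concurrency: 1\n"
        f"}})\n"
        f"YIELD communityCount, modularity"
    )

write_community_id = build_leiden_write("myGraph")
print(write_community_id)
# In the notebook this string would be passed to kg.query(write_community_id).
```

The IDs produced by such a run will not necessarily match the ones shown in the outputs below, since they depend on the seed and the state of the graph at the time.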

### Querying the nodes and their community IDs from the saved nodes in Neo4j

In [164]:
query = """
MATCH (n:UpdatedChunk)
where n.communityId is not NULL
RETURN n.glossaryTerm AS Term, n.communityId
ORDER BY n.communityId
"""

In [165]:
kg.query(query)

[{'Term': 'Annual Bound Statutes (Buckram Bound, hardcover)',
  'n.communityId': 1},
 {'Term': 'Consolidated Regulations of British Columbia', 'n.communityId': 1},
 {'Term': "Consolidated Statutes of British Columbia ('the Looseleaf')",
  'n.communityId': 1},
 {'Term': 'Current Unofficial Consolidation', 'n.communityId': 1},
 {'Term': 'Gazette', 'n.communityId': 1},
 {'Term': 'Historical Note', 'n.communityId': 1},
 {'Term': 'Looseleaf', 'n.communityId': 1},
 {'Term': 'Official Version', 'n.communityId': 1},
 {'Term': 'Amendment', 'n.communityId': 3},
 {'Term': 'Consequential Amendment', 'n.communityId': 3},
 {'Term': 'Coming into force', 'n.communityId': 4},
 {'Term': 'Commencement Section', 'n.communityId': 4},
 {'Term': 'Royal Assent', 'n.communityId': 4},
 {'Term': 'Revision', 'n.communityId': 5},
 {'Term': 'R.S.B.C.', 'n.communityId': 5},
 {'Term': 'Parliament', 'n.communityId': 6},
 {'Term': 'Session', 'n.communityId': 6},
 {'Term': 'Sitting', 'n.communityId': 6},
 {'Term': 'Cons

### Find All Nodes in a Specific Community:

In [166]:
query = """
MATCH (n:UpdatedChunk)
WHERE n.communityId = 10
RETURN n.glossaryTerm AS Term
"""

In [167]:
kg.query(query)

[{'Term': 'Adjournment'}, {'Term': 'Dissolution'}, {'Term': 'Prorogation'}]
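
Rather than editing the community number inside the Cypher text, the value can be passed as a query parameter. A minimal sketch, assuming the graph object exposes `query(cypher, params=...)` as `Neo4jGraph` does; `terms_in_community` and the `FakeGraph` stand-in (used here so the example runs without a database) are hypothetical:

```python
TERMS_IN_COMMUNITY = """
MATCH (n:UpdatedChunk)
WHERE n.communityId = $community_id
RETURN n.glossaryTerm AS Term
ORDER BY Term
"""

def terms_in_community(graph, community_id):
    """Return the glossary terms stored under the given community ID."""
    rows = graph.query(TERMS_IN_COMMUNITY, params={"community_id": community_id})
    return [row["Term"] for row in rows]

class FakeGraph:
    """Stand-in for Neo4jGraph that answers from an in-memory dict."""
    def __init__(self, communities):
        self.communities = communities

    def query(self, cypher, params=None):
        terms = self.communities.get((params or {}).get("community_id"), [])
        return [{"Term": term} for term in sorted(terms)]

fake = FakeGraph({10: ["Prorogation", "Adjournment", "Dissolution"]})
print(terms_in_community(fake, 10))
```

In the notebook, `terms_in_community(kg, 10)` would run the same query against the real database.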

### What communities exist and what terms are in each

In [None]:
query = """
MATCH (n:UpdatedChunk)
where n.communityId is not null
RETURN n.communityId AS communityId, COLLECT(n.glossaryTerm) AS terms
"""

In [177]:
kg.query(query)

[{'communityId': 12,
  'terms': ['Act',
   'Amending Bill',
   'Amending Regulation',
   'Chapter Numbers',
   'Regulation',
   'Statute']},
 {'communityId': 10, 'terms': ['Adjournment', 'Dissolution', 'Prorogation']},
 {'communityId': 3, 'terms': ['Amendment', 'Consequential Amendment']},
 {'communityId': 9,
  'terms': ['Amended Bill',
   'First Reading Bill',
   'Report Bill',
   'Third Reading Bill']},
 {'communityId': 1,
  'terms': ['Annual Bound Statutes (Buckram Bound, hardcover)',
   'Consolidated Regulations of British Columbia',
   "Consolidated Statutes of British Columbia ('the Looseleaf')",
   'Current Unofficial Consolidation',
   'Gazette',
   'Historical Note',
   'Looseleaf',
   'Official Version']},
 {'communityId': 13,
  'terms': ['Bill',
   'Bill Stages',
   'Committee',
   'Committee of the Whole House',
   'First Reading',
   'Second Reading',
   'Third Reading']},
 {'communityId': 14, 'terms': ['Cabinet', 'Executive Council']},
 {'communityId': 4,
  'terms': ['Com

### "Which glossary term is the central node in its community?"
To find a "central" node in a community, rank terms by relationship count (degree); a centrality score such as PageRank is an alternative:

In [178]:
query = """
MATCH (n:UpdatedChunk)-[r:RELATED_TERMS]->()
where n.communityId is NOT NULL
WITH n, COUNT(r) AS degree
RETURN n.glossaryTerm, n.communityId, degree
ORDER BY degree DESC
"""

In [179]:
kg.query(query)

[{'n.glossaryTerm': 'Bill', 'n.communityId': 13, 'degree': 4},
 {'n.glossaryTerm': 'Bill Stages', 'n.communityId': 13, 'degree': 4},
 {'n.glossaryTerm': 'First Reading', 'n.communityId': 13, 'degree': 4},
 {'n.glossaryTerm': 'Second Reading', 'n.communityId': 13, 'degree': 4},
 {'n.glossaryTerm': 'Third Reading', 'n.communityId': 13, 'degree': 4},
 {'n.glossaryTerm': 'Consolidated Regulations of British Columbia',
  'n.communityId': 1,
  'degree': 3},
 {'n.glossaryTerm': 'First Reading Bill', 'n.communityId': 9, 'degree': 3},
 {'n.glossaryTerm': 'Report Bill', 'n.communityId': 9, 'degree': 3},
 {'n.glossaryTerm': 'Third Reading Bill', 'n.communityId': 9, 'degree': 3},
 {'n.glossaryTerm': 'Act', 'n.communityId': 12, 'degree': 2},
 {'n.glossaryTerm': 'Adjournment', 'n.communityId': 10, 'degree': 2},
 {'n.glossaryTerm': 'Amending Bill', 'n.communityId': 12, 'degree': 2},
 {'n.glossaryTerm': 'Amending Regulation', 'n.communityId': 12, 'degree': 2},
 {'n.glossaryTerm': 'Chapter Numbers', 'n
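
The query above ranks all terms globally and counts only outgoing relationships. To answer the per-community question, the rows can be reduced to the highest-degree term or terms in each community. A minimal sketch over rows copied from the output; `central_terms` is a hypothetical helper, and ties are kept rather than broken arbitrarily:

```python
def central_terms(rows):
    """Map each communityId to (max_degree, [terms with that degree])."""
    best = {}
    for row in rows:
        cid, term, degree = row["n.communityId"], row["n.glossaryTerm"], row["degree"]
        top_degree, terms = best.get(cid, (-1, []))
        if degree > top_degree:
            best[cid] = (degree, [term])
        elif degree == top_degree:
            terms.append(term)
    return best

rows = [
    {"n.glossaryTerm": "Bill", "n.communityId": 13, "degree": 4},
    {"n.glossaryTerm": "Bill Stages", "n.communityId": 13, "degree": 4},
    {"n.glossaryTerm": "Consolidated Regulations of British Columbia", "n.communityId": 1, "degree": 3},
    {"n.glossaryTerm": "Act", "n.communityId": 12, "degree": 2},
    {"n.glossaryTerm": "Amending Bill", "n.communityId": 12, "degree": 2},
]
for cid, (degree, terms) in sorted(central_terms(rows).items()):
    print(cid, degree, terms)
```

To count relationships in both directions on the database side, the Cypher pattern could drop the arrow: `(n:UpdatedChunk)-[r:RELATED_TERMS]-()`.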

### "How many terms does 'Bill' reference?"
This query counts the terms that "Bill" points to through outgoing RELATED_TERMS relationships:

In [180]:
query = """
MATCH (n:UpdatedChunk {glossaryTerm: 'Bill'})-[:RELATED_TERMS]->(related)
RETURN COUNT(related)
"""

In [181]:
kg.query(query)

[{'COUNT(related)': 4}]

## Finding relationships
"Which terms reference both 'Second Reading' and 'Bill'?"

To find terms that have a RELATED_TERMS relationship to both "Second Reading" and "Bill":

In [184]:
query = """
MATCH (n:UpdatedChunk)-[:RELATED_TERMS]->(a:UpdatedChunk {glossaryTerm: 'Second Reading'}),
      (n)-[:RELATED_TERMS]->(b:UpdatedChunk {glossaryTerm: 'Bill'})
RETURN n.glossaryTerm
"""

In [185]:
kg.query(query) 

[{'n.glossaryTerm': 'Bill Stages'}]
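
The same "related to both" question can be answered on the client side with a set intersection. A minimal sketch on a made-up adjacency map (term to the terms it points to); the data and the `related_to_all` helper are illustrative, not the notebook's actual relationships:

```python
def related_to_all(outgoing, targets):
    """Return terms whose outgoing relationships cover every target term."""
    wanted = set(targets)
    return sorted(term for term, related in outgoing.items() if wanted <= set(related))

# Hypothetical adjacency data for illustration only.
outgoing = {
    "Bill Stages": ["Bill", "First Reading", "Second Reading", "Third Reading"],
    "First Reading": ["Bill"],
    "Amendment": ["Consequential Amendment"],
}
print(related_to_all(outgoing, ["Second Reading", "Bill"]))  # ['Bill Stages']
```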

## Visualizing the graph of all related terms (independent of community detection)
- Run this query in the Neo4j Browser

In [155]:
query = """
MATCH (n:UpdatedChunk)-[r:RELATED_TERMS]->(m:UpdatedChunk)
RETURN n, r, m
"""

In [None]:
kg.query(query)

## Summarizing Communities and Terms:

"How many terms are there in each community?"

In [191]:
query = """
MATCH (n:UpdatedChunk)
where n.communityId is not null
RETURN n.communityId AS communityId, COUNT(n) AS term_count
"""

In [192]:
kg.query(query)

[{'communityId': 12, 'term_count': 6},
 {'communityId': 10, 'term_count': 3},
 {'communityId': 3, 'term_count': 2},
 {'communityId': 9, 'term_count': 4},
 {'communityId': 1, 'term_count': 8},
 {'communityId': 13, 'term_count': 7},
 {'communityId': 14, 'term_count': 2},
 {'communityId': 4, 'term_count': 3},
 {'communityId': 8, 'term_count': 2},
 {'communityId': 6, 'term_count': 3},
 {'communityId': 5, 'term_count': 2}]

"Which community has the most terms?"

In [194]:
query = """
MATCH (n:UpdatedChunk)
where n.communityId is not NULL
RETURN n.communityId AS communityId, COUNT(n) AS term_count
ORDER BY term_count DESC
LIMIT 1
"""

In [195]:
kg.query(query)

[{'communityId': 1, 'term_count': 8}]

"What is the largest community by number of terms?"

In [196]:
query = """
MATCH (n:UpdatedChunk)
where n.communityId is not NULL
RETURN n.communityId AS communityId, COUNT(n) AS community_size
ORDER BY community_size DESC
LIMIT 1
"""

In [197]:
kg.query(query)

[{'communityId': 1, 'community_size': 8}]
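
Both size questions can also be answered in Python from the per-community counts. A minimal sketch using the `term_count` rows shown earlier in this section; `largest_communities` is a hypothetical helper that returns every community tied for the largest size:

```python
rows = [
    {"communityId": 12, "term_count": 6},
    {"communityId": 10, "term_count": 3},
    {"communityId": 3, "term_count": 2},
    {"communityId": 9, "term_count": 4},
    {"communityId": 1, "term_count": 8},
    {"communityId": 13, "term_count": 7},
    {"communityId": 14, "term_count": 2},
    {"communityId": 4, "term_count": 3},
    {"communityId": 8, "term_count": 2},
    {"communityId": 6, "term_count": 3},
    {"communityId": 5, "term_count": 2},
]

def largest_communities(rows):
    """Return (size, [communityIds]) for the largest community or communities."""
    size = max(row["term_count"] for row in rows)
    return size, sorted(r["communityId"] for r in rows if r["term_count"] == size)

print(largest_communities(rows))               # (8, [1])
print(sum(row["term_count"] for row in rows))  # 42
```

The counts add up to 42, the node count reported for the projected graph.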