<a href="https://colab.research.google.com/github/NathVM/GA/blob/main/Neo4JGraph.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Measuring performance of Graph Analytics Algorithms using Neo4j graphs

---



Imports:

---



In [1]:
!pip install py2neo
!pip install neo4j
!pip install graphdatascience

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting py2neo
  Downloading py2neo-2021.2.3-py2.py3-none-any.whl (177 kB)
Collecting monotonic
  Downloading monotonic-1.6-py2.py3-none-any.whl (8.2 kB)
Collecting pansi>=2020.7.3
  Downloading pansi-2020.7.3-py2.py3-none-any.whl (10 kB)
Collecting interchange~=2021.0.4
  Downloading interchange-2021.0.4-py2.py3-none-any.whl (28 kB)
Installing collected packages: monotonic, pansi, interchange, py2neo
Successfully installed interchange-2021.0.4 monotonic-1.6 pansi-2020.7.3 py2neo-2021.2.3
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting neo4j
  Downloading neo4j-5.7.0.tar.gz (176 kB)

In [5]:
!neo4j --version

/bin/bash: neo4j: command not found


In [6]:
import pandas as pd
from py2neo import Graph, Node, Relationship
from neo4j import GraphDatabase
from google.colab import drive
from graphdatascience import GraphDataScience

In [7]:
!python --version

Python 3.9.16


Setup:

---



In [3]:
drive.mount('/content/drive')
!cp -r /content/drive/MyDrive/Dataset/share/GA/nj/ /content/
!sed -i '/#dbms.security.auth_enabled/s/^#//g' nj/conf/neo4j.conf
!chmod -R 777 nj
!nj/bin/neo4j start

Mounted at /content/drive
Directories in use:
home:         /content/nj
config:       /content/nj/conf
logs:         /content/nj/logs
plugins:      /content/nj/plugins
import:       /content/nj/import
data:         /content/nj/data
certificates: /content/nj/certificates
licenses:     /content/nj/licenses
run:          /content/nj/run
Starting Neo4j.
Started neo4j (pid:1444). It is available at http://localhost:7474
There may be a short delay until the server is ready.
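
The startup log above warns of a short delay before the server is ready. The following is a minimal sketch of one way to wait for it, assuming the Bolt endpoint is the `localhost:7687` address used in the connection cell. The helper name `wait_for_port` and its `timeout` and `interval` parameters are hypothetical and not part of the original notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port accepts connections,
    not that Neo4j has finished initialising its databases.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Assumed usage before opening the driver (address taken from the connection cell):
# wait_for_port("localhost", 7687)
```

A successful TCP connection is only a rough readiness signal, so the first query may still need a retry on a slow machine.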


Neo4j connection:

---



In [8]:
graph = Graph("bolt://localhost:7687")
driver = GraphDatabase.driver("bolt://localhost:7687")

Dataset: 

https://networkrepository.com/TWITTER-Real-Graph-Partial.php

A copy is shared on Google Drive.

In [26]:
%%script echo skipping
# Comment out the line above to execute this cell (a cell magic must be the first line of a cell).
# Loading the dataset is only needed for graph creation.
# Map the shared folder
# https://drive.google.com/drive/folders/113gZK1io1MZGogAULYoBdrlEUHyJcxRh?usp=sharing
# to your Google Drive and modify the file path accordingly.
file = "/content/drive/MyDrive/Dataset/share/GA/TWITTER-Real-Graph-Partial.edges"
# read_csv treats the file's first line ("1,2") as column names, hence the rename below.
# If that line is an edge rather than a header, pass header=None, names=['source', 'target'] instead.
df = pd.read_csv(file)
df.rename(columns = {'1':'source', '2':'target'}, inplace = True)
print(df.head(5))
dft = df

   source  target
0       2       1
1       3       4
2       4       3
3       3       2
4       2       3


Create Graph:

In [27]:
%%script echo skipping
# Comment out the line above to execute this cell (a cell magic must be the first line of a cell).
# Loading the dataset is only needed for graph creation.
# The database is loaded directly from Drive, so there is no need to run this code.
query = """
WITH $rows AS rows
UNWIND rows AS row
MERGE (source:Node {id: row.source})
MERGE (target:Node {id: row.target})
MERGE (source)-[:CONNECTS_TO]-(target)
"""

# set batch size and index properties
batch_size = 1000
index_properties = ['id']

# create indexes on node properties so the MERGE lookups stay fast
# (CREATE INDEX ... FOR ... ON replaces the deprecated CREATE INDEX ON :Label(prop) form)
with driver.session() as session:
    for property_name in index_properties:
        session.run(f"CREATE INDEX IF NOT EXISTS FOR (n:Node) ON (n.{property_name})")

# execute the query in batches, one auto-commit transaction per batch
with driver.session() as session:
    for i in range(0, len(dft), batch_size):
        batch = dft[i:i+batch_size].to_dict('records')
        session.run(query, rows=batch)
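
To make the batching step concrete, here is a small sketch of how the edge rows are split into the `$rows` parameter lists, without needing a database. The helper name `iter_batches` is hypothetical. The sample edges are the five rows printed by `df.head(5)` above, and the batch size of 2 is chosen only to keep the example short.

```python
from typing import Dict, Iterator, List


def iter_batches(rows: List[Dict[str, int]], batch_size: int) -> Iterator[List[Dict[str, int]]]:
    """Yield consecutive slices of rows, each at most batch_size long (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# The five edges shown in the dataset preview, in the shape produced by to_dict('records')
edges = [
    {"source": 2, "target": 1},
    {"source": 3, "target": 4},
    {"source": 4, "target": 3},
    {"source": 3, "target": 2},
    {"source": 2, "target": 3},
]

batches = list(iter_batches(edges, batch_size=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each yielded list would be passed as `rows=batch`, so one Cypher statement handles a whole batch rather than a single edge.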

Path Analytics: 

---



In [28]:
query = """
MATCH (source:Node {id: 357908})
MATCH (destination:Node)
WHERE source <> destination
MATCH path = allShortestPaths((source)-[:CONNECTS_TO*]-(destination))
RETURN DISTINCT source.id, destination.id, length(path) AS distance, nodes(path) AS nodes
"""
result = graph.run(query)

# print the node ids along each shortest path
for record in result:
    nodes = record["nodes"]
    print([node["id"] for node in nodes])

[357908, 357909]
[357908, 357911]
[357908, 357910]
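
For cross-checking the distances returned by the query on a small sample, hop counts can also be computed in plain Python with a breadth-first search. This is only a sketch: the function name `bfs_distances` and the toy edge list are hypothetical and not taken from the Twitter dataset. Edges are treated as undirected, matching the undirected pattern in the Cypher query.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def bfs_distances(edges: Iterable[Tuple[int, int]], source: int) -> Dict[int, int]:
    """Return the hop count from source to every reachable node (undirected BFS)."""
    adjacency: Dict[int, Set[int]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:  # first visit is the shortest distance
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical toy graph, not part of the dataset
toy_edges = [(1, 2), (2, 3), (1, 4), (4, 3), (3, 5)]
print(sorted(bfs_distances(toy_edges, 1).items()))
# [(1, 0), (2, 1), (3, 2), (4, 1), (5, 3)]
```

Unlike `allShortestPaths`, this sketch reports only the distance to each node, not every path that achieves it.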


In [17]:
query = """
MATCH (n)
WITH n LIMIT 100
WITH collect(n) AS nodes
UNWIND nodes AS source
UNWIND nodes AS target
WITH source, target
WHERE source <> target
MATCH path = shortestPath((source)-[:CONNECTS_TO*]-(target))
RETURN source.id AS source, target.id AS target, length(path) AS shortest_path_length
"""
result = graph.run(query)

# LIMIT must follow WITH or RETURN, and the relationship type is CONNECTS_TO
for record in result:
    print(record["source"], record["target"], record["shortest_path_length"])

In [15]:
query = """
RETURN apoc.version() AS output;
""" 

result = graph.run(query)
print(result)

 

 output   
----------
 4.4.0.13 
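
Since the aim of this notebook is to measure algorithm performance, each query can be wrapped in a small timing helper. The sketch below is one possible approach, not the notebook's own method: the name `time_call` and the `repeats` parameter are hypothetical, and it measures client-side wall-clock time, which includes driver and network overhead.

```python
import statistics
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run fn several times and return min/mean/max wall-clock seconds (hypothetical helper)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
    }


# Assumed usage with the py2neo connection from above; .data() consumes the
# whole result so the measurement covers the full query, not just its submission:
# stats = time_call(lambda: graph.run(query).data(), repeats=3)
# print(stats)
```

The first run often includes query planning and cold caches, so comparing `min` and `mean` gives a rough sense of that warm-up cost.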



Centrality Analytics:

---



Community Analytics:

---

