# Neo4j Graph Data Science Starter Kit
This notebook acts as a simple starter kit for using the Neo4j GDS library from Python.
It contains code fragments to do the following:
1. Set up a connection to Neo4j and read/write data.
2. Create graph projections to run your algorithms on.
3. Run algorithms and either stream the results back or write them to Neo4j.

This example uses the Game of Thrones dataset provided in the Neo4j Graph Data Science sandbox. You can get your own sandbox for free here:
https://sandbox.neo4j.com/login?usecase=graph-data-science

If you want to import this dataset into your local instance, please clone this repository.
https://github.com/Derek8848/python-gds-examples

The dataset targets Neo4j 4.3 or above.

## 1. Setting up the Neo4j Driver
Enter your own Neo4j credentials here:

In [1]:
url = "bolt://192.168.2.30:7687"  # host and Bolt port only; the database is selected via `dbname`
user = "neo4j"
password = "neo"
dbname = 'got'

In [2]:
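from neo4j import GraphDatabase

driver = GraphDatabase.driver(url, auth=(user, password))
# One session on the `dbname` database, reused by every cell below.
# Note: this variable name shadows the imported `neo4j` package.
neo4j = driver.session(database=dbname)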
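Hard-coding a password in a notebook makes it easy to leak. One option is to read the connection settings from environment variables. This is a sketch: the variable names (`NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`, `NEO4J_DATABASE`) and the `neo4j_settings` helper are our own hypothetical conventions, not something the driver reads automatically.

```python
import os
from typing import Mapping


def neo4j_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Neo4j connection settings from environment variables.

    The variable names are a hypothetical convention; the fallbacks are
    illustrative defaults (there is deliberately no default password).
    """
    return {
        "url": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
        "dbname": env.get("NEO4J_DATABASE", "got"),
    }

# Usage (requires a running Neo4j instance):
# from neo4j import GraphDatabase
# s = neo4j_settings()
# driver = GraphDatabase.driver(s["url"], auth=(s["user"], s["password"]))
# neo4j = driver.session(database=s["dbname"])
```

Passing `env` explicitly keeps the helper testable without touching the real environment.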

### Example - Reading Neo4j results using the driver

In [10]:
import pandas as pd

result = neo4j.run('MATCH (n:Person) RETURN n.name AS name, n.age as age LIMIT 10')
df = pd.DataFrame(result.data())
print(df)

                    name   age
0    Gunthor son of Gurn   NaN
1  High Septon (fat_one)   NaN
2        Jaime Lannister  39.0
3         Gregor Clegane  35.0
4            Andros Brax   NaN
5           Roose Bolton  45.0
6         Wylis Manderly  53.0
7          Medger Cerwyn   NaN
8       Harrion Karstark   NaN
9         Halys Hornwood   NaN
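The `NaN` values above are characters without an `age` property. `result.data()` returns a list of dictionaries, a missing property comes back as `None`, and pandas treats `None` in that column as missing, which is also why the known ages print as floats (`39.0`). The offline sketch below uses two hand-written stand-in records (not real query output) to show how to filter out the missing values:

```python
import pandas as pd

# Hand-written stand-ins shaped like result.data(): a list of dicts,
# with None where a node has no `age` property.
records = [
    {"name": "Jaime Lannister", "age": 39},
    {"name": "Andros Brax", "age": None},
]
df = pd.DataFrame(records)

# isna() flags rows whose age is missing (None / NaN).
missing = df["age"].isna()

# Keep only the characters that have a known age.
known_ages = df.loc[~missing, ["name", "age"]]
print(known_ages)
```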


## 2. Creating a graph projection
As an example, we want to analyze which people are most influential using the PageRank algorithm.

First, create a graph projection `interactions-all-books` that contains only the pattern we are interested in: `(:Person)-[:INTERACTS]->(:Person)`.

Then, go through the following steps:
- Check whether there is enough memory to create it.
- Check whether the graph projection already exists; if so, drop it.
- Create the graph projection.


### Estimating the required size of the projection

In [14]:
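# Estimate the projection's memory footprint without creating it.
# Note: unlike the projection created below, this relationship query does not return the `weight` property.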
result = neo4j.run("""
CALL gds.graph.create.cypher.estimate(
    'MATCH (p) WHERE p:Person RETURN id(p) as id',
    'MATCH (p)-[:INTERACTS]->(p2:Person) RETURN id(p) AS source, id(p2) AS target')
""")

# Print the results
row = result.single()
print("Estimated:", row['nodeCount'], "nodes,", row['relationshipCount'], "relationships,", row['requiredMemory']," memory required.")

Estimated: 2166 nodes, 3907 relationships, 282 KiB  memory required.
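`requiredMemory` is a human-readable string, which is awkward to compare programmatically. GDS estimate procedures also yield numeric `bytesMin` and `bytesMax` columns. The sketch below assumes those columns are present in your GDS version; `fits_in_budget` and the 1 GiB budget are hypothetical choices, not part of the GDS API.

```python
from typing import Mapping


def fits_in_budget(estimate_row: Mapping, budget_bytes: int) -> bool:
    """Return True if the worst-case estimate (bytesMax) fits within budget_bytes.

    estimate_row: a record from a GDS *.estimate call. neo4j Records support
    dict-style access, so a plain dict also works for offline testing.
    """
    return estimate_row["bytesMax"] <= budget_bytes

# Usage with the `row` from the cell above and a hypothetical 1 GiB budget:
# if not fits_in_budget(row, 1024 ** 3):
#     raise MemoryError("Projection estimate exceeds the memory budget")
```

Comparing against `bytesMax` rather than `bytesMin` errs on the side of caution.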


### Clear existing in-memory graphs (if they exist)


In [18]:
import pprint

# Drop the projected graph if it already exists; otherwise the query returns no rows (an empty list).
result = neo4j.run("""
CALL gds.graph.exists($name) YIELD exists
WHERE exists
CALL gds.graph.drop($name) YIELD graphName
RETURN graphName + " was dropped." as message
""", name = 'interactions-all-books')

# Print the results
pprint.pprint(result.data())

[{'message': 'interactions-all-books was dropped.'}]
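To reuse this step in other notebooks, the same query can be wrapped in a small function. This is a sketch: `drop_graph_if_exists` is a hypothetical helper name, and `session` is any object with the driver's `run(query, **params)` method, such as the `neo4j` session opened earlier.

```python
DROP_IF_EXISTS = """
CALL gds.graph.exists($name) YIELD exists
WHERE exists
CALL gds.graph.drop($name) YIELD graphName
RETURN graphName
"""


def drop_graph_if_exists(session, name: str) -> list:
    """Drop the in-memory graph `name` if present; return the names that were dropped."""
    rows = session.run(DROP_IF_EXISTS, name=name).data()
    # An empty list means no graph with that name existed.
    return [row["graphName"] for row in rows]

# Usage: drop_graph_if_exists(neo4j, 'interactions-all-books')
```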


### Creating the new graph projection

In [19]:
# Create a weighted Cypher projection of (:Person)-[:INTERACTS]->(:Person), using the `weight` relationship property
result = neo4j.run("""
CALL gds.graph.create.cypher(
    'interactions-all-books',
    'MATCH (p) WHERE p:Person RETURN id(p) as id',
    'MATCH (p)-[i:INTERACTS]->(p2:Person) RETURN id(p) AS source, i.weight as weight, id(p2) AS target')
""")
# Alternative: a native projection of the same pattern (undirected, unweighted)
#result = neo4j.run("""
#CALL gds.graph.create('interactions-all-books', 'Person', {
#  INTERACTS: {
#    orientation: 'UNDIRECTED'
#  }
#})
#""")

# Print the results
row = result.single()
print(row['graphName'],"-", row['nodeCount'], "nodes,", row['relationshipCount'], "relationships,", row['createMillis']," ms to create the projection.")


interactions-all-books - 2166 nodes, 3907 relationships, 32  ms to create the projection.


## 3. Running graph algorithms
Now that we have our graph projection, we're ready to run the algorithm!

As always, it is best practice to first check whether there is enough memory to run the algorithm.

In [20]:
result = neo4j.run("""
CALL gds.pageRank.stream.estimate('interactions-all-books',  { relationshipWeightProperty: 'weight' })
""")

print(result.single()['requiredMemory'], ' memory required to run the algorithm.')

51 KiB  memory required to run the algorithm.


### Run the algorithm (stream mode)
First, use 'stream' mode to inspect the results:

In [21]:
result = neo4j.run("""
CALL gds.pageRank.stream('interactions-all-books', { relationshipWeightProperty: 'weight'}) 
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name as character, score 
ORDER BY score DESC
""")

df = pd.DataFrame(result.data())
print(df)

              character      score
0      Tyrion Lannister  16.006419
1       Tywin Lannister   8.975132
2                 Varys   8.487425
3     Stannis Baratheon   8.152840
4         Theon Greyjoy   5.456873
...                 ...        ...
2161       Ryger Rivers   0.150000
2162        Rupert Brax   0.150000
2163  Rymolf Stormdrunk   0.150000
2164      Ryon Allyrion   0.150000
2165       Sarella Sand   0.150000

[2166 rows x 2 columns]
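The long tail of `0.150000` scores follows from the PageRank formulation GDS uses: a node's score is `(1 - dampingFactor)` plus `dampingFactor` times the score flowing in from its neighbours, without normalization, so a node with no incoming `INTERACTS` relationship ends at `1 - 0.85 = 0.15`. Below is a minimal pure-Python sketch of that weighted formulation for checking intuitions on a toy graph. It assumes each source node splits its score in proportion to relationship weight, starts every node at `1 - dampingFactor`, and stops once no score changes by more than the tolerance. The defaults mirror the configuration printed by write mode below (`dampingFactor: 0.85`, `maxIterations: 20`, `tolerance: 1e-07`). It is not the GDS implementation and ignores its parallelism and any special handling of dangling nodes.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, max_iterations=20, tolerance=1e-7):
    """Unnormalized weighted PageRank.

    PR(v) = (1 - d) + d * sum over u->v of PR(u) * w(u, v) / W(u),
    where W(u) is the total outgoing weight of u.
    edges: iterable of (source, target, weight) tuples.
    """
    edges = list(edges)
    nodes = {n for s, t, _ in edges for n in (s, t)}
    if not nodes:
        return {}

    # Total outgoing weight per node, used to split each node's score.
    out_weight = defaultdict(float)
    for s, _, w in edges:
        out_weight[s] += w

    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(max_iterations):
        incoming = defaultdict(float)
        for s, t, w in edges:
            if out_weight[s] > 0:
                incoming[t] += scores[s] * w / out_weight[s]
        new_scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tolerance:  # converged
            break
    return scores


# A node that only sends relationships keeps the minimum score of 1 - 0.85.
print(weighted_pagerank([("a", "b", 3.0), ("a", "c", 1.0)]))
```

In the usage line, `b` receives three quarters of `a`'s outgoing score and `c` one quarter, mirroring how `relationshipWeightProperty: 'weight'` skews influence toward heavier interactions.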


### Run the algorithm (write mode)
Then, use 'write' mode to write the results back to the Neo4j database.

In [22]:
# Write each node's PageRank score to the node property 'pagerank-all-books'.
result = neo4j.run("""
CALL gds.pageRank.write('interactions-all-books', {
    writeProperty: 'pagerank-all-books',
    relationshipWeightProperty: 'weight'
})
""")

pprint.pprint(result.data())

[{'centralityDistribution': {'max': 16.006468772888184,
                             'mean': 0.22740707106867655,
                             'min': 0.14999961853027344,
                             'p50': 0.14999961853027344,
                             'p75': 0.15213584899902344,
                             'p90': 0.2666940689086914,
                             'p95': 0.42917728424072266,
                             'p99': 1.382765769958496,
                             'p999': 8.487425804138184},
  'computeMillis': 33,
  'configuration': {'cacheWeights': False,
                    'concurrency': 4,
                    'dampingFactor': 0.85,
                    'maxIterations': 20,
                    'nodeLabels': ['*'],
                    'relationshipTypes': ['*'],
                    'relationshipWeightProperty': 'weight',
                    'scaler': 'NONE',
                    'sourceNodes': [],
                    'sudo': False,
                    'tolerance': 1e-07,
 

### 4.1 Loading the features from Neo4j
Now that we have all the input data for our model, load data from Neo4j into a dataframe:
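The feature-loading code itself is not shown here. As a minimal sketch, assuming the feature of interest is the `pagerank-all-books` property written in write mode above: because that property name contains hyphens, Cypher requires it to be wrapped in backticks (a literal backtick inside is escaped by doubling it). `quote_identifier` and `top_scores_query` are hypothetical helpers.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a Cypher identifier; a literal backtick is escaped by doubling it."""
    return "`" + name.replace("`", "``") + "`"


def top_scores_query(label: str, prop: str) -> str:
    """Build a query returning each node's name and `prop` value, highest first.

    The row limit is passed as the $limit query parameter rather than formatted in.
    """
    p = quote_identifier(prop)
    return (
        f"MATCH (n:{quote_identifier(label)}) WHERE n.{p} IS NOT NULL "
        f"RETURN n.name AS name, n.{p} AS score "
        "ORDER BY score DESC LIMIT $limit"
    )

# Usage (requires the session and pandas import from earlier cells):
# df = pd.DataFrame(neo4j.run(top_scores_query('Person', 'pagerank-all-books'), limit=10).data())
```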

We see a slight improvement when adding our new feature. However, keep in mind that this dataset is *really tiny*, so the results are very susceptible to randomness in the classifier and in the choice of train/test split.

## What's next?
To learn more about the different execution modes of algorithms, see:
https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/

To speed up creating projections, consider native projections:
https://neo4j.com/docs/graph-data-science/current/management-ops/native-projection/

Read the docs on other algorithms, tips for modeling your data, and algorithm configurations:
https://neo4j.com/docs/graph-data-science/current/introduction/
