# It all starts with AuraDB

Project 🍓 (Strawberry) is a new way to run GDS on data hosted in an AuraDB database.
Currently, the way to do this in Aura is to copy the database over to an AuraDS instance.
This detaches the data, and the two databases will likely diverge almost immediately.
This approach also has a few other limitations:

- It's very manual; users have to click in the Aura Console to copy the database
- Once GDS computations are finished, writing back to the AuraDB instance also requires manual configuration
- AuraDS instances have to be managed manually in the Aura Console, and nothing encourages users to delete them after use, which increases running costs

With Project 🍓, we address the first two limitations and partially alleviate the last one.

## AuraDB

We assume that an AuraDB instance with data in it already exists.
In this notebook, we briefly show the example data we are using.

In [None]:
import os

# AuraDB data ingestion
from neo4j import GraphDatabase
from graphdatascience.query_runner.aura_db_arrow_query_runner import AuraDbConnectionInfo

# Aura environment
os.environ["AURA_ENV"] = "devstrawberry"

# DB credentials
db_id_and_pw = ("fd4de318", "")
db_connection_info = AuraDbConnectionInfo(
    f"neo4j+s://{db_id_and_pw[0]}-{os.environ['AURA_ENV']}.databases.neo4j-dev.io", ("neo4j", db_id_and_pw[1])
)
# start a Driver
driver = GraphDatabase.driver(db_connection_info.uri, auth=db_connection_info.auth)

# try out our connection
with driver.session() as session:
    display(session.run("RETURN true AS success").to_df())

In [None]:
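# Add some data: 1,000 users with a random age and 8,000 random KNOWS relationships
with driver.session() as session:
    # IF NOT EXISTS lets this cell be re-run without failing on the existing constraint
    session.run("CREATE CONSTRAINT users IF NOT EXISTS FOR (u:User) REQUIRE u.id IS NODE KEY").consume()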
    session.run(
        """
        UNWIND range(0, 999) AS i
        CREATE (:User {id: i, age: toInteger(rand() * 75)})
        """
    ).consume()
    session.run(
        """
        UNWIND range(1, 8000) AS i
        WITH toInteger(rand() * 1000) AS source, toInteger(rand() * 1000) AS target
        MATCH (s:User {id: source})
        MATCH (t:User {id: target})
        CREATE (s)-[:KNOWS {since: 2020 - (rand() * 100)}]->(t)
        """
    ).consume()

In [None]:
# Let's check what data we have
with driver.session() as session:
    print(f"Number of nodes: {session.run('MATCH () RETURN count(*)').single().value()}")
    print(f"Number of relationships: {session.run('MATCH ()-->() RETURN count(*)').single().value()}")

# A new database component: Arrow Server

We have built a new piece of software into the Neo4j DBMS: an Arrow Server.
It is akin to the already existing Bolt and HTTP servers, but it has a very specific purpose: projecting graphs to a remote location, and receiving results to write back to the database.

With the Arrow Server comes one crucial new feature: an aggregating projection function.
This aggregating function is called `gds.graph.project.remote` and is very similar to Cypher projection v2 in standard GDS.
There are two key differences between them:

1. In AuraDB, the aggregating function does not take a graph name as a parameter.
2. In AuraDB, the aggregating function does not project the graph to the local instance.
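
To make "aggregating" concrete: the projection function receives one row per query result (a source node, an optional target node, and properties) and folds all of those rows into a single graph. The pure-Python model below is only an illustration under that assumption; `aggregate_projection` is a hypothetical name, and the real function additionally handles labels, relationship types and node properties, and runs inside the database rather than in the client.

```python
def aggregate_projection(rows):
    """Fold (source, target, rel_props) rows into a node set and a relationship list.

    Hypothetical model: `source` and `target` are node ids, `target` is None when
    OPTIONAL MATCH found no relationship, and `rel_props` is a dict or None.
    """
    nodes = set()
    relationships = []
    for source, target, rel_props in rows:
        # Every row contributes its source node; repeated ids are deduplicated
        nodes.add(source)
        if target is None:
            # No outgoing relationship: the source is still kept as a node
            continue
        nodes.add(target)
        relationships.append((source, target, rel_props or {}))
    return nodes, relationships


# Node 1 has no outgoing KNOWS; node 3 has no relationships at all
rows = [
    (0, 1, {"since": 1990.5}),
    (0, 2, {"since": 2001.0}),
    (1, None, None),
    (3, None, None),
]
nodes, rels = aggregate_projection(rows)
print(sorted(nodes))  # [0, 1, 2, 3]
print(len(rels))      # 2
```

This is also why the projection query later in this notebook uses `OPTIONAL MATCH`: without it, users with no outgoing `KNOWS` relationship and no incoming one would be missing from the projected graph.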

The aggregating function is used in user-authored queries that look almost identical to Cypher projection v2 queries.
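
Because these queries are written by hand, it can help to generate them for simple graphs. The sketch below builds a query with the same shape as the projection query used later in this notebook. The helper `build_remote_projection_query` and its parameters are hypothetical; it only covers a single node label and a single relationship type, and it does not escape identifiers.

```python
def build_remote_projection_query(label, rel_type, node_props=(), rel_props=()):
    """Return a Cypher query calling gds.graph.project.remote (hypothetical helper)."""

    def prop_map(var, props):
        # Builds a Cypher map such as {age: u.age}
        return "{" + ", ".join(f"{p}: {var}.{p}" for p in props) + "}"

    config_lines = [
        f"sourceNodeProperties: {prop_map('u', node_props)}",
        f"targetNodeProperties: {prop_map('target', node_props)}",
        "sourceNodeLabels: labels(u)",
        "targetNodeLabels: labels(target)",
        f"relationshipType: '{rel_type}'",
        f"relationshipProperties: {prop_map('r', rel_props)}",
    ]
    config = ",\n  ".join(config_lines)
    return (
        f"MATCH (u:{label})\n"
        f"OPTIONAL MATCH (u)-[r:{rel_type}]->(target:{label})\n"
        f"RETURN gds.graph.project.remote(u, target, {{\n  {config}\n}})"
    )


print(build_remote_projection_query("User", "KNOWS", ["age"], ["since"]))
```

The generated string could then be passed as the query argument, for example `gds.graph.project.remoteDb("pagerank-graph", build_remote_projection_query("User", "KNOWS", ["age"], ["since"]))`.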

The Arrow Server also comes with an internal procedure that is undocumented but callable: `internal.arrow.status`.
The GDS Python Client relies on it to manage the connection between AuraDB and GDS.

In [None]:
# Let's call this function and see what it returns
with driver.session() as session:
    display(session.run("CALL internal.arrow.status").to_df())

# Aura API and GDS Python Client

Apart from the extension to AuraDB, we have also added a new API to the GDS Python Client.
This API is a Python frontend to the Aura API, as well as a set of internal management features for the AuraDB - GDS connection.
In order to use the Aura API, the user needs to have Aura API credentials.
These are generated in the Aura Console (under `Account settings`) and are a pair of strings: `CLIENT_ID` and `CLIENT_SECRET`.
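
Rather than hard-coding the credentials, a notebook can read them from environment variables and fail early if either is missing. This is a minimal sketch; the helper `load_aura_api_credentials` is hypothetical, and it assumes the variables are named `CLIENT_ID` and `CLIENT_SECRET`, as in the credentials cell of this notebook.

```python
import os


def load_aura_api_credentials(env=None):
    """Return (client_id, client_secret), raising if either is missing or empty.

    Hypothetical helper: reads CLIENT_ID and CLIENT_SECRET from `env`,
    which defaults to the process environment.
    """
    env = os.environ if env is None else env
    values = {name: env.get(name) for name in ("CLIENT_ID", "CLIENT_SECRET")}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing Aura API credentials: " + ", ".join(missing))
    return values["CLIENT_ID"], values["CLIENT_SECRET"]
```

Usage: `CLIENT_ID, CLIENT_SECRET = load_aura_api_credentials()`. Failing here gives a clearer error than a rejected API call later on.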

With these credentials, users can access the full set of new features in the GDS Python Client.
These features are:

- Create a new GDS session
- List all existing GDS sessions
- (Re-)connect to an existing GDS session
- Delete a GDS session
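
Combining these features gives a common pattern: reuse a session if one with the given name already exists, and create it otherwise, since creation takes a few minutes. The sketch below only calls `list_sessions`, `connect` and `create_gds` with the arguments used in this notebook. It additionally assumes that each entry returned by `list_sessions()` has a `name` attribute, which is an assumption about the client rather than something shown here. The helper `get_or_create_gds` is hypothetical.

```python
def get_or_create_gds(sessions, name, password):
    """Reconnect to the GDS session `name` if it exists, otherwise create it.

    Hypothetical helper; assumes sessions.list_sessions() returns objects
    with a `name` attribute.
    """
    existing = {info.name for info in sessions.list_sessions()}
    if name in existing:
        # Reconnecting is fast compared to provisioning a new session
        return sessions.connect(name, password)
    # Creating a session provisions new infrastructure and can take minutes
    return sessions.create_gds(name, password)
```

Usage: `gds = get_or_create_gds(sessions, "pagerank-compute", "my-password")`.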

We will illustrate what this looks like below.

## Tenants

If the user is a member of multiple tenants, they also need to enter a tenant id to specify which tenant to use.
In this notebook, we use only a single tenant and omit the tenant id.
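
The tenant rule can be expressed as a small selection function: with exactly one tenant there is nothing to choose, and with several the user must name one. This is a hypothetical sketch of the rule only; `resolve_tenant_id` is not part of the GDS Python Client, and it works on plain tenant id strings.

```python
def resolve_tenant_id(available, requested=None):
    """Choose a tenant id following the single/multiple tenant rule (hypothetical).

    `available` lists the tenant ids the user is a member of; `requested` is the
    tenant id the user entered, if any.
    """
    if requested is not None:
        if requested not in available:
            raise ValueError(f"Not a member of tenant {requested!r}")
        return requested
    if not available:
        raise ValueError("The user is not a member of any tenant")
    if len(available) == 1:
        # A single tenant is unambiguous, so the tenant id can be omitted
        return available[0]
    raise ValueError(
        f"Member of {len(available)} tenants; a tenant id is required to disambiguate"
    )


print(resolve_tenant_id(["tenant-a"]))  # tenant-a
```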


In [None]:
# Initialise Aura API credentials
CLIENT_ID = os.environ.get("CLIENT_ID")
CLIENT_SECRET = os.environ.get("CLIENT_SECRET")

In [None]:
# The new stuff!
from graphdatascience.aura_sessions import AuraSessions

# Create a new AuraSessions object
sessions = AuraSessions(db_connection_info, (CLIENT_ID, CLIENT_SECRET))

# List our current sessions
sessions.list_sessions()

In [None]:
# Let's create a GDS session!
# This takes a few minutes to complete
gds = sessions.create_gds("pagerank-compute", "my-password")

In [None]:
# Since that takes a little bit of time, let's instead reconnect to an existing session
gds = sessions.connect("pagerank-compute", "my-password")

# The GDS session

A key new concept is the GDS session, which takes the place of an AuraDS instance.
(At this time, it is in fact exactly an AuraDS instance, but we don't want to expose that to the user.
As much as possible, they should think of a GDS session as a separate thing.)
The GDS session offers all the GDS functionality that we are familiar with from AuraDS.
However, since the idea is to offload database work to AuraDB, the GDS session should not be considered a database instance.

This means that all projections go from AuraDB to the GDS session, not from a local database.
Similarly, write-back follows the same path in reverse, to AuraDB and not to a local database.

## Implementation limitation

As mentioned in the parenthetical remark above, we use existing AuraDS infrastructure to host the GDS sessions.
As a result, a local database does exist, but we try not to expose its Bolt URI, to discourage users from adding data to it.

In [None]:
# The `gds` object that we get back from both `create_gds` and `connect` is a GraphDataScience object
gds.version()

In [None]:
# Project a graph from AuraDB to GDS
G, result = gds.graph.project.remoteDb(
    "pagerank-graph",
    """
    MATCH (u:User) 
    OPTIONAL MATCH (u)-[r:KNOWS]->(target:User) 
    RETURN gds.graph.project.remote(u, target, {
      sourceNodeProperties: {age: u.age},
      targetNodeProperties: {age: target.age},
      sourceNodeLabels: labels(u),
      targetNodeLabels: labels(target),
      relationshipType: 'KNOWS',
      relationshipProperties: {since: r.since}
    })
    """,
)

print(result)

In [None]:
# At this stage, we have a projected graph and we can do all of our normal GDS operations
# In this notebook, we will run PageRank

print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")

# And then we will run FastRP on that
print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=64,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")
print(f"Node properties written: {frp_result['nodePropertiesWritten']}")

In [None]:
# Write the mutated PageRank scores back to the User nodes in AuraDB
gds.graph.nodeProperties.write(G, "pagerank")

In [None]:
# Now let's turn back to AuraDB and see what we have
with driver.session() as session:
    display(session.run("MATCH (u:User) RETURN u.pagerank").to_df())

# Conclusion

So, this is what Project 🍓 is all about.
We can create GDS sessions, project graphs from AuraDB, compute with GDS, and write back to AuraDB (well... almost).
We can also delete GDS sessions, which will delete the AuraDS instance that was created for it.
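
Because forgotten sessions keep incurring running costs, it can be worth tying a session's lifetime to a `with` block so that it is deleted even if a computation fails. This is a minimal sketch that relies only on the `create_gds` and `delete_gds` calls shown in this notebook; the context manager name `temporary_gds_session` is hypothetical. Deleting the session also discards any graphs projected into it, so results should be written back to AuraDB inside the block.

```python
from contextlib import contextmanager


@contextmanager
def temporary_gds_session(sessions, name, password):
    """Create a GDS session and always delete it when the block exits (hypothetical)."""
    gds = sessions.create_gds(name, password)
    try:
        yield gds
    finally:
        # Runs on normal exit and on exceptions, so the session never lingers
        sessions.delete_gds(name)
```

Usage: `with temporary_gds_session(sessions, "pagerank-compute", "my-password") as gds:` followed by the projection, computation and write-back steps from this notebook.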

In [None]:
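# Drop the projected graph first, while the session that holds it still exists
gds.graph.get("pagerank-graph").drop()

In [None]:
# Delete the GDS session, which also deletes the AuraDS instance created for it
sessions.delete_gds("pagerank-compute")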

In [None]:
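# CLEAN UP ALL DATA IN AURADB DATABASE
with driver.session() as session:
    summary = session.run("MATCH (n) DETACH DELETE n").consume()
    print(f"Nodes deleted: {summary.counters.nodes_deleted}")
    session.run("DROP CONSTRAINT users IF EXISTS").consume()
driver.close()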

In [None]:
# Show full cell contents in pandas DataFrames instead of truncating long values
import pandas as pd

pd.set_option("display.max_colwidth", None)