Commit

[GIE Doc] Refine Docs for Cypher (#2995)

## What do these changes do?
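As titled: refines the GIE documentation for Cypher, and exposes a Cypher (Bolt) port alongside the Gremlin port in the `gie-standalone` Helm chart and the frontend configuration.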

## Related issue number

Fixes #2987

---------

Co-authored-by: Longbin Lai <longbin.lailb@alibaba-inc.com>
shirly121 and longbinlai committed Jul 12, 2023
1 parent ce0407a commit 4944655
Showing 12 changed files with 107 additions and 52 deletions.
6 changes: 5 additions & 1 deletion charts/gie-standalone/templates/frontend/statefulset.yaml
@@ -81,7 +81,7 @@ spec:
done
$GRAPHSCOPE_HOME/bin/giectl start_frontend ${GRAPHSCOPE_RUNTIME} ${object_id} \
$json_file $runtime_hosts $GREMLIN_SERVER_PORT $EXTRA_CONFIG
$json_file $runtime_hosts $GREMLIN_SERVER_PORT $CYPHER_SERVER_PORT $EXTRA_CONFIG
exit_code=$?
while [ $exit_code -eq 0 ]
@@ -103,6 +103,8 @@ spec:
value: {{ .Values.executor.service.gaiaRpc | quote }}
- name: GREMLIN_SERVER_PORT
value: {{ .Values.frontend.service.gremlinPort | quote }}
- name: CYPHER_SERVER_PORT
value: {{ .Values.frontend.service.cypherPort | quote }}
- name: DNS_NAME_PREFIX_STORE
value: {{ $storeFullname }}-{}.{{ $storeFullname }}-headless.{{ $releaseNamespace }}.svc.{{ $clusterDomain }}
- name: SERVERSSIZE
@@ -124,6 +126,8 @@ spec:
ports:
- name: gremlin
containerPort: {{ .Values.frontend.service.gremlinPort }}
- name: cypher
containerPort: {{ .Values.frontend.service.cypherPort }}
{{- if .Values.frontend.readinessProbe.enabled }}
readinessProbe:
tcpSocket:
11 changes: 11 additions & 0 deletions charts/gie-standalone/templates/frontend/svc.yaml
@@ -36,5 +36,16 @@ spec:
nodePort: null
{{- end }}
{{- end }}
- name: cypher
port: {{ .Values.frontend.service.cypherPort }}
protocol: TCP
targetPort: cypher
{{- if and (or (eq .Values.frontend.service.type "NodePort") (eq .Values.frontend.service.type "LoadBalancer")) (not (empty .Values.frontend.service.nodePorts.cypher)) }}
{{- if (not (empty .Values.frontend.service.nodePorts.cypher)) }}
nodePort: {{ .Values.frontend.service.nodePorts.cypher }}
{{- else if eq .Values.frontend.service.type "ClusterIP" }}
nodePort: null
{{- end }}
{{- end }}
selector: {{ include "graphscope-store.selectorLabels" . | nindent 4 }}
app.kubernetes.io/component: frontend
4 changes: 4 additions & 0 deletions charts/gie-standalone/values.yaml
@@ -363,12 +363,16 @@ frontend:
##
gremlinPort: 8182

## Cypher server port
cypherPort: 7687

## Specify the nodePort value for the LoadBalancer and NodePort service types.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
##
nodePorts:
service: ""
gremlin: ""
cypher: ""
## Service clusterIP
##
# clusterIP: None
22 changes: 14 additions & 8 deletions docs/interactive_engine/deployment.md
@@ -70,14 +70,20 @@ deployment and management of applications. To deploy GIE standalone using Helm,
You should see the `[YOUR_RELEASE_NAME]-gie-standalone-frontend-0` and `[YOUR_RELEASE_NAME]-gie-standalone-store-0` pods running.

- Get the endpoint of the GIE Frontend service:
```bash
kubectl describe svc [YOUR_RELEASE_NAME]-gie-standalone-frontend \
| grep "Endpoints:" | awk -F' ' '{print $2}'
```
You should see the GIE Frontend service endpoint as `<ip>:<gremlinPort>`.

- Connect to the GIE frontend service using the Tinkerpop's official SDKs or Gremlin console, which
can be found [here](./tinkerpop_gremlin.md).
1. get `<ip>:<gremlinPort>` for gremlin querying
```bash
kubectl describe svc [YOUR_RELEASE_NAME]-gie-standalone-frontend \
| grep "Endpoints:" | awk -F' ' '{print $2}' | head -1
```
2. get `<ip>:<cypherPort>` for cypher querying
```bash
kubectl describe svc [YOUR_RELEASE_NAME]-gie-standalone-frontend \
| grep "Endpoints:" | awk -F' ' '{print $2}' | tail -1
```

- Connect to the GIE frontend service by the following two ways:
1. using the Tinkerpop's official SDKs or Gremlin console, which can be found [here](./tinkerpop/tinkerpop_gremlin.md).
2. using the Neo4j's official SDKs or Cypher-Shell, which can be found [here](./neo4j/cypher_sdk.md).

## Remove the GIE Service
```bash
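The `head -1` / `tail -1` commands in the deployment guide rely on `kubectl describe svc` printing the `gremlin` port before the `cypher` port, which matches the order in which the chart's `svc.yaml` declares them. A minimal Python sketch of a less order-dependent alternative, assuming the usual `kubectl describe svc` layout where each `Port:` line is followed by its own `Endpoints:` line: it pairs every endpoint with the preceding port name. The sample output, release name, IP and node ports below are hypothetical.

```python
# Hypothetical `kubectl describe svc` output for a release named "demo".
SAMPLE_DESCRIBE = """\
Name:              demo-gie-standalone-frontend
Type:              NodePort
Port:              gremlin  8182/TCP
TargetPort:        gremlin/TCP
NodePort:          gremlin  30182/TCP
Endpoints:         10.244.0.12:8182
Port:              cypher  7687/TCP
TargetPort:        cypher/TCP
NodePort:          cypher  30687/TCP
Endpoints:         10.244.0.12:7687
"""


def endpoints_by_port_name(describe_output: str) -> dict:
    """Map each named Service port to the value of its "Endpoints:" line."""
    endpoints = {}
    current_port = None
    for line in describe_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "Port:":
            # "Port: gremlin 8182/TCP" -> remember "gremlin"
            current_port = fields[1]
        elif fields[0] == "Endpoints:" and current_port is not None:
            # same column that awk '{print $2}' extracts in the docs
            endpoints[current_port] = fields[1]
    return endpoints


if __name__ == "__main__":
    eps = endpoints_by_port_name(SAMPLE_DESCRIBE)
    print("gremlin endpoint:", eps["gremlin"])
    # a Bolt URI for Neo4j drivers or cypher-shell is neo4j:// + Cypher endpoint
    print("cypher uri:", "neo4j://" + eps["cypher"])
```

In practice the text would come from `kubectl describe svc [YOUR_RELEASE_NAME]-gie-standalone-frontend`. The `Endpoints:` value can also be a comma-separated list of addresses, or `<none>` when no pod is ready; this sketch passes it through unchanged.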
9 changes: 6 additions & 3 deletions docs/interactive_engine/dev_and_test.md
@@ -117,8 +117,11 @@ pegasus.hosts = localhost:1234
# graph schema path
graph.schema = /tmp/<v6d_object_id>.json

## Frontend Config
frontend.service.port = 8182
## Gremlin Server Port
gremlin.server.port = 8182

## Bolt Server Port
neo4j.bolt.server.port = 7687

# disable authentication if username or password is not set
# auth.username = default
@@ -131,7 +134,7 @@ java -cp ".:$GIE_TEST_HOME/lib/*" -Djna.library.path=$GIE_TEST_HOME/lib com.alib
```

With the frontend service, you can open the gremlin console and set the endpoint to
`localhost:8182`, as given [here](./tinkerpop_gremlin.md#gremlin-console).
`localhost:8182`, as given [here](./tinkerpop/tinkerpop_gremlin.md#connecting-via-gremlin-console). Similarly, you can open the cypher-shell and set the url to `neo4j://localhost:7687` by using `-a` option, as given [here](./neo4j/cypher_sdk.md#connecting-via-cypher-shell).

7. Kill the services of `vineyardd`, `gaia_executor` and `frontend`:
```
28 changes: 11 additions & 17 deletions docs/interactive_engine/getting_started.md
@@ -65,14 +65,18 @@ gs.set_option(show_log=True)
# load the modern graph as example.
graph = load_modern_graph()

# Hereafter, you can use the `graph` object to create an `gremlin` query session
g = gs.gremlin(graph)
# then `execute` any supported gremlin query.
# Hereafter, you can use the `graph` object to create an `interactive` query session
g = gs.interactive(graph)
# then `execute` any supported gremlin query (by default)
q1 = g.execute('g.V().count()')
print(q1.all().result()) # should print [6]

q2 = g.execute('g.V().hasLabel(\'person\')')
print(q2.all().result()) # should print [[v[2], v[3], v[0], v[1]]]

# or `execute` any supported Cypher query, by passing `lang="cypher"`
q3 = g.execute("MATCH (n:person) RETURN count(n)", lang="cypher", routing_=RoutingControl.READ)
print(q3.records[0][0]) # should print 6
```

You may see something like:
@@ -87,31 +91,21 @@ You may see something like:
The number 6 is printed, which is the number of vertices in modern graph.
### Retrieve the gremlin client
The `g` returned by `gs.gremlin()` is a wrapper around `Client` of `gremlinpython`, you could get the `Client` by
```python
client = g.gremlin_client
print(client.submit('g.V()').all().result())
```
### Customize Configurations for GIE instance
You could pass additional key-value pairs to customize the startup configuration of GIE, for example:
```python
# Set the timeout value to 10 min
g = gs.gremlin(graph, params={'pegasus.timeout': 600000})
g = gs.interactive(graph, params={'query.execution.timeout.ms': 600000})
```
## What's the Next
As shown in the above example, it is very easy to use GraphScope to interactively query a graph using the gremlin query language on your local machine. You may find more tutorials [here](https://tinkerpop.apache.org/docs/current/tutorials/getting-started/) for the basic Gremlin usage, in which most read-only queries can be seamlessly executed with the above `g.execute()` function.
As shown in the above example, it is very easy to use GraphScope to interactively query a graph using both the Gremlin and Cypher query language on your local machine.
In addition to the above local-machine entrée, we have prepared the following topics for your reference.
- GIE can handle much more complex cases, for example, the complex LDBC
business intelligence workloads. [A walk-through tutorial is here!](./guide_and_examples)
- GIE can be deployed in a distributed environment to process very large graph. [How to do that?](./deployment)
- GIE has supported a lot of standard Gremlin steps, together with many useful syntactic sugars. [Please look into the details!](./supported_gremlin_steps)
- GIE has been designed to integrate with the Tinkerpop ecosystem, with necessary extensions such as some syntactic sugars to facilitate the use of Gremlin. [Please look into the details!](./tinkerpop/tinkerpop_gremlin.md)
- GIE has been designed to integrate with the Neo4j ecosystem. [Please look into the details!](./neo4j/cypher_sdk.md)
- Want to learn more about the technique details of GIE. [This is the design and architecture of GIE!](./design_of_gie)
18 changes: 12 additions & 6 deletions docs/interactive_engine/neo4j/cypher_sdk.md
@@ -1,9 +1,9 @@
# GIE for Cypher
This document will provide you with step-by-step guidance on how to connect your Cypher applications to the GIE's
FrontEnd service, which offers functionalities similar to the official Tinkerpop service.
We have implemented Neo4j's [Bolt](https://neo4j.com/docs/bolt/current/bolt/) protocol for you to connect your Neo4j applications to the GIE's Frontend service.

Your first step is to obtain the Bolt Connector of GIE Frontend service:
- Follow the [instruction](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.
Your first step is to obtain the Cypher endpoint for the [Bolt](https://neo4j.com/docs/bolt/current/bolt/) connector:
- Follow the [instruction](../deployment.md) while deploying GIE in a K8s cluster,
- Follow the [instruction](../dev_and_test.md) while starting GIE on a local machine.

## Connecting via Python Driver

@@ -19,7 +19,7 @@ Then connect to the service and run queries:
```Python
from neo4j import GraphDatabase, RoutingControl

URI = "neo4j://localhost:7687" # the bolt connector you've obtained
URI = "neo4j://localhost:7687" # neo4j:// + Cypher endpoint you've obtained
AUTH = ("", "") # We have not implemented authentication yet

def print_top_10(driver):
@@ -35,14 +35,20 @@ with GraphDatabase.driver(URI, auth=AUTH) as driver:
print_top_10(driver)
```

````{hint}
A simpler option is to use the `interactive` object for submitting Cypher queries through
[GraphScope's python SDK](../getting_started.md), which is a wrapper that encompasses Neo4j's
Python Driver and will automatically acquire the endpoint.
````


## Connecting via Cypher-Shell
1. Download and extract `cypher-shell`
```bash
wget https://dist.neo4j.org/cypher-shell/cypher-shell-4.4.19.zip
unzip cypher-shell-4.4.19.zip && cd cypher-shell
```
2. Connect to the Bolt Connector
2. Connect to the Bolt connector with the Cypher endpoint you've obtained
```bash
./cypher-shell -a neo4j://localhost:7687
```
8 changes: 4 additions & 4 deletions docs/interactive_engine/tinkerpop/tinkerpop_gremlin.md
@@ -3,8 +3,8 @@ This document will provide you with step-by-step guidance on how to connect your
FrontEnd service, which offers functionalities similar to the official Tinkerpop service.

Your first step is to obtain the endpoint of GIE Frontend service:
- Follow the [instruction](./deployment.md#deploy-your-first-gie-service) while deploying GIE in a K8s cluster,
- Follow the [instruction](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.
- Follow the [instruction](../deployment.md) while deploying GIE in a K8s cluster,
- Follow the [instruction](../dev_and_test.md) while starting GIE on a local machine.

## Connecting via Python SDK

@@ -35,8 +35,8 @@ Then connect to the service and run queries:
```

````{hint}
A simpler option is to use the `gremlin` object for submitting Gremlin queries through
[GraphScope's python SDK](./getting_started.md), which is a wrapper that encompasses Tinkerpop's
A simpler option is to use the `interactive` object for submitting Gremlin queries through
[GraphScope's python SDK](../getting_started.md), which is a wrapper that encompasses Tinkerpop's
Gremlin-Python and will automatically acquire the endpoint.
````

32 changes: 22 additions & 10 deletions docs/overview/getting_started.md
@@ -70,20 +70,28 @@ g = load_ogbn_mag()
```
````

Interactive queries enable users to explore, examine, and present graph data in a flexible and in-depth manner, allowing them to find specific information quickly. GraphScope utilizes Gremlin, a high-level graph traversal language, for interactive queries and offers efficient execution at scale.
Interactive queries enable users to explore, examine, and present graph data in a flexible and in-depth manner, allowing them to find specific information quickly. GraphScope enhances the presentation of interactive queries and ensures efficient execution of these queries on a large scale by providing support for the popular query languages [Gremlin](https://tinkerpop.apache.org/gremlin.html) and [Cypher](https://opencypher.org/).


````{dropdown} Run interactive queries with Gremlin
````{dropdown} Run interactive queries with Gremlin and Cypher
In this example, we use graph traversal to count the number of papers two given authors have co-authored. To simplify the query, we assume the authors can be uniquely identified by ID 2 and 4307, respectively.
```python
# get the endpoint for submitting Gremlin queries on graph g.
interactive = graphscope.gremlin(g)
# get the endpoint for submitting interactive queries on graph g.
interactive = graphscope.interactive(g)
# count the number of papers two authors (with id 2 and 4307) have co-authored
# Gremlin query for counting the number of papers two authors (with id 2 and 4307) have co-authored
papers = interactive.execute("g.V().has('author', 'id', 2).out('writes').where(__.in('writes').has('id', 4307)).count()").one()
# Cypher query for counting the number of papers two authors (with id 2 and 4307) have co-authored
# Note that for Cypher query, the parameter of lang="cypher" is mandatory
papers = interactive.execute( \
"MATCH (n1:author)-[:writes]->(p:paper)<-[:writes]-(n2:author) \
WHERE n1.id = 2 AND n2.id = 4307 \
RETURN count(DISTINCT p)", \
lang="cypher", routing_=RoutingControl.READ)
```
````

Expand Down Expand Up @@ -218,7 +226,7 @@ from graphscope.dataset.modern_graph import load_modern_graph
gs.set_option(show_log=True)
# load the modern graph as example.
#(modern graph is an example property graph for Gremlin queries given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
#(modern graph is an example property graph given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
graph = load_modern_graph()
# triggers label propagation algorithm(LPA)
@@ -238,7 +246,7 @@ print(ret.to_dataframe(selector={'id': 'v.id', 'distance': 'r'})

## Graph Interactive Query Quick Start
With the `graphscope` package already installed, you can effortlessly engage with a graph on your local machine.
You simply need to create the `gremlin` instance to serve as the conduit for submitting all Gremlin queries.
You simply need to create the `interactive` instance to serve as the conduit for submitting Gremlin or Cypher queries.

````{dropdown} Example: Run Interactive Queries in GraphScope
```python
@@ -249,17 +257,21 @@ from graphscope.dataset.modern_graph import load_modern_graph
gs.set_option(show_log=True)
# load the modern graph as example.
#(modern graph is an example property graph for Gremlin queries given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
#(modern graph is an example property graph given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
graph = load_modern_graph()
# Hereafter, you can use the `graph` object to create an `gremlin` query session
g = gs.gremlin(graph)
# Hereafter, you can use the `graph` object to create an `interactive` query session, which will start one Gremlin service and one Cypher service simultaneously on the backend.
g = gs.interactive(graph)
# then `execute` any supported gremlin query.
q1 = g.execute('g.V().count()')
print(q1.all().result()) # should print [6]
q2 = g.execute('g.V().hasLabel(\'person\')')
print(q2.all().result()) # should print [[v[2], v[3], v[0], v[1]]]
# or `execute` any supported Cypher query
q3 = g.execute("MATCH (n:person) RETURN count(n)", lang="cypher", routing_=RoutingControl.READ)
print(q3.records[0][0]) # should print 6
```
````

13 changes: 12 additions & 1 deletion docs/overview/graph_interactive_workloads.md
@@ -9,7 +9,7 @@ Graph interactive workloads primarily focus on exploring complex graph structure
all occurrences (or instances) of the pattern in the graph. Pattern matching often involves relational operations to project, order and group the matched instances.

In GraphScope, the Graph Interactive Engine (GIE) has been developed to handle such interactive workloads,
which provides widely used query languages, such as Gremlin, that allow users to easily
which provides widely used query languages, such as Gremlin or Cypher, that allow users to easily
express both graph traversal and pattern matching queries. These queries will be executed with massive
parallelism in a cluster of machines, providing efficient and scalable solutions to graph interactive
workloads.
@@ -87,3 +87,14 @@ g.V().match(

The pattern matching query is declarative in the sense that users only describe the pattern using the `match()` step, while the engine determines how to execute the query (i.e. the execution plan) at runtime according to a pre-defined cost model. For example, a [worst-case optimal](https://vldb.org/pvldb/vol12/p1692-mhedhbi.pdf) execution plan may first compute the matches of `v1` and `v2`, and then intersect the neighbors of `v1` and `v2` as the matches of `v3`.

## Neo4j and Cypher
[Neo4j](https://neo4j.com/docs/) is a popular graph database management system known for its native graph processing capabilities. It provides an efficient and scalable solution for storing, querying, and analyzing graph data. One of the key components of Neo4j is the query language [Cypher](https://neo4j.com/docs/cypher-manual/current/introduction/), which is specifically designed for working with graph data. We have fully embraced the power of Neo4j by implementing essential and impactful operators in Cypher, which enables users to leverage the expressive capabilities of Cypher for querying and manipulating graph data. Additionally, we have integrated Neo4j's Bolt server into our system, allowing Cypher users to submit their queries using the open SDK. As a result, Cypher users can easily get started with GIE through the existing [Neo4j ecosystem](../interactive_engine/neo4j_eco.md), including the language wrappers of Python and Cypher-Shell.

### Pattern Matching
The `MATCH` operator in Cypher provides a declarative syntax that allows you to express graph patterns in a concise and intuitive manner. The pattern-based approach aligns well with the structure of graph data, making queries easier to understand and write for beginners and experienced users alike. Moreover, the `MATCH` operator allows you to combine multiple patterns, optional patterns, and logical operators, so that complex relationships and conditions can be expressed within a single query. For example, the `Triangle` example above can be written in Cypher as:
```cypher
Match (v1)-[:Knows]-(v2),
(v1)-[:Purchases]->(v3),
(v2)-[:Purchases]->(v3)
Return DISTINCT v1, v2, v3;
```
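To make the semantics of the `MATCH` clause above concrete, here is a brute-force Python sketch that enumerates the same `Triangle` pattern over a tiny in-memory graph. The vertex names and edges are invented for illustration, and the graph is assumed to have no self-loops. This is not how GIE evaluates the query (the engine picks an execution plan at runtime, as described above); a Python `set` simply stands in for `RETURN DISTINCT`.

```python
# Hypothetical toy graph: (source, label, target) triples.
EDGES = [
    ("alice", "Knows", "bob"),
    ("alice", "Purchases", "book"),
    ("bob", "Purchases", "book"),
    ("bob", "Purchases", "pen"),
]


def match_triangle(edges):
    """Brute-force matches of
    (v1)-[:Knows]-(v2), (v1)-[:Purchases]->(v3), (v2)-[:Purchases]->(v3).
    Assumes the graph has no self-loops."""
    # -[:Knows]- has no arrow, so each Knows edge matches in both directions.
    knows = set()
    for src, label, dst in edges:
        if label == "Knows":
            knows.add((src, dst))
            knows.add((dst, src))
    purchases = {(src, dst) for src, label, dst in edges if label == "Purchases"}

    matches = set()  # plays the role of RETURN DISTINCT
    for v1, v2 in knows:
        for buyer, v3 in purchases:
            # v1 and v2 must both have a Purchases edge into the same v3
            if buyer == v1 and (v2, v3) in purchases:
                matches.add((v1, v2, v3))
    return sorted(matches)


print(match_triangle(EDGES))
# [('alice', 'bob', 'book'), ('bob', 'alice', 'book')]
```

Because the `Knows` edge is matched without direction, each triangle is reported once per orientation of `(v1, v2)`; `DISTINCT` removes duplicate rows, but not these mirrored ones.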
2 changes: 1 addition & 1 deletion interactive_engine/compiler/ir_k8s_failover_ci.sh
@@ -48,7 +48,7 @@ wait_role_pods_to_run store ${store_total}

sleep 5

node_port=$(kubectl --namespace=${namespace} get svc ${role_prefix}-frontend -o go-template='{{range.spec.ports}}{{if .nodePort}}{{.nodePort}}{{"\n"}}{{end}}{{end}}')
node_port=$(kubectl --namespace=${namespace} get svc ${role_prefix}-frontend -o go-template='{{range.spec.ports}}{{if .nodePort}}{{.nodePort}}{{"\n"}}{{end}}{{end}}' | head -1)
hostname=$(minikube ip)
python3 ./submit_query.py $hostname:${node_port}

6 changes: 5 additions & 1 deletion interactive_engine/compiler/set_properties.sh
@@ -26,6 +26,10 @@ hosts="pegasus.hosts: $DNS_NAME_PREFIX_STORE:$GAIA_RPC_PORT";

hosts="${hosts/"{}"/0}";

gremlin_server_port="gremlin.server.port: $GREMLIN_SERVER_PORT";

cypher_server_port="neo4j.bolt.server.port: $CYPHER_SERVER_PORT";

count=1;
while (($count<$SERVERSSIZE))
do
@@ -37,6 +41,6 @@ done

graph_schema="graph.schema: $GRAPH_SCHEMA"

properties="$worker_num\n$timeout\n$batch_size\n$output_capacity\n$hosts\n$server_num\n$graph_schema"
properties="$worker_num\n$timeout\n$batch_size\n$output_capacity\n$hosts\n$server_num\n$graph_schema\n$gremlin_server_port\n$cypher_server_port"

echo -e $properties > ./conf/ir.compiler.properties
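`set_properties.sh` writes `key: value` lines such as `gremlin.server.port: 8182`, while the hand-written config in `dev_and_test.md` uses `key = value`. Assuming the frontend loads `ir.compiler.properties` with standard Java `.properties` rules, where both `=` and `:` are accepted as separators, both styles yield the same `gremlin.server.port` and `neo4j.bolt.server.port` keys. The simplified Python reader below is a sketch for illustration only, not GIE's loader; it ignores line continuations, escapes, and whitespace-only separators.

```python
def parse_properties(text: str) -> dict:
    """Simplified .properties reader: one key/value pair per line, split at
    the first '=' or ':'; blank lines and '#'/'!' comments are skipped."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            props[line] = ""  # key without a value
            continue
        cut = min(seps)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


# The two port lines set_properties.sh would emit
# with GREMLIN_SERVER_PORT=8182 and CYPHER_SERVER_PORT=7687.
generated = "gremlin.server.port: 8182\nneo4j.bolt.server.port: 7687\n"

# The hand-written style used in dev_and_test.md.
manual = """\
## Gremlin Server Port
gremlin.server.port = 8182

## Bolt Server Port
neo4j.bolt.server.port = 7687
"""

print(parse_properties(generated) == parse_properties(manual))  # True
```

Splitting at the first separator keeps values that themselves contain colons intact, e.g. `pegasus.hosts = localhost:1234` keeps `localhost:1234` as its value.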
