Improve the deploy_with_existing_vineyard_cluster doc for easy understanding (#2930)

Improve the `deploy_with_existing_vineyard_cluster` doc.

* Add the pictures for easy understanding.
* Add the details and fix some errors.

Fixes #2549

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
alibaba · Jun 27, 2023 · f543cd2
1 parent ae8d96c

Showing 3 changed files with 60 additions and 7 deletions.
diff --git a/docs/deployment/deploy_with_existing_vineyard_cluster.md b/docs/deployment/deploy_with_existing_vineyard_cluster.md
@@ -1,8 +1,34 @@
-# Depoly with Existing Vineyard Cluster
+# Deploy with Existing Vineyard Cluster
 
-If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse the vineyard data such as graph with several GraphScope sessions. This will allow you to load a graph to the existing vineyard cluster and then reuse it with multiple GraphScope sessions, without needing to deploy a separate vineyard cluster for each session.
+If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse the vineyard data, such as graphs, across several GraphScope sessions. This allows you to load a graph into the existing vineyard cluster once and then reuse it with multiple GraphScope sessions, without the need to deploy a separate vineyard cluster for each session.
 
-This doc provides step-by-step instructions on how to do this.
+:::{figure-md}
+
+<img src="../images/default_session.png"
+     alt="GraphScope default session"
+     width="80%">
+
+Create a default GraphScope session
+:::
+
+If you create a default GraphScope session, all engines, including Vineyard, are bundled in the same pod, so they can be deployed on
+any node within the Kubernetes cluster. However, this creates a closed Vineyard cluster that is accessible only to that GraphScope session. When the session is closed, the Vineyard cluster is deleted as well, so its data cannot be reused by other GraphScope sessions.
+
+
+:::{figure-md}
+
+<img src="../images/session_with_vineyard_cluster.png"
+     alt="GraphScope sessions connect to an existing vineyard cluster"
+     width="80%">
+
+Connecting GraphScope sessions to an existing vineyard cluster for data sharing
+:::
+
+The figure above shows that GraphScope sessions can share data in the same vineyard cluster, because the engines of different sessions are deployed on the same node within the Kubernetes cluster and connect to the same vineyard socket. Multiple sessions can reuse the same graph as long as the vineyard cluster is alive. This is a common usage pattern of vineyard on Kubernetes.
+
+If you don't want to keep the vineyard cluster running for a long time, you can save the graphs in the vineyard cluster to persistent storage, and then load them back into the vineyard cluster when you need them. For more details, please refer to [Persistent storage of graphs on the Kubernetes cluster](./persistent_storage_of_graphs_on_k8s.md).
+
+The following sections provide step-by-step instructions on how to do this.
 
 ## Prerequisites
 
@@ -36,16 +62,28 @@ python3 -m pip install vineyard
 ```
 
 By default, the Vineyard cluster consists of three Vineyard instances and three etcd instances. 
-However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters. 
+However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters. Do not set the number of Vineyard instances or etcd instances to be greater than the number of nodes in the Kubernetes cluster. Note that the number of Vineyard replicas and the number of engine pods can be set independently.
+
+Create and check the namespace `vineyard-system` as follows.
+
+```bash
+$ kubectl create namespace vineyard-system
+namespace/vineyard-system created
+$ kubectl get namespace vineyard-system
+NAME              STATUS   AGE
+vineyard-system   Active   33s
+```
+
 To deploy a simple Vineyard cluster with one Vineyard instance and one etcd instance, follow the next step:
 
 ```python
 import vineyard
 
+# The default deployment name is `vineyardd-sample` and the default namespace is `vineyard-system`; you can override them with the `name` and `namespace` parameters. For details about all parameters, refer to the vineyardctl doc:
+# https://github.com/v6d-io/v6d/blob/main/k8s/cmd/README.md
+# Note: every `-` in a vineyardctl parameter name must be replaced with `_` in the Python API.
 vineyard.deploy.vineyardctl.deploy.vineyard_deployment(
     vineyard_replicas=1, 
-    vineyard_etcd_replicas=1, 
-    create_namespace=True
 )
 ```
 
@@ -60,6 +98,17 @@ etcd0                              1/1     Running   0          73m
 vineyardd-sample-5db59987f-vr2fg   1/1     Running   0          73m
 ```
 
+## The lifecycle of a vineyard cluster
+
+If you deploy the vineyard cluster with the vineyardctl API, it persists until you delete it manually; closing a GraphScope session does not affect it. You can delete the vineyard cluster as follows:
+
+```python
+import vineyard
+vineyard.deploy.vineyardctl.delete.vineyard_deployment()
+```
+
+However, if you do not deploy the vineyard cluster beforehand, a new one will be created when you create a GraphScope session with the specified vineyard deployment name and namespace, and it will be deleted when you close that GraphScope session.
+
 ## Load the dataset to the Kubernetes cluster
 
 Depending on how the Kubernetes cluster was created, you may need to take different steps to make your dataset available within the cluster. If the cluster was not created using minikube, you will need to either copy the dataset to the nodes of the Kubernetes cluster or mount it onto them. On the other hand, if the cluster was created using minikube, you can directly mount the dataset to the minikube VM, without the need for further copying or mounting operations.
@@ -103,6 +152,8 @@ k8s_volumes = {
 }
 
 # the step will be long as it will create a graphscope cluster
+# Make sure the vineyard cluster has been created before creating the GraphScope session.
+# If it does not exist, a new vineyard cluster will be created and the graph will be loaded into it.
 sess = graphscope.session(
     k8s_namespace='vineyard-system',
     k8s_vineyard_deployment='vineyardd-sample',
@@ -125,6 +176,7 @@ then load the graph with the vineyard id in the new GraphScope session.
 
 ```python
 import graphscope
+import vineyard
 
 # the step will be long as it will create a graphscope cluster
 new_sess = graphscope.session(
@@ -134,7 +186,7 @@ new_sess = graphscope.session(
 
 # Use the vineyard id of the graph the last GraphScope session loaded
 # assume the vineyard id is 22731319746904674, you can load it as follows
-graph = new_sess.load_from(vineyard_id=22731319746904674)
+graph = new_sess.load_from(vineyard.ObjectID(22731319746904674))
 ```
 
 Check the graph as follows.
@@ -156,6 +208,7 @@ If you see the output above, that means you have successfully reused the existin
 Delete the Vineyard cluster by
 
 ```python
+# If you did not specify a name or namespace when creating the vineyard cluster, it uses the defaults (`vineyardd-sample` in `vineyard-system`) and can be deleted as follows:
 vineyard.deploy.vineyardctl.delete.vineyard_deployment()
 ```
 

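The comment in the updated deployment example notes that every `-` in a vineyardctl parameter name becomes `_` in the Python API. The sketch below applies that rule to a dictionary of options. `flag_to_kwarg` and `flags_to_kwargs` are hypothetical helpers, the flag names are illustrative only (check the vineyardctl README for the real ones), and stripping the leading `--` is an assumption of this sketch.

```python
def flag_to_kwarg(flag: str) -> str:
    """Map a vineyardctl-style flag name to a Python keyword argument name.

    Applies the documented rule (every `-` becomes `_`); stripping the
    leading `--` first is an assumption of this sketch.
    """
    return flag.lstrip("-").replace("-", "_")


def flags_to_kwargs(flags: dict) -> dict:
    """Convert a mapping of flag names to values into keyword arguments."""
    return {flag_to_kwarg(name): value for name, value in flags.items()}


# Illustrative flag names, not a verified list of vineyardctl flags.
kwargs = flags_to_kwargs({"--vineyard-replicas": 1, "--namespace": "vineyard-system"})
print(kwargs)  # {'vineyard_replicas': 1, 'namespace': 'vineyard-system'}
```

The resulting dictionary could then be passed as keyword arguments, e.g. `vineyard.deploy.vineyardctl.deploy.vineyard_deployment(**kwargs)`, assuming the converted names match parameters that the Python API actually accepts.
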
diff --git a/docs/images/default_session.png b/docs/images/default_session.png
diff --git a/docs/images/session_with_vineyard_cluster.png b/docs/images/session_with_vineyard_cluster.png
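
The lifecycle rules described in the updated doc (a cluster deployed with vineyardctl outlives any GraphScope session, while a cluster created by a session is deleted when that session closes) can be modeled with a toy in-memory registry. Everything here (`ToyKubernetes`, `VineyardCluster`, the `demo` namespace) is hypothetical and only illustrates the ownership rule; it does not call GraphScope, vineyard, or Kubernetes, and it does not model several sessions sharing a session-created cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VineyardCluster:
    """Toy stand-in for a Vineyard deployment; not a GraphScope or vineyard class."""
    name: str
    namespace: str
    created_by_session: bool
    alive: bool = True


@dataclass
class ToyKubernetes:
    """Toy stand-in for the cluster: deployments keyed by (namespace, name)."""
    deployments: Dict[Tuple[str, str], VineyardCluster] = field(default_factory=dict)

    def deploy_with_vineyardctl(self, name="vineyardd-sample", namespace="vineyard-system"):
        # Pre-deployed cluster: no session owns it, so it persists until deleted.
        cluster = VineyardCluster(name, namespace, created_by_session=False)
        self.deployments[(namespace, name)] = cluster
        return cluster

    def open_session(self, name="vineyardd-sample", namespace="vineyard-system"):
        # Reuse an existing deployment; otherwise the session creates its own.
        key = (namespace, name)
        if key not in self.deployments:
            self.deployments[key] = VineyardCluster(name, namespace, created_by_session=True)
        return self.deployments[key]

    def close_session(self, cluster):
        # Only a cluster that the session itself created is deleted on close.
        if cluster.created_by_session:
            cluster.alive = False
            del self.deployments[(cluster.namespace, cluster.name)]


k8s = ToyKubernetes()

# Case 1: deploy first, then open and close a session; the cluster survives.
shared = k8s.deploy_with_vineyardctl()
session_cluster = k8s.open_session()
k8s.close_session(session_cluster)
print(shared.alive)  # True

# Case 2: no pre-deployed cluster in the (hypothetical) "demo" namespace; removed on close.
owned = k8s.open_session(namespace="demo")
k8s.close_session(owned)
print(owned.alive)  # False
```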