Improve the deploy_with_existing_vineyard_cluster doc for easy understanding (#2930)

Improve the `deploy_with_existing_vineyard_cluster` doc.

* Add the pictures for easy understanding.
* Add the details and fix some errors.

Fixes #2549

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
dashanji committed Jun 27, 2023
1 parent ae8d96c commit f543cd2
Showing 3 changed files with 60 additions and 7 deletions.
67 changes: 60 additions & 7 deletions docs/deployment/deploy_with_existing_vineyard_cluster.md
@@ -1,8 +1,34 @@
# Depoly with Existing Vineyard Cluster
# Deploy with Existing Vineyard Cluster

If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse the vineyard data such as graph with several GraphScope sessions. This will allow you to load a graph to the existing vineyard cluster and then reuse it with multiple GraphScope sessions, without needing to deploy a separate vineyard cluster for each session.
If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse vineyard data, such as graphs, across several GraphScope sessions. This allows you to load a graph into the existing vineyard cluster once and then reuse it with multiple GraphScope sessions, without the need to deploy a separate vineyard cluster for each session.

This doc provides step-by-step instructions on how to do this.
:::{figure-md}

<img src="../images/default_session.png"
alt="GraphScope default session"
width="80%">

Create a default GraphScope session
:::

If you create a default GraphScope session, all engines including Vineyard are bundled in the same pod, so that they can be deployed on
any node within the Kubernetes cluster. However, this creates a closed Vineyard cluster, which is only accessible to the GraphScope session. When the session is closed, the Vineyard cluster is also deleted, and it cannot be accessed by other GraphScope sessions.


:::{figure-md}

<img src="../images/session_with_vineyard_cluster.png"
alt="GraphScope sessions connect to an existing vineyard cluster"
width="80%">

Connecting GraphScope sessions to an existing vineyard cluster for data sharing
:::

The figure above shows that GraphScope sessions can share data in the same vineyard cluster, because the engines of the different sessions are deployed on the same node within the Kubernetes cluster and connect to the same vineyard socket. Multiple sessions can reuse the same graph as long as the vineyard cluster is alive. This is a common usage pattern of vineyard on Kubernetes.

If you don't want to keep the vineyard cluster running for a long time, you can save the graphs from the vineyard cluster to persistent storage and load them back into the vineyard cluster when you need them. For more details, please refer to [Persistent storage of graphs on the Kubernetes cluster](./persistent_storage_of_graphs_on_k8s.md).

The following sections provide step-by-step instructions on how to do this.

## Prerequisites

@@ -36,16 +62,28 @@ python3 -m pip install vineyard
```

By default, the Vineyard cluster consists of three Vineyard instances and three etcd instances.
However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters.
However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters. Do not set the number of Vineyard instances or etcd instances higher than the number of nodes in the Kubernetes cluster. The number of vineyard replicas and the number of engine pods, by contrast, can be set independently of each other.
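
The replica constraint can be written down as a quick pre-flight check. The sketch below is a hypothetical helper (`check_replicas` is not part of vineyard or GraphScope); it only encodes the rule that neither replica count should exceed the number of Kubernetes nodes, and it assumes you already know the node count.

```python
def check_replicas(node_count: int, vineyard_replicas: int, vineyard_etcd_replicas: int) -> None:
    """Raise ValueError if a replica count exceeds the number of nodes.

    Hypothetical helper for illustration only; not part of the vineyard API.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for name, value in (
        ("vineyard_replicas", vineyard_replicas),
        ("vineyard_etcd_replicas", vineyard_etcd_replicas),
    ):
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
        if value > node_count:
            raise ValueError(
                f"{name}={value} exceeds the {node_count} node(s) in the cluster"
            )


# Single-node cluster, as in this guide: one Vineyard and one etcd instance pass the check.
check_replicas(node_count=1, vineyard_replicas=1, vineyard_etcd_replicas=1)
```

Calling the helper with the default of three replicas on a one-node cluster raises a `ValueError`, which is the situation the paragraph above warns about.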

Create and check the namespace `vineyard-system` as follows.

```bash
$ kubectl create namespace vineyard-system
namespace/vineyard-system created
$ kubectl get namespace vineyard-system
NAME STATUS AGE
vineyard-system Active 33s
```
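
If you want to script the namespace check, the table printed by `kubectl get namespace` can be parsed as plain text. This is a minimal sketch, assuming the column layout shown above; `namespace_is_active` is a hypothetical helper, and the sample string stands in for output you would capture from `kubectl` yourself.

```python
def namespace_is_active(kubectl_output: str, namespace: str) -> bool:
    """Return True if `namespace` is listed with STATUS `Active`.

    Hypothetical helper: it parses the whitespace-separated table printed by
    `kubectl get namespace <name>` and does not call kubectl itself.
    """
    rows = [line.split() for line in kubectl_output.strip().splitlines()]
    if not rows:
        return False
    header, body = rows[0], rows[1:]
    if "NAME" not in header or "STATUS" not in header:
        return False
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    for row in body:
        if len(row) > max(name_col, status_col) and row[name_col] == namespace:
            return row[status_col] == "Active"
    return False


# Sample output copied from the step above.
sample = """NAME STATUS AGE
vineyard-system Active 33s
"""
print(namespace_is_active(sample, "vineyard-system"))
```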

To deploy a simple Vineyard cluster with one Vineyard instance and one etcd instance, follow the next step:

```python
import vineyard

# The default deployment name is `vineyardd-sample` and the default namespace is
# `vineyard-system`. You can override them with the `name` and `namespace`
# parameters. For details about the parameters, refer to the vineyardctl doc:
# https://github.com/v6d-io/v6d/blob/main/k8s/cmd/README.md
# Note: every `-` in a vineyardctl parameter name is replaced with `_` in the Python API.
vineyard.deploy.vineyardctl.deploy.vineyard_deployment(
    vineyard_replicas=1,
    vineyard_etcd_replicas=1,
    create_namespace=True
)
```
```
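
The naming rule mentioned in the code comment (every `-` in a vineyardctl parameter becomes `_` in the Python API) can be illustrated with a small conversion sketch. The helpers below are hypothetical, and the flag spellings are obtained only by reversing that rule on the Python names used in this guide; check the vineyardctl README for the actual flag names.

```python
def flag_to_kwarg(flag: str) -> str:
    """Turn a CLI-style flag such as `--create-namespace` into `create_namespace`.

    Hypothetical helper that only applies the `-` to `_` rule described above.
    """
    return flag.lstrip("-").replace("-", "_")


def kwargs_from_flags(flags: dict) -> dict:
    """Apply `flag_to_kwarg` to every key of a flag-to-value mapping."""
    return {flag_to_kwarg(name): value for name, value in flags.items()}


# Illustrative flag spelling (an assumption, derived from the Python parameter name).
print(kwargs_from_flags({"--create-namespace": True}))
```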

@@ -60,6 +98,17 @@ etcd0 1/1 Running 0 73m
vineyardd-sample-5db59987f-vr2fg 1/1 Running 0 73m
```

## The lifecycle of a vineyard cluster

If you deploy the vineyard cluster with the vineyardctl API, it will persist until you manually delete it. The vineyard cluster will not be affected by quitting the GraphScope session. You can delete the vineyard cluster with the following command:

```python
import vineyard
vineyard.deploy.vineyardctl.delete.vineyard_deployment()
```

However, if you do not deploy the vineyard cluster beforehand, it will be created when you create a GraphScope session with the specified vineyard deployment name and namespace. The vineyard cluster will be deleted when you close the GraphScope session.

## Load the dataset to the Kubernetes cluster

Depending on how the Kubernetes cluster was created, you may need to take different steps to make your dataset available within the cluster. If the cluster was not created using minikube, you will need to either copy the dataset to the nodes of the Kubernetes cluster or mount it onto them. On the other hand, if the cluster was created using minikube, you can directly mount the dataset to the minikube VM, without the need for further copying or mounting operations.
@@ -103,6 +152,8 @@ k8s_volumes = {
}

# This step takes a while because it creates a GraphScope cluster.
# Make sure the vineyard cluster is created before creating the GraphScope session.
# If it does not exist, a new vineyard cluster will be created and the graph will be loaded into it.
sess = graphscope.session(
k8s_namespace='vineyard-system',
k8s_vineyard_deployment='vineyardd-sample',
@@ -125,6 +176,7 @@ then load the graph with the vineyard id in the new GraphScope session.

```python
import graphscope
import vineyard

# This step takes a while because it creates a GraphScope cluster.
new_sess = graphscope.session(
@@ -134,7 +186,7 @@ new_sess = graphscope.session(

# Use the vineyard id of the graph the last GraphScope session loaded
# assume the vineyard id is 22731319746904674, you can load it as follows
graph = new_sess.load_from(vineyard_id=22731319746904674)
graph = new_sess.load_from(vineyard.ObjectID(22731319746904674))
```
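
Because the second session needs the vineyard id produced by the first one, it can help to write the id down somewhere both sessions can read. The sketch below is one possible way to do that with a small JSON file; the helper names and the file name are hypothetical, and reading the id from `graph.vineyard_id` in the first session is an assumption about the GraphScope graph object rather than something this doc shows.

```python
import json
import tempfile
from pathlib import Path


def save_graph_id(path: Path, vineyard_id: int) -> None:
    """Store the id as a string, since it can exceed 2**53 and some JSON readers lose precision."""
    path.write_text(json.dumps({"vineyard_id": str(int(vineyard_id))}))


def load_graph_id(path: Path) -> int:
    """Read the id back as a Python int."""
    return int(json.loads(path.read_text())["vineyard_id"])


with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "graph_id.json"  # hypothetical file name
    # First session: the id would come from the loaded graph
    # (assumed attribute: graph.vineyard_id).
    save_graph_id(id_file, 22731319746904674)
    # New session: read the id back, then pass it on as in the step above,
    # e.g. new_sess.load_from(vineyard.ObjectID(restored)).
    restored = load_graph_id(id_file)
```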

Check the graph as follows.
@@ -156,6 +208,7 @@ If you see the output above, that means you have successfully reused the existin
Delete the Vineyard cluster as follows.

```python
# The default vineyard deployment name is `vineyardd-sample` and the default namespace is `vineyard-system`.
# If you did not specify these arguments when creating the vineyard cluster, you can delete it as follows.
vineyard.deploy.vineyardctl.delete.vineyard_deployment()
```

Binary file added docs/images/default_session.png
Binary file added docs/images/session_with_vineyard_cluster.png