Add deploy on local doc #2523

Merged (7 commits, Mar 17, 2023)
4 changes: 1 addition & 3 deletions README-zh.md
````diff
@@ -123,9 +123,7 @@ sub_graph = sub_graph.add_column(ret2, {"tc": "r"})
 ```python
 
 # define the features for learning
-paper_features = []
-for i in range(128):
-    paper_features.append("feat_" + str(i))
+paper_features = [f"feat_{i}" for i in range(128)]
 
 paper_features.append("kcore")
 paper_features.append("tc")
````
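The change above replaces a three-line loop with a list comprehension; a quick sketch confirms the two forms build the same list:

```python
# Old form: build the feature names with an explicit loop.
paper_features_loop = []
for i in range(128):
    paper_features_loop.append("feat_" + str(i))

# New form: the same list via an f-string comprehension.
paper_features = [f"feat_{i}" for i in range(128)]

assert paper_features == paper_features_loop
print(paper_features[:3])  # → ['feat_0', 'feat_1', 'feat_2']
```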
4 changes: 1 addition & 3 deletions README.md
````diff
@@ -148,9 +148,7 @@ following the last step.
 ```python
 
 # define the features for learning
-paper_features = []
-for i in range(128):
-    paper_features.append("feat_" + str(i))
+paper_features = [f"feat_{i}" for i in range(128)]
 
 paper_features.append("kcore")
 paper_features.append("tc")
````
4 changes: 2 additions & 2 deletions coordinator/gscoordinator/op_executor.py
```diff
@@ -452,7 +452,7 @@ def run_on_interactive_engine(self, dag_def: op_def_pb2.DagDef):
         return message_pb2.RunStepResponse(head=response_head), []
 
     def _execute_gremlin_query(self, op: op_def_pb2.OpDef):
-        logger.info("execute gremlin query")
+        logger.debug("execute gremlin query")
         message = op.attr[types_pb2.GIE_GREMLIN_QUERY_MESSAGE].s.decode()
         request_options = None
         if types_pb2.GIE_GREMLIN_REQUEST_OPTIONS in op.attr:
@@ -462,7 +462,7 @@ def _execute_gremlin_query(self, op: op_def_pb2.OpDef):
         object_id = op.attr[types_pb2.VINEYARD_ID].i
         gremlin_client = self._object_manager.get(object_id)
         rlt = gremlin_client.submit(message, request_options=request_options)
-        logger.info("put %s, client %s", op.key, gremlin_client)
+        logger.debug("put %s, client %s", op.key, gremlin_client)
         self._object_manager.put(op.key, GremlinResultSet(op.key, rlt))
         return op_def_pb2.OpResult(code=OK, key=op.key)
```
5 changes: 0 additions & 5 deletions docs/deploy_as_job_for_analytical_tasks.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/deploy_as_service_with_groot.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/deploy_graphscope_on_local.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/deploy_graphscope_on_self_managed_k8s.md

This file was deleted.

280 changes: 280 additions & 0 deletions docs/deployment/deploy_graphscope_on_self_managed_k8s.md
@@ -0,0 +1,280 @@
# Deploy GraphScope on K8s cluster

To process large-scale graphs in a distributed fashion, GraphScope is designed to be deployed on a Kubernetes (K8s) cluster.

As shown in the figure, users deploy and manage the workloads of GraphScope through a Python client, which communicates with the GraphScope engines on the K8s cluster through a gRPC service.

:::{figure-md}

<img src="../images/k8s.png"
alt="GraphScope on K8s"
width="80%">

GraphScope on K8s.
:::

A GraphScope cluster on K8s contains a pod running the coordinator, and a `deployment` of GraphScope engines.

The coordinator is the endpoint of the GraphScope backend. It manages connections from the Python client via gRPC, and is responsible for allocating and releasing the pods for the interactive, analytical, and learning engines.

This document describes how to deploy GraphScope on a K8s cluster.

## Prerequisites

- Linux or macOS.
- Python 3.7 ~ 3.11.

## Install GraphScope Client
Unlike in standalone mode, you only need to install the GraphScope client package.

```bash
python3 -m pip install graphscope-client
```

````{tip}
If needed, use the Aliyun mirror to accelerate the download.

```bash
python3 -m pip install graphscope-client -i http://mirrors.aliyun.com/pypi/simple/ \
--trusted-host=mirrors.aliyun.com
```
````

## Prepare a Kubernetes cluster

To deploy GraphScope on Kubernetes, you must have a Kubernetes cluster.

````{tip}
If you already have a K8s cluster, skip this section and continue with the deployment.
````

We recommend using [minikube](https://minikube.sigs.k8s.io/docs/start/).
Please follow the instructions of minikube to download an appropriate binary for your platform.

Then, start minikube with:

```bash
minikube start
```

On macOS, you can just use [Docker Desktop](https://docs.docker.com/desktop/kubernetes/), which includes a
standalone Kubernetes server and client.

Use this command to verify that minikube is running:

```bash
minikube status
```

A normal status should look like this:

```bash
$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
```

The output should show that the cluster is running, and the kubectl context is set to the minikube context.
Once started, minikube generates a [kubeconfig](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file for users to communicate and interact with the cluster.

The default location of this file is `~/.kube/config`, which should look like this:

```yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /root/.minikube/ca.crt
    extensions:
    - extension:
        last-update: Thu, 16 Mar 2023 16:44:05 CST
        provider: minikube.sigs.k8s.io
        version: v1.28.0
      name: cluster_info
    server: https://172.21.67.111:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    extensions:
    - extension:
        last-update: Thu, 16 Mar 2023 16:44:05 CST
        provider: minikube.sigs.k8s.io
        version: v1.28.0
      name: context_info
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /root/.minikube/profiles/minikube/client.crt
    client-key: /root/.minikube/profiles/minikube/client.key
```
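If you want to sanity-check which context a kubeconfig selects without extra dependencies, a minimal sketch (a hypothetical helper, not part of GraphScope, assuming the flat top-level layout shown above) can scan for the `current-context` key:

```python
def current_context(kubeconfig_text):
    """Return the value of the top-level 'current-context' key, or None."""
    for line in kubeconfig_text.splitlines():
        # Top-level keys are unindented in the layout above.
        if line.startswith("current-context:"):
            return line.split(":", 1)[1].strip()
    return None

sample = "apiVersion: v1\ncurrent-context: minikube\nkind: Config\n"
print(current_context(sample))  # → minikube
```

For anything beyond this quick check, prefer a real YAML parser or `kubectl config current-context`.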


## Deploying GraphScope

### Launch with default parameters
The engines of GraphScope are distributed as Docker images. The GraphScope Python client will pull the images if they are not present. If you run GraphScope on a K8s cluster, make sure the cluster is able to access the public registry.

A session encapsulates the control and state of the GraphScope engines. It serves as the entrance to GraphScope in the Python client. A session allows users to deploy and connect GraphScope on a K8s cluster.

```python
import graphscope

sess = graphscope.session()
```
By default, the session looks for a kubeconfig file in `~/.kube/config`; the file generated by minikube in the previous step will be used.

As shown above, a session can easily launch a cluster on K8s.

### Frequently used parameters

#### Customize image URI

Users may want to use a tag other than the default, or deploy in an intranet environment without internet access; in these cases, they need to customize the image URIs.

Users can configure the image URIs for the engines using a set of image-related parameters. The default configurations are as follows:

```python
sess = graphscope.session(
    k8s_image_registry="registry.cn-hongkong.aliyuncs.com",
    k8s_image_repository="graphscope",
    k8s_image_tag="0.20.0",
)
```

See more details in [Session](https://graphscope.io/docs/reference/session.html#session).
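These three parameters are combined into full image URIs. The exact engine image names are an internal detail, but as an illustration (the `coordinator` image name here is a hypothetical example), a URI is assembled roughly like this:

```python
def image_uri(registry, repository, name, tag):
    """Compose a Docker image URI from registry, repository, name, and tag."""
    return f"{registry}/{repository}/{name}:{tag}"

# Hypothetical example using the default session parameters above.
uri = image_uri("registry.cn-hongkong.aliyuncs.com", "graphscope", "coordinator", "0.20.0")
print(uri)  # → registry.cn-hongkong.aliyuncs.com/graphscope/coordinator:0.20.0
```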

#### Specify the number of workers

GraphScope is designed to handle extremely large-scale graphs that cannot fit in the memory of a single worker.
To process such graphs, users can increase the number of workers, as well as the CPU and memory of each worker.

To achieve this, use the `num_workers` parameter:

```python
sess = graphscope.session(
    num_workers=4,
    k8s_engine_cpu=32,
    k8s_engine_mem="256Gi",
    vineyard_shared_mem="256Gi",
)
```
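When sizing a cluster, it helps to total what such a session will request. A minimal sketch (not part of the GraphScope API) that parses K8s binary quantities like `256Gi` and sums resources across workers:

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def parse_quantity(q):
    """Convert a K8s binary quantity string such as '256Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)  # plain byte count

# Totals for the example session above: 4 workers, 32 cores and 256Gi each.
num_workers = 4
total_cpu = num_workers * 32                                   # cores
total_mem_gib = num_workers * parse_quantity("256Gi") // 2**30  # GiB

print(total_cpu)      # → 128
print(total_mem_gib)  # → 1024
```

Make sure your K8s nodes can actually satisfy these aggregate requests, or the engine pods will stay `Pending`.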

#### Provide a kubeconfig file other than the default

If users want to deploy on a pre-existing cluster with a kubeconfig file located in a non-default location,
they can manually specify the path to the kubeconfig file as follows:

```python
sess = graphscope.session(k8s_client_config='/path/to/config')
```

#### Mount volumes

Sometimes users may want to use a dataset on the local disk. For this case, we provide options to mount a host directory into the cluster.

Assume we want to mount `~/test_data` on the host machine to `/testingdata` in the pods. We can define a `dict` as follows, then pass it as `k8s_volumes` to the session constructor.

Note that the host path refers to the filesystem of the Kubernetes node; that is, if your cluster was created by a VM driver, you need to copy that directory to the minikube VM, or mount the path into the minikube VM. See more details [here](https://minikube.sigs.k8s.io/docs/handbook/mount/).

```python
import os
import graphscope

k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {
            "path": os.path.expanduser("~/test_data/"),
            "type": "Directory"
        },
        "mounts": {
            "mountPath": "/testingdata"
        }
    }
}

sess = graphscope.session(k8s_volumes=k8s_volumes)
```
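Under the hood, such a `hostPath` entry corresponds to a K8s volume plus a volume mount on each container. A rough sketch of that translation (illustrative only, not the actual GraphScope implementation):

```python
def to_pod_spec(volumes):
    """Translate a k8s_volumes-style dict into volume/volumeMount fragments."""
    vols, mounts = [], []
    for name, spec in volumes.items():
        if spec["type"] == "hostPath":
            # The "field" dict becomes the hostPath volume source,
            # and "mounts" becomes the container's volumeMount.
            vols.append({"name": name, "hostPath": dict(spec["field"])})
            mounts.append({"name": name, **spec["mounts"]})
    return vols, mounts

volumes = {
    "data": {
        "type": "hostPath",
        "field": {"path": "/home/user/test_data/", "type": "Directory"},
        "mounts": {"mountPath": "/testingdata"},
    }
}
vols, mounts = to_pod_spec(volumes)
print(vols[0]["hostPath"]["path"])  # → /home/user/test_data/
print(mounts[0]["mountPath"])       # → /testingdata
```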

````{tip}
You could also create a cluster with the [none driver](https://minikube.sigs.k8s.io/docs/drivers/none/).

```bash
minikube start --driver=none
```
````

### Inspect the deployment

The launch time of GraphScope depends on the time it takes to pull the necessary Docker images.
The pulling time is influenced by network conditions.
Once the images are pulled, you can expect GraphScope to be up and running in less than 10 seconds.

Monitor the status of the deployment with the following command:

```bash
kubectl get pods
```

The output should show the status of the GraphScope pods. Here's an example:

```
$ kubectl -n demo get po
NAME READY STATUS RESTARTS AGE
coordinator-demo-549cf6695f-86pkx 2/2 Running 0 10s
gs-engine-demo-0 0/4 ContainerCreating 0 6s
gs-interactive-frontend-demo-648487488f-bpls5 0/1 ContainerCreating 0 6s
```

Wait until all pods are running before proceeding.

You can further inspect the status of pods using `kubectl describe pod <pod-name>`.
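To wait for readiness programmatically, you could poll and parse the `kubectl get pods` table. A minimal sketch, assuming the default output format shown above:

```python
def all_ready(kubectl_output):
    """Return True if every pod in `kubectl get pods` output is fully ready."""
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        ready, status = cols[1], cols[2]      # READY ("n/m") and STATUS columns
        done, total = ready.split("/")
        if status != "Running" or done != total:
            return False
    return True

sample = """\
NAME                READY   STATUS              RESTARTS   AGE
coordinator-demo    2/2     Running             0          10s
gs-engine-demo-0    0/4     ContainerCreating   0          6s
"""
print(all_ready(sample))  # → False
```

In production you would rather use `kubectl wait --for=condition=Ready pod --all` or a K8s client library, but the sketch shows what "ready" means in the table above.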

That's it! You now have a running instance of GraphScope in a Kubernetes cluster.

You can use GraphScope to analyze graphs as usual.
Check out the [getting started](../overview/getting_started.md) guide for more information.

## Cleaning Up

When you are done with GraphScope, you can delete the deployment by running this command:

```python
sess.close()
```

You can check if there are any remaining resources by:

```bash
kubectl get deployments
kubectl get statefulsets
kubectl get svc
```

If there are still resources, you may need to delete them manually by:

```bash
kubectl delete deployment <deployment-name>
kubectl delete statefulsets <statefulsets-name>
kubectl delete svc <svc-name>
```

To stop and delete the minikube cluster, run:

```bash
minikube stop
minikube delete
```