Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate vineyard operator python API with Graphscope #2458

Merged
merged 9 commits into from Mar 20, 2023

Conversation

dashanji
Copy link
Collaborator

What do these changes do?

The pr uses Deployment to replace the original Daemonset as the vineyard operator uses Deployment to install the vineyard cluster. Also, after specifying the vineyard deployment, we need to use the vineyard operator python API to deploy the engines to vineyard nodes.

BTW, the pr needs to wait for the v6d-io/v6d#1183 ready.

@dashanji dashanji force-pushed the integrate-vineyard-operator branch 7 times, most recently from 0021083 to d86b91d Compare March 10, 2023 08:34
@dashanji dashanji force-pushed the integrate-vineyard-operator branch from 7c32d13 to b9791a1 Compare March 10, 2023 08:58
@dashanji dashanji changed the title [WIP] Integrate vineyard operator python API with Graphscope Integrate vineyard operator python API with Graphscope Mar 10, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2023

Codecov Report

Merging #2458 (5b55f97) into main (ff287e4) will increase coverage by 26.93%.
The diff coverage is 3.33%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2458       +/-   ##
===========================================
+ Coverage   39.88%   66.81%   +26.93%     
===========================================
  Files          88       88               
  Lines        9814     9842       +28     
===========================================
+ Hits         3914     6576     +2662     
+ Misses       5900     3266     -2634     
Impacted Files Coverage Δ
python/graphscope/client/session.py 72.52% <ø> (+6.72%) ⬆️
python/graphscope/deploy/kubernetes/cluster.py 23.41% <0.00%> (-52.78%) ⬇️
...on/graphscope/tests/kubernetes/test_demo_script.py 0.00% <0.00%> (-75.66%) ⬇️
python/graphscope/config.py 100.00% <100.00%> (ø)

... and 51 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff287e4...5b55f97. Read the comment docs.

@dashanji dashanji force-pushed the integrate-vineyard-operator branch 3 times, most recently from a04d75d to 72de11c Compare March 12, 2023 15:19
coordinator/gscoordinator/cluster_builder.py Outdated Show resolved Hide resolved
coordinator/gscoordinator/cluster_builder.py Outdated Show resolved Hide resolved
coordinator/gscoordinator/cluster_builder.py Outdated Show resolved Hide resolved
.github/workflows/k8s-ci.yml Show resolved Hide resolved
@dashanji dashanji force-pushed the integrate-vineyard-operator branch from 72de11c to 12a81bc Compare March 13, 2023 05:05
@dashanji
Copy link
Collaborator Author

All comments addressed. Also, we could check the ci for creating a multi-node cluster with the specific base image.

@siyuan0322
Copy link
Collaborator

if self.vineyard_deployment_exists():
socket_volume = self.get_vineyard_socket_volume()
container.volume_mounts = [socket_volume[2]]

This still exists in frontend container

@siyuan0322
Copy link
Collaborator

we could check the ci for creating a multi-node cluster with the specific base image.

You could do sleep in ci.yml and exec into that pod to create multi-node cluster manually.
To tweak the resources requests of the self-hosted container, see yuque doc.

@dashanji
Copy link
Collaborator Author

we could check the ci for creating a multi-node cluster with the specific base image.

You could do sleep in ci.yml and exec into that pod to create multi-node cluster manually. To tweak the resources requests of the self-hosted container, see yuque doc.

I have tried creating the multi-node cluster with the latest minikube base image gcr.io/k8s-minikube/kicbase:v0.0.32 and it can work as fine. Could we tag and push it as the base image? I think it's the simplest way to achieve the kubernetes ci test. WDYT? @siyuan0322

minikube start -p test12 --cpus='12' --memory='32000mb' --disk-size='40000mb' --mount=true --mount-string="/opt/caoye/graphscope/test/gstest:/testingdata" --base-image='gcr.io/k8s-minikube/kicbase:v0.0.32' --nodes 2                    
😄  [test12] minikube v1.26.0 on Ubuntu 20.04 (amd64)
    ▪ KUBECONFIG=/opt/caoye/testkubeconfig
✨  Automatically selected the docker driver. Other choices: none, ssh
📌  Using Docker driver with root privileges
👍  Starting control plane node test12 in cluster test12
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=12, Memory=32000MB) ...
🐳  Preparing Kubernetes v1.24.1 on Docker 20.10.17 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

👍  Starting worker node test12-m02 in cluster test12
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=12, Memory=32000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.67.2
🐳  Preparing Kubernetes v1.24.1 on Docker 20.10.17 ...
    ▪ env NO_PROXY=192.168.67.2
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "test12" cluster and "default" namespace by default

@siyuan0322
Copy link
Collaborator

siyuan0322 commented Mar 14, 2023

Could we tag and push it as the base image? I think it's the simplest way to achieve the kubernetes ci test. WDYT? @siyuan0322

Sure, you could retag and push it on ECS.

But it works in your machine doesn't mean it works in the CI...

This file limits the total resources of the self-hosted runner.
I suggest you could lower the resources of your fixturegs_session_distributed_with_vineyard_deployment and gs_session_with_vineyard_deployment (as you only launched the session and does not has any heavy workload) So you could lower the node resources(--cpus=xxx, --memory=xxx). And you could dedicate a new x.yml to your tests.
Typically, the CI fails with a self-hosted runner with a X without any messages, means it exceeds the maximum resources of the runner in the runtime, OOM or sth.

@dashanji
Copy link
Collaborator Author

Could we tag and push it as the base image? I think it's the simplest way to achieve the kubernetes ci test. WDYT? @siyuan0322

Sure, you could retag and push it on ECS.

But it works in your machine doesn't mean it works in the CI...

This file limits the total resources of the self-hosted runner. I suggest you could lower the resources of your fixturegs_session_distributed_with_vineyard_deployment and gs_session_with_vineyard_deployment (as you only launched the session and does not has any heavy workload) So you could lower the node resources(--cpus=xxx, --memory=xxx). And you could dedicate a new x.yml to your tests. Typically, the CI fails with a self-hosted runner with a X without any messages, means it exceeds the maximum resources of the runner in the runtime, OOM or sth.

Thanks for the detail, I will create a new test ci yaml later.

sighingnow added a commit that referenced this pull request Mar 14, 2023
## What do these changes do?

We will decouple the deploy of vineyard and graphscope (see also #2458)
where vineyard and graphscope workers may not aligned. After that, on
one kubernetes node, there might be only 1 vineyardd instance and more
than one graphscope workers (specially GIE executors) on it.

Before this pull request, that workable, but yields wrong (duplication)
query results. This pull fixed that by first sync process partition list
and splitted the partitions to "local" work processes.

To avoid disturbe existing users, the new feature is only availbel when
the environment variable `PARALLEL_INTERACTIVE_EXECUTOR_ON_VINEYARD ` is
set to `OFF` (both `ON` and `OFF` are covered by CI).

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
@dashanji dashanji force-pushed the integrate-vineyard-operator branch from c840681 to ee98085 Compare March 15, 2023 12:54
@dashanji dashanji force-pushed the integrate-vineyard-operator branch 8 times, most recently from 5b55f97 to bd1de63 Compare March 20, 2023 03:24
* Add relevant ci test.

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
…it doesn't the socket

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
…ment.

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
@dashanji dashanji force-pushed the integrate-vineyard-operator branch 2 times, most recently from 8acc228 to 5d1bc13 Compare March 20, 2023 04:00
Copy link
Collaborator

@sighingnow sighingnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM. We cuold this pr merged (as current changeset is not in the critical path) then we can go ahead.

# values:
# - graphscope-system-vineyard-deployment # [vineyard deployment namespace]-[vineyard deployment name]
# topologyKey: kubernetes.io/hostname
def _add_podAffinity_for_vineyard_deployment(self, workload):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_add_podAffinity_for_vineyard_deployment -> _add_pod_affinity_for_vineyard_deployment to make it pythonic.

See https://peps.python.org/pep-0257/ for docstring convensions for Python functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me.

except ImportError:
raise RuntimeError(
"vineyard is not installed, please install vineyard first."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the check. As vineyard is a required dependency for coordinator: https://github.com/alibaba/GraphScope/blob/main/coordinator/requirements.txt

The catch would hide the underlying error during importing vineyard.

try:
import vineyard
except ImportError:
raise RuntimeError("vineyard is not installed, please install vineyard first.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the check. As commented above.

@@ -361,6 +361,9 @@ jobs:
with:
path: artifacts

#- name: sleep
# run: sleep infinity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to delete the debugging logs.

@dashanji dashanji force-pushed the integrate-vineyard-operator branch from 5d1bc13 to 66631b9 Compare March 20, 2023 05:06
@dashanji
Copy link
Collaborator Author

Basically LGTM. We cuold this pr merged (as current changeset is not in the critical path) then we can go ahead.

Thanks, I will debug it ASAP to avoid interrupting others‘ work.

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
@dashanji dashanji force-pushed the integrate-vineyard-operator branch from 66631b9 to 8abb250 Compare March 20, 2023 07:10
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
@sighingnow sighingnow merged commit e728f2d into alibaba:main Mar 20, 2023
@dashanji dashanji deleted the integrate-vineyard-operator branch March 20, 2023 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants