
Multi-cluster, multi-namespace workflows #3523

Open
alexec opened this issue Jul 20, 2020 · 74 comments
@alexec
Contributor

alexec commented Jul 20, 2020

Summary

Run workflows across multiple clusters.

Motivation

So you only need to run one Argo Workflows installation.
So you can run a workflow that has nodes in different clusters.

Proposal

Like Argo CD.

#3516


Message from the maintainers:

If you wish to see this enhancement implemented, please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@luozhaoyu

Beyond multi-cluster, should I create another issue for multi-namespace support? A related issue, #2063 (comment), asks to install Argo Workflows in one namespace but support creating pods in multiple namespaces (not a cluster-scoped installation, as its permissions would be too broad).

@CaramelMario

More details:

  • Install the Argo server + controller only in the master cluster
  • The master cluster dispatches workflow steps to slave clusters
  • The slave clusters don't need any Argo components installed

@alexec
Contributor Author

alexec commented Dec 1, 2020

PoC findings:

What went well:

  • Adopting the same configuration model as Argo CD works well: `argo cluster add other`
  • We primarily need to watch pods in other clusters - not workflows.
  • We need to garbage-collect those pods in other clusters. This is easy to do using a finalizer.

Questions raised:

  • What about cross-cluster Kubernetes API communications? Is it the same issue as in Argo CD?
  • It is not only pods; we will also need to create config maps and secrets for artifacts.
  • Workflows run in a single namespace. What do we do with a pod in another cluster? Presumably we might also need a different namespace?
  • How do we treat service accounts? Will users need different accounts in different clusters?
  • How do users want to determine which cluster a workflow pod runs in? Do they want to set this in the container/script/resource template? Override it at the DAG level? Somehow parameterize it?

alexec added a commit to alexec/argo-workflows that referenced this issue Dec 1, 2020
…rgoproj#3523

Signed-off-by: Alex Collins <alex_collins@intuit.com>
@alexec
Contributor Author

alexec commented Dec 2, 2020

I've created a dev build for people to test out multi-cluster workflows (and thereby prove demand for it):

argoproj/workflow-controller:multic

Instructions for use:

https://github.com/argoproj/argo/blob/399286fc1884bf20419de4b091766b29bbca7d94/docs/multi-cluster.md

Please let me know how you get on with this.

@alexec alexec changed the title Multi-Clusters Workflows Multi-cluster workflows Dec 4, 2020
@alexec
Contributor Author

alexec commented Dec 4, 2020

@adrienjt
Contributor

adrienjt commented Dec 4, 2020

@alexec what do you think of this? https://admiralty.io/blog/2019/01/17/running-argo-workflows-across-multiple-kubernetes-clusters (the link was once listed on the Argo Workflows website)

The blog post is slightly outdated, as Admiralty now uses Virtual Kubelet and the scheduler framework, but the use case still works. Admiralty creates a virtual node that represents a remote cluster, making multi-cluster workflows possible without any code change in the Argo project.

IMHO, multi-cluster is a common concern best treated separately. BTW, Admiralty also works with Argo CD.

@alexec
Contributor Author

alexec commented Dec 4, 2020

Hi @adrienjt, thank you - I just tweeted at the post's author before realizing it was you. I'm aware that any first-class solution in Argo would compete with a multi-cluster scheduler, as it would make the need moot. I'm also aware from working on Argo CD that security with multi-cluster is difficult, because you end up with a single main cluster that has a lot of permissions.

@alexec alexec changed the title Multi-cluster workflows Multi-cluster, multi-namespace workflows Jan 8, 2021
@alexec alexec linked a pull request Jan 8, 2021 that will close this issue
@alexec alexec assigned alexec and unassigned alexec Jan 9, 2021
@alexec
Contributor Author

alexec commented Jan 14, 2021

I've updated the dev images during my white-space time today. You can test with these images:

  • alexcollinsintuit/workflow-controller:multic
  • alexcollinsintuit/argocli:multic

Instructions

We really need to hear more concrete use cases to progress this.

@dudicoco

dudicoco commented Feb 8, 2021

Isn't multi-namespace already supported?
I assume this could be done by using a cluster-scoped installation, but instead of creating a ClusterRole, creating Roles in each namespace you would like Argo to have access to.
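
A minimal sketch of that approach, assuming the controller runs in an `argo` namespace with an `argo` service account (all names are illustrative, and the verb list may need adjusting for your version):

```yaml
# Instead of a ClusterRole, grant the controller's service account a Role
# in each namespace it should manage workflow pods in.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflows-pod-manager
  namespace: team-a              # repeat per managed namespace
rules:
  - apiGroups: [""]
    resources: [pods]
    verbs: [create, get, list, watch, patch, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-workflows-pod-manager
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-workflows-pod-manager
subjects:
  - kind: ServiceAccount
    name: argo                   # controller's service account
    namespace: argo              # namespace where Argo is installed
```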

@shadiramadan

We really need to hear more concrete use case to progress this.

@alexec For background on our use case:

We have 4 environments, each a separate cluster.

One is an 'operations' cluster that has argo-workflows installed. The rest are dev, staging, and production.

We have a workflow that updates multiple data stores with a lot of data.

Instead of three Argo installations/UIs, or exposing endpoints to the data stores so the operations cluster's workflows can access them, I'd rather be able to run a workflow pod in a different cluster than the one Argo is installed in, so I can have one UI/login for all my workflows running in multiple clusters.

Right now we have to expose all these data stores and copy a lot of the k8s secrets from the dev/staging/production clusters to the operations cluster for everything to work. I'd rather just be able to run a container in any connected cluster I specify.

@roshpr

roshpr commented Feb 11, 2021

@alexec our use case is as follows:

We have a central master cluster that needs to connect to multiple regional and edge Kubernetes clusters to run different workflows, depending on which workflows are provisioned in our central master Argo server.

Right now we work around this by using Git runners on each regional cluster to run some of our tasks. It is a cumbersome solution that is difficult to maintain, and it is hard to organize the sequence of tasks.

@alexec
Contributor Author

alexec commented Mar 7, 2021

There are two main interpretations of "multi-cluster":

  1. 🚀 A single installation of Argo Workflows in one cluster that runs workflows created in other clusters, and exposes a single user interface for all workflows in all clusters (Multi-cluster UI: aka "control plane" #4684). It is possible to run workflows in multiple clusters today with multiple installations, so this is about simplifying management of Argo Workflows when you have many clusters.
  2. 👀 One or more installations that can run workflows where a single workflow can have two steps running in different namespaces (Multi-namespace workflows #2063?) and/or different clusters. This is not possible today, so this is about opening up new use cases.

As this is ambiguous, we don't actually know which of these you want (or both).

Can I ask you to vote by adding the appropriate reaction (🚀/👀) to this comment? Go further and demonstrate your interest by adding a comment with the use case you're trying to solve.

@joyciep
Contributor

joyciep commented Mar 8, 2021

Our use case for option 2:

Our workflow involves steps in different clusters: the first few steps extract and preprocess the data in the first cluster, then the next step trains the model in a separate cluster (with GPUs) for machine learning purposes.

@joshuajorel

@alexec point 2 might be more extensible in terms of scaling, i.e. deploying workflow controllers in different namespaces and/or clusters that communicate with a single Argo server. It might also open the possibility of workflow controllers outside Kubernetes (VM deployments), since we might not want specialized hardware such as GPU machines to be part of a cluster.

@servo1x

servo1x commented Mar 9, 2021

Where option 2 might be nice is where there is a secondary cluster for Windows nodes. Our primary (Linux) cluster uses a CNI that is not compatible with Windows, so we had to set up a separate cluster. It would be nice if Argo Workflows on our primary cluster could schedule workloads on the secondary cluster for Windows-specific tasks.

Imagine someone using Argo Workflows for CI in a monorepo with both Linux and Windows Docker images. Instead of separate workflows, a single one with tasks scheduled on the correct cluster could open up a lot of interesting possibilities.

@Guillermogsjc

Guillermogsjc commented Mar 9, 2021

Point 1 is the straightforward use case when you have several clients and cloud accounts with distinct clusters.

Managing workflows (UI + a single client), especially cron workflows, from a "central" Argo installation would simplify the work a lot. There may still be several Argos (one per cluster), but having a main one with a credentialed abstraction over the rest may be an easier option than trying to do without the existing Argos in the subordinate clusters.

@awcchungster

awcchungster commented Mar 26, 2021

Our use case for option 2:

Our workflow involves steps in different clusters: the first few steps extract and preprocess the data in the first cluster, then the next step trains the model in a separate cluster (with GPUs) for machine learning purposes.

From the machine learning perspective, this use case is increasingly popular. At AWS, I meet with many customers who are hybrid or multi-cloud. The ability to run steps that transfer data, run containers in different clusters, merge final results, and manage all steps in a single interface is highly valuable.

@alexec
Contributor Author

alexec commented Mar 29, 2022

profile goes in the argo system namespace

@alexec
Contributor Author

alexec commented Apr 6, 2022

@bengoldenberg

Any update?

@amitde69

From our standpoint this is definitely a must! We are waiting for the official implementation.

@jallen-frb

We are very interested in this feature as well.

@shuker85
Contributor

@alexec hello, do you know if any progress has been made on this topic? I see https://github.com/argoproj/argo-workflows/blob/dev-mc/docs/multi-cluster.md is quite empty :/

@Stelagowski-Tomasz

@alexec Would this feature also allow watching Argo Workflows deployed in one cluster while viewing them in a UI deployed in another?

@mikeshng

mikeshng commented Oct 3, 2022

Based on the design provided in this doc: https://github.com/argoproj/argo-workflows/blob/dev-mc/docs/multi-cluster.md

I think integration with https://open-cluster-management.io/ could be a candidate to help Argo workflow with the multi-cluster capabilities by providing:

  • cluster registration
  • cluster inventory
  • workload delivery to remote clusters

With no code changes or importing the Open Cluster Management (OCM) API, the following scenario is already possible:
Primary cluster:

  • contains a list of remote clusters, each with its own dedicated namespace that only that remote cluster has access to
  • creates a ManifestWork wrapping the Argo Workflow CR in the remote cluster's namespace

Remote clusters:

  • install the Argo Workflow controller
  • pull the Argo Workflow CR from the hub cluster, reconcile it, and report results back to the primary cluster

With code changes and adoption of the OCM APIs as first-class citizens, Argo Workflow can delegate the cluster-registration and inventory security concerns to OCM, focus on reconciling Workflows on the primary cluster, and delegate the workload to the remote clusters instead of creating workload pods on the primary cluster.

PoC integration: https://github.com/mikeshng/argo-workflow-multicluster
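
The primary-cluster steps above might look roughly like this ManifestWork (a sketch based on the OCM `work.open-cluster-management.io/v1` API; the names, namespaces, and workflow content are illustrative):

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: run-workflow
  namespace: cluster1            # the remote cluster's dedicated namespace on the hub
spec:
  workload:
    manifests:
      - apiVersion: argoproj.io/v1alpha1
        kind: Workflow
        metadata:
          name: hello
          namespace: argo        # namespace on the remote cluster
        spec:
          entrypoint: main
          templates:
            - name: main
              container:
                image: alpine:3.19
                command: [echo, hello]
```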

@sarabala1979
Member

@mikeshng thanks Mike for putting together the information. Yes, I am thinking along the same lines: multi-cluster and multi-namespace workflow support should be at the Kubernetes level.

@mrkwtz

mrkwtz commented Oct 11, 2022

Because @sarabala1979 asked for it in the ArgoCon talk, here is a multi-cluster use case from us.

We need to sync databases across environments (and, in our case, therefore Kubernetes clusters). The workflow is:

  1. Create dump from source_db
  2. Copy dump to S3
  3. Restore dump into target_db

Between 2 and 3 there could be some scripts that alter the data, e.g. for anonymization. Right now our implementation looks like:

  1. (Cluster A) Create dump from source_db
  2. (Cluster A) Copy dump to S3
  3. (Cluster A) Write Kafka message with some metadata (Workflow input parameters needed for restore, like dump location) to a given topic
  4. (Cluster A) Mirror the given topic to a Kafka instance in Cluster B
  5. (Cluster B) Consume Kafka message via Argo Events
  6. (Cluster B) Trigger Argo Workflow from the event for a dump restore

With multi-cluster support, that whole process could be made much simpler. Really looking forward to it.
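
With hypothetical per-step cluster selection, the six-step process above might collapse into one workflow. Note the `cluster` field below does NOT exist in Argo Workflows today; it, along with the images and commands, is purely illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: db-sync-
spec:
  entrypoint: sync
  templates:
    - name: sync
      steps:
        - - name: dump-to-s3      # today: steps 1-2 in cluster A
            template: dump
        - - name: restore         # today: steps 3-6 via Kafka + Argo Events
            template: restore
    - name: dump
      # cluster: cluster-a        # hypothetical field, not a real API
      container:
        image: dump-tool:latest   # illustrative image
        command: [sh, -c, "dump source_db && upload s3://bucket/dump"]
    - name: restore
      # cluster: cluster-b        # hypothetical field, not a real API
      container:
        image: dump-tool:latest
        command: [sh, -c, "download s3://bucket/dump && restore target_db"]
```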

@christophkreutzer

I can also provide an example of what the benefit would be from our side:

We currently have a centralized cluster running Argo Workflows, which handles diverse data pipelines for multiple teams. It can scale pretty flexibly thanks to autoscaling, but everything has its limits.
There are already teams with their own clusters (currently without Argo Workflows), because they have special needs or would be hard to handle at scale in a shared cluster (GPU nodes, huge nodes, noisy-neighbor avoidance, ...).
Some have already added boilerplate in the form of a workflow step shelling out to a remote cluster (basically kubectl ...).

Multi-cluster support would be helpful, because:

  • we could manage (for now) one Argo Workflows instance on "our" cluster and handle dependencies locally (instead of some multi-cluster event-based system like @mrkwtz described), but dispatch steps to their specialized clusters (e.g. large Spark or ML training jobs)
  • we wouldn't need any boilerplate for starting steps on remote clusters, nor would we lose metadata/logs about the remote job afterwards

@alexec alexec added this to To do in FAANG Scale CI Nov 7, 2022
@geowalrus4gh

geowalrus4gh commented Nov 8, 2022

We are evaluating Argo Workflows and Keptn (a CD orchestration tool, no CI) for a deployment-pipeline (CD) use case. We found that Keptn has multi-cluster support that orchestrates multi-cluster deployment of an application in a single dev-to-prod deployment workflow. We would like to see the same feature in Argo Workflows, since we don't want to add yet another tool (Keptn) solely for deployment orchestration; we could then use Argo Workflows for CI and CD in an end-to-end workflow.

@sarabala1979
Member

The Argo Workflows team and OCM (https://open-cluster-management.io/) are working on a PoC for multi-cluster Argo Workflows with the use cases below. We will demo it to the community soon.

  1. Running the whole workflow on a remote cluster and bringing the status back to the central cluster.
    a. The user can provide the cluster where the workflow needs to run
    b. Automatic load balancing based on resource availability on remote clusters
  2. Running a particular step/task on a remote cluster.
    a. The user can provide the cluster where the step/task needs to run
    b. Automatic load balancing based on resource availability on remote clusters

@Stelagowski-Tomasz

@sarabala1979 would the work around the PoC also allow seeing/managing Argo Workflows deployed on remote clusters from the central cluster?
In our use case we use Workflows for orchestrating job pipelines on different production environments; we would love to be able to see and manage those from a single UI on the central cluster instead of providing separate UI access for every environment.

@kolorful
Contributor

kolorful commented Nov 16, 2022

Use case: cross-cluster workflow failover.
We are trying to adopt Argo Workflows, and everything we have tested so far is fantastic; the only blocker keeping us from migrating is the lack of a cross-cluster workflow failover feature.

E.g. a user can specify a list of clusters a CronWorkflow can run in, and if the workflow fails in one cluster, or if the remote cluster is down, it gets rescheduled in another cluster (a.k.a. failover).

It would be really nice to see this feature in the new multi-cluster workflow architecture. Thank you!

@ajonnavi

Just curious: is the solution to both problems, i.e. multi-cluster and multi-namespace, the same? I see a lot of comments on multi-cluster use cases, but I am not getting a clear picture of running and managing workloads in multiple namespaces in the same cluster.

We would like to have one cluster for CI purposes, but would still like to separate out workloads into multiple namespaces depending on team.

@Sharathmk99

Hi all, we are trying to solve the multi-cluster setup using Liqo (via virtual nodes), which doesn't need any changes to Argo Workflows.
Link - https://github.com/liqotech/liqo
Thanks

@joebowbeer
Contributor

FWIW, I am interested in using one instance of Argo Workflows to run workflows in multiple vclusters.

@Kuvesz

Kuvesz commented Sep 28, 2023

Any news on this? I really wanted to use this for something but realized it's not ready yet and isn't on master.

@mikeshng

Any news on this? I really wanted to use this for something but realized it's not ready yet and isn't on master.

I recommend giving https://github.com/sarabala1979/argo-wf-ocm-multicluster a try.

@13567436138

How is everything going?

@x-7

x-7 commented Jan 11, 2024

Any news on this? I really wanted to use this for something but realized it's not ready yet and isn't on master.

I recommend giving https://github.com/sarabala1979/argo-wf-ocm-multicluster a try.

@sarabala1979 @juliev0

I really want to try the argo-wf-ocm-multicluster solution to schedule my workflows across clouds, but I have a question:
I currently use the Argo API client in my business. Since the Argo API server does not fall within the scope of this addon library, how should I combine the Argo API with the addon so that my existing code for calling the Argo API stays unchanged?

The question is also mentioned in argo-workflow-multicluster.

@juliev0
Contributor

juliev0 commented Jan 11, 2024

Any news on this? I really wanted to use this for something but realized it's not ready yet and isn't on master.

I recommend giving https://github.com/sarabala1979/argo-wf-ocm-multicluster a try.

@sarabala1979 @juliev0

I really want to try the argo-wf-ocm-multicluster solution to schedule my workflows across clouds, but I have a question: I currently use the Argo API client in my business. Since the Argo API server does not fall within the scope of this addon library, how should I combine the Argo API with the addon so that my existing code for calling the Argo API stays unchanged?

The question is also mentioned in argo-workflow-multicluster.

This was really more of a PoC than an end-to-end solution. Somebody should probably fork it and take it from where we left off, if interested. Basically, in my recollection, the hub-cluster controller is able to resolve Workflows with the "ocm-managed-cluster" annotation or the "ocm-placement" annotation (which resolves to a particular cluster). As long as you are submitting a Workflow with the right annotations, it should theoretically work.

Even if that works, however, there will probably be many things that don't.
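
Based on that recollection, a submission might look something like this (the exact annotation key and value format are assumptions inferred from this comment, not a confirmed API of the PoC):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: multicluster-example-
  annotations:
    # hypothetical key: the PoC's real annotation name/format may differ
    ocm-managed-cluster: cluster1
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [echo, "hello from a managed cluster"]
```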

@alexec
Contributor Author

alexec commented Mar 24, 2024

The challenge with this issue is that it is a BIG change. Someone (or some company) needs to be willing to invest the effort.

@shuangkun
Member

shuangkun commented Mar 28, 2024

The challenge with this issue is that it is a BIG change. Someone (or some company) needs to be willing to invest the effort.

If the community has plans to implement it, I am very willing to invest in developing related work. I have some experience developing workflows and multi-cluster systems. I would be very grateful for some guidance.

@x-7

x-7 commented Mar 30, 2024

The challenge with this issue is that it is a BIG change. Someone (or some company) needs to be willing to invest the effort.

If the community has plans to implement it, I am very willing to invest in developing related work. I have some experience developing workflows and multi-cluster systems. I would be very grateful for some guidance.

+1
