Support for swarm mode in 1.12? #54

Open
padyx opened this issue Jul 21, 2016 · 38 comments


@padyx padyx commented Jul 21, 2016

Do you think it is possible, and likely, that this plugin will support the swarm mode that comes with Docker 1.12?

Checking the remote api 1.24 for services, I'd say that it would be entirely possible to create a service with a single task.

It would be great to offer possible features of swarm in the cloud / image configuration. For example:

  • Resource reservation, resource limit per service/image
  • Placement constraints per service/image, maybe even on a job basis (I have no clue if this is possible in jenkins)
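
To illustrate the idea of a service with a single task, the request body for such a service could be assembled roughly as below. This is only a sketch: it assumes the field names of the remote API 1.24 service spec (`TaskTemplate`, `Resources`, `Placement`, `Mode`), and the function name, its arguments, and the example image are hypothetical, not part of the plugin.

```python
import json


def build_agent_service_spec(name, image, cpu_reservation=None, cpu_limit=None,
                             memory_reservation=None, memory_limit=None,
                             constraints=()):
    """Build a body for POST /services/create describing a one-task service.

    Hypothetical helper: CPU values are in cores, memory values in bytes.
    """
    def resources(cpus, memory_bytes):
        block = {}
        if cpus is not None:
            # The API expresses CPU shares in units of 1e-9 cores.
            block["NanoCPUs"] = int(round(cpus * 1_000_000_000))
        if memory_bytes is not None:
            block["MemoryBytes"] = int(memory_bytes)
        return block

    return {
        "Name": name,
        "TaskTemplate": {
            "ContainerSpec": {"Image": image},
            "Resources": {
                "Reservations": resources(cpu_reservation, memory_reservation),
                "Limits": resources(cpu_limit, memory_limit),
            },
            # Assumption: a finished build agent should not be restarted.
            "RestartPolicy": {"Condition": "none"},
            "Placement": {"Constraints": list(constraints)},
        },
        # A single task: the build agent itself.
        "Mode": {"Replicated": {"Replicas": 1}},
    }


if __name__ == "__main__":
    example = build_agent_service_spec(
        "jenkins-agent-example", "example/jenkins-agent:latest",
        cpu_reservation=0.5, memory_limit=512 * 1024 * 1024,
        constraints=["node.labels.build == true"],
    )
    print(json.dumps(example, indent=2))
```

Per-job overrides could then simply be different arguments to the same helper.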
Owner

@KostyaSha KostyaSha commented Jul 21, 2016

Yes, anything! But I'm stuck in docker-java upstream with integration tests :(

Owner

@KostyaSha KostyaSha commented Jul 21, 2016

My usual process is to update docker-java with the APIs, then define how it should work in the plugin, and then implement it.

Could you provide ideas for how the configuration should look/work in jenkins?

Having configuration on a per-job basis would be very useful; that should be very simple, like in jenkinsci/docker-plugin#383

Author

@padyx padyx commented Jul 25, 2016

I skimmed through the documentation of the remote API. Most of the current configuration won't need to be changed. The changes that would likely be needed:

Cloud configuration:

  • Mode setting (swarm mode 1.12 or single host mode [or swarm standalone])
  • Master: Possibly: Specify more than one master for a single cloud? (Not sure if needed in the plugin itself)

Image configuration

  • Privileged flag: not supported yet, needs to be disabled (see also this comment)
  • Add fields for cpu, memory reservation
  • Add fields for cpu, memory limit
  • Add field for placement constraints

I could imagine the following override settings for the job-based configuration, but I'd think it would be good to think this over first:

  • Override cpu, memory reservation/limit
  • Override argument, and possibly command to execute

Also, from what I've read in the documentation, exposing ports might be more difficult than in the current version: they need to be exposed explicitly (there is no nice "Publish All" flag).
So this would require a random port generator, and possibly a retry if the chosen port conflicts with an existing service.
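
A random generator with retry, as described here, might look like the following sketch. The function name, the default port range, and the attempt limit are assumptions made for illustration; the set of used ports would have to come from the swarm itself.

```python
import random

# Assumption: the IANA dynamic/private range is used as the pool of candidates.
DEFAULT_PORT_RANGE = (49152, 65535)


def pick_unused_port(used_ports, port_range=DEFAULT_PORT_RANGE,
                     max_attempts=50, rng=random):
    """Return a random port from port_range that is not in used_ports.

    Hypothetical helper: retries up to max_attempts times, then gives up.
    """
    low, high = port_range
    for _ in range(max_attempts):
        candidate = rng.randint(low, high)  # both bounds are inclusive
        if candidate not in used_ports:
            return candidate
    raise RuntimeError("no unused port found after %d attempts" % max_attempts)
```

Even with such a check, another client could take the port between the check and the service creation, so the creation call itself would still need a retry.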

As an inspiration, the Kubernetes Plugin looks very similar to the likely solution:
image

Owner

@KostyaSha KostyaSha commented Aug 21, 2016

As an inspiration, the Kubernetes Plugin looks very similar to the likely solution:

That looks the same :/

Master: Possibly: Specify more than one master for a single cloud? (Not sure if needed in the plugin itself)

It may make sense for the case when DockerClient throws an exception, but the net-split issue remains an open question.

Image configuration

Are they the same as for standard docker? I can sync the API to the latest create/stop/remove features.

Override cpu, memory reservation/limit
Override argument, and possibly command to execute

It could be possible to extend JobProperty #72 in the future and require constraints.

As an inspiration, the Kubernetes Plugin looks very similar to the likely solution:

They have only reservation/etc. limits, and that will be solved by syncing the create command to the latest features as soon as docker-java/docker-java#673 is added to docker-java.

Owner

@KostyaSha KostyaSha commented Aug 21, 2016

@padyx can the standard docker client work with a docker engine that is in swarm mode?

Author

@padyx padyx commented Aug 22, 2016

can the standard docker client work with a docker engine that is in swarm mode?

Yes, but any containers started via the regular container API (/containers/create) will be created on that specific host - and not in the swarm. So we do need to call a different API if swarm mode is selected.

Image configuration

Not all of the features that we currently can configure for regular containers are possible for swarm mode. Privileged and SHM-Size are two of the features that don't work with Docker 1.12 in swarm mode. I haven't made a full comparison yet.


@avandorp avandorp commented Sep 1, 2016

I've seen a few related and promising looking pull requests over at docker-java (docker-java/docker-java#686, docker-java/docker-java#678, docker-java/docker-java#673). How long - if at all - do you think it will take to take advantage of those in this plugin? Is it on anyone's priority list?

Owner

@KostyaSha KostyaSha commented Sep 1, 2016

@avandorp unfortunately I do both projects in my free time; in docker-java they are stuck because of integration tests. In docker-plugin it is a bit unclear how best to design the classes (I can add an additional checkbox in Cloud and Template, or subclass the classes, depending on the architecture design).

Author

@padyx padyx commented Sep 2, 2016

@KostyaSha Can we assist you in some way to get this moving?
Help fix integration tests in docker-java, help with architecture sketches here, or something else?

Owner

@KostyaSha KostyaSha commented Sep 2, 2016

Yes, sure. But swarm mode is not needed for this plugin as jenkins should have exact mapping. I think swarm cli is the only useful thing for orchestration.

Owner

@KostyaSha KostyaSha commented Sep 2, 2016

But I may be mistaken... open for discussion.

Author

@padyx padyx commented Sep 2, 2016

But swarm mode is not needed for this plugin as jenkins should have exact mapping. I think swarm cli is the only useful thing for orchestration.

Could you elaborate on what you mean by this? I don't quite follow.

Owner

@KostyaSha KostyaSha commented Sep 2, 2016

One of the swarm-mode features is scaling, but with jenkins you can't use it without pre-creating Cloud objects on the jenkins side. So it would be like creating a lot of single services, one for every job, which looks weird.

Author

@padyx padyx commented Sep 2, 2016

I see - my assumption for a possible solution was to use the Docker Remote API for services and to adapt the plugin to:

  • For each starting job (cloud node provision): Spawn a swarm service with 1 task (the jenkins slave task)
  • For each terminating job (cloud node unprovision): Stop and remove the swarm service

This would lead to creating and destroying services without taking advantage of scaling.

Is this what you thought, or do you see another option to support running jenkins jobs on Docker Swarms (with Swarm mode)?
Or would you have increased the scaling of the Swarm service and connected to the "free" task created by the scaling?

Owner

@KostyaSha KostyaSha commented Sep 2, 2016

So you will have a lot of similar services?

Or would you have increased the scaling of the Swarm service and connected to the "free" task created by the scaling?

It may be possible by listening to docker events, but it would be too difficult, I think. In any case we can create experimental provisionings and try!

Author

@padyx padyx commented Sep 7, 2016

So you will have a lot of similar services?

Yes, we'd have a lot of similar services if implemented that way, because we'd not care about scaling.

We checked the documentation and experimented with the remote API and our conclusions are:

  • Using a single service per Docker image would not work:
    • The swarm does not return the id of the created task from the /services/.../update endpoint, leaving us with no way to identify which task was just created. If only a single jenkins were to control the swarm, we could theoretically compare the task list before and after the operation to identify the new task, but that would be a very unstable implementation.
  • Using a single service per job run seems to work:
    • Use a GET request to /services to list all services, and identify already reserved ports
    • Generate random ports in the ephemeral range for all ports that need bindings
    • Use POST request to /services/create to start a single service (replicas=1)
    • (If necessary and the port got taken in the meantime, repeat the steps above)
    • Use a GET request to /services/<serviceid>/ to identify the created task and connect to it
    • After job completes: Use a DELETE request to /services/<serviceid> to remove the service
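
The "single service per job run" steps above can be sketched as follows. This is a rough illustration, not the plugin's code: `client` stands for any hypothetical object with `get`, `post` and `delete` methods that talk to the remote API and return decoded JSON, `PortConflict` is a made-up exception for the "port got taken in the meantime" case, and the response field names (`Endpoint.Ports`, `PublishedPort`, `ID`) are assumed from the API 1.24 documentation.

```python
import random


class PortConflict(Exception):
    """Hypothetical error raised by the client when the port is already taken."""


def published_ports(services):
    """Collect the published ports from a GET /services response body."""
    ports = set()
    for service in services:
        for binding in (service.get("Endpoint") or {}).get("Ports") or []:
            if "PublishedPort" in binding:
                ports.add(binding["PublishedPort"])
    return ports


def run_job_in_service(client, name, image, target_port, run_job,
                       port_range=(49152, 65535), max_attempts=5, rng=random):
    """Create a one-replica service, run the job against it, then remove it."""
    for _ in range(max_attempts):
        # List all services and identify the ports that are already reserved.
        used = published_ports(client.get("/services"))
        port = rng.randint(*port_range)
        if port in used:
            continue  # pick another random port
        spec = {
            "Name": name,
            "TaskTemplate": {"ContainerSpec": {"Image": image}},
            "Mode": {"Replicated": {"Replicas": 1}},
            "EndpointSpec": {"Ports": [{"Protocol": "tcp",
                                        "PublishedPort": port,
                                        "TargetPort": target_port}]},
        }
        try:
            service_id = client.post("/services/create", spec)["ID"]
        except PortConflict:
            continue  # the port got taken in the meantime: repeat the steps
        try:
            # Look up the created service so the caller can connect to it.
            details = client.get("/services/" + service_id)
            return run_job(details, port)
        finally:
            # After the job completes (or fails), remove the service again.
            client.delete("/services/" + service_id)
    raise RuntimeError("could not create a service after %d attempts" % max_attempts)
```

The `finally` block is a deliberate choice here: the service is removed even when the job itself fails, so failed builds do not leave services behind.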

We'd suggest the "single service per job" implementation. What is your opinion?

Owner

@KostyaSha KostyaSha commented Sep 7, 2016

We'd suggest the "single service per job" implementation. What is your opinion?

Looks similar to existing logic. Now the question will be how code could be refactored... and how generic swarm could fit...

Author

@padyx padyx commented Sep 9, 2016

Without knowing at all how the plugin is structured today - that sounds like a strategy pattern.
There'd be one strategy for normal use and one strategy for the swarm, depending which configuration was chosen.
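
As a rough sketch of what that could mean (all class and function names here are hypothetical and do not reflect how the plugin is actually structured), each strategy would decide which remote API calls to make for the same provision/unprovision steps:

```python
from abc import ABC, abstractmethod


class ProvisioningStrategy(ABC):
    """Hypothetical interface: describes the API requests for one agent."""

    @abstractmethod
    def create_request(self, name, image):
        """Return (method, path, body) for starting the agent."""

    @abstractmethod
    def remove_request(self, resource_id):
        """Return (method, path, body) for removing the agent."""


class SingleHostStrategy(ProvisioningStrategy):
    """Normal use: a plain container on the host behind the cloud URL."""

    def create_request(self, name, image):
        return ("POST", "/containers/create?name=" + name, {"Image": image})

    def remove_request(self, resource_id):
        return ("DELETE", "/containers/" + resource_id, None)


class SwarmModeStrategy(ProvisioningStrategy):
    """Swarm mode: a service with a single task, scheduled by the swarm."""

    def create_request(self, name, image):
        body = {
            "Name": name,
            "TaskTemplate": {"ContainerSpec": {"Image": image}},
            "Mode": {"Replicated": {"Replicas": 1}},
        }
        return ("POST", "/services/create", body)

    def remove_request(self, resource_id):
        return ("DELETE", "/services/" + resource_id, None)


# The mode names are made up; they stand for the cloud's "Mode setting".
STRATEGIES = {"single-host": SingleHostStrategy, "swarm-mode": SwarmModeStrategy}


def strategy_for(mode):
    """Pick the strategy that matches the configured mode."""
    try:
        return STRATEGIES[mode]()
    except KeyError:
        raise ValueError("unknown mode: %r" % (mode,)) from None
```

The rest of the plugin would only talk to `ProvisioningStrategy`, so the existing launch logic would not need to know which mode was chosen.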


@skahlhoefer skahlhoefer commented Sep 26, 2016

+1 supporting swarm mode would be great!

Owner

@KostyaSha KostyaSha commented Jan 8, 2017

Small note related to this topic: I'm thinking about how best to implement 2-level provisioning in jenkins.

Owner

@KostyaSha KostyaSha commented Feb 20, 2017

Swarm mode itself is not suitable for jenkins. Classical swarm is the best choice. It exposes an API that can be used for balanced slave container runs and for building images. Swarm mode is mostly for app runs: run X containers, restart them. None of that is possible for jenkins builds.


@adityacs adityacs commented Mar 1, 2017

@KostyaSha From a jenkins build perspective, we can always spin up a new container for a build with replica = "1". Swarm mode will then internally load balance and spin up the container on some host. People already using swarm mode would have to do another classical swarm setup just for jenkins builds. This would be an overhead of maintaining two clusters. Supporting swarm-mode would be very helpful.


@dsahithi9 dsahithi9 commented Jul 24, 2017

+1


@goffinf goffinf commented Aug 16, 2017

I also would like to see swarm mode support. I have raised a separate issue about how connecting the Cloud URL to a load balancer fails catastrophically, apparently because the launched container cannot be located on subsequent calls after create (the load balancer redirects the requests to different nodes). My next thought was that maybe I could connect the Cloud URL to a swarm master, since its internal service discovery knows where all the containers that relate to a service exist (in this case there would only ever be one). But of course, I suspect YADP would need to support the API calls to create swarm services rather than simple docker containers.

In an enterprise setting, not being able to scale to use multiple hosts associated with a single YADP Cloud is a significant problem. Sure, we could have multiple Clouds, but that doesn't really equal scalability, and you are still left with a single point of failure in your singleton host.

Contributor

@witokondoria witokondoria commented Aug 17, 2017

@goffinf In the meantime you can switch to docker swarm, which keeps the docker API (non-service based) while still giving you a clustered docker installation. There wouldn't be any need for load balancers, as docker swarm already does that.


@goffinf goffinf commented Aug 19, 2017

@witokondoria, thx for your comment. I might try that, although I am somewhat reluctant to use what is essentially a deprecated product.

I would probably keep the ELB since it allows the use of a CName (R53 recordset alias) and would abstract the physical IP of the swarm master.


@goffinf goffinf commented Aug 24, 2017

@padyx @KostyaSha @adityacs What do you think the prospects are for supporting swarm mode in YADP (in the constrained way outlined in this issue - single job per service) in the near term?

Certainly in the corporate space, everyone I come across is using a scheduler of one type or another and therefore needs to leverage the service abstraction (nothing says a service can't be a single container stack). So whilst scalability (and resilience) won't necessarily be achieved by starting multiple containers, being able to schedule individual Jenkins slave containers across a cluster of managed nodes still represents a significant improvement from the single point of failure that is the current situation.

This isn't a criticism of the work to-date which I'm sure we all appreciate very much, but I am certainly having a tough time persuading architects and solution designers where I work of the elegance of ephemeral slaves when they discover this limitation.

Like @padyx, I am more than happy to contribute in any way I can; maintaining multiple projects with lots of people asking for change and trying to separate the high priorities from the nice to haves can be a lonely place :-)

Kind Regards

Fraser.


@adityacs adityacs commented Aug 24, 2017

YADP uses the https://github.com/docker-java/docker-java client for all docker operations. From the changelog (https://github.com/docker-java/docker-java/blob/master/CHANGELOG.md) I see that swarm-mode is not yet officially supported in the docker-java client.

Author

@padyx padyx commented Aug 24, 2017

@goffinf The comment of adityacs is correct: first there would need to be an implementation of the Swarm APIs in docker-java. An alternative would be https://github.com/spotify/docker-client, which also offers a Java API and already supports the Swarm APIs.

The changes themselves are likely not that big - refactoring the plugin to use different strategies would be required though. I have a very rough proof-of-concept Jenkins plugin using the Spotify APIs that successfully launches a service, executes a job and kills the service. (Currently not open sourced)

The major question for me is for @KostyaSha : Since this is your repository (and plugin), would you consider such a Swarm mode at all? If not, we'd probably have to create another plugin.


@danieleagle danieleagle commented Aug 24, 2017

@padyx I am also in a situation that requires ephemeral build slaves launched across a Docker Swarm Cluster via short lived services. Ultimately, if I cannot find a solution to allow this functionality then I was going to roll my own plugin.

To restate the sentiment that has been expressed here multiple times, I absolutely appreciate what @KostyaSha has done with this plugin. I also understand the time involved to maintain and add features can be quite challenging, especially with multiple endeavors such as career and family taking their toll. So by all means I am not complaining in the slightest and absolutely understand the situation.

Rather, I'd like to figure out a plan like everyone else so that the future state allows for Jenkins to use modern Docker Swarm for Ephemeral Slaves. I know many of us would be more than eager to contribute directly to this project to allow for this capability.

I think it's important to get a definitive answer for when or if this plugin will ultimately support what we are after. If it's not in the cards or may be much longer down the road than we desire, then we either get approval to contribute to this plugin or perhaps band together to create a fork or a new project.

Since this feature is so important to many I could see it adding so much value that it'd be very popular. We all win as a community if we can band together and make this happen. Setting a plan in motion is the next step and I'd be happy to get involved.

Owner

@KostyaSha KostyaSha commented Aug 24, 2017

I see that swarm-mode is not yet officially supported in the docker-java client.

It's in master, i'm fighting with tests.

The major question for me is for @KostyaSha : Since this is your repository (and plugin), would you consider such a Swarm mode at all? If not, we'd probably have to create another plugin.

Feel free to open a PR. I can review and comment, but lately I haven't been able to check out PRs, rework, and test them. Especially when it was too hot outside (+32 degrees and 🌞 )
DockerCloud could be refactored with parent/child classes if it can share something with a swarm cloud.

Btw, the official answer from docker was "better to use classical docker swarm" because in swarm mode you can't even run docker build against it.


@adityacs adityacs commented Aug 24, 2017

Btw, the official answer from docker was "better to use classical docker swarm" because in swarm mode you can't even run docker build against it.

I agree on this.

The problem is that in swarm mode both the cluster-level APIs and the docker APIs are exposed on a common endpoint (:2375). So we would be able to launch a service, and thereby a container, in the swarm cluster by calling APIs on the swarm manager, and after that the container might come up on any of the hosts. However, to perform container-specific API calls we have to know which host the container is running on and then call the API on that host. This adds the complexity of enabling the API (opening port 2375) on each of the hosts. I don't think this is a good idea.
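
For reference, finding the host of a service's container could look roughly like the sketch below. It assumes the tasks and nodes endpoints of the remote API 1.24 (`GET /tasks` with a `service` filter, and node objects carrying `Description.Hostname`); the function names are made up, and the responses are passed in as already-decoded JSON so nothing here contacts a real swarm.

```python
import json
from urllib.parse import quote


def tasks_query(service_name):
    """Build the path for listing the tasks of one service on the manager."""
    filters = json.dumps({"service": [service_name]}, separators=(",", ":"))
    return "/tasks?filters=" + quote(filters)


def locate_container(tasks, nodes):
    """Return (hostname, container_id) of the first running task, or None.

    tasks: decoded body of the task list; nodes: decoded body of GET /nodes.
    """
    hostnames = {node["ID"]: node["Description"]["Hostname"] for node in nodes}
    for task in tasks:
        status = task.get("Status") or {}
        if status.get("State") != "running":
            continue  # skip tasks that are still starting or already gone
        container_id = (status.get("ContainerStatus") or {}).get("ContainerID")
        return hostnames.get(task.get("NodeID")), container_id
    return None
```

The container-specific calls would then have to go to the returned host, which is exactly where the need for an open API port on every host comes from.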

Even in classical swarm we have to open :2375 on each host. Maybe maintaining a small classical swarm cluster is the way to go.


@goffinf goffinf commented Aug 24, 2017

Personally I don't have a problem having port 2375 open on all hosts in the cluster. From a security perspective we limit access to hosts from very specific sources (typically an ELB) via security groups, so this mitigates much of the attack surface. Also, the slave hosts themselves don't store any data, typically don't process any data that is sensitive (unless you have source code IP concerns), and the slave containers have a very short life-span.

Still not keen on maintaining a classic swarm. I just don't see it as a product with a sustainable shelf-life, plus I assume we would also need to provide the service discovery and registration features using the usual suspects (or won't those be required?). It just feels like a solution that adds technical debt rather than taking advantage of the primary docker development roadmap.

But of course YMMV :-)

Fraser.


@goffinf goffinf commented Sep 6, 2017

At @KostyaSha's suggestion, I am transferring some comments from another thread here to keep conversations relating to enabling YADP to use swarm mode in one place.

I tried out using the 'legacy' docker swarm with a consul cluster (didn't need registrator for container events) and, as @cpoole said, this works as you would expect, with the swarm manager distributing the slave containers across the cluster using the default spread pattern (which you can change if you want). Although it is a more complex build (even though it's relatively straight-forward to automate with Terraform, Ansible or your tool of choice), it is probably a better approach since it provides a nicer separation of concerns, i.e. YADP deals with Jenkins job management and swarm with workload distribution.

@KostyaSha the sweet spot clearly is to support the swarm mode service API to simplify the build further. Have you had an opportunity to consider this further in both docker-java and YADP? I am not at all concerned about the fact that we would be ignoring service level scaling.

@cpoole How did you get on with your testing of K8s ?

@KostyaSha commented 22 hours ago

the 'legacy' docker swarm ...

It's not legacy. It's called 'classical docker swarm', and that is what the docker devs suggest using for build infrastructure. Swarm mode is really designed for cloud-ready applications, while jenkins and its slaves are "static" and not HA by design.

I am not at all concerned about the fact that we would be ignoring service level scaling....

I'm not ignoring it, I just don't use it. I did research on standard jenkins builds and use cases, and talked with docker people. We found that, because of docker builds, the best choice is classical swarm. But I don't reject swarm mode. I guess it should not be so difficult to run a slave as a service if it is just another way of running an image and all the most annoying parts of launching are already solved. Btw, nobody has answered how they plan to build images while using swarm mode. Probably in a hacky dind way, like with k8s.

For classical swarm there are some fixes in docker-java master, and the integration tests started failing...
And for YAD I would also need some swarm mode setup to test that the plugin works. Last time in docker-java I spent a lot of time preparing scripts :(

Owner

@KostyaSha KostyaSha commented Sep 7, 2017

@goffinf please don't copy-paste threads into threads. That issue was only for swarm mode discussion. Provisioning strategy is fully unrelated.


@goffinf goffinf commented Sep 7, 2017

@KostyaSha Actually the majority of that relates directly to supporting docker swarm mode in the YADP plugin, so it is highly related!

Owner

@KostyaSha KostyaSha commented Sep 7, 2017

I asked for the discussion to happen in the already existing issue because I have no time to separate unrelated stuff. You copy-pasted absolutely unrelated stuff into this issue. Unrelated information will be removed.

Provisioning strategy is fully unrelated.

@samrocketman's change only changes the DockerCloud lookup order. It wouldn't be needed for swarm mode.


@goffinf goffinf commented Sep 7, 2017

Very well, I have removed that part of the post. Hope that helps.

So, what's your current thinking in terms of supporting swarm mode?
