Feature: Allow drone to spin-up remote agents #1052

Closed
themihai opened this issue Jun 9, 2015 · 13 comments

@themihai

themihai commented Jun 9, 2015

I would like to propose a feature that allows Drone to scale while reducing costs. The proposal targets AWS, but we can develop an interface so that various IaaS providers can implement it.

A remote drone agent is a service that runs build jobs on machines other than the host server; each such machine runs the drone agent tool.
The developer may specify in the drone.yml configuration which machine is required to run the job (e.g. an AMI ID).
If drone.yml specifies that the job should run on a remote agent/machine, Drone uses the AWS API to start the instance (if none is available) and then sends the job details to the remote agent (perhaps through an HTTP API, or simply by connecting over SSH). If the remote agent's resources are exhausted by other jobs, the developer may choose to "force" (in drone.yml) spinning up a new instance of the same kind and executing the builds in parallel.
Use cases it addresses:

  • Time- or CPU-intensive jobs could be executed in parallel, which reduces build times.
  • Builds may have different resource requirements (CPU/RAM). Instead of keeping a large, expensive instance running all the time to cover every kind of build job you might need to run, you can run Drone (the host server) on the most commonly used instance type, or just the smallest instance type, and spin up larger instances only when you need them.
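
To make the flow above concrete, here is a minimal sketch of what a provider interface and the dispatch decision might look like. Every name in it (`Provider`, `FakeProvider`, `Dispatcher`, `Agent`, the `force` argument, the `ami-123` placeholder) is hypothetical and not part of Drone. The fallback of queuing on an existing busy agent when `force` is not set is also an assumption, since the proposal does not say what should happen in that case.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A remote machine running the (hypothetical) drone agent tool."""
    machine: str                      # e.g. an AMI ID requested in drone.yml
    capacity: int = 1                 # jobs it can run at the same time
    jobs: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.jobs) < self.capacity


class Provider(ABC):
    """Interface an IaaS provider would implement (hypothetical)."""

    @abstractmethod
    def start_instance(self, machine: str) -> Agent:
        ...


class FakeProvider(Provider):
    """Stand-in for a real cloud API; it only counts started instances."""

    def __init__(self):
        self.started = 0

    def start_instance(self, machine: str) -> Agent:
        self.started += 1
        return Agent(machine=machine)


class Dispatcher:
    """Picks an agent for a job, starting a new instance when needed."""

    def __init__(self, provider: Provider):
        self.provider = provider
        self.agents = []

    def dispatch(self, job: str, machine: str, force: bool = False) -> Agent:
        same_kind = [a for a in self.agents if a.machine == machine]
        free = [a for a in same_kind if a.has_capacity()]
        if free:
            agent = free[0]
        elif not same_kind or force:
            # No instance of this kind exists, or the developer forced a new one.
            agent = self.provider.start_instance(machine)
            self.agents.append(agent)
        else:
            # Assumption: without "force", the job waits on the least loaded agent.
            agent = min(same_kind, key=lambda a: len(a.jobs))
        agent.jobs.append(job)
        return agent
```

With this sketch, two jobs for the same machine type share one instance unless the second one is dispatched with `force=True`, in which case a second instance is started and the builds can run in parallel.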
@bradrydzewski

@themihai thanks for the proposal. The good news is that we already have some of the necessary components in place to facilitate auto-scaling.

Drone already supports remote build servers using the Docker Remote API:
http://readme.drone.io/setup/config/workers/

And includes API endpoints to add and remove remote servers:
https://github.com/drone/drone/blob/master/server/router/router.go#L73

Version 0.4 (new branch) of Drone also includes an agent as an alternative to the Docker Remote API. This should (in theory) scale much better to a large number of servers (i.e. 1000+), although the existing option can easily scale to 40 servers.

> If drone.yml specifies to run the job on a remote agent/machine it uses the AWS API to start the instance (if there is none available) and then makes a request to the remote agent (perhaps through an HTTP API, or just by connecting through SSH) with the job details.

We don't have this today (the ability to delegate builds to a particular server or group of servers). I like the idea of declaring this in the yaml file as you proposed.

> If the remote agent resources are exhausted by other jobs the developer may choose to "force" (in drone.yml) spinning a new instance of the same kind and execute the builds in parallel.

I do see this as outside the scope of the core Drone codebase at this time. I think this would be a great stand-alone utility that runs as a cron job, queries the build queue for pending jobs using the API, and determines if instances should be added or removed.
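
As a rough illustration of the decision such a stand-alone utility would make on each cron run, here is a minimal sketch. The function name `plan_agents` and all of its parameters are hypothetical. How the pending and running job counts are obtained from the Drone API is deliberately left out, because the exact endpoint is not stated in this thread.

```python
import math


def plan_agents(pending_jobs, running_jobs, current_agents,
                jobs_per_agent=1, min_agents=0, max_agents=10):
    """Return how many agents to add (positive) or remove (negative).

    All limits here are illustrative defaults, not Drone settings.
    """
    demand = pending_jobs + running_jobs
    wanted = math.ceil(demand / jobs_per_agent)
    # Keep the fleet size inside the configured bounds.
    wanted = max(min_agents, min(max_agents, wanted))
    return wanted - current_agents


# 5 pending + 2 running jobs, 2 jobs per agent, 2 agents up: add 2 more.
print(plan_agents(5, 2, 2, jobs_per_agent=2))   # 2
# Empty queue with 3 agents and a floor of 1: remove 2.
print(plan_agents(0, 0, 3, min_agents=1))       # -2
```

Keeping the decision in a pure function like this makes it easy to test separately from whatever provisions the machines.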

I also think we should build on top of Docker Machine, which can provision Docker servers on nearly any cloud provider. A server provisioned by Docker Machine can be registered with Drone using the API with no additional configuration required, assuming Drone has the key/cert that was used.

@ramonskie

Maybe use CoreOS with fleet?

Or use BOSH for deploying and managing the Drone cluster.

bradrydzewski modified the milestone: Unplanned Aug 18, 2015
@bradrydzewski

We've improved the ability to set up and register remote agents with Drone. This can now be done with docker-machine and the drone node add command. See http://readme.drone.io/cli/machines.html

The ability to set up and tear down machines on the fly should be handled outside of Drone using the API. We are pushing the community toward plugins and standalone services instead of building everything directly into the Drone binary.

@cleeland

cleeland commented Aug 9, 2016

I notice this was closed, but I find no evidence in the docs that the feature is actually available. Further, when I try to chase links like http://readme.drone.io/cli/machines.html I get a "403"..."denied".

What am I missing?

@donny-dont

@cleeland in 0.4, @bradrydzewski added a way to easily add nodes, because adding them in the UI was an error-prone process. I don't believe the process was ever documented.

In 0.5 the agents register themselves, because 0.4 used a push queue and 0.5 uses a pull queue.
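
The difference between the two models can be sketched roughly as follows. This is a toy illustration only, not Drone's implementation: the class names, the round-robin node selection, and the `poll` method are all assumptions made for the example.

```python
import queue


class PushServer:
    """Push model: the server must know every node and pick one itself."""

    def __init__(self):
        self.nodes = []      # node names registered ahead of time by an operator
        self._next = 0

    def add_node(self, name):
        self.nodes.append(name)

    def submit(self, build):
        if not self.nodes:
            raise RuntimeError("no nodes registered")
        # Assumption: simple round-robin choice among registered nodes.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node          # the server would now send the build to this node


class PullServer:
    """Pull model: the server only holds pending builds; agents ask for work."""

    def __init__(self):
        self.pending = queue.Queue()

    def submit(self, build):
        self.pending.put(build)

    def poll(self, agent_name):
        # Any agent may call this; the server needs no list of agents.
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None
```

In the pull model, bringing up another agent requires no change on the server side, which is why agents can effectively register themselves just by starting to poll.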

@bradrydzewski

The 0.4 docs are archived at https://github.com/drone/drone/blob/v0.4.0/docs/cli/machines.md

@cleeland

cleeland commented Aug 9, 2016

Excellent. Thanks!

@ozbillwang

ozbillwang commented Aug 16, 2017

I have the same request and hope to get some updates and suggestions on this topic.

First, I agree that we shouldn't make Drone responsible for scaling.

From the existing discussion and documents, I think both Drone servers and agents are stateless (if I am wrong, please correct me, because the statements below are based on that assumption). Data is saved in the database. By the way, Drone's database is MySQL; I immediately switched it to AWS Aurora for better performance.

(Using AWS as an example)

Because Drone servers and agents are stateless, we can scale up and down at any time based on CloudWatch metrics.

With the metrics below, which I can collect, I should be able to scale Drone servers and agents up and down easily.

Docker services: AWS ECS, Kubernetes or others.

Drone servers

  1. Watch the Elastic Load Balancer (ELB) metric ActiveConnectionCount; if it exceeds a threshold, scale up a new Drone server.

Drone agents

  1. Export the Drone queue count regularly to CloudWatch as a metric (for example, named drone_queue_count).
  2. Scale agents up and down depending on whether the queue count exceeds a threshold, for example 2.
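
A minimal sketch of the two agent steps above, under several assumptions: the `Drone` namespace, the function names, and the scale-down rule (queue empty and more than a minimum number of agents running) are all made up for illustration. The dictionary returned by `queue_metric` is shaped so it could be passed as keyword arguments to boto3's CloudWatch `put_metric_data` call; no AWS call is made here, and in a real setup a CloudWatch alarm would normally evaluate the threshold.

```python
from datetime import datetime, timezone


def queue_metric(queue_count, namespace="Drone", name="drone_queue_count"):
    """Build keyword arguments for a CloudWatch put_metric_data request."""
    return {
        "Namespace": namespace,              # hypothetical namespace
        "MetricData": [{
            "MetricName": name,
            "Timestamp": datetime.now(timezone.utc),
            "Value": float(queue_count),
            "Unit": "Count",
        }],
    }


def scaling_action(queue_count, agents, threshold=2, min_agents=1):
    """Decide what to do with the agent fleet for the current queue size."""
    if queue_count > threshold:
        return "scale_up"
    # Assumption: only shrink when nothing is waiting and we are above the floor.
    if queue_count == 0 and agents > min_agents:
        return "scale_down"
    return "hold"


print(scaling_action(3, agents=1))   # scale_up
print(scaling_action(0, agents=3))   # scale_down
print(scaling_action(1, agents=3))   # hold
```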

My question:

  1. Will Drone support the Application Load Balancer (ALB, HTTP/HTTPS only), so that agents can communicate with the server through an HTTP/HTTPS API rather than raw TCP?

Update:

I saw this breaking change in 0.8.0-rc.3: "Switch the agent server protocol to grpc. See #2065." Since AWS ALB doesn't support gRPC, we have to use ELB.

@tboerger

Currently the server is not strictly stateless: the queue is still processed in memory, but there are ongoing efforts to outsource it to queues provided by cloud providers.

@rmoriz

rmoriz commented Feb 26, 2019

Is this information still correct? No persistence of the queue? No zero-loss restart/redeployment of the master?

@bradrydzewski

bradrydzewski commented Feb 26, 2019

Nope, most of the comments in this thread are very outdated. In Drone 1.0, the queue is persisted and you can take the server down while builds are running, as long as you are using agents.

@JanBerktold

@bradrydzewski Is this queue on the server or in the database? Can we run several drone servers fronting a single database?

@bradrydzewski

bradrydzewski commented Apr 24, 2019

> Is this queue on the server or in the database? Can we run several drone servers fronting a single database?

It is not currently possible to cluster multiple servers in front of a single Drone database. Can you describe your use case in more detail? This thread may also be relevant: #756 (comment)


9 participants