
Plugin Idea: Dynamic temporary virtual machine workers via DigitalOcean #675

Open
kfatehi opened this issue Dec 16, 2014 · 12 comments

@kfatehi (Member) commented Dec 16, 2014

A new Runner plugin, most similar to the Docker runner, except that instead of creating a Docker container it uses the DigitalOcean API to spin up a server, waits for an IP address and shell access, and then unblocks so the build can proceed over SSH as normal.

https://github.com/keyvanfatehi/saasbox-app/blob/master/src/workers/instance_provisioner/index.js#L35-L70
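For illustration, here is a rough Node sketch of that flow, with hypothetical names (`waitForIp`, `runJob`, and the injected `client` are made up for this sketch, not the strider-do-runner API): create the droplet, poll until it is active and has an IP, run the job over SSH, then destroy it.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll the API until the droplet reports "active" and has a public IP,
// or give up after `timeoutMs`.
async function waitForIp(client, dropletId, { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const droplet = await client.getDroplet(dropletId);
    if (droplet.status === 'active' && droplet.ip) return droplet.ip;
    await sleep(intervalMs);
  }
  throw new Error('droplet ' + dropletId + ' never became active');
}

// Overall runner flow: create, wait for SSH reachability, run the job,
// and always destroy the box when the job ends.
async function runJob(client, ssh, jobCmd, opts) {
  const droplet = await client.createDroplet({ size: '512mb', region: 'nyc3' });
  try {
    const ip = await waitForIp(client, droplet.id, opts);
    return await ssh(ip, jobCmd); // e.g. spawn `ssh root@ip -- jobCmd`
  } finally {
    await client.destroyDroplet(droplet.id); // tear down when the job ends
  }
}
```

The client and `ssh` function are injected so the polling logic can be exercised without network access.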

@kfatehi kfatehi added the plugin label Dec 16, 2014
@kfatehi kfatehi self-assigned this Dec 16, 2014
@knownasilya (Member)

Awesome idea! I would love to tackle this, since I use DO for my personal projects.

@garymcleanhall

@knownasilya If you'd like any help, let me know; I'd be glad to contribute.

@knownasilya (Member)

@garymcleanhall I'm limited on time right now, so feel free to start 👍.

Submit your PRs here: https://github.com/Strider-CD/strider-do-runner

If you plan on contributing, we can add you as a contributor there.

@niallo (Member) commented Dec 19, 2014

Just something to think about (it's more general than just DO): how to manage usage-based scaling (or auto-scaling) of worker boxes.

This is basically a must-have for larger shops, where job load can spike during the day and then is fairly idle overnight.

Any thoughts?

@kfatehi (Member, Author) commented Dec 19, 2014

Niall, I was thinking that parallel jobs trigger parallel VMs, like how the Docker runner works. You still incur the same cost, just over a shorter period compared to running them serially. When a job completes, the VM is destroyed. The admin must define a template for the VM, indicating RAM/CPU/price, etc. Not sure I understood your concern fully, though.

@niallo (Member) commented Dec 19, 2014

Sure, I understand. This is great.

But booting VMs has much greater overhead compared with spinning up Docker containers.

It might take 5-15 minutes for them to be ready for jobs. Maybe on DO boxes come up a lot faster, but certainly AWS can easily take 15 minutes. And even if booting is super fast on DO, it's quite likely you'll have an expensive Puppet (or whatever) setup procedure.

Therefore, you want to have a way to keep them around while load is high. Otherwise you keep taking the startup overhead.
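A minimal sketch of that keep-warm idea, assuming an idle pool with a TTL (all names here are illustrative, not an existing Strider API): finished workers are parked rather than destroyed, reused while load is high, and reaped once they sit idle too long.

```javascript
// Keep finished VMs warm for a while to avoid repeated boot/provision cost.
class WorkerPool {
  constructor({ idleTtlMs = 10 * 60 * 1000, now = Date.now } = {}) {
    this.idleTtlMs = idleTtlMs;
    this.now = now;  // injectable clock, so the TTL logic is testable
    this.idle = [];  // entries of { worker, since }
  }

  // Prefer a warm worker; the caller falls back to booting a fresh VM.
  acquire() {
    const entry = this.idle.pop();
    return entry ? entry.worker : null;
  }

  // Job finished: park the VM instead of destroying it immediately.
  release(worker) {
    this.idle.push({ worker, since: this.now() });
  }

  // Called periodically; returns workers idle past the TTL so the runner
  // can destroy them once load has dropped.
  reap() {
    const cutoff = this.now() - this.idleTtlMs;
    const dead = this.idle.filter((e) => e.since <= cutoff);
    this.idle = this.idle.filter((e) => e.since > cutoff);
    return dead.map((e) => e.worker);
  }
}
```

The TTL trades idle cost against startup overhead: a long TTL absorbs daytime spikes, while overnight the reaper drains the pool.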

@kfatehi (Member, Author) commented Dec 19, 2014

Ah, I see: since each run costs money, a shop may not find it worthwhile to run every build that accumulates during a spike.

@niallo (Member) commented Dec 19, 2014

Yes, that's another issue - on some providers (EC2 being the big one) you pay a minimum of one hour per box.

@kfatehi (Member, Author) commented Dec 19, 2014

Right, I forgot about that. DO also bills a minimum of one hour per box. Perhaps that should be built into the plugin, then, as a baseline for reuse...
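One way to sketch that baseline, assuming hourly billing: since a box is paid for in whole hours anyway, only destroy an idle droplet near the end of its current billing hour (the `shouldDestroy` function and its `marginMs` option are hypothetical, not an existing plugin setting).

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Decide whether to destroy an idle droplet now or keep it for more jobs.
// `bootedAt` and `now` are epoch milliseconds; `marginMs` leaves time for
// a clean shutdown before the next billable hour starts.
function shouldDestroy(bootedAt, now, marginMs = 5 * 60 * 1000) {
  const intoCurrentHour = (now - bootedAt) % HOUR_MS;
  return intoCurrentHour >= HOUR_MS - marginMs;
}
```

Under this policy a droplet booted at minute 0 is kept through minute 55, destroyed if still idle in the last five minutes of the hour, and otherwise rolls into (and pays for) the next hour.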

@garymcleanhall

Atlassian Bamboo, which uses AWS, takes about 15 minutes to spin up a box and get its JVM running. It allows you to configure it to keep a box around for a while, so something similar to that makes sense here.

Provisioning is the question mark for me. You can use the DO API to spin up a new box, but you then need Puppet/Chef to install something meaningful on it. For Chef, it might involve making the Strider server a knife workstation so that it can create a droplet, bootstrap it, and then provision it (using chef-zero or a specified Chef server)?

@niallo (Member) commented Dec 19, 2014

@garymcleanhall Right, some provisioning step is needed. I think this should be a generic shell script, with some sane, minimal defaults.

Once Node is on the target machine, Strider can send its own code over SSH to be executed by the worker.
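A sketch of what that could look like, assuming the bootstrap script is piped over stdin to `sh` on the new box (the script contents and the `bootstrapCommand` helper are illustrative defaults for this sketch, not a spec):

```javascript
// A sane minimal default: install Node so Strider can then ship its
// worker code to the box over SSH. Admins could swap in their own script.
const DEFAULT_BOOTSTRAP = [
  '#!/bin/sh',
  'set -e',
  'apt-get update -qq',
  'apt-get install -y -qq curl build-essential',
  // install Node so the Strider worker can run on the box
  'curl -sL https://deb.nodesource.com/setup | sh',
  'apt-get install -y -qq nodejs',
].join('\n');

// Build the argv for piping the bootstrap script into `sh` on the target.
function bootstrapCommand(ip, { user = 'root', script = DEFAULT_BOOTSTRAP } = {}) {
  return {
    cmd: 'ssh',
    args: ['-o', 'StrictHostKeyChecking=no', user + '@' + ip, 'sh -s'],
    stdin: script, // the script travels on stdin; nothing is written to disk first
  };
}
```

Sending the script on stdin keeps the plugin generic: any shell script works, and nothing provider-specific is baked in beyond "SSH is reachable".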

@kfatehi (Member, Author) commented Dec 19, 2014

I want to target DO first because:

  1. VMs spin up in about 55 seconds.
  2. A VM can be spun up from a saved "image", which is preserved on your DO account at no charge.
  3. The API supports saving a machine to an image and starting a machine from an image: everything we need.

Then I would target OpenStack. Not AWS; someone else can do that, as I stay away from it. Also, before blindly reusing my existing DO API work, it is worth investigating pkgcloud. If pkgcloud supports the subset of functionality we require, then one could pick any cloud. It just seems wiser to focus on DO first.

I think the user either needs to be told to create an image, or the plugin should guide that process. Thereafter the image should be selectable from a dropdown for use on all future builds.
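As a sketch of that dropdown idea, assuming the plugin lists the account's snapshots (DigitalOcean's v2 API exposes saved images with `id`, `name`, `distribution`, and `type` fields; the `imageOptions` helper and the option shape are hypothetical, invented for this sketch):

```javascript
// Map the API's image objects to { value, label } pairs for a <select>,
// keeping only user-saved snapshots (not base distro images).
function imageOptions(images) {
  return images
    .filter((img) => img.type === 'snapshot')
    .map((img) => ({ value: img.id, label: img.name + ' (' + img.distribution + ')' }));
}
```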

