
Plugin Idea: Dynamic temporary virtual machine workers via DigitalOcean #675

Open
kfatehi opened this issue Dec 16, 2014 · 12 comments

@kfatehi (Member) commented Dec 16, 2014

A new Runner plugin, most similar to the Docker runner, except that instead of creating a Docker container it uses the DigitalOcean API to spin up a server, waits for an IP address and shell access, and then unblocks so the build can proceed over SSH as normal.

https://github.com/keyvanfatehi/saasbox-app/blob/master/src/workers/instance_provisioner/index.js#L35-L70
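For illustration, here is a rough Node sketch of that flow, with hypothetical names (`waitForIp`, `runJob`, and the injected `client` are made up for this sketch, not the strider-do-runner API): create the droplet, poll until it is active and has an IP, run the job over SSH, then destroy it.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll the API until the droplet reports "active" and has a public IP,
// or give up after `timeoutMs`.
async function waitForIp(client, dropletId, { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const droplet = await client.getDroplet(dropletId);
    if (droplet.status === 'active' && droplet.ip) return droplet.ip;
    await sleep(intervalMs);
  }
  throw new Error('droplet ' + dropletId + ' never became active');
}

// Overall runner flow: create, wait for SSH reachability, run the job,
// and always destroy the box when the job ends.
async function runJob(client, ssh, jobCmd, opts) {
  const droplet = await client.createDroplet({ size: '512mb', region: 'nyc3' });
  try {
    const ip = await waitForIp(client, droplet.id, opts);
    return await ssh(ip, jobCmd); // e.g. spawn `ssh root@ip -- jobCmd`
  } finally {
    await client.destroyDroplet(droplet.id); // tear down when the job ends
  }
}
```

The client and `ssh` function are injected so the polling logic can be exercised without network access.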

@kfatehi kfatehi added the plugin label Dec 16, 2014
@kfatehi kfatehi self-assigned this Dec 16, 2014
@knownasilya (Member)

Awesome idea! I would love to tackle this, since I use DO for my personal projects.

@garymcleanhall

@knownasilya If you'd like any help, let me know; I'd be glad to contribute.

@knownasilya (Member)

@garymcleanhall I'm limited on time right now, so feel free to start 👍.

Submit your PRs here: https://github.com/Strider-CD/strider-do-runner

If you plan on contributing, we can add you as a contributor there.

@niallo (Member) commented Dec 19, 2014

Just something to think about (it's more general than just DO): how to manage usage-based scaling (or auto-scaling) of worker boxes.

This is basically a must-have for larger shops, where job load can spike during the day and then is fairly idle overnight.

Any thoughts?

@kfatehi (Member, Author) commented Dec 19, 2014

Niall, I was thinking that parallel jobs trigger parallel VMs, like how the Docker runner works. You still incur the same cost, just over a shorter period compared to running them serially. When a job completes, the VM is destroyed. The admin must define a template for the VM, indicating RAM/CPU/price, etc. Not sure I understood your concern fully, though.

@niallo (Member) commented Dec 19, 2014

Sure, I understand. This is great.

But booting VMs has much greater overhead compared with spinning up Docker containers.

It might take 5-15 minutes for them to be ready for jobs. Maybe on DO boxes come up a lot faster, but certainly AWS can easily take 15 minutes. And even if booting is super fast on DO, it's quite likely you'll have an expensive Puppet (or whatever) setup procedure.

Therefore, you want to have a way to keep them around while load is high. Otherwise you keep taking the startup overhead.
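A minimal sketch of that keep-warm idea, assuming an idle pool with a TTL (all names here are illustrative, not an existing Strider API): finished workers are parked rather than destroyed, reused while load is high, and reaped once they sit idle too long.

```javascript
// Keep finished VMs warm for a while to avoid repeated boot/provision cost.
class WorkerPool {
  constructor({ idleTtlMs = 10 * 60 * 1000, now = Date.now } = {}) {
    this.idleTtlMs = idleTtlMs;
    this.now = now;  // injectable clock, so the TTL logic is testable
    this.idle = [];  // entries of { worker, since }
  }

  // Prefer a warm worker; the caller falls back to booting a fresh VM.
  acquire() {
    const entry = this.idle.pop();
    return entry ? entry.worker : null;
  }

  // Job finished: park the VM instead of destroying it immediately.
  release(worker) {
    this.idle.push({ worker, since: this.now() });
  }

  // Called periodically; returns workers idle past the TTL so the runner
  // can destroy them once load has dropped.
  reap() {
    const cutoff = this.now() - this.idleTtlMs;
    const dead = this.idle.filter((e) => e.since <= cutoff);
    this.idle = this.idle.filter((e) => e.since > cutoff);
    return dead.map((e) => e.worker);
  }
}
```

The TTL trades idle cost against startup overhead: a long TTL absorbs daytime spikes, while overnight the reaper drains the pool.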

@kfatehi (Member, Author) commented Dec 19, 2014

Ah, I see: since each run costs money, a shop may not find it worthwhile to run every build that accumulates during a spike.

@niallo (Member) commented Dec 19, 2014

Yes, that's another issue - on some providers (EC2 being the big one) you pay a minimum of one hour per box.

@kfatehi (Member, Author) commented Dec 19, 2014

Right, I forgot about that. DO also bills a minimum of one hour per box. Perhaps that should be built into the plugin, then, as a baseline for reuse...
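One way to sketch that baseline, assuming hourly billing: since a box is paid for in whole hours anyway, only destroy an idle droplet near the end of its current billing hour (the `shouldDestroy` function and its `marginMs` option are hypothetical, not an existing plugin setting).

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Decide whether to destroy an idle droplet now or keep it for more jobs.
// `bootedAt` and `now` are epoch milliseconds; `marginMs` leaves time for
// a clean shutdown before the next billable hour starts.
function shouldDestroy(bootedAt, now, marginMs = 5 * 60 * 1000) {
  const intoCurrentHour = (now - bootedAt) % HOUR_MS;
  return intoCurrentHour >= HOUR_MS - marginMs;
}
```

Under this policy a droplet booted at minute 0 is kept through minute 55, destroyed if still idle in the last five minutes of the hour, and otherwise rolls into (and pays for) the next hour.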

@garymcleanhall

Atlassian Bamboo, which uses AWS, takes about 15 minutes to spin up a box and get its JVM running. It allows you to configure it to keep a box around for a while, so something similar to that makes sense here.

Provisioning is the question mark for me. You can use the DO API to spin up a new box, but you then need Puppet/Chef to install something meaningful on it. For Chef, it might involve making the Strider server a knife workstation so that it can create a droplet, bootstrap it, and then provision it (using chef-zero or a specified Chef server)?

@niallo (Member) commented Dec 19, 2014

@garymcleanhall Right, some provisioning step is needed. I think this should be a generic shell script, with some sane, minimal defaults.

Once Node is on the target machine, Strider can send its own code over SSH to be executed by the worker.
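A sketch of what that could look like, assuming the bootstrap script is piped over stdin to `sh` on the new box (the script contents and the `bootstrapCommand` helper are illustrative defaults for this sketch, not a spec):

```javascript
// A sane minimal default: install Node so Strider can then ship its
// worker code to the box over SSH. Admins could swap in their own script.
const DEFAULT_BOOTSTRAP = [
  '#!/bin/sh',
  'set -e',
  'apt-get update -qq',
  'apt-get install -y -qq curl build-essential',
  // install Node so the Strider worker can run on the box
  'curl -sL https://deb.nodesource.com/setup | sh',
  'apt-get install -y -qq nodejs',
].join('\n');

// Build the argv for piping the bootstrap script into `sh` on the target.
function bootstrapCommand(ip, { user = 'root', script = DEFAULT_BOOTSTRAP } = {}) {
  return {
    cmd: 'ssh',
    args: ['-o', 'StrictHostKeyChecking=no', user + '@' + ip, 'sh -s'],
    stdin: script, // the script travels on stdin; nothing is written to disk first
  };
}
```

Sending the script on stdin keeps the plugin generic: any shell script works, and nothing provider-specific is baked in beyond "SSH is reachable".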

@kfatehi (Member, Author) commented Dec 19, 2014

I want to target DO first because:

  1. VMs spin up in about 55 seconds.
  2. A VM can be spun up from a saved "image", which is preserved on your DO account at no charge.
  3. The API supports saving a machine to an image and starting a machine from an image: everything we need.

Then I would target OpenStack. Not AWS; someone else can do that, as I stay away from it. Also, before blindly reusing my existing DO API work, it is worth investigating pkgcloud. If pkgcloud supports the subset of functionality we require, then one could pick any cloud. It just seems wiser to focus on DO first.

I think the user either needs to be told to create an image, or the plugin should guide that process. Thereafter the image should be selectable from a dropdown for use on all future builds.
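As a sketch of that dropdown idea, assuming the plugin lists the account's snapshots (DigitalOcean's v2 API exposes saved images with `id`, `name`, `distribution`, and `type` fields; the `imageOptions` helper and the option shape are hypothetical, invented for this sketch):

```javascript
// Map the API's image objects to { value, label } pairs for a <select>,
// keeping only user-saved snapshots (not base distro images).
function imageOptions(images) {
  return images
    .filter((img) => img.type === 'snapshot')
    .map((img) => ({ value: img.id, label: img.name + ' (' + img.distribution + ')' }));
}
```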

