Idea: Make Docker wrapper for BOINC tasks #5512

Open
1 of 2 tasks
makeasnek opened this issue Feb 4, 2024 · 7 comments

Comments

@makeasnek
Member

makeasnek commented Feb 4, 2024

Background

Docker is a containerization format which has become quite popular in recent years, particularly in the development of AI models. It solves many of the same problems Virtualbox does but with some key advantages:

  • Overhead is lower than with Virtualbox, as Docker can run at near-native speed. This means less memory usage, less disk space, and faster performance.
  • Access to the host system's GPU. This access can be shared among multiple docker containers (multiple workunits).
  • Gives BOINC project admins access to the vast Docker Hub library of containers pre-built with science applications. This can make packaging an app unnecessary altogether, or massively simplify it.
  • Project admins could update their app without needing to change any settings on the BOINC server or make a new app version. This is because Dockerfiles can automatically pull in content from GitHub, etc.
  • Avoid many Virtualbox-specific headaches
  • A single docker compose file can pull in existing docker images and dependencies. For example, you can pull in a docker image which contains the version of Python your workunit runs best with. This pull can be done from the client side, saving bandwidth for BOINC projects. This makes packaging much simpler than Virtualbox.

Even if Docker cannot be stably run on BOINC Windows clients, it may still be attractive to BOINC projects targeting Linux/MacOS clients due to simplifying packaging for complex apps.

The major downside with Docker is that, if it is installed on a Windows machine, it interferes with Virtualbox and forces Virtualbox to run in "emulation" mode, which is significantly slower.

SCI has volunteered to pay a bounty to a developer who can implement a docker plan class, once a proper spec is hashed out.

WSL vs Docker vs Virtualbox

WSL, Docker, and Virtualbox all offer similar functionalities with different tradeoffs in terms of performance and usability.

Virtualbox runs an entire virtual machine/operating system. This means you can package up an entire Linux OS into a single image file and run it on any platform Virtualbox supports.

WSL and Docker are both more lightweight than Virtualbox. Instead of running an entire OS, they run just the Linux kernel and any libraries you add. This achieves near-native runtime speed and enables you to run any Linux application on Windows. In Docker, you can share libraries between different containers, potentially reducing disk space usage.

Here's how that looks as a diagram, from this site:
[image]

The only situations where Virtualbox can do something Docker can't are when:

  • You need to fully emulate an OS, including emulating the hardware. Otherwise, even if you need to run an entire OS, Docker is faster and more lightweight at best and the same at worst. I'm not sure BOINC even supports this full "emulation mode": on some test installs where emulation mode was enabled, the BOINC event log reported that virtualization ability was not detected. But I haven't thoroughly tested this. This could conceivably be an important feature if a scientist wants to "boincify" an app which expects a particular hardware environment.
  • Your app needs to run in a non-Linux environment. Docker can't run an image of a Windows or OS X installation, whereas vbox can. However, for licensing reasons, I doubt any BOINC project would ever use this feature.
  • Your security environment requires strict sandboxing. While Docker offers some sandboxing ability, it is not as comprehensive as a virtual machine's.

Docker also supports checkpointing, running multiple instances off a base image, etc.

Integration options

There are several ways BOINC can use these tools to deploy WUs to end clients:

Deploy WSL2 distributions natively

This will get you performance gains vs virtualbox, but WSL2 images are only useful on the Windows OS.

Deploy Docker containers natively

  • This gets you performance gains vs virtualbox
  • Will run on all platforms (natively on Linux/OSX, via WSL2 on Windows)

Deploy Docker containers inside Virtualbox images

  • You get to keep Docker's ease of administration/packaging, which is useful for BOINC project admins
  • You lose out on Docker's performance gains vs Virtualbox and this will be slower than running just in Virtualbox
  • You lose out on Docker's GPU abilities
  • This is how boinc2docker works.

Deploy Docker containers inside a WSL2 image

  • You lose cross-platform compatibility since WSL2 image can only be run on Windows

What BOINC needs to know

If we want the BOINC client to be "docker-unaware" and to be able to deploy docker images to clients which haven't updated to a recent version, a wrapper would need to be written which the BOINC client could call just like any other application. The same would be true if we used WSL2 images alone or deployed docker inside WSL2. The wrapper could support the existing BOINC hooks for starting, suspending, etc.
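A minimal sketch of how such a wrapper might translate BOINC-style control actions into Docker CLI calls. The action names (`start`, `suspend`, `resume`, `quit`, `abort`) and the function `docker_command` are hypothetical; the docker subcommands themselves (`start`, `pause`, `unpause`, `stop`, `rm --force`) are standard CLI commands. Note that `docker pause` freezes the container's processes in place rather than checkpointing them to disk.

```python
from typing import List

# Hypothetical mapping from BOINC-style control actions to docker CLI subcommands.
_ACTIONS = {
    "start": ["start"],          # start an already-created container
    "suspend": ["pause"],        # freeze every process in the container
    "resume": ["unpause"],       # thaw a paused container
    "quit": ["stop"],            # graceful stop: SIGTERM, then SIGKILL after a timeout
    "abort": ["rm", "--force"],  # kill the container and remove it
}

def docker_command(action: str, container: str) -> List[str]:
    """Return the argv that applies one control action to one workunit's container."""
    try:
        subcommand = _ACTIONS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return ["docker", *subcommand, container]
```

For example, `subprocess.run(docker_command("suspend", "boinc_wu_123"), check=True)` would freeze a container named `boinc_wu_123` (a hypothetical naming scheme, one container per WU).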

The primary downsides of having each BOINC WU contain a wrapper are:

  • Wrapper is essentially a duplicate file included with every WU. This is not a huge problem.
  • Since the wrapper can only know about its individual workunit, it can't be "aware" of other docker containers from other WUs. This would lead to downloading the same docker images multiple times etc.
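Even with a per-WU wrapper, the duplicate-download problem could be softened by asking the local Docker daemon which images it already has before pulling anything. A minimal sketch, assuming the wrapper is allowed to call the `docker` CLI; `docker image inspect` exits non-zero when an image is not present locally. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

def image_present(image: str) -> bool:
    """True if the local Docker daemon already has `image` (requires docker on PATH)."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def images_to_pull(required: Iterable[str],
                   is_present: Callable[[str], bool] = image_present) -> List[str]:
    """Return required images not cached locally, de-duplicated, in first-seen order."""
    seen = set()
    missing: List[str] = []
    for image in required:
        if image in seen:
            continue
        seen.add(image)
        if not is_present(image):
            missing.append(image)
    return missing
```

Because the image cache belongs to the Docker daemon rather than to any one WU, this check works across workunits without the wrapper needing to know about other containers.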

The upside to doing things this way is not needing to modify the core BOINC client, and being able to deploy docker containers to hosts with older BOINC versions, though that second benefit is debatable since those clients likely don't have WSL2/docker installed.

Having BOINC be "docker aware" might be more complex, but gives BOINC a better high-level overview and control over the docker containers running on a client machine.

Existing work:

  • Marius has written boinc2docker where you can send docker workunits to BOINC clients. This works by using BOINC's existing plan class for Virtualbox and having docker run inside the virtualbox environment.
  • Old GitHub issue: Run tasks natively on Docker #1620

Known problems/open questions/tasks:

WSL, Docker, and Virtualbox Conflict

On Windows, Docker runs in WSL2. It did run in WSL1 at some point; I'm not sure whether that is still possible, but WSL2 is the recommended WSL version for Docker.

WSL can be tricky to run alongside Virtualbox, but it can be done. The downside is that if you have WSL2 installed, Virtualbox runs in emulation mode which is a significant performance degradation.

List of popular operating systems and their VirtualBox/Docker compatibility.

Legend:
✅ - yes
✅✅ - yes, and we have actually tested this

  • Windows 10 Home: can be vbox host ✅✅; can be docker host ✅✅; can run docker and vbox without reboot ✅✅*1; can run docker and vbox at the same time ✅✅*1
  • Windows 10 Pro: can be vbox host ✅✅; can be docker host ✅✅; can run docker and vbox without reboot ✅✅*1; can run docker and vbox at the same time ✅✅*1
  • Windows 11 Home: ✅*1
  • Windows 11 Pro: ✅*1
  • MacOS 12-14: no entries yet
  • Linux: can be vbox host ✅✅; can be docker host ✅✅

Notes:

  1. Yes, but with significantly degraded vbox performance, because vbox has to run in "software emulation" mode

Available solutions to WSL2/Virtualbox conflict

  • If the user doesn't have VirtualBox or WSL2 installed at BOINC installation time, BOINC can default to whichever one it prefers. BOINC can preferentially run Docker containers inside Virtualbox if it wants, or run Virtualbox images inside a Docker container with WSL2. Note that running docker containers inside vbox deprives them of GPU access.
  • If the user does have Vbox installed, leave everything as it is: wrap docker WUs in a Virtualbox image and launch them that way. Vice versa for WSL2. Advanced users can always modify their machine settings for extra crunching performance; BOINC could show a warning letting them know of this option, something along the lines of "Your project xyz uses Docker. You can improve performance by enabling WSL2, click to enable".
  • Don't deploy docker workunits to Windows machines at all. This avoids the conflict while still giving Docker's ease of administration/packaging and performance benefits to BOINC projects which can benefit from it.
  • We could also wait. It sounds like Oracle is working on this problem, so perhaps this conflict will resolve itself in time. I don't know how likely that is to happen, but it's an option.
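The options above can be read as a per-host decision rule. The sketch below encodes one possible policy, assuming the client can detect the OS and whether VirtualBox and WSL2 are installed; the function name, its inputs, and the `Mode` labels are all hypothetical, and which default to use on a clean Windows host is exactly the open question above.

```python
from enum import Enum

class Mode(Enum):
    NATIVE_DOCKER = "native docker"        # Linux/macOS, or Windows hosts that already have WSL2
    DOCKER_IN_VBOX = "docker inside vbox"  # boinc2docker-style; no GPU access
    DONT_SEND = "no docker work"           # skip this host entirely

def choose_mode(os_name: str, has_vbox: bool, has_wsl2: bool,
                send_to_windows: bool = True,
                clean_host_default: Mode = Mode.DOCKER_IN_VBOX) -> Mode:
    """One possible per-host policy following the options listed above."""
    if os_name != "windows":
        return Mode.NATIVE_DOCKER
    if not send_to_windows:
        return Mode.DONT_SEND        # "don't deploy docker workunits to Windows machines"
    if has_wsl2:
        return Mode.NATIVE_DOCKER    # WSL2 present: vbox, if any, is already in emulation mode
    if has_vbox:
        return Mode.DOCKER_IN_VBOX   # leave the host as it is; wrap the WU in a vbox image
    return clean_host_default        # neither installed: whatever BOINC prefers
```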

Other tasks/open questions

  • How can docker interact with BOINC scheduler? Can we get it to respect CPU/Memory/etc limits set by the user? Docker does have a built-in tool for reporting cpu and memory usage. Network usage could be measured using standard os tools for this purpose as each docker container is just a regular process.
  • Survey BOINC project admins about interest in Docker. Would they find it beneficial? What features are important? How would they use it?
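On the first question: Docker can enforce per-container limits itself, so a wrapper could translate the user's BOINC CPU and memory preferences into `docker run` flags and read usage back with `docker stats`. A minimal sketch; `--cpus`, `--memory`, and `docker stats --no-stream --format '{{json .}}'` (which emits a `CPUPerc` field such as `"12.50%"`) are real Docker CLI options, while the function names and how the preferences reach the wrapper are assumptions.

```python
import json
from typing import List

def run_args(image: str, name: str, max_cpus: float, max_mem_mb: int) -> List[str]:
    """Build a `docker run` argv that caps CPU and memory for one workunit."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--cpus", f"{max_cpus:g}",     # fractional CPUs are allowed, e.g. 1.5
        "--memory", f"{max_mem_mb}m",  # hard memory limit in megabytes
        image,
    ]

def cpu_percent(stats_line: str) -> float:
    """Parse CPUPerc from one line of `docker stats --no-stream --format '{{json .}}'`."""
    record = json.loads(stats_line)
    return float(record["CPUPerc"].rstrip("%"))
```

The same JSON record also carries `MemUsage` and `NetIO`, which could feed the memory and network accounting mentioned above.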

Feature layout

  • Give BOINC project admins the ability to create a docker-based workunit by referencing an existing image on Docker Hub or their own docker compose file
  • Docker on Windows uses WSL. There are some problems with having WSL and Virtualbox on the same machine. If this conflict could be automatically detected by a BOINC client, based on client environment, it could automatically wrap a docker workunit with Virtualbox wrapper to avoid issues.
  • BOINC client would call this new docker wrapper and know nothing about docker. It would run the executable like any other executable, just like it currently does for the Virtualbox wrapper.
  • Wrapper would run the appropriate docker commands (e.g. docker pull, docker build, or docker compose) to download docker images from Docker Hub or GitHub and build them
  • Wrapper will need a mechanism to remove old docker containers and volumes which are no longer in use
  • Wrapper will need to know if a docker container is "sticky" or not so it can preserve it for future workunits
  • Wrapper will need to know the size of a docker container to add to its disk space calculations to respect user preferences
  • Wrapper will need a mechanism to start, stop, pause, and resume a docker container and the app contained inside it
  • Wrapper will need to report output back to the BOINC server, such as app output and logs from docker
  • Wrapper will need to monitor the bandwidth used by the docker compose command, which can be quite significant. Alternatively, the server-side config needs to provide that estimate for the end user so user network access preferences can be respected.
  • BOINC installer would need to include an option to install WSL2 & Docker. If the user has Virtualbox installed, it should warn them that this will make Virtualbox slower.
  • Running docker containers on Linux requires root unless the boinc process is in the docker group. A check needs to be written for this, and the BOINC Linux packaging will need to be updated to give the boinc user appropriate permissions
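For the last point, a pre-flight check the Linux wrapper could run before attempting any docker call. A minimal sketch using only the Python standard library (the `grp` module is Unix-only); `docker` is the default group name created by Docker's Linux packages, and the function name is hypothetical. Group membership is only a proxy: what actually matters is permission on the Docker daemon's socket. Docker's documentation also notes that membership in the docker group grants root-level privileges, which is worth weighing when changing the packaging.

```python
import grp
import os

def has_docker_access(group_name: str = "docker") -> bool:
    """Approximate check: True if this process is root or belongs to `group_name`."""
    if os.geteuid() == 0:
        return True                    # root can always reach the daemon socket
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False                   # group missing: Docker is probably not installed
    return gid == os.getegid() or gid in os.getgroups()
```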
@davidpanderson
Contributor

There are various little confusions here;
e.g. I don't think the client should know anything about Docker.
But this is a good starting point for discussion.

@makeasnek
Member Author

> There are various little confusions here; e.g. I don't think the client should know anything about Docker. But this is a good starting point for discussion.

Why shouldn't BOINC client know anything about docker? What is the alternative?

@AenBleidd
Member

> Why shouldn't BOINC client know anything about docker? What is the alternative?

BOINC client knows nothing about VirtualBox either, and all the communication is done via the wrapper.
This way the client is more flexible and more generic.

@makeasnek
Member Author

> BOINC client knows nothing about VirtualBox either, and all the communication is done via the wrapper. This way the client is more flexible and more generic.

Aah ok I was confused on wording. I'll update the post to indicate the wrapper should know about these things, not BOINC client itself.

@davidpanderson
Contributor

Let's not confuse 'Docker Desktop' with Docker.
You don't need Docker Desktop to run Docker containers on Windows.
You can - for example - run them in VBox VMs (as in boinc2docker), or in WSL.

An app version can contain lots of files (which are by default persistent).
Some of these can be Docker layers.

So I think this discussion is about efficiency (runtime, file sizes etc.)
rather than capabilities.
We can already run Docker apps on all platforms;
it would be nice to make it more efficient.
In particular, I'd think in terms of a WSL wrapper, not a Docker wrapper.
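To make the WSL-wrapper idea concrete: Windows' `wsl.exe` already exposes the basic operations such a wrapper would need (import a distribution from a rootfs tarball, run a command in it, terminate it, unregister it). A minimal sketch that only builds the command lines; the distribution name, paths, and the function name are hypothetical, and suspending/resuming a running distribution is not covered by these commands.

```python
from typing import List, Sequence

def wsl_command(action: str, distro: str, *,
                install_dir: str = "", tarball: str = "",
                argv: Sequence[str] = ()) -> List[str]:
    """Build a wsl.exe command line for one step of a workunit's lifecycle."""
    if action == "import":      # create a per-WU distribution from a rootfs tarball
        return ["wsl", "--import", distro, install_dir, tarball, "--version", "2"]
    if action == "run":         # run the science app inside the distribution
        return ["wsl", "--distribution", distro, "--exec", *argv]
    if action == "terminate":   # stop all processes in the distribution
        return ["wsl", "--terminate", distro]
    if action == "unregister":  # delete the distribution and its virtual disk
        return ["wsl", "--unregister", distro]
    raise ValueError(f"unknown action: {action!r}")
```

One WSL distribution per WU keeps cleanup simple (`--unregister` removes everything), at the cost of re-importing the base image for each task.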

@makeasnek
Member Author

makeasnek commented Feb 18, 2024

I did some testing on this, starting from fresh installs of Windows 10 Home and Windows 10 Pro. I haven't tested Windows 11 yet, but I will hopefully have a machine to test it on later this week. I expect the results will be similar, as I haven't read anything to indicate they'd be different. I also updated the main issue with some new info.

Test procedure:

  1. Install fresh copy of Windows
  2. Verify Virtualbox runs normally in native execution mode (AMD-V/VT-x, etc.)
  3. Install Docker
  4. Check if Virtualbox runs in a new mode
  5. Undo installation, verify Virtualbox can run natively again
  6. Install Hyper-V and see if that changes Virtualbox mode.

In both cases, after installation of Docker/WSL2, Virtualbox runs in the slower "turtle"/emulation mode. In Windows 10 Pro, the Docker installer offers to use WSL2 or Hyper-V. I tried both methods; both resulted in turtle mode.

@makeasnek
Member Author

Added some updates

Projects: Status: Backlog
Development: No branches or pull requests
3 participants