Support LXD as a backend driver #1533

townsend2010 · 2020-05-13T20:23:22Z

This allows Multipass to communicate with the LXD daemon in order to create and mange instances.

List of TODO items needed for the next iteration:

Additional test coverage.
See if it's possible to mock NetworkAccessManager and LocalSocketReply instead of using MockLocalSocketServer.
Figure out a way to use a single NetworkAccessManager across LXDVirtualMachineFactory and LXDVMImageVault.
Overhaul LXDVirtualMachineFactory::hypervisor_health_check() including detecting if LXD socket is not available, etc.
Determine how best to handle the cloud-init YAML config. Do we store it somewhere and re-populate upon daemon start so VirtualMachineDesription contains the values? Do we just create it once and let LXDVirtualMachineFactory configure it and then forget about it since LXD has it stored?
Re-add back in lxd-request() the ability to wait on operations to complete.
Fix LXDVirtualMachine to wait on an instance to be created instead of the clunky while loop w/ sleep().
Determine if LXD itself can properly update and prune source images and potentially refactor the VMImageVault's update_images() and prune_expired_images and associated timer.
Determine if the source part of the LXD initialization can be modified to not explicitly call out the remote since the image should already be in the local repository.
Allow importing images from the "custom" image host and http/file based launches.

Saviq · 2020-05-14T11:03:34Z

In file included from /root/parts/multipass/src/src/platform/backends/lxd/lxd_vm_image_vault.cpp:18:
/root/parts/multipass/src/src/platform/backends/lxd/lxd_vm_image_vault.h:51:26: error: private field 'url_downloader' is not used [-Werror,-Wunused-private-field]
    URLDownloader* const url_downloader;
                         ^
1 error generated.

townsend2010 · 2020-05-14T12:08:31Z

@Saviq, yep, saw that yesterday and will fix today along with some other things 😀

codecov · 2020-05-14T14:47:51Z

Codecov Report

Merging #1533 into master will decrease coverage by 0.47%.
The diff coverage is 65.32%.

@@            Coverage Diff             @@
##           master    #1533      +/-   ##
==========================================
- Coverage   74.47%   73.99%   -0.48%     
==========================================
  Files         215      222       +7     
  Lines        7785     8138     +353     
==========================================
+ Hits         5798     6022     +224     
- Misses       1987     2116     +129

Impacted Files	Coverage Δ
include/multipass/simple_streams_manifest.h	`100.00% <ø> (ø)`
include/multipass/utils.h	`69.23% <ø> (+7.69%)`	⬆️
include/multipass/virtual_machine_description.h	`100.00% <ø> (ø)`
include/multipass/vm_image.h	`100.00% <ø> (ø)`
include/multipass/vm_image_info.h	`100.00% <ø> (ø)`
src/daemon/daemon_config.cpp	`60.27% <0.00%> (+0.81%)`	⬆️
src/platform/backends/lxd/lxd_vm_image_vault.cpp	`0.00% <0.00%> (ø)`
src/platform/backends/lxd/lxd_vm_image_vault.h	`0.00% <0.00%> (ø)`
src/platform/platform_linux.cpp	`59.67% <40.00%> (-6.99%)`	⬇️
src/daemon/daemon.cpp	`31.25% <57.14%> (-0.31%)`	⬇️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f7a41e...686d2cd. Read the comment docs.

Saviq

Need a lxd plug, don't we?

include/multipass/fetch_type.h

Saviq · 2020-05-26T13:51:35Z

src/daemon/daemon.cpp

+                                              {},
+                                              {},
+                                              {}};


I wonder if we should store and populate this…

Maybe, but as you just mentioned, perhaps the image vault should configure it? I'll need to think about it some.

src/daemon/daemon.cpp

src/daemon/default_vm_image_vault.cpp

src/platform/backends/lxd/lxd_request.cpp

src/platform/backends/lxd/lxd_virtual_machine.cpp

Saviq · 2020-05-26T14:00:03Z

src/platform/backends/lxd/lxd_virtual_machine.cpp

+        if (json_reply["metadata"].toObject()["class"] == QStringLiteral("task") &&
+            json_reply["status_code"].toInt(-1) == 100)
+        {
+            QUrl task_url(QString("%1/operations/%2")
+                              .arg(base_url.toString())
+                              .arg(json_reply["metadata"].toObject()["id"].toString()));
+
+            // Instead of polling, need to use websockets to get events
+            while (true)
+            {
+                try
+                {
+                    auto task_reply = lxd_request(manager, "GET", task_url);
+
+                    if (task_reply["error_code"].toInt(-1) != 0)
+                    {
+                        break;
+                    }
+
+                    auto status_code = task_reply["metadata"].toObject()["status_code"].toInt(-1);
+                    if (status_code == 200)
+                    {
+                        break;
+                    }
+                    else
+                    {
+                        std::this_thread::sleep_for(5s);
+                    }
+                }
+                // Implies the task is finished
+                catch (const LXDNotFoundException& e)
+                {
+                    break;
+                }
+            }
+        }


Should this go into lxd_request?

I kind of think the caller of lxd_request() should determine how it waits on an operation to finish. Some callers may use the operations/blah/wait method while others may use a polling (and later websockets) method.

Sure, it would have to be the caller's decision whether to let lxd_request() handle the async operation or not. I'm kinda thinking ::waitForFinished() style. Except it's not an object, I know ;)

Well, still, the caller should determine how it waits on an operation to complete. I suppose we could have a default wait logic in lxd_request and if the caller doesn't want or need to use that, then it can pass in 0 for the timeout and lxd_request will just return the json response from LXD. WDYT?

That's roughly what I had in mind. Maybe not inferred from timeout == 0 but rather something explicit like bool wait. Ideally something that resembles something already existing :)

Hmm, ok, I'll basically put it back to how you originally had it 😁

src/platform/backends/lxd/lxd_virtual_machine.cpp

Saviq · 2020-05-26T14:03:14Z

src/platform/backends/lxd/lxd_virtual_machine.cpp

+            QUrl task_url(QString("%1/operations/%2/wait")
+                              .arg(base_url.toString())
+                              .arg(state_task["metadata"].toObject()["id"].toString()));
+            lxd_request(manager, "GET", task_url);


Couldn't use that instead of the above poll? Or will it time out?

I'm not sure if you're referring to using wait here or the polling method.

Yes, using wait, why not use it above?

Hmm, right, I think using a wait with no or a long-ish timeout above would be just as effective. I observed that LXD can take quite a long time uncompressing an image for the first time.

Saviq

Need to prune multipass find.

src/platform/backends/lxd/lxd_virtual_machine.cpp

src/platform/backends/lxd/lxd_virtual_machine_factory.cpp

Saviq · 2020-05-26T14:09:56Z

src/platform/backends/lxd/lxd_virtual_machine_factory.cpp

+void mp::LXDVirtualMachineFactory::configure(const std::string& /* name */, YAML::Node& /* meta_config */,
+                                             YAML::Node& /* user_config */)
+{
+}


Hmm should we actually configure cloud init through here rather than shipping it with the vm description?

Hmm, I'll look into that.

src/platform/backends/lxd/lxd_vm_image_vault.cpp

Saviq · 2020-05-26T14:16:46Z

src/platform/backends/lxd/lxd_vm_image_vault.cpp

+} // namespace
+
+mp::LXDVMImageVault::LXDVMImageVault(std::vector<VMImageHost*> image_hosts, const QUrl& base_url)
+    : image_hosts{image_hosts}, base_url{base_url}, manager{std::make_unique<mp::NetworkAccessManager>()}


{C,Sh}ould the manager not be shared?

I think the only way to make that work and still be generic enough for all image vault & backend driver cases would be to refactor/expand URLDownloader to use the new NetworkManager class.

I just mean that, ideally, we'd have a shared manager for everything LXD… Not saying that URLDownloader should, too.

My rationale being that we could then set it up once (especially with SSL and authentication) and share it through all LXD components. Also saving memories.

The problem I'm seeing here is that DaemonConfig is responsible for setting up the image vault and setting up the backend driver and does that as two separate steps that are only coupled by URLDownloader. In order for LXDVMImageVault and LXDVirtualMachineFactory to share a NetworkAccessManager object would be to somehow relate those two, but do it in a way that DaemonConfig does not have to account for backend and image host differences. I think we want DaemonConfig to be a generic as possible and we achieve that via the platform calls, but still, those platform calls do not couple the image vault and the backed driver.

If we don't go the URLDownloader way, then we'd need to refactor the VirtualMachineFactory to accept the VMImageVault object (at least a pointer to it) and then each factory and image vault can determine their relationship to one another. Some would just ignore each other like now and we'd basically be forcing it hold at least a pointer to the vault and others like LXD would have a fairly coupled relationship like sharing the NetworkAcessManager, etc.

I'll think about it some more, but I don't think it will be trivial.

Bear in mind none of the comments I made here are my requirement, or stuff to be fixed in this PR. It's just driving my thinking about the architecture, since this is pushing what we had in place.

Sure, and I totally agree with the idea of being able to couple the image vault and vm factory when needed.

townsend2010 · 2020-05-26T14:38:26Z

@Saviq,

Thanks for the first review.

Need a lxd plug, don't we?

Yep

Need to prune multipass find.

I'm pretty sure I already did that for cloud images. I know I have not accounted for our custom repository yet. Is this what you are seeing?

Saviq

That's it for now, it does work fine, though :)

src/platform/backends/lxd/lxd_vm_image_vault.cpp

Saviq · 2020-05-26T14:31:15Z

src/platform/backends/lxd/lxd_vm_image_vault.cpp

+void mp::LXDVMImageVault::prune_expired_images()
+{
+}
+
+void mp::LXDVMImageVault::update_images(const FetchType& fetch_type, const PrepareAction& prepare,
+                                        const ProgressMonitor& monitor)
+{
+}


These are TODO, right? We should definitely prune old images, and {c,sh}ould we drive the update some way?

They might be TODO. LXD can also handle this in and of itself, so should we config LXD to do this for us or we drive it?

If we can make it DTRT without our involvement, probably best.

Ok, then I will look into that and failing being able to have LXD control all of that, I'll implement it here.

src/platform/platform_linux.cpp

src/simplestreams/simple_streams_manifest.cpp

Saviq · 2020-05-26T14:41:16Z

tests/lxd/test_lxd_backend.cpp

+{
+    LXDBackend()
+        : socket_path{QString("%1/test_socket").arg(data_dir.path())},
+          test_server{mpt::MockLocalSocketServer(socket_path)},


Should we not test this on the access manager level?

We already unit test the access manager itself in other tests, so I think mocking it as high up as possible is the best way to go for testing the backend. I sense you disagree with this approach, so am I missing some benefits if we use the access manger as well?

I mean to mock the access manager, rather than talking through a test socket (which was the correct thing to do for testing the manager itself).

Unless I'm misunderstanding what MockLocalSocketServer does?

Duh on my end. I will do that to simplify this.

Looking at this a bit more, what will need to happen is to make LocalAccessManager a non-final class and then create a mock LocalAccessManager class and have it respond how we want. Then we'll need to modify LXDVirtualMachine and LXDVMImageVault to accept a LocalAccessManager object (ie, the mocked object) and not create one in the ctor.

Then we can get rid of the MockLocalSocketServer and just put that logic back into the test_local_network_access_manager.cpp file.

Even some more diving into this and really, doing it the current way where we talk over a local socket to a "mocked" server is the most straightforward way to go. If we went the route of mocking NetworkAccessManager itself, we'll basically need to reimplement that along with LocalSocketReply and that doesn't really make much sense. I think leaving it the way it is is the right thing to do in order not to introduce much more complexity in testing for the sake of not using a temporary local socket.

- Don't return const variable. - Use current_state() for getting and saving state upon initialization.

Co-authored-by: Michał Sawicz <michal.sawicz@canonical.com>

- Add `lxd` snap plug. - Remove "None" fetch type since there is now the LXDVMImageVault implementation. - Make some logging messages trace level.

- This will create a "multipass" project and define a default profile that uses a Multipass specific network bridge.

- Also, do not allow URL based launches.

Saviq · 2020-06-01T12:10:13Z

We'll need to handle this on startup:

cze 01 12:48:46 michal-laptop multipassd[767819]: Requesting LXD: GET unix:///var/snap/lxd/common/lxd/unix.socket@1.0/projects/multipass
cze 01 12:48:46 michal-laptop multipassd[767819]: Caught an unhandled exception: Cannot connect to /var/snap/lxd/common/lxd/unix.socket: 0
cze 01 12:48:46 michal-laptop systemd[1]: snap.multipass.multipassd.service: Main process exited, code=exited, status=1/FAILURE

townsend2010 · 2020-06-01T12:27:29Z

@Saviq,

We'll need to handle this on startup:

Yes, the health check needs a complete overhaul. Is that something that needs to be in a alpha/preview/experimental release of this?

Saviq · 2020-06-01T12:36:55Z

Is that something that needs to be in a alpha/preview/experimental release of this?

It's probably ok for now.

townsend2010 · 2020-06-01T12:44:36Z

@Saviq,

Ack, then I'll put this on my list of TODO's for the next release of this 😁

multipass-ci-bot · 2020-06-01T16:43:31Z

macOS build available: multipass-1.3.0-dev.269.pr1533+g056f1874.mac-Darwin.pkg
Snap build available: snap refresh multipass --channel edge/pr1533

Saviq · 2020-06-01T17:56:37Z

I say:

bors merge

bors · 2020-06-01T18:29:36Z

Build failed:

continuous-integration/travis-ci/push

1533: Support LXD as a backend driver r=Saviq a=townsend2010 This allows Multipass to communicate with the LXD daemon in order to create and mange instances. Co-authored-by: Michał Sawicz <michal@sawicz.net> Co-authored-by: Chris Townsend <christopher.townsend@canonical.com>

townsend2010 marked this pull request as draft May 13, 2020 20:23

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch 2 times, most recently from 06dd5bb to 8ae5c74 Compare May 14, 2020 13:00

This comment has been minimized.

Sign in to view

townsend2010 mentioned this pull request May 18, 2020

[network] Implement Qt classes to communicate with a Unix socket #1519

Merged

townsend2010 force-pushed the add-lxd-driver-support branch 3 times, most recently from 44bb6dd to e2ab23e Compare May 20, 2020 16:02

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from e2ab23e to c217ea0 Compare May 20, 2020 20:06

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from c217ea0 to 9f25d64 Compare May 21, 2020 20:37

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from 9f25d64 to 88f6e9d Compare May 22, 2020 12:33

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from 88f6e9d to 81b3a82 Compare May 22, 2020 16:14

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from 81b3a82 to dc742c9 Compare May 22, 2020 17:06

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from dc742c9 to c86c490 Compare May 22, 2020 18:47

This comment has been minimized.

Sign in to view

Saviq requested changes May 26, 2020

View reviewed changes

This comment has been minimized.

Sign in to view

Chris Townsend and others added 11 commits May 29, 2020 12:19

[tests] Add SimpleStreams manifest test for LXD support

a691e2e

[lxd] Misc changes to LXDVirtualMachine impl

821a34d

- Don't return const variable. - Use current_state() for getting and saving state upon initialization.

[lxd] Default to using 10GB for disk size if less than that is requested

aa799cc

Apply suggestions from code review

832cec5

Co-authored-by: Michał Sawicz <michal.sawicz@canonical.com>

[lxd] More changes based on review feedback

e8de27e

- Add `lxd` snap plug. - Remove "None" fetch type since there is now the LXDVMImageVault implementation. - Make some logging messages trace level.

[network] Fix parsing of URL to account for queries

23d1658

[lxd] Add support for LXD projects

a71a569

- This will create a "multipass" project and define a default profile that uses a Multipass specific network bridge.

[lxd] Only support the "release" and "daily" remotes for launching

28e2a32

- Also, do not allow URL based launches.

[snap] add lxd to plugs.json

07aae42

[lxd] fix TODO typos

8551587

[daemon] Initialize query with default values to avoid garbage data

04d19ac

This comment has been minimized.

Sign in to view

townsend2010 force-pushed the add-lxd-driver-support branch from 08df4be to 5f90d8e Compare May 29, 2020 19:31

This comment has been minimized.

Sign in to view

Chris Townsend added 2 commits June 1, 2020 11:53

[image vault/lxd] Only allow supported remotes

d19ccaf

[image vault/lxd] Find image info for an already existing instance

686d2cd

townsend2010 force-pushed the add-lxd-driver-support branch from 5f90d8e to 686d2cd Compare June 1, 2020 15:53

Saviq merged commit 3c66cfd into master Jun 1, 2020

bors bot deleted the add-lxd-driver-support branch June 1, 2020 19:05

Saviq added this to the v1.3.0 milestone Jun 5, 2020

townsend2010 mentioned this pull request Jun 10, 2020

[lxd] Additional items to implement/fix #1586

Closed

22 tasks

Saviq mentioned this pull request Jun 17, 2020

LXD containers PoC #1472

Closed

Support LXD as a backend driver #1533

Support LXD as a backend driver #1533

Conversation

townsend2010 commented May 13, 2020 • edited

This comment has been minimized.

Saviq commented May 14, 2020

townsend2010 commented May 14, 2020

This comment has been minimized.

codecov bot commented May 14, 2020 • edited

Codecov Report

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Saviq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

townsend2010 May 26, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Saviq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

townsend2010 commented May 26, 2020

Saviq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Saviq May 26, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Saviq commented Jun 1, 2020

townsend2010 commented Jun 1, 2020

Saviq commented Jun 1, 2020

townsend2010 commented Jun 1, 2020

multipass-ci-bot commented Jun 1, 2020 • edited

Saviq commented Jun 1, 2020

bors bot commented Jun 1, 2020

townsend2010 commented May 13, 2020 •

edited

codecov bot commented May 14, 2020 •

edited

townsend2010 May 26, 2020 •

edited

Saviq May 26, 2020 •

edited

multipass-ci-bot commented Jun 1, 2020 •

edited