Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scratch size customization and UVM scratch creation for WCOW snapshotter #4912

Merged
merged 1 commit into from Feb 17, 2021

Conversation

dcantah
Copy link
Member

@dcantah dcantah commented Jan 6, 2021

  • Currently we rely on making the UVMs sandbox.vhdx in the shim instead of this being made by the snapshotter. This change adds a label that affects whether to create the UVMs scratch layer in the snapshotter itself.

  • Adds container and UVM scratch size customization. Before adding the computestorage calls (vendored in with Update hcsshim and go-winio vendoring #4859) there was no way to make a containers or UVMs scratch size less than the default (20 for containers and 10 for the UVM).

Signed-off-by: Daniel Canter dcanter@microsoft.com

@k8s-ci-robot
Copy link

Hi @dcantah. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dcantah
Copy link
Member Author

dcantah commented Jan 6, 2021

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 6, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 8, 2021

Build succeeded.

snapshots/windows/windows.go Outdated Show resolved Hide resolved
snapshots/windows/windows.go Outdated Show resolved Hide resolved
snapshots/windows/windows.go Outdated Show resolved Hide resolved
@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 13, 2021

Build succeeded.

@ambarve
Copy link
Contributor

ambarve commented Jan 14, 2021

LGTM!

Copy link
Contributor

@anmaxvl anmaxvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

* Currently we rely on making the UVMs sandbox.vhdx in the shim itself instead of this being
made by the snapshotter itself. This change adds a label that affects whether to create the UVMs
scratch layer in the snapshotter itself.

* Adds container scratch size customization. Before adding the computestorage calls
(vendored in with containerd#4859) there was no way to make a containers
or UVMs scratch size less than the default (20 for containers and 10 for the UVM).

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 18, 2021

Build succeeded.

@dcantah
Copy link
Member Author

dcantah commented Jan 21, 2021

@dmcgowan @crosbymichael If anyone has time could I get some eyes on this?

@dcantah
Copy link
Member Author

dcantah commented Feb 4, 2021

@kevpar Are you able to take a look at this sometime this week? Feel like this has enough to go in

@kevpar
Copy link
Member

kevpar commented Feb 4, 2021

@kevpar Are you able to take a look at this sometime this week? Feel like this has enough to go in

Will look when I get a chance, but you also asked me to look at the ncproxy PR :)

@dcantah
Copy link
Member Author

dcantah commented Feb 4, 2021

@kevpar Are you able to take a look at this sometime this week? Feel like this has enough to go in

Will look when I get a chance, but you also asked me to look at the ncproxy PR :)

Not wrong 😆 Appreciate it

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dcantah
Copy link
Member Author

dcantah commented Feb 17, 2021

@jterry75 Lovely, the mac OS linter seems to have timed out

@dcantah
Copy link
Member Author

dcantah commented Mar 8, 2021

@TBBle, the compute storage calls are present in RS5 but looking at the code it does have a guard to throw a specific HRESULT if the API isn't present. It would be available after enabling the Containers feature (dism /online /get-featureinfo /featurename:Containers to see if it's on), but there's no way this isn't on I'd imagine?? I'll spin up a VM later today or tomorrow and verify things).

I've only tested this on 19H1, 20H1, 20H2 and my machine (insiders build) and with more of a focus on hypervisor isolated containers, although I did give process some validation on 20H1 and 20H2. The file in use failure makes perfect sense though and is known. The hives directory would be in use and I just thought of this last week and brought it up; I'm planning to fix this week.

The following scenario wouldn't work:

  1. Run container1 with mcr.microsoft.com/windows/image and no scratch settings (use the default, no new disks are made and skip computestorage.SetupContainerBaseLayer call)
  2. Run container2 with mcr.microsoft.com/windows/image and specify that you want a 5GB scratch size. computestorage.SetupContainerBaseLayer gets invoked and fails with file in use trying to delete hives directory already present.

However, if the 1GB disk already exists we're A-ok. This would happen if the first container we started asked for xGB (as long as it's less than 20), so just the reverse scenario of the above. The SetupContainerBaseLayer call should ideally be made in the differ right after the equivalent older storage api is called, this would just make Windows container image unpacking slower by about 1-1.5 seconds which is.... not the greatest but Windows container pull+unpack is already not the greatest. Otherwise the only other way to avoid ensuring it's present before another container uses the image is on the first writable layer snapshot just call SetupContainerBaseLayer, which also has the cruddy side effect of making the first container you'd launch with an image about 1-1.5 seconds slower also. I feel like pull+unpack time being a bit slower is the lesser of two evils.

Ideally I'd hope we could get to something like we have for LCOW where we keep a cache of scratch disks to use of different sizes but the file in use issue kind of blocks this -_-, so we're stuck with this 1GB + expand path which sucks as the expand call isn't exactly the fastest.

@TBBle
Copy link
Contributor

TBBle commented Mar 8, 2021

I think the "file in use" error I mentioned isn't related to the one you mention here, as in this code-flow, after I changed the read-only layer creation to use 20GB instead of 1GB, nothing will call computestorage.SetupContainerBaseLayer in the Snapshotter test suite.

Also, it was happening before these changes landed in containerd when it was purely hcsshim's layer API. And it happens on ActivateLayer, but only when the tests are in parallel.

In a general sense, it makes a lot more sense to me for the base layers to have this extra pair of VHDX files created as early as possible (in hcsshim.ProcessBaseLayer or the computestorage equivalent), so that the layers are immutable as soon as possible, and definitely after the Differ's Apply or Snapshotter's Commit is completed.

It would be nicer if the extra VHDX files could be created by an API that doesn't also try and re-setup the base layer. I imagine there would still be a conflict if two child layers tried simultaneously to create a scratch from the same base layer, and both tried to create the extra VHDX files, but at least they wouldn't be modifying any other data in the base layer that might be in-use.

I'm not super-keen on the current approach here, as once the base layer has passed the 'process' stage, its usage is much less controlled and more corner-cases appear. I'd prefer, if the appropriate extra VHDX files aren't present (i.e. containerd today extracted the base layer, and did not create a scratch layer from it) then we could add them during WCOW snapshotter plugin startup when we "know" that nothing is currently trying to create scratch layers from the base layers. But perhaps if hot-restarting containerd, we cannot know? I'm not sure off-hand what serialisation guarantees we have.

TBBle added a commit to TBBle/containerd that referenced this pull request Sep 24, 2021
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Apr 26, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Jul 23, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
aznashwan pushed a commit to aznashwan/containerd that referenced this pull request Aug 1, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Aug 10, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Aug 11, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
TBBle added a commit to TBBle/containerd that referenced this pull request Aug 11, 2022
I'm getting weird failures from CI when `func (s *snapshotter)
createScratchLayer` calls `computestorage.SetupContainerBaseLayer` to
create new disks.

> 2021-03-07T06:07:50.7594090Z         testsuite.go:782: failed to
create scratch layer: failed to create scratch vhdx at
"C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\snapshot-suite-Windows-351506444\\root\\snapshots\\1":
failed to format writable layer vhd: The request is not supported.

So let's just avoid that code-path, since this wasn't happening before
this code-path was introduced in containerd#4912.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants