Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store: simplify imagestore implementation #1784

Merged
merged 4 commits into from Mar 1, 2024

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Dec 21, 2023

simplify the imagestore implementation and use it as a first-class store. It reduces the amount of code to deal with an image store in the driver (only overlay supports it at the moment).

Closes: #1779

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@rhatdan
Copy link
Member

rhatdan commented Dec 21, 2023

Can you run a test against podman and Buildah as well?

@giuseppe giuseppe marked this pull request as draft December 21, 2023 20:27
@giuseppe giuseppe force-pushed the refactor-image-store branch 4 times, most recently from f6e38aa to 5385212 Compare December 22, 2023 10:02
@giuseppe giuseppe force-pushed the refactor-image-store branch 2 times, most recently from bc35798 to 814fddc Compare December 23, 2023 09:54
@giuseppe giuseppe marked this pull request as ready for review January 8, 2024 09:23
@giuseppe
Copy link
Member Author

giuseppe commented Jan 8, 2024

this is ready for review. Both Podman and Buildah pass

Podman test PR: containers/podman#21070
Buildah test PR: containers/buildah#5236

@rhatdan
Copy link
Member

rhatdan commented Jan 8, 2024

LGTM
@saschagrunert @mtrmac @nalind @vrothberg PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are parts of this, like removing all the d.imageStore conditions throughout overlay, which I very much like.

On the general principle, I suppose that’s pre-existing, but I just can’t see how this works. All of the read-only stores need to be RO locked, don’t they? And locks are handled in the generic code, drivers can’t do that for themselves. So any kind of iteration over stores in dir2 seems completely broken to me.

So… what’s really going on? Is dir2 only to handle cross-store parent layers during an overlay mount, or something? If that were so, the driver code should not be touching the other stores during any other operation; and still that cross-store parent situation would need to be somehow explicitly tracked and locked.

I’m afraid I can’t do a full review now. Maybe there is some way to split this into much smaller PRs?

Comment on lines 881 to 882
if d.imageStore != "" {
_ = os.RemoveAll(filepath.Join(d.imageStore, stagingDir))
}
_ = os.RemoveAll(filepath.Join(d.home, stagingDir))
Copy link
Collaborator

@mtrmac mtrmac Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? stagingDir is here a per-driver directory, but getStagingDir creates per-layer-ID staging directories.

Similarly the EDIT check use of use in ListLayers seems inappropriate now — or quite possibly it needs to be retained for compatibility with older writers, but then it should be explicitly documented as such a non-obvious case.


Stylitically, I’d prefer:

  • Define a separate globalStagingDir/perLayerStagingDir constants; don’t use a single string for two different purposes!
  • As I always want, document top-level symbols.

store.go Outdated
Comment on lines 955 to 1002
stores := driver.AdditionalImageStores()
if s.imageStoreDir != "" {
stores = append([]string{s.graphRoot}, stores...)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would greatly appreciate consolidating this data further:

  • Only ever call AdditionalImageStores() once in load
  • Only ever have one if s.imageStoreDir != "" condition, somewhere around there.
  • Together, populate something like s.rwImageLayerStoreDirs / s.roImageLayerStoreDirs. And then have all other graph iterations be trivial loops.

Comment on lines 1047 to 1096
func (s *store) createGraphDriverLocked() (drivers.Driver, error) {
driverRoot := s.imageStoreDir
imageStoreBase := s.graphRoot
if driverRoot == "" {
driverRoot = s.graphRoot
imageStoreBase = ""
}
config := drivers.Options{
Root: driverRoot,
ImageStore: imageStoreBase,
Root: s.graphRoot,
ImageStore: s.imageStoreDir,
RunRoot: s.runRoot,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS this effectively reverses the semantics of the two fields. While I 100% support the resulting simpler code and semantics, I’m not smart enough to review such changes in a GitHub PR without ~re-doing the work.

One possible approach:

  • Define completely new fields with the new semantics. Set them in the first commit.
  • In the first and/or subsequent commits, migrate the users to the new fields.
  • Now that the old fields have no readers, possibly rename the fields back to the original names, with the new semantics,

store.go Outdated

ro := store != s.imageStoreDir && store != s.graphRoot

rls, err := s.newLayerStore(rlpath, glpath, s.graphDriver, false, ro)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this introducing the “second read-write layer store” concept? Yes, it mirrors the “second read-write image store” thing (which I found dubious in #1578), but AFAICS the Delete code which uses the “second RW image store” never deletes layers from the second RW layer store.

So either that should be added as well, or (hopefully) we don’t need this complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed because we have this weird specification that a layer must be deleted from the current store if it is found there. We even have a test for it :/

https://github.com/containers/storage/blob/main/tests/split-store.bats#L96-L101

I am fine if we decide to break this specification, but let's do it separately after we verify with CRI-O folks that it is not necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine… but I can’t see how that works. layersToRemove is only applied to one rlstore. That uses getLayerStoreLocked, and AFAICS never gets to write to this special member of roLayerStoresUseGetters.

I’m probably being stupid.

layers.go Show resolved Hide resolved
layers.go Outdated Show resolved Hide resolved
drivers/vfs/driver.go Outdated Show resolved Hide resolved
newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id))
if _, err := os.Stat(newpath); err != nil {
additionalHomes := d.homes[1:]
if d.imageStore != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can’t this condition be determined at driver initialization time?


Every single consumer of d.homes is either indexing [0] or [:1], suggesting that the data can be recorded in some more convenient way.

homedir = d.homes[0]
}
newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id))
if _, err := os.Stat(newpath); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ How does this handle any locking? The generic code maintains RW/RO locks on the individual images / layers. This… does nothing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still outstanding.

I don't see how this is different than the previous version, what locking do we need here?

}

newpath := path.Join(homedir, id)

if _, err := os.Stat(newpath); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ How does this handle any locking? The generic code maintains RW/RO locks on the individual images / layers. This… does nothing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still outstanding.

I don't see how this is different than the previous version, what locking do we need here?

Copy link
Collaborator

@mtrmac mtrmac Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see the top-level comment #1784 (review) .

One immediate thing that springs to mind is deleting the parent layer while the child is being created/mounted/unmounted. I don’t know that we can fully defend against those kinds of things (we don’t hold layers locked for the whole time a child is mounted, and mount reference counts are, I think, basically per-machine and not visible over network shares).

Or maybe a layer with the same ID (pulled layers have deterministic IDs) being simultaneously created in both stores, making the path returned by d.dir[2]() change over time — e.g. for getStagingDir as an example.

There might well be more, or maybe these are handled by some mechanism I don’t know about.


Assuming the concern is real:

  • I’d like to see the scope of the problem restricted to the unavoidable minimum (whatever that is); in all situations where the generic code is iterating over getROLayerStoresLocked to find the right layer, and handling locking, the driver should not be iterating over the secondary layer stores again / redundantly without locking. Structure the code, or document it, in a way that is likely to be maintained correctly over time.
  • In the situations where the problem remains outstanding afterwards and is not fixable short-term, we should loudly document the problem, the situations where it arises, and the constraints, for future maintainers (compare LOCKING BUG in layers.go)
  • Alternatively, if this is never going to be fixed, and users need to use the feature appropriately (“the image store must never be modified while it is used with outer stores” ?), clearly document that in end-user documentation, and maybe point at it in the code to acknowledge and justify the unsynchronized accesses.
  • (As another option, a few years ago overlay used to do its own per-layer locking. I have removed that because I thought it’s redundant; actually external users can get a pointer to the driver object, so it is not 100% redundant, and doing that locking here might also be an option. I would prefer not to add that cost to every operation.)

I could see an argument that this is out of scope of this PR … I’m thinking that with this touching dir2 and dir, while the feature is still comparatively new, now is the best chance for fixing this we’ll ever have. Admittedly the timing is not super convenient now, but it will probably never be ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assumption with additional image stores is that images/layers are not removed. There are cases where we don't grab any lock, e.g. a read-only file system, so that is already the current state.

When we use a read-only layer, the layer could disappear also while a container is running, and there is nothing that prevents the removal now:

# mkdir -p /tmp/store/{rw,ro,empty}
# podman --root /tmp/store/rw pull busybox
# mount --bind --read-only /tmp/store/rw /tmp/store/ro
# podman --root /tmp/store/empty --storage-opt overlay.imagestore=/tmp/store/ro run -d busybox top
5d4baa2643c77deca49ff9b1a0a10df5c2965626b725b8a2225db54706e69980
# podman --root /tmp/store/rw rmi busybox
Untagged: docker.io/library/busybox:latest
Deleted: 9211bbaa0dbd68fed073065eb9f0a6ed00a75090a9235eca2554c62d1e75c58f

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the silent deletion of dependent parents seems hard to fix. It’s also not really documented that the users must prevent that, AFAICT.

What about concurrent creations, and other operations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not thought about them, but are they very different than delete though? We would end up with the same layer in multiple stores and use it from the first one where we find it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could see e.g. a partial pull of a layer being started on our store, triggering the creation of a staging directory; before it finishes, the other image store creates the same layer; and now the code trying too commit the partially-pulled layer sees the new layer, determines a different staging directory path, and rejects the commit attempt. (I’m not saying that this is the only problem.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's address this in a separate PR. I've played with it this morning, but I am not sure yet how this should be done.

Do you've any suggestions?

@giuseppe giuseppe force-pushed the refactor-image-store branch 6 times, most recently from f3ffbd6 to 5c00dd3 Compare January 10, 2024 14:43
store.go Outdated Show resolved Hide resolved
store.go Outdated

ro := store != s.imageStoreDir && store != s.graphRoot

rls, err := s.newLayerStore(rlpath, glpath, s.graphDriver, false, ro)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine… but I can’t see how that works. layersToRemove is only applied to one rlstore. That uses getLayerStoreLocked, and AFAICS never gets to write to this special member of roLayerStoresUseGetters.

I’m probably being stupid.

drivers/vfs/driver.go Outdated Show resolved Hide resolved
drivers/vfs/driver.go Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the refactor-image-store branch 3 times, most recently from 19fc9a6 to be5720d Compare January 15, 2024 12:20
@giuseppe giuseppe marked this pull request as ready for review January 25, 2024 10:43
@giuseppe
Copy link
Member Author

ready for review

@giuseppe
Copy link
Member Author

@rhatdan @mtrmac can we please move this forward?

@rhatdan
Copy link
Member

rhatdan commented Jan 30, 2024

@nalind PTAL

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 30, 2024

Is this related to the Zstd work?

@giuseppe
Copy link
Member Author

Is this related to the Zstd work?

no, this is not related to zstd

@rhatdan
Copy link
Member

rhatdan commented Feb 10, 2024

@giuseppe @mtrmac is this something we should get in before 5.0?

@giuseppe
Copy link
Member Author

I don't think it is too important for Podman 5.0, but it solves an issue that was reported some time ago: #1779

@giuseppe
Copy link
Member Author

@kolyshkin @nalind PTAL

@rhatdan
Copy link
Member

rhatdan commented Feb 14, 2024

@saschagrunert PTAL

@giuseppe
Copy link
Member Author

giuseppe commented Feb 21, 2024

@haircommander @saschagrunert this PR affects the "split store" functionality in CRI-O

@haircommander
Copy link
Contributor

cc @kannon92

@giuseppe
Copy link
Member Author

rebased

this is a preparation for the next commit, which will use this feature.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
simplify the imagestore implementation and use it as a first-class
store.  It reduces the amount of code to deal with an image store in
the driver (only overlay supports it at the moment).

Closes: containers#1779

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when an image store is used, lock it as well as the graphroot.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Copy link
Contributor

openshift-ci bot commented Mar 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [giuseppe,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Mar 1, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 1, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 226cffb into containers:main Mar 1, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman image mount issue when additional image stores are used
5 participants