Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix handling of static/volume dir #20088

Merged
merged 1 commit into from Sep 25, 2023

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Sep 21, 2023

The processing and setting of the static and volume directories was scattered across the code base (including c/common) leading to subtle errors that surfaced in #19938.

There were multiple issues that I try to summarize below:

  • c/common loaded the graphroot from c/storage to set the defaults for static and volume dir. That ignored Podman's --root flag and surfaced in podman with custom static_dir slow to tear-down containers #19938 and other bugs. c/common does not set the defaults anymore which gives Podman the ability to detect when the user/admin configured a custom directory (not empty value).

  • When parsing the CLI, Podman (ab)uses containers.conf structures to set the defaults but also to override them in case the user specified a flag. The --root flag overrode the static dir which is wrong and broke a couple of use cases. Now there is a dedicated field for in the "PodmanConfig" which also includes a containers.conf struct.

  • The defaults for static and volume dir and now being set correctly and adhere to --root.

  • The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the cleanup process. I believe that all env variables should be passed to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to understand and follow. I refrained from larger refactorings as I really just want to get #19938 fixed and then go back to other priorities.

Fixes: #19938

Does this PR introduce a user-facing change?

Fix a bug where setting the static or volume directory in containers.conf would lead to cleanup errors.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 21, 2023
@vrothberg
Copy link
Member Author

Marked as a draft for now as it's vendoring containers/common#1660. Once green and Acked we can merge containers/common#1660 and do the dance.

@packit-as-a-service
Copy link

Cockpit tests failed for commit 020c8dd. @martinpitt, @jelly, @mvollmer please check.

@packit-as-a-service
Copy link

Cockpit tests failed for commit 6156986. @martinpitt, @jelly, @mvollmer please check.

@vrothberg
Copy link
Member Author

@mheon PTAL

@martinpitt
Copy link
Contributor

Thanks @vrothberg for fixing. Tests previously failed on

 podman rmi --all
Error: creating runtime static files directory "": mkdir : no such file or directory

It's a bit curious that no other test picked this up. The test setup script doesn't do anything super weird though. Anyway, water under the bridge!

@vrothberg
Copy link
Member Author

Thanks, @martinpitt ! The mkdir issue has been fixed in the meantime.

@@ -540,7 +540,7 @@ func rootFlags(cmd *cobra.Command, podmanConfig *entities.PodmanConfig) {
_ = pFlags.MarkHidden(networkBackendFlagName)

rootFlagName := "root"
pFlags.StringVar(&podmanConfig.ContainersConf.Engine.StaticDir, rootFlagName, podmanConfig.ContainersConfDefaultsRO.Engine.StaticDir, "Path to the root directory in which data, including images, is stored")
pFlags.StringVar(&podmanConfig.GraphRoot, rootFlagName, "", "Path to the graph root directory where images, containers, etc. are stored")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned about this breaking folks? Setting --root is now doing something very different than it did before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know. This stuff is really broken :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go. It essentially overrode the graph root before as well (that's how it was being used).

@@ -1328,6 +1328,9 @@ func (r *ConmonOCIRuntime) configureConmonEnv(runtimeDir string) []string {
if conf, ok := os.LookupEnv("CONTAINERS_CONF"); ok {
env = append(env, fmt.Sprintf("CONTAINERS_CONF=%s", conf))
}
if conf, ok := os.LookupEnv("CONTAINERS_CONF_OVERRIDE"); ok {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mheon do you recall why we're not passing all env variables to conmon? I feel like we should do this since there is a growing number of those which I do not see being passed down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it dates back to before we forked off Conmon, so this is probably older than my involvement in the project. I don't see a good reason not to forward on full environment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I will open a follow-up PR 👍

test/system/030-run.bats Outdated Show resolved Hide resolved
@vrothberg vrothberg marked this pull request as ready for review September 22, 2023 10:45
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2023
@vrothberg
Copy link
Member Author

OK, this is ready from end ✔️

@Luap99
Copy link
Member

Luap99 commented Sep 22, 2023

If this is fully rebased on the latest c/common I expect the machine tests to fail: #20094

@vrothberg
Copy link
Member Author

If this is fully rebased on the latest c/common I expect the machine tests to fail: #20094

This is the 2nd time in 24 hours that a commit in common got merged and broke Podman CI :( I'll take a look at fixing the machine tests.

@vrothberg
Copy link
Member Author

Repushed and commented out the failures. I let @ashley-cui adjust the tests once this PR here got in.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 22, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrothberg
Copy link
Member Author

sqlite behaves differently than boltdb :(

Not sure whether I can get this fixed in time today, apologies.

@vrothberg
Copy link
Member Author

Opened containers/common#1663 to revert my two commits, so I can work on the remaining issues in peace.

@vrothberg
Copy link
Member Author

sqlite behaves differently than boltdb :(

Found the issue. The problem was that the sqlite DB was always placed on the graph root and not on the static dir. Now, since the static dir can really be configured it blew up.

@ashley-cui
Copy link
Member

Sorry about that, opened #20101 and will rebase/adjust once this PR goes in.

@vrothberg
Copy link
Member Author

Sorry about that, opened #20101 and will rebase/adjust once this PR goes in.

Thanks! In case this PR here fails again, feel free to just merge yours and I will rebase this one here on Monday :)

@vrothberg
Copy link
Member Author

The fight is still not over. Transient store e2e tests are crying. I'll calm them Monday morning.

@vrothberg
Copy link
Member Author

Hitting #20119 now

The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in containers#19938.

There were multiple issues that I try to summarize below:

 - c/common loaded the graphroot from c/storage to set the defaults for
   static and volume dir.  That ignored Podman's --root flag and
   surfaced in containers#19938 and other bugs.  c/common does not set the
   defaults anymore which gives Podman the ability to detect when the
   user/admin configured a custom directory (not empty value).

 - When parsing the CLI, Podman (ab)uses containers.conf structures to
   set the defaults but also to override them in case the user specified
   a flag.  The --root flag overrode the static dir which is wrong and
   broke a couple of use cases.  Now there is a dedicated field for in
   the "PodmanConfig" which also includes a containers.conf struct.

 - The defaults for static and volume dir and now being set correctly
   and adhere to --root.

 - The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
   cleanup process.  I believe that _all_ env variables should be passed
   to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to
understand and follow.  I refrained from larger refactorings as I really
just want to get containers#19938 fixed and then go back to other priorities.

containers/common#1659 broke three pkg/machine
tests.  Those have been commented out until getting fixed.

Fixes: containers#19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg
Copy link
Member Author

@mheon @baude this should finally be green in a jiffy. Please take a look. I did not change existing tests, so I am fairly confident it's not introducing a regression but this code is quite fragile.

@baude
Copy link
Member

baude commented Sep 25, 2023

code LGTM

@mheon
Copy link
Member

mheon commented Sep 25, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 25, 2023
@openshift-merge-robot openshift-merge-robot merged commit 2fef4c7 into containers:main Sep 25, 2023
98 checks passed
@vrothberg vrothberg deleted the fix-19938 branch September 26, 2023 07:06
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman with custom static_dir slow to tear-down containers
8 participants