
WIP: Cirrus: Update to XFS root in Fedora VM images #10201

Closed
wants to merge 1 commit into master from kickstart_fedora

Conversation

@cevich (Member) commented May 3, 2021

COMPLETELY experimental; DO NOT MERGE

Ref: containers/automation_images#69

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 3, 2021
@openshift-ci-robot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cevich
To complete the pull request process, please assign rhatdan after the PR has been reviewed.
You can assign the PR to them by writing /assign @rhatdan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cevich (Member, Author) commented May 4, 2021

@rhatdan I have some...errr...results 😁 Both F33 and F34 are running with XFS, and in neither case do the tests appear to run significantly faster than on EXT4. I'll assume deeper inspection/debugging would turn up some low-level inefficiencies that need fixing. The test failures are quite a bit more worrying, since they might point to real-world problems that need fixing.

The other possibility is test/environment-setup bugs (i.e. not reproducible in the "real world"), but those too may be worth fixing. In any case it would be good to put some expert eyeballs on these failures, especially since switching to XFS appears to cause checkpoint/restore to hang outright 😖
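
For anyone wanting to poke at the hang outside of CI, a minimal manual reproduction might look like the sketch below (the container name and image are arbitrary placeholders, and it assumes criu is installed on the VM):

# start a throwaway container, then exercise checkpoint/restore directly
podman run -d --name cr-test alpine top
podman container checkpoint cr-test    # checkpoint/restore is what appears to hang in CI
podman container restore cr-test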

Please let me know how I can help.

@cevich cevich requested a review from rhatdan May 4, 2021 15:07
@rhatdan (Member) commented May 4, 2021

Could you check to see if reflink is enabled?

xfs_info /var | grep ref
         =                       reflink=1    bigtime=0

@cevich (Member, Author) commented May 4, 2021

~/dev/podman (kickstart_fedora|✔) $ hack/get_ci_vm.sh int podman fedora-34 root host

# Initializing get_ci_vm
...cut...
# Accessing instance cevich-fedora-c6202812959817728
+ Entering Cirrus-CI environment
+ Changing into /var/tmp/go/src/github.com/containers/podman
+ Dropping into a bash login shell inside Cirrus-CI 'int podman fedora-34 root host' task environment
[root@cevich-fedora-c6202812959817728 podman]# xfs_info /var | grep ref
xfs_info: cannot open /var: Is a directory
[root@cevich-fedora-c6202812959817728 podman]# xfs_info / | grep ref
         =                       reflink=1    bigtime=0
[root@cevich-fedora-c6202812959817728 podman]# xfs_info /tmp | grep ref
xfs_info: cannot open /tmp: Is a directory
[root@cevich-fedora-c6202812959817728 podman]#
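
Side note: xfs_info wants a mount point (or device), which is why it balks at /var and /tmp above, since they are plain directories on the root filesystem rather than separate mounts. A rough way to check an arbitrary path is to resolve its backing mount point first (the /var/tmp path here is just an example):

findmnt -n -o TARGET --target /var/tmp   # print the mount point backing the directory
xfs_info "$(findmnt -n -o TARGET --target /var/tmp)" | grep -E 'reflink|bigtime'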

@rhatdan (Member) commented May 4, 2021

If /tmp is on the / filesystem, then reflinks are enabled and it looks like they are enabled by default, which is goodness.
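
A quick sanity check that reflink copies actually work on the root filesystem could look like this (file paths and sizes are arbitrary, and it assumes filefrag from e2fsprogs is available):

dd if=/dev/urandom of=/var/tmp/reflink-src bs=1M count=64
cp --reflink=always /var/tmp/reflink-src /var/tmp/reflink-copy   # fails outright if the FS cannot reflink
filefrag -v /var/tmp/reflink-copy | head   # "shared" in the flags column indicates a true reflink copy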

@cevich (Member, Author) commented May 5, 2021

If /tmp is on the / filesystem, then reflinks are enabled and it looks like they are enabled by default, which is goodness.

Yep, so either our bits aren't taking advantage of it as expected, or the effect isn't very substantial, or both. Still, I'm a bit more concerned about the test failures I mentioned above. They all strike me as things that shouldn't fail because the underlying FS changed. That said, I don't know the details; maybe it makes sense to someone?

@rhatdan (Member) commented May 5, 2021

Yes, I have no idea what is causing the timeouts.

@cevich (Member, Author) commented May 5, 2021

The network-create failures are even stranger: Error: container create failed (no logs from conmon): EOF

I wonder if these are all race conditions in the tests that are simply being exposed by timing changes. If so, the failures are surprisingly consistent; I got basically the same failures running these jobs multiple times, and with BTRFS as well 😕
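
If it helps, one way to squeeze more detail out of the conmon EOF might be to re-run a failing case by hand with debug logging and syslog forwarding turned on (the network name and image below are just placeholders):

podman --log-level=debug --syslog network create testnet
podman --log-level=debug --syslog run --rm --network testnet alpine true   # the run step is what spawns conmon
journalctl --since "10 minutes ago" | grep -i conmon   # look for conmon's own messages around the failure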

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich (Member, Author) commented May 11, 2021

rebased on master

@openshift-ci bot (Contributor) commented May 11, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

@openshift-ci bot (Contributor) commented May 16, 2021

@cevich: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2021
@cevich (Member, Author) commented May 17, 2021

Closing this for now, since there's a LOT of additional complexity in the image-build workflow for apparently minimal performance gain. Happy to revisit/re-open this at a later date if there is a need.

@cevich cevich closed this May 17, 2021
@cevich cevich deleted the kickstart_fedora branch June 30, 2021 17:59
@cevich cevich restored the kickstart_fedora branch June 30, 2021 17:59
@cevich cevich deleted the kickstart_fedora branch April 18, 2023 14:46
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 31, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 31, 2023