aardvark,commit: acquire fs lock when performing commit to avoid race across parallel invocations.
#375
Conversation
* We should avoid overriding configs when another instance of aardvark is trying to commit configs on the same path.
* On certain systems a race exists where more than one aardvark instance is started in the window where one instance has not yet finished writing its `aardvark.pid`, causing more than one instance to start and eventually conflict on the requested ports. Some occurrences are reported on `low power hardware which tries to start a significant amount of containers`, and it looks like that case matches what is described in this second point.

Signed-off-by: Aditya R <arajan@redhat.com>
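For illustration, a minimal sketch of the serialization the description above asks for, assuming the `fs2` crate for advisory file locking and a hypothetical `do_commit` helper; the PR itself may structure this differently:

```rust
// Sketch only: serializing aardvark's commit phase with an exclusive
// advisory file lock. The crate, paths, and helper names here are
// assumptions, not necessarily what the PR uses.
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

use fs2::FileExt; // adds lock_exclusive()/unlock() to std::fs::File

fn commit_with_lock(run_root: &Path) -> io::Result<()> {
    let lockfile_path = run_root.join("networks").join("aardvark.lock");
    let lockfile = OpenOptions::new()
        .write(true)
        .create(true)
        .open(&lockfile_path)?;

    // Blocks until any other aardvark instance holding the lock has
    // finished its own commit, so two instances never write configs
    // (or race on aardvark.pid) at the same time.
    lockfile.lock_exclusive()?;

    let result = do_commit(); // hypothetical: write configs, update aardvark.pid

    // The flock is also released automatically once `lockfile` is dropped.
    lockfile.unlock()?;
    result
}

fn do_commit() -> io::Result<()> {
    // Placeholder for the real commit logic.
    Ok(())
}
```

Because the lock is advisory, every aardvark code path that commits configs has to go through the same lock file for the serialization to hold.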
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
```rust
// Acquire fs lock to ensure other instance of aardvark cannot commit
// or start aardvark instance till already running instance has not
// completed its `commit` phase.
let lockfile_path = Path::new(&self.config)
```
The path for the lock file is intentionally `<run-root-base>/networks/aardvark.lock` instead of `<run-root-base>/networks/aardvark-dns/aardvark.lock`, to keep this patch backward compatible with older aardvark versions.
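For illustration, one way such a lock path could be derived from the instance's config directory; `lockfile_path` is a hypothetical free function here, and the exact layout of `self.config` is an assumption rather than the PR's actual code:

```rust
use std::path::{Path, PathBuf};

// Sketch: if the config path points at <run-root-base>/networks/aardvark-dns,
// taking its parent yields <run-root-base>/networks, so the lock lands at
// <run-root-base>/networks/aardvark.lock -- per the comment above, a location
// that older aardvark versions can also agree on.
fn lockfile_path(config: &str) -> PathBuf {
    Path::new(config)
        .parent()
        .unwrap_or_else(|| Path::new(config))
        .join("aardvark.lock")
}
```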
@baude @mheon PTAL. Could we ask the folks to try this patch on top of …? I have played with running multiple containers in parallel, but it is exactly like what is mentioned in the issue: … So while I was not able to reproduce the actual issue, I think the following patch should close it.

/hold till we get an ack.
/hold
Confirmed this fixed the issue reported ... ready for merging. LGTM
/hold cancel
Addresses: https://bugzilla.redhat.com/show_bug.cgi?id=2116481

Once this merges, please assign the BZ to Jindrich.
/lgtm
Podman already locks all operations in the network backend, so there should never be two different netavark processes at any given time. I think the problem here is that we do not wait for aardvark setup: currently we return before the child writes its pid. So the actual fix is to make the aardvark parent wait until the child has written the pid file.
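A rough sketch of the alternative Luap99 describes, with the parent blocking until the child has written its pid file; the function name, timeout, and polling interval are assumptions, not aardvark's actual code:

```rust
use std::io;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

// Sketch: after spawning the aardvark child, the parent waits until a
// non-empty, parseable aardvark.pid exists, so callers never observe a
// half-started instance and never race to start a second one.
fn wait_for_pid_file(pid_path: &Path, timeout: Duration) -> io::Result<()> {
    let start = Instant::now();
    loop {
        match std::fs::read_to_string(pid_path) {
            Ok(contents) if contents.trim().parse::<u32>().is_ok() => return Ok(()),
            _ if start.elapsed() >= timeout => {
                return Err(io::Error::new(
                    io::ErrorKind::TimedOut,
                    "aardvark child did not write its pid file in time",
                ));
            }
            _ => thread::sleep(Duration::from_millis(10)),
        }
    }
}
```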
@Luap99 I think we can verify the pid here, just after we get an ack from …
@Luap99 Let's discuss this here: containers/aardvark-dns#220
/cherry-pick v1.1.0-rhel
@baude: new pull request created: #421

In response to this:

> /cherry-pick v1.1.0-rhel

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick v1.0.1-rhel
@mheon: #375 failed to apply on top of branch "v1.0.1-rhel":

In response to this:

> /cherry-pick v1.0.1-rhel

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.