Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for OVS bridge datapath ID to be available after creating br-int #6472

Conversation

antoninbas
Copy link
Contributor

We wait (for a maximum of 5s) for the datapath_id of the br-int OVS bridge to be reported in OVSDB, after creating the bridge and before checking supported datapath features. This prevents errors when querying the supported features before the ofproto-dpif provider has been initialized.

Fixes #6471

We wait (for a maximum of 5s) for the datapath_id of the br-int OVS
bridge to be reported in OVSDB, after creating the bridge and before
checking supported datapath features. This prevents errors when querying
the supported features before the ofproto-dpif provider has been
initialized.

Fixes antrea-io#6471

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas force-pushed the wait-for-bridge-datapath-id-before-checking-dp-features branch from 2b6c71c to d869a13 Compare June 20, 2024 23:57
@@ -317,6 +317,45 @@ func (br *OVSBridge) GetDatapathID() (string, Error) {
}
}

func (br *OVSBridge) WaitForDatapathID(timeout time.Duration) (string, Error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a GetDatapathID() method and a GetOFPort(ifName string, waitUntilValid bool) method. Does it make sense to add a waitUntilValid to the former to unify them and probably to reduce some code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of the current GetOFPort method, but maybe I can come up with a solution that I like that avoid the code duplication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a simple solution I really liked. GetOFPort is a bit different from GetDatapathID / WaitForDatapathID. In GetOFPort we always use a Wait in the transaction, and waitUntilValid only determines the wait condition. For Linux, we wait for the value to be non-empty, for Windows, we wait for the value to be different from -1 (which indicates that the interface doesn't exist).

Here for the datapath ID, if we want one version with wait and one version without wait, then we have significant differences in the implementations, because one version has a single operation in its OVSDB transaction and one version has two operations. I feel like it is not worth unifying in this case, but let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I was just not sure if you missed it. No problem from me if it looks worse by unifying them.

@antoninbas antoninbas changed the title Wait for OVS bridge datapath ID to be avilable after creating br-int Wait for OVS bridge datapath ID to be available after creating br-int Jun 24, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Jun 25, 2024
@antoninbas
Copy link
Contributor Author

/test-all
/test-windows-all

@antoninbas antoninbas merged commit b03e894 into antrea-io:main Jun 25, 2024
56 of 59 checks passed
@antoninbas antoninbas deleted the wait-for-bridge-datapath-id-before-checking-dp-features branch June 25, 2024 21:28
@antoninbas
Copy link
Contributor Author

I am not backporting this based on the risk-return tradeoff. The issue is quite rare and even when it does happen, older versions of Antrea can recover automatically and very quickly. I also only have observed the issue on "fresh" installations so far, even though it may (should?) be possible for the issue to also happen during restarts / upgrades.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Antrea Agent sometimes (rarely) fails to query supported OVS datapath features
2 participants