Add DHCP IP Retries in PrepareHNSNetwork #5819

Merged
merged 1 commit into from
Jan 2, 2024

Conversation

XinShuYang
Contributor

@XinShuYang XinShuYang commented Dec 21, 2023

To address the potential race condition issue where acquiring a DHCP IP address may fail after CreateHNSNetwork,
we added a retry mechanism to wait for an available IP. If the DHCP IP cannot be acquired within six seconds,
an error will be logged.

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

Contributor

@wenyingd wenyingd left a comment

Can we add an e2e test to verify that vNIC migration does not modify the original properties?

pkg/agent/util/net_windows.go (3 outdated review threads, resolved)
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

pkg/agent/util/net_windows.go (4 outdated review threads, resolved)
ci/jenkins/test.sh (3 outdated review threads, resolved)
pkg/agent/util/net_windows.go (2 outdated review threads, resolved)
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

wenyingd
wenyingd previously approved these changes Dec 28, 2023
Contributor

@wenyingd wenyingd left a comment

LGTM

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang XinShuYang requested a review from tnqn December 28, 2023 06:13
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

ci/jenkins/test.sh (2 outdated review threads, resolved)
pkg/agent/util/net_windows.go (resolved)
if err != nil {
	klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
}
klog.Warningf("Timeout acquiring IP for the adapter, DHCP status: %t", dhcpStatus)
Member

  1. Use structured logging for new logs: use InfoS if this is expected, and ErrorS if it is not expected to happen.
  2. Logging dhcpStatus could be confusing when getting its status fails.

Contributor Author

Updated the logic; we now only print dhcpStatus if we successfully retrieve its value from the adapter. @tnqn

// Therefore, we set the timeout limit to triple of that value, allowing a maximum wait of 6 seconds here.
err = wait.PollImmediate(1*time.Second, 6*time.Second, func() (bool, error) {
	var checkErr error
	adapter, ipFound, checkErr = adapterIPExists(nodeIPNet.IP, uplinkAdapter.HardwareAddr, ContainerVNICPrefix)
Member

Does Windows support both DHCP and static IP configurations? If so, these retries would add unnecessary initialization delay in the static IP case.
I think it should first check whether DHCP is enabled, and only expect to get an IP from the DHCP server when it is.

Contributor Author

As I understand it, the delay in IP availability occurs when a new uplink interface is created, and it is the same whether or not DHCP is enabled on the interface. In the static IP case, we still expect the adapter to have an available IP in a timely manner; otherwise, it may indicate an issue with the adapter itself. @wenyingd please correct me if I am wrong.

Contributor

For static IP configurations, we still expect Windows to automatically migrate the IP address from the pNIC to the vNIC. We would like to use this check to ensure that Windows has behaved as expected, and to log a warning if not.

@XinShuYang
Contributor Author

/test-windows-all

@XinShuYang
Contributor Author

/test-windows-all

@XinShuYang XinShuYang requested a review from tnqn January 2, 2024 05:42
tnqn
tnqn previously approved these changes Jan 2, 2024
if err == wait.ErrWaitTimeout {
	dhcpStatus, err := InterfaceIPv4DhcpEnabled(uplinkAdapter.Name)
	if err != nil {
		klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
Member

Suggested change
- klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
+ klog.ErrorS(err, "Failed to get IPv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)

@@ -647,6 +670,16 @@ func HostInterfaceExists(ifaceName string) bool {
	return true
}

// InterfaceIPv4DhcpEnabled returns the Ipv4 DHCP status on the specified interface.
Member

ditto

@tnqn
Member

tnqn commented Jan 2, 2024

In the commit message: s/a warning message will be returned/an error will be logged/

@tnqn
Member

tnqn commented Jan 2, 2024

No need to rerun tests after addressing the typos

To address the potential race condition issue where acquiring a DHCP IP address may fail after CreateHNSNetwork,
we added a retry mechanism to wait for an available IP. If the DHCP IP cannot be acquired within six seconds,
an error will be logged.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@XinShuYang
Contributor Author

> No need to rerun tests after addressing the typos

Got it, PR has been updated.

@tnqn
Member

tnqn commented Jan 2, 2024

/skip-all

@tnqn tnqn merged commit 923b429 into antrea-io:main Jan 2, 2024
45 of 52 checks passed
3 participants