Skip to content

Conversation

@msvik
Copy link
Contributor

@msvik msvik commented Apr 18, 2022

Reason for Change:
Currently when CNI does a system call, it doesn't have a timeout. It sometimes hangs or waits a long time before it fails leading to other CNI instances getting blocked. This PR adds 10 second timeout for system calls so CNI can timeout and let other CNI instances get through it.

Issue Fixed:
It fixes the live site repair item ADO # 13142210.

Requirements:

Notes:

@rbtr rbtr requested a review from tamilmani1989 April 18, 2022 22:52
@rbtr rbtr added cni Related to CNI. fix Fixes something. labels Apr 18, 2022
@msvik msvik force-pushed the cnilivesiterepair branch from 809d45d to d727406 Compare April 18, 2022 23:28
thatmattlong
thatmattlong previously approved these changes Apr 20, 2022
Copy link
Collaborator

@thatmattlong thatmattlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thatmattlong
Copy link
Collaborator

Any reason to do this only for linux and not windows?

@msvik
Copy link
Contributor Author

msvik commented Apr 20, 2022

Any reason to do this only for linux and not windows?

The livesite was seen/debugged in Linux CNI. Second, Tamilmani pointed out that his observation is that these system commands finish much faster in Linux (order of couple of seconds) and on windows take longer anyway. So, it is just safer to add timeout to Linux.

tamilmani1989
tamilmani1989 previously approved these changes Apr 20, 2022
Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tamilmani1989
Copy link
Member

Any reason to do this only for linux and not windows?

In addition to what vipin said, in windows we use hns calls lot than exec cmds. we have a repair item to add timeout for hns calls

@msvik msvik dismissed stale reviews from tamilmani1989 and thatmattlong via e83cb82 April 20, 2022 20:08
Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@msvik msvik merged commit 9604d6e into Azure:master Apr 21, 2022
matmerr pushed a commit to matmerr/azure-container-networking that referenced this pull request Jun 29, 2022
* CNI Linux: Add 10 seconds timeout for ExecuteCommand

* ran gofmt on changed file

* Fix lint error

* Added unit test for ExecuteCommand for linux

* Moved timeout to execlient. It has default timeout and method to override it now

* reduce timeout values in unit tests

Co-authored-by: msvik <msvik@users.noreply.github.com>
Co-authored-by: VK <misvik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cni Related to CNI. fix Fixes something.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants