Skip to content

Conversation

@tamilmani1989
Copy link
Member

Reason for Change:

CNI executes iptable cmds for swift podsubnet case and if iptable chain/rule already exists, it throws error and following cni calls fails. Need a reboot or removal of that rule/chain to recover cni. This fix ensure iptable calls are idempotent and not to add iptable rule/chain if it already exists

Issue Fixed:

Requirements:

Notes:

@tamilmani1989 tamilmani1989 requested review from matmerr and rbtr February 8, 2023 22:11
@tamilmani1989 tamilmani1989 added fix Fixes something. bug labels Feb 8, 2023
iptables.GetInsertIptableRuleCmd(iptables.V4, iptables.Nat, iptables.Swift, azureIMDSMatch, snatHostIPJump),

var iptableCmds []iptables.IPTableEntry
if !iptables.ChainExists(iptables.V4, iptables.Nat, iptables.Swift) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assumign *exits functions don't take a lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't but what would be the problem even if it does.. that call may fail but retries ensure eventually it will succeed right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exist check is only at start up time.

so if somone put a retry around handle common options and the first one failed at third step then the retry wont re-run exist. and you will fail till your retry expires.

Exist check closter to create/append/insert would be more reliable but its hypothetical if we're not really retrying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not retrying so should be good.. you re right but i have to make considerable code change to achieve that. anyway this iptables part gonna move to cns in long run. even if it fails after exist check, it succeeds in next attempt. it wouldn't endup stuck in that error indefinitely as like now.

Copy link
Member

@paulgmiller paulgmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a safe mitigation . Be nice if all commands were a omic or did the check themselves but that is likely a bigger cange.

iptableCmds = append(iptableCmds, iptables.GetInsertIptableRuleCmd(iptables.V4, iptables.Nat, iptables.Swift, azureIMDSMatch, snatHostIPJump))
}

options[network.IPTablesKey] = iptableCmds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done at invokeation of azure cni process
So if we ever tried to retry witin the cni we might still hit the same thing but that would only block one pod i guess.

another way to do this is to haave all cmds first run the exit. I assume there is no way to do them all atomically with one lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

today cni executes sequentially since we acquire lock at beginning and there can be only one CNI process executes these iptable cmds at a time but we have a plan to parallelize cni .. In parallel execution of cni, there are possiblities that there will be race in adding iptable rules but eventually one will succeed...if cni returns error, containerd runtime will retry the operation

@tamilmani1989
Copy link
Member Author

this seems like a safe mitigation . Be nice if all commands were a omic or did the check themselves but that is likely a bigger cange.

thats right.. initally thought of redesigning but its a bigger change.. anyway in future this will be part of cns

@rbtr rbtr added the cni Related to CNI. label Feb 9, 2023
@csfmomo csfmomo merged commit 7ae0a74 into master Feb 9, 2023
@csfmomo csfmomo deleted the tamanoha/swiftIptablesFix branch February 9, 2023 19:46
rjdenney pushed a commit that referenced this pull request Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug cni Related to CNI. fix Fixes something.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants