-
Notifications
You must be signed in to change notification settings - Fork 260
feat: [NPM] Adding DPShim layer in controller pods #1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| func (dp *DPShim) getIPSet(setName string) *controlplane.ControllerIPSets { | ||
| return dp.setCache[setName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be getCache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getIPSet is in line with our interface, so prefer to keep it this. Unless you have strong reservations ?
npm/pkg/dataplane/dpshim/dpshim.go
Outdated
| dp.Lock() | ||
| defer dp.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might make more sense to move the lock/unlock to the helper functions deleteIPSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am following the practice of Exposed function deal with locks and internal functions to not worry about locks. Internal functions by default we can assume that they are to be handled with care. Also, createIPSet is called from some external functions and we cannot deal with locking and unlocking always. Wdyt ?
npm/pkg/dataplane/dpshim/dpshim.go
Outdated
| dp.Lock() | ||
| defer dp.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have this in createIPSet we wouldn't have to lock on a large code block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except for the exists check everything needs to be locked anyway, so i believe trying to lock for each loop might add more overhead. wdyt ?
| } | ||
|
|
||
| policy := dp.policyCache[policyKey] | ||
| toApplyPolicies[idx] = policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need to use ID - we can just do this,
toApplyPolicies = append(toApplyPolicies, policy)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use append, you should set the initial capacity of the slice to avoid resizing: make([]thing, 0, len(cache))
| return nil, npmerrors.Errorf(npmerrors.AddPolicy, false, fmt.Sprintf("policy %s not found", policyKey)) | ||
| } | ||
|
|
||
| policy := dp.policyCache[policyKey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the policy doesnt exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats the check we are doing one line above ?
| idx := 0 | ||
|
|
||
| for policyKey := range dp.dirtyCache.toDeletePolicies { | ||
| toDeletePolicies[idx] = policyKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same - make use of append
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding resizing since we know the size of data.
| dp.checkSetReferences() | ||
| err := dp.ApplyDataPlane() | ||
| if err != nil { | ||
| klog.Errorf("deleteUnusedSets: failed to apply dataplane %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been curious about error propagation in go routines. Should we create an error channel to feedback errors and handle them in the caller code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is across our codebase I mean
nitishm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality looks good - just some nit picking
huntergregory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dpshim looks pretty good to me. Made several suggestions. This PR seems dependent on #1202 for validation logic
| func DecodeStrings(payload *bytes.Buffer) ([]string, error) { | ||
| if payload == nil { | ||
| return "", npmerrors.SimpleError("failed to decode, payload is nil") | ||
| return nil, npmerrors.SimpleError("failed to decode, payload is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check if len(payload) == 0 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly for other decode funcs
| IPSetMetadata: metadata, | ||
| IPPodMetadata: make(map[string]*dp.PodMetadata), | ||
| MemberIPSets: make(map[string]*ipsets.IPSetMetadata), | ||
| ipsetReference: make(map[string]struct{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this to be a map? What about having referCount like in ipsets.IPSet?
| } | ||
|
|
||
| set := dp.setCache[prefixedSetName] | ||
| if set.IPSetMetadata.GetSetKind() != ipsets.HashSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should generalize addToSet, deleteFromSet, addToList, ... validations into a dp func for both dpshim and Dataplane
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed lets plan on how we can achieve this properly.
| } | ||
|
|
||
| if len(goalStates) == 0 { | ||
| klog.Info("ApplyDataPlane: No changes to apply") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this ever happen?
| } | ||
|
|
||
| policy := dp.policyCache[policyKey] | ||
| toApplyPolicies[idx] = policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use append, you should set the initial capacity of the slice to avoid resizing: make([]thing, 0, len(cache))
npm/http/server/server.go
Outdated
| // prometheus handlers | ||
| if config.Toggles.EnablePrometheusMetrics { | ||
| rs.router.Handle(api.NodeMetricsPath, metrics.GetHandler(metrics.NodeMetrics)) | ||
| rs.router.Handle(api.ClusterMetricsPath, metrics.GetHandler(metrics.ClusterMetrics)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this nil check for the encoder important for v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not relevant in V1 case, this is specifically for daemon where we will need the PProf endpoint but epmencoder is not yet supported. That is a todo item for later.
| }() | ||
| dp.Lock() | ||
| defer dp.Unlock() | ||
| dp.lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlock in the defer func? with current code, dp will be unlocked before dp.ApplyDataplane() because defers are processed LIFO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that ApplyDataplane locks too. Maybe we make a little applyDataplane that doesn't lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional and keeping the same design from our existing DP, we do not anticipate another process coming in between and making changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it does i do not think there is a major issue, except for one, i.e. if a Set is added/created and immediately deleted then kernel might complain about it does not exists. But i think you had suggested a fix for it to use --exists, is that in master ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it add system overhead to lock/unlock twice for each Policy call though? We haven't solved the created then deleted edge case yet (in the non-fan-out model, we thought that situation shouldn't happen)
nitishm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic seems wrong in b7ffd35#r806133602
nitishm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
huntergregory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Reason for Change:
Adding DPShim component,
DPShim works as a translator, cache and transporter of Resource events into sets/policies and relays them to gRPC server. The gRPC server will distribute these goal states to all the registered daemon clients.
DPShim:
Issue Fixed:
Requirements:
Notes: