CNS Scaling Integration (3/3) #668
Conversation
Codecov Report
@@ Coverage Diff @@
## master #668 +/- ##
==========================================
+ Coverage 42.26% 42.65% +0.39%
==========================================
Files 72 72
Lines 10350 10392 +42
==========================================
+ Hits 4374 4433 +59
+ Misses 5508 5483 -25
- Partials 468 476 +8
f0a7b34 to 6991854
ramiro-gamarra left a comment
Need to review further but leaving some initial feedback for ipampoolmonitor
doNothing = 0
increasePoolSize = int64(1)
decreasePoolSize = int64(-1)
doNothing = int64(0)
these seem unused
func (pm *CNSIPAMPoolMonitor) Start(poolMonitorRefreshMilliseconds int, exitChan <-chan struct{}) error {
    logger.Printf("[ipam-pool-monitor] Starting CNS IPAM Pool Monitor")
    for {
        if stopReconcile(exitChan) {
I gather the intent is to have a cancellable loop that does something on an interval. The pattern for that is to use a context and a ticker. Then you can do something like

for {
    select {
    case <-ctx.Done():
        return fmt.Errorf("CNS IPAM Pool Monitor received cancellation signal")
    case <-ticker.C:
        // do things
    }
}

The reason that is preferred to time.Sleep is that there's no way to cancel the goroutine while it sleeps. If Reconcile is also a potentially blocking method, it should probably also accept a context.
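To make the suggestion concrete, here is a minimal, self-contained sketch of that cancellable ticker loop; the `Start` signature and the `reconcile` callback are illustrative stand-ins, not the code in this PR:

```go
package monitor

import (
	"context"
	"fmt"
	"time"
)

// Start runs reconcile on an interval until ctx is cancelled.
// The reconcile callback is a placeholder for the pool monitor's Reconcile method.
func Start(ctx context.Context, interval time.Duration, reconcile func(context.Context) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("CNS IPAM Pool Monitor received cancellation signal: %w", ctx.Err())
		case <-ticker.C:
			if err := reconcile(ctx); err != nil {
				fmt.Printf("[ipam-pool-monitor] reconcile failed: %v\n", err)
			}
		}
	}
}
```

A caller would create the context with `context.WithCancel` (or signal handling) and cancel it to stop the loop cleanly, which is exactly what a bare `time.Sleep` cannot do.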
pm.MaximumFreeIps = int(pm.scalarUnits.BatchSize * (pm.scalarUnits.ReleaseThresholdPercent / 100))
err := pm.Reconcile()
if err != nil {
    logger.Printf("[ipam-pool-monitor] CRITICAL %v", err)
This is probably for offline discussion, but are there any cases where an error from reconciliation should prompt the app to crash and be restarted?
type CNSIPAMPoolMonitor struct {
    initialized bool
    initialized bool
I'm curious about the reason we need an initialized flag. Is there a way that the pool monitor can be hydrated on construction? That simplifies the rest of the methods, where we can assume that it's been initialized properly and that dependencies are there.
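For illustration, a sketch of the constructor-style hydration being suggested; the dependency interfaces, field names, and threshold math are assumptions, not the repository's actual types:

```go
package monitor

// Illustrative stand-ins for the real CNS and request-controller dependencies.
type cnsClient interface{ GetPodIPConfigState() map[string]string }
type requestController interface{ UpdateCRDSpec(requestedIPCount int64) error }

// ScalarUnits mirrors the threshold fields discussed in this PR; exact types are assumed.
type ScalarUnits struct {
	BatchSize               int64
	RequestThresholdPercent int64
	ReleaseThresholdPercent int64
}

type CNSIPAMPoolMonitor struct {
	cns            cnsClient
	rc             requestController
	scalarUnits    ScalarUnits
	MinimumFreeIps int64
	MaximumFreeIps int64
}

// NewCNSIPAMPoolMonitor hydrates the monitor on construction so later methods
// can assume dependencies and derived limits are present, removing the need
// for an initialized flag.
func NewCNSIPAMPoolMonitor(cns cnsClient, rc requestController, scalar ScalarUnits) *CNSIPAMPoolMonitor {
	return &CNSIPAMPoolMonitor{
		cns:            cns,
		rc:             rc,
		scalarUnits:    scalar,
		MinimumFreeIps: int64(float64(scalar.BatchSize) * (float64(scalar.RequestThresholdPercent) / 100)),
		MaximumFreeIps: int64(float64(scalar.BatchSize) * (float64(scalar.ReleaseThresholdPercent) / 100)),
	}
}
```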
    return decreasePoolSize
func (pm *CNSIPAMPoolMonitor) Reconcile() error {
    if pm.initialized {
        cnsPodIPConfigCount := len(pm.cns.GetPodIPConfigState())
Happy path is usually best unindented, so I would flip the condition and return early. Even cleaner if we can skip the check and assume things have been initialized :)
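A small sketch of the early-return shape; the type and fields below are placeholders, not the PR's code:

```go
package monitor

// cnsState is an illustrative stand-in for the CNS dependency.
type cnsState interface{ GetPodIPConfigState() map[string]string }

type poolMonitor struct {
	initialized bool
	cns         cnsState
}

// Reconcile returns early on the uninitialized case so the happy path stays
// at the top indentation level.
func (pm *poolMonitor) Reconcile() error {
	if !pm.initialized {
		return nil
	}
	cnsPodIPConfigCount := len(pm.cns.GetPodIPConfigState())
	_ = cnsPodIPConfigCount // the scaling decision would use this count
	return nil
}
```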
// no pods scheduled
case allocatedPodIPCount == 0:
    logger.Printf("[ipam-pool-monitor] No pods scheduled")
    return fmt.Errorf("No pods scheduled")
shouldn't this just be a no-op?
case freeIPConfigCount > pm.MaximumFreeIps:
    logger.Printf("Number of free IP's (%d) > maximum free IPs (%d), request batch decrease\n", freeIPConfigCount, pm.MaximumFreeIps)
    return decreasePoolSize
func (pm *CNSIPAMPoolMonitor) Reconcile() error {
This should probably take a lock as well. While the loop above might only run this in a single goroutine, the internals of the pool monitor can also be changed using UpdatePoolMonitor which could be executed at the same time.
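A sketch of what guarding Reconcile could look like, assuming a named mutex and an `UpdatePoolLimits` stand-in for the PR's UpdatePoolMonitor:

```go
package monitor

import "sync"

type poolMonitor struct {
	mu             sync.RWMutex
	MinimumFreeIps int64
	MaximumFreeIps int64
}

// Reconcile locks around reads of the scaling limits because they can be
// rewritten concurrently by UpdatePoolLimits from another goroutine.
func (pm *poolMonitor) Reconcile() error {
	pm.mu.RLock()
	defer pm.mu.RUnlock()
	// ... compare the free IP count against MinimumFreeIps / MaximumFreeIps ...
	return nil
}

// UpdatePoolLimits mutates the same fields, so it takes the write lock.
func (pm *poolMonitor) UpdatePoolLimits(min, max int64) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	pm.MinimumFreeIps = min
	pm.MaximumFreeIps = max
}
```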
MinimumFreeIps int64
MaximumFreeIps int64

sync.RWMutex
Consider naming this mutex instead of embedding it. Embedding it makes the lock/unlock methods exported as well, which is something calling code should not be aware of.
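A sketch contrasting the two layouts; with the embedded mutex, Lock/Unlock become part of the monitor's exported surface, while a named unexported field keeps locking internal (type names are illustrative):

```go
package monitor

import "sync"

// Embedded: callers outside the package can do pm.Lock()/pm.Unlock(),
// leaking locking details into the monitor's API.
type embeddedMonitor struct {
	sync.RWMutex
	MinimumFreeIps int64
}

// Named and unexported: only the monitor's own methods can touch the mutex.
type namedMonitor struct {
	mu             sync.RWMutex
	MinimumFreeIps int64
}

func (m *namedMonitor) SetMinimumFreeIps(n int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.MinimumFreeIps = n
}
```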
pm.MaximumFreeIps = int64(float64(pm.scalarUnits.BatchSize) * (float64(pm.scalarUnits.ReleaseThresholdPercent) / 100))

if pm.cns == nil {
    return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
Bit of a defensive check to be done at runtime. It's probably a programming error if we don't have a cns reference at this point.
cns/logger/log.go (Outdated)
var msg string
if err == nil && returnCode == 0 {
    msg = fmt.Sprintf("[%s] Sent %T %+v.", tag, response, response)
    //msg = fmt.Sprintf("[%s] Sent %T %+v.", tag, response, response)
Are these changes intentional?
pm.initialized = true
}
// if there's a pending change to the spec count, and the pending release state is nonzero,
// skip so we don't thrash the UpdateCRD
It is okay to update the CRD and request more from CNS. We won't be able to update the request anyway; allocatedCount won't increase until the next batch is ready. We don't need this check.
if pm.cns == nil {
    return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
}
The initial RequestedIpCount should be the same as what is specified in the spec. Say you updated the CRD spec RequestedCount to 20 and then CNS died. On reconcile we want to make sure we cache the CRD spec (both RequestedCount and the IPs which are marked as pending release).
| "net/http" | ||
| "net/http/httptest" | ||
| "reflect" | ||
|
|
This should also pass the CRD spec details, and we should cache the complete spec.
Also, as part of reconcile, IPs which are listed in the ToBeDeleted list must be marked as PendingRelease if they still exist in PodIpConfigState.
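A hedged sketch of that reconcile step, with assumed state names (the real CNS state constants may differ):

```go
package monitor

// ipState is an illustrative stand-in for the real CNS IP configuration states.
type ipState string

const (
	stateAvailable      ipState = "Available"
	statePendingRelease ipState = "PendingRelease"
)

// markPendingRelease walks the CRD's to-be-deleted list and, for every IP that
// still exists in podIPConfigState, marks it PendingRelease.
func markPendingRelease(podIPConfigState map[string]ipState, toBeDeleted []string) {
	for _, ip := range toBeDeleted {
		if _, ok := podIPConfigState[ip]; ok {
			podIPConfigState[ip] = statePendingRelease
		}
	}
}
```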
pm.initialized = true
}
// if there's a pending change to the spec count, and the pending release state is nonzero,
// skip so we don't thrash the UpdateCRD
Remove this check, not required.
if pm.cns == nil {
    return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
}
RequestedCount should be picked from the CRD spec, and the cached spec should match the CRD spec:
- Created NC, SecIpCount = len(podIpConfigState) = 10, Cachedspec.RequestedCount = 10
- Updated CRD.spec.RequestedCount = 20
- CNS crashed
- On restart, CNS will recreate the NC, SecIpCount = len(podIpConfigState) = 10, Cachedspec.RequestedCount = 10
Also, we are missing the ToBeDeletedIPs.
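A sketch of caching the CRD spec on reconcile, per the scenario above; the spec type and field names here are assumptions for illustration, not the repository's actual API:

```go
package monitor

// NodeNetworkConfigSpec stands in for the real CRD spec; field names are assumed.
type NodeNetworkConfigSpec struct {
	RequestedIPCount int64
	IPsNotInUse      []string // IPs marked in the CRD as to-be-deleted / pending release
}

type poolMonitor struct {
	cachedSpec NodeNetworkConfigSpec
}

// reconcileSpec seeds the cached spec from the CRD rather than from the local
// secondary IP count, so after a CNS crash the monitor resumes with the
// RequestedIPCount last written to the CRD (20 in the scenario above) instead
// of the recreated pod IP config count (10).
func (pm *poolMonitor) reconcileSpec(fromCRD NodeNetworkConfigSpec) {
	pm.cachedSpec = fromCRD
}
```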
type CNSIPAMPoolMonitor struct {
    initialized bool
    pendingRelease bool
Save the whole spec.
2564b22 to 96758b1
* init with test framework
* handle multiple batch size pool request
* test verification
* handle batch size multiples
* address feedback
* fix test
* remove test struct from fakes
* fix nil test
* fix tests
* feedback
* scale down
* init scale down, scale down testing, test work, reconcile fake requestcontroller test revision, check for cached nonreleased ipconfigs, remove test struct from decrease fakes, initial integration, fix interface signature test, get pending release, further integration testing, test fixes
* start feedback
* update request controller
* fixed tests
* test fix
* ipampoolmonitor in cns not request controller
* init fake ipam pool monitor
Reason for Change:
Integrates the IPAM Pool Monitor with existing request controller and CNS.
DO NOT MERGE BEFORE #667; this PR is continually rebased off that one.
Issue Fixed:
Requirements:
Notes: