Conversation

@matmerr (Member) commented Sep 4, 2020

Reason for Change:

Integrates the IPAM Pool Monitor with the existing request controller and CNS.

DO NOT MERGE BEFORE #667; this PR is continually rebased off that one.

Issue Fixed:

Requirements:

Notes:

codecov bot commented Sep 4, 2020

Codecov Report

Merging #668 into master will increase coverage by 0.39%.
The diff coverage is 71.69%.

@@            Coverage Diff             @@
##           master     #668      +/-   ##
==========================================
+ Coverage   42.26%   42.65%   +0.39%     
==========================================
  Files          72       72              
  Lines       10350    10392      +42     
==========================================
+ Hits         4374     4433      +59     
+ Misses       5508     5483      -25     
- Partials      468      476       +8     

@matmerr matmerr force-pushed the cnspoolintegration branch 4 times, most recently from f0a7b34 to 6991854 on September 9, 2020 00:49
@ramiro-gamarra (Contributor) left a comment

Need to review further but leaving some initial feedback for ipampoolmonitor

doNothing = 0
increasePoolSize = int64(1)
decreasePoolSize = int64(-1)
doNothing = int64(0)
Contributor:

these seem unused

func (pm *CNSIPAMPoolMonitor) Start(poolMonitorRefreshMilliseconds int, exitChan <-chan struct{}) error {
logger.Printf("[ipam-pool-monitor] Starting CNS IPAM Pool Monitor")
for {
if stopReconcile(exitChan) {
Contributor:

I gather the intent is to have a cancellable loop that does something on an interval. The pattern for that is to use a context and a ticker. Then you can do something like

for {
    select {
    case <-ctx.Done():
        return fmt.Errorf("CNS IPAM Pool Monitor received cancellation signal")
    case <- ticker.C:
        // do things
    }
}

The reason that is preferred to time.Sleep is that there's no way to cancel the goroutine while it sleeps. If Reconcile is also a potentially blocking method, it should probably also accept a context.
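
For reference, a minimal runnable sketch of that cancellable pattern (the interval, function names, and log wording here are illustrative, not taken from this PR):

package main

import (
    "context"
    "fmt"
    "time"
)

// runOnInterval calls reconcile on each tick until the context is cancelled.
func runOnInterval(ctx context.Context, interval time.Duration, reconcile func(context.Context) error) error {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return fmt.Errorf("pool monitor stopped: %w", ctx.Err())
        case <-ticker.C:
            if err := reconcile(ctx); err != nil {
                fmt.Printf("reconcile failed: %v\n", err)
            }
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()

    _ = runOnInterval(ctx, time.Second, func(context.Context) error {
        fmt.Println("reconciling")
        return nil
    })
}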

pm.MaximumFreeIps = int(pm.scalarUnits.BatchSize * (pm.scalarUnits.ReleaseThresholdPercent / 100))
err := pm.Reconcile()
if err != nil {
logger.Printf("[ipam-pool-monitor] CRITICAL %v", err)
Contributor:

This is probably for offline discussion, but are there any cases where an error from reconciliation should prompt the app to crash and be restarted?


type CNSIPAMPoolMonitor struct {
initialized bool
initialized bool
Contributor:

I'm curious about the reason we need an initialized flag. Is there a way that the pool monitor can be hydrated on construction? That simplifies the rest of the methods, where we can assume that it's been initialized properly and that dependencies are there.
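
A minimal sketch of hydrating on construction (the types, field names, and constructor below are illustrative stand-ins, not the actual definitions from this PR):

package main

import "fmt"

// ScalarUnits is an illustrative stand-in for the scaler configuration.
type ScalarUnits struct {
    BatchSize               int64
    RequestThresholdPercent int64
    ReleaseThresholdPercent int64
}

// poolMonitor is fully hydrated by its constructor, so no initialized flag
// is needed and every method can assume its dependencies are present.
type poolMonitor struct {
    scalarUnits    ScalarUnits
    MinimumFreeIps int64
    MaximumFreeIps int64
}

func newPoolMonitor(scalar ScalarUnits) *poolMonitor {
    return &poolMonitor{
        scalarUnits:    scalar,
        MinimumFreeIps: int64(float64(scalar.BatchSize) * float64(scalar.RequestThresholdPercent) / 100),
        MaximumFreeIps: int64(float64(scalar.BatchSize) * float64(scalar.ReleaseThresholdPercent) / 100),
    }
}

func main() {
    pm := newPoolMonitor(ScalarUnits{BatchSize: 16, RequestThresholdPercent: 50, ReleaseThresholdPercent: 150})
    fmt.Println(pm.MinimumFreeIps, pm.MaximumFreeIps)
}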

return decreasePoolSize
func (pm *CNSIPAMPoolMonitor) Reconcile() error {
if pm.initialized {
cnsPodIPConfigCount := len(pm.cns.GetPodIPConfigState())
Contributor:

Happy path is usually best unindented, so I would flip the condition and return early. Even cleaner if we can skip the check and assume things have been initialized :)
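
A tiny illustration of the flipped guard (illustrative type, not the PR's struct):

package main

import "fmt"

type monitor struct {
    initialized bool
}

// Reconcile returns early on the guard, keeping the happy path unindented.
func (m *monitor) Reconcile() error {
    if !m.initialized {
        return fmt.Errorf("pool monitor not initialized")
    }

    // happy path continues here without extra nesting
    fmt.Println("reconciling")
    return nil
}

func main() {
    m := &monitor{initialized: true}
    _ = m.Reconcile()
}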

// no pods scheduled
case allocatedPodIPCount == 0:
logger.Printf("[ipam-pool-monitor] No pods scheduled")
return fmt.Errorf("No pods scheduled")
Contributor:

shouldn't this just be a no-op?

case freeIPConfigCount > pm.MaximumFreeIps:
logger.Printf("Number of free IP's (%d) > maximum free IPs (%d), request batch decrease\n", freeIPConfigCount, pm.MaximumFreeIps)
return decreasePoolSize
func (pm *CNSIPAMPoolMonitor) Reconcile() error {
Contributor:

This should probably take a lock as well. While the loop above might only run this in a single goroutine, the internals of the pool monitor can also be changed using UpdatePoolMonitor which could be executed at the same time.
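
A sketch of guarding both paths with the same mutex (field and method names are illustrative, not the PR's actual signatures):

package main

import (
    "fmt"
    "sync"
)

type poolMonitor struct {
    mu             sync.RWMutex
    minimumFreeIps int64
    maximumFreeIps int64
}

// Reconcile reads shared state, so it holds the read lock for its duration.
func (pm *poolMonitor) Reconcile() error {
    pm.mu.RLock()
    defer pm.mu.RUnlock()
    fmt.Println("reconciling with limits", pm.minimumFreeIps, pm.maximumFreeIps)
    return nil
}

// UpdatePoolLimits may run concurrently with Reconcile, so writes take the write lock.
func (pm *poolMonitor) UpdatePoolLimits(minFree, maxFree int64) {
    pm.mu.Lock()
    defer pm.mu.Unlock()
    pm.minimumFreeIps, pm.maximumFreeIps = minFree, maxFree
}

func main() {
    pm := &poolMonitor{}
    pm.UpdatePoolLimits(4, 16)
    _ = pm.Reconcile()
}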

MinimumFreeIps int64
MaximumFreeIps int64

sync.RWMutex
Contributor:

Consider naming this mutex instead of embedding it. Embedding it makes the lock/unlock methods exported as well, which is something calling code should not be aware of.
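
For contrast, a small sketch of embedded versus named (both compile; only the embedded form promotes Lock/Unlock onto the type's exported surface):

package main

import "sync"

// Embedding promotes Lock/Unlock onto the monitor's method set.
type EmbeddedMonitor struct {
    sync.RWMutex
}

// Naming the field keeps locking an internal detail of the package.
type NamedMonitor struct {
    mu sync.RWMutex
}

func main() {
    var e EmbeddedMonitor
    e.Lock() // callers outside the package could do this too
    e.Unlock()

    var n NamedMonitor
    n.mu.Lock() // only reachable from within the defining package
    n.mu.Unlock()
}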

pm.MaximumFreeIps = int64(float64(pm.scalarUnits.BatchSize) * (float64(pm.scalarUnits.ReleaseThresholdPercent) / 100))

if pm.cns == nil {
return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
Contributor:

Bit of a defensive check to be done at runtime. It's probably a programming error if we don't have a cns reference at this point.

var msg string
if err == nil && returnCode == 0 {
msg = fmt.Sprintf("[%s] Sent %T %+v.", tag, response, response)
//msg = fmt.Sprintf("[%s] Sent %T %+v.", tag, response, response)
Contributor:

Are these changes intentional?

pm.initialized = true
}
// if there's a pending change to the spec count, and the pending release state is nonzero,
// skip so we don't thrash the UpdateCRD
Member:

It is okay to update the CRD and request more from CNS. We won't be able to update the request anyway; allocatedCount won't increase until the next batch is ready. We don't need this check.

if pm.cns == nil {
return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
}

Member:

Initial RequestedIpCount should be the same as what is specified in the spec. Say you updated the CRD spec RequestedCount to 20 and then CNS died. On reconcile we want to make sure we cache the CRD spec (both RequestedCount and the IPs which are marked as pending release).

"net/http"
"net/http/httptest"
"reflect"

Member:

This should also pass the CRD spec details, and we should cache the complete spec.

Also, as part of reconcile, IPs listed in the ToBeDeleted list must be marked as PendingRelease if they still exist in PodIpConfigState.
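
As a hedged sketch of that reconcile step (the state names and map layout are approximations of the CRD/CNS concepts above, not the actual azure-container-networking types):

package main

import "fmt"

type ipState string

const (
    stateAvailable      ipState = "Available"
    statePendingRelease ipState = "PendingRelease"
)

// markToBeDeleted flags any IP from the CRD's to-be-deleted list that still
// exists in the pod IP config state as PendingRelease.
func markToBeDeleted(podIPConfigState map[string]ipState, toBeDeleted []string) {
    for _, ip := range toBeDeleted {
        if _, ok := podIPConfigState[ip]; ok {
            podIPConfigState[ip] = statePendingRelease
        }
    }
}

func main() {
    state := map[string]ipState{"10.0.0.4": stateAvailable, "10.0.0.5": stateAvailable}
    markToBeDeleted(state, []string{"10.0.0.5", "10.0.0.9"})
    fmt.Println(state)
}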

pm.initialized = true
}
// if there's a pending change to the spec count, and the pending release state is nonzero,
// skip so we don't thrash the UpdateCRD
Member:

Remove this check, not required.

if pm.cns == nil {
return fmt.Errorf("Error Updating Pool Limits, reference to CNS is nil")
}

Member:

RequestedCount should be picked from the CRD spec.
Cachedspec should match the CRD spec.

  1. Created NC, SecIpCount = len(podIpConfigState) = 10, Cachedspec.RequestedCount = 10
  2. Updated CRD.spec.RequestedCount = 20
  3. CNS crashed
  4. On restart, CNS will recreate the NC, SecIpCount = len(podIpConfigState) = 10, Cachedspec.RequestedCount = 10

Also we are missing the ToBeDeletedIPs

type CNSIPAMPoolMonitor struct {
initialized bool
pendingRelease bool

Member:

Save the whole spec.

neaggarwMS previously approved these changes Sep 16, 2020
@matmerr matmerr merged commit 51a8884 into Azure:master Sep 16, 2020
neaggarwMS pushed a commit to neaggarwMS/azure-container-networking that referenced this pull request Nov 13, 2020
* init with test framework

* handle multiple batch size pool request

* test verification

* handle batch size multiples

* address feedback

* fix test

* remove test struct from fakes

* fix nil test

* fix tests

* feedback

* scale down

* init scale down

scale down testing

test work

reconcile fake requestcontroller

test revision

check for cached nonreleased ipconfigs

remove test struct from decrease fakes

initial integration

fix interface signature

test get pending release

further integration testing

test fixes

* start feedback

* update request controller

* fixed tests

* test fix

* ipampoolmonitor in cns not request controller

* init fake ipam pool monitor