-
Notifications
You must be signed in to change notification settings - Fork 260
CNS to DNC communication in Managed DNC #574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #574 +/- ##
==========================================
- Coverage 42.43% 41.97% -0.47%
==========================================
Files 72 72
Lines 9993 10151 +158
==========================================
+ Hits 4241 4261 +20
- Misses 5280 5421 +141
+ Partials 472 469 -3 |
7a4c854 to
603f703
Compare
d4f235a to
c7d9479
Compare
3c80a80 to
510ce73
Compare
a8609ac to
db7726b
Compare
cns/restserver/restserver.go
Outdated
|
|
||
| var ( | ||
| context = podInfo.PodName + podInfo.PodNamespace | ||
| dncEP = service.GetOption(acn.OptPrivateEndpoint).(string) | ||
| infraVnet = service.GetOption(acn.OptInfrastructureNetwork).(string) | ||
| nodeID = service.GetOption(acn.OptNodeID).(string) | ||
| isManagedDnc = dncEP != "" && infraVnet != "" && nodeID != "" | ||
| ) | ||
|
|
||
| containerID, exists = service.state.ContainerIDByOrchestratorContext[context] | ||
| if !exists { | ||
| if isManagedDnc { | ||
| service.lock.Unlock() | ||
| getNetworkContainerResponse.Response.ReturnCode, getNetworkContainerResponse.Response.Message = service.SyncNodeStatus(dncEP, infraVnet, nodeID, req.OrchestratorContext) | ||
| service.lock.Lock() | ||
| if getNetworkContainerResponse.Response.ReturnCode == NotFound { | ||
| return getNetworkContainerResponse | ||
| } | ||
|
|
||
| containerID = service.state.ContainerIDByOrchestratorContext[context] | ||
| } | ||
| } else if isManagedDnc { | ||
| _, getNetworkContainerResponse.Response.ReturnCode, getNetworkContainerResponse.Response.Message = service.isNCWaitingForUpdate(service.state.ContainerStatus[containerID].CreateNetworkContainerRequest.Version, containerID) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to separate function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move what to separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github doesnt have good gui...
Comment on lines 1266 to 1289
cns/restserver/restserver.go
Outdated
| containerID, exists = service.state.ContainerIDByOrchestratorContext[context] | ||
| if !exists { | ||
| if isManagedDnc { | ||
| service.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you unlock?..no lock acquired before this code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's actually supposed to be RUnlock and RLock
4a63970 to
cef191e
Compare
cns/service/main.go
Outdated
| go func(ep, vnet, node string) { | ||
| // Periodically poll (30s) DNC for node updates | ||
| for { | ||
| <-time.NewTicker(time.Second * 30).C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move 30 sec to a const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's only used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my comment in general is to not hardcode values like 5 seconds, 30 seconds. Instead declare those as variable like syncNodeInterval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just finished reviewing the DNC PR. We are retrieving all the NCs every 30 seconds. let's expose this as a configurable param via config file. If we see we are hitting cosmosdb RU limit due to NC processing in DNC we can increase this w/o needing to update the code.
|
Can you add the UTs to test register, sync and NMA api? You will need to inject the dependency / mock those components |
5037681 to
3530953
Compare
| getNetworkContainerVersionURL string) (*http.Response, error) { | ||
| logger.Printf("[NMAgentClient] GetNetworkContainerVersion NC: %s", networkContainerID) | ||
|
|
||
| response, err := common.GetHttpClient().Get(getNetworkContainerVersionURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.GetHttpClient()- can this function return nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's init at start
cns/restserver/api.go
Outdated
|
|
||
| // SetNodeOrchestrator :- Set node orchestrator after registering with mDNC | ||
| func (service *HTTPRestService) SetNodeOrchestrator(r *cns.SetOrchestratorTypeRequest) { | ||
| body, _ := json.Marshal(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catching marshal error and logging it?
cns/restserver/api.go
Outdated
| // SetNodeOrchestrator :- Set node orchestrator after registering with mDNC | ||
| func (service *HTTPRestService) SetNodeOrchestrator(r *cns.SetOrchestratorTypeRequest) { | ||
| body, _ := json.Marshal(r) | ||
| req, _ := http.NewRequest(http.MethodPost, "", bytes.NewBuffer(body)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here..catching error and logging it
cns/restserver/api.go
Outdated
| ) | ||
|
|
||
| // try to retrieve NodeInfoResponse from mDNC | ||
| response, err = httpc.Get(fmt.Sprintf(common.SyncNodeNetworkContainersURLFmt, dncEP, infraVnet, nodeID, dncApiVersion)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good we move this to separate statement and log full url
| waitingForUpdate = true | ||
| returnCode = NetworkContainerPendingStatePropagation | ||
| message = fmt.Sprintf("[Azure-CNS] Network container %s v%d had not propagated to respective NMA w/ v%d", ncid, programmedVersion, nmaVersion) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to log if programmedversion < nmaversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, CNS should be receiving latest
b8b1c2f to
ebe5faf
Compare
cns/configuration/configuration.go
Outdated
|
|
||
| type ManagedSettings struct { | ||
| PrivateEndpoint string | ||
| InfrastructureNetwork string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this vnetid? if this is id, can we name it as InfrastructureNetworkID.
also can you add a comment what value privateendpoint takes(is it dns or ip?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added ID, and it takes either IP or DNS, that's why it's just Endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but can add a comment there
|
|
||
| if tmpReturnCode == UnexpectedError { | ||
| continue | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i asked earlier...not sure what you replied for that.. what if tmpReturnCode != Success && tmpReturnCode !=unexpectederror && bytes.Compare(nc.OrchestratorContext, contextFromCNI) != 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then that means it's NetworkContainerPendingStatePropagation; in this case, the err gets returned to CNI and CNS will still save the state
| dncEP = service.GetOption(acn.OptPrivateEndpoint).(string) | ||
| infraVnet = service.GetOption(acn.OptInfrastructureNetwork).(string) | ||
| nodeID = service.GetOption(acn.OptNodeID).(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we not using config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do use config, those values are passed to the service
| nodeID = service.GetOption(acn.OptNodeID).(string) | ||
| ) | ||
|
|
||
| returnCode, msg := service.SyncNodeStatus(dncEP, infraVnet, nodeID, json.RawMessage{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add comment why are we passing empty podcontext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only CNI passes non-empty, there's a func comment about this already
| privateEndpoint := acn.GetArg(acn.OptPrivateEndpoint).(string) | ||
| infravnet := acn.GetArg(acn.OptInfrastructureNetworkID).(string) | ||
| nodeID := acn.GetArg(acn.OptNodeID).(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? we are going to get these from config anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked with jaeryn..we should not advise customer to use cmdline args..its just for internal testing and it should be removed if not needed
tamilmani1989
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* initial changes for CNS->DNC support * Adding changes for CNS to be compatible with managed DNC (reverse communication channel) * adding NC version validation with respective NMA * return errors for respective NC based on orchestrator context from CNI * add nc version check via NMA * adding logic to SyncNodeStatus and check if NCWaitingForUpdate for CniADD and CnsAttach calls * addressing most of ashvin's comments * adding managed config * fat rebase * addressing some comments * slight optimizations... * adding channel mode instead of managed bool * set err in register node so that we keep looping * addressing ashvin's comments * fix test * removing swift prefix mods for mdnc * addressing tamanoha's comments Co-authored-by: Jaeryn <tsun.chu@microsoft.com>
What this PR does / why we need it:
Special notes for your reviewer:
re-review after pretty fat rebase