-
Notifications
You must be signed in to change notification settings - Fork 71
In service_manager, if tags of an association cannot be fetched, skip instead of error #600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… instead of error
28410b0 to
edb39d1
Compare
| assocs, err := m.getAllAssociations(ctx, svcSum) | ||
| if err != nil { | ||
| return err | ||
| return fmt.Errorf("in updateAssociations, getAllAssociations failed with %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding a some more detailed error messages since the err here is just
status code: 404, request id: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idiomatic error string would be "update associations: get all associations: %w" lower case for function names, chained by colon. Also it's better to use one of the approaches for error composition:
- function that throw error need to add it's own name to errors
- or caller adds names of functions it calls
Otherwise you might have duplicates in the final error message. Here both are mixed, updateAssociation adds itself and getAllAssociations. You probably want to move getAllAssociations part into it's own function. For example:
in updateAssociations:
return fmt.Errorf("update associations: %w, err);in getAllAssociations
return fmt.Errorf("get all associations: %w, err);Also dont use words "failed" if it's not original error, there is a tendency to add "failed", "unsuccessful", etc when we wrap errors, so when error pop-up to the top level of stack it will have 5-6 "failed" words in single sentence. So back to the first comment, chain method names with colons, and actual error last in the chain.
| var svcErr error | ||
| for _, resService := range resServices { | ||
| svcName := fmt.Sprintf("%s-%s", resService.Spec.RouteName, resService.Spec.RouteNamespace) | ||
| svcName := utils.LatticeServiceName(resService.Spec.RouteName, resService.Spec.RouteNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor fix on error messaging. was confusing to see a non truncated service name here.
| if isManaged { | ||
| m.log.Errorf("in updateAssociations failed when attempting IsArnManaged check. Skipping delete association. service: %s, association: %s, %s", svc.LatticeServiceName(), assoc.Arn, err) | ||
| } else if isManaged { | ||
| err = m.deleteAssociation(ctx, assoc.Arn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mimic this logic to only ignore get tags AccessDenied and still return other err? for example:
for _, assoc := range toDelete {
isManaged, err := m.cloud.IsArnManaged(ctx, *assoc.Arn)
if err != nil {
aerr, ok := err.(awserr.Error)
if ok && aerr.Code() == vpclattice.ErrCodeAccessDeniedException {
// In a scenario that the service association is created by a foreign account,
// the owner account's controller cannot read the tags of this ServiceNetworkServiceAssociation,
// and AccessDeniedException is expected.
continue
} else {
return err
}
}
if isManaged {
err = m.deleteAssociation(ctx, assoc.Arn)
if err != nil {
return err
}
}
}
@solmonk can you review this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think this way makes more sense in terms of consistency.
solmonk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe service network has the same issue, I'm curious if you are also affected by it? https://github.com/aws/aws-application-networking-k8s/blob/main/pkg/deploy/lattice/service_network_manager.go#L101
| if isManaged { | ||
| m.log.Errorf("in updateAssociations failed when attempting IsArnManaged check. Skipping delete association. service: %s, association: %s, %s", svc.LatticeServiceName(), assoc.Arn, err) | ||
| } else if isManaged { | ||
| err = m.deleteAssociation(ctx, assoc.Arn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think this way makes more sense in terms of consistency.
|
@solmonk interesting. have not run into the same issue with vpc association but seems like the same issue. I will update the PR. |
17a4ec5 to
4942650
Compare
4942650 to
e8e79e2
Compare
|
@mikhail-aws @zijun726911 it seems the presubmit hook is stuck, any advice on how to fix it? |
|
presubmit passed, will merge it |
|
@shulin-sq Do you need our side to do a new controller version release( new helm chart) to include this fix? Or you build your image by yourself in your private image repo? |
|
@zijun726911 We already created an internal build that includes this change. No rush on a release version. |
|
I had to redact some fields from a comment I left earlier. Here is the comment: some notes: I was unable to reproduce the VPC problem. I do have my VPC association set up in another account, so I'm unsure why I'm not running into the same issue. Specifically my scenario is: account A owns the sn, account A shares sn to account B, account B does the vpc association. account A does not have access to the vpc association. The controller has iam permissions for account A. before while debugging (when inspecting the awserr.Error) {"level":"error","ts":"2024-02-26T01:31:06.420Z","logger":"controller.route","caller":"lattice/service_manager.go:227","msg":"shulin {"level":"warn","ts":"2024-02-26T01:47:25.568Z","logger":"controller.route","caller":"lattice/service_manager.go:232","msg":"skippin |
What type of PR is this?
bug
Which issue does this PR fix:
In the scenario where
Account A can be in a situation where does not have permissions to read the tags of that association
Therefore this results in an error during route reconciliation.
What does this PR do / Why do we need it:
if the get tags of a service network association request fails, instead of erroring, skip deleting the association and log.
If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:
see above scenario
Testing done on this change:
tested the fix in a staging environment (on top of most recent master) but open to suggestions on how test it via unit tests (looks like current tests don't touch on this scenario).
sample error message:
Automation added to e2e:
Will this PR introduce any new dependencies?:
no
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
no, yes
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.