Skip to content

Conversation

@solmonk
Copy link
Contributor

@solmonk solmonk commented Nov 16, 2023

What type of PR is this?

bug

Which issue does this PR fix:

  • Use strongly consistent list call on ListTargetGroups, fixes target group leaking issue (which seems to happen more often when we have a lot of existing target groups)
  • Correctly handle when there are multiple target groups matching tags in findTargetGroup(). This can happen in the middle of TG protocol update, etc. Target groups only become unique after matching more fields including protocol. Changed related e2e test to make sure unused target groups are getting replaced by this logic.
  • Fix wrong e2e test cases on delete_target_group_test, deleting a deployment still results in target deregistration as it does not delete service.
  • Fix bug where e2e test will pick up existing other hosted zone and delete it
  • Made a few tests ordered. We are running tests sequentially anyways, and even in parallel execution we will likely keep one file per worker.
  • Made a few tweaks to be ready for parallel execution, it seems working with 2 procs but not sure if it is reliable.

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:

Testing done on this change:
make e2e-test

Automation added to e2e:

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades. Has updating a running cluster been tested?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines +296 to +298
if aws.StringValue(hz.Name) != name {
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why name should not be equal? list zones should apply dns name filter
out, err := client.ListHostedZonesByName(&route53.ListHostedZonesByNameInput{DNSName: &name})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fix bug where e2e test will pick up existing other hosted zone and delete it"

There was intention to use existing hosted zone if test was terminated and zone leaked. Test re-run should pick-up previous zone and cleanup after. Otherwise need to manually delete them from route53.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListHostedZonesByName always returns all hosted zones in the account. DNSName field only guarantees the first element that is getting picked up, and you still get other hosted zones in the following elements.

switch status {
case vpclattice.TargetGroupStatusCreateInProgress, vpclattice.TargetGroupStatusDeleteInProgress:
return nil, errors.New(LATTICE_RETRY)
case vpclattice.TargetGroupStatusDeleteFailed, vpclattice.TargetGroupStatusActive:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return err for TargetGroupStatusDeleteFailed tg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, they do exist.

ResourceArn: service.Arn,
})
managed, err := env.Cloud.IsArnManaged(ctx, *service.Arn)
if err == nil { // for err != nil, it is possible that this service own by other account, and it is shared to current account by RAM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change variable name to managedByCurrentCluster?

Seems this comment description is not really precise:
// for err != nil, it is possible that this service own by other account, and it is shared to current account by RAM

should be // for err != nil, it is possible that this lattice service is own by other cluster
not just limit to the RAM share

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its failing to fetch tags so most of the time it will be other account, not other cluster within the same account. I can update the comment though.

}
}

retrievedTargetGroups, _ := env.LatticeClient.ListTargetGroupsAsList(ctx, &vpclattice.ListTargetGroupsInput{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pass vpcIdentifier in vpclattice.ListTargetGroupsInput{} ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense

continue
}
managed, err := env.Cloud.IsArnManaged(ctx, *tg.Arn)
if err == nil { // for err != nil, it is possible that this service own by other account, and it is shared to current account by RAM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change this comment, it is checking target group existing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that, will change

IpAddressType: latticeTg.Config.IpAddressType,
Type: latticeTg.Type,
VpcIdentifier: latticeTg.Config.VpcIdentifier,
}, nil) // we already know that tags match
Copy link
Contributor

@zijun726911 zijun726911 Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering should we add a Protocol(or other tg fields) tags for TargetGroup?
So that to make sure one FindResourcesByTags() api call could definitely exactly match one TG?

(not in this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is necessary. Actually.. I wanted to remove as many tags as possible. I "had to" add protocolVersion in the tag because it is not available in TGSummary, so it needed extra Get() call to be figured out. Ideally VPC Lattice should just return protocolVersion in the summary instead.

Copy link
Contributor

@zijun726911 zijun726911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, are we safe to add the --procs param in the Makefile?

@solmonk
Copy link
Contributor Author

solmonk commented Nov 16, 2023

I have thought about that but I think it is at experimental stage yet. Will add that once I have more confidence

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants