
Conversation

@erikfuller
Contributor

What type of PR is this?
Huge. I'm sorry.

Which issue does this PR fix:
#425

What does this PR do / Why do we need it:
This PR changes how the controller manages target groups, which in turn impacts everything that references target groups, which is a lot.

Previously, target groups and other objects were stored in a local cache/hashmap called the lattice data store. The logic around the data store was replicated in many places and prone to error. This change aims to improve target group handling by changing the following:

  • removes the cache, instead relying on other sources of truth
  • removes duplicated "how to identify a target group" logic (also services and listeners)
  • TG names now include a random element, allowing more flexibility across clusters and VPCs
    • For example, the same k8s service can now be used as a backendRef on multiple routes
  • Target groups are identifiable by their attributes, including tags, rather than their names (see the sketch below this list)
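For illustration only, a minimal sketch of tag-based identification. The helper below is hypothetical; the actual tag keys and matching logic live in the tag constants and TargetGroupTagFields discussed later in this thread.

	// hypothetical helper: match a Lattice target group by the controller-written
	// tags instead of by its (now randomized) name
	func tagsMatchService(tags map[string]*string, svcName, svcNamespace string) bool {
		name, nameOk := tags["K8SServiceName"]
		ns, nsOk := tags["K8SServiceNamespace"]
		return nameOk && nsOk && name != nil && ns != nil &&
			*name == svcName && *ns == svcNamespace
	}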

As part of this change, I have done a lot of refactoring as well as updating logic around services, listeners, and rules, including logic that removes unused resources. Basically everything that builds or deploys a model relating to these objects has had a major overhaul. My aim was consistency across the different model builder/synthesizer/manager components.

Additionally, I have aimed to improve unit tests to be more focused on outcomes. Some unit test files I've completely rewritten because porting the originals to the new logic just didn't make sense.


Overall, our basic approach has not changed: a controller uses model builders to build a model, then deployers use synthesizers, and synthesizers use managers. What has changed is that previously the model builders appeared more cleanly decoupled, at least at the model build stage (see model_build_lattice_service.go). That decoupling came at the cost of replicating referential logic throughout each builder. For example, when building a rule we needed to know how to reference a target group that was created by a different builder. Similarly, in the synthesizers/managers, we had to reconstruct references (typically hash keys) to find the stored objects we needed during synthesis.

The core of this change is a shift to a recursive, referential approach. When building the model now, we do not, for example, build model rules separately at the top level alongside services. Instead, we build rules for specific listeners in the stack. Similarly, we now build target groups as we build the rules that need those target groups. This approach keeps pointers back to the object in the stack that will later be required during synthesis.
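In the same gross-oversimplification style as the synthesis snippets below (the names here are illustrative, not the actual builder API), the build side now looks something like:

func buildListener(listener) {
  // rules are built for this specific listener, not at the top level
  for _, ruleSpec := range listener.rules {
    // target groups are built as the rules that need them are built
    tg := buildTargetGroup(ruleSpec.backendRef)
    stack.AddResource(tg)

    rule := buildRule(ruleSpec)
    // keep pointers back to the stack objects that synthesis will need later
    rule.Spec.StackListenerId = listener.ID()
    rule.Spec.StackTgId = tg.ID()
    stack.AddResource(rule)
  }
}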

Then, during synthesis, we start with services, which will yield the actual service ARN and update the modeled service in the stack via the Status field. When we synthesize the listeners, we will resolve the reference to the service in the stack, then pull the ARN from the status. The same approach works for rules, where we pull service, listener, and target group Ids out of the objects stored in the stack using the references we have.

So, previously we had something like (gross oversimplification):

func synthesize(rule1) {
  // ... find all the fields needed to reference a target group: service name, namespace, port, protocol, etc.
  dsTg := datastore.get(tgKey)
  // if not in the datastore, try to look it up in Lattice
  latticeTg := lattice.find(tgInfo)
  // get the tg id from either the datastore or Lattice
  ...
}

Now it looks something like:

func synthesize(rule1) {
  tg := stack.get(rule1.Tg.stackId)
  rule1.Tg.Id = tg.Status.Id
  ...
}

This does complicate the model building logic as builders are now nested, but it greatly simplifies later resource identification and consolidates any "find resource X" logic to only that resource's synthesizer+manager.

Testing done on this change:

?   	github.com/aws/aws-application-networking-k8s/pkg/apis/applicationnetworking/v1alpha1	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/deploy	[no test files]
ok  	github.com/aws/aws-application-networking-k8s/pkg/aws	0.321s	coverage: 25.4% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/aws/services	0.527s	coverage: 9.4% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/config	0.425s	coverage: 30.3% of statements
?   	github.com/aws/aws-application-networking-k8s/pkg/k8s	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/model/lattice	[no test files]
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy/externaldns	0.685s	coverage: 75.6% of statements
?   	github.com/aws/aws-application-networking-k8s/pkg/runtime	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/gwlog	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/retry	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/ttime	[no test files]
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy/lattice	0.915s	coverage: 82.6% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/gateway	1.600s	coverage: 72.4% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/model/core	1.823s	coverage: 59.6% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/model/core/graph	0.893s	coverage: 17.2% of statements
ok  	github.com/aws/aws-application-networking-k8s/controllers	0.715s	coverage: 11.4% of statements
ok  	github.com/aws/aws-application-networking-k8s/controllers/eventhandlers	1.040s	coverage: 53.5% of statements

Automated tests were also run, though I had issues with an unrelated test around the vpc + service network association target group not being cleaned up. The e2e tests required minimal updates, other than the logic for identifying the correct target group, which is expected. I also added a small optimization to speed up route deletion by deleting all rules first; this disassociates target groups from services up front, so they do not need to drain (a sketch follows).
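A sketch of that deletion-ordering optimization, assuming direct use of the aws-sdk-go vpclattice client; the service and listener identifiers here are placeholders:

	// delete all non-default rules first so target groups are disassociated
	// from the service and no longer need to drain before deletion
	rules, err := latticeClient.ListRulesWithContext(ctx, &vpclattice.ListRulesInput{
		ServiceIdentifier:  &serviceId,
		ListenerIdentifier: &listenerId,
	})
	if err != nil {
		return err
	}
	for _, r := range rules.Items {
		if aws.BoolValue(r.IsDefault) {
			continue // the default rule cannot be deleted on its own
		}
		_, err := latticeClient.DeleteRuleWithContext(ctx, &vpclattice.DeleteRuleInput{
			ServiceIdentifier:  &serviceId,
			ListenerIdentifier: &listenerId,
			RuleIdentifier:     r.Id,
		})
		if err != nil {
			return err
		}
	}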

Automation added to e2e:

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Target group naming is incompatible with previous versions, so a one-time cleanup would be required. This is true for both upgrade and downgrade, but otherwise this is a non-impactful change for existing configurations.

Does this PR introduce any user-facing change?:

Target group names will no longer conflict across clusters for identical specs. Target group names now include a randomized suffix, and all identification/deduplication is done through first-class fields on the target group and its tags.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@coveralls

coveralls commented Oct 28, 2023

Pull Request Test Coverage Report for Build 6712208465

  • 1299 of 1767 (73.51%) changed or added relevant lines in 33 files are covered.
  • 219 unchanged lines in 17 files lost coverage.
  • Overall coverage decreased (-0.6%) to 44.166%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
controllers/iamauthpolicy_controller.go | 0 | 2 | 0.0%
pkg/model/core/httproute.go | 0 | 3 | 0.0%
controllers/service_controller.go | 0 | 4 | 0.0%
pkg/gateway/model_build_listener.go | 39 | 44 | 88.64%
controllers/serviceexport_controller.go | 0 | 6 | 0.0%
pkg/deploy/lattice/targets_synthesizer.go | 7 | 13 | 53.85%
pkg/deploy/lattice/service_manager.go | 16 | 23 | 69.57%
controllers/route_controller.go | 9 | 17 | 52.94%
pkg/model/core/stack.go | 17 | 25 | 68.0%
pkg/gateway/model_build_targets.go | 97 | 107 | 90.65%
Files with Coverage Reduction | New Missed Lines | %
controllers/route_controller.go | 1 | 42.12%
controllers/serviceimport_controller.go | 1 | 0.0%
pkg/deploy/lattice/listener_synthesizer.go | 2 | 70.0%
pkg/deploy/lattice/rule_synthesizer.go | 2 | 74.85%
pkg/deploy/lattice/service_manager.go | 2 | 73.21%
pkg/gateway/model_build_lattice_service.go | 2 | 52.59%
pkg/gateway/model_build_listener.go | 2 | 80.52%
pkg/gateway/model_build_rule.go | 3 | 79.79%
pkg/gateway/model_build_targets.go | 3 | 89.33%
pkg/deploy/lattice/target_group_manager.go | 4 | 90.57%
Totals Coverage Status
Change from base Build 6672307899: -0.6%
Covered Lines: 3884
Relevant Lines: 8794

💛 - Coveralls

if err != nil {
return err
}

Contributor

Would using a single outer for _, resTargetGroup := range resTargetGroups { loop make this clearer and more concise?

For example:

	for _, resTargetGroup := range resTargetGroups {
		prefix := model.TgNamePrefix(resTargetGroup.Spec)
		if bool(performDeletes) && resTargetGroup.IsDeleted {
			err := t.targetGroupManager.Delete(ctx, resTargetGroup)
			if err != nil {
				t.log.Infof("Failed TargetGroupManager.Delete %s due to %s", prefix, err)
				returnErr = true
			}
		}
		if bool(performUpserts) && !resTargetGroup.IsDeleted {
			tgStatus, err := t.targetGroupManager.Upsert(ctx, resTargetGroup)
			if err == nil {
				resTargetGroup.Status = &tgStatus
			} else {
				t.log.Debugf("Failed TargetGroupManager.Upsert %s due to %s", prefix, err)
				returnErr = true
			}
		}
	}

Contributor Author

I had it this way before but I agree it's not optimal. I think I may have a better option and will just separate them entirely.
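For example, splitting the loop from the suggestion above into two passes (a sketch along those lines) could look like:

	if performDeletes {
		for _, tg := range resTargetGroups {
			if !tg.IsDeleted {
				continue
			}
			if err := t.targetGroupManager.Delete(ctx, tg); err != nil {
				t.log.Infof("Failed TargetGroupManager.Delete %s due to %s", model.TgNamePrefix(tg.Spec), err)
				returnErr = true
			}
		}
	}

	if performUpserts {
		for _, tg := range resTargetGroups {
			if tg.IsDeleted {
				continue
			}
			tgStatus, err := t.targetGroupManager.Upsert(ctx, tg)
			if err != nil {
				t.log.Debugf("Failed TargetGroupManager.Upsert %s due to %s", model.TgNamePrefix(tg.Spec), err)
				returnErr = true
				continue
			}
			tg.Status = &tgStatus
		}
	}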

}
type TargetGroupTagFields struct {
EKSClusterName string `json:"eksclustername"`
K8SParentRefType ParentRefType `json:"k8sparentreftype"`
Contributor

It seems the values of this K8SParentRefType could be ServiceExport, HTTPRoute, or GRPCRoute, which are not related to the gateway api Route's parentRefs at all.

Should we change to another tag name here to avoid confusion (e.g., K8SResourceRefType)?

Contributor Author

Great point. Will update the name to be more appropriate.

TargetGroupTagFields
}
type TargetGroupTagFields struct {
EKSClusterName string `json:"eksclustername"`
Contributor

Could this be a more general tag name, e.g. K8sClusterName (or ClusterName)? Users are able to install the controller in a self-managed k8s cluster on AWS (for example, with kops).

BTW, we already had the DefaultTags:

func getManagedByTag(cfg CloudConfig) string {
	return fmt.Sprintf("%s/%s/%s", cfg.AccountId, cfg.ClusterName, cfg.VpcId)
}

Do you still need a new EKSClusterName tag?

Contributor Author

Agree on removing "EKS" from the name. I'll have to think about whether there are any downsides to using the managed-by tag.

return t.K8SParentRefType == ParentRefTypeSvcExport
}

func (t *TargetGroupTagFields) IsRoute() bool {
Contributor

IsRefByRoute()?

Contributor Author

Will change based on the updated tag name

resp, err := vpcLatticeSess.ListTargetGroupsAsList(ctx, &targetGroupListInput)
listInput := vpclattice.ListTargetGroupsInput{}
resp, err := s.cloud.Lattice().ListTargetGroupsAsList(ctx, &listInput)
if err != nil {
Contributor

Nit:
Seems we have 2 places to invoke findTargetGroup():

  • (s *defaultTargetGroupManager) Upsert()
  • (s *defaultTargetGroupManager) Delete()

I assume the above 2 places just manage the "local" cluster's target groups (i.e., intendToCreate/intendToDelete TG.VpcId === config.VpcID)?

In that case, can we add a VpcIdentifier filtering condition when listing TGs to improve performance?:

listInput := vpclattice.ListTargetGroupsInput{
VpcIdentifier: &config.VpcID,
}
resp, err := s.cloud.Lattice().ListTargetGroupsAsList(ctx, &listInput)

(We could consider doing it in a separate PR.)

Contributor Author

I'll double-check, but this should work. I didn't realize we had Vpc as an input to the list call, thanks for calling this out!

Contributor Author

Having a closer look, it's possible that in the service export case the VPC will be different from what's in the controller's config. Using the vpc id of the target group we're looking for should work, though.
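For instance, a sketch based on the snippet above; the exact field holding the model target group's VPC id is an assumption:

	listInput := vpclattice.ListTargetGroupsInput{
		// filter by the VPC of the target group we are searching for, which in
		// the service export case can differ from the controller's own VPC
		VpcIdentifier: &modelTg.Spec.VpcId,
	}
	resp, err := s.cloud.Lattice().ListTargetGroupsAsList(ctx, &listInput)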

Contributor

@mikhail-aws left a comment

I don't see blockers for merge; there are many good changes and the code is getting into better shape. It will be easier to improve it further after this PR. I see there are behavioral changes, but I can't tell from the code alone what the impact is.

If this were a number of smaller PRs I would have left more comments; this one is just too big to discuss in detail.

So, high-level thoughts, not related to your PR in particular: I don't see a reason for the generic Stack to exist. Reading the code and seeing how we use it, it makes things harder to comprehend and trace. It seems like an unnecessary abstraction.

Comment on lines +91 to +93
// today, target registration is handled through the route controller
// if that proves not to be fast enough, we can look to do quicker
// target registration with the service controller
Contributor

looks like this controller doesn't do anything now, do we need it?

Contributor Author

I actually removed it then put it back. I do think we'll eventually want something responding just to service changes to do target group updates, though I don't know when we would add that logic. Previously, this controller was handling certain cases the route controller logic was missing, but now it's all covered in the route controller.

Contributor

@mikhail-aws Oct 30, 2023

Still curious why k8s service is not reconciled into lattice service. And k8s Routes to target groups. Why everything in Routes?

Contributor

this controller will have to handle removing old finalizers I think.

Contributor Author

The finalizers issue is a great point. Without this, services would never be able to be deleted without manually removing the finalizers after an upgrade.

Comment on lines 21 to 23
// Get a resource by its id and type
GetResource(id string, resType Resource) (Resource, error)

Contributor

I think we should remove the generic stack and replace it with exact models of Lattice resources. Our dependency relationship is strict and one-way; we don't need to model it as an arbitrary graph. Maybe in some distant future we will manage something other than Lattice, and we can revisit it then.

Also, the philosophy of controllers is the same as for APIs: do a job on a single resource type. If we're building a complex graph for a single controller, it might be a good idea to split it.

Contributor Author

I agree it would probably simplify things to remove the modeling step entirely. Something else I noticed is that the code has to be more complicated because it handles both delete and upsert scenarios. Eliminating the model may allow us to further simplify the delete logic as well.

Contributor

Makes sense to remove the generic stack; created an issue for it: #463

}

func (r *ruleSynthesizer) adjustPriorities(ctx context.Context, snlStackRules map[snlKey]ruleIdMap, resRule []*model.Rule) error {
var lastUpdateErr error
Contributor

Rather than surfacing only the last error, you can report all of them with about the same amount of code using errors.Join. Looks like this:

var updateErr error

for ... {
  err := doUpdate()
  if err != nil {
    updateErr = errors.Join(updateErr, err)
  }
}

if updateErr != nil { ... }

Contributor Author

nice!

Comment on lines 253 to 261
resListener, err := r.stack.GetResource(rule.Spec.StackListenerId, &model.Listener{})
if err != nil {
return nil, nil, err
}

listener, ok := resListener.(*model.Listener)
if !ok {
return nil, nil, errors.New("unexpected type conversion failure for listener stack object")
}
Contributor

Can we enforce the type check inside GetResource, so we don't need to do a type assertion later?

Contributor Author

I think there must be a way to do this that's more idiomatic - I'll sync up with you offline to get your input

Contributor

You can pass a reference and let the stack populate the contents of the struct, same as the k8s client in the first lines of the Reconcile functions.

resListener := &model.Listener{}
err := r.stack.GetResource(id, resListener)
if err != nil { ... }

// use listener
resListener.Foo 

Contributor Author

OK, after much hacking I think I have something nicer here.

Contributor

@mikhail-aws Oct 31, 2023

Least invasive, a bit ugly. Works on top of the new GetResource method, but changes the method signature to (id, resType string):

func GetStackResource[T any](s stack, id string) (T, error) {
	// zero covers both cases: for a struct type T it is an empty struct, for a pointer type it is nil
	var zero T
	// assumes the method signature is changed to GetResource(id, resType string)
	r, err := s.GetResource(id, reflect.TypeOf(zero).String())
	if err != nil {
		return zero, err
	}
	t, ok := r.(T)
	if !ok {
		return zero, errors.New("unexpected type for stack resource")
	}
	return t, nil
}
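A call site would then look roughly like this (sketch), replacing the GetResource-plus-assertion pattern shown earlier:

	listener, err := GetStackResource[*model.Listener](r.stack, rule.Spec.StackListenerId)
	if err != nil {
		return nil, nil, err
	}
	// listener is already a *model.Listener, no type assertion needed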

// if IpAddressTypeIpv4 is not set, then default to nil
if targetGroup.Spec.Config.IpAddressType == "" {
ipAddressType = nil
func (s *defaultTargetGroupManager) create(ctx context.Context, modelTg *model.TargetGroup, err error) (model.TargetGroupStatus, error) {
Contributor

why do you need err as argument?

Contributor Author

@erikfuller Oct 30, 2023

This may have been a Goland refactoring artifact. I'll have a look and thanks for picking it up.

err := t.stack.ListResources(&resTargetGroups)
if err != nil {
return err
func (t *TargetGroupSynthesizer) isControllerManaged(latticeTg tgListOutput) (model.TargetGroupTagFields, bool) {
Contributor

A bit of an over-complicated function. I would make it return a bool and then copy the tagFields separately:

if !isControllerManaged(latticeTg) {
  return ...
}
tagFields := model.TGTagFieldsFromTags(latticeTg.targetGroupTags.Tags)

Contributor Author

I've separated and simplified this logic

s.log.Errorf("Deregistering targets for target group %s failed due to %s", tg.ID, err)
}
}
s.deregisterStaleTargets(ctx, modelTargets, modelTg, listTargetsOutput)
Contributor

I think it should return errors, not log them. Probably ignore NotFound errors.

Contributor Author

I'll have another look at this. I believe the logic previously was to ignore, but in the spirit of ensuring all the desired changes are made we should consider erroring instead.

Comment on lines 107 to 108
d.log.Infof("Error during tg synthesis %s", err)
return err
Contributor

Rule of thumb: log or throw, not both. If the error is not informative enough, use errors.Join or fmt.Errorf("%w: new information", err).
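i.e., a sketch of the return-only version for a case like this one; the synthesizer call is a placeholder for whatever produced the error:

	if err := synthesizeTargetGroups(ctx); err != nil {
		// wrap with context and return; let the caller decide whether to log
		return fmt.Errorf("target group synthesis failed: %w", err)
	}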

Contributor Author

Thanks for finding this. I had a few of these in while I was debugging but tried to clean them up - must have missed this one. Will have another look to see if there are any others still lingering about.

HealthCheckConfig *vpclattice.HealthCheckConfig `json:"healthcheckconfig"`
TargetGroupTagFields
}
type TargetGroupTagFields struct {
Contributor

TagFields are Tags, Keys or Values?

Contributor Author

These are the fields that become tags on the target group. Totally open to a better name here.

Contributor

sort of typed placeholder for the tags that will eventually merge with other tags?

Contributor Author

Having them in a separate struct is really just a convenience that allows us to operate on them independently of all the other spec fields. None of the other spec fields become tags. This structure/name is just internal to the controller code and could be changed later with zero impact.

return false, nil
}

// one last check - ProtocolVersion is not present on TargetGroupSummary, so we have to do a Get
Contributor

I'm curious whether we could save a call by putting the protocol version in the tags.

Contributor Author

@erikfuller Oct 30, 2023

I actually think we should just return ProtocolVersion in the API as part of List

@solmonk
Contributor

solmonk commented Oct 30, 2023

I like how the tests are written and that we have much more test coverage, but it looks like the style across the code is quite inconsistent.

  • rule_synthesizer_test.go:58 - I like the use of subtests, but I could not see this in some other tests
  • target_group_manager_test.go:32 - I think this could be simplified as a table-driven test, which is used well in other tests
  • Mocks: my understanding is that we need mocks for managers, which involve Lattice calls; I'm not sure why we would need mocks for model builders, e.g. model_build_targetgroup_mock.go, model_build_lattice_service_mock.go

These are not really critical, but this might be a good opportunity to think about establishing a good unit testing practice

) (core.Stack, error) {
stack := core.NewDefaultStack(core.StackID(k8s.NamespacedName(route.K8sObject())))

if b.brTgBuilder == nil {
Contributor

Why could b.brTgBuilder possibly be nil? We already initialize it in NewLatticeServiceBuilder().

Contributor Author

Thanks for flagging this. I actually wanted to move all this initialization outside and instead take it as a parameter in NewLatticeServiceBuilder(), but it looks like I missed it. Will follow up on this.

TargetGroupID string `json:"latticeServiceID"`
Name string `json:"name"`
Arn string `json:"arn"`
Id string `json:"id"`
Contributor

Nit: in TargetGroupStatus, Arn and Id are duplicated; do we just need the Arn?

Contributor Author

We pretty much use Id everywhere, though we sometimes use the Arn in logging. It doesn't really hurt anything to have both, but it does mean we need to set them both. I think we can probably defer this to later.


@erikfuller
Contributor Author

I like how the tests are written and that we have much more test coverage, but it looks like the style across the code is quite inconsistent.

This is 100% true. In general, I spent the minimum amount of time I thought I could get away with while still getting the testing that was needed. In some cases, this was modifying the tests that already existed, in others it was writing new ones from scratch. I also took the path of least resistance for the tests. Ones that don't have much mocking I felt work better with an "inputs and outputs" format, and sometimes I created subtests when it helped me organize my thoughts but didn't slow me down. When there is any kind of conditional logic involved I like the table-based tests less since the logic for the test starts to get complicated and it's harder to tell what's actually being tested.

  • rule_synthesizer_test.go:58 - I like the use of subtests, but I could not see this in some other tests

I'll see if I can add some more subtests in the longer tests.

  • target_group_manager_test.go:32 - I think this could be simplified as a table-driven test, which is used well in other tests

Yes. This is one where I was able to get away with making small modifications, so it mostly looks like what's on the main branch.

  • Mocks: my understanding is that we need mocks for managers, which involve Lattice calls; I'm not sure why we would need mocks for model builders, e.g. model_build_targetgroup_mock.go, model_build_lattice_service_mock.go

With the changes, we now have nested model builders: the service builder calls the listener builder, which calls the target group and rule builders. Rather than providing test inputs so these all process successfully, I've changed the logic to accept a separate (mock) builder. This keeps the unit tests isolated and reduces their overall complexity.
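As a rough illustration of that pattern (a hand-rolled fake rather than the generated gomock mocks actually used in the repo; the interface and names are simplified):

	// the listener builder depends on a small interface rather than the
	// concrete target group builder, so tests can inject a fake
	type targetGroupBuilder interface {
		Build(ctx context.Context) (*model.TargetGroup, error)
	}

	type fakeTgBuilder struct {
		tg  *model.TargetGroup
		err error
	}

	func (f *fakeTgBuilder) Build(ctx context.Context) (*model.TargetGroup, error) {
		return f.tg, f.err
	}

	// a listener builder test then exercises only listener logic, e.g.:
	//   b := NewListenerBuilder(&fakeTgBuilder{tg: &model.TargetGroup{}})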

These are not really critical, but this might be a good opportunity to think about establishing a good unit testing practice

Appreciate the feedback here. I think overall these are an improvement over what was there previously, but I'll see if there are any small adjustments I can make to further improve. Please just let me know if there are any other specifics you'd like me to address before the merge.

@solmonk ^^

prefix := TgNamePrefix(spec)
randomSuffix := make([]rune, RandomSuffixLength)
for i := range randomSuffix {
randomSuffix[i] = rune(rand.Intn(26) + 'a')
Contributor

So it looks like the random suffix is made of alphabetic characters; there is a chance of hitting a bad name here. One mitigation I've heard of before is to not use vowels.
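A sketch of that mitigation, using the constants from the snippet above:

	// exclude vowels so the random suffix cannot spell out unfortunate words
	var suffixAlphabet = []rune("bcdfghjklmnpqrstvwxz")

	randomSuffix := make([]rune, RandomSuffixLength)
	for i := range randomSuffix {
		randomSuffix[i] = suffixAlphabet[rand.Intn(len(suffixAlphabet))]
	}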

K8SServiceNameKey = "K8SServiceName"
K8SServiceNamespaceKey = "K8SServiceNamespace"
K8SRouteNameKey = "K8SRouteName"
K8SRouteNamespaceKey = "K8SRouteNamespace"
Contributor

Looks like you are updating some of the tag keys anyway; have you thought about adding a common prefix to these as well? Something like gateway-api: (I don't know exactly, but a colon seems like a common practice) gives a sense that these tags are managed by the controller.

Contributor Author

Yeah, we talked about it briefly in a separate thread. We could use the same prefix as the managedBy tag, i.e. application-networking.k8s.aws. If we think it's better to just make that change now (which it probably is), we can sync up quickly and decide. It would be a quick thing for me to change, I think.
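A prefixed scheme along those lines (a sketch, assuming the application-networking.k8s.aws prefix mentioned above) would look like:

	const tagPrefix = "application-networking.k8s.aws/"

	const (
		K8SServiceNameKey      = tagPrefix + "K8SServiceName"
		K8SServiceNamespaceKey = tagPrefix + "K8SServiceNamespace"
		K8SRouteNameKey        = tagPrefix + "K8SRouteName"
		K8SRouteNamespaceKey   = tagPrefix + "K8SRouteNamespace"
	)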

@solmonk
Contributor

solmonk commented Oct 31, 2023

I think tests are something we can improve on later (thanks for the explanation), and the decisions on the overall refactoring make a lot of sense to me. TG tags and names are the key user-impacting changes, so I'm more focused on those. I think we are good if this behavior is (quickly) e2e tested and the comments on this part are addressed.

@erikfuller
Contributor Author

I've made updates based on yesterday's comments. I also tested upgrade/downgrade using the example in the "Getting Started" doc (up to the point it goes to multicluster). kubectl exec succeeded throughout, and target groups were transitioned successfully both during upgrade and downgrade. As expected, target groups that aren't compatible with the controller version are left behind, but they are pretty easy to identify by suffix and/or tags.

@mikhail-aws
Contributor

lgtm
