Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Flesh out v2 LoadBalancer #203

Merged
merged 9 commits into from
Jan 25, 2017
Merged

Conversation

PtrTeixeira
Copy link
Contributor

WIP effort at filling out a handler for the ApplicationLoadBalancer to
match the interface from the old (v1/classic) load balancer within the
BaragonService. Should allow us to handle BaragonAgent connections /
lgoad balancer connections using the new API, once it gets finished.

WIP effort at filling out a handler for the ApplicationLoadBalancer to
match the interface from the old (v1/classic) load balancer within the
BaragonService. Should allow us to handle BaragonAgent connections /
lgoad balancer connections using the new API, once it gets finished.
Move away from using `String.format(...)` in logging calls, in
favor of slf4j's native interpolation with `{}`. This has no real
impact and was just a quick win in the process of working on other
parts of the application.
Removing a target (=== Instance) should (probably) work now, both
individually and as part of the sync cycle. Adding new instances is
going to take a little more work because ApplicationLoadBalancers have a
little bit more infrastructural overhead than Classic load balancers.

This commit also finishes making the change-over from using a string to
represent the load balancer (TrafficSource). There was a datastore that
should needed to move over to the new interface, which meant that there
was a resource that also needed to move over.
Currently have sync ~ working in the ApplicationLoadBalancer.
At the moment, syncing will
1. Find a target group with the same name as a baragon group, if possible
2. Guarantee that the targets (== instances) that baragon agents think
   that they're on are members of the target group
3. Guarantee that the load balancers that they baragon agents think
   they're talking to can reach all the az's that the baragon agents
   think that they're on
4. De-register targets that don't appear to have a known baragon
   agent associated with them, or which appear to have an uncommunicative
   baragon agent associated with them.

It is limited in that, among other things, it requires that the target
group be named the same as the baragon group. At the moment, the
application load balancer will not support explicit requests to add
or remove targets and requires a certain amount of infrastructure
to already be in place.
@PtrTeixeira
Copy link
Contributor Author

cc @ssalinas Mind taking a look at this?

Copy link
Member

@ssalinas ssalinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bunch of smaller comments for efficiency/etc. Nice first draft of this!

* For this class I am working under the assumption that the ID's of Targets can
* be interchanged with the ID's of instances. That is, if you ask this class
* for the health of instance 12345, then you will actually expect the health
* of target 12345.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be accurate. The docs allude to the ability to register something besides an ec2 instance as a target (maybe future functionality?) but we won't have to worry about that. For our purposes it's accurate that instance id == target id 👍

} else {
LOG.debug(String.format("Agent %s is already registered", agent));
LOG.debug("Agent {} is already registered", agent);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for updating all of these

@@ -68,7 +70,8 @@ public LoadBalancerResource(BaragonLoadBalancerDatastore loadBalancerDatastore,
@POST
@Path("/{clusterName}/sources")
public BaragonGroup addSource(@PathParam("clusterName") String clusterName, @QueryParam("source") String source) {
return loadBalancerDatastore.addSourceToGroup(clusterName, source);
TrafficSource trafficSource = new TrafficSource(source, TrafficSourceType.CLASSIC);
return loadBalancerDatastore.addSourceToGroup(clusterName, trafficSource);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need an additional endpoint that can take a TrafficSource json body. Maybe add that as one with @Path("/{clusterName}/traffic-sources") and we can mark the old one as @Deprecated

if (baragonGroup.getSources().isEmpty()) {
LOG.debug("No traffic sources present for group {}", baragonGroup.getName());
} else {
Collection<LoadBalancer> elbsForBaragonGroup = getLoadBalancersByBaragonGroup(baragonGroup);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've had problems in the past with hitting aws' api rate limits. Rather than making an individual call for each (even though it is filtered down), it might be wise to move this outside of the loop. We can describe all ALBs in one call, then grab what we need from the object. It's likely faster to loop through the object in memory than to make a new call for new data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I got most of this, but it is still making calls in a pretty tight loop in isLastHeathlyInstance during de-registration. I can try and patch that up too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a tough one, especially since we really want to be sure that info is up to date for that check. If we can lessen the api calls in other areas at least i think that is fine. We are doing something similar for classic load balancers with isLastHealthyInstance as well


LOG.debug("Registering TargetGroup for baragon group {}", baragonGroup.getName());
Optional<TargetGroup> maybeTargetGroup = guaranteeTargetGroupFor(baragonGroup);
if (maybeTargetGroup.isPresent()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this isn't present let's log, maybe at WARN level, so that we have an indicator of what's going on

}
} else {
/* Is this really an appropriate thing to do here? */
throw new IllegalArgumentException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good here since we catch it right below. We want to continue trying to sync the remaining agents (which we do), but this is a rather blocking error for that one agent, so logging at error level + the notify is good to do here.

}

guaranteeHasAllTargets(targetGroup, baragonAgents);
guaranteeAzEnabled(baragonAgents, loadBalancers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The az enable will need to go before the target addition most likely. In the case of the classic load balancers, if the az/subnet were not enabled, any registration in that az/subnet would fail. I would suspect it is likely the case here as well, although I can't find any docs that say so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped. I think it mattered more when the instances were registered against the load balancers directly, but I couldn't find anything to say that TargetGroups cared about subnets or az's.

}

try {
SetSubnetsRequest subnetsRequest = new SetSubnetsRequest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does SetSubnets override all current ones? If so we'd want to make sure this list also includes ones currently active so we aren't removing any that might still be needed (i.e. subnetsOnLoadBalancer + subnetIds here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you!

}
}

private void guaranteeAzEnabled(Collection<BaragonAgentMetadata> agents,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read over the docs here again. In the old classic the az/subnet operations were a bit more separated. I think here we can get away with gathering the list of subnets from the agents and feeding that to guaranteeHasAllSubnets, rather than doing the az check and than having to check subnets after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that if there were agents that were assigned to the same az, but different subnets then we might accidentally wind up clobbering access to one of the subnets. I think load balancers are still on a one-subnet-per-az policy, and I didn't see anything that said what would happen if two subnets in the same az were added at the same time.


Collection<TargetDescription> removableTargets = new HashSet<>();
for (TargetDescription targetDescription : targetsOnGroup) {
if (! knownAgentIds.contains(targetDescription.getId()) && canDeregisterAgent(baragonGroup, targetDescription.getId())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to copy some of the logic from here for this. If you find yourself rewriting bits of those methods, feel free to move them to the ElasticLoadBalancer class so they can be shared

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the data you are looking at is slightly different than you are expecting here. Your agentIds method below is returning Baragon's agent id (which in this case is the url on which the agent can be hit by BaragonService). What you probably want in the knwonAgentIds variable is the instance IDs which should be equal to the target id. Instance ID should also be available in the BaraongAgentMetadata. Maybe rename that list as well to better show what it contains once you update it.

@ssalinas
Copy link
Member

Added a number of comments around efficiency of the api calls we are making/etc @PtrTeixeira , nice first draft of this!

Patch up (most) of the issues raised during review. Still to do:
1. Migrate duplicated logic in `knownAgents` &c into `ElasticLoadBalancer`
2. Add an additional endpoint that offers a little bit more control
   over creating traffic sources to `LoadBalancerResource`
I had duplicated some of the functionality in `ElasticLoadBalancer`
w/r/t checking whether an instance / target could be de-registered in
order to keep the scope of the changes that I was making smaller.  This
DRYs it a little bit; makes `ElasticLoadBalancer` take an ID rather than
a model object from the elasticloadbalancingv1, so it can be used from
both the `ApplicationLoadBalancer` and the `ClassicLoadBalancer`.
String nextPage = result.getNextMarker();
loadBalancers.addAll(result.getLoadBalancers());

while (! nextPage.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking, will the aws API ever return a null for this, or is it just an empty string when there are no more pages? If it's possible there might be a null, I'd opt for !Strings.isNullOrEmpty() here

deregisterIfOld(baragonGroup, targetGroup, baragonAgents, targets);
}
} else {
LOG.debug("Not TargetGroup for Baragon Group {}", baragonGroup);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not -> No

public void syncAll(Collection<BaragonGroup> baragonGroups) {
Collection<LoadBalancer> allLoadBalancers = getAllLoadBalancers(baragonGroups);
try {
for (BaragonGroup baragonGroup : baragonGroups) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a few of the methods in this for loop can possibly throw exceptions, we should probably move the try/catch inside the loop. This way we are still syncing the remaining groups, even if one should fail

return targetDescriptions;
}

private Collection<TargetDescription> listOldTargets(BaragonGroup baragonGroup,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the sake of clarify I'd rename this method to something like listRemovableTargets. Old is just a bit generic in terms of what we are actually finding here.


private Collection<TargetDescription> listOldTargets(BaragonGroup baragonGroup,
Collection<TargetDescription> targetsOnGroup,
Collection<BaragonAgentMetadata> knownAgents) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful what you call 'known' for agents. There is a concept of BaragonKnownAgentMetadata which is the agent metadata along with a 'list seen' timestamp (stored in a separate zk node). However, what you are passing here is actually the regular agent metadata for the active agent (as stored in the leader-latch zk node)


Collection<TargetDescription> removableTargets = new HashSet<>();
for (TargetDescription targetDescription : targetsOnGroup) {
if (! knownAgentIds.contains(targetDescription.getId()) && canDeregisterAgent(baragonGroup, targetDescription.getId())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the data you are looking at is slightly different than you are expecting here. Your agentIds method below is returning Baragon's agent id (which in this case is the url on which the agent can be hit by BaragonService). What you probably want in the knwonAgentIds variable is the instance IDs which should be equal to the target id. Instance ID should also be available in the BaraongAgentMetadata. Maybe rename that list as well to better show what it contains once you update it.

@ssalinas
Copy link
Member

Once the comments above are addressed we should just need to implement the removeInstance and registerInstance methods, then we can merge this into the alb_support branch and give it a whirl 👍

Involved a handful of changes, including a nullity check, swapping the
order of a loop and try block, and clarifying some conceptual
differences related to agent metadata.
}

LOG.debug("ELB sync complete for group {}", baragonGroup);
} catch (AmazonClientException acexn) {
LOG.error("Could not retrieve elb information due to ELB client error {}", acexn);
exceptionNotifier.notify(acexn, ImmutableMap.of("groups", baragonGroups == null ? "" : baragonGroups.toString()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We're already accessing baragonGroups above by entering the for loop, probably don't need an additional null check

This was just a copy-paste relic; when I swapped the order of the try
block and the loop, the error message still checked for the nullity of
the _collection_ and had the _collection_ in the error message. The
nullity check was removed as being unnecessary, and the error message
now refers to the particular group that failed, rather than the
collection of groups.
Implement methods to register and remove instances (ie, targets) in the
context of an application load balancer. This works a little bit
differently, semantically, than the equivalent methods for the classic
load balancers. In particular, it treats the `TargetGroup` as the
`TrafficSource`, and registers and deregisters from the `TargetGroup`
rather than from load balancers individually. This means that whereas
before it was possible to register and deregister a single instances
from individual load balancers, now it is only possible to register and
deregister from load balancers on the scope of the target group.

For example, suppose that LB1 and LB2 were talking to Instance.  It
would have been possible to remove Instance from LB2 and leave it
talking to LB1. With ApplicationLoadBalancers, LB1 and LB2 talk to a
target group, so they can't be individually talked to at the instance
level.

It winds up working out the same way, if you consider
ClassicLoadBalancers and TargetGroups to be the equivalent
TrafficSources, rather than ClassicLoadBalancers and
ApplicationLoadBalancers. You just have the target belong to multiple
TargetGroups.
@ssalinas
Copy link
Member

Going to merge this into the other ALB pr, will be simpler to rebase off of the basepom upgrade that way

@ssalinas ssalinas merged commit 07bfceb into alb_support Jan 25, 2017
@ssalinas ssalinas deleted the aws_application_load_balancer branch January 25, 2017 15:44
@ssalinas ssalinas modified the milestone: 0.5.0 May 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants