(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422
Comments
I updated my example a little. I'm a bit of a CDK noob, so the certificate and VPC info need to be imported for the example to run, but otherwise it should replicate the situation. After further thought, I think there are multiple bugs at play here, both in CDK and in AWS itself.
Possible AWS bug: The traditional approach for NLB -> ALB is that the transfer protocol is TCP and the health check protocol is TCP (aimed at the IPs of the ALB). However, for the official NLB -> ALB approach directly through the AWS Console, the only health check options are HTTP and HTTPS. TCP isn't even a protocol option for the health check for this target type (possible bug?).
Definite AWS bug: You can't change the health check protocol on an already existing target group that uses the TCP protocol, because AWS assumes the health check protocol is also TCP and could never be anything else. In reality, the health check protocol for an ALB target can currently never be TCP, so this is an AWS bug. You receive this error in CFN (and in the AWS Console itself, so it's not a CFN-exclusive issue) when attempting to change a resource that already exists:
CDK docs issue: In my case, CDK apparently created a target group with an HTTP health check (the 'real' default in spite of the docs, I guess). This fails because my ALB does not respond to HTTP (only to HTTPS). What it needs is a TCP health check, but that's not an option in the base AWS experience itself. However, the CDK docs for the health check protocol don't mention this. In fact, the CDK docs don't have a clean working example of NLB -> ALB at all, which would certainly be a good addition.
This issue's goal - CDK defaults wrong: CDK's behavior leaves something to be desired in this instance. If the port is 80 on an NLB listener's target group with an ALB target, it should assume an HTTP health check. If it's 443, it should assume an HTTPS health check. If it is neither of those, it should probably explicitly require a healthCheck object to be provided in order to clarify things. Discovering weeks later that the target group was unhealthy, because HTTP was the default health check protocol for a TCP transfer protocol on port 443, is very unexpected. |
Updated both of my previous posts extensively after additional testing and research, including suggesting fallback logic in the 'Other information' section of the OP that I think will resolve the issue as users expect this to function. |
By the way, it's technically not just the health check's protocol that needs this logic applied. The health check's port does as well. Same idea, though. |
Thanks for the detailed report and thoughts. The CloudFormation documentation states that all of the health-check related parameters are optional, but does not state what the default is. It does appear that in lieu of an explicit setting, an HTTP health check on As with so many things, simply changing the defaults here leaves a possibility of breaking existing customers' setups. We would need to be careful to be only altering the default for this specific use case. Of course, as a workaround, simply explicitly setting the health check works. |
That's a good point. Hmm. It's probably very unlikely that a user currently has their target port at 443 and has made their ALB use an HTTP listener for that port, but it is possible. Additionally, due to the AWS bug mentioned above, if we changed the default protocol in some situations, it would break deploys for everyone affected, because when it tries to update an existing target group, they will receive that CloudFormation error. I suppose the ideal case for now, then, is to issue a warning when building. Something like: if the ALB target type is used for a target group and the health check object is not explicitly provided, print a warning saying that implicit health check options for target groups with an ALB target type can lead to unexpected results, so we recommend explicitly defining at least the protocol and port. The underlying problem I want to address with this issue is that most users will assume the default health check protocol is TCP, like it is for every other type of NLB target group, so we should at least give them some kind of heads up that no TCP option exists and that the default is actually not what they're going to expect. |
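The warning suggested above could be sketched roughly like this. It is a standalone illustration, not CDK code: the `TargetGroupOptions` shape and the `implicitHealthCheckWarning` function are hypothetical names made up for this example, and the message text is only a paraphrase of the suggestion in the comment.

```typescript
// Hypothetical, simplified stand-in for target group props; not an aws-cdk-lib type.
interface TargetGroupOptions {
  targetType: 'INSTANCE' | 'IP' | 'ALB';
  healthCheck?: {
    protocol?: 'HTTP' | 'HTTPS' | 'TCP';
    port?: string;
  };
}

// Returns a warning message when an ALB-type target group relies on implicit
// health check settings, or undefined when nothing needs to be reported.
function implicitHealthCheckWarning(options: TargetGroupOptions): string | undefined {
  // Only target groups with an ALB target are affected by the surprising default.
  if (options.targetType !== 'ALB') {
    return undefined;
  }

  const protocol = options.healthCheck?.protocol;
  const port = options.healthCheck?.port;

  // Both protocol and port were stated explicitly, so there is nothing to warn about.
  if (protocol !== undefined && port !== undefined) {
    return undefined;
  }

  return (
    'Using implicit health check options for a target group with an ALB target ' +
    'can lead to unexpected results. Explicitly define at least the health check ' +
    'protocol (HTTP or HTTPS; TCP is not available) and port.'
  );
}
```

Inside a real construct, a message like this would presumably be surfaced through CDK's annotation mechanism (for example `Annotations.of(scope).addWarning(message)`) rather than returned as a string; where exactly such a check would live in the library is an open question here.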
Actually, it doesn't appear to correctly use the default target port in the NLB -> ALB scenario either. At least, in the above example, it was sending HTTP checks on port 80 - not HTTP on port 443. It does appear to use the target port in other target group target types. Possibly another oversight on the general AWS case for this mechanism. |
I'm running into an issue similar to this. This doc (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/application-load-balancer-target.html) says that when an Application Load Balancer is set as the target for a Network Load Balancer, the health check can be set to HTTPS on port 443 while the listener is TCP on port 443. I was able to manually configure this in the Console but CDK (CloudFormation, really, as the error occurs only at
aws-cdk ts/js, just updated to 2.18.0 this morning: https://www.npmjs.com/package/aws-cdk
Code example:
// ... snip ...
const nlbTarget = new elbv2targets.AlbTarget(lb, 443)
const nlbTargetGroup = new elbv2.NetworkTargetGroup(this, 'nlbTargetGroup', {
  vpc,
  targetType: elbv2.TargetType.ALB,
  port: 443,
  healthCheck: {
    enabled: true,
    port: '443',
    path: '/health',
    protocol: elbv2.Protocol.HTTPS
  }
})
nlbTargetGroup.addTarget(nlbTarget)
nlb.addListener('nlbListenerHttps', {
  port: 443,
  defaultTargetGroups: [nlbTargetGroup]
})
// ... snip ...
Error message:
|
Yeah, that's the AWS bug I mentioned above. It's not only a bug that this error even happens, but the text is bugged too. It omits the most important part. It should read like this: "You cannot change the health check protocol for AN EXISTING target group with the TCP protocol." I submitted a bug report to AWS as well using their feedback form (not sure how else to do that) for that particular error. As a workaround, you need to destroy the resource and then recreate it with the target group being HTTPS from the start. You can't change it. |
I'll go back and try again but I'm pretty sure my case was not editing an existing one but was creating it brand new. |
You're right. I was able to attach an ALB to an NLB using port 443 and an HTTPS health check when the TG was brand new. |
This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled. |
It seems that we are still waiting for a CFN fix here in order to move forward? Maybe the label should be added here. |
What is the problem?
This is a bug present in the new NLB -> ALB functionality added to AWS and CloudFormation a few months ago and added to CDK shortly afterwards.
CDK incorrectly assumes the protocol for the healthCheck is HTTP even if the target group is using TCP on port 443, which puts the target group into an unhealthy state.
Reproduction Steps
What did you expect to happen?
Creation of an NLB with a listener and a target group with an ALB target that correctly functions in all aspects, including not failing its health check.
What actually happened?
In the AWS Console, you can see that the target group reports an unhealthy status and that the protocol is HTTP.
CDK CLI Version
1.139.0
Framework Version
No response
Node.js Version
v14.17.3
OS
Mac OS X 11.6.2
Language
TypeScript
Language Version
No response
Other information
Here's what the docs currently say should happen (but this is impossible for this target type, so the docs should be amended to mention the ALB target type scenario):
However, this doesn't make sense here. Apparently, AWS did not implement a TCP health check protocol option for target groups with ALB targets, which would be the obvious solution and what the CDK docs suggest should be happening. Instead, CDK needs to go down a list of logic rules to determine the default protocol it should be attempting for the health check. Here's the logic that I think should be used to determine health check protocol:
- If a healthCheck object has been provided with a protocol when the NLB target group has an ALB target, just use that protocol.
- If a healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
- If a healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
- If a healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly declare a protocol of HTTP or HTTPS in your healthCheck object."
- If a healthCheck object has not been provided when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
- If a healthCheck object has not been provided when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
- If a healthCheck object has not been provided when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly define a healthCheck object with a protocol of HTTP or HTTPS."
Complicated? Kinda, yeah. Won't be fun to explain in the docs, either, but I don't see any way around it. This is the behavior that users are going to expect to happen without doing loads of research on the specific quirks involved here.