(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

MMOSimca · 2022-01-14T00:08:51Z

What is the problem?

This is a bug present in the new NLB -> ALB functionality added to AWS and CloudFormation a few months ago and added to CDK shortly afterwards.

CDK incorrectly assumes the protocol for the healthCheck is HTTP even if the target group is using TCP on port 443, which puts the target group into an unhealthy state.

Reproduction Steps

import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from "@aws-cdk/aws-iam";
import * as elbv2 from '@aws-cdk/aws-elasticloadbalancingv2';
import * as targets from '@aws-cdk/aws-elasticloadbalancingv2-targets';
import * as ecs from '@aws-cdk/aws-ecs';
import * as patterns from '@aws-cdk/aws-ecs-patterns';
import * as cm from '@aws-cdk/aws-certificatemanager';

export class NblToAlbStack extends cdk.Stack {
    constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
        super(scope, id, props);

        const coreVpc = ec2.Vpc.fromLookup(this, 'CoreVPC', {isDefault: false, vpcName: 'core-vpc' });

        const task = new ecs.FargateTaskDefinition(this, 'Task', { cpu: 256, memoryLimitMiB: 512 });
        task.addContainer('nginx', {
            image: ecs.ContainerImage.fromRegistry('public.ecr.aws/nginx/nginx:latest'),
            portMappings: [{ containerPort: 80 }],
        });

        const certificate = cm.Certificate.fromCertificateArn(this, 'Certificate', 'MY-CERT-ARN');
        const svc = new patterns.ApplicationLoadBalancedFargateService(this, 'Service', {
            vpc: coreVpc,
            taskDefinition: task,
            publicLoadBalancer: false,
            certificate,
            protocol: elb.Protocol.HTTPS,
            targetProtocol: elb.Protocol.HTTP,
        });

        const nlb = new elbv2.NetworkLoadBalancer(this, 'Nlb', {
            vpc:coreVpc,
            crossZoneEnabled: true,
            internetFacing: true,
        });

        const listener = nlb.addListener('listener', { port: 443 });

        listener.addTargets('Targets', {
            targets: [new targets.AlbTarget(svc.loadBalancer, 443)],
            port: 443,
            protocol: elb.Protocol.TCP,
            // Note the absence of the 'healthCheck' object here
        });

        new cdk.CfnOutput(this, 'NlbEndpoint', { value: `http://${nlb.loadBalancerDnsName}`})
    }
}

What did you expect to happen?

Creation of an NLB with a listener and a target group with an ALB target that correctly functions in all aspects, including not failing its health check.

What actually happened?

In the AWS Console, you can see that the target group reports an unhealthy status and that the protocol is HTTP.

CDK CLI Version

1.139.0

Framework Version

No response

Node.js Version

v14.17.3

OS

Mac OS X 11.6.2

Language

Typescript

Language Version

No response

Other information

Here's what the docs currently say should happen (but this is impossible for this target type, so the docs should be amended to mention the ALB target type scenario):

/**
* (experimental) The protocol the load balancer uses when performing health checks on targets.
*
* The TCP protocol is supported for health checks only if the protocol of the target group is TCP, TLS, UDP, or TCP_UDP.
* The TLS, UDP, and TCP_UDP protocols are not supported for health checks.
*
* @default HTTP for ALBs, TCP for NLBs
* @experimental
*/
readonly protocol?: Protocol;

However, this doesn't make sense here. Apparently, AWS did not implement a TCP health check protocol option for target groups with ALB targets, which would be the obvious solution and what the CDK docs suggest should be happening. Instead, CDK needs to go down a list of logic rules to determine the default protocol it should be attempting for the health check. Here's the logic that I think should be used to determine health check protocol:

If an NLB target group's healthCheck object has been provided with a protocol when the NLB target group has an ALB target, just use that protocol.
If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly declare a protocol of HTTP or HTTPS in your healthCheck object."
If an NLB target group's healthCheck object has not been provided when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
If an NLB target group's healthCheck object has not been provided when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly define a healthCheck object with a protocol of HTTP or HTTPS."

Complicated? Kinda, yeah. Won't be fun to explain in the docs, either, but I don't see any way around it. This is the behavior that users are going to expect to happen without doing loads of research on the specific quirks involved here.

The text was updated successfully, but these errors were encountered:

MMOSimca · 2022-01-14T00:43:07Z

I updated my example a little. Bit of a CDK noob so the certificate and VPC info need to be imported for the example to run, but otherwise it should replicate the situation.

After further thought, I think there's multiple bugs at play here, both in CDK and in AWS itself.

Possible AWS bug: The traditional approach for NLB -> ALB is that the transfer protocol is TCP and the health check protocol is TCP (aimed at the IPs of the ALB). However, for the official NLB -> ALB approach directly through AWS Console, the only health check options are HTTP and HTTPS. TCP isn't even a protocol option for the health check for this target type (possible bug?).

Definite AWS bug: You can't even change the protocol type associated with a health check on an already existing TCP protocol target group because it assumes the health check protocol is also TCP and could never be anything else. In reality, the health check protocol for an ALB target can never be TCP currently, so this is an AWS bug. You receive this error in CFN (and the AWS Console itself, so it's not a CFN exclusive issue) when attempting to change a resource that already exists:

You cannot change the health check protocol for a target group with the TCP protocol (Service: AmazonElasticLoadBalancing; Status Code: 400; Error Code: InvalidConfigurationRequest; Request ID: 984f84f1-0558-4abd-b693-26c23727589a; Proxy: null)

CDK Docs issue: In my case, CDK apparently created me one with HTTP (the 'real' default in spite of the docs, I guess). This fails because my ALB does not respond to HTTP (only to HTTPS). What it needs it is a TCP health check, but that's not an option in the base AWS experience itself. However, the CDK docs for the health check protocol don't mention this. In fact, the CDK docs don't have a clean working example of NLB -> ALB at all, which would certainly be a good addition.

This issue's goal - CDK defaults wrong: CDK's behavior leaves something to be desired in this instance. If the port is 80 on an NLB listener's target group with an ALB target, it should assume an HTTP health check. If it's 443, it should assume an HTTPS health check. If it is none of those, it should probably explicitly require a healthCheck object to be provided in order to clarify things.

Figuring out the default health check was unhealthy weeks later because HTTP was the default health check protocol for a transfer protocol of TCP with a port of 443 is very unexpected.

MMOSimca · 2022-01-14T12:27:49Z

Updated both of my previous posts extensively after additional testing and research, including suggesting fallback logic in the 'Other information' section of the OP that I think will resolve the issue as users expect this to function.

MMOSimca · 2022-01-18T19:31:58Z

It's not just health check's protocol that needs this logic applied, technically, by the way. Health check's port does as well. Same idea, though.

njlynch · 2022-02-02T14:48:31Z

Thanks for the detailed report and thoughts.

The CloudFormation documentation states that all of the health-check related parameters are optional, but does not state what the default is. It does appear that in lieu of an explicit setting, an HTTP health check on / with the default target port is configured.

As with so many things, simply changing the defaults here leaves a possibility of breaking existing customers' setups. We would need to be careful to be only altering the default for this specific use case.

Of course, as a workaround, simply explicitly setting the health check works.

MMOSimca · 2022-02-02T18:12:10Z

That's a good point. Hmm. It's probably very unlikely that a user currently has their target port at 443 and has made their ALB use an HTTP listener for that port, but it is possible. Additionally, due to the AWS bug mentioned above, if we changed the default protocol in some situations, it would break deploys for everyone affected because when it tries to update an existing target group, they will receive that CloudFormation error.

I suppose the ideal case then for now is to issue a warning when building. Something like: if using the ALB target type for a target group and the health check object is not explicitly provided, put a warning line that says that using implicit health check options for target groups with an ALB target type can lead to unexpected results so we recommend they explicitly define at least the protocol and port.

The underlying problem I want to address with this issue is most users will assume the default port is TCP, like it is for every other type of NLB target group, so we should at least give them some kind of heads up that no TCP option exists and that the default is actually not what they're going to expect.

MMOSimca · 2022-02-04T17:31:48Z

Actually, it doesn't appear to correctly use the default target port in the NLB -> ALB scenario either. At least, in the above example, it was sending HTTP checks on port 80 - not HTTP on port 443. It does appear to use the target port in other target group target types. Possibly another oversight on the general AWS case for this mechanism.

ssplatt · 2022-03-29T14:50:13Z

I'm running into an issue similar to this. This doc (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/application-load-balancer-target.html) says that when an Application Load Balancer is set to be the target for a Network Load Balancer, then the health check can be set to HTTPS with port 443 and the Listener is TCP port 443. I was able to manually configure this in the Console but CDK (Cloudformation, really, as the error occurs only at deploy, which is cloudformation apply) is telling me I cannot set the health checks to HTTPS when the listener is TCP.

aws-cdk ts/js just updated to 2.18.0 this morning. https://www.npmjs.com/package/aws-cdk

code example:

// ... snip ...
  const nlbTarget = new elbv2targets.AlbTarget(lb, 443)
    const nlbTargetGroup = new elbv2.NetworkTargetGroup(this,'nlbTargetGroup', {
      vpc,
      targetType: elbv2.TargetType.ALB,
      port: 443,
      healthCheck: {
        enabled: true,
        port: '443',
        path: '/health',
        protocol: elbv2.Protocol.HTTPS
      }
    })
    nlbTargetGroup.addTarget(nlbTarget)
    nlb.addListener('nlbListenerHttps', {
      port: 443,
      defaultTargetGroups:[nlbTargetGroup]
    })
// ...snip...

error message:

10:40:55 AM | UPDATE_FAILED        | AWS::ElasticLoadBalancingV2::TargetGroup    | nlbTargetGroupRandomChars
You cannot change the health check protocol for a target group with the TCP protocol (Service: AmazonElasticLoadBalancing; Status Code: 400; Error Code: InvalidConfigurationRequest; Request ID: random-random-randomg-random-random; Proxy: null)

MMOSimca · 2022-03-29T20:00:28Z

Yeah, that's the AWS bug I mentioned above. It's not only a bug that this error even happens, but the text is bugged too. It omits the most important part.

It should read like this: "You cannot change the health check protocol for AN EXISTING target group with the TCP protocol."

I submitted a bug report to AWS as well using their feedback form (not sure how else to do that) for that particular error.

As a workaround, you need to destroy the resource and then recreate it with the target group being HTTPS from the start. You can't change it.

ssplatt · 2022-03-29T20:33:58Z

I'll go back and try again but I'm pretty sure my case was not editing an existing one but was creating it brand new.

ssplatt · 2022-03-30T18:28:51Z

You're right. I was able to attach and ALB to an NLB using port 443 and an HTTPS health check when the TG was brand new.

github-actions · 2023-03-30T20:06:45Z

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

rgoltz · 2023-03-31T04:25:21Z

It seems that we still waiting for CFN-fix here in order to move forward? Maybe the label should be added here.

MMOSimca added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 14, 2022

github-actions bot added the @aws-cdk/aws-elasticloadbalancingv2 Related to Amazon Elastic Load Balancing V2 label Jan 14, 2022

github-actions bot assigned njlynch Jan 14, 2022

ryparker added p2 and removed needs-triage This issue or PR still needs to be triaged. labels Jan 14, 2022

MMOSimca changed the title ~~(aws-elasticloadbalancingv2): ALB Target Group for NLB Incorrectly Assumes HTTP Protocol When No healthCheck Object Provided~~ (aws-elasticloadbalancingv2): Default HealthCheck Protocol Incorrect for ALB Target Group of NLB Listener Jan 14, 2022

MMOSimca changed the title ~~(aws-elasticloadbalancingv2): Default HealthCheck Protocol Incorrect for ALB Target Group of NLB Listener~~ (aws-elasticloadbalancingv2): Default HealthCheck Protocol Potentially Incorrect for ALB Target Group of NLB Listener Jan 14, 2022

MMOSimca changed the title ~~(aws-elasticloadbalancingv2): Default HealthCheck Protocol Potentially Incorrect for ALB Target Group of NLB Listener~~ (aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets Jan 14, 2022

njlynch added the effort/small Small work item – less than a day of effort label Feb 2, 2022

njlynch removed their assignment Feb 2, 2022

ssplatt mentioned this issue Mar 29, 2022

Support HTTP/HTTPS health check protocols for Application Load Balancer Target Type on Network Load Balancers aws-cloudformation/cloudformation-coverage-roadmap#1125

Open

github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 30, 2023

github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 31, 2023

pahud added p3 and removed p2 labels Jun 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Jan 18, 2022

njlynch commented Feb 2, 2022

MMOSimca commented Feb 2, 2022 •

edited

Loading

MMOSimca commented Feb 4, 2022

ssplatt commented Mar 29, 2022 •

edited

Loading

MMOSimca commented Mar 29, 2022

ssplatt commented Mar 29, 2022

ssplatt commented Mar 30, 2022

github-actions bot commented Mar 30, 2023

rgoltz commented Mar 31, 2023

(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

Comments

MMOSimca commented Jan 14, 2022 • edited Loading

What is the problem?

Reproduction Steps

What did you expect to happen?

What actually happened?

CDK CLI Version

Framework Version

Node.js Version

OS

Language

Language Version

Other information

MMOSimca commented Jan 14, 2022 • edited Loading

MMOSimca commented Jan 14, 2022 • edited Loading

MMOSimca commented Jan 18, 2022

njlynch commented Feb 2, 2022

MMOSimca commented Feb 2, 2022 • edited Loading

MMOSimca commented Feb 4, 2022

ssplatt commented Mar 29, 2022 • edited Loading

MMOSimca commented Mar 29, 2022

ssplatt commented Mar 29, 2022

ssplatt commented Mar 30, 2022

github-actions bot commented Mar 30, 2023

rgoltz commented Mar 31, 2023

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Jan 14, 2022 •

edited

Loading

MMOSimca commented Feb 2, 2022 •

edited

Loading

ssplatt commented Mar 29, 2022 •

edited

Loading