Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets #18422

Open
MMOSimca opened this issue Jan 14, 2022 · 12 comments
Labels
@aws-cdk/aws-elasticloadbalancingv2 Related to Amazon Elastic Load Balancing V2 bug This issue is a bug. effort/small Small work item – less than a day of effort p3

Comments

@MMOSimca
Copy link

MMOSimca commented Jan 14, 2022

What is the problem?

This is a bug present in the new NLB -> ALB functionality added to AWS and CloudFormation a few months ago and added to CDK shortly afterwards.

CDK incorrectly assumes the protocol for the healthCheck is HTTP even if the target group is using TCP on port 443, which puts the target group into an unhealthy state.

Reproduction Steps

import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from "@aws-cdk/aws-iam";
import * as elbv2 from '@aws-cdk/aws-elasticloadbalancingv2';
import * as targets from '@aws-cdk/aws-elasticloadbalancingv2-targets';
import * as ecs from '@aws-cdk/aws-ecs';
import * as patterns from '@aws-cdk/aws-ecs-patterns';
import * as cm from '@aws-cdk/aws-certificatemanager';

export class NblToAlbStack extends cdk.Stack {
    constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
        super(scope, id, props);

        const coreVpc = ec2.Vpc.fromLookup(this, 'CoreVPC', {isDefault: false, vpcName: 'core-vpc' });

        const task = new ecs.FargateTaskDefinition(this, 'Task', { cpu: 256, memoryLimitMiB: 512 });
        task.addContainer('nginx', {
            image: ecs.ContainerImage.fromRegistry('public.ecr.aws/nginx/nginx:latest'),
            portMappings: [{ containerPort: 80 }],
        });

        const certificate = cm.Certificate.fromCertificateArn(this, 'Certificate', 'MY-CERT-ARN');
        const svc = new patterns.ApplicationLoadBalancedFargateService(this, 'Service', {
            vpc: coreVpc,
            taskDefinition: task,
            publicLoadBalancer: false,
            certificate,
            protocol: elb.Protocol.HTTPS,
            targetProtocol: elb.Protocol.HTTP,
        });

        const nlb = new elbv2.NetworkLoadBalancer(this, 'Nlb', {
            vpc:coreVpc,
            crossZoneEnabled: true,
            internetFacing: true,
        });

        const listener = nlb.addListener('listener', { port: 443 });

        listener.addTargets('Targets', {
            targets: [new targets.AlbTarget(svc.loadBalancer, 443)],
            port: 443,
            protocol: elb.Protocol.TCP,
            // Note the absence of the 'healthCheck' object here
        });

        new cdk.CfnOutput(this, 'NlbEndpoint', { value: `http://${nlb.loadBalancerDnsName}`})
    }
}

What did you expect to happen?

Creation of an NLB with a listener and a target group with an ALB target that correctly functions in all aspects, including not failing its health check.

What actually happened?

In the AWS Console, you can see that the target group reports an unhealthy status and that the protocol is HTTP.

CDK CLI Version

1.139.0

Framework Version

No response

Node.js Version

v14.17.3

OS

Mac OS X 11.6.2

Language

Typescript

Language Version

No response

Other information

Here's what the docs currently say should happen (but this is impossible for this target type, so the docs should be amended to mention the ALB target type scenario):

/**
* (experimental) The protocol the load balancer uses when performing health checks on targets.
*
* The TCP protocol is supported for health checks only if the protocol of the target group is TCP, TLS, UDP, or TCP_UDP.
* The TLS, UDP, and TCP_UDP protocols are not supported for health checks.
*
* @default HTTP for ALBs, TCP for NLBs
* @experimental
*/
readonly protocol?: Protocol;

However, this doesn't make sense here. Apparently, AWS did not implement a TCP health check protocol option for target groups with ALB targets, which would be the obvious solution and what the CDK docs suggest should be happening. Instead, CDK needs to go down a list of logic rules to determine the default protocol it should be attempting for the health check. Here's the logic that I think should be used to determine health check protocol:

  1. If an NLB target group's healthCheck object has been provided with a protocol when the NLB target group has an ALB target, just use that protocol.
  2. If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
  3. If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
  4. If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly declare a protocol of HTTP or HTTPS in your healthCheck object."
  5. If an NLB target group's healthCheck object has not been provided when the NLB target group has an ALB target and is using port 80, assume the health check's protocol is going to be HTTP.
  6. If an NLB target group's healthCheck object has not been provided when the NLB target group has an ALB target and is using port 443, assume the health check's protocol is going to be HTTPS.
  7. If an NLB target group's healthCheck object has been provided without a protocol when the NLB target group has an ALB target and is using neither port 80 nor 443, throw an error like "When using a non-standard port with an NLB target group that has an ALB target, you must explicitly define a healthCheck object with a protocol of HTTP or HTTPS."

Complicated? Kinda, yeah. Won't be fun to explain in the docs, either, but I don't see any way around it. This is the behavior that users are going to expect to happen without doing loads of research on the specific quirks involved here.

@MMOSimca MMOSimca added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 14, 2022
@github-actions github-actions bot added the @aws-cdk/aws-elasticloadbalancingv2 Related to Amazon Elastic Load Balancing V2 label Jan 14, 2022
@ryparker ryparker added p2 and removed needs-triage This issue or PR still needs to be triaged. labels Jan 14, 2022
@MMOSimca
Copy link
Author

MMOSimca commented Jan 14, 2022

I updated my example a little. Bit of a CDK noob so the certificate and VPC info need to be imported for the example to run, but otherwise it should replicate the situation.

After further thought, I think there's multiple bugs at play here, both in CDK and in AWS itself.

Possible AWS bug: The traditional approach for NLB -> ALB is that the transfer protocol is TCP and the health check protocol is TCP (aimed at the IPs of the ALB). However, for the official NLB -> ALB approach directly through AWS Console, the only health check options are HTTP and HTTPS. TCP isn't even a protocol option for the health check for this target type (possible bug?).

Definite AWS bug: You can't even change the protocol type associated with a health check on an already existing TCP protocol target group because it assumes the health check protocol is also TCP and could never be anything else. In reality, the health check protocol for an ALB target can never be TCP currently, so this is an AWS bug. You receive this error in CFN (and the AWS Console itself, so it's not a CFN exclusive issue) when attempting to change a resource that already exists:

You cannot change the health check protocol for a target group with the TCP protocol (Service: AmazonElasticLoadBalancing; Status Code: 400; Error Code: InvalidConfigurationRequest; Request ID: 984f84f1-0558-4abd-b693-26c23727589a; Proxy: null)

CDK Docs issue: In my case, CDK apparently created me one with HTTP (the 'real' default in spite of the docs, I guess). This fails because my ALB does not respond to HTTP (only to HTTPS). What it needs it is a TCP health check, but that's not an option in the base AWS experience itself. However, the CDK docs for the health check protocol don't mention this. In fact, the CDK docs don't have a clean working example of NLB -> ALB at all, which would certainly be a good addition.

This issue's goal - CDK defaults wrong: CDK's behavior leaves something to be desired in this instance. If the port is 80 on an NLB listener's target group with an ALB target, it should assume an HTTP health check. If it's 443, it should assume an HTTPS health check. If it is none of those, it should probably explicitly require a healthCheck object to be provided in order to clarify things.

Figuring out the default health check was unhealthy weeks later because HTTP was the default health check protocol for a transfer protocol of TCP with a port of 443 is very unexpected.

@MMOSimca MMOSimca changed the title (aws-elasticloadbalancingv2): ALB Target Group for NLB Incorrectly Assumes HTTP Protocol When No healthCheck Object Provided (aws-elasticloadbalancingv2): Default HealthCheck Protocol Incorrect for ALB Target Group of NLB Listener Jan 14, 2022
@MMOSimca MMOSimca changed the title (aws-elasticloadbalancingv2): Default HealthCheck Protocol Incorrect for ALB Target Group of NLB Listener (aws-elasticloadbalancingv2): Default HealthCheck Protocol Potentially Incorrect for ALB Target Group of NLB Listener Jan 14, 2022
@MMOSimca
Copy link
Author

MMOSimca commented Jan 14, 2022

Updated both of my previous posts extensively after additional testing and research, including suggesting fallback logic in the 'Other information' section of the OP that I think will resolve the issue as users expect this to function.

@MMOSimca MMOSimca changed the title (aws-elasticloadbalancingv2): Default HealthCheck Protocol Potentially Incorrect for ALB Target Group of NLB Listener (aws-elasticloadbalancingv2): Default HealthCheck Protocol Logic Flawed for NLB Target Groups with ALB Targets Jan 14, 2022
@MMOSimca
Copy link
Author

It's not just health check's protocol that needs this logic applied, technically, by the way. Health check's port does as well. Same idea, though.

@njlynch
Copy link
Contributor

njlynch commented Feb 2, 2022

Thanks for the detailed report and thoughts.

The CloudFormation documentation states that all of the health-check related parameters are optional, but does not state what the default is. It does appear that in lieu of an explicit setting, an HTTP health check on / with the default target port is configured.

As with so many things, simply changing the defaults here leaves a possibility of breaking existing customers' setups. We would need to be careful to be only altering the default for this specific use case.

Of course, as a workaround, simply explicitly setting the health check works.

@njlynch njlynch added the effort/small Small work item – less than a day of effort label Feb 2, 2022
@njlynch njlynch removed their assignment Feb 2, 2022
@MMOSimca
Copy link
Author

MMOSimca commented Feb 2, 2022

That's a good point. Hmm. It's probably very unlikely that a user currently has their target port at 443 and has made their ALB use an HTTP listener for that port, but it is possible. Additionally, due to the AWS bug mentioned above, if we changed the default protocol in some situations, it would break deploys for everyone affected because when it tries to update an existing target group, they will receive that CloudFormation error.

I suppose the ideal case then for now is to issue a warning when building. Something like: if using the ALB target type for a target group and the health check object is not explicitly provided, put a warning line that says that using implicit health check options for target groups with an ALB target type can lead to unexpected results so we recommend they explicitly define at least the protocol and port.

The underlying problem I want to address with this issue is most users will assume the default port is TCP, like it is for every other type of NLB target group, so we should at least give them some kind of heads up that no TCP option exists and that the default is actually not what they're going to expect.

@MMOSimca
Copy link
Author

MMOSimca commented Feb 4, 2022

Actually, it doesn't appear to correctly use the default target port in the NLB -> ALB scenario either. At least, in the above example, it was sending HTTP checks on port 80 - not HTTP on port 443. It does appear to use the target port in other target group target types. Possibly another oversight on the general AWS case for this mechanism.

@ssplatt
Copy link
Contributor

ssplatt commented Mar 29, 2022

I'm running into an issue similar to this. This doc (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/application-load-balancer-target.html) says that when an Application Load Balancer is set to be the target for a Network Load Balancer, then the health check can be set to HTTPS with port 443 and the Listener is TCP port 443. I was able to manually configure this in the Console but CDK (Cloudformation, really, as the error occurs only at deploy, which is cloudformation apply) is telling me I cannot set the health checks to HTTPS when the listener is TCP.

aws-cdk ts/js just updated to 2.18.0 this morning. https://www.npmjs.com/package/aws-cdk

code example:

// ... snip ...
  const nlbTarget = new elbv2targets.AlbTarget(lb, 443)
    const nlbTargetGroup = new elbv2.NetworkTargetGroup(this,'nlbTargetGroup', {
      vpc,
      targetType: elbv2.TargetType.ALB,
      port: 443,
      healthCheck: {
        enabled: true,
        port: '443',
        path: '/health',
        protocol: elbv2.Protocol.HTTPS
      }
    })
    nlbTargetGroup.addTarget(nlbTarget)
    nlb.addListener('nlbListenerHttps', {
      port: 443,
      defaultTargetGroups:[nlbTargetGroup]
    })
// ...snip...

error message:

10:40:55 AM | UPDATE_FAILED        | AWS::ElasticLoadBalancingV2::TargetGroup    | nlbTargetGroupRandomChars
You cannot change the health check protocol for a target group with the TCP protocol (Service: AmazonElasticLoadBalancing; Status Code: 400; Error Code: InvalidConfigurationRequest; Request ID: random-random-randomg-random-random; Proxy: null)

@MMOSimca
Copy link
Author

Yeah, that's the AWS bug I mentioned above. It's not only a bug that this error even happens, but the text is bugged too. It omits the most important part.

It should read like this: "You cannot change the health check protocol for AN EXISTING target group with the TCP protocol."

I submitted a bug report to AWS as well using their feedback form (not sure how else to do that) for that particular error.

As a workaround, you need to destroy the resource and then recreate it with the target group being HTTPS from the start. You can't change it.

@ssplatt
Copy link
Contributor

ssplatt commented Mar 29, 2022

I'll go back and try again but I'm pretty sure my case was not editing an existing one but was creating it brand new.

@ssplatt
Copy link
Contributor

ssplatt commented Mar 30, 2022

You're right. I was able to attach and ALB to an NLB using port 443 and an HTTPS health check when the TG was brand new.

@github-actions
Copy link

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 30, 2023
@rgoltz
Copy link

rgoltz commented Mar 31, 2023

It seems that we still waiting for CFN-fix here in order to move forward? Maybe the label should be added here.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 31, 2023
@pahud pahud added p3 and removed p2 labels Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-elasticloadbalancingv2 Related to Amazon Elastic Load Balancing V2 bug This issue is a bug. effort/small Small work item – less than a day of effort p3
Projects
None yet
Development

No branches or pull requests

6 participants