aws_applicationautoscaling: Error: Only direct metrics are supported for Target Tracking. Use Step Scaling or supply a Metric object. #20659

mostafafarzaneh · 2022-06-08T06:58:04Z

Describe the bug

I would like to use a MathExpression for a custom metric in a TargetTrackingScalingPolicy, but I get this error:

Only direct metrics are supported for Target Tracking. Use Step Scaling or supply a Metric object.

Looking at the code, it only checks for metricStat, not mathExpression.

Expected Behavior

It should be possible to define a math expression for Target Tracking.

Current Behavior

Only direct metrics are allowed

Reproduction Steps

Create Target Tracking using math expression

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.27.0

Framework Version

No response

Node.js Version

16.15.0

OS

Debian 10

Language

Python

Language Version

No response

Other information

No response

mostafafarzaneh · 2022-06-09T08:34:40Z

I also tried to create a metric this way:

   custom_metric = cloudwatch.MathExpression(
      expression='SELECT AVG(ActiveConnections) FROM "myMetrics/custom"',
      period=Duration.minutes(1),
   )

and use it in a StepScalingPolicy. CDK complains:

Alarm contains invalid expressions. (Service: AmazonCloudWatch; Status Code: 400; Error Code: ValidationError; Request ID: 3c245f6f-9d5e-492e-b2e1-e0fa83422594; Proxy: null)

peterwoodworth · 2022-06-09T20:56:34Z

These properties are directly passed to the ScalingPolicy CloudFormation resource in this property.

Our Metric class supports these properties, while our MathExpression class doesn't. I think we would need additional functionality from CloudFormation for this to be implemented.

shw1n · 2023-02-03T21:38:18Z

Also encountering this issue

matthias-pichler-warrify · 2023-03-13T10:29:14Z

> Our Metric class supports these properties, while our MathExpression class doesn't. I think we would need additional functionality from cloudformation for this to be implemented

It seems that CloudFormation's AWS::AutoScaling::ScalingPolicy is indeed missing some configuration parameters. The CustomizedMetricSpecification type in the AutoScaling API has a Metrics member where expressions can be specified, as shown in the docs. The AWS::AutoScaling::ScalingPolicy CustomizedMetricSpecification in CloudFormation, on the other hand, does NOT have the Metrics property.
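To make the shape of that Metrics member concrete, here is a minimal TypeScript sketch of the query list together with a small sanity check. The field names follow the API's metric data query structure as used later in this thread; `validateMetricQueries` is a hypothetical helper, not part of the AWS SDK or CDK, and the "exactly one entry returns data" rule is my reading of the API documentation rather than something confirmed in this issue.

```typescript
// Shapes mirror the API's metric data query structure (field names as used in this thread).
interface MetricDimension {
  Name: string;
  Value: string;
}

interface MetricQuery {
  Id: string;
  Label?: string;
  ReturnData?: boolean; // assumed to default to true when omitted
  Expression?: string;
  MetricStat?: {
    Stat: string;
    Metric: { Namespace: string; MetricName: string; Dimensions?: MetricDimension[] };
  };
}

// Hypothetical helper: returns a list of human-readable problems (empty when the list looks sane).
function validateMetricQueries(queries: MetricQuery[]): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();

  for (const query of queries) {
    if (seenIds.has(query.Id)) {
      problems.push(`duplicate Id: ${query.Id}`);
    }
    seenIds.add(query.Id);

    // Each entry is either a raw metric (MetricStat) or a math expression, never both or neither.
    const hasExpression = query.Expression !== undefined;
    const hasMetricStat = query.MetricStat !== undefined;
    if (hasExpression === hasMetricStat) {
      problems.push(`${query.Id}: set exactly one of Expression or MetricStat`);
    }
  }

  // Assumption: only the final expression should return data to the scaling policy.
  const returning = queries.filter((query) => query.ReturnData !== false);
  if (returning.length !== 1) {
    problems.push(`expected exactly one query returning data, found ${returning.length}`);
  }

  return problems;
}
```

Running a check like this before handing the list to a custom resource or a property override can catch a forgotten `ReturnData: false` before deployment.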

hectorsouthern · 2023-03-21T14:41:17Z

It looks like AWS announced support for this recently: https://www.amazonaws.cn/en/new/2023/application-auto-scaling-supports-metric-math-for-target-tracking-policies/

and the documentation is now available: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking-metric-math.html

It would be great to see this feature in CDK too

SamStephens · 2023-07-04T00:53:19Z

> It looks like AWS announced support for this recently: https://www.amazonaws.cn/en/new/2023/application-auto-scaling-supports-metric-math-for-target-tracking-policies/
>
> and the documentation is now available: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking-metric-math.html
>
> It would be great to see this feature in CDK too

As per the second page you link to, "This feature is not yet available in AWS CloudFormation." So CDK either has to wait for CloudFormation support, or provide this via a custom resource.

zubairzahoor · 2023-07-27T11:32:21Z

Any updates on this? Or any workarounds via CDK as of now?

alexbaileyuk · 2023-08-02T22:14:21Z

@zubairzahoor

I came across this today and lost a few hours on it. As a temporary workaround, I've added a custom resource using AwsCustomResource. It's not ideal, but it could be an option for you in the meantime.

import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from 'aws-cdk-lib/custom-resources';
import { Construct } from 'constructs';
import { Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { IQueue } from 'aws-cdk-lib/aws-sqs';
import { Effect, PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { buildVersion } from '../../../utils/build-version';

interface EcsSqsMathExpressionAutoScalingPolicyProps {
  targetValue: number;
  resourceId: string; // format service/{cluster_name}/{service_name}
  queue: IQueue;
  taskCountMetric: Metric;
}

export class EcsSqsMathExpressionAutoScalingPolicy extends Construct {
  constructor(scope: Construct, id: string, props: EcsSqsMathExpressionAutoScalingPolicyProps) {
    super(scope, id);

    new AwsCustomResource(this, 'scaling-put-autoscaling-policy', {
      onUpdate: {
        physicalResourceId: PhysicalResourceId.of(`sqs-backlog-scaling-policy/${props.resourceId}`),
        service: 'ApplicationAutoScaling',
        action: 'putScalingPolicy',
        parameters: {
          PolicyName: `sqs-backlog-scaling-policy-${props.resourceId}-${buildVersion}`,
          PolicyType: 'TargetTrackingScaling',
          ResourceId: props.resourceId,
          ScalableDimension: 'ecs:service:DesiredCount',
          ServiceNamespace: 'ecs',
          TargetTrackingScalingPolicyConfiguration: {
            TargetValue: props.targetValue,
            CustomizedMetricSpecification: {
              Metrics: [
                {
                  Id: 'm1',
                  Label: 'Approx. # of Messages Visible',
                  ReturnData: false,
                  MetricStat: {
                    Stat: 'Sum',
                    Metric: {
                      MetricName: props.queue.metricApproximateNumberOfMessagesVisible().metricName,
                      Namespace: props.queue.metricApproximateNumberOfMessagesVisible().namespace,
                      Dimensions: [
                        {
                          Name: 'QueueName',
                          Value: props.queue.queueName
                        }
                      ]
                    }
                  }
                },
                {
                  Id: 'm2',
                  Label: 'Running Instances Count',
                  ReturnData: false,
                  MetricStat: {
                    Stat: 'Average',
                    Metric: {
                      MetricName: props.taskCountMetric.metricName,
                      Namespace: props.taskCountMetric.namespace,
                      Dimensions: Object.entries(props.taskCountMetric.dimensions || {}).map(([key, value]) => ({
                        Name: key,
                        Value: value
                      }))
                    }
                  }
                },
                {
                  Label: 'Backlog per Instance',
                  Id: 'e1',
                  Expression: 'm1 / m2',
                  ReturnData: true
                }
              ]
            }
          }
        }
      },
      policy: AwsCustomResourcePolicy.fromStatements([
        new PolicyStatement({
          effect: Effect.ALLOW,
          actions: ['application-autoscaling:*', 'ecs:DescribeServices', 'ecs:UpdateService'],
          resources: ['*']
        })
      ])
    });
  }
}
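One thing to be aware of: the construct above only defines onUpdate, so nothing removes the scaling policy when the stack is deleted. A matching onDelete call could look roughly like the plain-object sketch below. `deleteScalingPolicyCall` and the `DeletePolicyCall` type are hypothetical names for illustration; the parameters are the ones the DeleteScalingPolicy API expects, and the policy name passed in has to be whatever name was last used with putScalingPolicy.

```typescript
// Plain-object sketch of an SDK call that could be passed as `onDelete` to AwsCustomResource.
// DeletePolicyCall and deleteScalingPolicyCall are illustrative names, not CDK APIs.
interface DeletePolicyCall {
  service: string;
  action: string;
  parameters: {
    PolicyName: string;
    ResourceId: string;
    ScalableDimension: string;
    ServiceNamespace: string;
  };
}

function deleteScalingPolicyCall(policyName: string, resourceId: string): DeletePolicyCall {
  return {
    service: 'ApplicationAutoScaling',
    action: 'deleteScalingPolicy',
    parameters: {
      PolicyName: policyName,
      ResourceId: resourceId, // format service/{cluster_name}/{service_name}
      ScalableDimension: 'ecs:service:DesiredCount',
      ServiceNamespace: 'ecs',
    },
  };
}
```

Because the policy name in the construct above includes a build version, a delete call like this only works if the name it is given matches the most recently deployed one.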

You can use it like this:

    this.scaling = this.fargateService.autoScaleTaskCount({
      minCapacity: 0,
      maxCapacity: 100
    });

    const customScalingPolicy = new EcsSqsMathExpressionAutoScalingPolicy(this, 'scaling-policy', {
      targetValue: props.acceptableLatency.toSeconds() / props.averageMessageProcessingTime.toSeconds(),
      resourceId: `service/${props.cluster.clusterName}/${this.fargateService.serviceName}`,
      queue: queue,
      taskCountMetric: desiredCountMetric
    });

    customScalingPolicy.node.addDependency(this.scaling);

It may need some adaptations to meet your needs but it should give you a good starting point.

I should mention I've not fully tested this yet so if you notice anything weird then please share :)
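For anyone adapting the usage snippet, the two derived inputs are simple enough to sketch as plain functions. The target value is the acceptable backlog per task (acceptable latency divided by average processing time per message), and the resource ID follows the `service/{cluster_name}/{service_name}` format noted in the props. The function names here are hypothetical and only illustrate the arithmetic.

```typescript
// Hypothetical helper: how many queued messages one task may have in front of it
// while still meeting the acceptable latency.
function backlogTargetPerTask(acceptableLatencySeconds: number, averageProcessingSeconds: number): number {
  if (averageProcessingSeconds <= 0) {
    throw new RangeError('averageProcessingSeconds must be greater than zero');
  }
  return acceptableLatencySeconds / averageProcessingSeconds;
}

// Hypothetical helper: builds the Application Auto Scaling resource ID for an ECS service.
function ecsServiceResourceId(clusterName: string, serviceName: string): string {
  return `service/${clusterName}/${serviceName}`;
}

// Example: messages may wait up to 300 s and each takes about 2 s to process.
const exampleTarget = backlogTargetPerTask(300, 2);
const exampleResourceId = ecsServiceResourceId('my-cluster', 'my-service');
console.log(exampleTarget, exampleResourceId);
```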

zubairzahoor · 2023-08-07T12:57:28Z

@alexbaileyuk Thank you!
Tried this for my use case (with AmazonMq/ECS) and it seems to work. What are the minimum permissions needed for the execution role of the Lambda here?

alexbaileyuk · 2023-08-07T13:16:34Z

@zubairzahoor due to difficulties with this method, I ended up writing a totally different function which pre-calculates the backlog per instance by pulling the queue length and the service's desired count and publishing the result as a custom metric. Something like this:

import { DescribeServicesCommand, ECSClient, paginateListServices } from '@aws-sdk/client-ecs';
import { CloudWatchClient, MetricDatum, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';
import { SQSClient, GetQueueAttributesCommand } from '@aws-sdk/client-sqs';

const ecsClient = new ECSClient({
  region: 'eu-west-1'
});

const cloudwatchClient = new CloudWatchClient({
  region: 'eu-west-1'
});

const sqsClient = new SQSClient({
  region: 'eu-west-1'
});

export const putInstanceBacklogMetrics = async (clusterName: string) => {
  const consumerServices = await listServices(clusterName);

  const backlogMetrics = await Promise.all(consumerServices.map((serviceArn) => calculateBacklogForConsumerService(clusterName, serviceArn)));

  const metrics = backlogMetrics.map((backlog) => {
    console.log(`Service ${backlog.serviceName} has desired count ${backlog.desiredCount} and queue length ${backlog.queueLength}`);

    let instanceBacklog = null;

    if (backlog.desiredCount === 0 && backlog.queueLength > 0) {
      // If there are no instances running we have to pretend the backlog is the acceptable backlog per instance + 1
      // so that we scale up to one instance. This allows us to scale down to zero instances when there is no backlog.
      // This will cause some jitter in the instance backlog metric, but it allows us to scale to zero. In test environments
      // it'll be fine, in production we'll have enough traffic that the jitter will be negligible and instances will usually be
      // scaled up to at least one.
      instanceBacklog = backlog.queueLength > backlog.acceptableBacklogPerInstance ? backlog.queueLength : backlog.acceptableBacklogPerInstance + 1;
    } else if (backlog.queueLength === 0) {
      instanceBacklog = 0;
    } else if (backlog.desiredCount > 0) {
      instanceBacklog = backlog.queueLength / backlog.desiredCount;
    } else {
      instanceBacklog = 0;
    }

    return {
      MetricName: 'ConsumerInstanceBacklog',
      Dimensions: [
        {
          Name: 'ClusterName',
          Value: clusterName
        },
        {
          Name: 'ServiceName',
          Value: backlog.serviceName
        }
      ],
      Value: instanceBacklog
    };
  });

  if (metrics.length === 0) {
    console.log('No consumer services found');
    return;
  }

  await putConsumerInstanceBacklogMetric(metrics);
};

const listServices = async (clusterName: string) => {
  const paginator = paginateListServices({ client: ecsClient }, { cluster: clusterName });

  const serviceArns: string[] = [];

  for await (const page of paginator) {
    for (const serviceArn of page.serviceArns ?? []) {
      if (await isConsumerService(clusterName, serviceArn)) {
        serviceArns.push(serviceArn);
      }
    }
  }

  console.log(`Found ${serviceArns.length} consumer services in cluster ${clusterName}`);

  return serviceArns;
};

const isConsumerService = async (clusterName: string, serviceArn: string) => {
  const serviceDetails = await ecsClient.send(
    new DescribeServicesCommand({
      cluster: clusterName,
      services: [serviceArn],
      include: ['TAGS']
    })
  );

  // Guard against an empty services array before reading tags.
  const tags = serviceDetails.services?.[0]?.tags ?? [];

  // Only services carrying both tags are treated as queue consumers.
  return tags.some((tag) => tag.key === 'QueueUrl') && tags.some((tag) => tag.key === 'AcceptableBacklogPerInstance');
};

const calculateBacklogForConsumerService = async (clusterName: string, serviceArn: string) => {
  const serviceDetails = await ecsClient.send(
    new DescribeServicesCommand({
      cluster: clusterName,
      services: [serviceArn],
      include: ['TAGS']
    })
  );

  // Guard against an empty services array before reading any fields.
  const service = serviceDetails.services?.[0];
  const desiredCount = service?.desiredCount || 0;
  // The QueueUrl tag holds the queue URL, which getQueueLength expects.
  const queueUrl = service?.tags?.find((tag) => tag.key === 'QueueUrl')?.value || '';

  const queueLength = await getQueueLength(queueUrl);

  const acceptableBacklogPerInstance = parseInt(
    service?.tags?.find((tag) => tag.key === 'AcceptableBacklogPerInstance')?.value || '0',
    10
  );

  return {
    serviceName: service?.serviceName || 'UNKNOWN',
    desiredCount: desiredCount,
    queueLength: queueLength,
    acceptableBacklogPerInstance: acceptableBacklogPerInstance
  };
};

const getQueueLength = async (queueUrl: string) => {
  const queueDetails = await sqsClient.send(
    new GetQueueAttributesCommand({
      QueueUrl: queueUrl,
      AttributeNames: ['ApproximateNumberOfMessages']
    })
  );

  return parseInt(queueDetails.Attributes?.ApproximateNumberOfMessages || '0');
};

const putConsumerInstanceBacklogMetric = async (metrics: MetricDatum[]) => {
  await cloudwatchClient.send(
    new PutMetricDataCommand({
      Namespace: 'ECS/CustomServiceMetrics',
      MetricData: metrics
    })
  );
};

It also relies on some tags on the ECS services. It's a bit messy and not well refined at the moment since I'm still testing and working on edge cases like the scale to zero ones. It loops through all services in a cluster and based on their tags and metrics, defines a new metric called ConsumerInstanceBacklog to do target tracking against.

I'd advise doing something similar. The main issues came on stack updates. You can't create the scaling policy without defining a name and when defining a name I ended up with tons of issues trying to update/replace/rollback etc. I'd recommend not using the above method for those reasons.
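The scale-to-zero branch logic in that function is the part most worth testing in isolation. Below is a sketch that pulls it out into a pure function with the same branches as the code above. `instanceBacklog` and `BacklogInput` are names I've introduced for illustration, so treat it as one possible refactoring rather than the author's final version.

```typescript
// Illustrative extraction of the branch logic from putInstanceBacklogMetrics above.
interface BacklogInput {
  desiredCount: number;
  queueLength: number;
  acceptableBacklogPerInstance: number;
}

function instanceBacklog({ desiredCount, queueLength, acceptableBacklogPerInstance }: BacklogInput): number {
  if (desiredCount === 0 && queueLength > 0) {
    // No tasks are running but work is waiting: report a value above the target
    // so that target tracking scales out from zero.
    return queueLength > acceptableBacklogPerInstance ? queueLength : acceptableBacklogPerInstance + 1;
  }
  if (queueLength === 0 || desiredCount <= 0) {
    // Empty queue (or no valid task count): report zero so the service can scale in.
    return 0;
  }
  // Normal case: messages waiting per running task.
  return queueLength / desiredCount;
}

// Example: 20 messages shared by 4 tasks.
console.log(instanceBacklog({ desiredCount: 4, queueLength: 20, acceptableBacklogPerInstance: 10 }));
```

Keeping this separate from the SDK calls means the edge cases (zero tasks, empty queue) can be unit tested without touching ECS, SQS, or CloudWatch.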

zubairzahoor · 2023-08-21T11:29:38Z

@alexbaileyuk I am more comfortable using the above; it works well for me. Were there any issues you encountered with scaling in using the custom resource?

alexbaileyuk · 2023-08-21T11:35:23Z

@zubairzahoor we're going to production later in the week with a more refined version of the code. We've not found any major issues so far.

bmeudre · 2024-01-22T13:28:57Z

Very interesting discussion. I managed to fix it with CDK-only syntax. I hope it helps 😉

const resourceId = `endpoint/${this.endpointName}/variant/${variant.name}`;

// To define min/max values
const target = new ScalableTarget(this, 'ScalableTarget', {
  serviceNamespace: ServiceNamespace.SAGEMAKER,
  minCapacity: variant.autoScale.minCapacity,
  maxCapacity: variant.autoScale.maxCapacity,
  scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
  resourceId,
});

// We need the endpoint before creating the autoscaling policy
target.node.addDependency(endpoint);

const scalingPolicy = new CfnScalingPolicy(this, 'ScalingPolicy', {
  policyName: resourceId,
  scalingTargetId: target.scalableTargetId,
  policyType: 'TargetTrackingScaling',
  targetTrackingScalingPolicyConfiguration: {
    targetValue: variant.autoScale.targetProcessingTime,
  },
});

// CDK doesn't support math expression in target tracking, adding it in cloudformation manually
scalingPolicy.addPropertyOverride(
  'TargetTrackingScalingPolicyConfiguration.CustomizedMetricSpecification',
  {
    Metrics: [
      {
        Id: 'm1',
        ReturnData: false,
        MetricStat: {
          Stat: 'Average',
          Metric: {
            MetricName: 'TotalProcessingTime',
            Namespace: 'AWS/SageMaker',
            Dimensions: [
              {
                Name: 'EndpointName',
                Value: this.endpointName,
              },
              {
                Name: 'VariantName',
                Value: variant.name,
              },
            ],
          },
        },
      },
      {
        Id: 'm2',
        ReturnData: true,
        Expression: 'FILL(m1, 0)',
      },
    ],
  }
);

mostafafarzaneh added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 8, 2022

github-actions bot added the @aws-cdk/aws-applicationautoscaling Related to AWS Application Auto Scaling label Jun 8, 2022

github-actions bot assigned comcalvi Jun 8, 2022

peterwoodworth added p2 needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 9, 2022

comcalvi removed their assignment Jun 20, 2022
