
(aws-elasticache): unable to make simple changes to redis cache without encountering failover error #13389

Open
nicksbrandon opened this issue Mar 4, 2021 · 10 comments
Labels
@aws-cdk/aws-elasticache Related to Amazon ElastiCache bug This issue is a bug. effort/small Small work item – less than a day of effort p2 service-api This issue is due to a problem in a service API

Comments

@nicksbrandon

We are unable to make simple changes via CDK, for example editing billing tags on Redis cache nodes, when the same change is possible via the AWS Console and CLI. To allow easy horizontal scaling of our ElastiCache Redis clusters (i.e. by adding more shards), we have cluster mode enabled. As we tolerate cold caches and the performance of a single node per shard is sufficient, we do not deploy read replicas. When we attempt to change the billing tags via CDK we receive the error “Replication group must have at least one read replica to enable autofailover”.

Reproduction Steps

Here is the test code (I have obscured subnets etc. in the sample):

import { Stack, StackProps, Construct, Tags } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";
 
export class RedisTestStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
 
    // Create Redis cache cluster
    const redisCluster = new CfnReplicationGroup(this, "TestCluster", {
      automaticFailoverEnabled: true,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0.cluster.on",
      cacheSubnetGroupName: "xxxxxxxxx",
      engine: "redis",
      engineVersion: "5.0.6",
      numNodeGroups: 1,
      replicationGroupDescription: "Test Cluster",
      replicationGroupId: "TestCluster",
      replicasPerNodeGroup: 0,
      securityGroupIds: ["xxxxxxxx", "xxxxxxx"]
    });
 
    // Now add a tag
    Tags.of(redisCluster).add('Tag1', 'Value1');
  }
}

I can deploy the first part without issue. However, when I add the tagging code …

    // Now add a tag
    Tags.of(redisCluster).add('Tag1', 'Value1');

… and attempt to redeploy, I get an error.

I can tag the Redis cluster using the AWS CLI without any issue:

aws elasticache add-tags-to-resource --resource-name arn:aws:elasticache:eu-west-1:999999999999:cluster:testcluster-0001-001 --tags Key=tag2,Value=value2

I can also manually tag in the AWS console.

What did you expect to happen?

I expected the cache to be tagged without service interruption, as happens when using the CLI.

What actually happened?

I got the following error when I added the tagging code and attempted to redeploy.

TestCluster Replication group must have at least one read replica to enable autofailover. (Service: AmazonElastiCache; Status Code: 400; Error Code: InvalidReplicationGroupState; 
Request ID: d74aa3d1-4ef5-472b-8eca-e5fdb508a7d4; Proxy: null)

This issue is not surfaced by cdk synth or cdk diff.

Environment

I have tried CDK 1.73.0 and 1.90.1 with the same result.

Here is the package.json file from the test:

{
  "name": "redis_test",
  "version": "0.1.0",
  "bin": {
    "redis_test": "bin/redis_test.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk"
  },
  "devDependencies": {
    "@aws-cdk/assert": "1.73.0",
    "@types/jest": "^26.0.10",
    "@types/node": "10.17.27",
    "aws-cdk": "1.73.0",
    "jest": "^26.4.2",
    "ts-jest": "^26.2.0",
    "ts-node": "^8.1.0",
    "typescript": "~3.9.7"
  },
  "dependencies": {
    "@aws-cdk/aws-elasticache": "1.90.1",
    "@aws-cdk/core": "1.90.1",
    "source-map-support": "^0.5.16"
  }
}

Other

I would like to provision a Redis cache with cluster mode enabled so we can modify the number of shards later if required. That was straightforward to deploy initially as a single shard. However, when I then modified the CDK code to add billing tags to the Redis cache, the change failed to apply, complaining that it was unable to fail over (only one node in the replication group). I could resolve this by adding read replicas to each shard, but the function of this cache does not require a failover node and the additional cost is undesirable. Tagging the node outside of CDK does not require read replicas, nor does it require the node to be taken out of service.

I experimented with a Redis cache (cluster mode disabled). I can tag that cache with a subsequent modification to the CDK project. However, I cannot then add shards and, in the event I needed to scale, I would need to destroy the cache and recreate it with cluster mode enabled. This is again not ideal.
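For reference, the cluster-mode-disabled experiment described above might look roughly like the following sketch. The stack name, parameter group, subnet group, and security group IDs are illustrative placeholders, not values from the original report:

```typescript
import { Stack, StackProps, Construct } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";

export class RedisNonClusteredStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Cluster mode disabled: a single node group addressed through one
    // primary endpoint. Tags can be added in a later deployment, but
    // shards cannot be added without recreating the cache.
    new CfnReplicationGroup(this, "NonClusteredCache", {
      automaticFailoverEnabled: false,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0", // non-cluster parameter group (assumed)
      cacheSubnetGroupName: "xxxxxxxxx", // placeholder
      engine: "redis",
      engineVersion: "5.0.6",
      numCacheClusters: 1, // one primary, no replicas
      replicationGroupDescription: "Non-clustered test cache",
      replicationGroupId: "NonClusteredCache",
      securityGroupIds: ["xxxxxxxx"], // placeholder
    });
  }
}
```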

I understand that a single shard without a read replica cannot support failover, but it is not clear why the CDK deployment demands failover just to tag the cache. Within the AWS console I can easily tag the nodes without any interruption to service.

If you could please advise what CDK configuration I am missing to make this possible that would be appreciated.


This is a 🐛 Bug Report

@nicksbrandon nicksbrandon added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 4, 2021
@peterwoodworth peterwoodworth changed the title (aws-elasicache): unable to make simple changes to redis cache without encountering failover error (aws-elasticache): unable to make simple changes to redis cache without encountering failover error Mar 4, 2021
@github-actions github-actions bot added the @aws-cdk/aws-elasticache Related to Amazon ElastiCache label Mar 4, 2021
@iliapolo
Contributor

@nicksbrandon Thanks for reporting this. I can confirm the behavior you describe here.

I'm investigating this.

@iliapolo iliapolo added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Mar 11, 2021
@iliapolo
Contributor

@nicksbrandon It seems your use case surfaced a missing validation on the creation of a replication group with cluster mode enabled. The missing validation incorrectly allows the creation of a cluster without replicas but with auto-failover enabled. The service team is aware of the problem and is already looking to fix it. Until that is resolved, please make sure to add replicas when creating an auto-failover-enabled cluster.

I understand your intention was not to use replicas, but when auto-failover is enabled, this is actually a requirement.
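As a sketch of the interim workaround described above (applied to the reproduction code from the issue body; the subnet group and security group IDs remain placeholders), the replication group would gain at least one replica per shard:

```typescript
import { Stack, StackProps, Construct } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";

export class RedisWorkaroundStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Auto-failover requires at least one read replica per node group,
    // so replicasPerNodeGroup is raised from 0 to 1. This adds the cost
    // of one extra node per shard until the service-side fix lands.
    new CfnReplicationGroup(this, "TestCluster", {
      automaticFailoverEnabled: true,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0.cluster.on",
      cacheSubnetGroupName: "xxxxxxxxx", // placeholder
      engine: "redis",
      engineVersion: "5.0.6",
      numNodeGroups: 1,
      replicationGroupDescription: "Test Cluster",
      replicationGroupId: "TestCluster",
      replicasPerNodeGroup: 1, // was 0; required when auto-failover is enabled
      securityGroupIds: ["xxxxxxxx"], // placeholder
    });
  }
}
```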

@iliapolo iliapolo added effort/small Small work item – less than a day of effort p2 blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Mar 25, 2021
@iliapolo
Contributor

The use case described here actually represents an invalid state that should never have been deployable. Deployment succeeded because of a missing validation in the ElastiCache service API. Keeping this issue open so we can update and resolve it when a fix is available.

@nicksbrandon
Author

nicksbrandon commented Mar 26, 2021

@iliapolo,

Many thanks for your response. It's great to hear this has uncovered a tangential issue in ElastiCache.

To be clear on one point: we were unable to set automaticFailoverEnabled: false when Redis cluster mode is enabled, and it is important we use Redis in cluster mode so that we can easily scale horizontally. This is a use case that is fully supported by ElastiCache, so we were expecting the same from CDK.
For reference, this issue (using automaticFailoverEnabled: false) is only reported at the cdk deploy stage.

The message is

[redis name] Redis with cluster mode enabled cannot be created with auto failover turned off

@iliapolo
Contributor

@nicksbrandon That's right: when cluster mode is enabled you must set automaticFailoverEnabled to true, which in turn means you must enable replicas as well.

I'm not really sure what you mean by:

This is a use case that is fully supported by Elasticache so we were expecting the same from CDK

It seems that your desired configuration is simply not supported by ElastiCache, as evidenced by the acknowledgment of the missing validation. I think CDK (and CloudFormation) surface this problem more often because of their mode of operation, which invokes a full update request on every resource change. With a direct CLI invocation the issue may not appear, depending on exactly which parameters you pass.

But again this is all rooted in a faulty configuration that shouldn't have been allowed to be created in the first place.

@nicksbrandon
Author

@iliapolo

Thanks for your response.

To clarify how carrying out the same action manually (via the console) and via CDK leads to differing results on the same Redis cache:

Scenario

I want to add a tag to a Redis cache with a single node and no failover (automaticFailoverEnabled: false). There are two options:

  1. Manually - the node is tagged with no interruption to service. Success.
  2. Via CDK - it appears to attempt to take the node offline in order to carry out this action. Given there is no failover, the action fails.

I hope it is clear how these two approaches, to carry out the same action, yield different results. Many thanks.

@MarcFletcher

@iliapolo

What we would like to do is deploy a Redis cluster (i.e. cluster mode enabled) with N nodes and failover disabled (automaticFailoverEnabled: false), all via CDK. We are able to do this via the AWS console and can subsequently make any number of changes to such a cluster (e.g. editing billing tags) without taking the cluster offline, i.e. automatic failover is not required and replicas are not required.

It sounds like you might be saying that even this manual workflow should not be possible in the AWS console? If you could confirm or elaborate, that would be appreciated.

Regards,
Marc

@iliapolo
Contributor

iliapolo commented Apr 9, 2021

@MarcFletcher Yes, it does sound like the console should not have allowed this configuration either. I'll let @NGL321 follow up.

@iliapolo iliapolo assigned NGL321 and unassigned iliapolo Apr 9, 2021
@NGL321 NGL321 removed their assignment Sep 20, 2021
@github-actions

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 20, 2022
@MarcFletcher

Yea, this is still a known issue for us.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 20, 2022
@peterwoodworth peterwoodworth added service-api This issue is due to a problem in a service API and removed blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. labels May 11, 2023