
(aws-elasticache): unable to make simple changes to redis cache without encountering failover error #13389

Open
nicksbrandon opened this issue Mar 4, 2021 · 10 comments
Labels
@aws-cdk/aws-elasticache Related to Amazon ElastiCache bug This issue is a bug. effort/small Small work item – less than a day of effort p2 service-api This issue is due to a problem in a service API

Comments

@nicksbrandon

We are unable to make simple changes via CDK, for example editing billing tags on Redis cache nodes, when the same change is possible via the AWS Console and CLI. To allow easy horizontal scaling of our ElastiCache Redis clusters (i.e. by adding more shards), we have cluster mode enabled. As we tolerate cold caches and the performance of a single node per shard is sufficient, we do not deploy read replicas. When we attempt to change the billing tags via CDK we receive the error “Replication group must have at least one read replica to enable autofailover”.

Reproduction Steps

Here is the test code (I have obscured subnets etc. in the sample):

import { Stack, StackProps, Construct, Tags } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";
 
export class RedisTestStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
 
    // Create Redis cache cluster
    const redisCluster = new CfnReplicationGroup(this, "TestCluster", {
      automaticFailoverEnabled: true,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0.cluster.on",
      cacheSubnetGroupName: "xxxxxxxxx",
      engine: "redis",
      engineVersion: "5.0.6",
      numNodeGroups: 1,
      replicationGroupDescription: "Test Cluster",
      replicationGroupId: "TestCluster",
      replicasPerNodeGroup: 0,
      securityGroupIds: ["xxxxxxxx", "xxxxxxx"]
    });
 
    // Now add a tag
    Tags.of(redisCluster).add('Tag1', 'Value1');
  }
}

I can deploy the first part without issue. However, when I add the tagging code …

    // Now add a tag
    Tags.of(redisCluster).add('Tag1', 'Value1');

… and attempt to redeploy, I get an error.

I can tag the Redis cluster using the AWS CLI without any issue:

aws elasticache add-tags-to-resource --resource-name arn:aws:elasticache:eu-west-1:999999999999:cluster:testcluster-0001-001 --tags Key=tag2,Value=value2

I can also manually tag in the AWS console.

What did you expect to happen?

I expected the cache to be tagged without service interruption, as happens when using the CLI.

What actually happened?

I got the following error when I added the tagging code and attempted to redeploy.

TestCluster Replication group must have at least one read replica to enable autofailover. (Service: AmazonElastiCache; Status Code: 400; Error Code: InvalidReplicationGroupState; 
Request ID: d74aa3d1-4ef5-472b-8eca-e5fdb508a7d4; Proxy: null)

This issue is not surfaced by cdk synth or cdk diff.

Environment

I have tried CDK 1.73.0 and 1.90.1 with the same result.

Here is the package.json file from the test:

{
  "name": "redis_test",
  "version": "0.1.0",
  "bin": {
    "redis_test": "bin/redis_test.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk"
  },
  "devDependencies": {
    "@aws-cdk/assert": "1.73.0",
    "@types/jest": "^26.0.10",
    "@types/node": "10.17.27",
    "aws-cdk": "1.73.0",
    "jest": "^26.4.2",
    "ts-jest": "^26.2.0",
    "ts-node": "^8.1.0",
    "typescript": "~3.9.7"
  },
  "dependencies": {
    "@aws-cdk/aws-elasticache": "1.90.1",
    "@aws-cdk/core": "1.90.1",
    "source-map-support": "^0.5.16"
  }
}

Other

I would like to provision a Redis cache with cluster mode enabled so we can modify the number of shards later if required. That was straightforward to deploy initially as a single shard. However, when I then modified the CDK code to add billing tags to the Redis cache, the change failed to apply, complaining that it was unable to fail over (only one node in the replication group). I could resolve this by adding read replicas to each shard, but the function of this cache does not require a failover node and the additional cost is undesirable. Tagging the node outside of CDK does not require read replicas, nor does it require the node to be taken out of service.

I experimented with a Redis cache (cluster mode disabled). I can tag that cache with a subsequent modification to the CDK project. However, I cannot then add shards and, in the event I needed to scale, I would need to destroy the cache and recreate it with cluster mode enabled. This is again not ideal.
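For reference, the cluster-mode-disabled experiment described above might look roughly like the following sketch. The stack name, parameter group, subnet group, and security group IDs are illustrative placeholders, not values from the original report:

```typescript
import { Stack, StackProps, Construct } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";

export class RedisNonClusteredStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Cluster mode disabled: a single node group addressed through one
    // primary endpoint. Tags can be added in a later deployment, but
    // shards cannot be added without recreating the cache.
    new CfnReplicationGroup(this, "NonClusteredCache", {
      automaticFailoverEnabled: false,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0", // non-cluster parameter group (assumed)
      cacheSubnetGroupName: "xxxxxxxxx", // placeholder
      engine: "redis",
      engineVersion: "5.0.6",
      numCacheClusters: 1, // one primary, no replicas
      replicationGroupDescription: "Non-clustered test cache",
      replicationGroupId: "NonClusteredCache",
      securityGroupIds: ["xxxxxxxx"], // placeholder
    });
  }
}
```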

I understand that a single shard without a read replica cannot support failover, but it is not clear why the CDK deployment demands failover just to tag the cache. Within the AWS console I can easily tag the nodes without any interruption to service.

If you could please advise what CDK configuration I am missing to make this possible that would be appreciated.


This is a 🐛 Bug Report

@nicksbrandon nicksbrandon added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 4, 2021
@peterwoodworth peterwoodworth changed the title (aws-elasicache): unable to make simple changes to redis cache without encountering failover error (aws-elasticache): unable to make simple changes to redis cache without encountering failover error Mar 4, 2021
@github-actions github-actions bot added the @aws-cdk/aws-elasticache Related to Amazon ElastiCache label Mar 4, 2021
@iliapolo
Contributor

@nicksbrandon Thanks for reporting this. I can confirm the behavior you describe here.

I'm investigating this.

@iliapolo iliapolo added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Mar 11, 2021
@iliapolo
Contributor

@nicksbrandon It seems your use case surfaced a missing validation on the creation of a replication group with cluster mode enabled. The missing validation incorrectly allows the creation of a cluster without replicas but with auto-failover enabled. The service team is aware of the problem and is already looking to fix it. Until that is resolved, please make sure to add replicas when creating an auto-failover-enabled cluster.

I understand your intention was not to use replicas, but when auto-failover is enabled, this is actually a requirement.
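As a sketch of the interim workaround described above (applied to the reproduction code from the issue body; the subnet group and security group IDs remain placeholders), the replication group would gain at least one replica per shard:

```typescript
import { Stack, StackProps, Construct } from "@aws-cdk/core";
import { CfnReplicationGroup } from "@aws-cdk/aws-elasticache";

export class RedisWorkaroundStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Auto-failover requires at least one read replica per node group,
    // so replicasPerNodeGroup is raised from 0 to 1. This adds the cost
    // of one extra node per shard until the service-side fix lands.
    new CfnReplicationGroup(this, "TestCluster", {
      automaticFailoverEnabled: true,
      cacheNodeType: "cache.t2.micro",
      cacheParameterGroupName: "default.redis5.0.cluster.on",
      cacheSubnetGroupName: "xxxxxxxxx", // placeholder
      engine: "redis",
      engineVersion: "5.0.6",
      numNodeGroups: 1,
      replicationGroupDescription: "Test Cluster",
      replicationGroupId: "TestCluster",
      replicasPerNodeGroup: 1, // was 0; required when auto-failover is enabled
      securityGroupIds: ["xxxxxxxx"], // placeholder
    });
  }
}
```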

@iliapolo iliapolo added effort/small Small work item – less than a day of effort p2 blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Mar 25, 2021
@iliapolo
Contributor

The use case described here actually represents an invalid state that should never have been deployable. Deployment succeeded because of a missing validation in the ElastiCache service API. Keeping this issue open so we can update and resolve it when a fix is available.

@nicksbrandon
Author

nicksbrandon commented Mar 26, 2021

@iliapolo,

Many thanks for your response. It's great to hear this has uncovered a tangential issue in ElastiCache.

To be clear on one point: we were unable to set automaticFailoverEnabled: false when Redis cluster mode is enabled, and it is important we use Redis in cluster mode so that we can easily scale horizontally. This is a use case that is fully supported by ElastiCache, so we were expecting the same from CDK.
For reference, this issue (using automaticFailoverEnabled: false) is only reported at the cdk deploy stage.

The message is

[redis name] Redis with cluster mode enabled cannot be created with auto failover turned off

@iliapolo
Contributor

@nicksbrandon That's right: when cluster mode is enabled you must set automaticFailoverEnabled to true, which in turn means you must enable replicas as well.

I'm not really sure what you mean by:

This is a use case that is fully supported by Elasticache so we were expecting the same from CDK

It seems that your desired configuration is simply not supported by ElastiCache, as evidenced by the acknowledgment of the missing validation. I think CDK (and CloudFormation) surface this problem more often because of their mode of operation, which invokes a full update request on every resource change. With a direct CLI invocation the issue may not appear, depending on exactly which parameters you pass.

But again this is all rooted in a faulty configuration that shouldn't have been allowed to be created in the first place.

@nicksbrandon
Author

@iliapolo

Thanks for your response.

To clarify how carrying out the same action manually (via the console) and via CDK leads to differing results on the same Redis cache:

Scenario

I want to add a tag to a Redis cache with a single node and no failover (automaticFailoverEnabled: false). There are two options:

  1. Manually - the node is tagged with no interruption to service. Success.
  2. Via CDK - it appears to attempt to take the node offline in order to carry out this action. Given there is no failover, the action fails.

I hope it is clear how these two approaches, to carry out the same action, yield different results. Many thanks.

@MarcFletcher

@iliapolo

What we would like to do is deploy a Redis cluster (i.e. cluster mode enabled) with N nodes and failover disabled (automaticFailoverEnabled: false), all via CDK. We are able to do this via the AWS console and can subsequently make any number of changes to such a cluster (e.g. editing billing tags) without taking the cluster offline, i.e. automatic failover is not required and replicas are not required.

It sounds like you might be saying that even this manual workflow should not be possible in the AWS console? If you could confirm or elaborate, that would be appreciated.

Regards,
Marc

@iliapolo
Contributor

iliapolo commented Apr 9, 2021

@MarcFletcher Yes, it does sound like the console should not have allowed this configuration either. I'll let @NGL321 follow up.

@iliapolo iliapolo assigned NGL321 and unassigned iliapolo Apr 9, 2021
@NGL321 NGL321 removed their assignment Sep 20, 2021
@github-actions

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 20, 2022
@MarcFletcher

Yea, this is still a known issue for us.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 20, 2022
@peterwoodworth peterwoodworth added service-api This issue is due to a problem in a service API and removed blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. labels May 11, 2023