
Add a new example: Redis Cluster Stateful MIG #1282

Open · wants to merge 2 commits into base: main

Conversation

@shunjikawabata (Contributor):

This example deploys high-availability Redis clusters on Compute Engine. It assumes that you're familiar with the reference architecture in High availability stateful clusters on Compute Engine.

@shunjikawabata (Contributor, Author):

/gcbrun

@yashwantmahawar:

/gcbrun

4. Clone the Git repository, which contains the scripts and the manifest files to deploy and configure the example workload:

```
git clone https://github.com/GoogleCloudPlatform/XXX
```
A Contributor commented:

Perhaps include the name of this GitHub repository and not XXX.

2. Enable the APIs required for the Packer build:

```
gcloud services enable sourcerepo.googleapis.com compute.googleapis.com \
  servicemanagement.googleapis.com storage-api.googleapis.com \
  cloudbuild.googleapis.com artifactregistry.googleapis.com
```
A Contributor commented:

we really need sourcerepo?
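
If Source Repositories turns out to be unnecessary, a trimmed sketch of the same command might look like this (assuming the Packer build only clones from public GitHub):

```
gcloud services enable compute.googleapis.com servicemanagement.googleapis.com \
  storage-api.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com
```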

```
PROJECT="your-project-id"
REGION="us-central1"

git clone https://github.com/GoogleCloudPlatform/cloud-builders-community.git
```
A Contributor commented:

You seem to have committed the whole cloud-builders-community repo in this PR; you should remove it.
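
One way to drop the vendored copy from the branch while keeping it on disk might be the following (the path is assumed to match the directory cloned above):

```
# Untrack the vendored repo without deleting the local files.
git rm -r --cached cloud-builders-community
# Keep it from being re-added accidentally.
echo "cloud-builders-community/" >> .gitignore
git commit -m "Remove vendored cloud-builders-community"
```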

5. Create a Consul image using Packer and deploy the cluster

```
PROJECT="your-project-id"
```
A Contributor commented:

no need to redefine these environment variables again. right? thanks
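
For reference, a minimal sketch of the Packer builder step this section relies on, reusing the PROJECT variable set earlier (the usual cloud-builders-community flow; the exact path and build config are assumptions):

```
# Build the community Packer builder image once per project via Cloud Build.
cd cloud-builders-community/packer
gcloud builds submit --project "$PROJECT" .
```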


6. Deploy the Consul cluster
Update `consul-cluster/terraform.tfvars` with your project ID, region, and the Consul source image you created above. To check the image name, run:
A Contributor commented:

It's really helpful if you could include some commands that do this; users often just run commands and don't read the description.
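
For instance, a command along these lines could surface the most recent image (assuming the Packer template names images with a consul prefix):

```
# List the newest matching custom image in the project.
gcloud compute images list --project "$PROJECT" \
  --filter="name~consul" --sort-by=~creationTimestamp --limit=1
```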

@@ -0,0 +1,20 @@
<!--
A Contributor commented:

this whole directory shouldn't be committed.


# Update this to an IP from the internal subnet range.
static_ip_address = var.consul_static_ip_address
static_ip_name = var.consul_static_ip_name
@bmenasha (Contributor) commented on May 14, 2024:

static_ip_name doesn't seem to be used

instance_group_id = var.consul_instance_group_id
network = var.consul_network
subnet = var.consul_subnetwork
project_id = var.project_id
A Contributor commented:

you need to supply the region with value var.region

apt-get install consul -y
cat > /etc/consul.d/redis-master-role.sh <<EOF
#!/bin/bash
if [[ master == \$(redis-cli -h $my_ip info replication | tr -d '\r' | grep role: | cut -d : -f2 ) ]]; then echo 'matched'; else slave; fi
@bmenasha (Contributor) commented on May 14, 2024:

this should be "else exit -1"; otherwise it exits with "command slave not found", which is odd. Still a non-zero exit status, but not as clean.
thanks
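
A minimal sketch of the health-check script with that fix applied (exit 1 rather than -1, since bash reports exit statuses modulo 256 anyway; $my_ip is assumed to be set by the provisioning script, as in the original heredoc):

```
#!/bin/bash
# Succeed only when this node reports the master role; fail the check otherwise.
if [[ master == $(redis-cli -h $my_ip info replication | tr -d '\r' | grep role: | cut -d : -f2) ]]; then
  echo 'matched'
else
  exit 1
fi
```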

}

# Create the Instance Template that will be used to populate the Managed Instance Group.
resource "google_compute_instance_template" "consul_server" {
A Contributor commented:

the consul cluster doesn't work for me because dnsmasq is binding to localhost and not to 0.0.0.0/all interfaces. i had to edit /etc/dnsmasq.d/10-consul and delete the line

listen-address=127.0.0.1

and that wasn't the only problem. i'm actually not sure why consul isn't responding to DNS requests sent to the forwarding rule ip, but it isn't.

this is the last review comment i'm making at the moment. but hopefully you can figure out the issue ?

thanks!

A Contributor commented:

dnsmasq seems to respond only when listen-address contains the forwarding rule ip?

listen-address=10.128.0.32

i couldn't get it to work otherwise.
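
If the localhost-only bind is indeed the culprit, a hypothetical startup-script workaround could look like this (file path from the comments above; the forwarding rule IP is illustrative):

```
# Drop the localhost-only bind so dnsmasq can answer on other interfaces...
sudo sed -i '/^listen-address=127\.0\.0\.1$/d' /etc/dnsmasq.d/10-consul
# ...and pin the forwarding rule IP, which is what worked per the comment above.
echo 'listen-address=10.128.0.32' | sudo tee -a /etc/dnsmasq.d/10-consul
sudo systemctl restart dnsmasq
```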


![Redis Architecture](./assets/redis_architecture.png "Redis Architecture")

- At minimum, one set of 3 nodes (a master node and its replicas) is deployed to ensure high availability. If multiple sets representing shards are deployed, each set's master should reside in a distinct zone. This provides resilience against single-VM failures and zonal failures.
@bmenasha (Contributor) commented on May 15, 2024:

What's the purpose of having one DNS name for all primary nodes and another for all replica nodes?

We are starting 6 nodes. And we configure the cluster with 1 replica for each primary. This means there are 3 shards each containing a replica and primary.

Yet we only have two DNS names. How does this work?

Consul isn't shard aware, and even if it were, how would the client know which shard a key is in?

To use Redis, I'll want to get or set a key "X"; which server should I talk to that holds the shard containing this key?
Consul will return 3 hosts as the primary, but only one of those nodes holds the shard for key X, so how do I know which one to use? Consul isn't telling me because it doesn't know.

I expect the only way to properly communicate with a Redis Cluster is to use the Redis Cluster protocol. And if we are doing that, what's the point of having different DNS names for the primary and replica nodes when all service discovery is done via the cluster protocol? Then it doesn't matter whether we initially connect to a replica or a master, or which shard we connect to.

Thanks

@shunjikawabata (Contributor, Author) replied:

Yes, relying on the Redis Cluster protocol or a smart client/proxy is usually the most effective way to handle service discovery and shard routing in a Redis Cluster environment. The primary reason for having separate DNS names for primaries and replicas in this example is to showcase how the reference architecture in High availability stateful clusters on Compute Engine works with real stateful applications like a Redis Cluster.
Although there are tradeoffs, this architecture would be applicable to a massive-scale Redis Cluster that requires manual sharding, meaning multiple sets of Redis Clusters, each with a primary and replicas serving a subset of the entire key range, for better performance. While less common, this approach can also offer some basic load balancing for read operations, especially in scenarios where you might want to direct some read traffic to replicas to offload the primaries.
These things should probably be clarified in the README; I'll update it.

@bmenasha (Contributor) commented on May 16, 2024:

> relying on the Redis Cluster protocol or a smart client/proxy is usually the most effective way to handle service discovery and shard routing in a Redis Cluster environment.

It's not a matter of efficiency, i just don't see how it works at all.

> this architecture would be applicable to a massive-scale Redis Cluster that requires manual sharding, meaning multiple sets of Redis Clusters, each with a primary and replicas serving a subset of the entire key range, for better performance.

Once you determine the cluster to talk to, how do you choose the node within that cluster using this example? I can ONLY understand how this Consul/DNS-based routing between primary and replica might work when the cluster has a single shard. If so, that is a big limitation, something we should document and explain, and then of course change the example to provision only a single shard.

@shunjikawabata (Contributor, Author) replied:

It will rely on the Redis Cluster protocol once the client has resolved the IP through DNS/Consul, and further communication will happen directly between the client and the node. Perhaps adding an example client with Redis configurations would help readers understand how it works?

A Contributor commented:

thanks. i do think it will be helpful to show how clients connect. maybe just using redis-cli.
if Redis Cluster protocol is used, what value is there in having two DNS names?
thanks
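
As a sketch of what that could look like with redis-cli (the DNS name below is hypothetical; -c enables cluster mode, so the client follows MOVED/ASK redirects to the correct shard regardless of which node the DNS answer points at):

```
# Resolve any node via Consul DNS, then let the cluster protocol route the key.
redis-cli -c -h redis-primary.service.consul -p 6379 set mykey hello
redis-cli -c -h redis-primary.service.consul -p 6379 get mykey
```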

@agold-rh (Contributor):

/gcbrun

@agold-rh (Contributor):

/gcbrun

@agold-rh (Contributor):

@shunjikawabata If you can address the comments by @bmenasha, I can get this merged.

@shunjikawabata (Contributor, Author):

Thanks @agold-rh and @bmenasha for the comments/suggestions. We'll take a look and try fixing them.
cc: @yashwantmahawar
