Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

K8s multiregion eks #7782

Merged
merged 4 commits into from
Aug 27, 2020
Merged

K8s multiregion eks #7782

merged 4 commits into from
Aug 27, 2020

Conversation

taroface
Copy link
Contributor

Fixes #4314.
Relates to cockroachdb/cockroach#51775, which adds cockroachdb-statefulset-secure-eks.yaml, dnb-lb-eks.yaml, and configmap.yaml to the multiregion repo.
Also addresses cockroachdb/cockroach#51471.

  • Draft EKS multi-region deployment (manual steps; no script)
  • Under Create inbound rules, I'm not sure if the "Inter-node and load balancer-node communication" still fits for this rule on multi-region. It seems the load balancer points at CoreDNS on port 53, and CoreDNS provides the load balancer-node communication. Is this right?
  • Since add guidance for CPU requirements on single-cluster k8s deployment #5922 appears to still be unresolved, there is no guidance here on setting a CPU request and/or limit.
  • Bob's PR implied (to me) that --cache and --max-sql-memory are now dynamically set in the cockroach start command, so I removed this from the multi- and single-region guides.
    • Also, I am not sure if the line "To avoid running out of memory when CockroachDB is not the only pod on a Kubernetes node, you must set memory limits explicitly. This is because CockroachDB does not detect the amount of memory allocated to its pod when run in Kubernetes." is still accurate after that PR. I have an instruction here to add a memory request & limit, but am not clear on whether there is still a risk of OOM, after Bob's changes, if you don't set limits.

@taroface taroface requested a review from DuskEagle July 23, 2020 23:22
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@taroface
Copy link
Contributor Author

@DuskEagle friendly ping on review!

Copy link
Member

@DuskEagle DuskEagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the load balancer points at CoreDNS on port 53, and CoreDNS provides the load balancer-node communication
CoreDNS is only providing hostname->IP address translation in the multiregion setup. All actual application traffic is still flowing directly from CRDB node A to CRDB node B.

Bob's PR implied (to me) that --cache and --max-sql-memory are now dynamically set in the cockroach start command, so I removed this from the multi- and single-region guides.
Yep, Bob's PR updated our example K8s manifests to set max-sql-memory and cache automatically.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DuskEagle and @taroface)


_includes/v20.1/prod-deployment/aws-inbound-rules.md, line 24 at r1 (raw file):

You can set your network IP by selecting "My IP" in the Source field.

#### Load balancer-health check communication

Whether or not this is needed depends on how you're deploying CRDB on AWS. If you're running the binary directly on the VMs, or running it in a container which is using host networking, then you will need this. If instead you're running on Kubernetes, and you configure a K8s service of type LoadBalancer, then you don't need this rule, as the AWS load balancer will not be directly sending health checks to the AdminUI.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 227 at r1 (raw file):

    ~~~

    Each command creates three EKS instances in a region, one for each CockroachDB node you will deploy. Note that the Kubernetes scheduler automatically assigns each instance to a different availability zone in the region.

This isn't quite accurate. It's not the Kubernetes scheduler that creates the instances in 3 different availability zones. The eksctl script creates an autoscaling group which spans the different availability zones. The autoscaling group provisions EC2 instances across its AZs. So in a three-node cluster where each VM is in a separate AZ already, the K8s scheduler is not involved in assigning different AZs to each node.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 379 at r1 (raw file):

           cache 10
           forward . ip1 ip2 ip3 {      # <---- Modify
                force_tcp            # <---- Modify

A user following these instructions will not need to modify the force_tcp line. They'll only need to modify this line if they configured their Network Load Balancer to accept UDP traffic (which isn't covered in this doc, and probably doesn't have to be).

Copy link
Contributor Author

@taroface taroface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DuskEagle)


_includes/v20.1/prod-deployment/aws-inbound-rules.md, line 24 at r1 (raw file):

Previously, DuskEagle (Joel Kenny) wrote…

Whether or not this is needed depends on how you're deploying CRDB on AWS. If you're running the binary directly on the VMs, or running it in a container which is using host networking, then you will need this. If instead you're running on Kubernetes, and you configure a K8s service of type LoadBalancer, then you don't need this rule, as the AWS load balancer will not be directly sending health checks to the AdminUI.

Done. Thank you for this context.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 227 at r1 (raw file):

Previously, DuskEagle (Joel Kenny) wrote…

This isn't quite accurate. It's not the Kubernetes scheduler that creates the instances in 3 different availability zones. The eksctl script creates an autoscaling group which spans the different availability zones. The autoscaling group provisions EC2 instances across its AZs. So in a three-node cluster where each VM is in a separate AZ already, the K8s scheduler is not involved in assigning different AZs to each node.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 379 at r1 (raw file):

Previously, DuskEagle (Joel Kenny) wrote…

A user following these instructions will not need to modify the force_tcp line. They'll only need to modify this line if they configured their Network Load Balancer to accept UDP traffic (which isn't covered in this doc, and probably doesn't have to be).

Done.

@taroface
Copy link
Contributor Author

TFTR @DuskEagle !

@@ -1,8 +1,12 @@
CockroachDB offers a pre-built `workload` binary for Linux that includes several load generators for simulating client traffic against your cluster. This step features CockroachDB's version of the [TPC-C](http://www.tpc.org/tpcc/) workload.

{{site.data.alerts.callout_info}}
Be sure that you have [set up an inbound rule](deploy-cockroachdb-on-aws-insecure.html#step-2-configure-your-network) that allows traffic from the application to the load balancer. In this case, you will run the sample workload on one of your instances. The traffic source for your inbound rule should therefore be the **internal (private)** IP address of that instance. To find this, open the Instances section of the Amazon EC2 console and click on the instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should "Instances" be "Instances" since we are referring to a UI element? If yes, need to update other places where it appears.

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to take me a while to get through this, so I'm submitting things as I find them.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DuskEagle, @jseldess, and @taroface)


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 199 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-1>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 211 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-2>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 223 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-3>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 955 at r3 (raw file):

<section class="filter-content" markdown="1" data-scope="eks">
## Step 6. Access the Admin UI

This is the wrong language for this step.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 199 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-1>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 211 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-2>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 223 at r3 (raw file):

    --nodes 3 \
    --node-ami auto \
    --region <aws-region-3>

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 955 at r3 (raw file):

<section class="filter-content" markdown="1" data-scope="eks">
## Step 6. Access the Admin UI

This is the wrong language for this step.

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DuskEagle, @jseldess, and @taroface)


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 200 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-1>
    --vpc-cidr=<ip-range-1>

We don't use = for the other flags, so we should remove it here, too.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 212 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-2>
    --vpc-cidr=<ip-range-2>

We don't use = for the other flags, so we should remove it here, too.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 224 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-3>
    --vpc-cidr=<ip-range-3>

We don't use = for the other flags, so we should remove it here, too.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 200 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-1>
    --vpc-cidr=<ip-range-1>

We don't use = for the other flags, so we should remove it here, too.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 212 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-2>
    --vpc-cidr=<ip-range-2>

We don't use = for the other flags, so we should remove it here, too.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 224 at r3 (raw file):

    --node-ami auto \
    --region <aws-region-3>
    --vpc-cidr=<ip-range-3>

We don't use = for the other flags, so we should remove it here, too.

@jseldess
Copy link
Contributor


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 261 at r3 (raw file):

    {% include copy-clipboard.html %}
    ~~~ shell
    kubectl create namespace <cluster-namespace> --context=<cluster-context>

We use = for this flag but not for namespace. This is a nit, but I'd prefer we be consistent about this throughout.

@jseldess
Copy link
Contributor


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 296 at r3 (raw file):

### Create inbound rules

For each region, navigate to the Security Groups section of the [Amazon EC2 console](https://console.aws.amazon.com/ec2/) and locate the security group that enables communication between nodes int he cluster. It should have a name like `ClusterSharedNodeSecurityGroup`. [Add Custom TCP inbound rules](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#adding-security-group-rule) to this security group to allow TCP communication on two ports:

int he > in the

@jseldess
Copy link
Contributor


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 296 at r3 (raw file):

### Create inbound rules

For each region, navigate to the Security Groups section of the [Amazon EC2 console](https://console.aws.amazon.com/ec2/) and locate the security group that enables communication between nodes int he cluster. It should have a name like `ClusterSharedNodeSecurityGroup`. [Add Custom TCP inbound rules](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#adding-security-group-rule) to this security group to allow TCP communication on two ports:

int he > in the

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't have the time to run through the entire tutorial. I made it up to configuring coredns. It's an incredibly complex process, and the documentation is stellar. I'm going to say

:lgtm_strong:

despite not testing the entirely. Ryan has run through this many times, and I'd rather get this live and in front of users. When we get feedback, we'll address it.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DuskEagle, @jseldess, and @taroface)


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 241 at r3 (raw file):

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl config get-contexts

This is a gke command.

Copy link
Contributor Author

@taroface taroface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DuskEagle and @jseldess)


_includes/v20.1/prod-deployment/insecure-test-load-balancing.md, line 4 at r3 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

nit: Should "Instances" be "Instances" since we are referring to a UI element? If yes, need to update other places where it appears.

Instances is more a general term, not specific to UI, so this should be safe to leave without formatting.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 199 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 200 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 211 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 212 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 223 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 224 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 296 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

int he > in the

Done.


v20.1/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 955 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This is the wrong language for this step.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 199 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 200 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 211 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 212 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 223 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Need to add \ to the end of the line; otherwise, pasting into the terminal doesn't work as expected.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 224 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We don't use = for the other flags, so we should remove it here, too.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 241 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This is a gke command.

kubectl is general to Kubernetes, so is also used with EKS.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 261 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

We use = for this flag but not for namespace. This is a nit, but I'd prefer we be consistent about this throughout.

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 296 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

int he > in the

Done.


v20.2/orchestrate-cockroachdb-with-kubernetes-multi-cluster.md, line 955 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This is the wrong language for this step.

Done.

@taroface taroface merged commit b3998b7 into master Aug 27, 2020
@taroface taroface deleted the k8s-multiregion-eks branch August 27, 2020 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write multi-region EKS (Amazon) Kubernetes tutorial
5 participants