From 2d964f6c054b4a9eb8ca175f44b3b8e75c01708e Mon Sep 17 00:00:00 2001 From: Dale McDiarmid Date: Thu, 30 Jan 2025 20:47:13 +0000 Subject: [PATCH 1/2] Update byoc.md --- docs/en/cloud/reference/byoc.md | 122 ++++++++++++++++---------------- 1 file changed, 61 insertions(+), 61 deletions(-) diff --git a/docs/en/cloud/reference/byoc.md b/docs/en/cloud/reference/byoc.md index 5bbabe96919..d75a5e7bf24 100644 --- a/docs/en/cloud/reference/byoc.md +++ b/docs/en/cloud/reference/byoc.md @@ -5,9 +5,12 @@ sidebar_label: BYOC (Bring Your Own Cloud) keywords: [byoc, cloud, bring your own cloud] description: Deploy ClickHouse on your own cloud infrastructure --- +import BetaBadge from '@theme/badges/BetaBadge'; ## Overview + + BYOC (Bring Your Own Cloud) allows you to deploy ClickHouse Cloud on your own cloud infrastructure. This is useful if you have specific requirements or constraints that prevent you from using the ClickHouse Cloud managed service. **BYOC is currently in Beta. If you would like access, please contact [support](https://clickhouse.com/support/program).** Refer to our [Terms of Service](https://clickhouse.com/legal/agreements/terms-of-service) for additional information. @@ -21,7 +24,7 @@ BYOC is designed specifically for large-scale deployments. ## Glossary - **ClickHouse VPC:** The VPC owned by ClickHouse Cloud. -- **Customer BYOC VPC:** The VPC owned by the customer cloud account, provisioned and managed by ClickHouse Cloud and is dedicated for a ClickHouse Cloud BYOC deployment. +- **Customer BYOC VPC:** The VPC, owned by the customer’s cloud account, is provisioned and managed by ClickHouse Cloud and dedicated to a ClickHouse Cloud BYOC deployment. - **Customer VPC** Other VPCs owned by the customer cloud account used for applications that need to connect to the Customer BYOC VPC. 
## Architecture @@ -44,51 +47,42 @@ During the Beta, initiate the onboarding process by reaching out to ClickHouse [ ### Prepare a Dedicated AWS Account -Customers need to prepare a dedicated AWS account to host ClickHouse BYOC deployment. This is for better isolation purpose. -With that and the email of the initial organization admin user, you can reach out to ClickHouse support. +Customers must prepare a dedicated AWS account for hosting the ClickHouse BYOC deployment to ensure better isolation. With this and the initial organization admin’s email, you can contact ClickHouse support. ### Apply CloudFormation Template -BYOC setup is initialized through a [CloudFormation stack](https://s3.us-east-2.amazonaws.com/clickhouse-public-resources.clickhouse.cloud/cf-templates/byoc.yaml). This CloudFormation stack only creates a role -to allow BYOC controllers from ClickHouse Cloud to set up and manage infrastructure. -The S3, VPC, and compute resources used to run ClickHouse are not part of the CloudFormation stack. +BYOC setup is initialized via a [CloudFormation stack](https://s3.us-east-2.amazonaws.com/clickhouse-public-resources.clickhouse.cloud/cf-templates/byoc.yaml), which creates only a role allowing BYOC controllers from ClickHouse Cloud to manage infrastructure. The S3, VPC, and compute resources for running ClickHouse are not included in this stack. ### Setup BYOC Infrastructure -After the CloudFormation stack is created, you will be prompted to create the infrastructure, including S3, VPC and EKS cluster -from the cloud console. A few things you need to determine, as they cannot be changed after setup. +After creating the CloudFormation stack, you will be prompted to set up the infrastructure, including S3, VPC, and the EKS cluster, from the cloud console. Certain configurations must be determined at this stage, as they cannot be changed later. 
Specifically: - **The region you want to use**, you can choose one of any [public regions](clickhouse.com/docs/en/cloud/reference/supported-regions) we have for ClickHouse Cloud. -- **The VPC CIDR range for BYOC**, by default we use `10.0.0.0/16` for BYOC VPC CIDR range. You might want to use -VPC peering with your own VPC in another account, peering VPCs' CIDR range cannot overlap. Therefore you need -to allocate a proper VPC CIDR range for BYOC. We require a CIDR range with at least `/22` size to host necessary -workloads. -- **Availability Zones for BYOC VPC**, if you plan to use VPC peering later, align the same availability zones -between source account and BYOC account can help reduce cross-az traffic cost. Please note that, in AWS, -availability zone suffix (`a, b, c`) could represent different underlying physical zone id in different -account. See [AWS guide](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-consistent-availability-zones-in-vpcs-across-different-aws-accounts.html) for more information. - - -### Optional, Setup VPC Peering +- **The VPC CIDR range for BYOC**: By default, we use `10.0.0.0/16` for the BYOC VPC CIDR range. If you plan to use VPC peering with another account, ensure the CIDR ranges do not overlap. Allocate a proper CIDR range for BYOC, with a minimum size of `/22` to accommodate necessary workloads. +- **Availability Zones for BYOC VPC**: If you plan to use VPC peering, aligning availability zones between the source and BYOC accounts can help reduce cross-AZ traffic costs. In AWS, availability zone suffixes (`a, b, c`) may represent different physical zone IDs across accounts. See the [AWS guide](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-consistent-availability-zones-in-vpcs-across-different-aws-accounts.html) for details. 
-To create and delete VPC peering against ClickHouse BYOC, please submit ticket with the following information +### Optional: Setup VPC Peering -- ClickHouse BYOC name you want create VPC peering for. -- The VPC id (`vpc-xxxxxx`) to peer with BYOC VPC. -- CIDR range of this VPC. -- AWS account that owns this peering VPC. -- AWS region that this VPC belogns to. +To create or delete VPC peering for ClickHouse BYOC, submit a ticket with the following details: -Once support ticket is received and being processed, you need to do a few things in your AWS account to finish the peering setup: +- ClickHouse BYOC name for the VPC peering request. +- VPC ID (`vpc-xxxxxx`) to peer with the BYOC VPC. +- CIDR range of the VPC. +- AWS account owning the peering VPC. +- AWS region of the VPC. -1. You will receive a VPC peering request in the AWS account of the peered VPC and it needs to be accepted. Please navigate to **VPC -> Peering connections -> Actions -> Accept request**. +Once the support ticket is received and processed, you will need to complete a few steps in your AWS account to finalize the peering setup: +1. Accept the VPC peering request in the AWS account of the peered VPC. + - Navigate to **VPC -> Peering connections -> Actions -> Accept request**. -2. Adjust the route table for the peered VPCs. Find the subnet in the peered VPC that needs to connect to ClickHouse instance. Edit the route table of the subnet, add one route with the following configuration: -- Destination: ClickHouse BYOC VPC CIDR (e.g. 10.0.0.0/16) -- Target: Peering Connection, pcx-12345678 (The actual ID will pop up in the dropdown list) +2. Adjust the route table for the peered VPC: + - Locate the subnet in the peered VPC that needs to connect to the ClickHouse instance. 
+ - Edit the subnet's route table and add a route with the following configuration: + - **Destination**: ClickHouse BYOC VPC CIDR (e.g., `10.0.0.0/16`) + - **Target**: Peering Connection (`pcx-12345678`, the actual ID will appear in the dropdown list)
@@ -100,64 +94,70 @@ Once support ticket is received and being processed, you need to do a few things
-3. Check existing security groups and make sure there is no rule blocking the access of the BYOC VPC. +3. Check existing security groups and ensure no rules block access to the BYOC VPC. The ClickHouse service should now be accessible from the peered VPC. -To access the ClickHouse service privately, a private load balancer and endpoint is provisioned for the user to connect privately from the user's peer VPC. The endpoint is similar to the public endpoint with a `-private` suffix. For example, -if the public endpoint is `h5ju65kv87.mhp0y4dmph.us-west-2.aws.byoc.clickhouse.cloud`, then the private endpoint will be `h5ju65kv87-private.mhp0y4dmph.us-west-2.aws.byoc.clickhouse.cloud`. +To access ClickHouse privately, a private load balancer and endpoint are provisioned for secure connectivity from the user's peered VPC. The private endpoint follows the public endpoint format with a `-private` suffix. For example: +- **Public endpoint**: `h5ju65kv87.mhp0y4dmph.us-west-2.aws.byoc.clickhouse.cloud` +- **Private endpoint**: `h5ju65kv87-private.mhp0y4dmph.us-west-2.aws.byoc.clickhouse.cloud` -4. Optional, after verifying peering is working, you can request to remove the public load balancer for ClickHouse BYOC. +4. (Optional) After verifying that peering is working, you can request the removal of the public load balancer for ClickHouse BYOC. ## Upgrade Process We regularly upgrade the software, including ClickHouse database version upgrades, ClickHouse Operator, EKS, and other components. -While we try to make upgrades as seamless as possible (e.g., rolling upgrades and restarts), certain upgrades, such as ClickHouse version changes and EKS node upgrades, might still impact service. In such cases, customers can specify a maintenance window (e.g., every Tuesday at 1:00 a.m. PDT). We ensure that such upgrades are only performed during the scheduled maintenance window. 
+While we aim for seamless upgrades (e.g., rolling upgrades and restarts), some, such as ClickHouse version changes and EKS node upgrades, may impact service. Customers can specify a maintenance window (e.g., every Tuesday at 1:00 a.m. PDT), ensuring such upgrades occur only during the scheduled time. -Note that the maintenance windows do not apply for security and vulnerability fixes. These will be handled as off-cycle upgrades, and we will communicate with customers promptly to take necessary actions and coordinate a suitable time for the upgrade to minimize the impact on operations. +:::note +Maintenance windows do not apply to security and vulnerability fixes. These are handled as off-cycle upgrades, with timely communication to coordinate a suitable time and minimize operational impact. +::: ## CloudFormation IAM Roles ### Bootstrap IAM role -The bootstrap IAM role has these permissions: +The bootstrap IAM role has the following permissions: -- EC2 and VPC operations are needed for setting up VPC and EKS clusters. -- S3 operations such as `s3:CreateBucket` are needed for setting up buckets for ClickHouse BYOC storage. -- `route53:*` is needed for external DNS to set up the records in route53. -- IAM related operations such as `iam:CreatePolicy` are needed for controllers to create additional roles. See the next section for details. -- eks:xx operation limited to resources that start with the `clickhouse-cloud` prefix. +- **EC2 and VPC operations**: Required for setting up VPC and EKS clusters. +- **S3 operations (e.g., `s3:CreateBucket`)**: Needed to create buckets for ClickHouse BYOC storage. +- **`route53:*` permissions**: Required for external DNS to configure records in Route 53. +- **IAM operations (e.g., `iam:CreatePolicy`)**: Needed for controllers to create additional roles (see the next section for details). +- **EKS operations**: Limited to resources with names starting with the `clickhouse-cloud` prefix. 
### Additional IAM roles created by the controller -Besides the `ClickHouseManagementRole` created through CloudFormation, the controller will also create a few roles. - -These roles are meant to be assumed by applications running within the customer EKS cluster. -- **State exporter role** - - ClickHouse component to report service health information back to ClickHouse Cloud. - - Requires permission to write to SQS owned by ClickHouse Cloud -- **Load-balancer-controller** - - Standard AWS load balancer controller - - EBS CSI Controller, to manage volumes needed by ClickHouse services -- **External-dns**, to propagate the DNS config to route53 -- **Cert-manager** to provision TLS cert for BYOC services domains -- **Cluster autoscaler**, to scale the node group accordingly +In addition to the `ClickHouseManagementRole` created via CloudFormation, the controller will create several additional roles. + +These roles are assumed by applications running within the customer's EKS cluster: +- **State Exporter Role** + - ClickHouse component that reports service health information to ClickHouse Cloud. + - Requires permission to write to an SQS queue owned by ClickHouse Cloud. +- **Load-Balancer Controller** + - Standard AWS load balancer controller. + - EBS CSI Controller to manage volumes for ClickHouse services. +- **External-DNS** + - Propagates DNS configurations to Route 53. +- **Cert-Manager** + - Provisions TLS certificates for BYOC service domains. +- **Cluster Autoscaler** + - Adjusts the node group size as needed. **K8s-control-plane** and **k8s-worker** roles are meant to be assumed by AWS EKS services. -Lastly, **data-plane-mgmt** is to allow a ClickHouse Cloud Control Plane component to reconcile necessary custom resources such as `ClickHouseCluster` and the Istio Virtual Service/Gateway. 
+Lastly, **`data-plane-mgmt`** allows a ClickHouse Cloud Control Plane component to reconcile necessary custom resources, such as `ClickHouseCluster` and the Istio Virtual Service/Gateway. ## Network Boundaries -This section is focused on different network traffic to and from the customer BYOC VPC. +This section covers different network traffic to and from the customer BYOC VPC: -- **Inbound**: Traffic coming to the customer BYOC VPC. -- **Outbound**: Traffic originating from the customer BYOC VPC being sent to a destination outside that VPC -- **Public**: A network endpoint address available to the public internet -- **Private**: A network endpoint address only accessible privately, such as through VPC peering, VPC Private Link, and Tailscale +- **Inbound**: Traffic entering the customer BYOC VPC. +- **Outbound**: Traffic originating from the customer BYOC VPC and sent to an external destination. +- **Public**: A network endpoint accessible from the public internet. +- **Private**: A network endpoint accessible only through private connections, such as VPC peering, VPC Private Link, or Tailscale. -**Istio ingress is deployed behind an AWS NLB to accept ClickHouse client traffic** +**Istio ingress is deployed behind an AWS NLB to accept ClickHouse client traffic.** *Inbound, Public (can be Private)* From 17815eecb61b9f8e264340dc336adb5a769c65fd Mon Sep 17 00:00:00 2001 From: Dale McDiarmid Date: Thu, 30 Jan 2025 20:56:11 +0000 Subject: [PATCH 2/2] minor changes --- docs/en/cloud/reference/byoc.md | 71 ++++++++++++++++++--------------- 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/docs/en/cloud/reference/byoc.md b/docs/en/cloud/reference/byoc.md index d75a5e7bf24..aa9cafc26f7 100644 --- a/docs/en/cloud/reference/byoc.md +++ b/docs/en/cloud/reference/byoc.md @@ -161,29 +161,33 @@ This section covers different network traffic to and from the customer BYOC VPC: *Inbound, Public (can be Private)* -The Istio ingress gateway terminates TLS. 
The certificate is provisioned by CertManager with Let's Encrypt and is stored as a secret within the EKS cluster. Traffic between Istio and ClickHouse is [encrypted by AWS](https://docs.aws.amazon.com/whitepapers/latest/logical-separation/encrypting-data-at-rest-and--in-transit.html#:~:text=All%20network%20traffic%20between%20AWS,supported%20Amazon%20EC2%20instance%20types) as they are in the same VPC. -By default, ingress is available to the public internet with IP allow list filtering. The customer has the option to set up VPC peering to make it private and disable public connections. We highly recommend you configure an [IP filter](/en/cloud/security/setting-ip-filters) to restrict access. +The Istio ingress gateway terminates TLS. The certificate, provisioned by CertManager with Let's Encrypt, is stored as a secret within the EKS cluster. Traffic between Istio and ClickHouse is [encrypted by AWS](https://docs.aws.amazon.com/whitepapers/latest/logical-separation/encrypting-data-at-rest-and--in-transit.html#:~:text=All%20network%20traffic%20between%20AWS,supported%20Amazon%20EC2%20instance%20types) since they reside in the same VPC. -**Troubleshooting access** +By default, ingress is publicly accessible with IP allow list filtering. Customers can configure VPC peering to make it private and disable public connections. We highly recommend setting up an [IP filter](/en/cloud/security/setting-ip-filters) to restrict access. -ClickHouse Cloud engineers require troubleshooting access via Tailscale. They will be provisioned with just-in-time certificate-based authentication to BYOC deployments. +### Troubleshooting access *Inbound, Public (can be Private)* -**Billing scraper** +ClickHouse Cloud engineers require troubleshooting access via Tailscale. They are provisioned with just-in-time certificate-based authentication for BYOC deployments. 
+ +### Billing scraper *Outbound, Private* -The Billing scraper gathers billing data from ClickHouse and sends it to an S3 bucket owned by ClickHouse Cloud. -The scraper is a component that acts as a sidecar next to the ClickHouse server container. It periodically scrapes CPU and memory metrics from ClickHouse. Requests to the same region will be done via VPC gateway service endpoints. +The Billing scraper collects billing data from ClickHouse and sends it to an S3 bucket owned by ClickHouse Cloud. + +It runs as a sidecar alongside the ClickHouse server container, periodically scraping CPU and memory metrics. Requests within the same region are routed through VPC gateway service endpoints. -**Alerts** +### Alerts *Outbound, Public* -AlertManager is configured to fire alerts to ClickHouse Cloud when the customer ClickHouse cluster is not healthy. Metrics and logs are stored within the customer's BYOC VPC. Logs are currently stored in locally in EBS. In a future update, logs will be stored in LogHouse, which is a ClickHouse service in the customer's BYOC VPC. Metrics are implemented via a Prometheus and Thanos stack stored locally in the customer's BYOC VPC. +AlertManager is configured to send alerts to ClickHouse Cloud when the customer's ClickHouse cluster is unhealthy. + +Metrics and logs are stored within the customer's BYOC VPC. Logs are currently stored locally in EBS. In a future update, they will be stored in LogHouse, a ClickHouse service within the BYOC VPC. Metrics use a Prometheus and Thanos stack, stored locally in the BYOC VPC. 
-**Service state** +### Service state *Outbound* @@ -193,22 +197,24 @@ State Exporter sends ClickHouse service state information to an SQS owned by Cli ### Supported features -- SharedMergeTree: ClickHouse Cloud and BYOC use the same binary and configuration -- Console access for managing service state - - Operations supported include start, stop and terminate - - View services and status -- Backup and restore -- Manual vertical and horizontal scaling -- Runtime security monitoring and alerting via Falco (falco-metrics) -- Zero Trust Network via Tailscale -- Monitoring: The Cloud console comes with built-in health dashboards to allow users to monitor service health -- Prometheus scraping for users choosing to monitor using a centralized dashboard. We support Prometheus, Grafana and Datadog today. Refer to the [Prometheus documentation](/en/integrations/prometheus) for detailed instructions on setup -- VPC Peering -- Integrations listed on [this page](/en/integrations) -- Secure S3 -- [AWS PrivateLink](https://aws.amazon.com/privatelink/) +- **SharedMergeTree**: ClickHouse Cloud and BYOC use the same binary and configuration. +- **Console access for managing service state**: + - Supports operations such as start, stop, and terminate. + - View services and status. +- **Backup and restore.** +- **Manual vertical and horizontal scaling.** +- **Runtime security monitoring and alerting via Falco (`falco-metrics`).** +- **Zero Trust Network via Tailscale.** +- **Monitoring**: + - The Cloud console includes built-in health dashboards for monitoring service health. + - Prometheus scraping for centralized monitoring with Prometheus, Grafana, and Datadog. See the [Prometheus documentation](/en/integrations/prometheus) for setup instructions. +- **VPC Peering.** +- **Integrations**: See the full list on [this page](/en/integrations). 
+- **Secure S3.** +- **[AWS PrivateLink](https://aws.amazon.com/privatelink/).** ### Planned features (currently unsupported) + - [AWS KMS](https://aws.amazon.com/kms/) aka CMEK (customer-managed encryption keys) - ClickPipes for ingest - Autoscaling @@ -219,15 +225,15 @@ State Exporter sends ClickHouse service state information to an SQS owned by Cli ### Compute -**Can I create multiple services in this single EKS cluster? ** +#### Can I create multiple services in this single EKS cluster? Yes. The infrastructure only needs to be provisioned once for every AWS account and region combination. -**Which regions do you support for BYOC?** +#### Which regions do you support for BYOC? BYOC supports the same set of [regions](/en/cloud/reference/supported-regions#aws-regions ) as ClickHouse Cloud. -**Will there be some resource overhead? What are the resources needed to run services other than ClickHouse instances?** +#### Will there be some resource overhead? What are the resources needed to run services other than ClickHouse instances? Besides Clickhouse instances (ClickHouse servers and ClickHouse Keeper), we run services such as clickhouse-operator, aws-cluster-autoscaler, Istio etc. and our monitoring stack. @@ -235,27 +241,26 @@ Currently we have 3 m5.xlarge nodes (one for each AZ) in a dedicated node group ### Network and Security -**Can we revoke permissions set up during installation after setup is complete?** +#### Can we revoke permissions set up during installation after setup is complete? This is currently not possible. -**Have you considered some future security controls for ClickHouse engineers to access customer infra for troubleshooting?** +#### Have you considered some future security controls for ClickHouse engineers to access customer infra for troubleshooting? Yes. Implementing a customer controlled mechanism where customers can approve engineers' access to the cluster is on our roadmap. 
At the moment, engineers must go through our internal escalation process to gain just-in-time access to the cluster. This is logged and audited by our security team. -**What is the size of the VPC IP range created?** +#### What is the size of the VPC IP range created? By default we use `10.0.0.0/16` for BYOC VPC. We recommend reserving at least /22 for potential future scaling, but if you prefer to limit the size, it is possible to use /23 if it is likely that you will be limited to 30 server pods. -**Can I decide maintenance frequency?** +#### Can I decide maintenance frequency? Contact support to schedule maintenance windows. Please expect a minimum of a weekly update schedule. ## Observability - ### Built-in Monitoring Tools #### Observability Dashboard @@ -324,7 +329,7 @@ ClickHouse_CustomMetric_TotalNumberOfErrors{hostname="c-jet-ax-16-server-43d5baj **Authentication** -A ClickHouse username and password pair can be used for authentication. We recommend creating a dedicated user with minimal permissions for scraping metrics. A minimum, a `READ` permission is required on the `system.custom_metrics` table across replicas. For example: +A ClickHouse username and password pair can be used for authentication. We recommend creating a dedicated user with minimal permissions for scraping metrics. At minimum, a `READ` permission is required on the `system.custom_metrics` table across replicas. For example: ```sql GRANT REMOTE ON *.* TO scraping_user