From f5dd948ab4550c232c76eed1bf94fb4d2d6337c3 Mon Sep 17 00:00:00 2001
From: Adrien Guillo
Date: Thu, 6 Nov 2025 16:24:15 -0500
Subject: [PATCH 1/3] CLOUDPREM: Update sizing recommendations

---
 .../en/cloudprem/configure/cluster_sizing.md | 31 +++++++++----------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/content/en/cloudprem/configure/cluster_sizing.md b/content/en/cloudprem/configure/cluster_sizing.md
index 17d05bf4f1ed1..33d8e474f449e 100644
--- a/content/en/cloudprem/configure/cluster_sizing.md
+++ b/content/en/cloudprem/configure/cluster_sizing.md
@@ -26,33 +26,30 @@ These are starting recommendations. Monitor your cluster's performance and resou

 ## Indexers

-- **Performance:** To index 5 MB/s of logs, CloudPrem needs approximately 1 vCPU and 2 GB of RAM.
-- **Recommended Pod Sizes:** Datadog recommends that you deploy indexer pods with either:
-  - 2 vCPUs and 4 GB of RAM
-  - 4 vCPUs and 8 GB of RAM
-  - 8 vCPUs and 16 GB of RAM
-- **Storage:** Indexers require persistent storage (preferably SSDs, but local HDDs or remote EBS volumes can also be used) to store temporary data while constructing the index files.
-  - Minimum: 100 GB per pod
-  - Recommendation (for pods > 4 vCPUs): 200 GB per pod
-- **Example Calculation:** To index 1 TB per day (~11.6 MB/s):
-  - Required vCPUs: `(11.6 MB/s / 5 MB/s/vCPU) ≈ 2.3 vCPUs`
-  - Rounding up, you might start with one indexer pod configured with 3 vCPUs and 6 GB RAM, requiring a 100 GB EBS volume. (Adjust this configuration based on observed performance and redundancy needs.)
+- **Performance:** Indexing performance depends heavily on the characteristics of the ingest logs, such as their size, number of attributes, and level of nesting. However, we recommend using a baseline indexing throughput of **5 MB/s per vCPU** to determine your initial sizing.
+- **Memory:** We recommend 4 GB of RAM per vCPU.
+- **Recommended Pod Sizes:** Datadog recommends deploying indexer pods with at least 2 vCPUs and 8 GB of RAM.
+- **Storage:** Indexers require at least 200 GB of persistent storage (preferably local SSDs, but local HDDs or network-attached block storage volumes such as Amazon EBS, or Azure Managed Disks can also be used) to store temporary data while creating and merging index files. In addition, each indexer vCPU writes on disk at a rate of approximately 20 MB/s. For Amazon EBS volumes, this is equivalent to 320 IOPS per vCPU (assuming 64 KB IOPS).
+- **Example Calculation:** To index 1 TB of logs per day (~11.6 MB/s):
+  - Required vCPUs: `11.6 MB/s / 5 MB/s per vCPU ≈ 2.3 vCPUs`
+  - Required RAM: `2.3 vCPUs × 4 GB RAM ≈ 9 GB RAM`
+  - Adding some headroom, you could start with one indexer pod configured with 3 vCPUs, 12 GB RAM, and a 200 GB disk. Adjust these values based on observed performance and redundancy needs.

 ## Searchers

-- **Performance:** Search performance depends heavily on the workload (query complexity, concurrency, data scanned).
+- **Performance:** Search performance depends heavily on the workload (query complexity, concurrency, amount of data scanned). For instance, term queries (`status:error AND message:exception`) are usually computationally less expensive than aggregations.
 - **Rule of Thumb:** A general starting point is to provision roughly double the total number of vCPUs allocated to Indexers.
 - **Memory:** We recommend 4 GB of RAM per searcher vCPU. Provision more RAM if you expect many concurrent aggregation requests.

 ## Other services

-The following components are typically lightweight:
+We recommend allocating the following resources for these lightweight components:

-- **Control Plane:** 1 vCPU, 2 GB RAM
-- **Metastore:** 1 vCPU, 2 GB RAM
-- **Janitor:** 1 vCPU, 2 GB RAM
+- **Control Plane:** 2 vCPUs, 4 GB RAM, 1 replica
+- **Metastore:** 2 vCPUs, 4 GB RAM, 2 replicas
+- **Janitor:** 2 vCPUs, 4 GB RAM, 1 replica

-## Postgres Metastore backend
+## PostgreSQL Database

 - **Instance Size:** For most use cases, a PostgreSQL instance with 1 vCPU and 4 GB of RAM is sufficient
 - **AWS RDS Recommendation:** If using AWS RDS, the `t4g.medium` instance type is a suitable starting point

From add78c06efa2230a3c863065b46b5ed0ba355b1b Mon Sep 17 00:00:00 2001
From: Adrien Guillo
Date: Fri, 7 Nov 2025 15:36:26 -0500
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Esther Kim
---
 content/en/cloudprem/configure/cluster_sizing.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/en/cloudprem/configure/cluster_sizing.md b/content/en/cloudprem/configure/cluster_sizing.md
index 33d8e474f449e..dbd77e89a05dd 100644
--- a/content/en/cloudprem/configure/cluster_sizing.md
+++ b/content/en/cloudprem/configure/cluster_sizing.md
@@ -26,9 +26,9 @@ These are starting recommendations. Monitor your cluster's performance and resou

 ## Indexers

-- **Performance:** Indexing performance depends heavily on the characteristics of the ingest logs, such as their size, number of attributes, and level of nesting. However, we recommend using a baseline indexing throughput of **5 MB/s per vCPU** to determine your initial sizing.
-- **Memory:** We recommend 4 GB of RAM per vCPU.
-- **Recommended Pod Sizes:** Datadog recommends deploying indexer pods with at least 2 vCPUs and 8 GB of RAM.
+- **Performance:** Indexing performance depends heavily on the characteristics of the ingest logs, such as their size, number of attributes, and level of nesting. However, Datadog recommends using a baseline indexing throughput of **5 MB/s per vCPU** to determine your initial sizing.
+- **Memory:** 4 GB of RAM per vCPU.
+- **Recommended Pod Sizes:** Deploy indexer pods with at least 2 vCPUs and 8 GB of RAM.
 - **Storage:** Indexers require at least 200 GB of persistent storage (preferably local SSDs, but local HDDs or network-attached block storage volumes such as Amazon EBS, or Azure Managed Disks can also be used) to store temporary data while creating and merging index files. In addition, each indexer vCPU writes on disk at a rate of approximately 20 MB/s. For Amazon EBS volumes, this is equivalent to 320 IOPS per vCPU (assuming 64 KB IOPS).
 - **Example Calculation:** To index 1 TB of logs per day (~11.6 MB/s):
   - Required vCPUs: `11.6 MB/s / 5 MB/s per vCPU ≈ 2.3 vCPUs`
@@ -43,13 +43,13 @@ These are starting recommendations. Monitor your cluster's performance and resou

 ## Other services

-We recommend allocating the following resources for these lightweight components:
+Allocate the following resources for these lightweight components:

 - **Control Plane:** 2 vCPUs, 4 GB RAM, 1 replica
 - **Metastore:** 2 vCPUs, 4 GB RAM, 2 replicas
 - **Janitor:** 2 vCPUs, 4 GB RAM, 1 replica

-## PostgreSQL Database
+## PostgreSQL database

 - **Instance Size:** For most use cases, a PostgreSQL instance with 1 vCPU and 4 GB of RAM is sufficient
 - **AWS RDS Recommendation:** If using AWS RDS, the `t4g.medium` instance type is a suitable starting point

From 730d64935a794a9904c2f5b21abb053fda8d40ea Mon Sep 17 00:00:00 2001
From: Esther Kim
Date: Mon, 10 Nov 2025 12:30:09 -0500
Subject: [PATCH 3/3] Reformat cluster sizing recommendations (#32703)

* Reformat recommendations into tables

* Apply suggestions from code review

Co-authored-by: Adrien Guillo

---------

Co-authored-by: Adrien Guillo
---
 .../en/cloudprem/configure/cluster_sizing.md | 50 +++++++++++++------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/content/en/cloudprem/configure/cluster_sizing.md b/content/en/cloudprem/configure/cluster_sizing.md
index dbd77e89a05dd..7590da03c6c3b 100644
--- a/content/en/cloudprem/configure/cluster_sizing.md
+++ b/content/en/cloudprem/configure/cluster_sizing.md
@@ -18,36 +18,54 @@ further_reading:

 ## Overview

-This document gives recommendations on dimensioning your CloudPrem cluster components, particularly indexers and searchers.
+Proper cluster sizing ensures optimal performance, cost efficiency, and reliability for your CloudPrem deployment. Your sizing requirements depend on several factors including log ingestion volume, query patterns, and the complexity of your log data.

-
-These are starting recommendations. Monitor your cluster's performance and resource utilization closely and adjust sizing as needed.
+This guide provides baseline recommendations for dimensioning your CloudPrem cluster components—indexers, searchers, supporting services, and the PostgreSQL database.
+
+
+Use your expected daily log volume and peak ingestion rates as starting points, then monitor your cluster's performance and adjust sizing as needed.


 ## Indexers

-- **Performance:** Indexing performance depends heavily on the characteristics of the ingest logs, such as their size, number of attributes, and level of nesting. However, Datadog recommends using a baseline indexing throughput of **5 MB/s per vCPU** to determine your initial sizing.
-- **Memory:** 4 GB of RAM per vCPU.
-- **Recommended Pod Sizes:** Deploy indexer pods with at least 2 vCPUs and 8 GB of RAM.
-- **Storage:** Indexers require at least 200 GB of persistent storage (preferably local SSDs, but local HDDs or network-attached block storage volumes such as Amazon EBS, or Azure Managed Disks can also be used) to store temporary data while creating and merging index files. In addition, each indexer vCPU writes on disk at a rate of approximately 20 MB/s. For Amazon EBS volumes, this is equivalent to 320 IOPS per vCPU (assuming 64 KB IOPS).
-- **Example Calculation:** To index 1 TB of logs per day (~11.6 MB/s):
-  - Required vCPUs: `11.6 MB/s / 5 MB/s per vCPU ≈ 2.3 vCPUs`
-  - Required RAM: `2.3 vCPUs × 4 GB RAM ≈ 9 GB RAM`
-  - Adding some headroom, you could start with one indexer pod configured with 3 vCPUs, 12 GB RAM, and a 200 GB disk. Adjust these values based on observed performance and redundancy needs.
+Indexers receive logs from Datadog Agents, then process, index, and store them as index files (called _splits_) in object storage. Proper sizing is critical for maintaining ingestion throughput and ensuring your cluster can handle your log volume.
+
+| Specification | Recommendation | Notes |
+|---------------|----------------|-------|
+| **Performance** | 5 MB/s per vCPU | Baseline throughput to determine initial sizing. Actual performance depends on log characteristics (size, number of attributes, nesting level) |
+| **Memory** | 4 GB RAM per vCPU | |
+| **Minimum Pod Size** | 2 vCPUs, 8 GB RAM | Recommended minimum for indexer pods |
+| **Storage Capacity** | At least 200 GB | Required for temporary data while creating and merging index files |
+| **Storage Type** | Local SSDs (preferred) | Local HDDs or network-attached block storage (Amazon EBS, Azure Managed Disks) can also be used |
+| **Disk I/O** | ~20 MB/s per vCPU | Equivalent to 320 IOPS per vCPU for Amazon EBS (assuming 64 KB IOPS) |
+
+
+{{% collapse-content title="Example: Sizing for 1 TB of logs per day" level="h4" expanded=false %}}
+To index 1 TB of logs per day (~11.6 MB/s), follow these steps:
+
+1. **Calculate vCPUs:** `11.6 MB/s ÷ 5 MB/s per vCPU ≈ 2.3 vCPUs`
+2. **Calculate RAM:** `2.3 vCPUs × 4 GB RAM ≈ 9 GB RAM`
+3. **Add headroom:** Start with one indexer pod configured with **3 vCPUs, 12 GB RAM, and a 200 GB disk**. Adjust these values based on observed performance and redundancy needs.
+{{% /collapse-content %}}

 ## Searchers

+Searchers handle search queries from the Datadog UI, reading metadata from the Metastore and fetching data from object storage.
+
+A general starting point is to provision roughly double the total number of vCPUs allocated to Indexers.
+
 - **Performance:** Search performance depends heavily on the workload (query complexity, concurrency, amount of data scanned). For instance, term queries (`status:error AND message:exception`) are usually computationally less expensive than aggregations.
-- **Rule of Thumb:** A general starting point is to provision roughly double the total number of vCPUs allocated to Indexers.
-- **Memory:** We recommend 4 GB of RAM per searcher vCPU. Provision more RAM if you expect many concurrent aggregation requests.
+- **Memory:** 4 GB of RAM per searcher vCPU. Provision more RAM if you expect many concurrent aggregation requests.

 ## Other services

 Allocate the following resources for these lightweight components:

-- **Control Plane:** 2 vCPUs, 4 GB RAM, 1 replica
-- **Metastore:** 2 vCPUs, 4 GB RAM, 2 replicas
-- **Janitor:** 2 vCPUs, 4 GB RAM, 1 replica
+| Service | vCPUs | RAM | Replicas |
+|---------|-------|-----|----------|
+| **Control Plane** | 2 | 4 GB | 1 |
+| **Metastore** | 2 | 4 GB | 2 |
+| **Janitor** | 2 | 4 GB | 1 |

 ## PostgreSQL database
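
The sizing arithmetic in the final version of the guide (5 MB/s per indexer vCPU, 4 GB of RAM per vCPU, at least 2 vCPUs and 200 GB of disk per indexer, searchers at roughly double the indexer vCPUs) can be sketched in Python as a quick sanity check. This is an illustrative sketch only and is not part of the patch series: `SizingEstimate` and `estimate_sizing` are hypothetical names, the single-indexer-pod layout follows the guide's worked example, and rounding up to whole vCPUs is an assumed way of adding the headroom the guide mentions.

```python
import math
from dataclasses import dataclass

# Baselines quoted in the sizing guide touched by this patch series.
INDEXING_MB_PER_SEC_PER_VCPU = 5.0
RAM_GB_PER_VCPU = 4
MIN_INDEXER_VCPUS = 2
MIN_INDEXER_DISK_GB = 200
DISK_WRITE_MB_PER_SEC_PER_VCPU = 20
EBS_IOPS_PER_VCPU = 320  # guide's figure, assuming 64 KB I/O operations
SEARCHER_TO_INDEXER_VCPU_RATIO = 2  # "roughly double" rule of thumb


@dataclass(frozen=True)
class SizingEstimate:
    """Hypothetical container for a starting-point estimate."""
    ingest_mb_per_sec: float
    indexer_vcpus: int
    indexer_ram_gb: int
    indexer_disk_gb: int
    indexer_disk_write_mb_per_sec: int
    indexer_ebs_iops: int
    searcher_vcpus: int
    searcher_ram_gb: int


def estimate_sizing(tb_per_day: float) -> SizingEstimate:
    """Turn a daily log volume into starting-point resource numbers."""
    # 1 TB = 1,000,000 MB (decimal units), spread evenly over 86,400 seconds.
    ingest = tb_per_day * 1_000_000 / 86_400
    raw_vcpus = ingest / INDEXING_MB_PER_SEC_PER_VCPU
    # Assumption: round up to whole vCPUs for headroom, never below the minimum.
    indexer_vcpus = max(MIN_INDEXER_VCPUS, math.ceil(raw_vcpus))
    searcher_vcpus = SEARCHER_TO_INDEXER_VCPU_RATIO * indexer_vcpus
    return SizingEstimate(
        ingest_mb_per_sec=ingest,
        indexer_vcpus=indexer_vcpus,
        indexer_ram_gb=indexer_vcpus * RAM_GB_PER_VCPU,
        indexer_disk_gb=MIN_INDEXER_DISK_GB,
        indexer_disk_write_mb_per_sec=indexer_vcpus * DISK_WRITE_MB_PER_SEC_PER_VCPU,
        indexer_ebs_iops=indexer_vcpus * EBS_IOPS_PER_VCPU,
        searcher_vcpus=searcher_vcpus,
        searcher_ram_gb=searcher_vcpus * RAM_GB_PER_VCPU,
    )


if __name__ == "__main__":
    print(estimate_sizing(1.0))
```

For 1 TB per day this yields the same indexer starting point as the guide's example (3 vCPUs, 12 GB of RAM, a 200 GB disk). The estimate assumes a steady ingestion rate, so peak rates and redundancy needs still call for the monitoring and adjustment the guide recommends.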