Deploy to GitHub pages
ci committed Sep 17, 2021
1 parent 8c7c944 commit 8d8c49a
Showing 6 changed files with 11 additions and 9 deletions.
5 changes: 3 additions & 2 deletions docs/index.xml
@@ -5987,7 +5987,8 @@ As new blocks with old data appear on the store as a result of conversion, they
</ul>
<hr>
<h2 id="introduction">Introduction</h2>
<p>As a part of pushing Cortex’s scaling capability at AWS, we have done performance testing with Cortex and found the compactor to be one of the main limiting factors for a higher active timeseries limit per tenant. The <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Compactor</a> documentation describes the responsibilities of a compactor, and this proposal focuses on the limitations of the current compactor architecture. In the current architecture, the compactor uses simple sharding, meaning that a single tenant is sharded to a single compactor. In addition, a compactor handles the compaction groups of a single tenant iteratively, meaning that blocks belonging to non-overlapping times are not compacted in parallel.</p>
<p>As a part of pushing Cortex’s scaling capability at AWS, we have done performance testing with Cortex and found the compactor to be one of the main limiting factors for a higher active timeseries limit per tenant. The <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Compactor</a> documentation describes the responsibilities of a compactor, and this proposal focuses on the limitations of the current compactor architecture. In the current architecture, the compactor uses simple sharding, meaning that a single tenant is sharded to a single compactor. The compactor generates compaction groups, which are groups of Prometheus TSDB blocks that can be compacted together independently of other groups. However, a compactor currently handles the compaction groups of a single tenant iteratively, meaning that blocks belonging to non-overlapping times are not compacted in parallel.</p>
<p>Cortex ingesters are responsible for uploading TSDB blocks with data emitted by a tenant. These blocks are considered level-1 blocks, as they contain duplicate timeseries for the same time interval, depending on the replication factor. <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Vertical compaction</a> merges all the blocks with the same time interval and deduplicates the samples. These merged blocks are level-2 blocks. Subsequent compactions, such as horizontal compaction, can happen, further increasing the compaction level of the blocks.</p>
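<p>To make this concrete, below is a minimal Go sketch of the grouping step behind vertical compaction: level-1 blocks covering the identical time interval are bucketed together so they can be merged and deduplicated into a single level-2 block. The <code>BlockMeta</code> type and its fields are illustrative assumptions, not Cortex’s actual metadata schema.</p>
<pre><code>package main

import "fmt"

// BlockMeta is an assumed, simplified stand-in for a TSDB block's metadata.
type BlockMeta struct {
    ID      string
    MinTime int64 // interval start, milliseconds since epoch
    MaxTime int64 // interval end
    Level   int   // 1 for ingester-uploaded blocks
}

// groupForVerticalCompaction buckets blocks by their exact time interval.
// Every bucket holding more than one block is a vertical-compaction
// candidate: its blocks are merged and replicated samples deduplicated.
func groupForVerticalCompaction(blocks []BlockMeta) map[[2]int64][]BlockMeta {
    groups := make(map[[2]int64][]BlockMeta)
    for _, b := range blocks {
        key := [2]int64{b.MinTime, b.MaxTime}
        groups[key] = append(groups[key], b)
    }
    return groups
}

func main() {
    // Three ingesters uploaded the same 2h interval (replication factor 3).
    blocks := []BlockMeta{
        {ID: "b1", MinTime: 0, MaxTime: 7200000, Level: 1},
        {ID: "b2", MinTime: 0, MaxTime: 7200000, Level: 1},
        {ID: "b3", MinTime: 0, MaxTime: 7200000, Level: 1},
    }
    for interval, g := range groupForVerticalCompaction(blocks) {
        fmt.Printf("interval %v: merge %d level-1 blocks into one level-2 block\n", interval, len(g))
    }
}
</code></pre>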
<h3 id="problem-and-requirements">Problem and Requirements</h3>
<p>Currently, a compactor is able to compact up to 20M timeseries within 2 hours for a level-2 compaction, including the time to download blocks, compact, and upload the newly compacted block. We would like to increase the timeseries limit per tenant, and compaction is one of the limiting factors. In addition, we would like to achieve the following:</p>
<ul>
@@ -6006,7 +6007,7 @@ As new blocks with old data appear on the store as a result of conversion, they
<p><img src="/images/proposals/parallel-compaction-without-scheduler.png" alt="Parallel Compaction Without Scheduler"></p>
<h2 id="scenarios">Scenarios</h2>
<h3 id="bad-block-resulting-in-non-ideal-compaction-groups">Bad block resulting in non-ideal compaction groups</h3>
<p>A Cortex operator configures the compaction block ranges. Using 2h and 6h as an example: [2h-1] [2h-2] [2h-3] [2h-4] [2h-5] [2h-6]. If the [2h-1] block is corrupted, we may compact the subsequent [2h-2] [2h-3] [2h-4] [2h-5] [2h-6] blocks. To compact into a 6-hour group, the ideal compaction is [2h-1] [2h-2] [2h-3] and [2h-4] [2h-5] [2h-6]. The Cortex planner needs to know the ideal compaction interval and prevent the compaction of [2h-2] [2h-3] [2h-4] from happening, which would result in [2h-1] not being able to be compacted into longer time interval blocks. Cortex has full information regarding all the available blocks, so we should utilize this information to achieve the best compaction interval.</p>
<p>A Cortex operator configures the compaction block ranges as 2h and 6h. If a full 6-hour block cannot be compacted due to compaction failures, the compactor should not split the group into subgroups, as this may cause suboptimal grouping of blocks. Cortex has full information regarding all the available blocks, so we should utilize this information to achieve the best compaction group possible.</p>
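<p>A minimal Go sketch of that planning rule follows, using assumed types rather than Cortex’s actual planner API: a 6h-aligned window is planned only when every 2h sub-interval is covered by a healthy block, so a corrupted block pauses its own window instead of stranding itself behind a partial [2h-2] [2h-3] compaction.</p>
<pre><code>package main

import "fmt"

const (
    subRange    = int64(2 * 3600 * 1000) // 2h in milliseconds
    targetRange = int64(6 * 3600 * 1000) // 6h in milliseconds
)

// Block is an assumed, simplified block descriptor for illustration.
type Block struct {
    ID      string
    MinTime int64
    Healthy bool // false for e.g. a corrupted block
}

// planWindow returns the blocks of one 6h-aligned window if, and only if,
// every 2h sub-interval is covered by a healthy block. An incomplete
// window yields no group at all rather than a suboptimal subgroup.
func planWindow(windowStart int64, byStart map[int64]Block) ([]Block, bool) {
    var group []Block
    for t := windowStart; t < windowStart+targetRange; t += subRange {
        b, ok := byStart[t]
        if !ok || !b.Healthy {
            return nil, false // incomplete window: do not emit a subgroup
        }
        group = append(group, b)
    }
    return group, true
}

func main() {
    byStart := map[int64]Block{
        0 * subRange: {ID: "2h-1", MinTime: 0 * subRange, Healthy: false}, // corrupted
        1 * subRange: {ID: "2h-2", MinTime: 1 * subRange, Healthy: true},
        2 * subRange: {ID: "2h-3", MinTime: 2 * subRange, Healthy: true},
        3 * subRange: {ID: "2h-4", MinTime: 3 * subRange, Healthy: true},
        4 * subRange: {ID: "2h-5", MinTime: 4 * subRange, Healthy: true},
        5 * subRange: {ID: "2h-6", MinTime: 5 * subRange, Healthy: true},
    }
    if _, ok := planWindow(0, byStart); !ok {
        fmt.Println("window [2h-1..2h-3] incomplete: wait instead of compacting [2h-2] [2h-3]")
    }
    if g, ok := planWindow(3*subRange, byStart); ok {
        fmt.Printf("window [2h-4..2h-6] ready: compact %d blocks\n", len(g))
    }
}
</code></pre>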
<h2 id="alternatives">Alternatives</h2>
<h3 id="shard-compaction-jobs-amongst-compactors-with-a-scheduler">Shard compaction jobs amongst compactors with a scheduler</h3>
<p><img src="/images/proposals/parallel-compaction-design.png" alt="Parallel Compaction Architecture"></p>
5 changes: 3 additions & 2 deletions docs/proposals/index.xml
@@ -632,7 +632,8 @@ As new blocks with old data appear on the store as a result of conversion, they
</ul>
<hr>
<h2 id="introduction">Introduction</h2>
<p>As a part of pushing Cortex’s scaling capability at AWS, we have done performance testing with Cortex and found the compactor to be one of the main limiting factors for a higher active timeseries limit per tenant. The <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Compactor</a> documentation describes the responsibilities of a compactor, and this proposal focuses on the limitations of the current compactor architecture. In the current architecture, the compactor uses simple sharding, meaning that a single tenant is sharded to a single compactor. In addition, a compactor handles the compaction groups of a single tenant iteratively, meaning that blocks belonging to non-overlapping times are not compacted in parallel.</p>
<p>As a part of pushing Cortex’s scaling capability at AWS, we have done performance testing with Cortex and found the compactor to be one of the main limiting factors for a higher active timeseries limit per tenant. The <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Compactor</a> documentation describes the responsibilities of a compactor, and this proposal focuses on the limitations of the current compactor architecture. In the current architecture, the compactor uses simple sharding, meaning that a single tenant is sharded to a single compactor. The compactor generates compaction groups, which are groups of Prometheus TSDB blocks that can be compacted together independently of other groups. However, a compactor currently handles the compaction groups of a single tenant iteratively, meaning that blocks belonging to non-overlapping times are not compacted in parallel.</p>
<p>Cortex ingesters are responsible for uploading TSDB blocks with data emitted by a tenant. These blocks are considered level-1 blocks, as they contain duplicate timeseries for the same time interval, depending on the replication factor. <a href="https://cortexmetrics.io/docs/blocks-storage/compactor/#how-compaction-works">Vertical compaction</a> merges all the blocks with the same time interval and deduplicates the samples. These merged blocks are level-2 blocks. Subsequent compactions, such as horizontal compaction, can happen, further increasing the compaction level of the blocks.</p>
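<p>To make this concrete, below is a minimal Go sketch of the grouping step behind vertical compaction: level-1 blocks covering the identical time interval are bucketed together so they can be merged and deduplicated into a single level-2 block. The <code>BlockMeta</code> type and its fields are illustrative assumptions, not Cortex’s actual metadata schema.</p>
<pre><code>package main

import "fmt"

// BlockMeta is an assumed, simplified stand-in for a TSDB block's metadata.
type BlockMeta struct {
    ID      string
    MinTime int64 // interval start, milliseconds since epoch
    MaxTime int64 // interval end
    Level   int   // 1 for ingester-uploaded blocks
}

// groupForVerticalCompaction buckets blocks by their exact time interval.
// Every bucket holding more than one block is a vertical-compaction
// candidate: its blocks are merged and replicated samples deduplicated.
func groupForVerticalCompaction(blocks []BlockMeta) map[[2]int64][]BlockMeta {
    groups := make(map[[2]int64][]BlockMeta)
    for _, b := range blocks {
        key := [2]int64{b.MinTime, b.MaxTime}
        groups[key] = append(groups[key], b)
    }
    return groups
}

func main() {
    // Three ingesters uploaded the same 2h interval (replication factor 3).
    blocks := []BlockMeta{
        {ID: "b1", MinTime: 0, MaxTime: 7200000, Level: 1},
        {ID: "b2", MinTime: 0, MaxTime: 7200000, Level: 1},
        {ID: "b3", MinTime: 0, MaxTime: 7200000, Level: 1},
    }
    for interval, g := range groupForVerticalCompaction(blocks) {
        fmt.Printf("interval %v: merge %d level-1 blocks into one level-2 block\n", interval, len(g))
    }
}
</code></pre>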
<h3 id="problem-and-requirements">Problem and Requirements</h3>
<p>Currently, a compactor is able to compact up to 20M timeseries within 2 hours for a level-2 compaction, including the time to download blocks, compact, and upload the newly compacted block. We would like to increase the timeseries limit per tenant, and compaction is one of the limiting factors. In addition, we would like to achieve the following:</p>
<ul>
@@ -651,7 +652,7 @@ As new blocks with old data appear on the store as a result of conversion, they
<p><img src="/images/proposals/parallel-compaction-without-scheduler.png" alt="Parallel Compaction Without Scheduler"></p>
<h2 id="scenarios">Scenarios</h2>
<h3 id="bad-block-resulting-in-non-ideal-compaction-groups">Bad block resulting in non-ideal compaction groups</h3>
<p>A Cortex operator configures the compaction block ranges. Using 2h and 6h as an example: [2h-1] [2h-2] [2h-3] [2h-4] [2h-5] [2h-6]. If the [2h-1] block is corrupted, we may compact the subsequent [2h-2] [2h-3] [2h-4] [2h-5] [2h-6] blocks. To compact into a 6-hour group, the ideal compaction is [2h-1] [2h-2] [2h-3] and [2h-4] [2h-5] [2h-6]. The Cortex planner needs to know the ideal compaction interval and prevent the compaction of [2h-2] [2h-3] [2h-4] from happening, which would result in [2h-1] not being able to be compacted into longer time interval blocks. Cortex has full information regarding all the available blocks, so we should utilize this information to achieve the best compaction interval.</p>
<p>A Cortex operator configures the compaction block ranges as 2h and 6h. If a full 6-hour block cannot be compacted due to compaction failures, the compactor should not split the group into subgroups, as this may cause suboptimal grouping of blocks. Cortex has full information regarding all the available blocks, so we should utilize this information to achieve the best compaction group possible.</p>
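<p>A minimal Go sketch of that planning rule follows, using assumed types rather than Cortex’s actual planner API: a 6h-aligned window is planned only when every 2h sub-interval is covered by a healthy block, so a corrupted block pauses its own window instead of stranding itself behind a partial [2h-2] [2h-3] compaction.</p>
<pre><code>package main

import "fmt"

const (
    subRange    = int64(2 * 3600 * 1000) // 2h in milliseconds
    targetRange = int64(6 * 3600 * 1000) // 6h in milliseconds
)

// Block is an assumed, simplified block descriptor for illustration.
type Block struct {
    ID      string
    MinTime int64
    Healthy bool // false for e.g. a corrupted block
}

// planWindow returns the blocks of one 6h-aligned window if, and only if,
// every 2h sub-interval is covered by a healthy block. An incomplete
// window yields no group at all rather than a suboptimal subgroup.
func planWindow(windowStart int64, byStart map[int64]Block) ([]Block, bool) {
    var group []Block
    for t := windowStart; t < windowStart+targetRange; t += subRange {
        b, ok := byStart[t]
        if !ok || !b.Healthy {
            return nil, false // incomplete window: do not emit a subgroup
        }
        group = append(group, b)
    }
    return group, true
}

func main() {
    byStart := map[int64]Block{
        0 * subRange: {ID: "2h-1", MinTime: 0 * subRange, Healthy: false}, // corrupted
        1 * subRange: {ID: "2h-2", MinTime: 1 * subRange, Healthy: true},
        2 * subRange: {ID: "2h-3", MinTime: 2 * subRange, Healthy: true},
        3 * subRange: {ID: "2h-4", MinTime: 3 * subRange, Healthy: true},
        4 * subRange: {ID: "2h-5", MinTime: 4 * subRange, Healthy: true},
        5 * subRange: {ID: "2h-6", MinTime: 5 * subRange, Healthy: true},
    }
    if _, ok := planWindow(0, byStart); !ok {
        fmt.Println("window [2h-1..2h-3] incomplete: wait instead of compacting [2h-2] [2h-3]")
    }
    if g, ok := planWindow(3*subRange, byStart); ok {
        fmt.Printf("window [2h-4..2h-6] ready: compact %d blocks\n", len(g))
    }
}
</code></pre>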
<h2 id="alternatives">Alternatives</h2>
<h3 id="shard-compaction-jobs-amongst-compactors-with-a-scheduler">Shard compaction jobs amongst compactors with a scheduler</h3>
<p><img src="/images/proposals/parallel-compaction-design.png" alt="Parallel Compaction Architecture"></p>
