feature/add resources to vmcluster #556
Conversation
Walkthrough
This pull request involves version updates and configuration modifications across multiple monitoring-related files. The changes primarily focus on updating version numbers for the tenant and monitoring applications, adjusting resource configurations for various monitoring components, and fine-tuning metrics storage settings. The modifications span several packages, including tenant, monitoring, and system monitoring agents, with updates to Chart.yaml files, template configurations, and version mappings.
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
🔇 Additional comments (4)
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/system/monitoring-agents/templates/vmagent.yaml (1)
21-24: Memory allocation looks good, but consider adding CPU limits.
The memory configuration follows good practices with:
- Reasonable request (768Mi) to limit (1024Mi) ratio
- Sufficient headroom for spikes
However, consider adding CPU limits to prevent potential resource contention:
```diff
  resources:
    limits:
      memory: 1024Mi
+     cpu: 1000m
    requests:
      cpu: 50m
      memory: 768Mi
```
packages/apps/tenant/templates/monitoring.yaml (1)
31-36: Consider defining explicit resource limits.
Empty resource blocks (resources: {}) for vminsert, vmselect, and vmstorage could lead to unbounded resource usage. While default values are provided in vmcluster.yaml, consider defining explicit limits here for better resource control and documentation.
Also applies to: 41-46
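A minimal sketch of what an explicit block for one of these components could look like (the values are illustrative assumptions, not measured requirements):

```yaml
vminsert:
  resources:
    limits:
      memory: 1024Mi
    requests:
      cpu: 200m
      memory: 512Mi
```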
packages/extra/monitoring/templates/vm/vmcluster.yaml (1)
51-59: Increased vmstorage memory limit is appropriate.
The memory limit increase to 2048Mi for vmstorage is justified given:
- Reduced deduplication interval (more frequent writes)
- Need to handle data replication (replicationFactor: 2)
- Storage requirements for retention period
Consider documenting these resource requirements in the README.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- packages/apps/tenant/Chart.yaml (1 hunks)
- packages/apps/tenant/templates/monitoring.yaml (1 hunks)
- packages/apps/versions_map (1 hunks)
- packages/extra/monitoring/Chart.yaml (1 hunks)
- packages/extra/monitoring/templates/vm/vmcluster.yaml (2 hunks)
- packages/extra/monitoring/values.yaml (1 hunks)
- packages/extra/versions_map (1 hunks)
- packages/system/monitoring-agents/templates/vmagent.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- packages/apps/tenant/Chart.yaml
- packages/extra/monitoring/Chart.yaml
🔇 Additional comments (6)
packages/extra/versions_map (1)
20-21: Consider pinning all versions to specific commit references.
Notably, you pinned version 1.5.3 but left 1.5.4 at HEAD. Having a definitive commit reference can improve build reproducibility and avoid potential issues should HEAD shift or introduce unexpected changes.
packages/apps/versions_map (1)
98-99: Ensure consistency with version references.
Line 98 updates tenant 1.6.5 to a specific commit, while line 99 introduces a HEAD reference for tenant 1.6.6. If 1.6.6 is intended to be released soon, consider pinning against a specific commit to avoid disruptions in downstream builds and to maintain traceability.
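For illustration, the pinning pattern being suggested could look like this, assuming the space-separated name/version/commit layout of the versions_map files; the hash below is a placeholder, not a real commit:

```text
tenant 1.6.5 a1b2c3d
tenant 1.6.6 HEAD
```

Once 1.6.6 is tagged, the HEAD entry would be replaced by the corresponding commit hash.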
packages/apps/tenant/templates/monitoring.yaml (1)
29-29: Verify impact of reduced deduplication interval.
The deduplication interval has been significantly reduced from 5m to 15s. While this provides more granular data, it could impact storage and processing requirements.
Run the following to check historical metrics and estimate the impact:
packages/extra/monitoring/templates/vm/vmcluster.yaml (2)
12-21: Resource configuration for vminsert looks good.
The default resource limits and conditional logic are well-structured (a sketch of the pattern follows the list below):
- Reasonable memory limits and requests
- Good error handling for undefined resources
- Proper template indentation
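For readers who haven't opened the template, a minimal sketch of the default-with-override pattern being praised here — the default numbers and the else branch are illustrative assumptions, not the file's exact contents:

```yaml
vminsert:
  resources:
    {{- if empty .vminsert.resources }}
    # Fallback defaults used only when no resources are supplied (illustrative values)
    limits:
      memory: 1000Mi
    requests:
      cpu: 100m
      memory: 500Mi
    {{- else }}
    # Caller-supplied resources take precedence
    {{- toYaml .vminsert.resources | nindent 4 }}
    {{- end }}
```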
9-9: Dynamic configuration looks good.
The templating of retentionPeriod and dedup.minScrapeInterval provides good flexibility while maintaining consistency with the tenant configuration.
Also applies to: 37-37
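Pieced together from the snippets quoted elsewhere in this review, the templated lines look roughly like this (the exact nesting of the dedup flag within the VMCluster spec is not shown in the diff, so treat that placement as a placeholder):

```yaml
spec:
  replicationFactor: 2
  retentionPeriod: {{ .retentionPeriod }}
  # ...
  dedup.minScrapeInterval: {{ .deduplicationInterval }}
```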
packages/extra/monitoring/values.yaml (1)
40-40: Verify impact of reduced deduplication interval.
The deduplication interval has been significantly reduced from "5m" to "15s". While this provides more granular data and better real-time visibility, it could lead to:
- Increased storage usage
- Higher processing overhead
- Potential impact on system performance
Let's check for any related configurations or discussions:
✅ Verification successful
Let me analyze the search results to verify the impact of the deduplication interval change.
The search results show that the deduplication interval of "15s" is consistently set across multiple files:
- packages/extra/monitoring/values.yaml
- packages/apps/tenant/templates/monitoring.yaml
Additionally, there's a reference in vmcluster.yaml that uses this value:
packages/extra/monitoring/templates/vm/vmcluster.yaml: dedup.minScrapeInterval: {{ .deduplicationInterval }}
The storage-related search didn't reveal any direct connection between the deduplication interval and storage configurations in the monitoring components. Let me search for more context about the deduplication configuration.
Based on the search results, I can now provide a comprehensive assessment of the deduplication interval change.
Deduplication interval change is aligned with retention periods
The search results reveal important context about the deduplication interval setting:
- The 15s deduplication interval is consistently applied across both short-term (3d) and long-term (14d) metrics storage:
```yaml
- name: shortterm
  retentionPeriod: "3d"
  deduplicationInterval: "15s"
- name: longterm
  retentionPeriod: "14d"
  deduplicationInterval: "15s"
```
The setting is used in the VictoriaMetrics cluster configuration as dedup.minScrapeInterval. The configuration is properly synchronized across monitoring configurations in both the extra and tenant packages.
While the change from 5m to 15s does increase granularity, the impact is mitigated by:
- The relatively short retention periods (3d and 14d)
- The consistent application across storage tiers
- VictoriaMetrics' efficient deduplication mechanism
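As a rough illustration of the scale of that change (assuming a typical 30s scrape interval, which this PR does not state): a 5m deduplication window retains roughly one of every ten raw samples per series, whereas a 15s window retains them all, so the volume of retained raw samples can grow by up to about 10x; actual disk growth is smaller thanks to VictoriaMetrics' compression and the short 3d/14d retention periods.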
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Search for related deduplication configurations
rg "deduplicationInterval" --type yaml

# Search for storage-related configurations
rg "storage:" --type yaml -A 3
```
Length of output: 70230
Script:
```shell
#!/bin/bash
# Search for deduplication-related configuration and documentation
rg -A 3 "deduplication" --type yaml

# Search for any comments or documentation about deduplication settings
rg -B 3 "deduplicationInterval" --type yaml
```
Length of output: 6991
```yaml
## Example:
## metricsStorages:
## - name: shortterm
##   retentionPeriod: "3d"
##   deduplicationInterval: "15s"
##   storage: 10Gi
##   storageClassName: ""
##   vminsert:
##     resources:
##       limits:
##         memory: 1024Mi
##       requests:
##         cpu: 200m
##         memory: 512Mi
##   vmselect:
##     resources:
##       limits:
##         memory: 2048Mi
##       requests:
##         cpu: 300m
##         memory: 1Gi
##   vmstorage:
##     resources:
##       limits:
##         memory: 4096Mi
##       requests:
##         cpu: 500m
##         memory: 2Gi
##
```
💡 Codebase verification
🛠️ Refactor suggestion
Let me analyze the search results. The output shows resource specifications across various components in the repository, with particular focus on the Victoria Metrics operator and related components. Looking at the actual configuration in the review comment, I can see that the example values are well-balanced and align with typical resource requirements for monitoring components.
Resource specifications should be implemented as suggested in the example
The example configuration provides appropriate resource limits and requests that are well-balanced for a monitoring stack:
- vminsert: 512Mi-1024Mi memory, 200m CPU
- vmselect: 1Gi-2Gi memory, 300m CPU
- vmstorage: 2Gi-4Gi memory, 500m CPU
These values are reasonable because:
- They follow a graduated increase in resources based on component responsibilities
- They align with the resource patterns seen in the victoria-metrics-operator deployment
- Empty resource specifications (resources: {}) could lead to resource contention and performance issues
🔗 Analysis chain
Consider implementing the example resource specifications.
The example configuration provides well-balanced resource limits and requests for each component. However, the actual configuration has empty resource specifications (resources: {}). This could lead to uncontrolled resource usage and potential performance issues.
Consider implementing resource specifications similar to the example:
- vminsert: 512Mi-1024Mi memory, 200m CPU
- vmselect: 1Gi-2Gi memory, 300m CPU
- vmstorage: 2Gi-4Gi memory, 500m CPU
Let's verify the current resource usage to help determine appropriate values:
Also applies to: 43-48, 54-59
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Search for any existing resource specifications or related configurations
rg -A 5 'resources:' --type yaml
```
Length of output: 84789
Actionable comments posted: 0
🧹 Nitpick comments (4)
packages/extra/monitoring/templates/vm/vmcluster.yaml (4)
12-21: LGTM! Consider documenting resource requirements.
Good implementation of default resource limits with the ability to override. The defaults look reasonable for VMInsert.
Consider adding a comment explaining the minimum resource requirements and factors affecting resource consumption (e.g., number of active time series, ingestion rate).
25-33: Consider adjusting VMSelect resource defaults.
While the configuration structure is good, VMSelect might benefit from different resource defaults than VMInsert due to its query-focused workload.
Consider adjusting the defaults based on your query patterns:
```diff
  {{- if empty .vmselect.resources }}
  limits:
-   memory: 1000Mi
+   memory: 1500Mi
  requests:
    cpu: 100m
-   memory: 500Mi
+   memory: 750Mi
```
51-59: Verify VMStorage memory requirements.
The increased memory limit (2048Mi) for VMStorage is good, but ensure it's sufficient for your data volume and retention period.
Consider the following factors for VMStorage sizing:
- Data ingestion rate
- Retention period (now configurable)
- Number of active time series
- Query patterns
You might need to adjust these limits based on monitoring metrics from production usage.
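One quick way to check this in practice (a sketch; the namespace and label selector are assumptions about how the operator labels vmstorage pods, so adjust them to match your deployment):

```shell
# Compare live vmstorage memory usage against the 2048Mi limit (requires metrics-server)
kubectl top pod -n monitoring -l app.kubernetes.io/name=vmstorage
```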
Line range hint 1-70: Consider adding monitoring guidelines.
The configuration structure is well-designed with good defaults and override capabilities. To help users properly configure the cluster:
Consider adding documentation that covers:
- Recommended monitoring metrics for each component
- Scaling guidelines based on metrics
- Resource calculation formulas based on:
- Number of active time series
- Data retention period
- Query patterns
- High availability requirements
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/extra/monitoring/templates/vm/vmcluster.yaml (2 hunks)
🔇 Additional comments (2)
packages/extra/monitoring/templates/vm/vmcluster.yaml (2)
9-9: LGTM! Verify retention period configuration.
Good change to make retention period configurable via template variable.
Let's verify the retention period configuration:
✅ Verification successful
Retention period configuration is properly defined in values files
The retention period variable is properly defined in both values.yaml and monitoring.yaml with appropriate values:
- packages/extra/monitoring/values.yaml: defined with values "3d", "14d", and "1"
- packages/apps/tenant/templates/monitoring.yaml: defined with values "3d" and "14d"
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Check if retention period is defined in values files
rg -A 5 "retentionPeriod:" packages/extra/monitoring/values.yaml packages/apps/tenant/templates/monitoring.yaml
```
Length of output: 2432
37-37: Verify deduplication interval impact.
Good change to make deduplication interval configurable, but we should verify its implications.
Let's check the configured values and usage:
✅ Verification successful
Deduplication interval configuration is properly defined and consistent
The verification shows that:
- The deduplication interval is consistently set to "15s" across both values files
- The variable .deduplicationInterval is properly defined in values.yaml
- It's used alongside other VM performance settings like search.maxUniqueTimeseries
- The change to make it configurable through a template variable maintains the same default value
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Check deduplication interval configuration
rg -A 5 "deduplicationInterval:" packages/extra/monitoring/values.yaml packages/apps/tenant/templates/monitoring.yaml

# Look for any related performance settings
rg "search.maxUniqueTimeseries|maxSamplesPerSeries" packages/extra/monitoring/
```
Length of output: 2170
```diff
 spec:
   replicationFactor: 2
-  retentionPeriod: "3"
+  retentionPeriod: {{ .retentionPeriod }}
```
quote is mandatory here, because otherwise someday we will encounter an issue with scientific notation.
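One way to keep the quoting while still templating the value, assuming Helm's built-in quote function (the concern being that an unquoted value such as 1e2 risks being parsed by YAML as a number rather than the string "1e2"):

```yaml
retentionPeriod: {{ .retentionPeriod | quote }}
```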
ecd60c8 to ba61c90
LGTM
Summary by CodeRabbit
Release Notes
Version Updates
Monitoring Configuration
Performance Improvements