
feat: add MALLOC_ARENA_MAX to limit the dcgm mem #182

Merged
ambermingxin merged 2 commits into main from feat/add_mem_env_var on Apr 28, 2026

@ambermingxin (Collaborator) commented Apr 27, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • New environment variable MALLOC_ARENA_MAX (default: 4) is available for runtime configuration.
  • Documentation

    • Configuration docs updated to include MALLOC_ARENA_MAX and instructions for setting it on bare-metal and Kubernetes deployments.

Signed-off-by: Amber Xue <ambermingxin@nvidia.com>

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0bf2b1c3-34fc-4872-bb60-6d4deb1fd2fe

📥 Commits

Reviewing files that changed from the base of the PR and between 92f17da and 10ba003.

📒 Files selected for processing (2)
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/values.yaml
✅ Files skipped from review due to trivial changes (2)
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/values.yaml

📝 Walkthrough

A new MALLOC_ARENA_MAX=4 environment variable is added to runtime configuration and documented across systemd, Helm chart values, and docs; documentation shows purpose, default, and where to set it for bare-metal and Kubernetes deployments.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Systemd env: deployments/packages/systemd/fleetint.env | Added MALLOC_ARENA_MAX=4 to the systemd environment file. |
| Helm chart: deployments/helm/fleet-intelligence-agent/values.yaml, deployments/helm/fleet-intelligence-agent/README.md | Added env.MALLOC_ARENA_MAX: 4 to Helm values and documented the variable and default. |
| Documentation: docs/configuration.md | Documented MALLOC_ARENA_MAX purpose, default (4), and how to set it for bare-metal (/etc/default/fleetint) and Kubernetes (env.MALLOC_ARENA_MAX). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 I nudge the config, soft and spry,
Four little arenas under sky,
Memory tamed with gentle art,
A rabbit's patch, a tiny part. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding MALLOC_ARENA_MAX environment variable to limit DCGM memory usage, which aligns with all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/configuration.md (1)

38-38: ⚠️ Potential issue | 🟡 Minor

Documentation statement is misleading for MALLOC_ARENA_MAX.

The section header states "These environment variables are read by fleetint run at startup," but MALLOC_ARENA_MAX is actually read by the glibc memory allocator when the process starts, not by the fleetint application code itself. The relevant code snippet confirms that all other FLEETINT_* and DCGM_* variables are explicitly parsed in cmd/fleetint/run.go, but MALLOC_ARENA_MAX is not.

Consider adding a note distinguishing between application-level variables (read by fleetint) and system-level variables (consumed by the runtime/allocator).

📋 Suggested clarification
 ## Configurable Environment Variables
 
-These environment variables are read by `fleetint run` at startup.
+These environment variables configure `fleetint run` behavior at startup. Most are read by the application; system-level variables like `MALLOC_ARENA_MAX` are consumed by the runtime allocator.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 38, Update the documentation to clarify that
MALLOC_ARENA_MAX is a system/runtime environment variable consumed by the glibc
allocator at process start (not read by fleetint application code); add a short
note under the section that differentiates application-level variables (those
parsed by cmd/fleetint/run.go such as FLEETINT_* and DCGM_*) from
system-level/runtime variables like MALLOC_ARENA_MAX, and optionally link or
mention that MALLOC_ARENA_MAX is honored by the libc allocator rather than by
any function in fleetint.
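
The distinction above has a practical consequence: because the allocator, not the application, consumes MALLOC_ARENA_MAX at process start, the value has to be present in the environment before the process is launched. The following is a minimal Go sketch of that idea, assuming a hypothetical launcher; `withArenaMax` is an illustrative helper and is not part of fleetint.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// withArenaMax returns a copy of env in which MALLOC_ARENA_MAX is set to n,
// dropping any existing entry for that key. It is a hypothetical helper used
// only to illustrate preparing a child process environment.
func withArenaMax(env []string, n int) []string {
	const key = "MALLOC_ARENA_MAX"
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, key+"=") {
			out = append(out, kv)
		}
	}
	return append(out, key+"="+strconv.Itoa(n))
}

func main() {
	// Build the environment a child process would start with.
	env := withArenaMax(os.Environ(), 4)
	// The entry for the allocator is always appended last by withArenaMax.
	fmt.Println(env[len(env)-1])
}
```

A launcher could assign the returned slice to the `Env` field of an `exec.Cmd` before calling `Start`. In the deployments described in this PR, systemd or Kubernetes plays that role, so no application code is needed.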
🧹 Nitpick comments (2)
docs/configuration.md (1)

44-44: Consider noting that MALLOC_ARENA_MAX is glibc-specific.

The description correctly identifies this as a glibc parameter, but it may be helpful to explicitly note that this setting only applies to glibc-based systems and will have no effect on musl-based distributions (e.g., Alpine Linux).

📝 Suggested enhancement
-| `MALLOC_ARENA_MAX` | glibc arena cap to constrain RSS growth for DCGM/cgo-heavy workloads. | `4` | `/etc/default/fleetint` | `env.MALLOC_ARENA_MAX` |
+| `MALLOC_ARENA_MAX` | glibc arena cap to constrain RSS growth for DCGM/cgo-heavy workloads. Only effective on glibc-based systems. | `4` | `/etc/default/fleetint` | `env.MALLOC_ARENA_MAX` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 44, Update the table row for the
`MALLOC_ARENA_MAX` configuration entry to explicitly state that this is a
glibc-specific setting and will have no effect on musl-based distributions (for
example, Alpine Linux); edit the Description column text for the
`MALLOC_ARENA_MAX` row (the table entry containing `env.MALLOC_ARENA_MAX`) to
add a short parenthetical note such as "glibc-specific — has no effect on
musl-based distributions (e.g., Alpine Linux)" so readers know the platform
limitation.
deployments/packages/systemd/fleetint.env (1)

5-5: Consider adding an inline comment explaining MALLOC_ARENA_MAX.

Unlike other variables in this file, MALLOC_ARENA_MAX is a system-level glibc tuning parameter rather than an application configuration. Adding a brief inline comment would help operators understand its purpose and that it's specifically for controlling memory allocator behavior in cgo-heavy workloads.

📝 Suggested documentation addition
 DCGM_URL_IS_UNIX_SOCKET="false"
+# glibc arena limit to reduce RSS growth in DCGM/cgo workloads
 MALLOC_ARENA_MAX=4
 FLEETINT_COLLECT_INTERVAL="1m"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/packages/systemd/fleetint.env` at line 5, Add a short inline
comment next to the MALLOC_ARENA_MAX environment variable explaining that this
is a glibc memory allocator tuning parameter (not an application config), why it
is set (to limit per-thread malloc arenas and reduce memory bloat in cgo-heavy
workloads), and any operational guidance (e.g., safe default value and when to
adjust). Place the comment on the same line or the line above the
MALLOC_ARENA_MAX entry so operators scanning the file immediately see the
rationale for this system-level setting.
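
To show that a comment on the line above the entry does not change the resulting variables, here is a simplified Go sketch of a KEY=VALUE reader that skips full-line comments. It is an assumption-laden illustration, not systemd's actual parser: `parseEnvFile` is a hypothetical helper, and it deliberately does not handle trailing comments on the same line as a value.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseEnvFile reads KEY=VALUE lines, ignoring blank lines and lines that
// start with '#' or ';'. Surrounding double quotes are stripped from values.
// Simplified sketch; real environment-file parsing has more rules.
func parseEnvFile(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(val), `"`)
	}
	return vars
}

func main() {
	// Sample content mirroring the suggested addition in this review.
	const sample = `DCGM_URL_IS_UNIX_SOCKET="false"
# glibc arena limit to reduce RSS growth in DCGM/cgo workloads
MALLOC_ARENA_MAX=4
FLEETINT_COLLECT_INTERVAL="1m"
`
	vars := parseEnvFile(sample)
	fmt.Println(vars["MALLOC_ARENA_MAX"], len(vars)) // prints "4 3"
}
```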
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployments/packages/systemd/fleetint.env`:
- Line 5: Replace the environment variable setting MALLOC_ARENA_MAX=4 with
MALLOC_ARENA_MAX=2 in the deployment environment file; locate the line defining
MALLOC_ARENA_MAX and change its value from 4 to 2 so the service uses two malloc
arenas instead of four.

---

Outside diff comments:
In `@docs/configuration.md`:
- Line 38: Update the documentation to clarify that MALLOC_ARENA_MAX is a
system/runtime environment variable consumed by the glibc allocator at process
start (not read by fleetint application code); add a short note under the
section that differentiates application-level variables (those parsed by
cmd/fleetint/run.go such as FLEETINT_* and DCGM_*) from system-level/runtime
variables like MALLOC_ARENA_MAX, and optionally link or mention that
MALLOC_ARENA_MAX is honored by the libc allocator rather than by any function in
fleetint.

---

Nitpick comments:
In `@deployments/packages/systemd/fleetint.env`:
- Line 5: Add a short inline comment next to the MALLOC_ARENA_MAX environment
variable explaining that this is a glibc memory allocator tuning parameter (not
an application config), why it is set (to limit per-thread malloc arenas and
reduce memory bloat in cgo-heavy workloads), and any operational guidance (e.g.,
safe default value and when to adjust). Place the comment on the same line or
the line above the MALLOC_ARENA_MAX entry so operators scanning the file
immediately see the rationale for this system-level setting.

In `@docs/configuration.md`:
- Line 44: Update the table row for the `MALLOC_ARENA_MAX` configuration entry
to explicitly state that this is a glibc-specific setting and will have no
effect on musl-based distributions (for example, Alpine Linux); edit the
Description column text for the `MALLOC_ARENA_MAX` row (the table entry
containing `env.MALLOC_ARENA_MAX`) to add a short parenthetical note such as
"glibc-specific — has no effect on musl-based distributions (e.g., Alpine
Linux)" so readers know the platform limitation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ceb77ff3-5ea7-4ff5-b268-df9d95c89488

📥 Commits

Reviewing files that changed from the base of the PR and between 24e2b5c and 92f17da.

📒 Files selected for processing (2)
  • deployments/packages/systemd/fleetint.env
  • docs/configuration.md

Comment thread deployments/packages/systemd/fleetint.env
@jingxiang-z
Collaborator

Please add the env var to the helm chart as well.

Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
@ambermingxin ambermingxin merged commit 37f954e into main Apr 28, 2026
9 checks passed
@ambermingxin ambermingxin deleted the feat/add_mem_env_var branch April 28, 2026 19:43