feat: add CPU limits per agent via cgroups v2#289
Conversation
Implement per-agent CPU isolation using Linux cgroups v2 to prevent one
agent (e.g., running npm install or build) from starving other agents
of CPU resources.
Features:
- CgroupManager utility in @agent-relay/resiliency package
- Auto-detects cgroups v2 availability and gracefully degrades
- Per-agent CPU limits with configurable percentage
- Automatic cleanup on agent exit
Configuration:
- cpuLimitPercent option in RelayPtyOrchestratorConfig
- AGENT_CPU_LIMIT env var to set default in cloud (default: 100%)
- Only enabled when WORKSPACE_ID is set (cloud environment)
Files:
- packages/resiliency/src/cgroup-manager.ts: New cgroup manager
- packages/resiliency/src/index.ts: Export cgroup manager
- packages/wrapper/src/relay-pty-orchestrator.ts: Integrate cgroups
- packages/bridge/src/spawner.ts: Pass cpuLimitPercent config
Example cgroup structure:
/sys/fs/cgroup/agent-relay/{agentName}/
cpu.max: "100000 100000" (100% of one core)
cgroup.procs: {pid}
🤖 My Senior Dev — Analysis Complete👤 For @khaliqgant📁 Expert in View your contributor analytics → 📊 4 files reviewed • 5 need attention
📚 Relevant Past Discussions:
🚀 Open Interactive Review →The full interface unlocks features not available in GitHub:
💬 Chat here: 📖 View all 12 personas & slash commandsYou can interact with me by mentioning In PR comments or on any line of code:
Slash commands:
AI Personas (mention to get their perspective):
For the best experience, view this PR on myseniordev.com — includes AI chat, file annotations, and interactive reviews. |
| const subtreeControlPath = join(this.cgroupBase, 'cgroup.subtree_control'); | ||
| if (existsSync(subtreeControlPath)) { | ||
| try { | ||
| writeFileSync(subtreeControlPath, '+cpu'); |
Check failure
Code scanning / CodeQL
Potential file system race condition High
| this.agentCgroups.set(agentName, info); | ||
|
|
||
| this.emit('cgroup-created', { agentName, path: cgroupPath, cpuPercent }); | ||
| console.info(`[cgroup-manager] Created cgroup for ${agentName} with ${cpuPercent}% CPU limit`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
General approach: sanitize user-controlled values before logging them. For plain-text logs, especially where log entries are line oriented, we should remove newline characters (\n, \r) and optionally other control characters from the agent name before interpolating it into log messages. We can also ensure that error messages constructed from user input do not contain newlines.
Best concrete fix here: in CgroupManager.createAgentCgroup (packages/resiliency/src/cgroup-manager.ts), introduce a small sanitization step for agentName that strips newline and carriage-return characters for logging purposes, and then use this sanitized value in the console.info success log and the console.warn error log. We should not change the actual agentName used for filesystem paths or map keys, to avoid altering behavior; instead, only the logged representation is sanitized. A simple helper variable safeAgentName defined inside the function is enough and does not require new imports.
More specifically:
- In
createAgentCgroup, after validatingagentNameand before logging, defineconst safeAgentName = agentName.replace(/[\r\n]/g, '');. - Use
safeAgentNameinstead ofagentNamein:- the success log:
console.info(`[cgroup-manager] Created cgroup for ${safeAgentName} with ${cpuPercent}% CPU limit`); - the error construction/logging:
new Error(\Failed to create cgroup for ${safeAgentName}: ${err.message}`);`.
This keeps all functional behavior identical (cgroup names, map keys, events) but prevents an attacker from injecting line breaks into the log output via the agent name.
- the success log:
| @@ -203,6 +203,9 @@ | ||
|
|
||
| const cgroupPath = join(this.cgroupBase, agentName); | ||
|
|
||
| // Sanitize agent name for logging to prevent log injection via newlines | ||
| const safeAgentName = agentName.replace(/[\r\n]/g, ''); | ||
|
|
||
| try { | ||
| // Create cgroup directory | ||
| if (!existsSync(cgroupPath)) { | ||
| @@ -231,11 +234,11 @@ | ||
| this.agentCgroups.set(agentName, info); | ||
|
|
||
| this.emit('cgroup-created', { agentName, path: cgroupPath, cpuPercent }); | ||
| console.info(`[cgroup-manager] Created cgroup for ${agentName} with ${cpuPercent}% CPU limit`); | ||
| console.info(`[cgroup-manager] Created cgroup for ${safeAgentName} with ${cpuPercent}% CPU limit`); | ||
|
|
||
| return true; | ||
| } catch (err: any) { | ||
| const error = new Error(`Failed to create cgroup for ${agentName}: ${err.message}`); | ||
| const error = new Error(`Failed to create cgroup for ${safeAgentName}: ${err.message}`); | ||
| this.emit('error', error); | ||
| console.warn(`[cgroup-manager] ${error.message}`); | ||
| return false; |
| } catch (err: any) { | ||
| const error = new Error(`Failed to create cgroup for ${agentName}: ${err.message}`); | ||
| this.emit('error', error); | ||
| console.warn(`[cgroup-manager] ${error.message}`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
General approach: ensure that any user-controlled data that ends up in log output is sanitized so it cannot inject newlines or control characters into the logs. For this specific path, that means ensuring that agentName (derived from HTTP name) is normalized to a safe form early, and/or sanitizing it at the point where it is included in log messages. The simplest robust fix is to define a small helper that strips \r, \n, and other control characters from strings before logging them, and apply that to agentName and any derived error messages used with console.*.
Best fix with minimal behavior change:
- In
packages/resiliency/src/cgroup-manager.ts, introduce a small internal helper function, e.g.sanitizeForLog, which:- Accepts a
string. - Replaces all
\rand\nwith a safe character (or removes them), and optionally strips other control characters.
- Accepts a
- Use this helper where user-controlled content is logged:
- In the success log in
createAgentCgroup, sanitizeagentNamewhen embedding it into the log message. - In the error path, instead of logging
error.messagedirectly (which contains the unsanitizedagentName), build a log message that uses a sanitizedagentNameand a sanitized version oferr.message. Keep theErrorobject as-is for programmatic handling viathis.emit('error', error)so external behavior is unchanged except for the console text.
- In the success log in
- Keep all existing logic and return values intact; only adjust what is passed to
console.warn/console.info.
No other files need modification: by sanitizing in cgroup-manager.ts at the sink, we cover all alert variants tied to this flow, regardless of how agentName is constructed upstream.
| @@ -29,6 +29,12 @@ | ||
| import { join } from 'node:path'; | ||
| import { EventEmitter } from 'node:events'; | ||
|
|
||
| // Sanitize potentially user-controlled values before logging to avoid log injection | ||
| function sanitizeForLog(value: string): string { | ||
| // Remove CR/LF and other ASCII control characters | ||
| return value.replace(/[\r\n\x00-\x1F\x7F]/g, ''); | ||
| } | ||
|
|
||
| /** | ||
| * CPU limit configuration for an agent | ||
| */ | ||
| @@ -231,13 +237,18 @@ | ||
| this.agentCgroups.set(agentName, info); | ||
|
|
||
| this.emit('cgroup-created', { agentName, path: cgroupPath, cpuPercent }); | ||
| console.info(`[cgroup-manager] Created cgroup for ${agentName} with ${cpuPercent}% CPU limit`); | ||
| const safeAgentName = sanitizeForLog(agentName); | ||
| console.info(`[cgroup-manager] Created cgroup for ${safeAgentName} with ${cpuPercent}% CPU limit`); | ||
|
|
||
| return true; | ||
| } catch (err: any) { | ||
| const safeAgentName = sanitizeForLog(agentName); | ||
| const safeErrMessage = err && typeof err.message === 'string' ? sanitizeForLog(err.message) : ''; | ||
| const error = new Error(`Failed to create cgroup for ${agentName}: ${err.message}`); | ||
| this.emit('error', error); | ||
| console.warn(`[cgroup-manager] ${error.message}`); | ||
| console.warn( | ||
| `[cgroup-manager] Failed to create cgroup for ${safeAgentName}: ${safeErrMessage}`, | ||
| ); | ||
| return false; | ||
| } | ||
| } |
|
|
||
| const info = this.agentCgroups.get(agentName); | ||
| if (!info) { | ||
| console.warn(`[cgroup-manager] No cgroup found for agent ${agentName}`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
General approach: sanitize user-controlled values before including them in log messages. For plain-text logs, the main concern is stripping newline (\n) and carriage return (\r) characters (and optionally other control characters) from attacker-controlled strings. To avoid widespread code churn and keep behavior unchanged, we can add a small helper function in cgroup-manager.ts that produces a safe, log-friendly version of the agent name, and then use it consistently in all console.* and emitted event payloads that contain agentName. This will address the current alert and any similar variants arising from logging agentName from tainted sources.
Best concrete fix in this file:
-
Define a
sanitizeLogValuehelper near the top ofpackages/resiliency/src/cgroup-manager.ts. This function will:- Convert the input to a string (defensive).
- Remove
\rand\ncharacters usingString.prototype.replacewith a regex like/[\r\n]+/g. - Optionally trim the result (not required, but harmless).
This matches the pattern recommended in the background description.
-
Use this helper for all log statements and error messages that interpolate
agentName(or values derived solely from it) insideCgroupManager:- In
createAgentCgroup:- For the validation error message
Invalid agent name: ${agentName}. - For the success log:
Created cgroup for ${agentName}.... - For the error construction and warning:
Failed to create cgroup for ${agentName}: ....
- For the validation error message
- In
addProcess:- For the “No cgroup found for agent …” warning (the one flagged by CodeQL).
- For the “Added process … to cgroup …” info log.
- For the “Failed to add process … to cgroup …” warning.
- For internal
emitevents we can leave the rawagentName(since they are in-process objects, not logs). However, if you want to be extra defensive, you can emit both raw and sanitized names; for minimal behavior changes, I will leave the event payloads unchanged and only sanitize the string interpolation used for logging.
- In
-
Keep functionality otherwise identical: the cgroup directories, map keys, and file paths will still use the original (already path-validated)
agentName. Only the text that is written to logs will be sanitized. This ensures no change to runtime behavior except for removing control characters from logged agent names.
No additional imports are needed; we only add a local function.
| info.pids.push(pid); | ||
|
|
||
| this.emit('process-added', { agentName, pid }); | ||
| console.info(`[cgroup-manager] Added process ${pid} to cgroup ${agentName}`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
In general, to prevent log injection when logging user-controlled strings, sanitize them before interpolation: for plain-text logs, at least strip or replace newline (\n) and carriage-return (\r) characters, and optionally other non-printable characters. This ensures a user cannot forge additional log lines or alter log structure. It’s also good practice to mark/log user input with quotes or other delimiters, which this code already partially does in some messages.
For this specific case, the single best fix is to sanitize agentName right before it is used in log messages inside CgroupManager. Because agentName might be used elsewhere (e.g., filesystem paths) where newlines would already be invalid or handled by the OS, we should not change the actual value used for cgroup management; instead we should derive a sanitized version solely for logging. We can implement a small helper function within cgroup-manager.ts to normalize user-controlled strings for logs by removing \r and \n, then use that helper where agentName is logged in createAgentCgroup and addProcess (and other relevant log lines referencing agentName in this file). This approach avoids changing behavior of the cgroup logic and addresses all current and future log-injection issues involving agentName in this module, without requiring new dependencies or changes to other files.
Concretely:
- Add a simple
sanitizeForLogfunction topackages/resiliency/src/cgroup-manager.tsthat takesstring | numberand returns a string with\rand\nremoved (and safely handles non-string inputs). - In
createAgentCgroup, change theconsole.infoand error construction to usesanitizeForLog(agentName)instead ofagentNamedirectly. - In
addProcess, change theconsole.warnandconsole.infolines that referenceagentName(and optionallypid, thoughpidis numeric and not user-controlled) to use the sanitized value foragentName. - Keep all other logic, including event emission payloads and cgroup path handling, unchanged so that functionality remains the same.
|
|
||
| return true; | ||
| } catch (err: any) { | ||
| console.warn(`[cgroup-manager] Failed to add process ${pid} to cgroup ${agentName}: ${err.message}`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
In general, to fix log injection where user-controlled strings are interpolated into logs, you should sanitize or encode those strings before logging them. For plain-text logs, the key step is to strip or normalize newline and carriage return characters (and optionally other control characters), and ensure logged user input is clearly delimited.
For this codebase, the single best minimal fix is to sanitize agentName right at the point where it is logged inside CgroupManager. That ensures that, regardless of where agentName originally came from (dashboard HTTP request, daemon, or other sources), any log messages emitted by CgroupManager cannot contain hostile newlines or control characters. We can accomplish this by introducing a small helper inside cgroup-manager.ts to clean log values (e.g., remove \r and \n, and optionally replace other control characters with safe placeholders), and using it when interpolating agentName into log strings.
Concretely:
- Define a private helper function in
cgroup-manager.ts, e.g.sanitizeLogValue(value: string): string, that:- Converts the input to string.
- Removes
\rand\n(and optionally other control characters like\tor anything in\x00-\x1Fand\x7F) usingString.prototype.replaceand a regular expression.
- Use this helper when logging
agentNamein:- the
catchblock ofcreateAgentCgroup(line 238–241 region), - the “no cgroup found” warning in
addProcess, - the success and failure logs in
addProcess, - the success and failure logs in
removeAgentCgroup.
- the
- Keep behavior of the rest of the system unchanged: we only modify log text, not identifiers used for cgroup paths or keys in maps.
No imports are needed: sanitization can be implemented with plain TypeScript/JavaScript.
Implement per-agent CPU isolation using Linux cgroups v2 to prevent one
agent (e.g., running npm install or build) from starving other agents
of CPU resources.
Features:
Configuration:
Files:
Example cgroup structure:
/sys/fs/cgroup/agent-relay/{agentName}/
cpu.max: "100000 100000" (100% of one core)
cgroup.procs: {pid}