Conversation
📝 WalkthroughWalkthroughAdds a new "systems" documentation page and TOC entry; updates Sphinx autodoc skip logic to skip non-attribute members for pydantic models when rendering the systems page; shortens RunAISystem docstring; updates LSFSystem docstring and adds a description to its Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR adds a new Key changes:
Confidence Score: 4/5
Important Files Changed
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doc/systems.rst`:
- Around line 43-44: The RunAISystem schema currently exposes a nodes:
List[RunAINode] but the docs only render RunAISystem; either document the nested
RunAINode model by adding an autopydantic_model entry for
cloudai.systems.runai.runai_node.RunAINode (or the correct import path for
RunAINode) below the RunAISystem entry so the TOML shape of each node is shown,
or hide the nodes field from the parent model by changing the RunAISystem
autopydantic_model invocation to exclude the nodes field (e.g., add the
appropriate :exclude: nodes option or equivalent in the autopydantic directive)
so that nodes is not rendered if it is not user-settable.
- Around line 50-56: The docs page is TOML-only and must not expose
runtime-derived types LSFQueue and LSFGroup; remove the two autopydantic_model
directives for LSFQueue and LSFGroup from doc/systems.rst (leave only the
LSFSystem entry), since in src/cloudai/systems/lsf/lsf_system.py the queues are
rebuilt from live bhosts output and should not be presented as user-configurable
types.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 03b5c9b4-5bc9-48e8-bd00-08c2f730ae90
📒 Files selected for processing (5)
doc/conf.pydoc/index.rstdoc/systems.rstsrc/cloudai/systems/lsf/lsf_system.pysrc/cloudai/systems/runai/runai_system.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/cloudai/systems/lsf/lsf_system.py (3)
69-79:⚠️ Potential issue | 🟠 MajorDo not rebuild
self.queuesafter a failed shell command.Line 127 only logs command errors, but
update()still parses the outputs and Line 161 clearsself.queuesbefore repopulating it. A transientbhosts/bjobsfailure can therefore replace the last good topology with an empty or partial snapshot. Fail the refresh before mutating state once command execution reports an error.Suggested direction
def update(self) -> None: - bhosts_output, _ = self.fetch_command_output("bhosts") - bjobs_output, _ = self.fetch_command_output("bjobs -u all") + bhosts_output, bhosts_error = self.fetch_command_output("bhosts") + bjobs_output, bjobs_error = self.fetch_command_output("bjobs -u all") + if bhosts_error or bjobs_error: + raise RuntimeError("Failed to refresh LSF system state") + node_user_map = self.parse_bjobs_output(bjobs_output) self.parse_bhosts_output(bhosts_output, node_user_map)Also applies to: 117-131, 161-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/systems/lsf/lsf_system.py` around lines 69 - 79, The update() method currently ignores command failures from fetch_command_output and proceeds to parse and then clear/rebuild self.queues; change it to check the second return value (error/exit status) from fetch_command_output for both "bhosts" and "bjobs -u all", and if either reports an error, log and return early without calling parse_bjobs_output/parse_bhosts_output or mutating self.queues; also move the clearing/repopulation of self.queues (the code that empties and repopulates queues) so it only runs after successful command execution and parsing (i.e., after parse_bhosts_output returns successfully).
75-78:⚠️ Potential issue | 🔴 CriticalFix
parse_bhosts_output(): the sixth column inbhostsoutput isRUN(job slots), not a queue name.The plain
bhostscommand reports columnsHOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV. Line 168 incorrectly treats the sixth token (RUN, a numeric count of running job slots) as a queue name, causing hosts to be bucketed under values like0,1, or2instead of actual queue names. LSF queues are a separate configuration surface—queue-to-host mappings are not available from the plainbhostscommand. A different source (such aslshostswith-qflag, queue configuration files, or LSF API) must be used to correctly associate hosts with queues before constructingLSFQueueobjects.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/systems/lsf/lsf_system.py` around lines 75 - 78, parse_bhosts_output currently treats the 6th token of bhosts lines as a queue name (causing hosts to be grouped under numeric RUN values); update parse_bhosts_output to treat the 6th column as RUN (a numeric count) and not a queue identifier, stop using that token to construct LSFQueue objects, and instead either (a) leave queue association empty/None or (b) accept an explicit queue-to-host mapping injected from a proper source (e.g., lshosts -q or LSF API) before creating LSFQueue instances; adjust any logic that uses parse_bhosts_output (including calls in the same flow after parse_bjobs_output and references to LSFQueue) to handle hosts without queue assignments or to use the injected mapping.
75-77:⚠️ Potential issue | 🟠 MajorUse explicit output format for
bjobsinstead of relying on fixed column positions.The
parse_bjobs_output()method assumes token 6 is alwaysEXEC_HOST, but the defaultbjobsoutput leavesEXEC_HOSTempty for pending jobs. This causes unstable field alignment when parsing—non-running jobs have fewer columns or different column order, causing the method to either skip them or misalign field assignments, resulting in an incomplete or inaccuratenode_user_map.Request only the fields you need via
-oand filter by job status before populatingnode_user_map. This is consistent with howis_job_running()andis_job_completed()already use-noheader -o statin the same file.Suggested direction
- bjobs_output, _ = self.fetch_command_output("bjobs -u all") + bjobs_output, _ = self.fetch_command_output('bjobs -noheader -o "user stat exec_host" -u all') ... - for line in bjobs_output.splitlines(): - parts = line.split() - if len(parts) < 6: + for line in bjobs_output.splitlines(): + parts = line.split() + if len(parts) < 3: continue - _, user, _, _, _, exec_host = parts[:6] - if exec_host not in node_user_map: + user, stat, exec_host = parts[:3] + if stat != "RUN" or not exec_host or exec_host == "-": + continue + if exec_host not in node_user_map: node_user_map[exec_host] = user🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/systems/lsf/lsf_system.py` around lines 75 - 77, parse_bjobs_output is relying on fixed column positions from a plain "bjobs" call which breaks when EXEC_HOST is empty; instead change the call in the code that invokes fetch_command_output to request explicit fields and filter statuses (e.g. use "bjobs -noheader -o <fields> -u all" where <fields> includes JOBID, USER, STAT, EXEC_HOST) and then update parse_bjobs_output to parse those tab/space-separated ordered fields (or split by the chosen delimiter) and only populate node_user_map for jobs with running STAT values; reference fetch_command_output, parse_bjobs_output, and the places where is_job_running()/is_job_completed() already use "-noheader -o stat" to mirror that approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/systems/lsf/lsf_system.py`:
- Around line 52-53: Update the LSFSystem class docstring to clearly describe
the configuration surface and behavior: state whether the class represents TOML
configuration, runtime state, or both; list the expected config keys/sections or
their purpose (e.g., host, port, credentials, scheduling options) and the format
(TOML) if applicable; summarize runtime behavior and side effects (e.g.,
establishes connections, starts jobs, mutates system state) and any validation
performed by LSFSystem methods; place this expanded one-paragraph docstring
directly on the LSFSystem class to replace the current generic line.
---
Outside diff comments:
In `@src/cloudai/systems/lsf/lsf_system.py`:
- Around line 69-79: The update() method currently ignores command failures from
fetch_command_output and proceeds to parse and then clear/rebuild self.queues;
change it to check the second return value (error/exit status) from
fetch_command_output for both "bhosts" and "bjobs -u all", and if either reports
an error, log and return early without calling
parse_bjobs_output/parse_bhosts_output or mutating self.queues; also move the
clearing/repopulation of self.queues (the code that empties and repopulates
queues) so it only runs after successful command execution and parsing (i.e.,
after parse_bhosts_output returns successfully).
- Around line 75-78: parse_bhosts_output currently treats the 6th token of
bhosts lines as a queue name (causing hosts to be grouped under numeric RUN
values); update parse_bhosts_output to treat the 6th column as RUN (a numeric
count) and not a queue identifier, stop using that token to construct LSFQueue
objects, and instead either (a) leave queue association empty/None or (b) accept
an explicit queue-to-host mapping injected from a proper source (e.g., lshosts
-q or LSF API) before creating LSFQueue instances; adjust any logic that uses
parse_bhosts_output (including calls in the same flow after parse_bjobs_output
and references to LSFQueue) to handle hosts without queue assignments or to use
the injected mapping.
- Around line 75-77: parse_bjobs_output is relying on fixed column positions
from a plain "bjobs" call which breaks when EXEC_HOST is empty; instead change
the call in the code that invokes fetch_command_output to request explicit
fields and filter statuses (e.g. use "bjobs -noheader -o <fields> -u all" where
<fields> includes JOBID, USER, STAT, EXEC_HOST) and then update
parse_bjobs_output to parse those tab/space-separated ordered fields (or split
by the chosen delimiter) and only populate node_user_map for jobs with running
STAT values; reference fetch_command_output, parse_bjobs_output, and the places
where is_job_running()/is_job_completed() already use "-noheader -o stat" to
mirror that approach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 160c1219-cbcd-40cd-802e-22a0ffd747c1
📒 Files selected for processing (2)
doc/conf.pysrc/cloudai/systems/lsf/lsf_system.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
doc/systems.rst (1)
51-57:⚠️ Potential issue | 🟡 Minor
LSFQueueandLSFGroupshould not be documented on this TOML-only page.These models represent runtime-derived state (queues are rebuilt from live
bhostsoutput) and should not be presented as user-configurable TOML attributes. This contradicts the page introduction stating that the documented attributes "can be set in TOML configuration files."📚 Suggested fix
.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFSystem - :exclude-members: cmd_shell + :exclude-members: cmd_shell, queues -.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFQueue - :exclude-members: lsf_nodes - -.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFGroup🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/systems.rst` around lines 51 - 57, Remove the runtime-only models LSFQueue and LSFGroup from this TOML-focused documentation page: delete the autopydantic_model entries for cloudai.systems.lsf.lsf_system.LSFQueue and cloudai.systems.lsf.lsf_system.LSFGroup so only LSFSystem remains documented, and add a brief one-line note (if desired) stating that queues and groups are runtime-derived (from bhosts) and are not configurable via TOML to avoid user confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@doc/systems.rst`:
- Around line 51-57: Remove the runtime-only models LSFQueue and LSFGroup from
this TOML-focused documentation page: delete the autopydantic_model entries for
cloudai.systems.lsf.lsf_system.LSFQueue and
cloudai.systems.lsf.lsf_system.LSFGroup so only LSFSystem remains documented,
and add a brief one-line note (if desired) stating that queues and groups are
runtime-derived (from bhosts) and are not configurable via TOML to avoid user
confusion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 44600d20-9b69-403b-956f-6409127e14b2
📒 Files selected for processing (1)
doc/systems.rst
There was a problem hiding this comment.
♻️ Duplicate comments (1)
doc/systems.rst (1)
53-59:⚠️ Potential issue | 🟡 MinorHide runtime-derived LSF queue models from this TOML-only page.
Line 4 says this page lists attributes that can be set in TOML, but
LSFSystem.queuesis rebuilt from livebhostsoutput inparse_bhosts_output(), and the field itself is described as auto-filled. RenderingLSFQueueandLSFGrouphere makes runtime state look user-configurable. ExcludequeuesfromLSFSystemand drop the nested LSF model directives from this page.Suggested doc change
.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFSystem - :exclude-members: cmd_shell + :exclude-members: cmd_shell, queues -.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFQueue - :exclude-members: lsf_nodes - -.. autopydantic_model:: cloudai.systems.lsf.lsf_system.LSFGroup🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/systems.rst` around lines 53 - 59, Update the doc to stop exposing runtime-derived LSF models: in the autopydantic_model directive for LSFSystem remove or add :exclude-members: queues (so the auto-generated docs do not show the runtime-filled queues field) and delete the separate autopydantic_model blocks for LSFQueue and LSFGroup from this TOML-only page; this keeps parse_bhosts_output()/LSFSystem.queues (the auto-filled runtime data) out of the configuration documentation and leaves only TOML-settable attributes documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@doc/systems.rst`:
- Around line 53-59: Update the doc to stop exposing runtime-derived LSF models:
in the autopydantic_model directive for LSFSystem remove or add
:exclude-members: queues (so the auto-generated docs do not show the
runtime-filled queues field) and delete the separate autopydantic_model blocks
for LSFQueue and LSFGroup from this TOML-only page; this keeps
parse_bhosts_output()/LSFSystem.queues (the auto-filled runtime data) out of the
configuration documentation and leaves only TOML-settable attributes documented.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c981c38a-028e-418e-a37b-6d93d9c251b3
📒 Files selected for processing (2)
doc/systems.rstsrc/cloudai/systems/lsf/lsf_system.py
Summary
Add documentation for Systems: list only properties relevant for TOML configs.
Test Plan
Additional Notes
–