Skip to content

CVS docs onboarding#3

Merged
mattwill-amd merged 87 commits into
mainfrom
docs
Nov 13, 2025
Merged

CVS docs onboarding#3
mattwill-amd merged 87 commits into
mainfrom
docs

Conversation

@mattwill-amd
Copy link
Copy Markdown
Contributor

Onboarding the CVS documentation to the repository.

Copy link
Copy Markdown
Contributor

@urtiwari urtiwari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes looks good to me.

Copy link
Copy Markdown

@yugang-amd yugang-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raw RST looks good. I’ve just added a few minor suggestions.

Comment thread docs/how-to/run-cvs-tests.rst Outdated
Comment thread docs/how-to/run-cvs-tests.rst Outdated
Comment thread docs/how-to/run-cvs-tests.rst Outdated
Comment thread docs/how-to/run-cvs-tests.rst Outdated
Comment thread docs/how-to/run-cvs-tests.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/reference/configuration-files/configure-config.rst Outdated
Comment thread docs/what-is-cvs.rst Outdated
Comment thread docs/what-is-cvs.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/reference/configuration-files/ib.rst Outdated
Comment thread docs/reference/configuration-files/ib.rst Outdated
Comment thread docs/reference/configuration-files/platform.rst Outdated
Comment thread docs/reference/configuration-files/platform.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Comment thread docs/what-is-cvs.rst Outdated
@mattwill-amd
Copy link
Copy Markdown
Contributor Author

Added comments from Jeff peer review

Copy link
Copy Markdown

@amd-jnovotny amd-jnovotny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good overall. I have a small number of (mainly style guide related) comments for you to consider.

Comment thread docs/index.rst Outdated
Comment thread docs/what-is-cvs.rst Outdated
Comment thread docs/what-is-cvs.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/install/cvs-install.rst Outdated
Comment thread docs/reference/configuration-files/health.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Comment thread docs/reference/configuration-files/rccl.rst Outdated
Copy link
Copy Markdown
Contributor

@urtiwari urtiwari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine

@mattwill-amd mattwill-amd merged commit 7486ea8 into main Nov 13, 2025
1 check passed
mattwill-amd added a commit that referenced this pull request Nov 13, 2025
CVS docs onboarding
mattwill-amd added a commit that referenced this pull request Nov 13, 2025
Merge pull request #3 from ROCm/docs
@cijohnson cijohnson deleted the docs branch January 30, 2026 08:53
atnair-amd added a commit that referenced this pull request Apr 17, 2026
get_linux_perf_tuning_dict builds a local out_dict populated with BIOS
version / NUMA balancing / NMI watchdog / THP / cpupower info but ended
without a return statement, so every caller received None. The
docstring itself flagged this.

Add the missing return and update the docstring to document the real
return shape. Add a regression test that asserts the function returns
a populated dict with the five expected keys.

Live observation on banff-cyxtera-s70-2 prior to fix (via
cvs.lib.parallel_ssh_lib.Pssh):

  ### #3 get_linux_perf_tuning_dict ###
  ...
  TYPE: NoneType
  VALUE: None
cijohnson added a commit that referenced this pull request May 12, 2026
This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
cijohnson added a commit that referenced this pull request May 12, 2026
This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
cijohnson added a commit that referenced this pull request May 12, 2026
This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
cijohnson added a commit that referenced this pull request May 12, 2026
This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
cijohnson added a commit that referenced this pull request May 14, 2026
This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
cijohnson added a commit that referenced this pull request May 14, 2026
* Add upload_file and download_file support

to MultiProcessPssh with operation registry

This commit addresses API incompatibility issues between Pssh and MultiProcessPssh
classes and implements a robust operation registry system to prevent future drift.

Key Changes:
- Add missing upload_file and download_file methods to MultiProcessPssh class
- Implement operation registry (SUPPORTED_OPERATIONS) as single source of truth
- Use method names directly as operation names to eliminate mapping errors
- Add missing detailed parameter support to MultiProcessPssh.exec()

API Parity Testing:
- Add comprehensive test suite to catch missing methods automatically
- Validate method signatures match between Pssh and MultiProcessPssh
- Verify operation registry completeness and consistency
- Test runtime execution of all sharder operations with proper parameters

Operations Updated:
- exec -> exec (added detailed parameter support)
- upload_file -> upload_file (new operation, matches method name)
- download_file -> download_file (new operation, matches method name)

This ensures MultiProcessPssh maintains full API compatibility with Pssh
and prevents the class of issues that led to missing upload_file/download_file methods.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>

* Refactor to ABC + composition + dynamic sharder architecture

This commit addresses all PR review comments by completely refactoring
the multiprocess SSH architecture to be cleaner, more maintainable, and
more robust.

Key Architectural Changes:
- Replace inheritance with ABC + composition pattern
- Eliminate registry + if/elif tree with dynamic method dispatch
- Create single source of truth via ShardableSshInterface ABC
- Move ABC to separate interfaces.py to avoid circular imports

Review Comments Addressed:
- #1-#2: Fixed operation name mismatches (upload/download)
- #3: Updated test expectations for new calling convention
- #4: Proper error handling with early ABC validation
- #5: Maintained custom download_file result merging
- #6: Implemented exact dynamic approach suggested by reviewer

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>

---------

Signed-off-by: Ignatious Johnson <ichristo@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants