Add ExtendedTestBase shared base class for benchmark and functional tests#3689
Add ExtendedTestBase shared base class for benchmark and functional tests#3689
Conversation
…ests Signed-off-by: Lenine Ajagappane <Lenine.Ajagappane@amd.com>
Signed-off-by: Lenine Ajagappane <Lenine.Ajagappane@amd.com>
Signed-off-by: Lenine Ajagappane <Lenine.Ajagappane@amd.com>
geomin12
left a comment
There was a problem hiding this comment.
letting other reviewers review
geomin12
left a comment
There was a problem hiding this comment.
letting other folks review - if this isn't reviewed till fri, i will take a look
geomin12
left a comment
There was a problem hiding this comment.
lgtm, but can we post some logs making sure this works?
Triggerd nighty-ci benchmark runs with this change - https://github.com/ROCm/TheRock/actions/runs/22701813100/job/65820378187 |
geomin12
left a comment
There was a problem hiding this comment.
letting other folks review - if this isn't reviewed till fri, i will take a look
| """Base class providing shared infrastructure for benchmark and functional tests.""" | ||
|
|
||
| def __init__(self, test_name: str, display_name: str = None): | ||
| """Initialize common test infrastructure. |
There was a problem hiding this comment.
| """Initialize common test infrastructure. | |
| """Initialize common extended test infrastructure. |
There was a problem hiding this comment.
Updated docstring as suggested.
|
|
||
|
|
||
| class ExtendedTestBase: | ||
| """Base class providing shared infrastructure for benchmark and functional tests.""" |
There was a problem hiding this comment.
| """Base class providing shared infrastructure for benchmark and functional tests.""" | |
| """Base class providing shared infrastructure for extended tests.""" |
| process.wait() | ||
| return process.returncode | ||
|
|
||
| def get_gpu_architecture(self) -> str: |
There was a problem hiding this comment.
is this needed, seems identical to get_first_gpu_architecture
There was a problem hiding this comment.
No, I just added get_gpu_architecture as a wrapper on top of get_first_gpu_architecture which is not required. So, removed it now.
| "Ensure ROCm drivers are installed and GPU is accessible." | ||
| ) from e | ||
|
|
||
| def detect_gpu_count(self) -> int: |
There was a problem hiding this comment.
identical to get_visible_gpu_count, can be removed as it is part of the shared utils lib
There was a problem hiding this comment.
No, I just added detect_gpu_count as a wrapper on top of get_visible_gpu_count with logging which is not required. So, removed it and updated the benchmark script to directly call get_visible_gpu_count.
| ) -> Dict[str, Any]: | ||
| """Create a standardized test result dictionary. | ||
|
|
||
| Builds the base result structure used by both benchmark and functional tests. |
There was a problem hiding this comment.
| Builds the base result structure used by both benchmark and functional tests. | |
| Builds the base result structure used by extended tests. |
There was a problem hiding this comment.
Updated docstring as suggested.
| if key == "pass": | ||
| key = "passed" | ||
| elif key == "fail": | ||
| key = "failed" |
There was a problem hiding this comment.
can we just update the keys to be passed or failed? not sure why we do this conversion
There was a problem hiding this comment.
Yes removed now, this conversion logic not really needed here.
| metadata.update(extra_metadata) | ||
|
|
||
| log.info(f"Uploading {test_type.title()} Test Results to API") | ||
| success = self.client.upload_results( |
There was a problem hiding this comment.
i believe the benchmark tests are passing but we still get the 403 error. can we remove this or comment this out until it is fixed?
open a GH issue with error logs and comment this out with a TODO?
There was a problem hiding this comment.
Yes, this needs changes in LKG comparison and result handling. Can I create separate PR for this, so that we can get this PR reviewed and merged quickly which blocking other reviews.
I've created github-issue for #3850. Will create new PR and add for review.
There was a problem hiding this comment.
Created new PR to temporarily disable the API upload - #3853.
| if success: | ||
| log.info("Results uploaded successfully") | ||
| else: | ||
| log.info("Results saved locally only (API upload disabled or failed)") |
There was a problem hiding this comment.
can we add error log? this is a pretty vague message and if i got an error, i would want to see more than this
There was a problem hiding this comment.
Just updated to print the generic error message now. Will take care of detailed errors as part of new PR which disables API upload.
| ## Usage | ||
|
|
||
| ### From Benchmark Scripts | ||
| ### From Test Base Classes |
There was a problem hiding this comment.
| ### From Test Base Classes | |
| ### From Extended Test Base Classes |
- Remove get_gpu_architecture & detect_gpu_count and use shared utils lib - Update docstrings Signed-off-by: Lenine Ajagappane <Lenine.Ajagappane@amd.com>
…OCm/TheRock into users/lajagapp/add-test-helper
Signed-off-by: Lenine Ajagappane <Lenine.Ajagappane@amd.com>
geomin12
left a comment
There was a problem hiding this comment.
lot cleaner, thanks for updates
Summary
Extracts shared test infrastructure from
BenchmarkBaseinto a newExtendedTestBasebase class, reducing code duplication between benchmark and functional tests.This addresses the reviewer feedback on #3648 about sharing common logic between
benchmark_base.pyandfunctional_base.py.Changes
New:
utils/extended_test_base.pyexecute_command(), get_gpu_architecture(), detect_gpu_count(), create_test_result(), calculate_statistics(), upload_results()
Updated:
benchmark/scripts/benchmark_base.pyUpdated:
benchmark/scripts/test_rccl_benchmark.pyUpdated:
utils/__init__.py,README.md,utils/README.mdInheritance Hierarchy
Follow-up