-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup Cuda regtests via new do_regtest.py script. #1548
Conversation
tools/regtesting/do_regtest.py
Outdated
print(f"<<< {batch_result.batch.workdir} ({num_done + 1}", end="") | ||
print(f" of {len(tasks)}) done in {batch_result.duration:.2f} sec") | ||
sys.stdout.flush() | ||
err_fh.write("\n".join([r.error for r in batch_result.results if r.error])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be no need to for the explicit list comprehension here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Thanks!
tools/regtesting/do_regtest.py
Outdated
print("\n-------------------------------- Summary -------------------------------") | ||
total_duration = time.perf_counter() - start_time | ||
num_tests = len(all_results) | ||
num_failed = sum([r.status in ("TIMEOUT", "RUNTIME FAIL") for r in all_results]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dito
print(f"Number of CORRECT tests {num_ok}") | ||
print(f"Total number of tests {num_tests}") | ||
summary = f"\nSummary: correct: {num_ok} / {num_tests}" | ||
summary += f"; wrong: {num_wrong}" if num_wrong > 0 else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary += f"; wrong: {num_wrong}" if num_wrong > 0 else "" | |
summary += f"; wrong: {num_wrong if num_wrong > 0 else ''}" |
... and then thoe whole thing can be one string:
summary = (
f"\nSummary: correct: {num_ok} / {num_tests}"
f"; wrong: {num_wrong if num_wrong > 0 else ''}"
f"; failed: {num_failed if num_failed > 0 else ''}"
f"; {total_duration/60.0:.0f}min"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work because I also want "; failed: "
to disappear when there are no failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I see, but then I'd probably use a prefixed if
for clearness rather than appending an empty string:
if num_failed:
summary += f"..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were some serious business logic then I would agree. But just for assembling this string I'd rather not spend the extra lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it makes code more readable
|
||
|
||
# ====================================================================================== | ||
class Config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a dataclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, because there is quite a lot going in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataclass
doesn't mean you can't have a custom constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, but with a custom constructor one loses most of the advantages of a data class. It would certainly require more lines of code than the current solution.
tools/regtesting/do_regtest.py
Outdated
if self.mpiranks == 1: # Speedup non-mpi test by using all available GPUs. | ||
self.gpu_round_robin = (self.gpu_round_robin + 1) % self.num_gpu_devices | ||
os.environ["CUDA_VISIBLE_DEVICES"] = str(self.gpu_round_robin) | ||
os.environ["OMP_NUM_THREADS"] = str(self.ompthreads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd advise against manipulating the current environ (especially when you're delaying the actual process making use of that environ) and rather pass an explicit dict with the env
keyword to wherever you're spawning the process where the dict passing is a merge of os.environ
and what cp2k_command
should return. So the signature would then rather be:
def cp2k_command(self) -> Tuple(List[str], Dict[str,str]):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I refactor the launching of binaries.
Please also keep https://www.cp2k.org/dev:regtesting in mind. |
@dev-zero, thanks a lot for the review!
I intent to roll out the new Of course, feel free to take it for a spin on Daint. I'd be very curious. The new script is essentially an improved version of the farming mode. |
No description provided.