Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup Cuda regtests via new do_regtest.py script. #1548

Merged
merged 5 commits into from
Jun 4, 2021

Conversation

oschuett
Copy link
Member

No description provided.

print(f"<<< {batch_result.batch.workdir} ({num_done + 1}", end="")
print(f" of {len(tasks)}) done in {batch_result.duration:.2f} sec")
sys.stdout.flush()
err_fh.write("\n".join([r.error for r in batch_result.results if r.error]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be no need to for the explicit list comprehension here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Thanks!

print("\n-------------------------------- Summary -------------------------------")
total_duration = time.perf_counter() - start_time
num_tests = len(all_results)
num_failed = sum([r.status in ("TIMEOUT", "RUNTIME FAIL") for r in all_results])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

print(f"Number of CORRECT tests {num_ok}")
print(f"Total number of tests {num_tests}")
summary = f"\nSummary: correct: {num_ok} / {num_tests}"
summary += f"; wrong: {num_wrong}" if num_wrong > 0 else ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
summary += f"; wrong: {num_wrong}" if num_wrong > 0 else ""
summary += f"; wrong: {num_wrong if num_wrong > 0 else ''}"

... and then thoe whole thing can be one string:

summary = (
    f"\nSummary: correct: {num_ok} / {num_tests}"
    f"; wrong: {num_wrong if num_wrong > 0 else ''}"
    f"; failed: {num_failed  if num_failed > 0 else ''}"
    f"; {total_duration/60.0:.0f}min"
    )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work because I also want "; failed: " to disappear when there are no failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see, but then I'd probably use a prefixed if for clearness rather than appending an empty string:

if num_failed:
    summary += f"..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were some serious business logic then I would agree. But just for assembling this string I'd rather not spend the extra lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes code more readable



# ======================================================================================
class Config:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a dataclass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because there is quite a lot going in the constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataclass doesn't mean you can't have a custom constructor

Copy link
Member Author

@oschuett oschuett Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, but with a custom constructor one loses most of the advantages of a data class. It would certainly require more lines of code than the current solution.

if self.mpiranks == 1: # Speedup non-mpi test by using all available GPUs.
self.gpu_round_robin = (self.gpu_round_robin + 1) % self.num_gpu_devices
os.environ["CUDA_VISIBLE_DEVICES"] = str(self.gpu_round_robin)
os.environ["OMP_NUM_THREADS"] = str(self.ompthreads)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd advise against manipulating the current environ (especially when you're delaying the actual process making use of that environ) and rather pass an explicit dict with the env keyword to wherever you're spawning the process where the dict passing is a merge of os.environ and what cp2k_command should return. So the signature would then rather be:

     def cp2k_command(self) -> Tuple(List[str], Dict[str,str]):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I refactor the launching of binaries.

@dev-zero
Copy link
Contributor

dev-zero commented Jun 1, 2021

Please also keep https://www.cp2k.org/dev:regtesting in mind.

@oschuett
Copy link
Member Author

oschuett commented Jun 3, 2021

@dev-zero, thanks a lot for the review!

Please also keep https://www.cp2k.org/dev:regtesting in mind.

I intent to roll out the new do_regtest.py script rather slowly. My main focus for now is to speedup the Cuda run so that we can afford a second GPU run for HIP.

Of course, feel free to take it for a spin on Daint. I'd be very curious. The new script is essentially an improved version of the farming mode.

@oschuett oschuett marked this pull request as ready for review June 3, 2021 21:59
@oschuett oschuett merged commit d4a8230 into cp2k:master Jun 4, 2021
@boegel boegel mentioned this pull request Aug 14, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants