Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced computational resource widget with resource setup #566

Merged
merged 28 commits into from
Apr 24, 2024

Conversation

superstar54
Copy link
Member

@superstar54 superstar54 commented Nov 21, 2023

QeApp allows a plugin to add a new computational code in the submission step. As the number of codes grows, the old Resouces and Parallelization sections are not proper anymore because the users may want to set different resources (e.g., nodes, cpus) for different codes.
There are also such discussions before, e.g., see this comment

This PR creates a new QEappComputationalResourcesWidget from one of the aiidalab-widigets-base. The new widget supports setting the resource for the selected code.

Besides, some settings are special to a code, for example, the parallelization levels (pools, images, nk, etc.) in the pw.x code. Thus, this PR creates a new PWscfWidget for the pw.x, which is inherited from the QEappComputationalResourcesWidget.

Here is the screenshot:
Screenshot from 2023-11-21 16-55-12

Copy link

codecov bot commented Nov 21, 2023

Codecov Report

Attention: Patch coverage is 86.61972% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 80.82%. Comparing base (f63da7d) to head (80727e5).
Report is 59 commits behind head on main.

❗ Current head 80727e5 differs from pull request most recent head e81ece7. Consider uploading reports for the commit e81ece7 to get more accurate results

Files Patch % Lines
src/aiidalab_qe/common/widgets.py 86.41% 11 Missing ⚠️
src/aiidalab_qe/app/submission/__init__.py 72.41% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   80.73%   80.82%   +0.09%     
==========================================
  Files          49       48       -1     
  Lines        3415     3468      +53     
==========================================
+ Hits         2757     2803      +46     
- Misses        658      665       +7     
Flag Coverage Δ
python-3.10 80.82% <86.61%> (+0.09%) ⬆️
python-3.8 80.87% <86.61%> (+0.09%) ⬆️
python-3.9 80.87% <86.61%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@superstar54
Copy link
Member Author

@AndresOrtegaGuerrero @mikibonacci , It would be great if you could test this PR in your plugins.

@mbercx , do you think we still need to add the number of k-pools here? Does the new QE version support setting the parallelization automatically?

@@ -31,12 +32,35 @@ def check_codes(pw_code, dos_code, projwfc_code):
)


def update_resources(builder, codes):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method, now should be included in the external plugins for the get_builder?

@AndresOrtegaGuerrero
Copy link
Member

@AndresOrtegaGuerrero @mikibonacci , It would be great if you could test this PR in your plugins.

@mbercx , do you think we still need to add the number of k-pools here? Does the new QE version support setting the parallelization automatically?

I think the k-pools should be somehow controlled, for example if you want to run the Berry phase code in pw.x you cant use pools. What it could be nice is to have a dummy run. , or a method that given the structure and the input parameters, it can tell you how many k-point you will have, like that we can provide the user a suggestion for the pools to use

else:
self._update_builder(v, max_mpi_per_pool)
def _update_builder(self, builder, codes):
"""Update the resources and parallelization of the ``relax`` builder."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update should also affect the other plugins right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If other plugins use the same pw code as the relax workchain, yes.

self.npool.value = 1


class PWscfWidget(ComputationalResourcesWidget):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add docstring in this one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I added the docstring.

pw_code = codes.get("pw")
dos_code = codes.get("dos")
projwfc_code = codes.get("projwfc")
pw_code = codes.get("pw")["code"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this affect how you call the codes in the workchain of external plugin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have updated the doc.

@mbercx
Copy link
Member

mbercx commented Nov 23, 2023

do you think we still need to add the number of k-pools here? Does the new QE version support setting the parallelization automatically?

New QE versions (I forgot from which one, but def v7.2) indeed support setting k-pools automatically.

I think the k-pools should be somehow controlled, for example if you want to run the Berry phase code in pw.x you cant use pools.

Hmm, I'm not familiar with this use case, but give running on of these calculations a try without specifying k-points and see if the header mentions k-point parallelization. I hope the automated k-pool configuration won't be used in this case then. ^^

What it could be nice is to have a dummy run. , or a method that given the structure and the input parameters, it can tell you how many k-point you will have, like that we can provide the user a suggestion for the pools to use

I haven't tried this, but figuring out the number of k-points can be a bit tricky since you need to determine the symmetries exactly like Quantum ESPRESSO does. A dummy run is useful in some cases (e.g. we do this to figure out the number of q-points in the PhParallelizeQpointsWorkChain), but I'm not sure it's worth it for pw.x k-points because typically you have so many it doesn't matter that much.

"""Widget for the selection of compute resources.
max_num_nodes: maximum number of nodes allowed.
"""
self.num_nodes = ipw.BoundedIntText(
Copy link
Member

@mikibonacci mikibonacci Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@superstar54 @AndresOrtegaGuerrero
In principle, same code for different step may require different resources.
In my opinion this can be achieved by defining multiple QEAppComputationalResourcesWidget, e.g. one for relax, one for bands (where we may want to add the -nband...).

So we may have PwRelaxWidget, PwBandsWidget... do you think it is the right way to do it? In this way multiple codes which share the same executable (same aiida node) may be used differently by different plugins. I am thinking about for example a case in which we start from a primitive cell, then we compute something for a big supercell. Using the same resources for the simulations would be not efficient

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@superstar54 @mikibonacci , there could be many examples , like in the vibroscopy plugin, the PhononWorkChain can use npools, but the Dielectric cannot , or for example in the case of the one i am developing for spin orbit coupling, i might need more resources than the ones use for PwRelax (with out SOC) , though it might create a lot of instances in the step 3

@unkcpz unkcpz changed the title Feature/new computational resource widget Enhanced computational resource widget with resource setup Nov 29, 2023
Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't override the methods of a class to create a new class, but simply extend it with more methods, please don't use class inherit.
If I understand correctly, there is no modification of ComputationalResourceWidget, but add two more widgets on top of it. So why not using composite widget design? Am I miss something?

@@ -17,7 +17,8 @@
import traitlets
from aiida.orm import CalcJobNode
from aiida.orm import Data as orm_Data
from aiida.orm import load_node
from aiida.orm import load_code, load_node
from aiidalab_widgets_base import ComputationalResourcesWidget as AiiDACodeWidget
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name AiiDACodeWidget easily get confused with AiiDACodeSetup in AWB: https://github.com/aiidalab/aiidalab-widgets-base/blob/aba0bdb3dc7a7e1af268145a0fd643194e681523/aiidalab_widgets_base/computational_resources.py#L1086 which is for detailed code setup specificly.

I didn't see the point of renaming the class.

self.num_cpus,
)

@traitlets.observe("value")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this value comes from ComputationalResourceWidget but the interface is not clearly exposed.
Instead of inherit from ComputationalResourceWidget, this case fit the composite object quite well. Please consider using ComputationalResourceWidget as sub-widget in this widget you defined.

@AndresOrtegaGuerrero
Copy link
Member

do you think we still need to add the number of k-pools here? Does the new QE version support setting the parallelization automatically?

New QE versions (I forgot from which one, but def v7.2) indeed support setting k-pools automatically.

I think the k-pools should be somehow controlled, for example if you want to run the Berry phase code in pw.x you cant use pools.

Hmm, I'm not familiar with this use case, but give running on of these calculations a try without specifying k-points and see if the header mentions k-point parallelization. I hope the automated k-pool configuration won't be used in this case then. ^^

What it could be nice is to have a dummy run. , or a method that given the structure and the input parameters, it can tell you how many k-point you will have, like that we can provide the user a suggestion for the pools to use

I haven't tried this, but figuring out the number of k-points can be a bit tricky since you need to determine the symmetries exactly like Quantum ESPRESSO does. A dummy run is useful in some cases (e.g. we do this to figure out the number of q-points in the PhParallelizeQpointsWorkChain), but I'm not sure it's worth it for pw.x k-points because typically you have so many it doesn't matter that much.

I think the number of k-points ("having so many") can be debatable when using a system with a big unit cell

@mbercx
Copy link
Member

mbercx commented Dec 13, 2023

I think the number of k-points ("having so many") can be debatable when using a system with a big unit cell

Sure, that's true. But anyways QE should now be able to figure out how to parallellize over the k-points itself, so adding a dummy run to any of the aiida-quantumespresso work chains is currently not on the agenda.

@AndresOrtegaGuerrero
Copy link
Member

I think the number of k-points ("having so many") can be debatable when using a system with a big unit cell

Sure, that's true. But anyways QE should now be able to figure out how to parallellize over the k-points itself, so adding a dummy run to any of the aiida-quantumespresso work chains is currently not on the agenda.

Even for the pools? QE can now do this?

@mbercx
Copy link
Member

mbercx commented Dec 13, 2023

Even for the pools? QE can now do this?

Yup ^^. Give it a try with QE v7.2

@superstar54
Copy link
Member Author

Hi @AndresOrtegaGuerrero , I added the support for --ntasks-per-node and --cpus-per-task, please try and review the PR. Here is a demo:

qeapp-computational-resource.mp4

@superstar54 superstar54 force-pushed the feature/new_computational_resource_widget branch from 4904813 to a387712 Compare April 23, 2024 09:12
@superstar54 superstar54 force-pushed the feature/new_computational_resource_widget branch from a387712 to 4a0cb83 Compare April 23, 2024 09:42
self.npool.value = 1


class PWscfWidget(QEAppComputationalResourcesWidget):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question Xing, if i have an external plugin, can i set an independent pw.x for it ? or the pw.x is for all the plugins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one pw.x for all the plugins. We can add plugin code setting panel in the future.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But lets assume the external plugin has,
my_pw_code =QEAppComputationalResourcesWidget(
description="pw.x new",
default_calc_job_plugin="quantumespresso.pw",
)
Will this give an independent pw.x code for the use of that external plugin ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@superstar54
Copy link
Member Author

I only have a small request on the naming convention and probably also add a small test for the widget that when the checkbox ticked you test the k-point override is shown.

Hi @unkcpz , thanks for the review. I changed the name as you suggested and added a test for the new widget.

@superstar54 superstar54 requested a review from unkcpz April 23, 2024 13:20
@superstar54 superstar54 added this to the v2024.4.0 milestone Apr 24, 2024
Copy link
Member

@AndresOrtegaGuerrero AndresOrtegaGuerrero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! , just be sure to communicate to the external plugins, so we make the change for the release . Great work !

@superstar54 superstar54 dismissed unkcpz’s stale review April 24, 2024 18:32

updated as suggested by the reviewer. And the PR is approved by other.

@superstar54 superstar54 merged commit 99c059d into main Apr 24, 2024
18 checks passed
@superstar54 superstar54 deleted the feature/new_computational_resource_widget branch April 24, 2024 18:36
Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @superstar54. I think I am late for the party.

Just add two small comments.

Comment on lines +687 to +715
@property
def parameters(self):
return self.get_parameters()

def get_parameters(self):
"""Return the parameters."""
parameters = {
"code": self.code_selection.value,
"nodes": self.num_nodes.value,
"cpus": self.num_cpus.value,
}
parameters.update(self.resource_detail.parameters)
return parameters

@parameters.setter
def parameters(self, parameters):
self.set_parameters(parameters)

def set_parameters(self, parameters):
"""Set the parameters."""
self.code_selection.value = parameters["code"]
if "nodes" in parameters:
self.num_nodes.value = parameters["nodes"]
if "cpus" in parameters:
self.num_cpus.value = parameters["cpus"]
if "ntasks_per_node" in parameters:
self.resource_detail.ntasks_per_node.value = parameters["ntasks_per_node"]
if "cpus_per_task" in parameters:
self.resource_detail.cpus_per_task.value = parameters["cpus_per_task"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You either use property getter and setter or use get_ and set_, no?
You can check the list of https://realpython.com/python-getter-setter/#deciding-whether-to-use-getters-and-setters-or-properties-in-python and see which one you gonna to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. It's a common way to use the parameters.setter interface, and then in another function set_parameters implement the real thing. In this pattern, the derived classes can easily modify or extend the setting logic by overriding a dedicated method, without needing to alter the property's interface. As you can see in the PwCodeResourceSetupWidget class.

Comment on lines +36 to +56
builder.scf.pw.metadata.options.resources = {
"num_machines": codes.get("pw")["nodes"],
"num_mpiprocs_per_machine": codes.get("pw")["ntasks_per_node"],
"num_cores_per_mpiproc": codes.get("pw")["cpus_per_task"],
}
builder.scf.pw.parallelization = orm.Dict(dict=codes["pw"]["parallelization"])
builder.nscf.pw.metadata.options.resources = {
"num_machines": codes.get("pw")["nodes"],
"num_mpiprocs_per_machine": codes.get("pw")["ntasks_per_node"],
"num_cores_per_mpiproc": codes.get("pw")["cpus_per_task"],
}
builder.nscf.pw.parallelization = orm.Dict(dict=codes["pw"]["parallelization"])
builder.dos.metadata.options.resources = {
"num_machines": codes.get("dos")["nodes"],
"num_mpiprocs_per_machine": codes.get("dos")["ntasks_per_node"],
"num_cores_per_mpiproc": codes.get("dos")["cpus_per_task"],
}
builder.projwfc.metadata.options.resources = {
"num_machines": codes.get("projwfc")["nodes"],
"num_mpiprocs_per_machine": codes.get("projwfc")["ntasks_per_node"],
"num_cores_per_mpiproc": codes.get("projwfc")["cpus_per_task"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part seems a bit verbose and codes are repeating, consider to simplify it a bit?

  • builder.scf.pw.metadata.options.resources five dots attributes call looks frightening and I think it makes plugin developer easy to make mistakes.
  • For override the resources, maybe there already functions to do this in aiida-quantumespresso? @mbercx Maybe you know it?

Somehow my previous comment on this was eaten by GitHub. I do it again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Indeed, one can use the options to set resources in the get_builder_from_protocol. and I agree that this should be the recommended way. I opened an issue on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants