Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception not raise properly when number of codes>2 in calc_info #4990

Merged
merged 2 commits into from Aug 10, 2021

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Jun 18, 2021

If calc_info.codes_run_mode is not set when using more than one code in a calc_info, the PluginInternalError is not properly raised. calc_info.codes_run_mode will be a None, and the exception never raised here.

@codecov
Copy link

codecov bot commented Jun 18, 2021

Codecov Report

Merging #4990 (247d312) into develop (9285245) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4990      +/-   ##
===========================================
+ Coverage    80.22%   80.24%   +0.03%     
===========================================
  Files          515      515              
  Lines        36757    36753       -4     
===========================================
+ Hits         29484    29490       +6     
+ Misses        7273     7263      -10     
Flag Coverage Δ
django 74.73% <100.00%> (+0.03%) ⬆️
sqlalchemy 73.64% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/common/datastructures.py 96.88% <ø> (ø)
aiida/engine/processes/calcjobs/calcjob.py 88.68% <100.00%> (+1.48%) ⬆️
aiida/transports/plugins/local.py 81.66% <0.00%> (+0.26%) ⬆️
aiida/schedulers/scheduler.py 84.14% <0.00%> (+1.38%) ⬆️
aiida/transports/util.py 65.63% <0.00%> (+3.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9285245...247d312. Read the comment docs.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @unkcpz . Shouldn't we just define a default maybe instead of raising? Maybe setting SERIAL as a default makes sense and people can switch to PARALLEL if they need. Also we really should add a test for this. Should be straightforward enough, just mock the process and call presubmit

@unkcpz unkcpz force-pushed the code-run-mode branch 6 times, most recently from abcc02b to 8ec3846 Compare June 25, 2021 00:45
@unkcpz
Copy link
Member Author

unkcpz commented Jun 25, 2021

Hi @sphuber , I have added a test and also set the default for codeinfo.withmpi. Instead of directly mocking a process, I create and run a MultiCodesCalcJob, and then check the content of job script generated.

Copy link
Member

@ramirezfranciscof ramirezfranciscof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @unkcpz thanks for the contribution! I have some minor comments on it, could you take a look? Let me know if there is anything that is not clear or you don't agree with.

docs/source/howto/plugin_codes.rst Outdated Show resolved Hide resolved
.gitignore Outdated
@@ -17,6 +17,7 @@
.eggs
.vscode
.tox
submit_test/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm this seems to be a specific folder you created in your repo but I don't think it makes sense to add this as a generic thing to the gitignore, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but no. Since this folder will be generated in current directory when you run :

pytest tests/engine/test_calc_job.py

The folder was create when I run test locally. It's not a big deal but just tedious to delete this folder when making commits.
I believe the better way would be to delete the folder using a fixture. What do you think? I am not sure I can quickly figure out how to do that. So I will remove this line from .gitignore at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see your point. Yes, I think your solution sounds reasonable. I'm surprised that this hasn't come up before; @sphuber is this the only test which is executing a dry_run and comparing the resulting script?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other tests that run a dry_run, and so create this folder, but they don't compare the input script.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, but the question is what do you do with all these newly created test folder/files so that they are not shown on git? I actually don't remember these appearing for me after running tests, are they deleted automatically somehow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sphuber , sorry for the insistence, I just want to make sure there is no already standardized way in which we are dealing with this issue before I propose my own.

To recap the problem: the test needs to run a dry_run; this generates a folder with files; this files are left polluting the environment (in this case, the modification aims at removing it from files detected by git). What do other tests that use dry_run do (I haven't noticed any submit_test folders appearing after running the testsuite)? They delete the files manually? Is there a fixture that takes care of deleting this folder at the end of the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two test files that run CalcJob dry runs: tests/engine/test_launch.py and tests/engine/test_calc_job.py. The former cleans up after itself, which is why you never saw the directories before. The TestLaunchersDryRun class deletes the submit_test folder in its tearDownClass method:

def tearDown(self):
import os
import shutil
from aiida.common.folders import CALC_JOB_DRY_RUN_BASE_PATH
super().tearDown()
self.assertIsNone(Process.current())
# Make sure to clean the test directory that will be generated by the dry-run
filepath = os.path.join(os.getcwd(), CALC_JOB_DRY_RUN_BASE_PATH)
try:
shutil.rmtree(filepath)
except Exception: # pylint: disable=broad-except
pass

For now, @unkcpz can maybe use this in the test_calc_job.py file as well. At some point we can find a cleaner fixture to do this that makes it easy to reuse, but since these test files are still using old-style, maybe we leave that for another time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber !

Is it ok then to use some of these at templates to add a cleanup part to your test and remove this from .gitignore @unkcpz ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I see your point! I can do add the fixture here and remove the line from .gitignore.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
aiida/common/datastructures.py Outdated Show resolved Hide resolved
aiida/engine/processes/calcjobs/calcjob.py Outdated Show resolved Hide resolved
tests/engine/test_calc_job.py Show resolved Hide resolved
Copy link
Member Author

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ramirezfranciscof, thanks for the review. I accept most of your advice and leave some notes for your comment. Please have another look at your convenience. Cheers!

.gitignore Outdated
@@ -17,6 +17,7 @@
.eggs
.vscode
.tox
submit_test/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but no. Since this folder will be generated in current directory when you run :

pytest tests/engine/test_calc_job.py

The folder was create when I run test locally. It's not a big deal but just tedious to delete this folder when making commits.
I believe the better way would be to delete the folder using a fixture. What do you think? I am not sure I can quickly figure out how to do that. So I will remove this line from .gitignore at the moment.

docs/source/howto/plugin_codes.rst Outdated Show resolved Hide resolved
.pre-commit-config.yaml Outdated Show resolved Hide resolved
aiida/common/datastructures.py Outdated Show resolved Hide resolved
aiida/engine/processes/calcjobs/calcjob.py Outdated Show resolved Hide resolved
aiida/engine/processes/calcjobs/calcjob.py Outdated Show resolved Hide resolved
tests/engine/test_calc_job.py Show resolved Hide resolved
@unkcpz
Copy link
Member Author

unkcpz commented Aug 2, 2021

Thanks @ramirezfranciscof ! I open the issue #5034 for documentation of the withmpi of codeinfo. All other advice is accepted. I'll rebase and make a clear commit once it is approved.

Copy link
Member

@ramirezfranciscof ramirezfranciscof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @unkcpz its looking great! Thanks for the work, there are just two final details. One is the .gitignore modification I'm still trying to figure out and the other is a modif on aiida/schedulers/scheduler.py that I don't quite understand (maybe it was done for testing and left unintentionally?). After this is dealt with I would consider this ready to merge.

Comment on lines 1 to 8
#!/bin/bash
exec > _scheduler-stdout.txt
exec 2> _scheduler-stderr.txt


'/bin/bash'

'mpirun' '-np' '1' '/bin/bash'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm this is strange no? I expected content = content.rstrip() + '\n' to add a final newline in the file but apparently it didn't, it just ended the current line. More surprisingly so: the linter didn't complain that there was no newline end?

I mean, if it is working we can leave it like this, it is just...unexpected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expected though. The rstrip strips all whitespace, so also the newline character, which you then re-append. However, having a terminating newline character, does not show up visually as an empty line. It just means the line is terminated and the carriage return goes to the start of the next line. If you want to see a visual empty line, you would need another new line character. Note that Github visualizes this difference (in diffs where this changes) with a symbol at the end of the file, saying "no newline character" or something to that effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, for the rstrip() I did some tests in the shell and printing so I didn't know exactly how it would behave with write, there my "I expected..." meant "I intended that...". The unexpected part is more about the linter not complaining.

@@ -216,7 +216,7 @@ def _get_run_line(self, codes_info, codes_run_mode):

output_string = f'{command_to_exec} {stdin_str} {stdout_str} {stderr_str}'

list_of_runlines.append(output_string)
list_of_runlines.append(output_string.rstrip())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @unkcpz why did you change this here? The rstrip() on the test when you are doing the file_regression should suffice for the test to work, no? I wouldn't change this in the actual code just to cater for a test or comply with some linter criteria in a script that is produced during usage and not part of the codebase. I mean, this is changing some intermediate file produce by the code that could be checked or parsed by some users, and its modification may create some problems (however unlikely it might be in this case).

I mean, unless you have a strong reason to propose we change to having our scripts be trimmed at the end, I would say to revert this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to pass the linter. As you can see from the above line, that the output_string will end will spaces if stdin_str, stdout_str or stderr_str is not set. In that case, the command line of submit script will end with spaces wiped out by the linter and fails the pre-commit.

I mean, this is changing some intermediate file produce by the code that could be checked or parsed by some users, and its modification may create some problems (however unlikely it might be in this case

I agree, but for this part of the code, I see no downside to strip the line. In other words, I would much more like the bash script ironed without extra spaces in each line ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to pass the linter. As you can see from the above line, that the output_string will end will spaces if stdin_str, stdout_str or stderr_str is not set. In that case, the command line of submit script will end with spaces wiped out by the linter and fails the pre-commit.

Right, but I mean, you are not committing any script that is being generated by this, right? The reference file is being created by file_regression.check(content, extension='.sh') and you are already using rstrip to clean up content before putting it there. That you originally take content from the printed script does not matter if you clean it between reading it and writing to reference with the file_regression. Am I missing something here? Which script directly generated by this is being rejected by the linter?

I agree, but for this part of the code, I see no downside to strip the line. In other words, I would much more like the bash script ironed without extra spaces in each line ;-)

I also agree that it is very unlikely that this will break someone's code, but it is still possible. If we have had a team discussion on how to do a cleaner script and agreed in some guidelines that we were going to respect going forward, I would totally bite the bullet on the chance of causing some friction. But we are not even sure that we won't have to change this further (maybe to prevent contiguous empty lines due to non existing append/prepend texts) or even go back on it (maybe it turns out there are good reasons to have one final empty line at the end of the file). So yeah, the risk may be minimal but on the other hand there is no objective gain from this change (other than you and me liking it more aesthetically, but that is not very objective =P).

Copy link
Member Author

@unkcpz unkcpz Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not explain it more clear.

I have to add this to pass the linter. Because the content.rstrip() in test function, only removes the final lines of the submit script content. output_string is the executable command line not necessary the final line. For clarity, please see the following content generated by in test and after linter pre-commit clean up (I use '␣' to represent space character):

The script generated:

#!/bin/bash
exec > _scheduler-stdout.txt
exec 2> _scheduler-stderr.txt


'/bin/bash'␣␣␣

'mpirun' '-np' '1' '/bin/bash'

After pre-commit:

#!/bin/bash
exec > _scheduler-stdout.txt
exec 2> _scheduler-stderr.txt


'/bin/bash'

'mpirun' '-np' '1' '/bin/bash'

If we agree not to add exclude files in the linter configuration, I can not figure out a simple way to get rid of this and pass the pre-commit check (Can split() the content string with rstrip() of every line, though it seems complex for this case. But I guess that is the middle point where we can compromise).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right @unkcpz the formatter strips all whitespace, also terminating spaces in lines in the middle of the file. Apologies for not having thought of this, I was focusing on the terminating new line characters.

OK, so here is my thought process. We have added the whitespace trimming pre-commit action such that trailing whitespace is automatically cleaned up in source code, as it can make things look messy and more difficult to read than necessary. When I added it, I did not specify what files it should run for, and so let it run for everything, even comparison files that are used in the unit tests, even though those are not the primary target. We don't need to read those often, so trailing newspace is not a real problem. When you consider the motivation for the whitespace trimming, it seems clear that the correct solution for this problem is to exclude the test comparison file from the pre-commit. So I now agree that @unkcpz original solution of updating the exclude rules in the .pre-commit-config.yaml file is actually the correct one.

I propose therefore that we revert the changes we requested (really sorry for the back and forth @unkcpz ), removing the manual explicit whitespace stripping in the code and the test and simply update the include rules in the config. Note that instead of excluding explicitly this file, maybe it would be better to simply include all .py files, because that is in the end what we are really interested in. We want to clean the source code files, not any other arbitrary text files.

What we could do is use the following regex I think

        exclude: >- >
            (?x)^(
                tests/.*(?<!\.py)$
            )$

i.e. exclude all files in the tests folder that do not finish with the .py extension. We still want to clean the test source files of course. Would you agree with this suggestion @ramirezfranciscof ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unkcpz Ahhhh I see now! Thanks for the explanation, haha, indeed I can see the excess with the newlines but it is more difficult to notice the trailing spaces 😅. Nice idea using the strange underscore symbol to illustrate it.

@sphuber mmm I'm trying to think if this could backfire. I mean, the test section does have other kinds of file (the test/static folders have some .cif, .UPF, .aiida exports, .json) but perhaps we don't want to be lintering those anyways? I also had the idea that there were some notebooks being used to do some tests, but I couldn't find anything there now. I'm wondering if at some point it won't be convenient to use some bash scripts as aux to some tests and we may want to linter those too. It is also worth mentioning that the .sh extension was set explicitly by file_regression.check(content, extension='.sh'), so we could also decide on a specific extension for reference files and exclude those (.pyreg? .pytest_regression_uniquely_long_aiida_extension?). I mean, I agree it would be nice to set some configuration to deal now and for all with all pytest regression files, I'm just not sure just applying the linter to .py is the clear way to do this.

There is also one last possible chance that we might be able to deal with this in a simple way inside the test (which is, actually, more self-documenting that the initial one I proposed...):

    content = '# pylint: skip-file\n' + content

In any case, as the proposed alternative would be a modification internal to the tests so we are more free to set it back if we change our minds, I'm ok with you going the way you prefer. It still pains me to have to add the exception to the .pre-commit-config.yaml, but since you already tried a suggestion I made that underestimated the problem I won't insist on this if you think the pylint config is the best way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also one last possible chance that we might be able to deal with this in a simple way inside the test (which is, actually, more self-documenting that the initial one I proposed...):

    content = '# pylint: skip-file\n' + content

It is not pylint that does the trimming though, so that won't work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm I see. I can't find anywhere how to signal inside the file that no hooks should be applied to it, so perhaps this is not possible in a general way. Weird. Anyways, after two unsuccessful suggestions I concede, feel free to go ahead with @sphuber suggested solution then @unkcpz =)

@unkcpz
Copy link
Member Author

unkcpz commented Aug 4, 2021

Hi @ramirezfranciscof , rather than delete the folder, I add the following dry_run_in_tmp fixture to change the working dir to /tmp and running the test their.

@pytest.fixture(scope='function')
def dry_run_in_tmp(request):
    """change to the tmp directory to do the dry run, then change back to the calling directory to avoid side-effects"""
    os.chdir('/tmp')
    yield
    os.chdir(request.config.invocation_dir)

@sphuber
Copy link
Contributor

sphuber commented Aug 4, 2021

Hi @ramirezfranciscof , rather than delete the folder, I add the following dry_run_in_tmp fixture to change the working dir to /tmp and running the test their.

@pytest.fixture(scope='function')
def dry_run_in_tmp(request):
    """change to the tmp directory to do the dry run, then change back to the calling directory to avoid side-effects"""
    os.chdir('/tmp')
    yield
    os.chdir(request.config.invocation_dir)

This looks fine, but I wouldn't hardcode the '/tmp' directory. You can use the tmpdir fixture to get a temporary directory that is cross-platform compatible and will automatically be cleaned up by pytest.

@unkcpz
Copy link
Member Author

unkcpz commented Aug 5, 2021

You can use the tmpdir fixture to get a temporary directory that is cross-platform compatible and will automatically be cleaned up by pytest.

@sphuber thanks! Changed.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @unkcpz and especially for your patience. I realize we have been going back and forth quite a bit on various things. I have now fully reviewed the code (which I hadn't done yet, just replied to question and comments) and I have two comments remaining. Once those are addressed this is good to be merged for me.

Comment on lines 596 to 597
this_withmpi = self.node.get_option('withmpi') # set code mpi, default set to as calcjob one
if code_info.withmpi:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, there are two ways to set withmpi: on the process itself through the metadata.options.withmpi input and through the CodeInfo that is created by the plugin itself, where the latter should be leading if both are defined.

If that analysis is correct, than I think this code is not correct. By checking just if code_info.withmpi the actual value will not be overridden even if it is set to False. However, this seems like a good use-case from a plugin perspective. If the plugin needs to run two codes, for which one always needs to be run without mpi, i.e. code_info.withmpi == False, then that should override the value taken from self.node.get_option('withmpi'). That is to say, False is also an actual value that should not be ignored. So should the conditional not be if code_info.withmpi is not None?

To make this code a lot clearer, I would propose writing it as follows, with comment:

Suggested change
this_withmpi = self.node.get_option('withmpi') # set code mpi, default set to as calcjob one
if code_info.withmpi:
# To determine whether this code should be run with MPI enabled, we get the value that was set in the inputs
# of the entire process, which can then be overwritten by the value from the `CodeInfo`. This allows plugins
# to force certain codes to run without MPI, even if the user wants to run all codes with MPI whenever
# possible. This use case is typically useful for `CalcJob`s that consist of multiple codes where one or
# multiple codes always have to be executed without MPI.
withmpi = self.node.get_option('withmpi')
# Override the value of `withmpi` with that of the `CodeInfo` if and only if it is set
if code_info.withmpi is not None:
withmpi = code_info.withmpi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, I totally missed this on my pass, my bad 😕.

Follow up question: we want code_info.withmpi to always override the calcjob setting? Or if the user has explicitly set withmpi to be False then all codes executed should actually respect that? (in which case setting code_info.withmpi = True or leaving code_info.withmpi = None would have the same effect) Or the idea is that if a plugin sets code_info.withmpi = True it means that the code can only be ran with mpi?

Maybe we can make a table of expected results and intent and check these in a test?

[Calcjob] True [Calcjob] False
[Code] True (the plugin/code can only be run with mpi?) Run with MPI Incompatible? Raise?
[Code] False (this code does not use parallelism at all) Run without MPI Run without MPI
[Code] None (this code can run with or without MPI) Run with MPI Run without MPI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if the user has explicitly set withmpi to be False then all codes executed should actually respect that?

In the example that I sketched, this would be problematic. Imagine a CalcJob that executes two codes: a small pre-processing script that has to be executed serially or it will crash, and then a normal executable (like pw.x) that can be executed serially or with MPI depending on how it is compiled. In this case, the metadata.options.withmpi is the tool for the user to toggle the behavior for the latter, but it should not override the setting for the pre-processing script because it will fail.

I am not sure if this solution covers all use cases or whether there are exceptions where the user definitely needs to be able to override. However, if the plugin explicitly sets CodeInfo.withmpi than there must be a good reason. If it can be run with either setting, maybe the plugin should not set one value either way. Or we should change metadata.options.withmpi to be able to take a list, but that may become even more problematic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramirezfranciscof @sphuber thanks for making this more clear. This is an important catch! It is the right code that I intended to implement.

Only if plugin developer sets CodeInfo.withmpi explicitly the metadata.option.withmpi will not override it through running CalcJob. The ultimate behaviour will be the case shown in the table above. Indeed that does not include the scenario that the plugin user what to override what was set in CodeInfo. I have no strong opinion on whether we should take this into account.
But I want to mention that maybe we can consider this problem together with setting default calcjob when configuring code #4991. Plugin user can only set one default CalcJob plugin for the code and this is the code that is implicitly called with self.inputs.code in CalcJob. I would assume the plugin user only what to control the MPI behaviour of this particular code. And the other codes need to be explicitly passed as an InputPort in defining the CalcJob, which give plugin user an option to use the correct compiled code(MPI one or serial one) if the withmpi of this codeinfo is fixed by the plugin developer.
Therefore, I prefer to have a detailed comment as shown by @sphuber and add information in documentation to warn the plugin developer to carefully set this codeinfo.withmpi in their plugin when there is more than one code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example that I sketched, this would be problematic. Imagine a CalcJob that executes two codes: a small pre-processing script that has to be executed serially or it will crash, and then a normal executable (like pw.x) that can be executed serially or with MPI depending on how it is compiled. In this case, the metadata.options.withmpi is the tool for the user to toggle the behavior for the latter, but it should not override the setting for the pre-processing script because it will fail.

Yes, so I think you are describing what should happen in the table I included when [Calcjob] True and [Code] False. Here it makes sense that the CodeInfo.withmpi overrides the metadata.options.withmpi because of the reasons you already explained: even if the user is indicating it has MPI available for the main code called by the calculation, other auxiliary calls may not benefit from or not even be able to run in parallel.

My question was more what should happen in the reciprocal case, when [Calcjob] False and [Code] True. Here the user is specifically saying to NOT use MPI at all, is it still justified to override this and anyways use MPI on the CodeInfo.withmpi = True code? Or does this suggest a more fundamental incompatibility and thus should raise?

(@unkcpz you mention you agreed with the behavior in the table, does this mean you agree with raising in this second case?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies @ramirezfranciscof, I gave it a glimpse and take for granted with my own mind.
I reconsider your advice to raise here and start to like it. Maybe it can be a raising warning rather than an exception? But I still assume the developer of the plugin should be careful and know what they do when setting codeinfo.withmpi.
Do you have a strong opinion on raising an exception or maybe a warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is typically on the versatility on the end user and clarity when we are overriding his intentions. So here I would say:

  • I would issue a warning if then we let metadata.option.withmpi override codeinfo.withmpi (we let the user know the plugin designer set this to only be run with MPI for some reason but we allow them to proceed as they see fit)
  • In the other case, i.e. if after the warning codeinfo.withmpi would override metadata.option.withmpi, then I would say it is preferable to raise. This setting is not being used for anything else as far as I know, if the user is setting it, it is specifically for this and we are overriding it. They wouldn't be able to do what they want anyways, so just raise saying "you are not allowed to set this".

Now as to which of those two I prefer, I'm not sure. I'm still a bit puzzled as to in what situation the plugin designer would set something to ONLY be run with MPI, it is a bit bizarre to me. I'll let you decide what is best for this (or @sphuber if he has another opinion).


NOTE: I am now re-reading the previous logic to check all the cases of the table. The previous behavior was that it would first check the individual codeinfo.withmpi and only if it was None and if it was a single code, then it would check metadata.option.withmpi. So if there were multiple codes being called, then the plugin designer had to set whether they were run with MPI or not, and the user had no say in this. So before there was actually no way for the plugin designer to leave how to run some of the codes to "the criteria" of the user.

This gets more bizarre the more we look into it...🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the bizarre things come from there is only one binary flag that can be set from metadata.option.withmpi by plugin user when actually a list of flags need for multiple codes case, which is hard to settle at this PR.

I would like to suggest that we stop here and leave all complicity to the documentation. For plugins that involve more than one code, if the user wants to get full control in MPI mode of how to run codes, they are suggested to split the calcjob into parts with each one running with one code only.

@@ -216,7 +216,7 @@ def _get_run_line(self, codes_info, codes_run_mode):

output_string = f'{command_to_exec} {stdin_str} {stdout_str} {stderr_str}'

list_of_runlines.append(output_string)
list_of_runlines.append(output_string.rstrip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right @unkcpz the formatter strips all whitespace, also terminating spaces in lines in the middle of the file. Apologies for not having thought of this, I was focusing on the terminating new line characters.

OK, so here is my thought process. We have added the whitespace trimming pre-commit action such that trailing whitespace is automatically cleaned up in source code, as it can make things look messy and more difficult to read than necessary. When I added it, I did not specify what files it should run for, and so let it run for everything, even comparison files that are used in the unit tests, even though those are not the primary target. We don't need to read those often, so trailing newspace is not a real problem. When you consider the motivation for the whitespace trimming, it seems clear that the correct solution for this problem is to exclude the test comparison file from the pre-commit. So I now agree that @unkcpz original solution of updating the exclude rules in the .pre-commit-config.yaml file is actually the correct one.

I propose therefore that we revert the changes we requested (really sorry for the back and forth @unkcpz ), removing the manual explicit whitespace stripping in the code and the test and simply update the include rules in the config. Note that instead of excluding explicitly this file, maybe it would be better to simply include all .py files, because that is in the end what we are really interested in. We want to clean the source code files, not any other arbitrary text files.

What we could do is use the following regex I think

        exclude: >- >
            (?x)^(
                tests/.*(?<!\.py)$
            )$

i.e. exclude all files in the tests folder that do not finish with the .py extension. We still want to clean the test source files of course. Would you agree with this suggestion @ramirezfranciscof ?

@unkcpz
Copy link
Member Author

unkcpz commented Aug 6, 2021

@sphuber no worries about back and forth, I was only trying to get rid of the pre-commit error in the previous commit. Now we know what causes the pre-commit failure.
There is an issue here, I can either exclude files in a specific hook or in top-level for all repos. There is no option to exclude files 'repo' range https://pre-commit.com/#pre-commit-configyaml---repos .

I think we need to add it in top-level, correct? Otherwise exclude files in two hooks seems nasty.

exclude: >-
    (?x)^(
        tests/.*(?<!\.py)$
    )$
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.5.0
    hooks:
    -   id: double-quote-string-fixer
    -   id: end-of-file-fixer
    -   id: fix-encoding-pragma
    -   id: mixed-line-ending
    -   id: trailing-whitespace

@sphuber
Copy link
Contributor

sphuber commented Aug 6, 2021

@sphuber no worries about back and forth, I was only trying to get rid of the pre-commit error in the previous commit. Now we know what causes the pre-commit failure.
There is an issue here, I can either exclude files in a specific hook or in top-level for all repos. There is no option to exclude files 'repo' range https://pre-commit.com/#pre-commit-configyaml---repos .

I think we need to add it in top-level, correct? Otherwise exclude files in two hooks seems nasty.
There is an issue here, I can either exclude files in a specific hook or in top-level for all repos.

I am pretty sure you can apply the exclude for all pre-commit hooks and not apply them to all repos, like so:

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.5.0
    hooks:
    -   id: double-quote-string-fixer
    -   id: end-of-file-fixer
    -   id: fix-encoding-pragma
    -   id: mixed-line-ending
    -   id: trailing-whitespace
        exclude: >-
            (?x)^(
                tests/.*(?<!\.py)$
            )$

@unkcpz
Copy link
Member Author

unkcpz commented Aug 6, 2021

Hmmm.... no, it is not true. The exclude key only take effect for the hook. With your configuration above, I got the following files modification:

(aiida-core-dev) ➜  aiida_core git:(code-run-mode) ✗ pre-commit run --files tests/engine/test_calc_job/test_multi_codes_run_parallel_True_.sh
Fix double quoted strings............................(no files to check)Skipped
Fix End of Files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/engine/test_calc_job/test_multi_codes_run_parallel_True_.sh
...

Have to configure pre-commit like:

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.5.0
    hooks:
    -   id: double-quote-string-fixer
    -   id: end-of-file-fixer
        exclude: >-
            (?x)^(
                tests/.*(?<!\.py)$
            )$
    -   id: fix-encoding-pragma
    -   id: mixed-line-ending
    -   id: trailing-whitespace
        exclude: >-
            (?x)^(
                tests/.*(?<!\.py)$
            )$

@sphuber
Copy link
Contributor

sphuber commented Aug 6, 2021

Have to configure pre-commit like:

Ah sure, we need it to apply to multiple hooks, but you can define the list once and use a reference like so

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.5.0
    hooks:
    -   id: double-quote-string-fixer
    -   id: end-of-file-fixer
        exclude: &exclude_files >
            (?x)^(
                tests/.*(?<!\.py)$
            )$
    -   id: fix-encoding-pragma
    -   id: mixed-line-ending
    -   id: trailing-whitespace
        exclude: *exclude_files

@unkcpz
Copy link
Member Author

unkcpz commented Aug 6, 2021

thanks @sphuber ! That makes sense. However, the exclude_files is already be used in the following. What do you think I name it exclude_nonpy_tests_files for the case here?

@sphuber
Copy link
Contributor

sphuber commented Aug 6, 2021

thanks @sphuber ! That makes sense. However, the exclude_files is already be used in the following. What do you think I name it exclude_nonpy_tests_files for the case here?

Maybe just name it after the pre-commit step, i.e., exclude_pre_commit_hooks

Implementing `CalcJob` plugin with more than one code got incorrect
`codes_run_mode` option check and raise. And the `metadata.option.withmpi` was
not able to be read and set for the codes when more than one code
involved in the `CalcJob` plugin.

In this commit, the `codes_run_mode` has the default value set to `SERIAL` mode.
As for `withmpi` option, we get the value that was set in the inputs of the entire
process (by `metadata.option.withmpi`), which can then be overwritten by
the value from the `CodeInfo`. This allows plugins to force certain codes
to run without MPI, even if the user wants to run all codes with MPI whenever
possible.
@unkcpz
Copy link
Member Author

unkcpz commented Aug 10, 2021

Hi @ramirezfranciscof, I add a test to check the withmpi behaviours and rebase the commits. Maybe we can get it merged at it is now, and leave the discussion when the practical issue happened and report?

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @unkcpz . I am approving this to be merged. I agree that any further complications are indeed due to mismatch in the process inputs and the CodeInfo definition. I think we should simply open an issue that aims to improve the documentation on CodeInfo which includes a discussion on the withmpi option.

@ramirezfranciscof
Copy link
Member

ramirezfranciscof commented Aug 10, 2021

Yes, I'm ok with merging, we have had substantial discussion as it is and I don't want to drag the PR much longer (I hope it wasn't too tedious for you =P).

However I do want to make a final comment that I'm not so sure this is just a matter of documentation: I feel like the current behavior may be limiting in the sense that it may be putting control on how to execute code on the wrong actor (i.e. if MPI is an option with which codes can or not be compiled, it is the user who knows how the code is compiled and I'm not sure the plugin designer can make the decision to force that at code-design time).

@sphuber sphuber merged commit af8a930 into aiidateam:develop Aug 10, 2021
@unkcpz unkcpz deleted the code-run-mode branch August 11, 2021 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants