Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new grompp/mdrun behaviour conflicts with .cpt file? #3

Closed
pgbarletta opened this issue Feb 2, 2023 · 15 comments
Closed

new grompp/mdrun behaviour conflicts with .cpt file? #3

pgbarletta opened this issue Feb 2, 2023 · 15 comments
Assignees
Labels

Comments

@pgbarletta
Copy link

Hi,
I've recently updated to 3.9 and it seems that now everything is run inside a temporary sandbox dir while previously only the .zip topology file was decompressed inside a temporary folder.

So now, if I want to restart a run from a checkpoint file I get this error:

Program:     gmx mdrun, version 2022
Source file: src/gromacs/mdrunutility/handlerestart.cpp (line 192)
Function:    void gmx::{anonymous}::throwBecauseOfMissingOutputFiles(const char*, gmx::ArrayRef<const gmx_file_position_t>, int, const t_filenm*, size_t)

Inconsistency in user input:
Some output files listed in the checkpoint file
/home/pbarletta/labo/22/locuaz/e458d803-2586-4812-adfa-810796b54b4b/npt_nb.cpt
are not present or not named as the output files by the current program:)
Expected output files that are present:

Expected output files that are not present or named differently:

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.log

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.xtc

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.edr

That is, since the temporary dir (faf79796-f8fe-4f8b-8647-2bd6f6d90226) always changes, I can't properly restart an MD.

Am I getting this right or is there something else going on? Because this would be quite an issue for most workflows, I assume.

@PauAndrio
Copy link
Contributor

Hi, @pgbarletta,

My apologies for the late reply, yesterday we meet with GROMACS developers and confirmed that at the moment there is not a way to modify CPT files.

Our solution is to chdir to the sandbox and use just file names and not absolute or relative paths.
I will implement this as a common feature/property in the biobb_common in the next week.

Regards,
Pau Andrio

@pgbarletta
Copy link
Author

Hi, thanks for the heads up.

So, just to be clear, say the scheduler kills the process, the MD is abruptly interrupted and the files are left in this sandbox dir, how would I go about restarting it?

Another issue I have with this sandbox dir is that the user may check how the MD is going but is not going to find the log file or the current trajectory file where it expects it to be, but on a cryptic directory, which ends up --if I'm correct--, on the directory from which my protocol was called. This means the library will create dirs and files where it chooses and I won't have any control over it.

I'm sure you have good reasons for this new behaviour, but I'm not sure these have been properly weighted against the cons. Is the decision to change over to this behaviour final?

@PauAndrio
Copy link
Contributor

Hi, @pgbarletta,

If you check the last two commits to biobb_common you'll find:

  • 6721265: Feature: chdir to sandbox property and use single file names instead of paths in cmd.
  • a59d4b1: Feature: disable_sandbox for local executions.

I hope the second one "disabling the sandbox" would fulfill your needs. Of course, at the moment these new features are not fully tested.
If you decide to use the development version of biobb_common and find any bug/issue please let us know.

Regards,
Pau Andrio

@pgbarletta
Copy link
Author

Thanks! I'll try it out as soon as I can.

@pgbarletta
Copy link
Author

I think this variable should be named chdir_sandbox.

Also, Path.samefile() assumes the input Path object exists, so this line errors out always.

I rewrote copy_to_host() to keep this from happening:

    def copy_to_host(self):
        for file_ref, file_path in self.stage_io_dict["out"].items():
            if file_path:
                sandbox_file_path = str(Path(self.stage_io_dict["unique_dir"]).joinpath(Path(file_path).name))
                if Path(sandbox_file_path).exists():
                    if Path(self.io_dict["out"][file_ref]).exists():
                        if not Path(sandbox_file_path).samefile(Path(self.io_dict["out"][file_ref])):
                            shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])
                    else:
                        shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])

but I didn't make a PR since it's a bit scrappy.
For now, it seems to work.

@pgbarletta
Copy link
Author

Sorry, I don't get it, it's still creating a sandbox dir and running the MD in there instead of running everything in the folder I point to. So now, I don't get an error from the cpi, but a new temporary folder each time I restart my protocol and a new MD is started instead of restarting from a previous step.

@PauAndrio
Copy link
Contributor

I think this variable should be named chdir_sandbox.

Also, Path.samefile() assumes the input Path object exists, so this line errors out always.

I rewrote copy_to_host() to keep this from happening:

    def copy_to_host(self):
        for file_ref, file_path in self.stage_io_dict["out"].items():
            if file_path:
                sandbox_file_path = str(Path(self.stage_io_dict["unique_dir"]).joinpath(Path(file_path).name))
                if Path(sandbox_file_path).exists():
                    if Path(self.io_dict["out"][file_ref]).exists():
                        if not Path(sandbox_file_path).samefile(Path(self.io_dict["out"][file_ref])):
                            shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])
                    else:
                        shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])

but I didn't make a PR since it's a bit scrappy. For now, it seems to work.

Thank you @pgbarletta all your suggestions are correct and already committed. From my point of view, your rewriting of the copy_to_host function is as good as it can be, just added a few comments to increase code readability.

Pau

@PauAndrio PauAndrio reopened this Feb 21, 2023
@PauAndrio
Copy link
Contributor

Sorry, I don't get it, it's still creating a sandbox dir and running the MD in there instead of running everything in the folder I point to. So now, I don't get an error from the cpi, but a new temporary folder each time I restart my protocol and a new MD is started instead of restarting from a previous step.

It shouln't create the sandbox folder if the disable_sandbox property is set to True. Please let me check, I'm writing a bunch of new tests.

Pau

@pgbarletta
Copy link
Author

oh, I didn't set that option. I'll try again as soon as I can.

@pgbarletta
Copy link
Author

So, disable_sandbox currently ends up deleting everything in the local folder.

Over here it gets the current working dir (by the way, this is the dir from where the script is launched, and not the directory where the input files are located, which was probably what you intended).
The problem is that the behaviour of deleting the temporary files it's still present here so self.stage_io_dict["unique_dir"] gets deleted.

As a result, the current version of mdrun wipes away the whole dir from where it was launched. Not the best user experience.

@github-actions
Copy link
Contributor

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

@github-actions
Copy link
Contributor

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

@github-actions github-actions bot added the stale label Jun 17, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jul 1, 2023

This issue was closed because it has been inactive for 14 days since being marked as stale.

@PauAndrio
Copy link
Contributor

This issue will need major changes in biobb_common opening an issue there pointing here.
biobb_common issue 14

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants