new grompp/mdrun behaviour conflicts with .cpt file? #3

pgbarletta · 2023-02-02T16:45:04Z

Hi,
I've recently updated to 3.9 and it seems that now everything is run inside a temporary sandbox dir while previously only the .zip topology file was decompressed inside a temporary folder.

So now, if I want to restart a run from a checkpoint file I get this error:

Program:     gmx mdrun, version 2022
Source file: src/gromacs/mdrunutility/handlerestart.cpp (line 192)
Function:    void gmx::{anonymous}::throwBecauseOfMissingOutputFiles(const char*, gmx::ArrayRef<const gmx_file_position_t>, int, const t_filenm*, size_t)

Inconsistency in user input:
Some output files listed in the checkpoint file
/home/pbarletta/labo/22/locuaz/e458d803-2586-4812-adfa-810796b54b4b/npt_nb.cpt
are not present or not named as the output files by the current program:)
Expected output files that are present:

Expected output files that are not present or named differently:

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.log

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.xtc

/home/pbarletta/labo/22/locuaz/faf79796-f8fe-4f8b-8647-2bd6f6d90226/npt_nb.edr

That is, since the temporary dir (faf79796-f8fe-4f8b-8647-2bd6f6d90226) always changes, I can't properly restart an MD.

Am I getting this right or is there something else going on? Because this would be quite an issue for most workflows, I assume.

The text was updated successfully, but these errors were encountered:

PauAndrio · 2023-02-16T08:49:52Z

Hi, @pgbarletta,

My apologies for the late reply, yesterday we meet with GROMACS developers and confirmed that at the moment there is not a way to modify CPT files.

Our solution is to chdir to the sandbox and use just file names and not absolute or relative paths.
I will implement this as a common feature/property in the biobb_common in the next week.

Regards,
Pau Andrio

pgbarletta · 2023-02-16T09:28:53Z

Hi, thanks for the heads up.

So, just to be clear, say the scheduler kills the process, the MD is abruptly interrupted and the files are left in this sandbox dir, how would I go about restarting it?

Another issue I have with this sandbox dir is that the user may check how the MD is going but is not going to find the log file or the current trajectory file where it expects it to be, but on a cryptic directory, which ends up --if I'm correct--, on the directory from which my protocol was called. This means the library will create dirs and files where it chooses and I won't have any control over it.

I'm sure you have good reasons for this new behaviour, but I'm not sure these have been properly weighted against the cons. Is the decision to change over to this behaviour final?

PauAndrio · 2023-02-16T11:03:46Z

Hi, @pgbarletta,

If you check the last two commits to biobb_common you'll find:

6721265: Feature: chdir to sandbox property and use single file names instead of paths in cmd.
a59d4b1: Feature: disable_sandbox for local executions.

I hope the second one "disabling the sandbox" would fulfill your needs. Of course, at the moment these new features are not fully tested.
If you decide to use the development version of biobb_common and find any bug/issue please let us know.

Regards,
Pau Andrio

pgbarletta · 2023-02-16T11:50:55Z

Thanks! I'll try it out as soon as I can.

pgbarletta · 2023-02-18T11:41:47Z

I think this variable should be named chdir_sandbox.

Also, Path.samefile() assumes the input Path object exists, so this line errors out always.

I rewrote copy_to_host() to keep this from happening:

    def copy_to_host(self):
        for file_ref, file_path in self.stage_io_dict["out"].items():
            if file_path:
                sandbox_file_path = str(Path(self.stage_io_dict["unique_dir"]).joinpath(Path(file_path).name))
                if Path(sandbox_file_path).exists():
                    if Path(self.io_dict["out"][file_ref]).exists():
                        if not Path(sandbox_file_path).samefile(Path(self.io_dict["out"][file_ref])):
                            shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])
                    else:
                        shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])

but I didn't make a PR since it's a bit scrappy.
For now, it seems to work.

pgbarletta · 2023-02-18T12:02:24Z

Sorry, I don't get it, it's still creating a sandbox dir and running the MD in there instead of running everything in the folder I point to. So now, I don't get an error from the cpi, but a new temporary folder each time I restart my protocol and a new MD is started instead of restarting from a previous step.

PauAndrio · 2023-02-21T10:44:09Z

I think this variable should be named chdir_sandbox.

Also, Path.samefile() assumes the input Path object exists, so this line errors out always.

I rewrote copy_to_host() to keep this from happening:

    def copy_to_host(self):
        for file_ref, file_path in self.stage_io_dict["out"].items():
            if file_path:
                sandbox_file_path = str(Path(self.stage_io_dict["unique_dir"]).joinpath(Path(file_path).name))
                if Path(sandbox_file_path).exists():
                    if Path(self.io_dict["out"][file_ref]).exists():
                        if not Path(sandbox_file_path).samefile(Path(self.io_dict["out"][file_ref])):
                            shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])
                    else:
                        shutil.copy2(sandbox_file_path, self.io_dict["out"][file_ref])

but I didn't make a PR since it's a bit scrappy. For now, it seems to work.

Thank you @pgbarletta all your suggestions are correct and already committed. From my point of view, your rewriting of the copy_to_host function is as good as it can be, just added a few comments to increase code readability.

Pau

PauAndrio · 2023-02-21T10:47:38Z

Sorry, I don't get it, it's still creating a sandbox dir and running the MD in there instead of running everything in the folder I point to. So now, I don't get an error from the cpi, but a new temporary folder each time I restart my protocol and a new MD is started instead of restarting from a previous step.

It shouln't create the sandbox folder if the disable_sandbox property is set to True. Please let me check, I'm writing a bunch of new tests.

Pau

pgbarletta · 2023-02-21T10:50:05Z

oh, I didn't set that option. I'll try again as soon as I can.

pgbarletta · 2023-02-21T14:03:43Z

So, disable_sandbox currently ends up deleting everything in the local folder.

Over here it gets the current working dir (by the way, this is the dir from where the script is launched, and not the directory where the input files are located, which was probably what you intended).
The problem is that the behaviour of deleting the temporary files it's still present here so self.stage_io_dict["unique_dir"] gets deleted.

As a result, the current version of mdrun wipes away the whole dir from where it was launched. Not the best user experience.

github-actions · 2023-05-16T14:52:51Z

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

github-actions · 2023-06-17T02:27:41Z

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

github-actions · 2023-07-01T02:56:47Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

PauAndrio · 2023-09-08T09:14:31Z

This issue will need major changes in biobb_common opening an issue there pointing here.
biobb_common issue 14

github-actions · 2023-10-09T02:05:52Z

This issue is stale because it has been open for 30 days with no activity. If there is no more activity in the following 14 days it will be automatically closed.

pgbarletta closed this as completed Feb 16, 2023

PauAndrio reopened this Feb 21, 2023

github-actions bot added the stale label May 16, 2023

adamhospital assigned PauAndrio May 17, 2023

adamhospital removed the stale label May 17, 2023

github-actions bot added the stale label Jun 17, 2023

github-actions bot closed this as completed Jul 1, 2023

PauAndrio reopened this Sep 8, 2023

PauAndrio removed the stale label Sep 8, 2023

PauAndrio mentioned this issue Sep 8, 2023

Disable sandbox feature bioexcel/biobb_common#14

Open

github-actions bot added the stale label Oct 9, 2023

github-actions bot closed this as completed Oct 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new grompp/mdrun behaviour conflicts with .cpt file? #3

new grompp/mdrun behaviour conflicts with .cpt file? #3

pgbarletta commented Feb 2, 2023

PauAndrio commented Feb 16, 2023

pgbarletta commented Feb 16, 2023

PauAndrio commented Feb 16, 2023

pgbarletta commented Feb 16, 2023

pgbarletta commented Feb 18, 2023

pgbarletta commented Feb 18, 2023

PauAndrio commented Feb 21, 2023

PauAndrio commented Feb 21, 2023

pgbarletta commented Feb 21, 2023

pgbarletta commented Feb 21, 2023

github-actions bot commented May 16, 2023

github-actions bot commented Jun 17, 2023

github-actions bot commented Jul 1, 2023

PauAndrio commented Sep 8, 2023

github-actions bot commented Oct 9, 2023

new grompp/mdrun behaviour conflicts with .cpt file? #3

new grompp/mdrun behaviour conflicts with .cpt file? #3

Comments

pgbarletta commented Feb 2, 2023

PauAndrio commented Feb 16, 2023

pgbarletta commented Feb 16, 2023

PauAndrio commented Feb 16, 2023

pgbarletta commented Feb 16, 2023

pgbarletta commented Feb 18, 2023

pgbarletta commented Feb 18, 2023

PauAndrio commented Feb 21, 2023

PauAndrio commented Feb 21, 2023

pgbarletta commented Feb 21, 2023

pgbarletta commented Feb 21, 2023

github-actions bot commented May 16, 2023

github-actions bot commented Jun 17, 2023

github-actions bot commented Jul 1, 2023

PauAndrio commented Sep 8, 2023

github-actions bot commented Oct 9, 2023