verdi calcjob cleanworkdir: Add option for work chain #4693

Open
mbercx opened this issue Jan 28, 2021 · 13 comments
Labels
good first issue (Issues that should be relatively easy to fix also for beginning contributors), priority/nice-to-have, topic/verdi, type/feature request (status undecided)

Comments

@mbercx
Member

mbercx commented Jan 28, 2021

Is your feature request related to a problem? Please describe

A user on the mailing list wanted to clean the remote directories of all calculation jobs executed within a work chain with verdi calcjob cleanworkdir. Currently, this is not possible and I don't think there is a very straightforward way of doing this with the Python API.

Describe the solution you'd like

One solution would be to simply add an option, e.g. --workchains / -w, where the user can provide a list of work chain pks/uuids for which they want the remote directories cleaned:

verdi calcjob cleanworkdir -w <WORKCHAIN_ID>
mbercx added the good first issue, topic/verdi, type/feature request and priority/nice-to-have labels on Jan 28, 2021
@chrisjsewell
Member

chrisjsewell commented Jan 28, 2021

Sounds good to me 👍 I would also go one step further and say it would be good to be able to specify a group (and search for all the calcjobs, workchains and remote folders it contains).

@praanjalmishra
Contributor

Hi @mbercx @chrisjsewell, I'm interested in working on this issue. Would you guys tell me the approach to the problem?

@chrisjsewell
Member

Hey @pranjalmish1, just to note for others you have expressed interest in the Google Summer of Code (GSOC) 2021.
Firstly I would point towards the general PR guidelines I have mentioned in #4702 (reply in thread)

Specifically, you need to adapt the verdi calcjob cleanworkdir command, i.e. the function

def calcjob_cleanworkdir(calcjobs, past_days, older_than, computers, force):

so that it collects all calcjobs that a workchain calls (see for example: https://aiida.readthedocs.io/projects/aiida-core/en/latest/topics/workflows/concepts.html#id5) and adds them to the list of work directories to clean.

@lkbhitesh07

Hello @chrisjsewell, as far as I understand, we have to call the calcjob_cleanworkdir function to clear all the calcjobs that are present in a workchain, but I'm not sure where to look for the calcjobs that belong to a particular workchain. Since the workchain keeps track of the provenance, how are we supposed to get the IDs of its calcjobs?
I'm a bit new to this environment, which is why I'm getting a bit confused.
Thanks in advance.

@unkcpz
Member

unkcpz commented Nov 11, 2021

For reference, here is the code I use in my workflow to collect the calcjobs of a workchain and clean the remote folder:

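# Fragment from inside a WorkChain method; `orm` is assumed to come from `from aiida import orm`.
# Do nothing if the user did not ask for the remote folders to be cleaned.
if self.inputs.clean_workdir.value is False:
    self.report('remote folders will not be cleaned')
    return

cleaned_calcs = []

# Walk over everything this work chain called, directly or indirectly.
for called_descendant in self.node.called_descendants:
    if isinstance(called_descendant, orm.CalcJobNode):
        try:
            called_descendant.outputs.remote_folder._clean()  # pylint: disable=protected-access
            cleaned_calcs.append(called_descendant.pk)
        except (IOError, OSError, KeyError):
            # Skip calculations whose remote folder cannot be cleaned.
            pass

if cleaned_calcs:
    self.report(
        f"cleaned remote folders of calculations: {' '.join(map(str, cleaned_calcs))}"
    )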
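
The same idea can be sketched as a standalone helper for use outside a work chain, for example from a verdi shell. This is only a sketch under stated assumptions: the name clean_called_calcjob_workdirs and its calcjob_cls parameter are hypothetical, and it relies on nothing beyond what the snippet above already uses (called_descendants, outputs.remote_folder, the protected _clean() method and pk). AttributeError is caught as well, on the assumption that a calcjob without a remote_folder output may surface that way.

```python
def clean_called_calcjob_workdirs(workchain_node, calcjob_cls):
    """Clean the remote folders of all calcjobs called by ``workchain_node``.

    Hypothetical helper, not part of AiiDA. ``calcjob_cls`` is the class used
    to recognise calculation jobs (``aiida.orm.CalcJobNode`` in real use).
    Returns the pks of the calcjobs whose remote folder was cleaned.
    """
    cleaned = []
    for node in workchain_node.called_descendants:
        if not isinstance(node, calcjob_cls):
            continue  # skip sub-workchains and other non-calcjob nodes
        try:
            node.outputs.remote_folder._clean()  # protected method, as above
        except (OSError, KeyError, AttributeError):
            continue  # folder missing or unreachable: leave this one alone
        cleaned.append(node.pk)
    return cleaned
```

With a loaded profile and `from aiida import orm`, this would be called as clean_called_calcjob_workdirs(orm.load_node(<WORKCHAIN_ID>), orm.CalcJobNode).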

@ltalirz
Member

ltalirz commented Dec 21, 2022

Has there been any progress on this?

I see this being a frequently requested feature

@khsrali
Contributor

khsrali commented Feb 14, 2025

With solution #6756 for issue #6732, this is now possible.

verdi node delete <workflow_PK> --clean-workdir

This command first cleans the work directories and then deletes the node itself. To get the behaviour requested in this issue, answer y/n to the two prompts.
If you don't mind deleting the node as well, just answer y/y.

@mbercx
Member Author

mbercx commented Feb 19, 2025

Thanks @khsrali! So, if I understand correctly, in order to get the behaviour this issue asks for, the user would have to run

verdi node delete <workflow_PK> --clean-workdir

and then reply n to the prompt? Although this technically does resolve this issue, I'm not sure it's the most intuitive. The use case I had in mind is where the user doesn't want to delete the node, so they wouldn't go look for the command in verdi node delete.

@khsrali
Contributor

khsrali commented Feb 19, 2025

@mbercx thanks for your feedback. I agree it's not very intuitive.
The solution was originally designed to answer #6732

We could also put the main functionality under verdi calcjob cleanworkdir

The reason I didn't do it is that a workchain is not a calcjob node but a generic node.
So honestly, I don't know what the best solution is in this case.

update:
What about verdi node cleanworkdir that could do either workchain or calcjob?

@mbercx
Member Author

mbercx commented Feb 19, 2025

I agree, it's a bit weird to have verdi calcjob cleanworkdir and apply it to a work chain. Why not verdi process cleanworkdir, that could work for both a CalcJob and a WorkChain?

@unkcpz
Member

unkcpz commented Feb 19, 2025

You have me on your back @mbercx (check #6732 (comment)) :3 Shall we move the discussion there?

@khsrali
Contributor

khsrali commented Feb 20, 2025

@mbercx we thought about that idea as well.
The problem is that a process does not have a workdir. And even if we take the meaning loosely, that brings about more issues, such as the process first having to be stopped or killed, etc.

Since workdirs are properties of nodes, not processes, if we want anything general I'd go with verdi node cleanworkdir

@khsrali
Contributor

khsrali commented Mar 5, 2025

OK, to conclude here, and to document this for anybody who reads this issue looking for a solution:

There are two potential ways to achieve what this issue is asking for:

  1. The recommended way, which is not developed yet, is explained here:
    CLI: verdi node list --descendant list all descendant nodes of a node, respecting traversal rules #6782

  2. The "hacky" way:
    (echo yes; echo no) | verdi node delete --clean-workdir <workflow's pk>
    This confirms the first prompt, so the work directories are cleaned, and aborts at the second prompt, so the nodes are not deleted.

Disclaimer for the potential reply of @unkcpz 😅 : verdi node delete --clean-workdir was not designed for this purpose, but it can get the job done if someone is stuck with this issue.
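
For scripting, the pipe in the "hacky" way can be reproduced from Python with subprocess. A minimal sketch: the helper names are hypothetical, and it assumes the command asks exactly the two prompts described above, in that order, and accepts yes/no as answers.

```python
import subprocess


def build_clean_only_call(workflow_pk):
    """Return (argv, stdin_text) mirroring the shell pipe above."""
    argv = ["verdi", "node", "delete", "--clean-workdir", str(workflow_pk)]
    # Same input as `(echo yes; echo no)`: clean the workdirs, keep the nodes.
    stdin_text = "yes\nno\n"
    return argv, stdin_text


def clean_workdirs_only(workflow_pk):
    """Run the command and return the CompletedProcess for inspection."""
    argv, stdin_text = build_clean_only_call(workflow_pk)
    return subprocess.run(
        argv, input=stdin_text, text=True, capture_output=True, check=False
    )
```

The returned object exposes returncode, stdout and stderr, so a script can check what the prompts actually printed before trusting the result.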

7 participants