Nextflow pipeline removes leftover folders/files #23

Merged
merged 14 commits into from
Aug 9, 2021

Conversation

Cecilia-Sensalari
Collaborator

  • If the pipeline is stopped while Ks estimation
    processes are still running, temporary folders
    and incomplete files may be left behind.
    They must be deleted, and the pipeline now
    does so automatically as a closing step.
  • Both paralog and ortholog folders are cleaned
  • If a BLAST tmp folder is found, also delete the
    associated incomplete BLAST TSV file
  • If a Ks tmp folder is found, delete it
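
The cleanup rules listed above can be sketched in Python as follows. This is only an illustration of the logic, not the Groovy code in main.nf, and the file naming scheme (`<species>.blast_tmp`, `<species>.blast.tsv`, `<species>.ks_tmp`) is an assumption made for the example.

```python
import shutil
from pathlib import Path
from typing import List


def clean_leftovers(wgd_dir: Path, species: str) -> List[str]:
    """Delete temporary leftovers for one species inside one wgd folder.

    Returns the names of the entries that were removed, in removal order.
    """
    removed = []

    # Hypothetical naming scheme, assumed for this sketch only.
    blast_tmp = wgd_dir / f"{species}.blast_tmp"
    blast_tsv = wgd_dir / f"{species}.blast.tsv"
    ks_tmp = wgd_dir / f"{species}.ks_tmp"

    if blast_tmp.is_dir():
        shutil.rmtree(blast_tmp)
        removed.append(blast_tmp.name)
        # A surviving BLAST tmp folder means the TSV next to it is incomplete.
        if blast_tsv.is_file():
            blast_tsv.unlink()
            removed.append(blast_tsv.name)

    if ks_tmp.is_dir():
        shutil.rmtree(ks_tmp)
        removed.append(ks_tmp.name)

    return removed
```

Note that the BLAST TSV is only deleted when its tmp folder is still present, so a TSV from a completed run is left alone.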

@lohausr
Contributor

lohausr commented Jul 15, 2021

Shall we also delete any core dumps? And delete any empty ortholog_distributions/wgd_species_species/ and paralog_distributions/wgd_species/ folders? Could there be any temporary i-adhore files/folders that need to be cleaned up?

If the pipeline stops with an error, may it help in any way to keep any of these files to figure out what went wrong? If so, maybe introduce an optional parameter that prevents any removal of temporary files.

[Review thread on main.nf: outdated, resolved]
@Cecilia-Sensalari
Collaborator Author

> Shall we also delete any core dumps?
It would be handy. They are always generated in the launching directory and they are always called "core.xxx...", so it would be easy to locate them. But what if some other file is called "core..." and we delete it too by mistake?

> And delete any empty ortholog_distributions/wgd_species_species/ and paralog_distributions/wgd_species/ folders?
It's not really necessary, but handy as well.

> Could there be any temporary i-adhore files/folders that need to be cleaned up?
Yes, there is a species.ks_anchors_tmp! I'll include it in the code.

> If the pipeline stops with an error, may it help in any way to keep any of these files to figure out what went wrong? If so, maybe introduce an optional parameter that prevents any removal of temporary files.
It might be, we can be flexible... Shall we delete by default?

@lohausr
Contributor

lohausr commented Jul 15, 2021

> > Shall we also delete any core dumps?
> It would be handy. They are always generated in the launching directory and they are always called "core.xxx...", so it would be easy to locate them. But what if some other file is called "core..." and we delete it too by mistake?

If we make the detection pattern specific enough, it's unlikely that some other similarly named file gets deleted.

> > If the pipeline stops with an error, may it help in any way to keep any of these files to figure out what went wrong? If so, maybe introduce an optional parameter that prevents any removal of temporary files.
> It might be, we can be flexible... Shall we delete by default?

Yes, delete by default, but keep when parameter is set. But use a Nextflow parameter for this, and not one in the ksrates config file, since this is specific to Nextflow (at least for now).

cesen added 3 commits July 15, 2021 11:28
- Nextflow parameter added to the main.nf pipeline
  that switches on/off the automatic deletion of
  wgd and i-ADHoRe leftover folders such as
  ks_tmp.
- Default true: leftovers are deleted automatically
  when the pipeline crashes.
- Can be turned off either from the NF
  configuration file or from the command line.
- Note: configuration files must still be updated
@Cecilia-Sensalari
Collaborator Author

UPDATE:

  • temporary anchor Ks directory is deleted, if any
  • added delete_leftover_folders parameter in Nextflow configuration file and main.nf to switch on/off the automatic deletion of leftover folders (default on)

Note: if we are going to remove the nThreadsParalogs and nThreadsOrthologs Nextflow parameters in the configuration file through one of the other PRs, then we will have to deal with minor conflicts because it involves the same lines of the new parameter:
image

@lohausr
Contributor

lohausr commented Jul 15, 2021

>   • added delete_leftover_folders parameter in Nextflow configuration file and main.nf to switch on/off the automatic deletion of leftover folders (default on)
>
> Note: if we are going to remove the nThreadsParalogs and nThreadsOrthologs Nextflow parameters in the configuration file through one of the other PRs, then we will have to deal with minor conflicts because it involves the same lines of the new parameter:
> image

I don't think we need to put this parameter in the config files; we can document it and use it on the Nextflow command line.

We should use the same notation for all parameter names. So far we have used camelCase (nThreadsParalogs etc.), so this one shouldn't use underscores and should be deleteLeftoverFolders instead.
However, I wouldn't actually name this parameter like that. It's a bit long, it's not only about folders, and most importantly, I'm not a fan of setting some parameter to false to turn something on. So I would name this parameter something like noCleanup or preserve instead, and by default it is set to false internally. The files and folders can then be kept by setting the parameter like:
nextflow run VIB-PSB/ksrates --noCleanup
Simpler and clearer.

@Cecilia-Sensalari
Collaborator Author

Cecilia-Sensalari commented Jul 15, 2021

Thanks for pointing that out!

  • I called it "preserve" and set it false by default. By adding "--preserve" to the command line it is then set to true and the leftovers are kept.
  • Added a short description of the new preserve parameter to the docs ("configuration.rst").
  • Removed the parameter from Nextflow configuration files
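
The convention agreed on here (a flag that is false unless given, where giving it keeps the leftovers) can be illustrated with a rough Python analogue. This is not how Nextflow handles `--preserve`; the program name `cleanup-demo` and the helper names are hypothetical.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="cleanup-demo")  # hypothetical name
    # Absent flag -> False -> leftovers are deleted (the default behaviour).
    parser.add_argument(
        "--preserve",
        action="store_true",
        help="keep temporary files and folders, e.g. for debugging",
    )
    return parser.parse_args(argv)


def should_clean(argv: List[str]) -> bool:
    """Cleanup runs unless the user explicitly asked to preserve leftovers."""
    return not parse_args(argv).preserve
```

With this shape nobody has to set something to false in order to switch a behaviour on: the default path cleans up, and the single opt-in flag disables it.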

@lohausr
Contributor

lohausr commented Jul 15, 2021

Thanks!

Don't forget to look at my comment on the code that defines species_name in the cleanup section.

@Cecilia-Sensalari
Collaborator Author

Cecilia-Sensalari commented Jul 15, 2021

Update:

  • "core" files removed, if any (tested on our cluster)
  • onComplete parses the ksrates configuration file by specifically looking for "focal_species", instead of blindly taking the second line.
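
Looking up the key instead of relying on line position could look like the following Python sketch. It assumes the ksrates configuration file uses `key = value` lines and is only an illustration; the actual lookup happens in the Groovy onComplete handler of main.nf, and `find_focal_species` is a hypothetical helper name.

```python
from typing import Iterable, Optional


def find_focal_species(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the first 'focal_species' entry, or None if absent."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and section headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "focal_species":
            return value.strip()
    return None
```

Because the match is on the key name, reordering the configuration file or adding lines above the entry does not change the result.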

[Review thread on main.nf: outdated, resolved]
cesen added 4 commits July 15, 2021 17:11
- Remove wgd sub-directories (i.e.
  wgd_species and wgd_species_species) if
  they end up being empty after removing all
  leftover temporary directories.
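
A minimal Python sketch of the "remove only if empty" step described in this commit, assuming it runs after the temporary directories have already been deleted. The helper name `remove_if_empty` is hypothetical and this is not the main.nf code.

```python
from pathlib import Path


def remove_if_empty(directory: Path) -> bool:
    """Remove directory only when it exists and has no entries.

    Returns True if the directory was removed, False otherwise.
    """
    if directory.is_dir() and not any(directory.iterdir()):
        directory.rmdir()
        return True
    return False
```

A wgd_species or wgd_species_species folder that still holds results is left untouched, since `any(directory.iterdir())` is true for it.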
@lohausr
Contributor

lohausr commented Jul 16, 2021

Thanks, looks all good!

@Cecilia-Sensalari
Collaborator Author

UPDATE:

  • "core." files are deleted only if their suffix starts with a digit. Note: I couldn't find a way to express "a suffix consisting of one or more digits", only "a suffix whose first character is a digit". This should be enough, though :)
  • Empty wgd_species and wgd_species_species directories are deleted as well
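
For comparison, the stricter rule ("a suffix consisting of one or more digits") is straightforward with a regular expression. The Python sketch below only illustrates that rule; it is not what the merged main.nf does, which checks just the first character of the suffix. `find_core_dumps` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import List

# "core." followed by one or more digits and nothing else, e.g. "core.12345".
CORE_DUMP = re.compile(r"core\.\d+")


def find_core_dumps(launch_dir: Path) -> List[Path]:
    """Return regular files in launch_dir whose full name matches CORE_DUMP."""
    return sorted(
        p for p in launch_dir.iterdir()
        if p.is_file() and CORE_DUMP.fullmatch(p.name)
    )
```

Using `fullmatch` means names such as "core.1a" or "core.txt" are not selected, and directories are skipped by the `is_file()` check.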

Cecilia-Sensalari merged commit fbcc46a into master Aug 9, 2021
Cecilia-Sensalari deleted the nextflow_delete_leftovers branch August 23, 2021 19:08