
failures with integrity_coverage, possibly singularity related #102

Closed

karinlag opened this issue Jul 3, 2018 · 8 comments

karinlag commented Jul 3, 2018

Hi!

First, I realize this might be a singularity issue rather than a flowcraft issue as such, but if so, I would be grateful for pointers on where to dig further.

I am trying to use flowcraft on an HPC cluster running CentOS 6.9, which uses Slurm as the queueing system. The cluster also provides software through the module system.

I have built a basic pipeline as described in the tutorial, i.e.

flowcraft build -t "trimmomatic fastqc spades" -o my_pipe.nf

That worked nicely.

I am now trying to run it, and I have tried two different ways: singularity installed via conda, and singularity loaded as a module via the module system. Both fail, and in both cases the output on the command line looks the same:

[karinlag@abel fastq_files]$ nextflow run my_pipe.nf --profile slurm
N E X T F L O W  ~  version 0.26.4
Launching `my_pipe.nf` [nauseous_jennings] - revision: 0857cd8b6b

============================================================
                F L O W C R A F T
============================================================
Built using flowcraft v1.2.0

 Input FastQ                 : 4
 Input samples               : 2
 Reports are found in        : ./reports
 Results are found in        : ./results
 Profile                     : standard

Starting pipeline at Tue Jul 03 10:53:42 CEST 2018

[warm up] executor > local
[80/d75a28] Submitted process > integrity_coverage_1_1 (A2)
[78/c1b312] Submitted process > integrity_coverage_1_1 (A1)
[80/d75a28] NOTE: Process `integrity_coverage_1_1 (A2)` terminated with an error exit status (255) -- Execution is retried (1)
[78/c1b312] NOTE: Process `integrity_coverage_1_1 (A1)` terminated with an error exit status (255) -- Execution is retried (1)
[1e/cf471a] Re-submitted process > integrity_coverage_1_1 (A2)
[37/26296e] Re-submitted process > integrity_coverage_1_1 (A1)
...etc

However, the errors I get are different. Below is the .command.err output when I run with the two different ways of loading singularity.

  1. as a slurm module:
[karinlag@abel cf471aa996a4fcef1c61188e9d1ad7]$ cat .command.err
ERROR  : Image path /work/projects/nn9305k/tmp/testdata/fastq_files/flowcraft/flowcraft_base:1.0.0-1 doesn't exist
ABORT  : Retval = 255
[karinlag@abel cf471aa996a4fcef1c61188e9d1ad7]$ 
  2. available via conda:
[karinlag@abel 340be836b130153683beb7294d85d8]$ cat .command.err 
ERROR  : Failed invoking the NEWUSER namespace runtime: Invalid argument
ABORT  : Retval = 255
[karinlag@abel 340be836b130153683beb7294d85d8]$ 

Flowcraft looks really cool and useful, and I would be most grateful for any pointers you can give me on how to debug this further.

@ODiogoSilva
Copy link
Collaborator

Hi Karin,

Thank you for your feedback. We have had some issues when pulling singularity images (regardless of the pipeline used), but it seems to be an issue with nextflow/singularity versions. I see that you are using nextflow 0.26.4, which is now pretty "old" and I think it does not support the automatic conversion of the docker image paths defined in the containers to the singularity ones (I think this was introduced in version 0.27). This is also why flowcraft has the nextflow >=0.27 dependency on bioconda. So, the first thing to do would be to update nextflow to the latest version, and I also suggest removing the $HOME/.singularity/ directory before trying to pull any new images (this will not remove any stored images, just some singularity cache files).

Can you also provide me the singularity version(s) that are being used? There were some reported issues with versions pre 2.4.1: nextflow-io/nextflow#724 and nextflow-io/nextflow#498. In my case, updating to the latest 2.5.1 fixed the issues I was having.
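For anyone hitting the same thing, the cache cleanup step above might look like the sketch below. The default ~/.singularity location is an assumption from singularity's docs; as noted later in this thread it may not exist on every setup, and pulled .img files live elsewhere (e.g. nextflow's singularity cache) and are not touched:

```shell
#!/bin/sh
# Sketch: clear singularity's cache/metadata directory before re-pulling
# images. CACHE_DIR defaults to ~/.singularity (an assumption); override
# it if your site configures a different location.
CACHE_DIR="${CACHE_DIR:-$HOME/.singularity}"
if [ -d "$CACHE_DIR" ]; then
    rm -rf "$CACHE_DIR"
    echo "removed $CACHE_DIR"
else
    echo "no cache directory at $CACHE_DIR"
fi
```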

@karinlag
Copy link
Author

karinlag commented Jul 3, 2018

Thank you for your prompt response.

I have now updated nextflow to 0.30, and I am at least getting new error messages 🙂

I did not delete $HOME/.singularity/, because I could not find one in my home directory, nor in the directory where I'm running things.

My current results are:

  1. with conda singularity:
[karinlag@abel fastq_files]$ singularity --version
2.4.2-dist
[karinlag@abel fastq_files]$ 

With this one I succeed in downloading an image, I think:

[karinlag@abel fastq_files]$ nextflow run my_pipe.nf 
N E X T F L O W  ~  version 0.30.2
^C[karinlag@abel fastq_files]$ rm -rf work/
[karinlag@abel fastq_files]$ rm -rf local_work/
[karinlag@abel fastq_files]$ nextflow run my_pipe.nf 
N E X T F L O W  ~  version 0.30.2
Launching `my_pipe.nf` [dreamy_tesla] - revision: 0857cd8b6b

============================================================
                F L O W C R A F T
============================================================
Built using flowcraft v1.2.0

 Input FastQ                 : 4
 Input samples               : 2
 Reports are found in        : ./reports
 Results are found in        : ./results
 Profile                     : standard

Starting pipeline at Tue Jul 03 16:13:38 CEST 2018

[warm up] executor > local
Pulling Singularity image docker://flowcraft/flowcraft_base:1.0.0-1 [cache /usit/abel/u1/karinlag/.singularity_cache/flowcraft-flowcraft_base-1.0.0-1.img]
[3a/0e4e1e] Submitted process > integrity_coverage_1_1 (A2)
[5e/c40322] Submitted process > integrity_coverage_1_1 (A1)
[5e/c40322] NOTE: Process `integrity_coverage_1_1 (A1)` terminated with an error exit status (255) -- Execution is retried (1)
[4c/7e5cfb] Re-submitted process > integrity_coverage_1_1 (A1)

The error from this is the same NEWUSER that I had earlier.

  2. With singularity loaded as a module:
[karinlag@abel fastq_files]$ singularity --version
2.5.0-dist
[karinlag@abel fastq_files]$
[karinlag@abel fastq_files]$ nextflow run my_pipe.nf
N E X T F L O W  ~  version 0.30.2
Launching `my_pipe.nf` [peaceful_montalcini] - revision: 0857cd8b6b

============================================================
                F L O W C R A F T
============================================================
Built using flowcraft v1.2.0

 Input FastQ                 : 4
 Input samples               : 2
 Reports are found in        : ./reports
 Results are found in        : ./results
 Profile                     : standard

Starting pipeline at Tue Jul 03 16:16:25 CEST 2018

[warm up] executor > local
[19/90a7d2] Submitted process > integrity_coverage_1_1 (A2)
[4d/729419] Submitted process > integrity_coverage_1_1 (A1)
[19/90a7d2] NOTE: Process `integrity_coverage_1_1 (A2)` terminated with an error exit status (255) -- Execution is retried (1)
[4d/729419] NOTE: Process `integrity_coverage_1_1 (A1)` terminated with an error exit status (255) -- Execution is retried (1)

My error here now is:

[karinlag@abel work]$ cat */*/.command.err
ERROR  : No more available loop devices, try increasing 'max loop devices' in singularity.conf
ABORT  : Retval = 255

Let me know if there is information that I have not thought to include, and thank you for your help!

@ODiogoSilva
Copy link
Collaborator

Ok, the error generated by bioconda's singularity is most likely the result of installing singularity without root permissions (see singularity issue apptainer/singularity#415 (comment)). We have been getting this error in our builds and we have already updated the dev docs (but not yet the master README) advising to install the latest singularity version with root permissions. If the cached image was downloaded via the bioconda installation, it is best to remove it and re-do the pull/conversion process with a singularity version installed with root permissions.

Concerning the 'max loop devices' error, it seems to be related to the installation and configuration of singularity on the cluster. Were you able to execute any other nextflow pipeline using singularity on that cluster? And can you enter the singularity image at all using singularity run /usit/abel/u1/karinlag/.singularity_cache/flowcraft-flowcraft_base-1.0.0-1.img?

According to singularity's docs on the matter, the maximum number of loop devices can be increased in a cluster, but it requires sudo permissions.
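For reference for the cluster admins, the setting mentioned in the error message lives in singularity.conf and can only be changed with root permissions. A sketch of the relevant line (the value 256 is an illustrative choice, not a recommendation):

```conf
# singularity.conf (system-wide, root-owned)
# Raise the number of loop devices singularity may bind images to.
max loop devices = 256
```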

@karinlag
Copy link
Author

karinlag commented Jul 4, 2018

I don't have root on the cluster, so that precludes some fixes, but thanks for mentioning them 😄

The cluster in question is offline until Friday morning, but I will test the singularity command you suggested then. I will also get in touch with the cluster management about the max loop devices.

@ODiogoSilva
Copy link
Collaborator

Ok, let us know the results. It would be very useful to have a troubleshooting guide for singularity-based pipeline runs.

@karinlag
Copy link
Author

karinlag commented Jul 17, 2018

Finally had time to look at this some more.

Ok, so with the HPC-installed singularity version (2.5.0-dist), which I activate with module load, I took one of the images that I managed to download on a different machine and tried to run it:

[karinlag@abel .singularity_cache]$ singularity run flowcraft-fastqc-0.11.7-1.img 
ERROR  : Base home directory does not exist within the container: /usit
ABORT  : Retval = 255
[karinlag@abel .singularity_cache]$ 

I am kind of assuming that this has something to do with the setup of singularity on the cluster, but I thought I'd double check before I try to wake up somebody there during vacation time.

@ODiogoSilva
Copy link
Collaborator

In this case, I suspect a bind mountpoint is missing. Try adding one to mount the current directory:

singularity run -B $(pwd) flowcraft-fastqc-0.11.7-1.img

But yes, this seems to be a singularity-related issue, and I don't have much experience with it running as a module. In the meantime, if the max loop devices issue was addressed, you can also try running through nextflow again. Just make sure that the images are pulled with the module's singularity.
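As a side note, one way to make sure nextflow and the module's singularity agree on where images live is to pin the cache directory in the pipeline's config. A sketch (the singularity scope and cacheDir option are from nextflow's configuration docs; the path shown is the cache location from the log output earlier in this thread):

```conf
// nextflow.config (sketch)
singularity {
    enabled  = true
    cacheDir = "/usit/abel/u1/karinlag/.singularity_cache"
}
```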

@ODiogoSilva
Copy link
Collaborator

Closing due to inactivity.
