a peculiar bug in grch37 dbnsftp recipe #296

Closed
naumenko-sa opened this issue Apr 25, 2019 · 3 comments

@naumenko-sa
Collaborator

naumenko-sa commented Apr 25, 2019

Hello, cloudbiolinux community!

When installing bcbio_nextgen 1.1.5 from scratch, the --datatarget dbnsfp step failed for me.

Running GGD recipe: GRCh37 dbnsfp 3.5a
Traceback (most recent call last):
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/anaconda/bin/bcbio_nextgen.py", line 221, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 348, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 850, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/anaconda/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/anaconda/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/hpf/largeprojects/ccmbio/naumenko/tools/bcbio_1.1.5/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 141.

I was able to track the problem down to this line in the recipe:

echo "test0"
unzip -p dbNSFPv*.zip "dbNSFP*_variant.chr1" | head -n1 > $UNPACK_DIR/header.txt
echo "test1"

header.txt is created, but then the script fails without any message: test1 is never printed.
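Exit status 141 is 128 + 13, i.e. the left side of the pipe was killed by SIGPIPE: head -n1 exits after reading one line, the pipe closes, and unzip is signalled the next time it writes. If the generated ggd-run.sh runs under set -e together with set -o pipefail (an assumption on my part, not verified here), the whole script then aborts silently. A minimal sketch of the same failure mode, using yes as a stand-in for the long-running unzip:

```shell
# Minimal reproduction of exit status 141 (= 128 + SIGPIPE), assuming the
# recipe script runs with "set -o pipefail" (an assumption, not verified).
set -o pipefail
# 'yes' writes lines forever; 'head -n 1' exits after reading one line,
# closing the read end of the pipe, so 'yes' is killed by SIGPIPE.
yes | head -n 1 > /dev/null
echo "pipeline status: $?"   # prints "pipeline status: 141"
```

Without pipefail the pipeline's status would be head's 0, which is why the same construct often works elsewhere.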

echo "test0"
unzip dbNSFPv*.zip "dbNSFP*_variant.chrM"
head -n1 dbNSFP*_variant.chrM > $UNPACK_DIR/header.txt
echo "test1"

works well.

I've changed the recipes for grch37 and hg38 accordingly in
#295

Sergey

@chapmanb
Owner

Sergey;
Thanks for the diagnosis and the fix. I'm confused as to why that step would fail when run in a pipe but work cleanly when done separately. Do you have an idea of what exactly is failing on your system?

The only thing I'm concerned about in your change is the impact of doing a larger unzip on runtime and filesystem size. That particular line was trying to avoid unpacking much of the file and just grab the single header line we need. If the workaround doesn't create anything large on disk and also runs quickly, that works great for me as well. Thanks again.

@naumenko-sa
Collaborator Author

naumenko-sa commented Apr 25, 2019

Thanks Brad!

It is a puzzle why it is not working; my system is pretty standard. Perhaps something with the pipe buffer (which was increased in recent bash versions) overflowing while head -n1 needs only a little of the data, i.e. a pipe synchronization issue? The fix does not unpack the huge file: note that I changed chr1 to chrM, which is quite small. Interestingly, the larger pipe further down the script, which processes all the huge files, works well (but it reads every line of the files, not just head -n1).

S.
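This behavior is consistent with a SIGPIPE reading of the failure (my speculation, not confirmed in the thread): the larger downstream pipeline survives because its reader consumes the whole stream and never closes the pipe early, so the writer is never signalled. A counterpart sketch to the failing case:

```shell
# Counterpart to the failing pipeline: a reader that consumes the entire
# stream never closes the pipe early, so the writer is not signalled and
# the pipeline succeeds even under pipefail. (A sketch, assuming the
# recipe script runs with "set -o pipefail".)
set -o pipefail
seq 1 100000 | wc -l > /dev/null
echo "pipeline status: $?"   # prints "pipeline status: 0"
```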

@naumenko-sa
Collaborator Author

Found a typo that I introduced in the dbnsfp recipe.
Sorry about that.
Please merge
#297
