Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bioconda install of bcalm produces incorrect cDBG on linux? #37

Closed
ctb opened this issue Dec 23, 2018 · 10 comments
Closed

bioconda install of bcalm produces incorrect cDBG on linux? #37

ctb opened this issue Dec 23, 2018 · 10 comments

Comments

@ctb
Copy link
Contributor

ctb commented Dec 23, 2018

In #34, I reported that in not-very-reproducible circumstances on my HPC, when I compiled bcalm from source, it produced bad cDBG files. I ended up automating the heck out of the check and being unable to reproduce it so I gave up. But I cleverly never removed the validation code from the spacegraphcats application :)

Fast forward to today, and I found the same problem as in #34 when I installed bcalm using bioconda! Hopefully this time it will be more reproducible. I reproduce the bug on both my HPC and Amazon's Linux box, which kind of makes sense because bioconda provides a binary install so I would hope that it would fail the same way on both :).

To reproduce using spacegraphcats

this is running on Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-01bbe152bf19d0289, region us-west-2. I created a t2-large with 20 GB of disk space, logged in as ec2-user, and then ran the following commands (some require answering interactive prompts, sorry):

# install C compiler
sudo yum groupinstall "Development Tools"

# install conda
curl -L -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh 
source ~/.bashrc
. /home/ec2-user/miniconda3/etc/profile.d/conda.sh

# add bioconda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create a conda environment called 'sgc' with python 3.6 and baclm installed.
conda create -n sgc python==3.6 bcalm

Now, activate and try running the spacegraphcats example:

conda activate sgc

# install prereqs for spacegraphcats
pip install Cython
pip install https://github.com/dib-lab/pybbhash/archive/spacegraphcats.zip
pip install https://github.com/dib-lab/khmer/archive/master.zip
pip install git+https://github.com/dib-lab/sourmash@master#egg=sourmash

# install spacegraphcats
pip install spacegraphcats

# grab example repo
git clone https://github.com/spacegraphcats/spacegraphcats-twofoo-example.git
cd spacegraphcats-twofoo-example/

# run example workflow
snakemake

this fails for me after a few minutes of running with:

/home/ec2-user/miniconda3/envs/sgc/bin/python -m spacegraphcats.cdbg.bcalm_to_gxt  -k 31 twofoo/bcalm.twofoo.k31.unitigs.fa twofoo_k31_r1/cdbg.gxt twofoo_k31_r1/contigs.fa.gz
reading unitigs from twofoo/bcalm.twofoo.k31.unitigs.fa
...read 206855 unitigs
577 -> 181159, but not 181159 -> 577
576 -> 81582, but not 81582 -> 576
4675 -> 101520, but not 101520 -> 4675
4675 -> 104265, but not 104265 -> 4675
...
204785 -> 12508, but not 12508 -> 204785
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 375, in <module>
    main(sys.argv[1:])
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 257, in main
    neighbors, sequences, mean_abunds, sizes = read_bcalm(unitigs, debug, k)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 214, in read_bcalm
    assert not fail
AssertionError
[Sun Dec 23 20:23:27 2018]
Error in rule bcalm_catlas_input:
...

Here's what I have installed for bcalm:

% conda list | grep bcalm
bcalm                     2.2.0                hd28b015_3    bioconda

cc @natir viz #32.

@ctb
Copy link
Contributor Author

ctb commented Dec 23, 2018

Oddly enough my shorter automated workflow https://github.com/ctb/2018-bcalm-debug doesn't detect any problems. Must not trigger the right problem.

Note that when I compile bcalm from scratch on my HPC the spacegraphcats-twofoo-example above runs just fine and passes the reciprocal edges test.

@rchikhi
Copy link
Member

rchikhi commented Dec 23, 2018

Thanks for the detailed report! I confirm that I can reproduce the behavior of #37 (comment) on my own Linux, when using bioconda's bcalm. The problem doesn't occur when using the current master branch of bcalm. So something must be going on with bioconda's version, I'm on it.

@ctb
Copy link
Contributor Author

ctb commented Dec 24, 2018 via email

@rchikhi
Copy link
Member

rchikhi commented Dec 24, 2018

I think I see what's going on. Two bugs: 1) I thought BCALM outputted consecutive unitig IDs, but apparently with some compilers it had some rare inversions (e.g. 9,10,12,11,13,14,..). Likely a multi-threading bug. 2) The procedure used to link unitigs silently and wrongly assumed that unitigs needed to have consecutive IDs. I've commited a fix for both bugs and will release BCALM 2.2.1. @natir could you please update Conda when you have time, or show me how?

@ctb
Copy link
Contributor Author

ctb commented Dec 24, 2018

you rock! thanks, @rchikhi!

@ctb
Copy link
Contributor Author

ctb commented Jan 29, 2019

ping! I'd suggest withdrawing bcalm from bioconda until someone has time to fix it.

@rchikhi
Copy link
Member

rchikhi commented Feb 3, 2019

pinging @natir: bcalm 2.2.1 is released and needs to be updated in bioconda please :)

@natir
Copy link

natir commented Feb 4, 2019

@epruesse update bioconda bcalm to 2.2.1 in this pull request bioconda/bioconda-recipes#12811

@rchikhi
Copy link
Member

rchikhi commented Feb 4, 2019

merci!

@ctb
Copy link
Contributor Author

ctb commented Feb 4, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants