Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PySam error with MarginAlign+Minimap2 (last realease) #46

Open
tetedange13 opened this issue Jan 30, 2019 · 2 comments
Open

PySam error with MarginAlign+Minimap2 (last realease) #46

tetedange13 opened this issue Jan 30, 2019 · 2 comments
Assignees

Comments

@tetedange13
Copy link

Hello,

I am trying to use MarginAlign with the last release of Minimap2 (2.15-r908-dirty) and I had several issues:

  1. I had this mysterious PySam error, that has not yet been mentionned in any issues (I think).
    I did the following sequence of commands to install MarginAlign and get the last release of Minimap2:
git clone https://github.com/benedictpaten/marginAlign.git
cd marginAlign
git pull
git submodule init
git submodule update
make
virtualenv --python=/usr/bin/python2 --no-site-packages --distribute env && source env/bin/activate && pip install -r requirements.txt
make test

Everything went fine, both on the make test and using a toy dataset (20 reads against a small database).
But when I used a bigger fastq file (against the same database) an error appeared, apparently coming from the getLastNonClippedPositionInRead function (utils.py file). The alignedSegment.query_alignment_end was crashing and returning a "SystemError: error return without exception set" (I checked the 2 other variables, they seemed fine).
I found a PySam issue mentionning a similar thing, but in a different context (pysam-developers/pysam#176), so I fixed this by installing pysam==0.9.0 instead of the version mentionned in the requirement.txt
I don't know, maybe I did something wrong, or maybe it is linked with the use of the last version of Minimap2 ?

  1. Moreover, trying to use the --em option, to run the EM learning followed by the realignement, I realized that MarginAlign was always loading the HMM file "last_hmm_20.txt". So I had to comment the line options.inputModel = os.path.join(pathToBaseNanoporeDir(), "src", "margin", "mappers", "last_hmm_20.txt") in the marginAlign.py file.
    I guess this is linked with the fact that LAST is set as the default aligner, but maybe there is something to fix here?

Thanks in advance for your answer and many thanks for implementing Minimap2 into MarginAlign!
Felix.

@mitenjain mitenjain self-assigned this Jan 30, 2019
@mitenjain
Copy link
Collaborator

Hi Felix,

Couple of thoughts.

  1. Was the virtualenv functional before you had to update pysam?
  2. When you say a large size file, how large are you talking about?
  3. For the hmm file, you should consider training your own model since the stock one is out of data (and needs updating with a newer one). I will try to find a newer R9.4 human trained em model to upload. The hmm file came from last as a legacy but works well with other aligners.
  4. Thank you about minimap2. That is also the least tested aligner for realignment and chaining so some of the error could be due to that. I'll look into this.

If you could share a toy dataset as an MWE for me to reproduce this here, that would be very helpful.

Thank you.
Best regards,
-Miten

@tetedange13
Copy link
Author

Hi Miten,

Thanks for your very quick answer.

  1. I only had 1 issue creating the virtualenv (with distribute and markerLib distribute wheel build failure in setuptools>=20.2 pypa/setuptools#535) but this package did not seem to be essential as:
  • The make test worked ;
  • The alignment with Minimap2 (last release) + realignement of my toy dataset against the database (see the part about MWE bellow for details), using the default last_hmm_20.txt file worked ;
  • The training of a new HMM worked until the complete 100 iterations (even if I had this issue mentionned in point 2.).
  1. Not that large actually, so that I was surprised to get that PySam error... As they are reads from experimental Nanopore 1D R9 runs, I don't think that I am allowed to share them.
    Anyway, I can tell you that the fastq file (not .gz) was about 300 MB large, containing 244 458 reads.
    You can probably find some others available Nanopore runs, much bigger than mine (you can maybe give a try with this one from this publication where file1 = 730 MB, I did not try it myself), that will give you (I hope) the same error.

  2. Yes that is precisely what I was trying to do: training a new HMM model by running MarginAlign on my own fastq file, with the --em --outputModel output.hmm option! But at the end, the program displayed something like "model used to realign sam file = last_hmm_20.txt", instead of using my generated output.hmm file.

Concerning the MWE you asked for:

  • Database (fasta) used = 8 genomes from a standard mock community (can be found here , inside the "GENOMES" folder), all fasta files concatenated into 1 single ;
  • Fastq file used = Nanopore runs that were used to test the Centrifuge tool (reference article, "fruitshake" dataset), that can be found here (I used the fastq with run accession number SRR4333334).

Thanks again.
Best regards.
Felix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants