Failure to do a global alignment (nw) of really long proteins #3

Closed
valentynbez opened this issue Aug 13, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@valentynbez
Contributor

valentynbez commented Aug 13, 2023

I am aligning some extremely long (>20,000 aa) proteins against AF2DB.
Opal fails at around 30,000 aa with RuntimeError: Failed to run search Opal database (code=1) for the 2 longest sequences in the dataset, in both score and full mode.

Code to reproduce:

import pyopal
from pysam.libcfaidx import FastxFile

target = [x.sequence for x in FastxFile("longest.fa")]
database = [x.sequence for x in FastxFile("af2db_hits.faa")]
database = pyopal.Database(database)

for sequence in target:
    try:
        results = database.search(sequence, mode="full", algorithm="nw")
    except RuntimeError:
        print("Failure", len(sequence))

data.zip

@valentynbez
Contributor Author

valentynbez commented Aug 13, 2023

I tried to replicate it with randomly generated sequences, but now it always fails. Is this an alphabet issue?

import pyopal
from random import choice
alphabet = 'ACDEFGHIKLMNPQRSTVWY'
# generate proteins with lengths from 1k to 35k in steps of 1k
proteins = ["".join([choice(alphabet) for _ in range(i)]) for i in range(1000, 36000, 1000)]
database = pyopal.Database(proteins)

for sequence in proteins:
    try:
        results = database.search(sequence, mode="score", algorithm="nw")
    except RuntimeError:
        print("Failure", len(sequence))
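
Because the sequences above are drawn without a seed, each run tests a different set. A minimal sketch of a deterministic variant follows; the helper name `random_protein` and the seed value 42 are assumptions made for illustration, and the block only builds the sequences without calling pyopal.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_protein(length: int, rng: random.Random) -> str:
    """Build a random protein of `length` residues using the given generator."""
    return "".join(rng.choices(ALPHABET, k=length))


rng = random.Random(42)  # arbitrary seed, chosen only for reproducibility
# Lengths 1,000 to 35,000 in steps of 1,000, matching the range() call above.
proteins = [random_protein(n, rng) for n in range(1000, 36000, 1000)]
print(len(proteins), len(proteins[0]), len(proteins[-1]))
```

With a fixed seed, any failing length can be reported and re-run against exactly the same sequence.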

@althonos
Owner

I'll have a look when I have more time; I'm also getting failures with the random sequences.

@althonos
Owner

Overflow detection was turned off for 32-bit integer scoring. This only affected the non-SW alignment modes, since SW uses a different function in which overflow detection worked as expected. I have updated the code in my fork of Opal.
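
To show why alignments of this size depend on 32-bit scoring at all, here is a minimal sketch of the arithmetic. It is not Opal's or PyOpal's code: the helper names `fits` and `narrowest_width`, the per-residue match score of 4, and the per-position gap cost of 1 are all assumptions chosen for illustration.

```python
# Illustrative sketch only; this is not Opal's or PyOpal's actual code.

# Largest positive value a signed integer of each width can hold.
SIGNED_MAX = {8: 2**7 - 1, 16: 2**15 - 1, 32: 2**31 - 1}


def fits(score: int, bits: int) -> bool:
    """Return True if `score` is representable as a signed `bits`-bit integer."""
    return -SIGNED_MAX[bits] - 1 <= score <= SIGNED_MAX[bits]


def narrowest_width(score: int) -> int:
    """Return the narrowest of 8/16/32 bits that can hold `score`.

    Raises OverflowError when even 32 bits are not enough, which is the
    condition an overflow check at the widest precision has to report.
    """
    for bits in (8, 16, 32):
        if fits(score, bits):
            return bits
    raise OverflowError(f"score {score} does not fit in 32 bits")


# Hypothetical numbers: a 30,000-residue protein aligned to itself,
# assuming every identical pair scores 4.
self_score = 30_000 * 4
print(self_score, narrowest_width(self_score))

# Hypothetical numbers: a global alignment of a 1,000-residue query to a
# 35,000-residue target must contain at least 34,000 gap positions;
# assuming each costs 1, the gap cost alone is -34,000.
gap_cost = -(35_000 - 1_000) * 1
print(gap_cost, narrowest_width(gap_cost))
```

Under these assumed scores, both values fall outside the signed 16-bit range, so global alignments involving sequences of this length can only be scored correctly at 32-bit precision.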

@althonos
Owner

Fixed in v0.4.1.
