Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SeqRecord broken on Debian packages when -sql package is not installed #1471

Closed
kblin opened this issue Dec 8, 2017 · 7 comments
Closed

Comments

@kblin
Copy link
Contributor

kblin commented Dec 8, 2017

Setup

I am reporting a problem with Biopython version, Python version, and operating
system as follows:

2.7.13 (default, Jan 19 2017, 14:48:08) 
[GCC 6.3.0 20170118]
CPython
Linux-4.4.0-93-generic-x86_64-with-debian-9.1
1.68

This is a Debian 9 (stretch), with python-biopython installed and python-biopython-sql not installed.

Expected behaviour

Calling extract on a SeqFeature that's coming from a SeqRecord loaded from a GenBank file on disk works.

Actual behaviour

__getitem__ in SeqRecord imports BioSQL.BioSeq without a try/except block, causing the script to crash.

Traceback (most recent call last):
  File "/usr/local/bin/run_antismash.py", line 1210, in <module>
    main()
  File "/usr/local/bin/run_antismash.py", line 556, in main
    run_analyses(seq_record, options, plugins)
  File "/usr/local/bin/run_antismash.py", line 640, in run_analyses
    cluster_specific_analysis(plugins, seq_record, options)
  File "/usr/local/bin/run_antismash.py", line 1191, in cluster_specific_analysis
    plugin.specific_analysis(seq_record, options)
  File "/antismash-dev/antismash/specific_modules/nrpspks/specific_analysis.py", line 81, in specific_analysis
    pksnrpsvars = annotate_pksnrps(pksnrpsvars, seq_record, options)
  File "/antismash-dev/antismash/specific_modules/nrpspks/nrpspksdomains.py", line 28, in annotate_pksnrps
    name_nrpspks(seq_record, pksnrpsvars, withinclustergenes, options)
  File "/antismash-dev/antismash/specific_modules/nrpspks/nrpspksdomains.py", line 163, in name_nrpspks
    quals['translation'] = [str(motifFeature.extract(seq_record).seq.translate(table=transl_table))]
  File "/usr/lib/python2.7/dist-packages/Bio/SeqFeature.py", line 329, in extract
    return self.location.extract(parent_sequence)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqFeature.py", line 936, in extract
    f_seq = parent_sequence[self.nofuzzy_start:self.nofuzzy_end]
  File "/usr/lib/python2.7/dist-packages/Bio/SeqRecord.py", line 439, in __getitem__
    from BioSQL.BioSeq import DBSeqRecord
ImportError: No module named BioSQL.BioSeq

Steps to reproduce

Set up a pristine Debian install like this:

docker run --rm -it debian:stretch
apt update && apt install -y python-biopython

Then run the following python script:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
seq = Seq("ATGGCAGGCATCTGCTAG")
rec = SeqRecord(seq)
rec[:5]
@kblin
Copy link
Contributor Author

kblin commented Dec 8, 2017

Just to confirm, installing python-biopython-sql in addition to python-biopython makes this problem go away, at the cost of pulling in client-side MariaDB.

@peterjc
Copy link
Member

peterjc commented Dec 8, 2017

As discussed on Twitter, due to change in Biopython 1.68 onwards 48cd6bc1 - CC @ctSkennerton

Ideally all the Bio code would work without importing from BioSQL (as per the Debian split packaging approach).

We could implement __getitem__ on the DBSeqRecord object (more code duplication for all the annotation handling), or perhaps make that isinstance check pass if the import fails (if the import fails, it can't be a DBSeqRecord object anyway)?

@peterjc
Copy link
Member

peterjc commented Dec 8, 2017

Cross reference https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=883860 (filed by @kblin)

@kblin
Copy link
Contributor Author

kblin commented Dec 8, 2017

I'm a bit stumped on how to write a meaningful test for this, otherwise I'd propose a patch that'd only do the DBSeqRecord logic if the import worked.

kblin added a commit to kblin/biopython that referenced this issue Dec 8, 2017
Currently, SeqRecord expects BioSQL to always be importable. That is not true on Debian packages,
where the -sql package is needed to have BioSQL around.

When BioSQL is not available, the instance in question should not be a DBSeqRecord, though.
That means the else branch of that check should be safe.

Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
@kblin
Copy link
Contributor Author

kblin commented Dec 8, 2017

Not sure how to test if this works, though

@peterjc peterjc closed this as completed in cbbf861 Dec 8, 2017
@peterjc
Copy link
Member

peterjc commented Dec 8, 2017

Closed via #1472 on the Biopython side, fix will be in our next release, i.e. Biopython 1.71

I suggest for Debian they back-port Kai's patch to whichever of the Biopython 1.68, 1.69 and 1.70 packages are still in active use.

@peterjc
Copy link
Member

peterjc commented Dec 8, 2017

In terms of testing this (outside of Debian), one could hack the setup.py and/or delete the installed BioSQL/*.py files from the Python path, and then try test_SeqRecord.py or Kai's minimal test case before and after the patch.

I have not done that (its bed time here), but perhaps I should have to be 100% sure we've fixed this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants