Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SeqIO.parse side effects #4252

Open
mdehoon opened this issue Mar 3, 2023 · 5 comments
Open

SeqIO.parse side effects #4252

mdehoon opened this issue Mar 3, 2023 · 5 comments
Assignees

Comments

@mdehoon
Copy link
Contributor

mdehoon commented Mar 3, 2023

There is some inconsistency in exception handling between Python and Biopython when parsing files.

This is what happens in Python (using the binary file 310.ab1 in Tests/Abi as an example):

>>> stream = open("310.ab1")
>>> next(stream)
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 15: invalid start byte
>>> stream.closed
False

i.e. the file stream stays open.

Biopython's SeqIO.parse, on the other hand, closes the file if an exception occurs:

>>> from Bio import SeqIO
>>> sequences = SeqIO.parse("310.ab1", "fasta")
>>> next(sequences)
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 15: invalid start byte
>>> sequences.stream.closed
True

Python will close the file if a with-block is used:

>>> with open("310.ab1") as stream:
...     line = next(stream)
... 
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 15: invalid start byte
>>> stream.closed
True

However, this is currently not supported by Biopython:

>>> with SeqIO.parse("310.ab1", "fasta") as sequences:
...     next(sequences)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __enter__

To be consistent with Python, Biopython should keep the file stream open if an exception occurs.

To ensure that file streams get closed if an exception occur, __enter__, __exit__ methods can be added to the SeqIO iterator to support with blocks.

@mdehoon mdehoon assigned mdehoon and peterjc and unassigned mdehoon Mar 3, 2023
@peterjc
Copy link
Member

peterjc commented Mar 3, 2023

Cross reference #4251

So you want to enable the SeqIO iterating parsers to be used as context managers? Perhaps adding the methods to the base class would be enough...

I'm not going to find time to work on this soon (and don't fully understand your motivating use case), but would be happy to review a PR if you wanted to do this.

@mdehoon
Copy link
Contributor Author

mdehoon commented Mar 3, 2023

So you want to enable the SeqIO iterating parsers to be used as context managers? Perhaps adding the methods to the base class would be enough...

Yes I think so.

I'm not going to find time to work on this soon (and don't fully understand your motivating use case), but would be happy to review a PR if you wanted to do this.

I don't think it's a major issue; it's just that Biopython is deviating from Python's approach. If flake or black could detect such issues, it would be flagged as an error. Let's keep this PR open until somebody find time to fix this.

@BabaYaga1221
Copy link
Contributor

Good morning @mdehoon @peterjc , Could you please provide me with additional information and the file structure related to this matter? Additionally, I am interested in working on this issue.
~Regards,
Farhan khan

@mdehoon
Copy link
Contributor Author

mdehoon commented Apr 10, 2023

@BabaYaga1221 In Bio/SeqIO/Interfaces.py, the should_close_stream stuff should be replaced by __enter__, __exit__ methods. You can have a look at the __enter__, __exit__ methods in Bio/Align/interfaces.py as a guide.

@BabaYaga1221
Copy link
Contributor

@mdehoon @peterjc I created a PR #4280. can you take a look? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants