
MuscleCommandline example using stdin and stdout broken on Python 3 #284

Closed
brandoninvergo opened this issue Feb 10, 2014 · 6 comments

@brandoninvergo
Contributor

Following the wiki example for using the MUSCLE commandline interface under Python 3 fails at the stage in which you write the SeqRecords to the child process's stdin. In Python 3, string data cannot be written to this buffer; only bytes can:

>>> SeqIO.write(records, child.stdin, "fasta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 463, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/Interfaces.py", line 266, in write_file
    count = self.write_records(records)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/Interfaces.py", line 251, in write_records
    self.write_record(record)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/FastaIO.py", line 189, in write_record
    self.handle.write(">%s\n" % title)
TypeError: 'str' does not support the buffer interface
>>> child.stdin.write("blah")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
>>> child.stdin.write("blah".encode())
4
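A minimal sketch of the failure above that runs without MUSCLE. The child process is a hypothetical stand-in (a Python one-liner that copies stdin to stdout). With the pipes in their default binary mode, writing a str raises TypeError (the exact message wording varies between Python 3 releases), while encoded bytes are accepted.

```python
import subprocess
import sys

# Hypothetical stand-in for MUSCLE: echoes stdin back on stdout.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

child = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Without text mode, child.stdin is a binary stream, so a str is rejected.
try:
    child.stdin.write(">foo bar\n")
    str_rejected = False
except TypeError:
    str_rejected = True

# Encoding the text to bytes first works.
child.stdin.write(">foo bar\nATCG\n".encode("ascii"))
child.stdin.close()           # EOF tells the child the input is complete
output = child.stdout.read()  # bytes, not str
child.stdout.close()
child.wait()
```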

However, trying to get around this by passing encoded bytes to a Seq object (understandably) fails:

>>> SeqRecord(Seq("ATCG".encode()), id="foo", description="bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/Bio/Seq.py", line 106, in __init__
    raise TypeError("The sequence data given to a Seq object should "
TypeError: The sequence data given to a Seq object should be a string (not another Seq object etc)

I'm not sure whether this should be fixed in Biopython or whether the example in the wiki should be removed or updated (I haven't yet found a workaround other than giving up on keeping everything in memory and using intermediate files instead).
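One in-memory alternative to intermediate files, sketched under stated assumptions: render the records as FASTA text in an io.StringIO, encode that text, and pass the bytes to Popen.communicate(). Biopython's SeqIO.write(records, handle, "fasta") accepts any text handle and could do the rendering step; the to_fasta helper below is a hypothetical stand-in so the sketch runs without Biopython, and the echo child again stands in for MUSCLE.

```python
import io
import subprocess
import sys

# Hypothetical stand-in for MUSCLE: echoes stdin back on stdout.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def to_fasta(records):
    """Hypothetical stand-in for SeqIO.write(records, handle, "fasta")."""
    handle = io.StringIO()
    for ident, description, seq in records:
        handle.write(">%s %s\n%s\n" % (ident, description, seq))
    return handle.getvalue()


records = [("foo", "bar", "ATCG"), ("baz", "qux", "GGCA")]
fasta_text = to_fasta(records)

child = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# communicate() sends all of stdin and collects stdout together, then waits.
stdout_bytes, _ = child.communicate(input=fasta_text.encode("ascii"))
result = stdout_bytes.decode("ascii")
```

Using communicate() rather than writing to child.stdin and then reading child.stdout also avoids the deadlock the subprocess documentation warns about when a child fills its output pipe before the parent has finished writing.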

@peterjc
Member

peterjc commented Feb 10, 2014

How are you creating the child process with subprocess, and in particular did you ask for unicode rather than bytes in the stdin/stdout/stderr handles by using the (oddly named) universal_newlines=True option?

[Update: You said the wiki example but pointed at the tutorial. If the universal_newlines=True trick works, would you like to prepare an update to Doc/Tutorial.tex adding it?]

We use this option in Bio.Applications so that everything is a (unicode) string rather than bytes. This seemed best for most of the command line tools we wrap (the exceptions are binary formats like SFF or BAM).
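A minimal sketch of the universal_newlines=True approach, again with a hypothetical echo child in place of MUSCLE. In text mode, child.stdin accepts str and child.stdout returns str, so a text writer such as SeqIO.write(records, child.stdin, "fasta") could use the handle directly; a plain str write stands in for it here. Since Python 3.7, text=True is an alias for the same option.

```python
import subprocess
import sys

# Hypothetical stand-in for MUSCLE: echoes stdin back on stdout.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

# universal_newlines=True opens the pipes in text mode (str in, str out).
child = subprocess.Popen(
    ECHO,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)

# A str write now succeeds; SeqIO.write(...) could write to this handle.
child.stdin.write(">foo bar\nATCG\n")
child.stdin.close()           # signal end of input to the child
output = child.stdout.read()  # str, not bytes
child.stdout.close()
child.wait()
```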

@brandoninvergo
Contributor Author

The universal_newlines option did the trick. I probably would not have figured that out unless I had reached the point of desperation, since the option name seems quite unrelated to the effect in this case.

Thanks.

@peterjc
Copy link
Member

peterjc commented Feb 10, 2014

I presume the thought process was that handling DOS/Windows vs Old Mac vs Unix newlines already required a handle wrapper (one where file positions via seek/tell are not transparent), so that universal newlines code was co-opted during the bytes/unicode separation in Python 3. But it is a stupidly named argument :(
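To illustrate the two jobs such a wrapper does, here is a small sketch using io.TextIOWrapper directly. With newline=None it decodes bytes to str and, on reading, translates DOS/Windows ("\r\n") and Old Mac ("\r") line endings to "\n". This demonstrates the io module's behaviour rather than quoting subprocess internals.

```python
import io

# Bytes with three different line-ending conventions.
raw = b"DOS line\r\nOld Mac line\rUnix line\n"

# newline=None enables universal newlines mode on input: "\r\n", "\r" and
# "\n" are all returned as "\n", and the bytes are decoded to str.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()
```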

@peterjc
Copy link
Member

peterjc commented Feb 16, 2014

Tutorial example updated 522a693 - thanks for reporting this.

@peterjc peterjc closed this as completed Feb 16, 2014
@brandoninvergo
Contributor Author

Sorry, I missed your edit about updating the tutorial. I would have been happy to update it!

@peterjc
Member

peterjc commented Feb 19, 2014

With hindsight an extra comment would have been better than adding the note to the existing comment. Never mind.
