Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check on screed's attributes #1484

Closed
ctb opened this issue Oct 6, 2016 · 16 comments
Closed

Check on screed's attributes #1484

ctb opened this issue Oct 6, 2016 · 16 comments

Comments

@ctb
Copy link
Member

ctb commented Oct 6, 2016

Does 'accuracy' still exist on FASTA records?

#769 (comment)

Update: last remaining differences between khmer Read and screed Record objects seems to be description vs annotation.

@ctb ctb added the screed label Oct 6, 2016
@ctb ctb added this to the screed 1.0 milestone Oct 6, 2016
@luizirber
Copy link
Member

No, neither does quality.

  • khmer.ReadParser records also use annotations, while screed.screedRecord uses description.
  • khmer.Read has everything from the ID line in the name field, leaves annotations empty
  • screed.screedRecord has ID line split by spaces, first into name and remainder in description

@camillescott
Copy link
Member

Note that the quality attribute always seems to exist in the ReadParser's Read extension class, so the hasattr(read, "quality") paradigm doesn't work for them. Perhaps both implementations should switch to adding an extra format attribute.

@standage
Copy link
Member

If .quality is always present but never set for Fasta records, then it should be possible to switch the check from hasattr(record, "quality") to record.quality != "", no?

@camillescott
Copy link
Member

Yup, although screed just leaves it off entirely if it doesn't exist. I'd
prefer using a @property and returning None for the null case (checking
is None has less room for edge cases than empty string).

On Thu, Nov 17, 2016 at 9:50 AM, Daniel Standage notifications@github.com
wrote:

If .quality is always present but never set for Fasta records, then it
should be possible to switch the check from hasattr(record, "quality") to record.quality
!= "", no?


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#1484 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACwxrbGPMKxN0A4rJ2I68uvDS4RsSTonks5q_JPYgaJpZM4KP8NG
.

Camille Scott

Graduate Group for Computer Science
Lab for Data Intensive Biology
University of California, Davis

camille.scott.w@gmail.com

@ctb
Copy link
Member Author

ctb commented Nov 18, 2016

This was discussed in #1468 and resolved to eliminating the quality attribute on FASTQ records, which was the opinion of me and @luizirber both. Following from that, we decided 'quality' should be eliminated from ReadParser results on FASTA sequences.

At this point I'm happy to reconsider (I kinda like the 'format' notion) but I would like a compleat (if brief) proposal that satisfies all of the discussion points raised in the various discussions, i.e. would require some work by someone :).

@ctb
Copy link
Member Author

ctb commented Nov 18, 2016

One extra consideration is whether Nanopore or PacBio add sequence attributes that we want to plan ahead for. I don't have a clear idea of this.

@betatim
Copy link
Member

betatim commented Nov 18, 2016

For me read = khmer.Read("blah", "ACGT"); hasattr(read, 'quality') works as expected (returns False). (However ipython tab-completes read.q to read.quality so somewhere there is a bug)

@ctb
Copy link
Member Author

ctb commented Dec 19, 2016

I believe #1484 (comment) is incorrect - 'description' is only set if parse_description=True.

@ctb
Copy link
Member Author

ctb commented Dec 19, 2016

But it seems like the bigger annoyance is that khmer now has 'annotations' and screed has 'description' (when parse_description is set)

@ctb
Copy link
Member Author

ctb commented Dec 24, 2016

Note that we must allow 'screed.Record()', as it is used in khmer 2.0 which depends on screed > 0.9. In future we should remember to bound requirements by next major version :)

@betatim
Copy link
Member

betatim commented Jan 2, 2017

See dib-lab/screed#64 for allowing Read() to work again.

@standage
Copy link
Member

screed.Record() compatibility between khmer and screed was restored with dib-lab/screed#64. As far as bringing reads/records into alignment between khmer and screed, it looks like there are two remaining discrepancies.

@ctb
Copy link
Member Author

ctb commented Jan 23, 2017

quality does not appear to be present in khmer.Read(...) if quality=None in constructor.

See test_read_parsers.py::test_read_type_basic and test_read_parsers.py::test_read_quality_none (added in #1559).

One remaining question - is it present when we read FASTA sequences?

@betatim
Copy link
Member

betatim commented Jan 23, 2017

In [2]: rp = khmer.ReadParser('tests/test-data/random-20-a.fa')

In [3]: for read in rp:
   ...:     print(read)
   ...:     print(hasattr(read, 'quality'))
   ...:     
<khmer.Read object at 0x7f31b9e4acc0>
False

Shall we immortalise by making it a test?

@ctb
Copy link
Member Author

ctb commented Jan 23, 2017

See #1583.

The only remaining issue here seems to be 'annotations' which we should be able to change in khmer (since AFAIK nothing uses it...)

@betatim
Copy link
Member

betatim commented Jan 23, 2017

I'm renaming annotations to description.

@betatim betatim closed this as completed Jan 31, 2017
@standage standage moved this from Slated to Closed in Screed 1.0 release Feb 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

5 participants