screed doesn't uppercase DNA in loaded records #1434
Hey, this breaks all of our diginorm and trim-low-abund consistency checks. Sweet :)
Closed
Instead of using a macro, would it be any slower to use an inline function that raises an exception on an unknown character?
A duplicate of #370, basically, except that back then we thought we'd handled it in scripts.
This was referenced Sep 4, 2016
betatim added a commit to betatim/khmer that referenced this issue Sep 5, 2016
Exploratory branch for dib-lab#1434. Use a lookup table and a function instead of a macro.
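The lookup-table idea can be sketched roughly as below. This is a Python illustration of the approach, not the C++ from the branch: the table name, the function name, and the specific 2-bit codes are all assumptions made for the example.

```python
# Hypothetical lookup table indexed by character code; -1 marks invalid input.
# The 2-bit codes below are assumed for illustration only.
TWOBIT_TABLE = [-1] * 256
for base, code in (("A", 0), ("T", 1), ("C", 2), ("G", 3)):
    TWOBIT_TABLE[ord(base)] = code


def strict_twobit(ch):
    """Return the 2-bit code for ch, raising ValueError on anything but ACGT."""
    idx = ord(ch)
    code = TWOBIT_TABLE[idx] if idx < 256 else -1
    if code < 0:
        raise ValueError(f"invalid DNA character: {ch!r}")
    return code
```

With a table like this, lowercase input fails loudly instead of being hashed as something else; whether the extra lookup and branch cost anything measurable in the real C++ code would need benchmarking.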
In the course of mucking about with hash functions, I discovered that there are multiple places in the code where bad data is fed to our hashing code.
More specifically, the hashing code (twobit_repr etc. in kmer_hash.hh) intentionally does not handle lower case DNA characters (it treats them as 'G'), while screed does not uppercase DNA in loaded records.

This is an excellent combination of optimization (avoiding too much input DNA sanitization 'cause it makes hashing slow) with poorly specified library behavior by screed.
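To make the failure mode concrete, here is a minimal Python sketch of a fall-through encoding in which anything that is not A, T or C gets G's code. The function names and the specific code values are assumptions for illustration; the real definitions live in kmer_hash.hh.

```python
# Hypothetical 2-bit codes; the actual values in kmer_hash.hh may differ.
CODES = {"A": 0, "T": 1, "C": 2}
G_CODE = 3


def lenient_twobit(ch):
    """Map A/T/C to their codes; every other character falls through to G."""
    return CODES.get(ch, G_CODE)


def lenient_hash(kmer):
    """Pack a k-mer into an integer, two bits per base, with no validation."""
    h = 0
    for ch in kmer:
        h = (h << 2) | lenient_twobit(ch)
    return h


# Lowercase input is silently hashed as if every base were G.
print(lenient_hash("acgt") == lenient_hash("GGGG"))  # True
print(lenient_hash("ACGT") == lenient_hash("acgt"))  # False
```

Under this encoding a lowercase read never raises an error; it just produces counts for the wrong k-mers, which is why the problem shows up as changed test results rather than as a crash.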
This affects a number of tests, which is how I found it. For example, tests/test_filter_abund.py::test_filter_abund_6_trim_high_abund_Z will fail if you uppercase record.sequence in khmer/thread_utils.py::verbose_loader.

There are lots of possible solutions (one involves having KHMER_EXTRA_SANITY_CHECKS turned on), but this kind of thing has happened enough in the past that we should probably think about how to check for it more systematically. Thoughts welcome.
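One way to check more systematically would be to sanitize at a single point where records are loaded. The sketch below is only an illustration of that idea: Record is a hypothetical stand-in for a screed record, and sanitized and VALID_BASES are made-up names, not part of khmer or screed.

```python
from collections import namedtuple

# Hypothetical stand-in for a screed record; real records carry more fields.
Record = namedtuple("Record", ["name", "sequence"])

# Assumed alphabet for the sketch; how to treat N is left out on purpose.
VALID_BASES = frozenset("ACGT")


def sanitized(records, strict=False):
    """Yield records with uppercased sequences; optionally reject non-ACGT."""
    for record in records:
        seq = record.sequence.upper()
        if strict and not VALID_BASES.issuperset(seq):
            raise ValueError(f"non-ACGT characters in {record.name!r}")
        yield record._replace(sequence=seq)


reads = [Record("r1", "acgT"), Record("r2", "GGCA")]
for rec in sanitized(reads):
    print(rec.name, rec.sequence)
```

Doing this once per record at load time keeps the per-character hashing path free of checks, at the cost of one extra pass over each sequence.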