-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check interfaces, remove NotImplemented errors #140
Comments
This seems like a good idea, should we spitball what those interfaces actually are? |
Okay, so I decided to not make another issue. Here is my proposal for the interface for BioSequence and Alphabet. There are already inferfaces stated in the documentation, but if you actually try to implement a new sequence/alphabet type, you will find that they are insufficient (cc @benjward ) The following types are assumed to have the listed properties. You may create subtypes of that violate these properties, but then any fallback functions may be excessively slow or wrong. What is an Alphabet?
What to implement?Every subtype
If you want interoperation with existing subtypes of
Minimal exampleIn this example, the element type is struct ReducedAAAlphabet <: Alphabet end
Base.eltype(::Type{ReducedAAAlphabet}) = AminoAcid
BioSequences.BitsPerSymbol(::ReducedAAAlphabet) = BioSequences.BitsPerSymbol{4}()
function BioSequences.symbols(::ReducedAAAlphabet)
(AA_L, AA_C, AA_A, AA_G, AA_S, AA_T, AA_P, AA_F,
AA_W, AA_E, AA_D, AA_N, AA_Q, AA_K, AA_H, AA_M)
end
const (ENC_LUT, DEC_LUT) = let
enc_lut = fill(0xff, length(alphabet(AminoAcid)))
dec_lut = fill(AA_A, length(symbols(ReducedAAAlphabet())))
for (i, aa) in enumerate(symbols(ReducedAAAlphabet()))
enc_lut[reinterpret(UInt8, aa) + 0x01] = i - 1
dec_lut[i] = aa
end
(Tuple(enc_lut), Tuple(dec_lut))
end
function encode(::ReducedAAAlphabet, aa::AminoAcid)
i = reinterpret(UInt8, aa) + 0x01
(i ≥ length(ENC_LUT) || @inbounds ENC_LUT[i] === 0xff) && throw(DomainError(aa))
(@inbounds ENC_LUT[i]) % UInt
end
function decode(::ReducedAAAlphabet, x::UInt)
x ≥ length(DEC_LUT) && throw(DomainError(aa))
@inbounds DEC_LUT[x + UInt(1)]
end What is a BioSequence?
What to implement?Every subtype
Furthermore, mutable sequences should implement
For compatibility with existing Minimal examplestruct Codon <: BioSequence{RNAAlphabet{2}}
x::UInt8
end
function Codon(it)
length(it) == 3 || error("Must have length 3")
x = zero(UInt)
for (i, nt) in enumerate(it)
x |= BioSequences.encode(Alphabet(Codon), convert(RNA, nt)) << (6-2i)
end
Codon(x)
end
Base.length(::Codon) = 3
BioSequences.encoded_data_eltype(::Type{Codon}) = UInt
function BioSequences.extract_encoded_element(x::Codon, i::Int)
(x.x >>> ((i-1) & 7)) % UInt
end |
@jakobnissen So these two examples have made their way into the tests, and I can include them in the manual as example custom types if necessary. Can this close? |
Yes! Do they actually work? 😅 I kind of forgot if the custom types were
thoroughly checked. Should I have a look tomorrow, or have you already
tested the interfaces? If so, please do close this!
…On Wed, Jun 23, 2021, 20:35 Ben J. Ward ***@***.***> wrote:
@jakobnissen <https://github.com/jakobnissen> So these two examples have
made their way into the tests, and I can include them in the manual as
example custom types if necessary. Can this close?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#140 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFQ6SXW5RX4ZU7RM3OEDC7TTUISPFANCNFSM4Y76QFGQ>
.
|
The tests all pass. But we should probably, do like in the blogpost suggests about anti-patterns, where there's a function that also other users can use, that tests any type conforms to the interface. |
I basically think this is done now, as #202 is merged and I've gone through the tests. It's not true that all tests also include an operation on a custom alphabet, but code coverage of the generic code paths should be good enough. |
Errors such as e.g.
BioSequences.jl/src/biosequence/biosequence.jl
Lines 21 to 30 in 0ba3d4b
It's not more useful than a MethodError, and it hampers static analysis from e.g. JET, which will probably be a reality before the end of this year. Much better to simply remove it. (also see this blogpost)
Instead, we should go through and verify we have covered and documented interfaces for:
Also, the tests should include custom biosequence and alphabet types to make sure some basic behaviour is covered by fallbacks.
The text was updated successfully, but these errors were encountered: