Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tryparse(::BioSequence, s::AbstractString) or similar #224

Open
jakobnissen opened this issue May 5, 2022 · 3 comments
Open

Add tryparse(::BioSequence, s::AbstractString) or similar #224

jakobnissen opened this issue May 5, 2022 · 3 comments

Comments

@jakobnissen
Copy link
Member

Since we have parse, it would be nice to have tryparse. This would also enable e.g. users of FASTX to check if a record conformed to an Alphabet without throwing and catching errors.

@CiaranOMara
Copy link
Member

I don't understand the tryparse idea. It seems like an anti-pattern to me. To me, it moves a thrown error into an isnothing check, and the user still needs to handle the error just the same.

Maybe facilitating something like the following would provide a more productive upfront test?

symbols = unique(itr)

if issubset(symbols, alphabet)
    # do something
end

@jakobnissen
Copy link
Member Author

@CiaranOMara Right, so this is a complicated topic, and while I'm pretty sure I don't have any final, ideal solutions, I'm convinced that moving the thrown errors into an isnothing check is indeed preferable.

For a longer discussion on this, see JuliaLang/julia#43773 .

The core issue is that there are 2 problems with throwing errors in cases where you actually expect an error could happen:

  • It's slow. By design, Julia's try/catch is not optimised.
  • It can get very ugly very quickly, because when you catch an error, you don't know where the error come from. It could come from the function you just called, or it could be entirely different error way further down the call stack that you didn't expect, but now your code is accidentally "handling" it.

The advantage of tryparse over issubset or isalphabet that you can do the parsing and the checking in a single call (and in a single pass over the sequence).

That being said, in this particular case, maybe you're right that an option of checking the alphabet with an explicit issubset or something would maybe be nearly as good. I still think this is better:

# One alphabet
seq = tryparse(LongDNA{2}, myseq)
if seq isa LongSequence
   ...
end
# Multiple
for seqtype in [LongDNA{2}, LongAA]
    seq = tryparse(seqtype, myseq)
    if seq isa seqtype
    ....

Instead of

# Multiple
if can_parse(LongDNA{2}, myseq)
seq = parse(LongDNA{2}, myseq)
...
# Multiple
for seqtype in [LongDNA{2}, LongAA]
    if canparse(seqtype, myseq)
    seq = parse(seqtype, myseq)
    ....

One additional issue with tryparse is that the only error it returns is nothing, which says nothing about why the sequence could not parse, which in turns lead to terrible error messages for the end user: "ERROR: Sequence could not parse to LongDNA{2}". This is a problem, and I think it ought eventually be solved on the Julia level. What we can do for now is something like creating a ParseError type which contains all information relevant to the parsing error, then have tryparse return Union{LongDNA{2}, ParseError}.

@CiaranOMara
Copy link
Member

CiaranOMara commented Jun 13, 2022

To summarise, the proposed tryparse would optimistically attempt work, and if a known recoverable error occurs, it returns an error. The advantage of returning an error instead of throwing an error is that Julia's slow try/catch gets bypassed. Furthermore, the overhead of Julia's try/catch is unnecessary when handling known recoverable errors.

I prefer the idea of returning an error instead of nothing, as it is more likely that downstream code that doesn't handle a returned error will fail fast.

If I'm not mistaken, for the return Union{T, ParseError} approach, the custom error could still subtype Core.Exception? Though I'm not sure whether that subtyping would add anything.

It could come from the function you just called, or it could be entirely different error way further down the call stack that you didn't expect, but now your code is accidentally "handling" it.

You can still choose to handle your own errors and otherwise rethrow.

for seqtype in [LongDNA{2}, LongAA]
    seq = try
        parse(LongDNA{2}, myseq)
    catch e
        if e isa ParseError
           return nothing
        end
        rethrow()
    end
    if seq isa seqtype
    ...

However, I can see that the boilerplate for your suggestion is preferable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants