Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parsing APIs that indicate whether something was possible, rather than raising an error? #5704

Closed
StefanKarpinski opened this issue Feb 6, 2014 · 13 comments
Labels
needs decision A decision on this change is needed

Comments

@StefanKarpinski
Copy link
Sponsor Member

See http://stackoverflow.com/questions/21595246/try-catch-or-type-conversion-performance-in-julia. We have lots of integer, float, expression parsing code that all just raises exceptions rather than indicating whether they succeeded or not, but we discourage try/catch as control flow. Should we provide non-error-raising versions of all of these functions that people can use to test whether some string is a valid float, for example?

@JeffBezanson
Copy link
Sponsor Member

See also #3631.
It might be enough to restrict this to parsing, and not worry about conversion in general. Maybe we could return (result, isvalid) tuples. Not sure how best to separate that from the case where you want an answer-or-error though.

@StefanKarpinski
Copy link
Sponsor Member Author

Parsing is definitely a bit different since it tends to occur with external inputs where it's more normal not to know what the correct type for something is. We could also have a "data parsing" API that parses something, figuring out what the most conservative type for it is – integer, floating-point, big int, big float, string. Something like that. Probably want to throw dates in there too. I don't think it needs to be an expandable list, but it should cover the kinds of basic data that data frames might want to be able to read and automatically determine the type of. I suspect the logic for some of this already must exist in DataFrames.

@ssfrr
Copy link
Contributor

ssfrr commented Feb 6, 2014

So it seems like the idioms under consideration are:

A

if can_do_the_thing(x)
    y = do_the_thing(x)
else
    # I couldn't do the thing!
end

B

try:
    y = do_the_thing(x)
catch:
    # I couldn't do the thing!
end

C

(y, success) = do_the_thing(x)
if !success
    # I couldn't do the thing!
end

Regarding A, As @JeffBezanson mentions often there's considerable code duplication and inefficiency in checking whether you can do something before doing it. It also exposes the user of the API to race conditions if the data you're operating on is mutable (or for instance checking whether you have permission to write to a file before opening it for writing).

B doesn't have those problems, but according to the SO thread exception catching is slow. Is that a long-term thing or an artifact of some current implementation details? It also sounds like this pattern is discouraged as not idiomatic Julia.

Returning a tuple that includes a status field addresses those issues, but it feels like the exception throwing idiom is more general and cleaner to me. I'm definitely biased from python experience though.

@stevengj
Copy link
Member

stevengj commented Feb 6, 2014

There is also

y = do_the_thing(x, default)
if y == default
   # I couldn't do the thing!
end

This pattern is already used in several places in Julia (e.g. get).

@JeffBezanson
Copy link
Sponsor Member

Exceptions certainly could be faster, but you often need to parse huge numbers of text values and it can easily be a bottleneck. Exceptions might not ever be fast enough for that.

@stevengj
Copy link
Member

stevengj commented Feb 6, 2014

I did a quick benchmark, comparing the exception-based isFloat and the non-exception float64_isvalid, on an array of 10^6 strings (50% of which are valid numbers), and it doesn't seem like exceptions are the bottleneck here:

function isFloat(s)
    try
        float64(s)
        return true
    catch
        return false
    end
end
function isFloat{T<:String}(ss::Array{T})
    v = true
    for s in ss
        v = v && isFloat(s)
    end
    return v
end
function isFloat2{T<:String}(ss::Array{T})
    out = [1.0]
    v = true
    for s in ss
        v = v && float64_isvalid(s, out)
    end              
    return v
end
r = rand(10^6); rs = ASCIIString[rand(1:2) == 1 ? "wrong" : string(R) for R in r]
@time isFloat(rs)
@time isFloat2(rs)

(Note: of course, the loops could be short-circuited as soon as a false is encountered, but I intentionally did not do this, for benchmark purposes.) The results are almost identical:

elapsed time: 0.001753407 seconds (96 bytes allocated)
elapsed time: 0.001695101 seconds (112 bytes allocated)

@johnmyleswhite
Copy link
Member

FWIW, DataFrames uses the following return values:

  • parsed_value
  • was_missing
  • wasvalid

@JeffBezanson
Copy link
Sponsor Member

@stevengj you are still short-circuiting via &&.

@JeffBezanson
Copy link
Sponsor Member

I'll also add that basically all of the time spent when throwing an exception is for collecting a stack trace.

Switching && to &, I get:

elapsed time: 13.885327767 seconds (16059932 bytes allocated)
elapsed time: 0.137299815 seconds (62236 bytes allocated)

Removing backtrace collection gives:

elapsed time: 0.165405547 seconds (16113012 bytes allocated)
elapsed time: 0.125439228 seconds (154064 bytes allocated)

Painful, indeed! The best way to work around this would be to know in advance which exception handlers want backtraces, which is a bit awkward.

@simonster
Copy link
Member

If we did a better job of generating code for union types, would it be possible to return a different type if parsing fails? Then you could either check for that or have your code throw when you try to use the value.

@stevengj
Copy link
Member

stevengj commented Feb 6, 2014

@JeffBezanson, whoops, I meant to write isFloat(s) && v. When I do that, I get a 100x slowdown, similar to you:

elapsed time: 34.781770763 seconds (16000048 bytes allocated)@time isFloat2(rs)
elapsed time: 0.224690652 seconds (112 bytes allocated)

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 21, 2014

would be closed by #9316 (comment) Nullable{Float64} maybefloat64() suggestion

tanmaykm added a commit to tanmaykm/julia that referenced this issue Dec 29, 2014
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Dec 29, 2014
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Dec 29, 2014
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 11, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 11, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 11, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 12, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 13, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 17, 2015
Introduces the tryparse method:
- tryparse{T<:Integer}(::Type{T<:Integer},s::AbstractString)
- tryparse(::Type{Float..},s::AbstractString)
- a few variants of the above

And:
- tryparse(Float.., ...) call the corresponding C functions jl_try_strtof, jl_try_substrtof, jl_try_strtod and jl_try_substrtod.
- The parseint, parsefloat, float64_isvalid and float32_isvalid methods wrap the corresponding tryparse methods.
- The jl_strtod, jl_strtof, ... functions are wrappers over the jl_try_str... functions.

This should fix JuliaLang#10498 as well.

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 17, 2015
Introduces the tryparse method:
- tryparse{T<:Integer}(::Type{T<:Integer},s::AbstractString)
- tryparse(::Type{Float..},s::AbstractString)
- a few variants of the above

And:
- tryparse(Float.., ...) call the corresponding C functions jl_try_strtof, jl_try_substrtof, jl_try_strtod and jl_try_substrtod.
- The parseint, parsefloat, float64_isvalid and float32_isvalid methods wrap the corresponding tryparse methods.
- The jl_strtod, jl_strtof, ... functions are wrappers over the jl_try_str... functions.

This should fix JuliaLang#10498 as well.

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 17, 2015
Introduces the tryparse method:
- tryparse{T<:Integer}(::Type{T<:Integer},s::AbstractString)
- tryparse(::Type{Float..},s::AbstractString)
- a few variants of the above

And:
- tryparse(Float.., ...) call the corresponding C functions jl_try_strtof, jl_try_substrtof, jl_try_strtod and jl_try_substrtod.
- The parseint, parsefloat, float64_isvalid and float32_isvalid methods wrap the corresponding tryparse methods.
- The jl_strtod, jl_strtof, ... functions are wrappers over the jl_try_str... functions.

This should fix JuliaLang#10498 as well.

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
@JeffBezanson
Copy link
Sponsor Member

tryparse added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

8 participants