Add default parameter to numerical tryparse #37170

Closed
wants to merge 2 commits into from

anaveragehuman
Contributor

Should fix #37162, although I wonder if there's a cleaner way to do this.

@fredrikekre
Member

I don't like this API. Quite a number of functions return nothing and leave it to the caller to handle that case, for example the find* family of functions. Should every such function support a different sentinel value?
Also, from the example in #37162: tryparse(Float64, "BBB", missing) is barely shorter than something(tryparse(Float64, "BBB"), missing). And IMO the something(...) way is more clear since tryparse(Float64, "1.0", missing) could potentially be interpreted as returning (1.0, nothing).
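For comparison, here is a minimal sketch of the composition described above. It uses only tryparse and something from Base and assumes no new API; the last line shows that the same pattern broadcasts over a vector of strings.

```julia
# tryparse returns `nothing` on failure; something returns its first
# argument that is not `nothing`, so `missing` acts as the fallback.
bad  = something(tryparse(Float64, "BBB"), missing)   # missing
good = something(tryparse(Float64, "1.0"), missing)   # 1.0

# The same composition broadcasts over a vector of raw strings.
both = something.(tryparse.(Float64, ["1", "x"]), missing)
```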

@KristofferC
Sponsor Member

Agreed. Just using something here seems like a perfect way of composing two things to get this exact feature for free.

@pdeffebach
Contributor

That's fair.

I wish tryparse returned missing as the default. I understand why it doesn't, since returning nothing is essential for software-development use cases rather than data work. But from the data perspective, this seems like a perfect scenario for missing. We know that the value should be a Float64, but we unfortunately don't know what the value is since the input was bad.

Maybe a function could be added to Missings.jl that would make things easier for users. I don't feel great recommending that people use something when they are just starting to learn Julia for the purposes of data cleaning.
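A helper of the kind suggested could be as small as the following sketch. The name parse_or_missing is hypothetical; it is not an existing Missings.jl or Base function.

```julia
# Hypothetical convenience wrapper (not part of Missings.jl or Base):
# parse `s` as type `T`, returning `missing` instead of `nothing` on failure.
parse_or_missing(::Type{T}, s::AbstractString) where {T} =
    something(tryparse(T, s), missing)

# Typical data-cleaning use: broadcast over a column of raw strings.
raw = ["3.5", "n/a", "-2", ""]
cleaned = parse_or_missing.(Float64, raw)
```

A new user only has to learn one name, while the implementation still composes tryparse with something under the hood.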

@pdeffebach
Contributor

Well, imagine if get(dict, :x) always returned nothing and we had users use something(get(dict, :x), default_value). That would be a pain! The logic is very similar for tryparse and the convenience of the third value being the default value exists elsewhere in Base.
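The contrast being drawn can be sketched with a plain Dict. The nothing-returning lookup is only imagined, so it is simulated here by passing nothing as the default to get.

```julia
d = Dict(:x => 1)

# Base's three-argument get: the default is part of the call.
a = get(d, :y, 0)

# The imagined alternative: the lookup yields `nothing`, and every
# caller unwraps it with `something`.
b = something(get(d, :y, nothing), 0)

# When the key is present, the default is ignored either way.
c = get(d, :x, 0)
```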

@KristofferC
Sponsor Member

We know that the value should be a Float64, but we unfortunately don't know what the value is since the input was bad.

Corrupted data and missing data are two different things, at least to me.

@thofma
Contributor

thofma commented Aug 24, 2020

Well, imagine if get(dict, :x) always returned nothing and we had users use something(get(dict, :x), default_value). That would be a pain! The logic is very similar for tryparse and the convenience of the third value being the default value exists elsewhere in Base.

I guess the three-argument get exists to make it consistent with get!(dict, key, default_value).
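For reference, get and get! take the default in the same position, but get! also stores the default in the dictionary when the key is absent:

```julia
d = Dict{Symbol,Int}()

v1 = get(d, :k, 10)    # returns 10, leaves `d` unchanged
n1 = length(d)         # still 0

v2 = get!(d, :k, 10)   # returns 10 and inserts :k => 10
n2 = length(d)         # now 1
```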

@pdeffebach
Contributor

How do you suppose the data became missing in the first place? ;)

But fair enough. @nalimilan, do you have any ideas where a parsing function that returns missing could go? Or do you think users should just learn to work with nothing?

@nalimilan
Member

Missings.jl would be a natural home if we wanted to add such a function. But this PR sounds like a more logical addition to me (I like the analogy with get).

@nalimilan nalimilan added the domain:missing data Base.missing and related functionality label Aug 24, 2020
base/parse.jl (outdated)
  # Zero base means, "figure it out"
- tryparse_internal(T, s, firstindex(s), lastindex(s), base===nothing ? 0 : check_valid_base(base), false)
+ val = tryparse_internal(T, s, firstindex(s), lastindex(s), base===nothing ? 0 : check_valid_base(base), false)
+ isnothing(default) ? val : something(val, default)
Member

AFAICT this should be either something(val, default) or default === nothing ? val : default, not both. The latter is safer if T can be <: Some. (=== nothing is better than isnothing for inference)

Have you considered the other approach, which is to pass default to tryparse_internal and have it return that instead of nothing? It could be slightly faster (not sure).
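The caveat above about T <: Some comes from something unwrapping Some values. A successful result that is itself a Some would be silently unwrapped, while an explicit === nothing check leaves it untouched. A minimal illustration, where Some(nothing) stands in for such a result:

```julia
# Stand-in for a successful result whose type is itself <: Some
# (for example, if T were Some{Nothing}).
val = Some(nothing)
default = 0

a = something(val, default)          # something unwraps Some, so this is `nothing`
b = val === nothing ? default : val  # explicit comparison keeps the Some wrapper
```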

Contributor Author

something(val, default) errors if both val and default are nothing, and default === nothing ? val : default will always return default if one is passed in.

I was hoping to avoid clutter by not propagating to tryparse_internal, but I'll run some benchmarks to see if there is a speedup.
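Both behaviors described in the first paragraph can be checked directly. Below is a combination that avoids them by substituting the default only when parsing failed. The name tryparse_default is hypothetical and is not the PR's actual code.

```julia
# something throws when every argument is `nothing`.
threw = try
    something(nothing, nothing)
    false
catch err
    err isa ArgumentError
end

# Hypothetical sketch: substitute `default` only when parsing failed,
# and keep returning `nothing` when no default was given.
function tryparse_default(::Type{T}, s::AbstractString, default=nothing) where {T}
    val = tryparse(T, s)
    return val === nothing ? default : val
end
```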

Member

something(val, default) errors if both val and default are nothing, and default === nothing ? val : default will always return default if one is passed in.

Ah, yes, of course. Though using default === nothing instead of isnothing(default) might still be better for the compiler.

I was hoping to avoid clutter by not propagating to tryparse_internal, but I'll run some benchmarks to see if there is a speedup.

I can't be sure without trying, but I had the impression that adding the argument to tryparse_internal could actually be cleaner, as it would just require replacing nothing with default everywhere instead of adding special code to replace nothing after the fact. But I might be wrong.
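A toy illustration of that alternative follows. The default is threaded into the internal routine, which returns it wherever it would otherwise have returned nothing. Giving it a default of nothing keeps existing callers working. The functions toy_tryparse_internal and toy_tryparse are hypothetical, parse only non-negative decimal integers, and are not Base's tryparse_internal.

```julia
# Hypothetical internal parser: every failure path returns `default`
# where it would otherwise have returned `nothing`.
# (Overflow is not checked; this is only a sketch.)
function toy_tryparse_internal(s::AbstractString, default=nothing)
    isempty(s) && return default
    n = 0
    for c in s
        '0' <= c <= '9' || return default
        n = 10n + (c - '0')
    end
    return n
end

# The public wrapper simply forwards the argument.
toy_tryparse(s::AbstractString, default=nothing) = toy_tryparse_internal(s, default)
```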

Contributor Author

anaveragehuman commented Aug 25, 2020

I've added a commit with these changes but didn't observe any significant difference once compiled into a sysimage. Also, tests are now failing as some stdlibs depend on Base.tryparse_internal, so I'm not convinced adding the argument to tryparse_internal is a good idea.

Another thing to consider is whether we should enforce default::Union{T,Missing,Nothing} or allow default::Any, in which case we would need a workaround for the Float16 implementation.

Member

I've added a commit with these changes but didn't observe any significant difference once compiled into a sysimage.

Yeah, I wasn't saying it would necessarily make a difference in this case, it's just better in general.

Also, tests are now failing as some stdlibs depend on Base.tryparse_internal, so I'm not convinced adding the argument to tryparse_internal is a good idea.

Why not use default = nothing in tryparse_internal so that it keeps returning nothing if you don't provide the argument?

Another thing to consider is whether we should enforce default::Union{T,Missing,Nothing} or allow default::Any, in which case we would need a workaround for the Float16 implementation.

What's the problem with Float16?

Contributor Author

Why not use default = nothing in tryparse_internal so that it keeps returning nothing if you don't provide the argument?

Then you have a default value for default in both tryparse and tryparse_internal, which means you could accidentally stop passing default from tryparse to tryparse_internal in the future and not see any issues. Although I suppose that's mitigated by having a broad range of tests.

What's the problem with Float16?

tryparse(::Type{Float16}, s::AbstractString) =
    convert(Union{Float16, Nothing}, tryparse(Float32, s))

Float16 is "parsed" by parsing as a Float32 and then converting to Float16, so this would need something like val === nothing ? default : val. But from the conversation above:

We know that the value should be a Float64, but we unfortunately don't know what the value is since the input was bad.

I think we might want to annotate the defaults with ::Union{T,Missing,Nothing} to be type-stable.
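Here is a sketch of what that Float16 special case might look like with a default. The name tryparse_f16 is hypothetical; the real Base method is the one quoted above.

```julia
# Parse via Float32 (as the quoted Base method does), convert only on
# success, and otherwise return the caller's default.
function tryparse_f16(s::AbstractString, default=nothing)
    val = tryparse(Float32, s)
    return val === nothing ? default : Float16(val)
end
```

Checking val before converting avoids convert(Union{Float16, Nothing}, ...) entirely, so the default can be of any type.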

Member

Then you have a default value for default in both tryparse and tryparse_internal, which means you could accidentally stop passing default from tryparse to tryparse_internal in the future and not see any issues. Although I suppose that's mitigated by having a broad range of tests.

Right, but I don't think that's a problem. As you say, tests are there to check that it works.

Float16 is "parsed" by parsing as a Float32 and then converting to Float16, so this would need something like val === nothing ? default : val.

Yeah, but I guess it's OK as long as it's only in one method.

I think we might want to annotate the defaults with ::Union{T,Missing,Nothing} to be type-stable.

Why would this affect type stability? :-/

Contributor Author

Why would this affect type stability? :-/

I guess tryparse isn't a type-stable function to begin with, since it could return nothing (maybe I'm misunderstanding type stability?). But if tryparse returns a value that is neither nothing nor missing, shouldn't it be of type T? Or can we assume the user supplying default will handle it?

Member

tryparse(T, str, default) should be inferred by the compiler as returning Union{T, typeof(default)}, and yes, it's the user's job to handle the default they provided.
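One way to check this claim is to ask the compiler what it infers for a hypothetical default-taking wrapper. The name parse_with_default is an assumption for illustration. Base.return_types is an internal reflection utility, and the printed order of the union members may differ between Julia versions.

```julia
# Hypothetical wrapper, as in the proposal: fall back to `default` on failure.
function parse_with_default(::Type{T}, s::AbstractString, default) where {T}
    val = tryparse(T, s)
    return val === nothing ? default : val
end

# Inferred return type for T = Float64 and a `missing` default; this is
# expected to be Union{T, typeof(default)}, i.e. a Float64/Missing union.
rt = only(Base.return_types(parse_with_default, (Type{Float64}, String, Missing)))
```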

@@ -227,79 +227,81 @@ end
end

"""
tryparse(type, str; base)
tryparse(type, str, default; base)
Member

Suggested change:
- tryparse(type, str, default; base)
+ tryparse(type, str, default=nothing; base)

@KristofferC
Sponsor Member

KristofferC commented Aug 25, 2020

It's worrying to me how the data community keeps trying to add features to Base that scale with O(n_methods) instead of trying to find solutions that compose with each other. First it was Missing, where some people have argued that a large portion of methods in Base (and packages!) should have an extra method for Missing. And now, methods that return nothing should also take a default argument. The solution is instead to have a feature that makes dealing with a nothing return value easier but doesn't put any extra burden on the function itself. This could be using something or, if that is not satisfactory, a @default tryparse(Int, "foo") 5 macro that expands to the nothing check, or something along those lines. But not something that forces every author of a nothing-returning function to implement the default argument themselves in order to be consistent with the rest.
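The @default macro floated here does not exist in Base. A minimal sketch of how it might expand, under that assumption, is:

```julia
# Hypothetical macro (not part of Base): evaluate `expr`, and if the result
# is `nothing`, evaluate and return `fallback` instead.
macro default(expr, fallback)
    return quote
        local val = $(esc(expr))
        val === nothing ? $(esc(fallback)) : val
    end
end

x = @default tryparse(Int, "foo") 5             # parse fails, so 5
y = @default tryparse(Int, "42") 5              # parse succeeds, so 42
z = @default findfirst(==(3), [1, 2]) 0         # works for find* as well
```

Unlike something(x, fallback), the fallback expression is only evaluated when it is needed. Because the macro works on any nothing-returning call, none of those functions has to change.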

@nalimilan nalimilan added the status:triage This should be discussed on a triage call label Sep 1, 2020
@JeffBezanson
Sponsor Member

I'm also against this. Returning nothing vs. non-nothing is a legitimate style of API used in many places. Callers should handle the nothing return as they see fit, instead of modifying the function to return something else.
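For instance, a caller that wants to report bad entries rather than substitute a value can branch on the nothing return directly. The input data below is illustrative only.

```julia
inputs = ["10", "x", "30"]
parsed = Int[]
rejected = String[]

for s in inputs
    v = tryparse(Int, s)
    if v === nothing
        push!(rejected, s)   # the caller decides: here, record the bad input
    else
        push!(parsed, v)
    end
end
```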

@StefanKarpinski
Sponsor Member

The argument that this is similar to get is not really valid: the default value for get is necessary because without the default argument get throws an error if there is no such key. The correct analogy here would have been to introduce parse(T, value, default) that returns the default if value cannot be parsed as T. Arguably, parse(T, value, nothing) would have been better than introducing tryparse(T, value).
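The three-argument parse described here can be sketched without touching Base. The name parse_or is hypothetical and avoids adding methods to Base.parse. Wrapping the default in Some lets something return it even when the default is itself nothing, so parse_or(T, s, nothing) behaves like tryparse(T, s).

```julia
# Hypothetical: like parse, but return `default` instead of throwing
# when `s` cannot be parsed as `T`. Some(default) is unwrapped by
# something, which also makes a `nothing` default work.
parse_or(::Type{T}, s::AbstractString, default) where {T} =
    something(tryparse(T, s), Some(default))
```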

@nalimilan
Member

Indeed. Actually, tryparse was added by #9487 and #10543 at a time when we had Nullable{T} instead of Union{T, Nothing}, so using parse(T, value, nothing) wasn't an option then (I was in favor of using parse(Nullable{T}, value), but others argued that the syntax looked like it should parse a value which could be either a T or a null).

Maybe worth revisiting now, with a view to deprecating tryparse in 2.0.

@pdeffebach
Contributor

Bumping this: a 3-argument parse would be great. I think this Discourse question is an example of when users might like it. They don't really want to know what nothing and Something are, but they have string values that they know should be missing.

@oscardssmith oscardssmith removed the status:triage This should be discussed on a triage call label Nov 4, 2021
@oscardssmith
Member

Triage says tryparse doesn't need a default, since users can use something(tryparse(...), default).

Labels: domain:missing data (Base.missing and related functionality)
Linked issue that merging this pull request may close: Allow custom return type for tryparse
9 participants