An isconvertible check #3631

Closed
ViralBShah opened this issue Jul 5, 2013 · 8 comments

Comments

@ViralBShah
Member

Working with real world data, it is common to encounter poorly formatted files and such. One thing I find myself frequently doing is something like this in DataFrames:

julia> int(df["Age"])
ERROR: ArgumentError("'F' is not a valid digit (in \"F\")")
 in parseint at string.jl:1209
 in int at string.jl:1242
 in map_to2 at abstractarray.jl:1450
 in map at abstractarray.jl:1459
 in int at /Users/viral/.julia/DataFrames/src/dataarray.jl:746

The Age column should have been integers, but there is some bad data in the column, as a result of which readtable left the column as a string. When I try to use int, it is obvious that the data is corrupt, but I have no idea where it is corrupt.

I often find myself doing something like:

[ try int(df["Age"][i]) catch end for i in 1:nrow(df) ]

I think that an isconvertible function would be generally useful, which takes the same arguments as convert, but returns a boolean based on whether the conversion is possible or not.
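A minimal sketch of what this request amounts to, written against current Julia rather than the 2013 API shown above. It assumes `tryparse(Int, s)` returns `nothing` on failure, as it does in Julia 1.x. The names `isconvertible_sketch` and `ages` are hypothetical, chosen for illustration; neither exists in Base or DataFrames.

```julia
# Hypothetical helper (not part of Base): true when `s` parses as type `T`.
isconvertible_sketch(::Type{T}, s::AbstractString) where {T<:Integer} =
    tryparse(T, s) !== nothing

# Made-up data standing in for the "Age" column.
ages = ["34", "27", "F", "41", "n/a"]

# Indices of the entries that cannot be parsed as Int,
# which answers "where is the data corrupt?".
bad_rows = findall(s -> !isconvertible_sketch(Int, s), ages)
```

With the sample data above, `bad_rows` points at the third and fifth entries, which is the information the bare `int(df["Age"])` error does not give.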

@johnmyleswhite
Member

This would be great. I feel like this even came up when I last wrote about the type inference code for DataFrames IO.

Jeff's response, which was quite correct, was that it's generally easier to try the conversion and only report if it failed. In DataFrames IO, you'll see that we now do exactly that by always returning a triple: (1) converted value, (2) did conversion succeed, (3) was value missing.
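The triple described above could look roughly like the following. This is only a sketch and not the DataFrames IO code: the function name `convert_cell`, the choice of `"NA"` and the empty string as missing markers, the `zero(T)` placeholder, and the decision to report a missing value as a successful conversion are all assumptions made for illustration.

```julia
# Hypothetical: returns (converted value, did conversion succeed, was value missing).
function convert_cell(::Type{T}, s::AbstractString) where {T<:Integer}
    stripped = strip(s)
    # Assumed missing markers: empty field or the literal "NA".
    if isempty(stripped) || stripped == "NA"
        return (zero(T), true, true)      # placeholder value, flagged as missing
    end
    v = tryparse(T, stripped)             # `nothing` on failure in Julia 1.x
    return v === nothing ? (zero(T), false, false) : (v, true, false)
end

convert_cell(Int, "42")    # parsed value, success, not missing
convert_cell(Int, "F")     # placeholder, failure, not missing
convert_cell(Int, " NA ")  # placeholder, success, missing
```

The caller makes a single attempt per cell and inspects the flags, rather than validating first and converting afterwards.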

@ViralBShah
Member Author

Trying the conversion and reporting if it failed is also ok - and may end up being higher performance too. I am open to other ideas too, but I feel like this is essential to have for most of the convert functions in Base.

@johnmyleswhite
Member

I totally agree. As I've said before, I think we need a lot more tools for type conversion in Base.

@JeffBezanson
Member

This would be nice. It is a tricky design problem though, since having separate isconvertible and convert duplicates logic, and is potentially slow from doing almost the same work twice. convert is also extraordinarily performance sensitive.
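To make the duplicated-work concern concrete, here is a small sketch for string parsing in current Julia. The function names are hypothetical; only `parse` and `tryparse` are real Base functions, and the sketch assumes the Julia 1.x behaviour where `tryparse` returns `nothing` on failure.

```julia
# Check-then-convert: the string is scanned twice when it is valid,
# once for the check and once for the actual conversion.
checked_then_parsed(s::AbstractString) =
    tryparse(Int, s) !== nothing ? parse(Int, s) : nothing

# Single attempt: one scan, and the result doubles as the success flag.
parsed_once(s::AbstractString) = tryparse(Int, s)
```

Both functions return the same results, but the first repeats the parsing work for every valid input.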

@vtjnash
Member

vtjnash commented Dec 21, 2014

with eventual debugging support, we should be able to read the frame info and pop the arguments out of their stack locations. although that would require being able to do that rather expensive operation before unwinding the stack for the exception handler

perhaps this can be closed by #9316 (comment) and Nullable{Float64}, etc?

tanmaykm added a commit to tanmaykm/julia that referenced this issue Dec 29, 2014
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 11, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 12, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 13, 2015
Introduces following methods that parse a string as the indicated type and return a `Nullable` with the result instead of throwing exception:
- `maybeint{T<:Integer}(::Type{T<:Integer},s::AbstractString)`
- `maybefloat32(s::AbstractString)` and `maybefloat64(s::AbstractString)`

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
tanmaykm added a commit to tanmaykm/julia that referenced this issue Mar 17, 2015
Introduces the tryparse method:
- tryparse{T<:Integer}(::Type{T<:Integer},s::AbstractString)
- tryparse(::Type{Float..},s::AbstractString)
- a few variants of the above

And:
- tryparse(Float.., ...) call the corresponding C functions jl_try_strtof, jl_try_substrtof, jl_try_strtod and jl_try_substrtod.
- The parseint, parsefloat, float64_isvalid and float32_isvalid methods wrap the corresponding tryparse methods.
- The jl_strtod, jl_strtof, ... functions are wrappers over the jl_try_str... functions.

This should fix JuliaLang#10498 as well.

Ref: discussions at JuliaLang#9316, JuliaLang#3631, JuliaLang#5704
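As a rough illustration of the layering this commit describes, where the throwing parsers wrap `tryparse`, the following sketch uses current Julia. It assumes the Julia 1.x behaviour in which `tryparse` returns `nothing` on failure; the earlier commits above describe returning a `Nullable` instead. The wrapper name `parse_or_throw` is hypothetical and is not the code from the commit.

```julia
# Non-throwing form: the result itself signals success or failure.
tryparse(Int, "42")        # an Int on success
tryparse(Float64, "abc")   # `nothing` on failure

# Hypothetical throwing wrapper built on top of tryparse,
# mirroring the "parseint wraps tryparse" structure.
function parse_or_throw(::Type{T}, s::AbstractString) where {T}
    v = tryparse(T, s)
    v === nothing && throw(ArgumentError("cannot parse $(repr(s)) as $T"))
    return v
end
```

Keeping the parsing logic in one place means the throwing and non-throwing variants cannot drift apart, and callers who expect bad input avoid the cost of exception handling.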
@ViralBShah
Member Author

Thoughts on whether we leave this open or close it?

@StefanKarpinski
Member

What was originally wanted seems to be addressed by tryparse, no?

@KristofferC
Member

Was gonna comment that tryparse seems to solve this.
