Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsers.jl version 2.5 appears to break OpenML.jl #150

Closed
ablaom opened this issue Nov 14, 2022 · 4 comments · Fixed by #151
Closed

Parsers.jl version 2.5 appears to break OpenML.jl #150

ablaom opened this issue Nov 14, 2022 · 4 comments · Fixed by #151
Labels
bug Something isn't working

Comments

@ablaom
Copy link

ablaom commented Nov 14, 2022

OpenML depends on ARFFFiles.jl, which in turn depends on Parsers.jl

No error is thrown below if I pin Parsers to 2.4:

julia> OpenML.load(42638)
ERROR: Invalid nominal "\"C85\"" in column 'cabin' of row 2, expecting one of "B45", "E31", "B57 B59 B63 B66", "B36", "A21", "C78", "D34", "D19", "A9", "D15", "D56", "C103", "C123", "C31", "C23 C25 C27", "F G63", "B61", "C53", "D43", "C130", "C132", "C101", "C55 C57", "B71", "C46", "C116", "F", "A29", "G6", "C6", "C28", "C51", "E46", "C54", "C97", "D22", "B10", "F4", "E45", "E52", "D30", "B58 B60", "E34", "C62 C64", "A11", "B11", "C80", "F33", "C85", "D37", "C86", "D21", "C89", "F E46", "A34", "D", "B26", "C22 C26", "B69", "C32", "B78", "F E57", "F2", "A18", "C106", "B51 B53 B55", "D10 D12", "E60", "E50", "E39 E41", "B52 B54 B56", "C39", "B24", "D28", "B41", "C7", "D40", "D38", "C105", "A6", "D33", "B30", "C52", "B28", "C83", "F G73", "A5", "D26", "C110", "E101", "F E69", "D47", "B86", "C2", "E33", "B19", "A7", "C49", "A32", "B4", "B80", "A31", "D36", "C93", "D35", "C87", "B77", "E67", "B94", "C125", "C99", "C118", "D7", "A19", "B49", "C65", "E36", "B18", "C124", "C91", "E40", "T", "C128", "B35", "C82", "B96 B98", "E10", "E44", "C104", "C111", "C92", "E38", "E12", "E63", "A14", "B37", "C30", "D20", "B79", "E25", "D46", "B73", "C95", "B38", "B39", "B22", "C70", "A16", "C68", "A10", "E68", "A20", "D50", "D9", "A23", "B50", "A26", "D48", "E58", "C126", "D49", "B5", "B20", "E24", "C90", "C45", "E8", "B101", "D45", "E121", "D11", "E77", "F38", "B3", "D6", "B82 B84", "D17", "A36", "B102", "E49", "C47", "E17", "A24", "C50", "B42" or "C148"
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _readcolumns_readdatum
    @ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:965 [inlined]
  [3] _readcolumns_readdatum
    @ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:900 [inlined]
  [4] readcolumns(r::ARFFFiles.ARFFReader{IOStream}; opts_sq::Parsers.Options, opts_dq::Parsers.Options, date_opts_sq::Vector{Parsers.Options}, date_opts_dq::Vector{Parsers.Options}, maxbytes::Nothing, chunkbytes::Int64)
    @ ARFFFiles ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:847
  [5] #3
    @ ~/.julia/packages/OpenML/dTbTl/src/data.jl:87 [inlined]
  [6] load(::OpenML.var"#3#5"{Nothing}, ::String; opts::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ARFFFiles ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:577
  [7] load
    @ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:574 [inlined]
  [8] load(id::Int64; maxbytes::Nothing)
    @ OpenML ~/.julia/packages/OpenML/dTbTl/src/data.jl:87
  [9] load(id::Int64)
    @ OpenML ~/.julia/packages/OpenML/dTbTl/src/data.jl:69
 [10] top-level scope
    @ REPL[7]:1

Here I'm running in an env that has only OpenML 0.3.0 in it.

In both my working and failing environments, the version of ARFFFiles.jl is the same, namely 1.4.1.

@nickrobinson251
Copy link
Collaborator

nickrobinson251 commented Nov 15, 2022

Thanks for the report!

I think this issue comes from ARFFFiles comparing characters to the openquote character in the Parsers.Options ((opts::Parsers.Options).oq) in order to decide how to proceed with parsing (effectively trying to detect what oq should be used for the parsing). The issue in Parserss.jl v2.5 is that opts.oq is now a Parsers.Token whereas before it was a UInt8, and so ARFFFiles end up checking c == opts.oq which in Parsers.jl v2.4 was 0x22 == 0x22 (true) but in v2.5 is 0x22 == Parsers.Token(0x22) (false), and we then end up taking a different code path in ARFFFiles and eventually hit the error you see.
https://github.com/cjdoris/ARFFFiles.jl/blob/4fa74951f8c1142440c3adfb6b2994414a8ea0d7/src/ARFFFiles.jl#L296

This seems like something we should fix in Parsers.jl. But i'm not yet sure what the best fix will be :)

@nickrobinson251
Copy link
Collaborator

Also ARFFFiles.jl does not seem to have any test, which is sad. Otherwise we could have added it to the Parsers.jl integration tests to help us make sure not to break it

@nickrobinson251
Copy link
Collaborator

nickrobinson251 commented Nov 15, 2022

Lol

julia> 0x22 == Parsers.Token(0x22)
false

julia> Parsers.Token(0x22) == 0x22
true

Okay, we literally just missed some == methods -- will put up a PR to fix!

@ablaom
Copy link
Author

ablaom commented Nov 15, 2022

@nickrobinson251 Thanks for the lightning action on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants