Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add accepted_field_type for abstract vectors #89

Merged
merged 3 commits into from
Jun 8, 2023
Merged

Conversation

hannahilea
Copy link
Contributor

ref apache/arrow-julia#452

Add accepted_field_types as proposed by @ericphanson in above issue. No additional tests added BUT main is currently failing when run on latest (v2.6) Arrow.jl (e.g. in #88), and no longer fails with this fix.

Copy link
Member

@ericphanson ericphanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a stop-gap I think we should do this. I think there might be might be a more general solution, and maybe we can use Tables.is_equivalent_schema as Jacob linked in the Arrow issue somehow. I also think on the left-hand sides we might be able to make this conversion more general too somehow, instead of only widening it for Vector. Would be great to have @jrevels perspective.

But I would say we should merge this now to fix the breakage and take a step towards making Legolas more flexible.

@ericphanson
Copy link
Member

Maybe we should also file an issue to add more tests for this as a followup, to cover the changed lines and ensure we don't regress

src/schemas.jl Outdated Show resolved Hide resolved
src/schemas.jl Outdated
@@ -250,6 +253,9 @@ otherwise constitutes type piracy and should be avoided.
@inline accepted_field_type(::SchemaVersion, T::Type) = T
accepted_field_type(::SchemaVersion, ::Type{UUID}) = Union{UUID,UInt128}
accepted_field_type(::SchemaVersion, ::Type{Symbol}) = Union{Symbol,String}
accepted_field_type(::SchemaVersion, ::Type{Vector{String}}) = AbstractVector{<:AbstractString}
accepted_field_type(::SchemaVersion, ::Type{Vector{T}}) where T = AbstractVector{<:T}
Copy link
Member

@jrevels jrevels Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
accepted_field_type(::SchemaVersion, ::Type{Vector{T}}) where T = AbstractVector{<:T}
accepted_field_type(sv::SchemaVersion, ::Type{<:Vector{T}}) where T = AbstractVector{<:(accepted_field_type(sv, T))}

?

i think this should also obviate the need for the other accepted_field_type definition for Vector Vector{String} here

Copy link
Contributor Author

@hannahilea hannahilea Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should also obviate the need for the other accepted_field_type definitions for Vector here

When accepted_field_type(::SchemaVersion, ::Type{Vector}) = AbstractVector is removed, we still hit error in this test:

┌ Warning: The `Tables.Schema` of the `Arrow.Table` read via `Legolas.read(io_or_path)` does not appear to
│ comply with the `Legolas.SchemaVersion` indicated by the table's metadata (`SchemaVersion("test.parent", 1)`). Try invoking
│ `Legolas.read(io_or_path; validate=false)` to inspect the table.
└ @ Legolas ~/Code/Legolas.jl/src/tables.jl:171
miscellaneous Legolas/src/tables.jl tests: Error During Test at /Users/hrobertson/Code/Legolas.jl/test/runtests.jl:468
  Test threw exception
  Expression: t == [NamedTuple(ParentV1(r)) for r = Tables.rows(Legolas.read(path))]
  ArgumentError: Tables.Schema violates Legolas schema `test.parent@1`:
   - Incorrect type: `x` expected `<:Vector`, found `SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}`
  Provided Tables.Schema:
   :x  SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}
   :y  String
  Stacktrace:
   [1] validate(ts::Tables.Schema{(:x, :y), Tuple{SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}, String}}, sv::SchemaVersion{Symbol("test.parent"), 1})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, have to keep that one actually, my bad

hannahilea and others added 2 commits June 6, 2023 19:53
Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
@hannahilea
Copy link
Contributor Author

Maybe we should also file an issue...

#90

@hannahilea hannahilea requested a review from jrevels June 7, 2023 18:35
@hannahilea hannahilea merged commit 1ca78bc into main Jun 8, 2023
3 of 5 checks passed
@hannahilea hannahilea deleted the hr/abstract-vector branch June 8, 2023 17:41
ararslan added a commit that referenced this pull request Jun 13, 2023
PR #89 fixed compatibility with Arrow v2.6 except when the schema
field's declared type is e.g. `Union{Vector{T},Missing}` and what's
passed for validation is some `AbstractVector`. This adds a method to
`accepted_field_type` that recurses on the non-`Missing` type in a
`Union` with `Missing`. Note that this also necessitated adding an
explicit `accepted_field_type` method for `Any` to avoid infinite
recursion with the fallback definition, since `(Union{T,Missing} where
{T}) == Any`.
ararslan added a commit that referenced this pull request Jun 13, 2023
PR #89 fixed compatibility with Arrow v2.6 except when the schema
field's declared type is e.g. `Union{Vector{T},Missing}` and what's
passed for validation is some `AbstractVector`. This adds a method to
`accepted_field_type` that recurses on the non-`Missing` type in a
`Union` with `Missing`. Note that this also necessitated adding an
explicit `accepted_field_type` method for `Any` to avoid infinite
recursion with the fallback definition, since `(Union{T,Missing} where
{T}) == Any`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants