Compatibility of schemas with nested types (#504)
Hi,

Here is a minimal example of the issue I've encountered.

```julia
using Arrow

struct A
    x::Int
end

struct B
    a::A
end

v = [B(A(i)) for i = 1:3]

io = IOBuffer()
Arrow.write(io, v; file=false)
seekstart(io)
Arrow.append(io, v) # throws
```

I don't know if this is really necessary, or if I'm not using this
library properly, but this issue makes it difficult to append to arrow
files with nested types.

Since this change only adds cases where the call to `append` can succeed,
I do not think it creates backward-compatibility issues.

Thanks for the review!

---------

Co-authored-by: Ben Baumgold <4933671+baumgold@users.noreply.github.com>
poncito and baumgold committed May 5, 2024
1 parent ac199b0 commit 64fc730
Showing 3 changed files with 44 additions and 3 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -17,7 +17,7 @@
name = "Arrow"
uuid = "69666777-d1a9-59fb-9406-91d4454c9d45"
authors = ["quinnj <quinn.jacobd@gmail.com>"]
version = "2.7.1"
version = "2.7.2"

[deps]
ArrowTypes = "31f734f8-188a-4ce0-8406-c8a06bd891cd"
26 changes: 24 additions & 2 deletions src/append.jl
@@ -282,9 +282,31 @@ function is_equivalent_schema(sch1::Tables.Schema, sch2::Tables.Schema)
    for (t1, t2) in zip(sch1.types, sch2.types)
        tt1 = Base.nonmissingtype(t1)
        tt2 = Base.nonmissingtype(t2)
        if t1 == t2 ||
           (tt1 <: AbstractVector && tt2 <: AbstractVector && eltype(tt1) == eltype(tt2))
        if t1 == t2
            continue
        elseif tt1 <: AbstractVector && tt2 <: AbstractVector && eltype(tt1) == eltype(tt2)
            continue
        elseif isstructtype(tt1) && isstructtype(tt2)
            is_equivalent_type_by_field(tt1, tt2)
        else
            return false
        end
    end
    true
end

function is_equivalent_type_by_field(T1, T2)
    n1 = fieldcount(T1)
    n2 = fieldcount(T2)
    n1 != n2 && return false

    for i = 1:n1
        fieldname(T1, i) == fieldname(T2, i) || return false

        if fieldtype(T1, i) == fieldtype(T2, i)
            continue
        elseif isstructtype(T1) && isstructtype(T2)
            is_equivalent_type_by_field(T1, T2) || continue
        else
            return false
        end
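
The field-by-field comparison in this hunk can also be sketched in isolation. What follows is a minimal standalone sketch under stated assumptions, not the function added by this commit: the name `equivalent_by_field` and the example types `Inner`, `Outer`, and `OuterNT` are hypothetical, and it assumes the type stored in the Arrow data is seen as a nested `NamedTuple` with the same field names. Unlike the hunk as displayed, the sketch returns the result of the nested check and recurses on the field types rather than on the enclosing types.

```julia
# Hypothetical standalone helper (not the function added in this commit).
# Two types are treated as equivalent when they are equal, or when both are
# concrete struct types whose fields match by name and, recursively, by type.
# Limitation: two distinct self-referential types (each with a field of its
# own type) would recurse without terminating.
function equivalent_by_field(T1::Type, T2::Type)
    T1 == T2 && return true
    (isstructtype(T1) && isstructtype(T2)) || return false
    (isconcretetype(T1) && isconcretetype(T2)) || return false
    fieldcount(T1) == fieldcount(T2) || return false
    for i in 1:fieldcount(T1)
        fieldname(T1, i) == fieldname(T2, i) || return false
        equivalent_by_field(fieldtype(T1, i), fieldtype(T2, i)) || return false
    end
    return true
end

# Example types, for illustration only.
struct Inner
    x::Int
end

struct Outer
    a::Inner
end

# A nested NamedTuple with the same field names and leaf types as Outer.
OuterNT = NamedTuple{(:a,),Tuple{NamedTuple{(:x,),Tuple{Int}}}}

equivalent_by_field(Outer, OuterNT)                       # true
equivalent_by_field(Outer, NamedTuple{(:a,),Tuple{Int}})  # false: Inner vs Int
```

Propagating every nested result with `|| return false` means a mismatch at any depth rejects the whole schema, which keeps the check conservative.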
19 changes: 19 additions & 0 deletions test/runtests.jl
@@ -1042,5 +1042,24 @@ end
                @test tbl.f[2] === Foo493(4, 5)
            end
        end

        @testset "# 504" begin
            struct Foo504
                x::Int
            end

            struct Bar504
                a::Foo504
            end

            v = [Bar504(Foo504(i)) for i = 1:3]
            io = IOBuffer()
            Arrow.write(io, v; file=false)
            seekstart(io)
            Arrow.append(io, v) # testing the compatibility between the schema of the arrow Table, and the "schema" of v (using the fallback mechanism of Tables.jl)
            seekstart(io)
            t = Arrow.Table(io)
            @test Arrow.Tables.rowcount(t) == 6
        end
    end # @testset "misc"
end

7 comments on commit 64fc730

@baumgold
Member

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/106211

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v2.7.2 -m "<description of version>" 64fc730f767de84835a5f1b4fc9b7831a3c2d15b
git push origin v2.7.2

@kou
Member

@kou commented on 64fc730 May 5, 2024

@baumgold Could you use this process https://github.com/apache/arrow-julia/blob/main/dev/release/README.md when we release a new version of this repository?

@baumgold
Member

@kou - my mistake, apologies. It’s been too long and the process slipped my mind. I will in the future.

@kou
Member

@kou commented on 64fc730 May 6, 2024

Thanks.
We already registered this version to the General registry but I'll start a vote for this version to follow the ASF guideline.

@kou
Member

@kou commented on 64fc730 May 6, 2024

@kou
Member

@kou commented on 64fc730 May 8, 2024
