convert Arrow-flavored eltypes to Julia-flavored eltypes on copy #98

Merged on Jan 5, 2021 (3 commits)

@jrevels jrevels commented Jan 5, 2021

This improves consistency with the copy path for nullable columns, and is also important for avoiding unexpected situations with e.g. DataFrames operations that call copy internally.

Before:

julia> using Arrow, Dates, UUIDs, DataFrames

julia> tt = begin
           t = (a = [Nanosecond(0), Nanosecond(1)], b = [uuid4(), uuid4()], c = [missing, Nanosecond(1)])
           io = IOBuffer()
           Arrow.write(io, t)
           seekstart(io)
           Arrow.Table(io)
       end
Arrow.Table: (a = [Nanosecond(0), Nanosecond(1)], b = UUID[UUID("3258e377-fd8c-4305-9e6a-e1348c3ef362"), UUID("05473e64-0ac3-4951-a668-43d195813429")], c = Union{Missing, Nanosecond}[missing, Nanosecond(1)])

julia> copy(tt.a)
2-element Array{Arrow.Duration{Arrow.Flatbuf.TimeUnitModule.NANOSECOND},1}:
 Arrow.Duration{Arrow.Flatbuf.TimeUnitModule.NANOSECOND}(0)
 Arrow.Duration{Arrow.Flatbuf.TimeUnitModule.NANOSECOND}(1)

julia> copy(tt.b)
2-element Array{UInt128,1}:
 0x3258e377fd8c43059e6ae1348c3ef362
 0x05473e640ac34951a66843d195813429

julia> copy(tt.c)
2-element Array{Union{Missing, Nanosecond},1}:
 missing
 1 nanosecond

julia> transform(identity, DataFrame(tt))
2×3 DataFrame
 Row │ a                        b                                  c
     │ Duration…                UInt128                            Nanoseco…?
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ Duration{NANOSECOND}(0)  94652081088113183412278002734480…  missing
   2 │ Duration{NANOSECOND}(1)  16916473316481761014725308728310…  1 nanosecond

After:

julia> using Arrow, Dates, UUIDs, DataFrames

julia> tt = begin
           t = (a = [Nanosecond(0), Nanosecond(1)], b = [uuid4(), uuid4()], c = [missing, Nanosecond(1)])
           io = IOBuffer()
           Arrow.write(io, t)
           seekstart(io)
           Arrow.Table(io)
       end
Arrow.Table: (a = [Nanosecond(0), Nanosecond(1)], b = UUID[UUID("43650671-ccf7-46cb-8fa9-a0c63b7f2e7b"), UUID("828d37d7-975f-4d6e-89bb-3a2f48ed1b10")], c = Union{Missing, Nanosecond}[missing, Nanosecond(1)])

julia> copy(tt.a)
2-element Array{Nanosecond,1}:
 0 nanoseconds
 1 nanosecond

julia> copy(tt.b)
2-element Array{UUID,1}:
 UUID("43650671-ccf7-46cb-8fa9-a0c63b7f2e7b")
 UUID("828d37d7-975f-4d6e-89bb-3a2f48ed1b10")

julia> copy(tt.c)
2-element Array{Union{Missing, Nanosecond},1}:
 missing
 1 nanosecond

julia> transform(identity, DataFrame(tt))
2×3 DataFrame
 Row │ a              b                                  c
     │ Nanoseco…      UUID                               Nanoseco…?
─────┼────────────────────────────────────────────────────────────────
   1 │ 0 nanoseconds  43650671-ccf7-46cb-8fa9-a0c63b7f…  missing
   2 │ 1 nanosecond   828d37d7-975f-4d6e-89bb-3a2f48ed…  1 nanosecond
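
For readers who want the idea without installing Arrow, here is a minimal, standard-library-only sketch of the difference between the two transcripts. `StoredColumn` and `rawcopy` are hypothetical names invented for this illustration; they are not Arrow.jl types or functions, and this is not necessarily how the change is implemented in `src/arraytypes/primitive.jl`.

```julia
using Dates, UUIDs

# Hypothetical stand-in for an Arrow-backed column (not Arrow.jl's real type):
# `T` is the Julia-facing eltype, `S` is the eltype of the underlying storage.
struct StoredColumn{T,S} <: AbstractVector{T}
    data::Vector{S}
end

Base.size(c::StoredColumn) = size(c.data)
Base.IndexStyle(::Type{<:StoredColumn}) = IndexLinear()

# Element access converts each stored value to the Julia-facing type.
Base.getindex(c::StoredColumn{T}, i::Int) where {T} = T(c.data[i])

# Copying the raw storage exposes the storage eltype (analogous to "Before").
rawcopy(c::StoredColumn) = copy(c.data)

# Materializing through `getindex` yields the Julia-facing eltype
# (analogous to "After").
Base.copy(c::StoredColumn{T}) where {T} = collect(T, c)

u = uuid4()
col = StoredColumn{UUID,UInt128}([UInt128(u)])
durations = StoredColumn{Nanosecond,Int64}([0, 1])

@show eltype(rawcopy(col))     # storage eltype
@show eltype(copy(col))        # Julia-facing eltype
@show eltype(copy(durations))  # Julia-facing eltype
```

The point of the sketch is only that code calling `copy` (for example inside a DataFrames operation) sees whatever eltype the copy materializes, so converting at copy time keeps downstream columns in Julia-flavored types.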

codecov bot commented Jan 5, 2021

Codecov Report

Merging #98 (71ee79c) into main (f92592c) will increase coverage by 0.29%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #98      +/-   ##
==========================================
+ Coverage   83.68%   83.98%   +0.29%     
==========================================
  Files          23       23              
  Lines        2673     2673              
==========================================
+ Hits         2237     2245       +8     
+ Misses        436      428       -8     

Impacted Files                 Coverage Δ
src/arraytypes/primitive.jl    80.00% <100.00%> (+6.00%) ⬆️
src/arraytypes/arraytypes.jl   90.00% <0.00%> (+1.11%) ⬆️
src/eltypes.jl                 85.16% <0.00%> (+1.69%) ⬆️


@quinnj quinnj merged commit 1a5d6e4 into apache:main Jan 5, 2021
@jrevels jrevels deleted the jr/copyconvert branch January 6, 2021 00:14
tanmaykm pushed a commit to tanmaykm/arrow-julia that referenced this pull request Apr 7, 2021
…che#98)

* convert Arrow-flavored eltypes to Julia-flavored eltypes on copy

* Update src/arraytypes/primitive.jl

Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>