Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

innerjoin gives odd result with missing values #34

Closed
c42f opened this issue Nov 16, 2018 · 3 comments · Fixed by #37
Closed

innerjoin gives odd result with missing values #34

c42f opened this issue Nov 16, 2018 · 3 comments · Fixed by #37

Comments

@c42f
Copy link
Contributor

c42f commented Nov 16, 2018

I just started trying out the current incarnation of TypedTables for real work, I love it. The tutorial is spot on btw.

I ran into the following oddity, if missing is added to the innerjoin example:

julia> customers = Table(id = 1:3, name = ["Alice", "Bob", "Charlie"], address = ["12 Beach Street", "163 Moon Road", "6 George Street"])

julia> orders = Table(customer_id = [2, 2, 3, 3], items = ["Socks", "Tie", "Shirt", missing])

julia> innerjoin(getproperty(:id), getproperty(:customer_id), customers, orders)
4-element Array{Any,1}:
 NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}}((2, "Bob", "163 Moon Road", 2, "Socks"))      
 NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}}((2, "Bob", "163 Moon Road", 2, "Tie"))        
 NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}}((3, "Charlie", "6 George Street", 3, "Shirt"))
 NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}}((3, "Charlie", "6 George Street", 3, missing))

No longer a table! What's more, this array of Any is not directly convertible to one either. I did the following nasty trick for now.

julia> Table(collect(NamedTuple, innerjoin(getproperty(:id), getproperty(:customer_id), customers, orders)))
Table with 5 columns and 4 rows:
     id  name     address          customer_id  items
   ┌───────────────────────────────────────────────────
 12   Bob      163 Moon Road    2            Socks
 22   Bob      163 Moon Road    2            Tie
 33   Charlie  6 George Street  3            Shirt
 43   Charlie  6 George Street  3            missing
@andyferris
Copy link
Member

Thanks for exploring :)

Yeah... so the first problem would probably be somehow related to inference. The implementation get's the eltype via Core.compiler.return_type and similar can only create a Table if it knows the column names and types via the new eltype... I'm a little suprised inference widened to Any. Speculatively, I will look into how the implementation of merge(::NamedTuple, ::NamedTuple) works when some elements are Union types (from memory it is a generated function that unrwarps the fields, at which point they will lose their Union box...).

The second part should maybe be addressed in Tables.jl? @quinnj what do you suppose we should do about the case that Vector{Any} is a table? It's like istable(x) is not true or false but maybe (and answering it definitively is actually O(N)...).

@andyferris
Copy link
Member

Wow... this typed code is more messed up than I thought....

julia> nt1 = customers[1]
(id = 1, name = "Alice", address = "12 Beach Street")

julia> nt2 = orders[1]
NamedTuple{(:customer_id, :items),Tuple{Int64,Union{Missing, String}}}((2, "Socks"))

julia> merge(nt1, nt2)
NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}}((1, "Alice", "12 Beach Street", 2, "Socks"))

julia> @code_typed merge(nt1, nt2)
CodeInfo(
228 1%1 = (Base.getfield)(a, :id)::Int64                                                                        │╻ macro expansion
    │   %2 = (Base.getfield)(a, :name)::String                                                                     ││
    │   %3 = (Base.getfield)(a, :address)::String                                                                  ││
    │   %4 = (Base.getfield)(b, :customer_id)::Int64                                                               ││
    │   %5 = (Base.getfield)(b, :items)::Union{Missing, String}                                                    ││
    │   %6 = (Core.tuple)(%1, %2, %3, %4, %5)::Tuple{Int64,String,String,Int64,Union{Missing, String}}             ││
    │   %7 = (NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,Union{Missing, String}}})(%6)::Any
    └──      return %7                                                                                             ││
) => Any

We can see my original hypothesis was wrong, %6's inferred type is exactly what we need.

The final line is super surprising. The constructor is for a concrete type and the input %6 matches perfectly. I think we should raise an issue for julia.

@andyferris
Copy link
Member

This was raised by Keno only a week ago: JuliaLang/julia#29970

andyferris pushed a commit that referenced this issue Nov 19, 2018
Mostly fixes #34. I think I can live with waiting for a proper fix of
JuliaLang/julia#29970 for more than 32 columns.
andyferris pushed a commit that referenced this issue Nov 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants