Add method to horizontally concatenate two (or more) tables of possibly different type #30

Open
ablaom opened this issue Mar 28, 2022 · 1 comment


ablaom commented Mar 28, 2022

This has come up in ML workflows; see JuliaAI/MLJ.jl#915. Would TableOperations.jl be the appropriate place for this?

What I have in mind is a simple concatenation, not a fancy join. If a column name of table1 also appears in table2, the table2 column is still added, but under a modified name.
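For illustration, here is a minimal sketch of that behaviour for the simplest kind of table: column tables stored as NamedTuples of equal-length vectors. `hcat_columns` is a hypothetical helper, not an existing TableOperations.jl or TableTransforms.jl function. The renaming rule (append `_` until the name is unique, so `x` becomes `x_` and then `x__`) is an assumption modelled on the `x_` output shown below.

```julia
# Hypothetical helper (not part of TableOperations.jl or TableTransforms.jl):
# horizontally concatenate column tables stored as NamedTuples of vectors.
function hcat_columns(tables::NamedTuple...)
    names = Symbol[]
    cols = AbstractVector[]
    nrows = -1
    for t in tables, (name, col) in pairs(t)
        # every column must have the same number of rows
        if nrows < 0
            nrows = length(col)
        elseif length(col) != nrows
            throw(DimensionMismatch("all columns must have the same length"))
        end
        # assumed rule: on a name clash, append "_" until unique (x -> x_ -> x__)
        newname = name
        while newname in names
            newname = Symbol(newname, :_)
        end
        push!(names, newname)
        push!(cols, col)
    end
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

table1 = (x = ['a', 'b', 'c'], z = [0.67, 0.36, 0.68])
table2 = (x = [0.42, 0.77, 0.92], y = ['d', 'e', 'f'])
hcat_columns(table1, table2)  # columns x, z, x_, y
```

This sidesteps the return-type question by always producing a NamedTuple column table; a real implementation would still need to decide which sink to materialize into.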

The tricky part is deciding what the return type should be. I don't have fixed ideas about this, but perhaps, if the tables all have the same type and that type is a sink type, the result should have that type too.

TableTransforms.jl has an implementation, although it is not part of the public API. To produce the final table, it materializes the result using the type of the first table:

julia> table1
3×2 DataFrame
 Row │ x        z        
     │ Char     Float64  
─────┼───────────────────
   1 │ 𘂯       0.673471
   2 │ \U3f846  0.360792
   3 │ \Ud50cb  0.68075

julia> table2
(x = [0.41754294943943493, 0.7713462387833814, 0.9189998773436003], y = ['\U84fa1', '\U5e144', '\U872a4'])

julia> TableTransforms.tablehcat([table1, table2])
3×4 DataFrame
 Row │ x        z         x_        y       
     │ Char     Float64   Float64   Char    
─────┼──────────────────────────────────────
   1 │ 𘂯       0.673471  0.417543  \U84fa1
   2 │ \U3f846  0.360792  0.771346  \U5e144
   3 │ \Ud50cb  0.68075   0.919     \U872a4

cc @ExpandingMan @juliohm

juliohm commented Mar 28, 2022

The public API in TableTransforms.jl that calls this internal function is the "union" operation:

julia> using TableTransforms

julia> t = (a=rand(10), b=rand(10))
(a = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283])

julia> t |> (Select(:a,:b) ⊔ Select(:a,:b))
(a = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283], a_ = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b_ = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283])
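To show how a union-style operator can sit on top of a plain horizontal concatenation, here is a minimal self-contained sketch. `select_cols`, `hcat_columns` and this definition of the `⊔` operator are hypothetical stand-ins written for this example on NamedTuple column tables; they are not how TableTransforms.jl implements its transforms.

```julia
# Hypothetical stand-in: concatenate two NamedTuple column tables,
# appending "_" to clashing column names (assumed rule).
function hcat_columns(a::NamedTuple, b::NamedTuple)
    names = collect(keys(a))
    for k in keys(b)
        newk = k
        while newk in names
            newk = Symbol(newk, :_)
        end
        push!(names, newk)
    end
    return NamedTuple{Tuple(names)}((values(a)..., values(b)...))
end

# Hypothetical column selector: returns a function that keeps only `names`.
select_cols(names::Symbol...) = t -> NamedTuple{names}(t)

# Hypothetical "union": apply both transforms to the same table, then hcat.
⊔(f, g) = t -> hcat_columns(f(t), g(t))

t = (a = rand(3), b = rand(3))
result = t |> (select_cols(:a, :b) ⊔ select_cols(:a, :b))  # columns a, b, a_, b_
```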

We used to depend on TableOperations.jl for the lazy Select, but recently we added a lazy select in TableTransforms.jl directly.
