Issue with Tables.getcolumn by index #86

Open
sefffal opened this issue Nov 2, 2021 · 4 comments

sefffal commented Nov 2, 2021

Accessing columns through Tables.getcolumn(table, name::Symbol) works as expected, but using Tables.getcolumn(table, ind::Int) does not.

Setup:

using Tables, TypedTables
table = Table(a=rand(300), b=rand(300))
table_nt = (;a=rand(300), b=rand(300))

Expected behaviour:

Tables.getcolumn(table_nt, 2)
300-element Vector{Float64}:
 0.7419591651104771
 0.03643357962428917
 0.511973946658012
 0.7525280472737248
 0.5312671306022833
...

This works with simple named tuples of vectors, as well as DataFrames.

Observed behaviour:

julia> Tables.getcolumn(table, 2)
ERROR: BoundsError: attempt to access 300-element Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}} at index [2]
Stacktrace:
 [1] getcolumn(x::Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}}, i::Int64)
   @ Tables C:\Users\William\.julia\packages\Tables\i6z2B\src\Tables.jl:101
 [2] top-level scope
   @ REPL[65]:1

However, using index 1 returns all columns, which is not useful:

julia> Tables.getcolumn(table, 1)
(a = [0.7736170160574704, 0.32973335588180575, 0.17889965718253964, 0.7631323090473862, 0.7800224219389631, 0.08040930668634005, 0.9557133954558753, 0.9979396219551491, 0.15894660237894975, 0.5680381167378448  …  0.6559116874983786, 0.7328418210533515, 0.4856581423782824, 0.33251283450523117, 0.08142486970852292, 0.2259648695642409, 0.39396960265088865, 0.7031534405558856, 0.10224220322748001, 0.14191199646807617], b = [0.017236706415861724, 0.5265418832740683, 0.4268344997706731, 0.46470458360887146, 0.8360733105726028, 0.6032125887699785, 0.9385924928402325, 0.7405311692330161, 0.4201266483743147, 0.9833490878965103  …  0.14241236909936195, 0.29289242214548683, 0.8408873927907317, 0.7439831490645507, 0.6205302905751314, 0.9686022965164416, 0.8139530289474524, 0.823492626767103, 0.04273546220284152, 0.44406075204392326])

Accessing by column name :a or :b works as expected.

Thanks!

@andyferris
Member

@quinnj any advice on this one?

@quinnj
Member

quinnj commented Nov 3, 2021

In the official "usage" of the Tables.jl interface, you're only guaranteed to be able to call Tables.getcolumn on either: 1) the object returned from Tables.columns(x), or 2) each iterated element of the object returned by Tables.rows(x). DataFrames.jl tables and NamedTuples of vectors happen to be returned as-is from Tables.columns, but a Table is not. So if you do tbl = Tables.columns(table) first, you can expect to call Tables.getcolumn on the result.
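
As a minimal sketch of that guideline, reusing the setup from the original report: the snippet below calls Tables.columns first and only then indexes by position. It assumes TypedTables returns its underlying NamedTuple of vectors from Tables.columns, which is what the comment above implies but is not shown in this thread. The behaviour in the report also looks consistent with a generic positional fallback that reads struct fields, though that is an inference from the stack trace rather than something confirmed here.

```julia
using Tables, TypedTables

table = Table(a = rand(300), b = rand(300))

# Ask the interface for a column-accessible object instead of using `table` directly.
cols = Tables.columns(table)

# Positional and name-based access are both supported on `cols`.
col_by_index = Tables.getcolumn(cols, 2)
col_by_name  = Tables.getcolumn(cols, :b)

# The other supported case: each element iterated from Tables.rows.
first_row = first(Tables.rows(table))
value_by_index = Tables.getcolumn(first_row, 2)
```

Generic code that accepts any Tables.jl source would normally start with Tables.columns(x) or Tables.rows(x) anyway, so this costs one extra call rather than a change of approach.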

@andyferris
Member

I see.

Is it good practice to extend some of these methods and opt into common behaviour? Or is it preferable to let users use the columns function?

@quinnj
Member

quinnj commented Nov 3, 2021

All up to you; users of the Tables.jl API just need to make sure they follow the guidelines, which admittedly aren't the absolute most convenient form, but are really meant for "sink" authors in the end.
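
For a package author who does want to opt in, one possible shape is sketched below. MyTable is a hypothetical type invented for illustration; it is not TypedTables' Table, and nothing here reflects what TypedTables actually does. The idea is that the type returns itself from Tables.columns and defines Tables.getcolumn for both Int and Symbol, so direct calls and calls on the Tables.columns result agree.

```julia
using Tables

# Hypothetical column-oriented table type, for illustration only.
struct MyTable{C<:NamedTuple}
    data::C  # NamedTuple of equal-length vectors
end

# Declare the type as a column-accessible Tables.jl source.
Tables.istable(::Type{<:MyTable}) = true
Tables.columnaccess(::Type{<:MyTable}) = true

# The table serves as its own "columns" object ...
Tables.columns(t::MyTable) = t

# ... so it must implement the column-access methods itself.
Tables.columnnames(t::MyTable) = keys(t.data)
Tables.getcolumn(t::MyTable, i::Int) = t.data[i]
Tables.getcolumn(t::MyTable, nm::Symbol) = t.data[nm]

t = MyTable((a = [1.0, 2.0], b = [3.0, 4.0]))
second_col = Tables.getcolumn(t, 2)
```

The trade-off is the one raised above: the extra methods make direct calls convenient for users, while consumers that follow the guidelines and call Tables.columns first work either way.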
