add getrows #284

Merged
merged 6 commits into from
Aug 30, 2022
Changes from 2 commits
3 changes: 3 additions & 0 deletions docs/src/index.md
@@ -197,9 +197,11 @@ the table-specific use-case, knowing that it will Just Work™️.

Before moving on to _implementing_ the Tables.jl interfaces, we take a quick
break to highlight some useful utility functions provided by Tables.jl:

```@docs
Tables.Schema
Tables.schema
Tables.getrows
Tables.partitions
Tables.partitioner
Tables.rowtable
@@ -239,6 +241,7 @@ For a type `MyTable`, the interface to becoming a proper table is straightforwar
| **Optional methods** | | |
| `Tables.schema(x::MyTable)` | `Tables.schema(x) = nothing` | Return a [`Tables.Schema`](@ref) object from your `Tables.AbstractRow` iterator or `Tables.AbstractColumns` object; or `nothing` for unknown schema |
| `Tables.materializer(::Type{MyTable})` | `Tables.columntable` | Declare a "materializer" sink function for your table type that can construct an instance of your type from any Tables.jl input |
| `Tables.getrows(x::MyTable, inds; view)` | | Return a row or a sub-table of the original table |

Based on whether your table type has defined `Tables.rows` or `Tables.columns`, you then ensure that the `Tables.AbstractRow` iterator
or `Tables.AbstractColumns` object satisfies the respective interface.
23 changes: 23 additions & 0 deletions src/Tables.jl
@@ -565,6 +565,29 @@ struct Partitioner{T}
x::T
end

"""
getrows(x, inds; view=nothing)

Return one or more rows from table `x` according to the position(s) specified by `inds`:

- If `inds` is a single integer, return a row object.
- If `inds` is a collection of integers, return a table object.
Member commented:

Suggested change
- If `inds` is a collection of integers, return a table object.
- If `inds` is a collection of integers, return an indexable object of rows

Just a bit more specific; one of the main motivations for this was to ensure users can get an in-memory/indexable collection of rows in a consistent way.

nalimilan (Member) commented on Jul 1, 2022:

Are you sure we want to require Tables.getrows to return an indexable collection of rows? That would mean that e.g. for a NamedTuple of vectors getrows couldn't return another NamedTuple of vectors, even if that's the most efficient representation. IOW this is against the implementation that this PR adds for ColumnTable. :-)

In this case, the returned type is not necessarily the same as the original table type.

The `view` argument influences whether the returned object is a view of the original table
or an independent copy:

- If `view=nothing` (the default) then the implementation for a specific table type
is free to decide whether to return a copy or a view.
- If `view=true` then a view is returned, and if `view=false` a copy is returned.
This applies both when returning a row and when returning a table.

Any specialized implementation of `getrows` must support the `view=nothing` argument.
Support for `view=true` or `view=false` is optional
(i.e. implementations may throw an error if they do not support them).
"""
function getrows end

"""
Tables.partitioner(f, itr)
Tables.partitioner(x)
16 changes: 16 additions & 0 deletions src/namedtuples.jl
@@ -106,6 +106,14 @@ function rowtable(itr::T) where {T}
return collect(namedtupleiterator(eltype(r), r))
end

function getrows(x::RowTable, inds; view::Union{Bool,Nothing} = nothing)
    if view === true
        # Qualify the call: the `view` keyword shadows `Base.view` inside this method.
        return Base.view(x, inds)
    else
        return x[inds]
    end
end
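
A note on the keyword name used in this method: inside the function body a keyword argument called `view` is a local variable, so it shadows `Base.view`, and an unqualified `view(x, inds)` would try to call a `Bool`. The following is a minimal sketch of that behaviour. The helper names `rows_unqualified` and `rows_qualified` are hypothetical and are not part of Tables.jl or of this PR.

```julia
# Hypothetical helpers for illustration only; neither is part of Tables.jl.

# The keyword `view` is a local variable here, so `view(x, inds)` calls the Bool.
function rows_unqualified(x::AbstractVector, inds; view::Union{Bool,Nothing}=nothing)
    return view === true ? view(x, inds) : x[inds]
end

# Qualifying the call reaches the function in Base regardless of the keyword name.
function rows_qualified(x::AbstractVector, inds; view::Union{Bool,Nothing}=nothing)
    return view === true ? Base.view(x, inds) : x[inds]
end

rows = [(a=1, b="x"), (a=2, b="y"), (a=3, b="z")]

rows_qualified(rows, [1, 3]; view=true)   # a SubArray over `rows`
rows_unqualified(rows, [1, 3])            # fine: the `view` branch is never reached
# rows_unqualified(rows, [1, 3]; view=true) throws a MethodError (a Bool is not callable)
```

The failure only appears when `view=true` is actually passed, which is why a test that exercises that branch is useful.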

# NamedTuple of arrays of matching dimensionality
const ColumnTable = NamedTuple{names, T} where {names, T <: NTuple{N, AbstractArray{S, D} where S}} where {N, D}
rowcount(c::ColumnTable) = length(c) == 0 ? 0 : length(c[1])
@@ -173,3 +181,11 @@ function columntable(itr::T) where {T}
return columntable(schema(cols), cols)
end
columntable(x::ColumnTable) = x

function getrows(x::ColumnTable, inds; view::Union{Bool,Nothing} = nothing)
    if view === true
        # Qualify the call: the `view` keyword shadows `Base.view` inside this method.
        return map(c -> Base.view(c, inds), x)
    else
        return map(c -> c[inds], x)
    end
end
4 changes: 4 additions & 0 deletions test/runtests.jl
@@ -798,4 +798,8 @@ Tables.columnnames(::WideTable2) = [Symbol("x", i) for i = 1:1000]
@test nm isa Symbol
@test col isa Vector{Float64}
end

@testset "getrows" begin
@test Tables.getrows isa Function
end
end
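
To make the `getrows` contract from the docstring concrete, here is a minimal sketch for a column-backed table type. It rests on several assumptions: `MyColumns` is a hypothetical type invented for illustration, and a local generic function `getrows` stands in for `Tables.getrows` so that the snippet runs without Tables.jl. Rejecting `view=true` for a single row is only one possible use of the opt-in clause, not what this PR implements.

```julia
# Local stand-in for the generic function this PR declares (assumption: same signature).
function getrows end

# Hypothetical table type wrapping a NamedTuple of equal-length vectors.
struct MyColumns{T<:NamedTuple}
    cols::T
end

# Single integer: return one row as a NamedTuple (always a copy in this sketch).
function getrows(t::MyColumns, i::Integer; view::Union{Bool,Nothing}=nothing)
    view === true && throw(ArgumentError("view=true is not supported for a single row"))
    return map(c -> c[i], t.cols)
end

# Collection of integers: return a sub-table; copy unless a view is explicitly requested.
function getrows(t::MyColumns, inds; view::Union{Bool,Nothing}=nothing)
    pick = view === true ? (c -> Base.view(c, inds)) : (c -> c[inds])
    return MyColumns(map(pick, t.cols))
end

t = MyColumns((a=[1, 2, 3], b=["x", "y", "z"]))
row = getrows(t, 2)                      # NamedTuple with a = 2, b = "y"
copied = getrows(t, [1, 3])              # independent copy of rows 1 and 3
viewed = getrows(t, [1, 3]; view=true)   # shares memory with `t`
```

With `view=nothing` this sketch chooses to copy, which the docstring permits because the default leaves the choice to the implementation.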