schema fails when selecting columns from a really wide table #20

Closed
OkonSamuel opened this issue Aug 13, 2021 · 3 comments
Comments

@OkonSamuel

Hello,
Due to a recent change in Tables.jl, developers now also need to consider specializing on Tables.Schema{nothing, nothing} in addition to Tables.Schema{names, types} (see here).
Users will eventually run into this error when selecting a few columns from a really wide table, as shown below.

julia> ncols = 1000000
1000000

julia> df = DataFrame(rand(6, ncols), :auto);

julia> n = TableOperations.select(df, :x1, :x2);

julia> Tables.schema(n);
ERROR: MethodError: no method matching columntype(::Nothing, ::Nothing, ::Symbol)
@OkonSamuel
Author

The following naive solution should do the trick. Maybe someone has a better fix?

function typesubset(sch::Tables.Schema{nothing, nothing}, nms::NTuple{N, Symbol}) where {N}
    names = sch.names
    types = sch.types
    return Tuple{Any[Tables.columntype(names, types, nm) for nm in nms]...}
end

function typesubset(sch::Tables.Schema{nothing, nothing}, inds::NTuple{N, Int}) where {N}
    types = sch.types
    return Tuple{Any[types[i] for i in inds]...}
end
typesubset(::Tables.Schema{nothing, nothing}, ::Tuple{}) = Tuple{}

namesubset(::Tables.Schema{nothing, nothing}, nms::NTuple{N, Symbol}) where {N} = nms
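# The schema argument must be bound to `sch` so its stored names can be read.
# Base.@pure is dropped because the result depends on a runtime field, not on type parameters.
namesubset(sch::Tables.Schema{nothing, nothing}, inds::NTuple{N, Int}) where {N} = (names = sch.names; ntuple(i -> names[inds[i]], N))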
namesubset(::Tables.Schema{nothing, nothing}, ::Tuple{}) = ()
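
For illustration, the same idea can be sketched without depending on Tables.jl. `StoredSchema` below is a hypothetical stand-in for a schema that keeps its names and types as runtime vectors; it is not the real `Tables.Schema`, and the lookups use plain `findfirst` rather than `Tables.columntype`. It is only meant to show the shape of the runtime lookup, not the eventual fix.

```julia
# Hypothetical stand-in (an assumption, not the real Tables.Schema):
# names and types live in fields instead of type parameters.
struct StoredSchema
    names::Vector{Symbol}
    types::Vector{Type}
end

# Select column types by name, looking each name up at runtime.
function typesubset(sch::StoredSchema, nms::NTuple{N, Symbol}) where {N}
    selected = map(nms) do nm
        i = findfirst(==(nm), sch.names)
        i === nothing && throw(ArgumentError("column $nm not found"))
        sch.types[i]
    end
    return Tuple{selected...}
end

# Select column types by position.
typesubset(sch::StoredSchema, inds::NTuple{N, Int}) where {N} =
    Tuple{map(i -> sch.types[i], inds)...}

# An empty selection matches both NTuple methods above, so it needs
# its own method to avoid a dispatch ambiguity.
typesubset(::StoredSchema, ::Tuple{}) = Tuple{}

# Names requested by symbol are returned as given; indices are resolved.
namesubset(::StoredSchema, nms::NTuple{N, Symbol}) where {N} = nms
namesubset(sch::StoredSchema, inds::NTuple{N, Int}) where {N} =
    ntuple(i -> sch.names[inds[i]], N)
namesubset(::StoredSchema, ::Tuple{}) = ()

sch = StoredSchema([:x1, :x2, :x3], Type[Float64, Int, String])
typesubset(sch, (:x1, :x3))   # Tuple{Float64, String}
namesubset(sch, (2, 3))       # (:x2, :x3)
```

The separate `Tuple{}` methods matter here for the same reason as in the suggestion above: an empty tuple is both an `NTuple{0, Symbol}` and an `NTuple{0, Int}`.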

@quinnj
Member

quinnj commented Aug 18, 2021

Due to a recent change in Tables.jl developers now need to also consider specializing on Tables.Schema{nothing, nothing} in addition to Tables.Schema{names, types}. see here.

Note that the threshold at which Tables.jl switches to this alternative Schema representation is pretty high: 67_000 columns. It was chosen specifically because many operations on tables this wide were failing anyway; there are fundamental limits in the compiler right now that mean creating tuples/namedtuples that large starts breaking in weird ways.

Anyway, that aside, yes, we should fix the case here. I just wanted to clarify that this kind of operation didn't work before anyway.

quinnj added a commit that referenced this issue Aug 18, 2021
Started with code suggested by @OkonSamuel in issue #20. Made some
tweaks and added some tests and this seems to work now.
@quinnj
Member

quinnj commented Aug 18, 2021

PR up: #22
