
Option to print indices as columns #4280

Closed
MichaelChirico opened this issue Mar 6, 2020 · 3 comments · Fixed by #6187

@MichaelChirico
Member

setindex basically creates "shadow columns": integer vectors of length nrow(x) that can be used to sort x on the fly.
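A minimal illustration on a made-up toy table (the columns a and b are arbitrary). It reads the index through the same internal attribute layout used in the example further down, attr(attr(DT, 'index'), '__<cols>'), which is an implementation detail rather than a documented API:

```r
library(data.table)

# Toy table for illustration only
DT = data.table(a = c(3L, 1L, 2L), b = c("x", "y", "z"))
setindex(DT, a)

indices(DT)
# The index lives in an attribute named "__a" on attr(DT, "index"): an integer
# vector of row numbers listed in a-sorted order, i.e. the same as order(DT$a)
o = attr(attr(DT, "index"), "__a")
o
# Using it as i returns the rows sorted by a, without reordering DT itself
DT[o]
```

Note that the i-th element is the row number of the i-th row in sorted order, not the rank of row i.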

It would be helpful, besides tables() and indices(), to be able to show the indices alongside the rest of the table's columns.

This would be useful for auditing tables, and would also help surface what indices actually are and why they're useful.

e.g.

NN = 1e6
set.seed(39489)
DT = data.table(
  grp1 = sample(1e5, NN, TRUE),
  grp2 = sample(1e4, NN, TRUE),
  grp3 = sample(1e3, NN, TRUE)
)
setkey(DT, grp1, grp2)
setindex(DT, grp1, grp3)
DT
#            grp1 grp2 grp3
#       1:      1  544   67
#       2:      1  650  150
#       3:      1 1242  230
#       4:      1 1647  915
#       5:      1 6111  119
#      ---                 
#  999996: 100000 2699  494
#  999997: 100000 4844  188
#  999998: 100000 5299  964
#  999999: 100000 7166  819
# 1000000: 100000 8590   45
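print(DT, show_indices = TRUE)  # proposed argument
## would display the same as the following, except that := adds the column to DT by reference
idx = 'grp1__grp3'  # name of the index created by setindex(DT, grp1, grp3) above
DT[ , paste0('index__', idx) := attr(attr(DT, 'index'), paste0('__', idx))][]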
#            grp1 grp2 grp3 index__grp1__grp3
#       1:      1  544   67                 1
#       2:      1  650  150                 5
#       3:      1 1242  230                 2
#       4:      1 1647  915                 3
#       5:      1 6111  119                 6
#      ---                                   
#  999996: 100000 2699  494            999997
#  999997: 100000 4844  188            999995
#  999998: 100000 5299  964            999996
#  999999: 100000 7166  819            999999
# 1000000: 100000 8590   45            999998

add to #1523

@jangorecki
Member

Not sure about the argument name; do we already use underscores in this function? Just 'indices' should be enough.

@jangorecki
Member

Or maybe, instead of a print argument, just add a function that expands indices into table columns.
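A rough sketch of such a function, assuming the internal attribute layout used in the example above. The name expand_indices is hypothetical and is not part of data.table. It works on a copy, so the original table keeps only its own columns:

```r
library(data.table)

# Hypothetical helper, NOT part of data.table: return a copy of x with one extra
# integer column per index, named index__<cols> as in the example above.
expand_indices = function(x) {
  out = copy(x)                                   # x itself is left untouched
  idx_attr = attr(x, "index")
  for (nm in grep("^__", names(attributes(idx_attr)), value = TRUE)) {
    o = as.vector(attr(idx_attr, nm))             # drop any attributes on the vector
    if (!length(o)) o = seq_len(nrow(x))          # empty index: x already in that order
    out[, (paste0("index", nm)) := o]
  }
  out[]
}

DT = data.table(a = c(3L, 1L, 2L))
setindex(DT, a)
expand_indices(DT)
```

The copy() costs a full duplicate of the table, which is the trade-off against a transient print-only approach.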

@MichaelChirico
Member Author

Maybe that too, but I think this is also nice since its effects are transient. We don't even have to assign the whole index: just the 2*topn elements that are displayed during print, assigned to the rbind(head(x), tail(x)) object we're already making.

The name show_indices was just an example; I'm not committed to it at all 😄
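A sketch of that transient idea, outside of print itself. The function name head_tail_with_indices is hypothetical, and the row selection is simplified: the real print only truncates tables longer than getOption("datatable.print.nrows") and also shows original row numbers and a --- separator, which this sketch does not reproduce. Only the displayed slice of each index is copied:

```r
library(data.table)

# Hypothetical sketch, NOT data.table's actual print code: take the first and
# last `topn` rows, then attach the matching slice of each index to that subset.
head_tail_with_indices = function(x, topn = 5L) {
  n = nrow(x)
  rows = if (n > 2L * topn) c(seq_len(topn), (n - topn + 1L):n) else seq_len(n)
  shown = x[rows]                                 # small subset; x is not modified
  idx_attr = attr(x, "index")
  for (nm in grep("^__", names(attributes(idx_attr)), value = TRUE)) {
    o = as.vector(attr(idx_attr, nm))
    if (!length(o)) o = seq_len(n)                # empty index: already in order
    shown[, (paste0("index", nm)) := o[rows]]     # only length(rows) values copied
  }
  shown[]
}

DT = data.table(a = 10:1)
setindex(DT, a)
head_tail_with_indices(DT, topn = 2L)
```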
