Improving the CompressedList constructors #27

Closed
LTLA opened this issue Nov 27, 2020 · 4 comments

@LTLA
Contributor

LTLA commented Nov 27, 2020

I frequently use CompressedLists but I can't figure out how to create them without making a list first. This is a common pattern:

stuff <- runif(100)
f <- sample(10, length(stuff), replace=TRUE)
library(IRanges)
out <- NumericList(split(stuff, f))

This is unfortunate because it spends time creating a large list of many small vectors, only to unlist everything again to create the CompressedList. It seems like we could easily cut out the middleman, possibly with an interface like:

CompressedList(stuff, by=f)

This would handle the reordering to create the internal IRanges and the unlistData. It would also handle the type dispatch so that I don't have to explicitly call NumericList for numeric values, etc. Finally, if by=NULL, it would do the same as as(stuff, "CompressedList"), which allows for a slightly less verbose way to convert vectors into CompressedLists:

library(IRanges)
df <- DataFrame(x=runif(100), y=runif(100))
f <- sample(letters, 100, replace=TRUE)
out <- split(df, f)
out[as(which.max(out[,'x']), "CompressedList")]
# replace with out[CompressedList(which.max(out[,'x']))]

@lawremi
Collaborator

lawremi commented Nov 30, 2020

I think splitAsList() should get you most of the way. For the final use case, of converting a list to a CompressedList, I'd prefer the "as" notion over construction, because it's a conversion, analogous to as.list().
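
As a rough sketch of this suggestion, the split-then-construct pattern from the opening comment can be written with splitAsList() directly. The variable names come from the original example; set.seed() is added here only so the sketch is reproducible.

```r
library(IRanges)

set.seed(1)  # added for reproducibility; not part of the original example
stuff <- runif(100)
f <- sample(10, length(stuff), replace=TRUE)

# Group 'stuff' by 'f' and get a CompressedList back in a single call,
# with no explicit NumericList() constructor needed.
out <- splitAsList(stuff, f)
class(out)

# Every value of 'stuff' ends up in exactly one group.
sum(elementNROWS(out)) == length(stuff)
```

This covers the grouping half of the proposed CompressedList(stuff, by=f) interface; the by=NULL case is the as(stuff, "CompressedList") conversion discussed above.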

@hpages
Contributor

hpages commented Nov 30, 2020

@LTLA Depending on your use case, relist() and extractList() are also efficient ways to generate CompressedList objects, in addition to splitAsList().
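
A minimal sketch of those two alternatives, assuming the values are already ordered by group. The data and group boundaries below are illustrative, not taken from a real use case.

```r
library(IRanges)

x <- c(0.1, 0.9, 2.1, 2.9, 1)

# relist(): the values are already in group order, so describe the groups
# by their end positions (group "a" ends at 2, group "b" ends at 5).
by_relist <- relist(x, PartitioningByEnd(c(2, 5), names=c("a", "b")))

# extractList(): pull out one list element per range of positions in 'x'.
by_extract <- extractList(x, IRanges(start=c(1, 3), end=c(2, 5)))

elementNROWS(by_relist)
elementNROWS(by_extract)
```

relist() fits when the data are already sorted by group, while extractList() also allows overlapping or non-contiguous selections.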

@lawremi Looks like which.max() is broken on CompressedNumericList objects:

x <- NumericList(a=c(0.1, 0.9), b=c(2.1, 2.9, 1))
which.max(x)
# a b 
# 1 1 

The culprit seems to be this line:

PARTITIONED_AGG(int, ACCESSOR, INTSXP, INTEGER, \

Here int is passed to the PARTITIONED_AGG() macro, with the consequence that the values in x are extracted as integer values. It appears to have been broken since which.min() and which.max() were optimized in 2016 (commit c320278).
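
The reported output is consistent with that explanation. The base R sketch below assumes the extraction behaves like as.integer() truncation, which is an assumption for illustration rather than something confirmed from the C code. It uses plain lists, so it does not depend on the affected method.

```r
vals <- list(a=c(0.1, 0.9), b=c(2.1, 2.9, 1))

# What which.max() should return for each element: the second value is
# the largest in both 'a' and 'b'.
expected <- vapply(vals, which.max, integer(1))

# Simulate the suspected bug: truncate to integer before taking the maximum.
# 'a' becomes c(0, 0) and 'b' becomes c(2, 2, 1), so the first position wins.
truncated <- vapply(vals, function(v) which.max(as.integer(v)), integer(1))

expected   # a = 2, b = 2
truncated  # a = 1, b = 1, matching the output reported above
```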

@LTLA
Contributor Author

LTLA commented Dec 1, 2020

splitAsList() is quite satisfactory.

@hpages
Contributor

hpages commented Dec 1, 2020

Looks like we can close this. (@lawremi I moved the which.min() / which.max() story to its own issue.)

@hpages hpages closed this as completed Dec 1, 2020

3 participants