split/apply/combine paradigm #29

Open
mllg opened this issue Dec 12, 2013 · 6 comments

Comments

@mllg
Collaborator

mllg commented Dec 12, 2013

I'd like to get started on this one and use this tracker to collect and discuss ideas.

AFAIR, @lawremi suggested back in September using split/by (split), bp*apply (apply) and stack (combine).

I'm rather unsure what functionality is needed. Usually I'm fine with split, bplapply and l*ply/Reduce.
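
For reference, a minimal sketch of that split, bplapply and Reduce workflow. The toy data frame, its column names and the `summarise_piece` helper are assumptions made up for illustration; `SerialParam()` is used only so the sketch runs the same everywhere, and any other BiocParallel backend could be passed as `BPPARAM` instead.

```r
library(BiocParallel)

# Toy data; the column names are illustrative only.
df <- data.frame(
  group = c("a", "b", "a", "b", "c"),
  value = c(1, 2, 3, 4, 5)
)

# Split: one data.frame per level of `group`.
pieces <- split(df, df$group)

# Apply: summarise each piece independently.
# `summarise_piece` is a hypothetical helper, not part of BiocParallel.
summarise_piece <- function(piece) {
  data.frame(group = piece$group[1], total = sum(piece$value))
}
results <- bplapply(pieces, summarise_piece, BPPARAM = SerialParam())

# Combine: fold the per-group results back into a single data.frame.
combined <- Reduce(rbind, results)
```

The open question is whether these three steps deserve a dedicated wrapper or whether composing them by hand, as here, is enough.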

@lawremi

lawremi commented Dec 12, 2013

split/apply/combine is a nice mental model, but maybe it does not need
explicit representation in code. Another direction is thinking about faster
ways to iterate, i.e., can we form partitions of data more efficiently? The
data.table package has some interesting approaches.

Michael

@DarwinAwardWinner

When I need to do a split-apply-combine type of operation, I usually turn to plyr::ddply.

@lawremi

lawremi commented Dec 14, 2013

Yes, that's a useful tool. Would be nice to have a similar API on top of
BiocParallel (and thus BatchJobs). We worked toward making aggregate()
behave that way through omission of the LHS, but I think we ended up
punting due to release deadlines. Also, we'd want it to be more generic,
with support for e.g. GRanges. I rarely use a data.frame.

On Fri, Dec 13, 2013 at 5:39 PM, Ryan Thompson notifications@github.com wrote:

When I need to do a split-apply-combine type of operation, I usually turn
to plyr::ddply.


Reply to this email directly or view it on GitHubhttps://github.com//issues/29#issuecomment-30556845
.

@vobencha
Contributor

vobencha commented Nov 4, 2015

Any opposition to closing this issue?

@lawremi

lawremi commented Nov 4, 2015

This issue sort of depends on having a clean API in base R for aggregation. Currently, aggregate and friends fall a bit short. Once we have that, then BiocParallel will need a corresponding frontend. Perhaps there is no need for a specific issue.

It would seem that BiocParallel needs a bp analog to every member of the apply family. In ddR, we instead define data structures that represent partitioned, distributed data that is managed by some computational engine, so we are able to use existing generics, with implicit parallelism.

@vobencha
Contributor

vobencha commented Nov 6, 2015

OK. I've marked this as an enhancement.

4 participants