split/apply/combine paradigm #29

mllg · 2013-12-12T10:42:11Z

I'd like to get started on this one and use this tracker to collect and discuss ideas.

AFAIR @lawremi suggested back in September using split/by (split), bp*apply (apply) and stack (combine).

I'm rather unsure what functionality is needed. Usually I'm fine with split, bplapply and l*ply/Reduce.
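
For reference, a minimal sketch of that split / apply / Reduce pattern. The data frame, its column names and the `summarise_part` helper are made up for illustration. `lapply()` stands in for `bplapply()` here so the snippet runs without BiocParallel; swapping it in should be a one-word change, since both take the list first and the function second.

```r
# Toy data (hypothetical): a value column and a grouping column.
df <- data.frame(
  group = c("a", "b", "a", "c", "b", "a"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# split: one data.frame per group
parts <- split(df, df$group)

# apply: summarise each part independently.
# summarise_part is an illustrative helper, not part of any package.
summarise_part <- function(d) {
  data.frame(
    group = d$group[1],
    n = nrow(d),
    mean = mean(d$value),
    stringsAsFactors = FALSE
  )
}
pieces <- lapply(parts, summarise_part)  # or BiocParallel::bplapply(parts, summarise_part)

# combine: fold the per-group pieces back into a single data.frame
result <- Reduce(rbind, pieces)
```

The combine step is the part with the least agreement on an API: `Reduce(rbind, ...)` works for data frames, but other containers need a different combiner.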

lawremi · 2013-12-12T17:28:01Z

split/apply/combine is a nice mental model, but maybe it does not need explicit representation in code. Another direction is thinking about faster ways to iterate, i.e., can we form partitions of data more efficiently? The data.table package has some interesting approaches.

Michael

DarwinAwardWinner · 2013-12-14T01:39:39Z

When I need to do a split-apply-combine type of operation, I usually turn to plyr::ddply.
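
A small sketch of what that looks like, assuming the plyr package is installed. The data frame and column names are invented for the example; `ddply()` does the split, apply and combine steps in a single call.

```r
library(plyr)

# Toy data (hypothetical): a value column and a grouping column.
df <- data.frame(
  group = c("a", "b", "a", "c", "b", "a"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Split df by "group", run the function on each piece,
# and row-bind the results with the grouping column prepended.
result <- ddply(df, "group", function(d) {
  data.frame(n = nrow(d), mean = mean(d$value))
})
```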

lawremi · 2013-12-14T02:32:23Z

Yes, that's a useful tool. Would be nice to have a similar API on top of BiocParallel (and thus BatchJobs). We worked toward making aggregate() behave that way through omission of the LHS, but I think we ended up punting due to release deadlines. Also, we'd want it to be more generic, with support for e.g. GRanges. I rarely use a data.frame.

vobencha · 2015-11-04T15:27:13Z

Any opposition to closing this issue?

lawremi · 2015-11-04T17:46:21Z

This issue sort of depends on having a clean API in base R for aggregation. Currently, aggregate and friends fall a bit short. Once we have that, then BiocParallel will need a corresponding frontend. Perhaps there is no need for a specific issue.

It would seem that BiocParallel needs a bp analog to every member of the apply family. In ddR, we instead define data structures that represent partitioned, distributed data that is managed by some computational engine, so we are able to use existing generics, with implicit parallelism.

vobencha · 2015-11-06T15:09:19Z

OK. I've marked this as an enhancement.

vobencha added the enhancement label Nov 6, 2015
