
Want to join this working group? #2

Open
HenrikBengtsson opened this issue Jun 7, 2024 · 7 comments

@HenrikBengtsson
Collaborator

Hi all,

let us know if you'd like to join this working group on 'Marshaling and Serialization in R'. To join, just add a comment below with a very brief introduction of yourself and what your interest in this topic is.

@HenrikBengtsson HenrikBengtsson pinned this issue Jun 7, 2024
@coolbutuseless

Thanks for the invitation to join. Happy to help in any way I can!

R's serialisation underpins some things I've written, e.g. {xxhashlite} and rlang::hash(). I'm also interested in some low-level aspects of R, e.g. {rbytecode}.

@simonpcouch

Hey y'all, more than happy to tag along. Simon from Chicago, IL, USA; I work on packages for predictive modeling at Posit.

We end up thinking about marshaling/serialization a good bit in the context of model deployment and training in parallel. Re: model deployment, we put together a package for marshaling model objects last year that standardizes interfaces to serialization methods from different modeling packages. The issue of serialization also comes up in our support for parallel processing, where various models may be fitted in several R processes but handed back to a parent process for analysis.

cc @juliasilge and @topepo. I won't be able to make it to useR! in person, but Max will be there for the in-person meeting.

@ltierney

Not sure how much time I'll have to participate, but I'll try to follow what goes on and chip in from time to time. I developed the current serialization framework a number of years ago. The main goals of that redesign were to support parallel computation and to allow separate loading of objects in a collection while maintaining the identity of mutable objects, mainly environments (i.e., lazy loading).
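The identity-preserving mechanism in this framework is exposed in base R through the `refhook` arguments of `serialize()` and `unserialize()`. A minimal sketch of how the hooks cooperate to keep two references pointing at the same environment (the `registry` variable and the `"env-1"` tag are purely illustrative, not part of any API):

```r
# A reference object we want to keep "live" across serialization, rather
# than copying its contents into the byte stream.
registry <- new.env()
env1 <- new.env()
env1$x <- 42
registry[["env-1"]] <- env1

# Outgoing hook: replace the known environment with a string tag.
# Returning NULL means "serialize this object the normal way".
out_hook <- function(obj) {
  if (identical(obj, env1)) "env-1" else NULL
}

# Incoming hook: resolve the tag back to the live object.
in_hook <- function(tag) registry[[tag]]

payload  <- serialize(list(a = env1, b = env1), connection = NULL,
                      refhook = out_hook)
restored <- unserialize(payload, refhook = in_hook)

# Both slots resolve to the *same* environment, so identity is preserved.
identical(restored$a, restored$b)
identical(restored$a, env1)
```

Both `identical()` calls return `TRUE`; without the hooks, each slot would come back as an independent copy of the environment.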

@shikokuchuo
Member

mirai provides an implementation of what is currently possible using the serialization framework @ltierney describes above. Specifically, it interfaces at the C level with the ‘refhook’ system for reference objects, supporting their use in parallel and distributed computing.

This feature was originally motivated by parallel computations involving torch tensors, as described in https://shikokuchuo.net/mirai/articles/torch.html, and following helpful discussions with @dfalbel.

Permitted usage was subsequently broadened to a much wider class of serialization functions, as described in https://shikokuchuo.net/mirai/articles/mirai.html#serialization-arrow-polars-and-beyond, which also benefited from input by @eitsupi.

Finally, it also allows hosting of ADBC database connections in parallel processes as described in https://shikokuchuo.net/mirai/articles/databases.html, where @krlmlr was instrumental in proposing and verifying this use case.
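For readers who haven't seen the linked vignettes, a hedged sketch of what this looks like from the user's side. This follows my reading of the mirai documentation for its serialization configuration; the exact argument names (`class`, `sfunc`, `ufunc`) and the `serial` argument to `daemons()` may differ across mirai versions, so treat this as an outline rather than a definitive API reference:

```r
# Sketch: registering custom serialization functions with mirai so that
# torch tensors (reference objects backed by external pointers) survive
# the trip to daemon processes. Requires the mirai and torch packages.
library(mirai)
library(torch)

cfg <- serial_config(
  class = "torch_tensor",          # class handled by the custom hooks
  sfunc = torch::torch_serialize,  # object -> raw vector
  ufunc = torch::torch_load        # raw vector -> object
)

daemons(1, serial = cfg)           # one daemon using this configuration

m <- mirai(x * 2L, x = torch_tensor(c(1, 2, 3)))
m[]                                # tensor computed in the daemon

daemons(0)                         # shut down
```

The key point is that the hooks plug into the C-level 'refhook' system, so the custom functions apply transparently to tensors nested anywhere inside the objects being sent.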

@wlandau

wlandau commented Jun 18, 2024

Thanks for the invite. I develop targets and crew, both of which rely on sending objects to concurrent R processes. targets lets you select or customize a "format", which is a storage type that covers serialization and marshaling. It works, but it is not implicit, and some users have struggled with the extra responsibility.
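To make the "format" idea concrete, here is a small pipeline fragment in the style of a targets `_targets.R` file. The `format` argument of `tar_target()` is documented; the `read_data()` and `fit_model()` helpers are hypothetical placeholders:

```r
# Sketch of per-target storage formats in {targets}. Each target's format
# controls how its return value is serialized to and from storage.
library(targets)

list(
  tar_target(data, read_data(), format = "qs"),       # qs serialization
  tar_target(model, fit_model(data), format = "rds")  # base R serialization
)
```

This is exactly the explicit choice mentioned above: the user must know which format suits each object, which is where some have struggled.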

@Jiefei-Wang

Thanks for the invite. I'm one of the developers of BiocParallel and SharedObject, which provide a parallelization framework for all Bioconductor packages. I have been thinking about serialization for a while. One interesting topic is how we can serialize/unserialize only once on each computer and make the object available to all workers on the same machine. The current approach ignores the fact that multiple workers may share a computer and sends the object to each worker separately. This is clearly an unnecessary waste of the resources we have in distributed computing. I do not know what the best solution is, but I'll be happy to hear any ideas.
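The redundancy described here is easy to demonstrate with base R's parallel package: `clusterExport()` serializes and ships the same object once per worker, even when all workers run on one machine. A minimal sketch:

```r
# Two PSOCK workers on the same machine; 'big' is serialized and sent
# over a socket twice, once per worker, despite the shared hardware.
library(parallel)

cl  <- makeCluster(2)
big <- rnorm(1e6)
clusterExport(cl, "big")   # one full copy transmitted to each worker

res <- parLapply(cl, 1:2, function(i) length(big))
stopCluster(cl)

res   # both workers received the full object
```

A per-machine cache (or a shared-memory mapping, as SharedObject provides) would let the deserialization cost be paid once per machine instead of once per worker.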

Great to meet you, @wlandau. I am using your package targets to manage my data extraction pipeline, and it is incredibly helpful. Frankly speaking, I am one of the users who struggled with the extra responsibility you mentioned. I like it, but I also hate it. I might open a thread in your repository to discuss automating the format selection :)

@t-kalinowski

t-kalinowski commented Jul 9, 2024

Hi all,

Tomasz here from mlverse at Posit. I'm more than happy to help out too. I am particularly interested in making sure S7, reticulate, TensorFlow/Jax/Keras, torch, and things like them (R external pointers, potentially complex environment requirements) work well with whatever the final solution is.
