evalparse edited this page Aug 3, 2019 · 7 revisions

Welcome to the disk.frame wiki!

How does disk.frame work?

Many of disk.frame's functions, such as map and delayed, are just convenience functions that let you perform the same operation on each chunk. The convenience is that they load each chunk into a data.table/data.frame for you and save the result to disk automatically.
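To make the per-chunk idea concrete, here is a minimal sketch in base R. It is not disk.frame's actual implementation: it assumes chunks are stored as .rds files (disk.frame itself uses fst files), and the helper names write_chunks, map_chunks and collect_chunks are made up for illustration.

```r
# Conceptual sketch only: chunks are .rds files here, whereas disk.frame
# stores chunks as fst files. All function names are hypothetical.

# Split a data.frame into (at most) n chunk files inside `dir`.
write_chunks <- function(df, dir, n) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  ids <- rep_len(seq_len(n), nrow(df))   # round-robin chunk assignment
  parts <- split(df, ids)
  for (i in seq_along(parts)) {
    saveRDS(parts[[i]], file.path(dir, paste0(i, ".rds")))
  }
  invisible(dir)
}

# Load each chunk, apply `fn` to it, and save the result under `outdir`.
map_chunks <- function(dir, fn, outdir) {
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  for (f in files) {
    chunk <- readRDS(f)                  # one chunk in memory at a time
    saveRDS(fn(chunk), file.path(outdir, basename(f)))
  }
  invisible(outdir)
}

# Read all chunks back into a single data.frame.
collect_chunks <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  do.call(rbind, lapply(files, readRDS))
}

# Usage: write 10 rows as 3 chunks, add a column chunk by chunk, collect.
src <- file.path(tempdir(), "chunks_in")
out <- file.path(tempdir(), "chunks_out")
write_chunks(data.frame(x = 1:10), src, n = 3)
map_chunks(src, function(chunk) { chunk$y <- chunk$x * 2; chunk }, out)
result <- collect_chunks(out)
```

The function you pass in only ever sees an ordinary data.frame, so the loading and saving stay out of your way. That is the convenience that map and delayed aim to provide.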

Working time

I only work on disk.frame on Sunday mornings so that it does not eat into my other (paid) work, so progress can be slow. If you would like to speed things up, feel free to contact me for consulting services.

TODO: convenience disk.frame syntax proposal

a = libname(path1)
b = libname(path2)
a$disk.frame2 = delayed(b$disk.frame1, some_fn)

Scaling out as a cluster

future has backends for clusters, but I foresee that scaling out to clusters will be a lot of work. The implementation as it stands is tightly coupled to fst files and works on one folder. We also need to identify a simple way to set up these servers on AWS or on a local network. These may not be hard, but they are definitely tasks I have not tackled before, so this is not going to happen for a while.