Loading: Lazy or compressed storage #201

Open
farhi opened this issue May 7, 2020 · 1 comment

Comments


farhi commented May 7, 2020

It could be efficient to use either lazy loading (see #193), or in-memory compression with a fast compressor such as zmat (https://github.com/fangq/zmat).

The latter works with Matlab 2017. An adaptation of the MEX functions may be needed for older Matlab versions (e.g. 2010a).

A quick test:

e = eye(1000); we = whos('e');                  % test data: mostly zeros
methods = {'zlib','gzip','lzip','lzma','lz4','lz4hc'};
for m = methods                                 % loop over compressors
  t0 = clock;
  [ss, info] = zmat(e, 1, m{1});                % compress e in memory
  dt = etime(clock, t0);
  ws = whos('ss');
  % report method, elapsed time (s) and compression ratio
  fprintf(1, '%10s %10.3f %10.3f\n', m{1}, dt, we.bytes/ws.bytes);
end

Results are highly dependent on the input data. Here we use an identity matrix, which is mostly zeros; sparse storage would be a good solution for this case as well.

    method   time (s) comp_ratio
      zlib      0.048    838.574
      gzip      0.052    839.102
      lzip      0.274   6488.240
      lzma      0.256   6514.658
       lz4      0.001    254.818
     lz4hc      0.002    254.834

With random data, the compression ratio is very bad (around 1). With organised data (for instance a magic square), it is pretty good. In all cases, the lz4 compressor is by far the fastest.
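As a side note, the same measurement can be repeated with other inputs to illustrate this data dependence. The snippet below is only illustrative (it is not part of the original test) and reuses the zmat call from above with the lz4 method:

inputs = {rand(1000), magic(1000), eye(1000)};   % noise, structured, mostly zeros
for k = 1:numel(inputs)
  e = inputs{k}; we = whos('e');
  ss = zmat(e, 1, 'lz4');                        % lz4: the fastest method above
  ws = whos('ss');
  fprintf(1, 'input %i: ratio %10.3f\n', k, we.bytes/ws.bytes);
end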

This could be embedded into estruct/findfield. Its cached data can be used to identify large blocks and compress them dynamically, either as an alias or as a new compressed object that implements the basic methods (subsref, subsasgn, ...). A minimal sketch of such an object follows.
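A minimal sketch of what such a compressed object could look like, assuming zmat is installed. The class name, property names and the choice of LZ4 are illustrative, not the actual estruct implementation, and only the two methods mentioned above are shown:

classdef CompressedArray
  % illustrative compressed-array object: data is kept LZ4-compressed via
  % zmat and only expanded when it is indexed
  properties (Access = private)
    buffer   % compressed byte stream returned by zmat
    info     % zmat info struct, needed to restore type and size
  end
  methods
    function obj = CompressedArray(data)
      [obj.buffer, obj.info] = zmat(data, 1, 'lz4');  % compress on construction
    end
    function out = decompress(obj)
      out = zmat(obj.buffer, obj.info);               % restore original array
    end
    function out = subsref(obj, S)
      out = subsref(decompress(obj), S);              % decompress, then index
    end
    function obj = subsasgn(obj, S, value)
      tmp = subsasgn(decompress(obj), S, value);      % assign on expanded copy
      obj = CompressedArray(tmp);                     % re-compress the result
    end
  end
end

With this, c = CompressedArray(eye(1000)); c(1:3,1:3) would decompress on the fly while keeping only the compressed buffer in memory between accesses.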


farhi commented May 7, 2020

Combining with MappedTensor can be great 👍

What could be done:

  • enhance MappedTensor to allow selecting where storage takes place. Currently tempname is used in function create_temp_file:1493. One could add an input argument option 'Dir' that could be /tmp (the default) or /dev/shm to store data in memory. The latter is only relevant when coupled with the in-memory compression option below: working in /dev/shm without compression is just equivalent to normal Matlab memory management, in a very complex way 👎.
  • enhance MappedTensor with a few more methods (e.g. a display method behaving as disp does now, while disp would also show dir and stat), and possibly other minor improvements.
  • allow storing compressed (encoded) content, just as zmat (https://github.com/fangq/zmat) does. This would allow smaller files or, when stored in /dev/shm, working with in-memory compressed data. Only LZ4 should be supported, for performance reasons. A minimal sketch of this idea follows the list.
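A minimal sketch of the combined idea, assuming a Linux system with /dev/shm and zmat installed. The file name is arbitrary, and the proposed 'Dir' option for MappedTensor does not exist yet, so the sketch only shows the underlying mechanism: an LZ4-compressed block kept in RAM-backed storage and restored on demand.

e = magic(1000);
[buf, info] = zmat(e, 1, 'lz4');                 % compress in memory with LZ4

shmfile = fullfile('/dev/shm', 'block001.lz4');  % /dev/shm is RAM-backed on Linux
fid = fopen(shmfile, 'w');
fwrite(fid, buf, 'uint8');
fclose(fid);
clear e buf                                      % only the compressed copy remains

% later, when the block is accessed again:
fid = fopen(shmfile, 'r');
buf = fread(fid, inf, 'uint8=>uint8');
fclose(fid);
e = zmat(buf, info);                             % decompress, restore type and size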
