In [1]:
import os

## Caching

HDF5 files can be very large, and processing them can take a long time. To speed up later access, processing results can be cached.

To enable caching, specify an `out_dire` when creating an instance of `H5Scan` or `H5Data`, e.g.,

``` python
>>> data = H5Data(fil, out_dire='analysis')
```

If `out_dire` is set, the log is automatically cached as a pickle file at `[out_dire]/cache/log.pkl`. A relative `out_dire` is placed alongside the HDF5 file, as the cache listing below shows. By default, the cached log is used whenever the file is reloaded, until `H5Data.update_log()` is called to refresh it.
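A minimal sketch of the cache-then-reuse behaviour described above, using only the standard library. `load_or_build` and `build_log` are hypothetical names, not part of `e11`; the `update=True` flag stands in for calling `H5Data.update_log()`, and the real log contents will differ.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build, update=False):
    """Hypothetical helper (not part of e11): return the object pickled at
    cache_path, or call build(), pickle its result and return it when the
    cache file is missing or update=True."""
    if not update and os.path.isfile(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = build()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def build_log():
    # Stand-in for the slow step of reading every squid's metadata from the HDF5 file.
    calls.append(1)
    return {"squids": [1, 2, 3]}


out_dire = tempfile.mkdtemp()
log_path = os.path.join(out_dire, "cache", "log.pkl")

log = load_or_build(log_path, build_log)               # builds the log and writes log.pkl
log = load_or_build(log_path, build_log)               # reads log.pkl; build_log is not called
log = load_or_build(log_path, build_log, update=True)  # forced rebuild, like update_log()
print(len(calls))  # 2
```

The expensive step runs only when no cache exists or a refresh is explicitly requested, which is why reloading a large file is fast after the first pass.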

In [2]:
from e11 import H5Data
from e11.tools import ls, sub_dire

In [3]:
fil = os.path.join(os.getcwd(), 'example_data', 'array_data.h5')
data = H5Data(fil, out_dire='analysis')
data.update_log()

100%|██████████| 6/6 [00:00<00:00, 909.79it/s]


Caching can also be applied to the results of the methods `array()`, `df()`, and `apply()`. In the example below, the `cache` argument gives the cached result its name.

In [4]:
# first, load from hdf5
%time av = data.df(data.squids, 'AV_0', label=None, ignore_missing=False, cache="av_0")
av.head()

100%|██████████| 6/6 [00:00<00:00, 225.58it/s]

CPU times: user 33.3 ms, sys: 3.71 ms, total: 37 ms
Wall time: 35.4 ms

| squid | measurement | AB | CD | EF |
|------:|------------:|---------:|---------:|---------:|
| 1 | 0 | 0.983765 | 0.984008 | 0.004351 |
| 1 | 1 | 0.984163 | 0.984424 | 0.004799 |
| 1 | 2 | 0.983991 | 0.984383 | 0.004606 |
| 1 | 3 | 0.983983 | 0.984222 | 0.004568 |
| 1 | 4 | 0.983856 | 0.984048 | 0.004349 |


In [5]:
# then reload from cache
%time av = data.df(data.squids, 'AV_0', label=None, ignore_missing=False, cache="av_0")
av.head()

CPU times: user 2.88 ms, sys: 0 ns, total: 2.88 ms
Wall time: 2.45 ms

| squid | measurement | AB | CD | EF |
|------:|------------:|---------:|---------:|---------:|
| 1 | 0 | 0.983765 | 0.984008 | 0.004351 |
| 1 | 1 | 0.984163 | 0.984424 | 0.004799 |
| 1 | 2 | 0.983991 | 0.984383 | 0.004606 |
| 1 | 3 | 0.983983 | 0.984222 | 0.004568 |
| 1 | 4 | 0.983856 | 0.984048 | 0.004349 |


To list the contents of the cache directory:

In [6]:
ls(data.cache_dire)

['/home/adam/Git/e11_analysis/notebooks/example_data/analysis/cache/log.pkl',
 '/home/adam/Git/e11_analysis/notebooks/example_data/analysis/cache/av_0.df.pkl']
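The listing suggests that a `df()` result cached with `cache="av_0"` is stored as `av_0.df.pkl`, i.e. `[cache].[method].pkl`. The decorator below is a hypothetical sketch of that naming scheme using `pickle`; `cached` and `read_dataset` are illustrative names, and `e11` may implement caching differently. A plain dict stands in for the pandas DataFrame to keep the sketch standard-library only.

```python
import functools
import os
import pickle
import tempfile


def cached(kind):
    """Hypothetical decorator: store a result as <cache_dire>/<cache>.<kind>.pkl
    and reload it on later calls that use the same cache name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(cache_dire, *args, cache=None, **kwargs):
            if cache is None:
                # No cache name given: always recompute.
                return func(*args, **kwargs)
            path = os.path.join(cache_dire, f"{cache}.{kind}.pkl")
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            os.makedirs(cache_dire, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


n_reads = 0


@cached("df")
def read_dataset(squids, name):
    # Stand-in for reading dataset `name` for each squid from the HDF5 file.
    global n_reads
    n_reads += 1
    return {(squid, 0): [0.98, 0.98, 0.004] for squid in squids}


cache_dire = tempfile.mkdtemp()
first = read_dataset(cache_dire, [1, 2], "AV_0", cache="av_0")   # computed, then saved
second = read_dataset(cache_dire, [1, 2], "AV_0", cache="av_0")  # loaded from av_0.df.pkl
print(n_reads, sorted(os.listdir(cache_dire)))  # 1 ['av_0.df.pkl']
```

Including the method name in the file name keeps results from different methods apart even when they share a cache name.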

Another good use of `H5Data.out_dire` is saving files that are related to the analysis, e.g., plots. New folders can be built with `e11.tools.sub_dire`.

In [7]:
out_fil = sub_dire(data.out_dire, 'plots', fil='signal.png')
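Judging from the call above, `sub_dire` appears to create a sub-folder of `out_dire` and return a file path inside it. The function below is a hypothetical stand-in written under that assumption; `make_sub_dire` is an illustrative name, and the real `e11.tools.sub_dire` may behave differently.

```python
import os
import tempfile


def make_sub_dire(base, sub, fil=None):
    """Hypothetical stand-in for e11.tools.sub_dire (behaviour assumed):
    create base/sub if it does not exist and return that folder, or the
    path to `fil` inside it when fil is given."""
    path = os.path.join(base, sub)
    os.makedirs(path, exist_ok=True)
    if fil is None:
        return path
    return os.path.join(path, fil)


out_dire = tempfile.mkdtemp()
out_fil = make_sub_dire(out_dire, "plots", fil="signal.png")
print(os.path.isdir(os.path.join(out_dire, "plots")))  # True
print(os.path.basename(out_fil))                       # signal.png
```

The returned path could then be passed to a plotting call such as matplotlib's `savefig`.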