## Converting HDF5 to LMDB

References:

* https://symas.com/lightning-memory-mapped-database/
* http://deepdish.io/2015/04/28/creating-lmdb-in-python/
* https://gist.github.com/bearpaw/3a07f0e8904ed42f376e
* http://stackoverflow.com/questions/37337523/how-do-you-load-an-lmdb-file-into-tensorflow
* http://research.beenfrog.com/code/2015/12/30/write-read-lmdb-example.html
* https://lmdb.readthedocs.io/en/release/
* http://stackoverflow.com/questions/8855574/convert-ndarray-from-float64-to-integer

Reasons:

* LMDB uses memory-mapped files, so the operating system pages data in on demand, giving much better I/O performance.
* It works with large datasets. A pipeline that reads each HDF5 file entirely into memory (as this notebook does with batch=-1) cannot handle a file that exceeds your memory capacity.

Install: `pip install lmdb`


In [1]:
import lmdb
import h5py
import numpy as np
from driving_data import HandleData

In [2]:
# Load the HDF5 file and fetch the whole training set (batch=-1)
data = HandleData(path='TestData.h5', shuffle=False)
xs, ys = data.LoadTrainBatch(-1, crop_up=0)

Loading training data
Spliting training and validation
Number training images: 752
Number validation images: 188


In [3]:
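# Open the LMDB environment (created as a directory named 'mylmdb').
# map_size is the maximum size, in bytes, that the database may grow to.
env = lmdb.open('mylmdb', map_size=1000000)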
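
`map_size` caps how large the database may grow, and py-lmdb raises `lmdb.MapFullError` once a write would exceed it, so it helps to derive the value from the data instead of guessing. A minimal sketch, assuming fixed-size keys and values; the helper name `estimate_map_size` and the safety factor of 10 are illustrative choices, not part of py-lmdb:

```python
def estimate_map_size(n_entries, key_bytes, value_bytes, safety_factor=10):
    """Rough upper bound (in bytes) for an LMDB map holding fixed-size records.

    The safety factor leaves room for B-tree pages and per-record overhead;
    it is a guess, not a figure taken from the LMDB documentation.
    """
    payload = n_entries * (key_bytes + value_bytes)
    return max(payload * safety_factor, 1 << 20)  # never go below 1 MiB


# 752 training samples, 8-character keys, one 8-byte label each
size = estimate_map_size(n_entries=752, key_bytes=8, value_bytes=8)
print(size)  # 1048576
```

The result would be passed as `lmdb.open('mylmdb', map_size=size)`. Storing the images as well as the labels would need a far larger value.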

In [4]:
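# Get a write transaction; it commits when the with-block exits without error
with env.begin(write=True) as txn:
    # Iterate over the (image, steering) pairs; only the steering label is stored
    for idx, (_, steer) in enumerate(zip(xs, ys)):
        # Zero-padded ASCII keys keep the entries sorted in insertion order
        str_id = '{:08}'.format(idx)
        # Caution: bytes() applied to a NumPy integer n returns n zero bytes,
        # not an encoding of n, so a steering value that truncates to 0 is
        # stored as b''. Serializing with
        # np.asarray(steer[0], dtype=np.float64).tobytes() keeps the value.
        txn.put(str_id.encode('ascii'), bytes(steer[0].astype(np.int64)))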
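
The empty values that appear when this database is read back come from the `bytes(...)` call: applied to an integer (including a NumPy integer scalar), `bytes(n)` returns `n` zero bytes instead of an encoding of `n`. A minimal sketch of the pitfall and of a round trip through `tobytes()` and `np.frombuffer`; the sample steering value 0.25 and the helper names `encode_label` and `decode_label` are made up for illustration:

```python
import numpy as np


def encode_label(value):
    """Serialize one scalar label as 8 raw float64 bytes."""
    return np.asarray(value, dtype=np.float64).tobytes()


def decode_label(raw):
    """Inverse of encode_label: raw bytes back to a Python float."""
    return float(np.frombuffer(raw, dtype=np.float64)[0])


steer = np.array([0.25])  # hypothetical steering label

# Pitfall: the cast truncates 0.25 to 0, and bytes(0) is the empty byte string
broken = bytes(steer[0].astype(np.int64))
print(broken)  # b''

# Round trip that preserves the value
raw = encode_label(steer[0])
print(len(raw), decode_label(raw))  # 8 0.25
```

With this encoding, the write loop would call `txn.put(str_id.encode('ascii'), encode_label(steer[0]))` and the reader would pass each value through `decode_label`. Note that `tobytes()` uses the machine's native byte order; an explicit dtype such as `'<f8'` on both sides makes the stored bytes portable.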

### Reading from LMDB

In [6]:
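# Reopen the environment read-only and walk all entries in key order
env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        # For values written with tobytes(), decode with the same dtype:
        # print(key, np.frombuffer(value, dtype=np.float64))
        print(key, value)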

b'00000000' b''
b'00000001' b''
b'00000002' b''
b'00000003' b''
b'00000004' b''
b'00000005' b''
b'00000006' b''
b'00000007' b''
b'00000008' b''
b'00000009' b''
b'00000010' b''
b'00000011' b''
b'00000012' b''
b'00000013' b''
b'00000014' b''
b'00000015' b''
b'00000016' b''
b'00000017' b''
b'00000018' b''
b'00000019' b''
b'00000020' b''
b'00000021' b''
b'00000022' b''
b'00000023' b''
b'00000024' b''
b'00000025' b''
b'00000026' b''
b'00000027' b''
b'00000028' b''
b'00000029' b''
b'00000030' b''
b'00000031' b''
b'00000032' b''
b'00000033' b''
b'00000034' b''
b'00000035' b''
b'00000036' b''
b'00000037' b''
b'00000038' b''
b'00000039' b''
b'00000040' b''
b'00000041' b''
b'00000042' b''
b'00000043' b''
b'00000044' b''
b'00000045' b''
b'00000046' b''
b'00000047' b''
b'00000048' b''
b'00000049' b''
b'00000050' b''
b'00000051' b''
b'00000052' b''
b'00000053' b''
b'00000054' b''
b'00000055' b''
b'00000056' b''
b'00000057' b''
b'00000058' b''
b'00000059' b''
b'00000060' b''
b'00000061' b''
b'000000