# PyLMDB Creator
This notebook gives a brief overview of the capabilities of the PyLMDB Creator. With this module you can insert any 4D NumPy ndarray of type uint8 into LMDBs, which can then be used in any standard Caffe proto file. This removes the need to first save the images to disk and then run the convert_imageset tool.

## Creating an LMDB from a list of ndarrays
The code snippets below show how to create an LMDB when all ndarrays to be inserted are already in memory.
For the sake of example, we create a number of random ndarrays and corresponding labels, which we can then insert.

In [None]:
import numpy as np
from caffe_lmdb.lmdb_creator import LMDBCreator

n_dummy_images = 10000
dummy_data = [np.random.randint(0, 256, (1, 224, 224), dtype=np.uint8)
              for _ in range(n_dummy_images)]
labels = list(range(n_dummy_images))

Now we define where to create the LMDB and put all images from the list into the database.

In [None]:
lmdb_path = '/tmp/batch_lmdb'
lmdb_creator = LMDBCreator()
lmdb_creator.create_single_lmdb_from_ndarray_batch(array_list=dummy_data, labels_list=labels, 
                                                   lmdb_path=lmdb_path, max_lmdb_size=1024**3)
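Judging by its name, `max_lmdb_size` is the upper bound on the database size in bytes (1024**3 is 1 GiB). A quick way to check that this bound is large enough is to estimate the raw pixel payload up front. The helper below is a hypothetical illustration, not part of `caffe_lmdb`, and it ignores the per-entry Datum/protobuf and LMDB page overhead, so leave some headroom.

```python
def raw_payload_bytes(n_images, channels, height, width, bytes_per_value=1):
    """Return the raw pixel bytes for n_images arrays of shape
    (channels, height, width); uint8 uses 1 byte per value.

    Hypothetical helper: Datum/protobuf and LMDB overhead are not included.
    """
    return n_images * channels * height * width * bytes_per_value


payload = raw_payload_bytes(10000, 1, 224, 224)
max_lmdb_size = 1024 ** 3  # the value passed in the cell above

print(payload)                        # 501760000
print(round(payload / 1024 ** 2, 1))  # 478.5 (MiB)
print(payload < max_lmdb_size)        # True
```

The same figure is also roughly how much memory `dummy_data` occupies, which is worth keeping in mind for the next section.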

That's it: the LMDB should now exist at the specified path. The LMDBCreator class reports its progress through Python's standard logging module instead of printing, so if you have configured logging, you should see some information from LMDBCreator.
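If you have not configured logging yet, a minimal setup with the standard library is enough. This sketch assumes that LMDBCreator logs at INFO level through a logger that propagates to the root logger; the exact logger name it uses is not stated here. `force=True` requires Python 3.8 or later and replaces any handlers already attached to the root logger; drop it on older versions.

```python
import logging

# Route records from all propagating loggers (presumably including the one
# LMDBCreator uses) to stderr at INFO level and above.
# force=True (Python 3.8+) replaces handlers already on the root logger.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

logging.getLogger(__name__).info("logging configured")
```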

## Dynamically create an LMDB
Sometimes it is not possible to hold all the data in memory before creating the LMDB, for example when dealing with large datasets such as ImageNet or when heavily augmenting smaller datasets.
The LMDBCreator class can also create an LMDB in an online fashion, one array at a time.

In [None]:
import numpy as np
from caffe_lmdb.lmdb_creator import LMDBCreator

n_dummy_images = 10000
lmdb_path = '/tmp/online_single_lmdb'

# get an LMDBCreator object and prepare for online writing
lmdb_creator = LMDBCreator()
lmdb_creator.open_single_lmdb_for_write(lmdb_path=lmdb_path)

# insert the images
for label in range(n_dummy_images):
    # we create a dummy image here, but this is where augmentation
    # or loading could happen
    dummy_mat = np.random.randint(0, 256, (1, 224, 224), dtype=np.uint8)
    # put the image into the LMDB
    lmdb_creator.put_single(img_mat=dummy_mat, label=label)
    
# wrap up LMDB creation
lmdb_creator.finish_creation()

Again, there should be some logging output if you have configured logging.
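When loading or augmentation can fail halfway through, it may help to guarantee that `finish_creation()` is always called. The context manager below is a hypothetical wrapper around the three LMDBCreator methods used in the online example above; whether finishing after a partial write leaves a usable LMDB is an assumption about LMDBCreator, not something stated here.

```python
from contextlib import contextmanager


@contextmanager
def single_lmdb_writer(creator, lmdb_path):
    """Open a single LMDB for online writing and always finish it.

    Hypothetical helper: `creator` must provide open_single_lmdb_for_write()
    and finish_creation(), as LMDBCreator does in the cell above.
    """
    creator.open_single_lmdb_for_write(lmdb_path=lmdb_path)
    try:
        yield creator
    finally:
        # Runs both on normal exit and when the body raises an exception.
        creator.finish_creation()


# Example usage with the objects from the cell above (not run here):
# with single_lmdb_writer(LMDBCreator(), lmdb_path) as creator:
#     for label in range(n_dummy_images):
#         dummy_mat = np.random.randint(0, 256, (1, 224, 224), dtype=np.uint8)
#         creator.put_single(img_mat=dummy_mat, label=label)
```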

## Create dual LMDB
A Caffe Datum only stores a single integer label; it cannot hold labels in the form of vectors or matrices. Such labels, however, are necessary for tasks like multi-label classification or semantic segmentation. In Caffe, one usually has to create two LMDBs and then reference both in the network's proto file.
The LMDBCreator can wrap the creation of two LMDBs. The following snippet creates two LMDBs in which each image in the first has a corresponding label with the same key in the second.
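For multi-label classification, the array stored in the second LMDB is typically a multi-hot vector: entry i is 1 if class i is present. A minimal stdlib sketch follows; the helper name is hypothetical and the number of classes is up to your dataset.

```python
def multi_hot(class_indices, num_classes):
    """Return a multi-hot label as a list of 0/1 values of length num_classes.

    Entry i is 1 if class i is present in the image.
    """
    vec = [0] * num_classes
    for idx in class_indices:
        if not 0 <= idx < num_classes:
            raise ValueError("class index %d out of range" % idx)
        vec[idx] = 1
    return vec


print(multi_hot([2, 5], 6))  # [0, 0, 1, 0, 0, 1]
```

To pass such a vector as `additional_mat`, you could convert it with `np.array(vec, dtype=np.uint8).reshape(num_classes, 1, 1)`, assuming `put_dual` accepts any 3D uint8 array like the ones used in the examples.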

In [None]:
import numpy as np
from caffe_lmdb.lmdb_creator import LMDBCreator

n_dummy_images = 10000
image_lmdb_path = '/tmp/image_lmdb'
label_lmdb_path = '/tmp/label_lmdb'

# get an LMDBCreator object and prepare for online writing
lmdb_creator = LMDBCreator()
lmdb_creator.open_dual_lmdb_for_write(lmdb_path=image_lmdb_path,
                                      additional_path=label_lmdb_path)

# insert the images
for label in range(n_dummy_images):
    # again, we could do the augmentation or loading of images here
    img_mat = np.random.randint(0, 256, (1, 224, 224), dtype=np.uint8)
    label_mat = np.random.randint(0, 256, (1, 224, 224), dtype=np.uint8)

    # put the arrays into the databases
    lmdb_creator.put_dual(img_mat=img_mat,
                          additional_mat=label_mat,
                          label=label)
    
# wrap up LMDB creation
lmdb_creator.finish_creation()
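In the proto file, the two LMDBs are usually read by two Caffe `Data` layers. With a single `top`, a `Data` layer outputs only the Datum's array, so the label LMDB's layer yields the label matrices. The helper below just renders such layers as prototxt text; the helper itself, the layer names, and the batch size of 32 are illustrative assumptions. Because each layer reads its LMDB sequentially in key order, both must use the same batch size, and random `mirror`/`crop_size` transforms should be avoided since each layer would apply them independently.

```python
def data_layer(name, top, source, batch_size, phase="TRAIN"):
    """Render a Caffe Data layer with the LMDB backend in prototxt format."""
    return (
        'layer {\n'
        '  name: "%s"\n'
        '  type: "Data"\n'
        '  top: "%s"\n'
        '  include { phase: %s }\n'
        '  data_param {\n'
        '    source: "%s"\n'
        '    batch_size: %d\n'
        '    backend: LMDB\n'
        '  }\n'
        '}\n' % (name, top, phase, source, batch_size)
    )


# Same batch size for both layers keeps images and labels aligned.
print(data_layer("data", "data", "/tmp/image_lmdb", 32))
print(data_layer("label", "label", "/tmp/label_lmdb", 32))
```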

## Create shuffled LMDBs
If you want to shuffle your data, there are two ways to do so. The most obvious is to shuffle the order in which you insert the arrays, but this can quickly become cumbersome. The easiest way to shuffle the data in your LMDB is to supply your own keys. The only thing you need to know in advance is how many Datum objects you plan to insert.
The following snippet shows how to insert Caffe Datum objects at random database positions.
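The key scheme can be sketched without any database: draw a random permutation of the positions 0..n-1 and use each position as a zero-padded key prefix, keeping the original index as a suffix, in the same `'%08d_%d'` shape as the keys built in the next cell. This is why the number of items must be known up front. The helper name is hypothetical.

```python
import random


def make_shuffled_keys(n_items, seed=None):
    """Return one key per item; key i starts with a unique random position.

    The zero-padded prefix decides where item i ends up when the keys are
    sorted; the suffix keeps the original index.
    """
    positions = list(range(n_items))
    random.Random(seed).shuffle(positions)
    return ['%08d_%d' % (positions[i], i) for i in range(n_items)]


keys = make_shuffled_keys(5, seed=0)
# Order in which the items would be read back from a key-sorted database.
read_order = [int(k.split('_')[1]) for k in sorted(keys)]
print(read_order)
```

With an 8-digit prefix this scheme covers up to 10^8 entries; widen the padding for more.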

In [None]:
import numpy as np
from caffe_lmdb.lmdb_creator import LMDBCreator

n_dummy_images = 10000
# separate paths so the LMDBs from the previous example are not mixed in
image_lmdb_path = '/tmp/shuffled_image_lmdb'
label_lmdb_path = '/tmp/shuffled_label_lmdb'

# create random order to insert
rand_indices = np.arange(n_dummy_images)
np.random.shuffle(rand_indices)

# get an LMDBCreator object and prepare for online writing
lmdb_creator = LMDBCreator()
lmdb_creator.open_dual_lmdb_for_write(lmdb_path=image_lmdb_path,
                                      additional_path=label_lmdb_path)

# insert the images
# idx always equals label here; enumerate is only used to make
# the role of each index explicit
for idx, label in enumerate(range(n_dummy_images)):
    img_mat = np.random.randint(0,256, (1, 224,224), dtype=np.uint8)
    label_mat = np.random.randint(0,256, (1, 224,224), dtype=np.uint8)
    
    # create a key based on the random index from above
    key='%s_%d' % (str(rand_indices[idx]).zfill(8), label)
    
    # put the arrays into the databases
    lmdb_creator.put_dual(img_mat=img_mat,
                          additional_mat=label_mat,
                          label=label,
                          key=key)
    
# wrap up LMDB creation
lmdb_creator.finish_creation()

Because any net reading from an LMDB iterates over the database in sorted (lexicographic) key order, the Datum objects are now effectively stored in random order.
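LMDB orders keys byte-wise, which for ASCII digit strings is the same as Python's string sort. The short check below shows why the keys above are zero-padded with `zfill(8)`: without padding, "10" sorts before "2".

```python
indices = (2, 10, 1)
unpadded = [str(i) for i in indices]
padded = [str(i).zfill(8) for i in indices]

print(sorted(unpadded))  # ['1', '10', '2']
print(sorted(padded))    # ['00000001', '00000002', '00000010']
```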