This notebook shows how to use utility functions from `tfmodels` to convert image/mask pairs into a `tfrecord` dataset for training segmentation networks.

First, the data structure:

In [1]:
!ls ./example_data/*

./example_data/img:
test100.jpg  test102.jpg  test104.jpg  test106.jpg  test108.jpg  test10.jpg
test101.jpg  test103.jpg  test105.jpg  test107.jpg  test109.jpg

./example_data/mask:
test100.png  test102.png  test104.png  test106.png  test108.png  test10.png
test101.png  test103.png  test105.png  test107.png  test109.png
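
The listing suggests that each image has a mask with the same file stem (`test100.jpg` and `test100.png`). Before building the record it can help to confirm that nothing is missing on either side. Here is a minimal sketch using only the standard library. The helper names `stem` and `unpaired_stems` are made up for this notebook and are not part of `tfmodels`, and matching by stem is an assumption based on the listing above, not a statement about how `tfmodels` pairs files internally.

```python
import glob
import os

def stem(path):
    """File name without directory or extension, e.g. './img/test10.jpg' -> 'test10'."""
    return os.path.splitext(os.path.basename(path))[0]

def unpaired_stems(img_paths, mask_paths):
    """Return the sorted stems that appear in only one of the two lists."""
    img_stems = {stem(p) for p in img_paths}
    mask_stems = {stem(p) for p in mask_paths}
    # Symmetric difference: images without a mask plus masks without an image
    return sorted(img_stems ^ mask_stems)

img_paths = glob.glob('./example_data/img/*.jpg')
mask_paths = glob.glob('./example_data/mask/*.png')
print('unpaired:', unpaired_stems(img_paths, mask_paths))
```

An empty list means every image has a mask and vice versa.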


My copy of [`tfmodels`](https://github.com/BioImageInformatics/tfmodels) is in the root of this project, one directory up from this notebook. Check the path to your own copy; it should look something like:

In [3]:
!ls ../tfmodels

assets	experiments  ReadMe.md	tfmodels


Easy enough, we'll import that as a module. `tfmodels` relies on `tensorflow`, `numpy`, and a few other packages, so make sure all the dependencies are installed.

In [4]:
import sys
sys.path.insert(0, '../tfmodels')
import tfmodels
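
If the import fails, a quick way to see which dependencies are missing is to ask `importlib` whether each one can be found, without actually importing it. This is only a sketch: `missing_modules` is a name invented here, and the list covers just the two packages named above, since the full dependency list is not given in this notebook.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Only the two dependencies named in the text; extend the list as needed.
print('missing:', missing_modules(['tensorflow', 'numpy']))
```

An empty list means both packages are importable from the current environment.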

`tfmodels` has a utility for creating `tfrecords` out of image/mask pair examples, since it's something that we do pretty often. We just have to define some paths and a few constants related to the experiment we're going to do:

1. A pattern to pass to `glob` for the images, and another for the masks.
2. The path where the resulting `tfrecord` will be written.
3. The integer number of classes expected in the dataset.

Those are all the required arguments. There are more options, such as automated sub-tiling of the original images or preprocessing applied to the image, the mask, or both, but we don't need them for this.

In [5]:
# Optional: silence some TensorFlow log warnings. Everything runs fine without these lines:
# import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

img_patt = './example_data/img/*.jpg'
mask_patt = './example_data/mask/*.png'
record_path = './example_data/image_mask_pairs.tfrecord'

Now we're ready to call our function:

In [6]:
tfmodels.image_mask_2_tfrecord(img_patt, mask_patt, record_path)

Got 11 source images
Finished writing [./example_data/image_mask_pairs.tfrecord]


Good. Notice that the `tfrecord` is a binary file, and that it's much larger than the sources. This is because we're storing the image data and mask data as uncompressed arrays, while the source JPEG and PNG files are compressed. So, if there are many examples, the record will be large, and it might be worth splitting it into smaller pieces.

In [9]:
!du -sh ./example_data/img/
!du -sh ./example_data/mask/
!ls -lha ./example_data/image_mask_pairs.tfrecord

4.9M	./example_data/img/
260K	./example_data/mask/
-rw-rw-r-- 1 nathan nathan 61M Jul 26 11:52 ./example_data/image_mask_pairs.tfrecord
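
To get a feel for how the record size scales, here is a rough back-of-the-envelope sketch. It assumes one byte per value (`uint8`), a 3-channel image and a 1-channel mask. Those are assumptions for illustration, not something read from the `tfmodels` source, and the 1000 x 1000 size is hypothetical, not the size of the example images. The `shard` helper shows one simple way to split a file list into pieces that could each be written to a separate record; both function names are made up here.

```python
def uncompressed_bytes(height, width, img_channels=3, mask_channels=1, bytes_per_value=1):
    """Approximate raw size of one image/mask pair, ignoring serialization overhead."""
    return height * width * (img_channels + mask_channels) * bytes_per_value

def shard(items, shard_size):
    """Split a list into consecutive chunks of at most shard_size items."""
    return [items[i:i + shard_size] for i in range(0, len(items), shard_size)]

# Hypothetical 1000 x 1000 pairs, 11 of them
per_pair = uncompressed_bytes(1000, 1000)
print('bytes per pair:', per_pair)
print('bytes for 11 pairs:', 11 * per_pair)

# 11 examples in shards of at most 4 examples each
print([len(chunk) for chunk in shard(list(range(11)), 4)])
```

Each chunk of file paths could then be handled on its own, keeping any single record to a manageable size.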
