Allow LabelMaker to return TFRecords from NPZ #150

martham93 · 2019-10-28T20:15:04Z

@wronk and I think it will be important for the ffda-poi work as well as for future label-maker users to have an optional step to convert the NPZ files generated from label-maker package.

To address this, we can open a PR that would allow for an optional step 6 in the label maker workflow, and could work as a cli command label-maker tfrecords, to write out a.tfrecords file for train, test, and if applicable, val, using numpy array representations of the images and labels from label-maker package.

Initially this PR will just address adding tfrecords for classification data to stay within the ffda-poi scope. I do see that there is already some existing work around tfrecords creation for object detection data https://github.com/developmentseed/label-maker/blob/master/examples/utils/tf_records_generation.py.

The text was updated successfully, but these errors were encountered:

martham93 · 2019-10-28T22:55:35Z

the implementation I'm currently working on does require tensorflow2, and tensorflow2 has some conflicts with label-maker's requirements.txt, would it be okay if to upgrade these to work with tf2 or do we want to avoid having tensorflow as a dependency?

Happy to approach this a different way, if people have thoughts of something else to try.

cc @Geoyi @wronk @drewbo

wronk · 2019-10-29T18:39:30Z

@martham93, thinking more about this, I'm not sure we want to add TF as a dependency. I think the options are:

Code this functionality up separately and link to a public example in label-maker's documentation. However, don't explicitly support it in this codebase.
Include the code in label-maker, make requirements updates (maybe in a separate PR), and make the TF functionality optional. We can do this by only importing TF imports within your data->TFRecord file and only include tf with the requirements-dev.txt file.

Feel free to continue coding this separately and we can drop in the functionality later depending on what we decide to do.

@drewbo, @Geoyi

martham93 · 2019-10-30T20:43:19Z

Right now code to convert data.npz is in the ffda-poi repo. I can work on making a public example (once I've made more progress on other ffda-poi work) and link to label-maker's documentation, to move forward with option 1 presented above.

A few things to note:

Converting a ~1.8GB data.npz file takes about 3.5 hours to run. The corresponding .tfrecords file will be ~1GB.
kudos to @wronk for figuring out that the image array needs to be converted to a tensor, then run through tf.image.encode_png function to reduce the size before converting to a tfrecord without this step the data size is huge!

cc @Geoyi

martham93 added the enhancement New feature or request label Oct 28, 2019

martham93 self-assigned this Oct 28, 2019

martham93 closed this as completed Nov 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow LabelMaker to return TFRecords from NPZ #150

Allow LabelMaker to return TFRecords from NPZ #150

martham93 commented Oct 28, 2019

martham93 commented Oct 28, 2019 •

edited

Loading

wronk commented Oct 29, 2019

martham93 commented Oct 30, 2019 •

edited

Loading

Allow LabelMaker to return TFRecords from NPZ #150

Allow LabelMaker to return TFRecords from NPZ #150

Comments

martham93 commented Oct 28, 2019

martham93 commented Oct 28, 2019 • edited Loading

wronk commented Oct 29, 2019

martham93 commented Oct 30, 2019 • edited Loading

martham93 commented Oct 28, 2019 •

edited

Loading

martham93 commented Oct 30, 2019 •

edited

Loading