Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow LabelMaker to return TFRecords from NPZ #150

Closed
martham93 opened this issue Oct 28, 2019 · 3 comments
Closed

Allow LabelMaker to return TFRecords from NPZ #150

martham93 opened this issue Oct 28, 2019 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@martham93
Copy link
Contributor

@wronk and I think it will be important for the ffda-poi work as well as for future label-maker users to have an optional step to convert the NPZ files generated from label-maker package.

To address this, we can open a PR that would allow for an optional step 6 in the label maker workflow, and could work as a cli command label-maker tfrecords, to write out a.tfrecords file for train, test, and if applicable, val, using numpy array representations of the images and labels from label-maker package.

Initially this PR will just address adding tfrecords for classification data to stay within the ffda-poi scope. I do see that there is already some existing work around tfrecords creation for object detection data https://github.com/developmentseed/label-maker/blob/master/examples/utils/tf_records_generation.py.

@martham93 martham93 added the enhancement New feature or request label Oct 28, 2019
@martham93 martham93 self-assigned this Oct 28, 2019
@martham93
Copy link
Contributor Author

martham93 commented Oct 28, 2019

the implementation I'm currently working on does require tensorflow2, and tensorflow2 has some conflicts with label-maker's requirements.txt, would it be okay if to upgrade these to work with tf2 or do we want to avoid having tensorflow as a dependency?

Happy to approach this a different way, if people have thoughts of something else to try.

cc @Geoyi @wronk @drewbo

@wronk
Copy link
Contributor

wronk commented Oct 29, 2019

@martham93, thinking more about this, I'm not sure we want to add TF as a dependency. I think the options are:

  1. Code this functionality up separately and link to a public example in label-maker's documentation. However, don't explicitly support it in this codebase.
  2. Include the code in label-maker, make requirements updates (maybe in a separate PR), and make the TF functionality optional. We can do this by only importing TF imports within your data->TFRecord file and only include tf with the requirements-dev.txt file.

Feel free to continue coding this separately and we can drop in the functionality later depending on what we decide to do.

@drewbo, @Geoyi

@martham93
Copy link
Contributor Author

martham93 commented Oct 30, 2019

Right now code to convert data.npz is in the ffda-poi repo. I can work on making a public example (once I've made more progress on other ffda-poi work) and link to label-maker's documentation, to move forward with option 1 presented above.

A few things to note:

  • Converting a ~1.8GB data.npz file takes about 3.5 hours to run. The corresponding .tfrecords file will be ~1GB.

  • kudos to @wronk for figuring out that the image array needs to be converted to a tensor, then run through tf.image.encode_png function to reduce the size before converting to a tfrecord without this step the data size is huge!

cc @Geoyi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants