TFRecords improvement #232

Merged 4 commits on Mar 14, 2024

Conversation

RukuangHuang (Collaborator) commented Mar 14, 2024

Closes #230.

Several improvements related to the use of TFRecords.

  • Added an overwrite argument that controls whether the tfrecord files are overwritten when the tfrecord config is not the same. This solves issue #181 (TFRecord dataset not being updated).
  • Added a method to save tfrecord files and a function to load tfrecord files as a TF Dataset object, which can be passed directly to model.fit or model.predict. This solves issue #230 (Save/load TFRecord Datasets).
  • This PR also fixes some minor bugs.

To save a TFRecord dataset:

from osl_dynamics.data import Data

data = Data(...)
data.save_tfrecord_dataset("path_to_directory")

Note that if overwrite=False and a previous tfrecord_config.pkl exists in the directory, the files won't be rewritten.
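The overwrite behaviour described above can be sketched in plain Python. This is only an illustration of the decision logic as stated in this PR description, not the osl-dynamics implementation: the helper names `should_rewrite` and `save_config`, the config keys, and the assumption that an unchanged config never triggers a rewrite are all hypothetical. Only the file name tfrecord_config.pkl comes from the text.

```python
import pickle
import tempfile
from pathlib import Path

CONFIG_NAME = "tfrecord_config.pkl"  # file name mentioned in the PR description


def should_rewrite(directory, new_config, overwrite=False):
    """Hypothetical helper: decide whether the TFRecord files need writing."""
    config_path = Path(directory) / CONFIG_NAME
    if not config_path.exists():
        return True  # nothing has been saved yet, so write the files
    with open(config_path, "rb") as f:
        old_config = pickle.load(f)
    if old_config == new_config:
        return False  # assumed: identical config means the files are reused
    # Config differs: only rewrite when the caller asked for it
    return overwrite


def save_config(directory, config):
    """Hypothetical helper: store the config next to the TFRecord files."""
    with open(Path(directory) / CONFIG_NAME, "wb") as f:
        pickle.dump(config, f)


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative config values, not the real osl-dynamics config keys
    config_a = {"sequence_length": 200, "n_channels": 10}
    config_b = {"sequence_length": 400, "n_channels": 10}

    first_write = should_rewrite(tmp, config_a)
    save_config(tmp, config_a)

    same_config = should_rewrite(tmp, config_a, overwrite=True)
    changed_keep = should_rewrite(tmp, config_b, overwrite=False)
    changed_overwrite = should_rewrite(tmp, config_b, overwrite=True)
```

With overwrite=False, a changed config leaves the old files in place, which is the stale-dataset situation the overwrite argument is meant to give control over.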

To load a TFRecord dataset:

from osl_dynamics.data import load_tfrecord_dataset

dataset = load_tfrecord_dataset("path_to_directory", batch_size=batch_size)

# Then it can be passed directly to train a model
model.fit(dataset)

RukuangHuang added the enhancement label (New feature or request) Mar 14, 2024
This was linked to issues Mar 14, 2024
cgohil8 (Collaborator) commented Mar 14, 2024

This looks great!

cgohil8 merged commit d7e11ae into main Mar 14, 2024
1 check passed
cgohil8 deleted the tfrecords branch March 14, 2024 16:10
Labels: enhancement (New feature or request)
Linked issues closed by this pull request: Save/load TFRecord Datasets (#230), TFRecord dataset not being updated (#181)
2 participants