
Saving progress for datasets #30

Closed
shahbuland opened this issue Oct 14, 2022 · 3 comments

Comments

@shahbuland
Collaborator

shahbuland commented Oct 14, 2022

Saving progress for datasets, namely IterablePipelines, is currently a bit clunky. The output dataset is agnostic of progress/location in the source. For the source iterator being read from, all that is really saved is an index into the source dataset. To resume, we currently just call next on the iterator repeatedly until we are back at the saved index. Leaving a note here to revisit this later, as it might have unforeseen consequences at scale.
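For reference, a minimal sketch of the "save an index, then skip forward with next" approach described above. This is not the project's actual code: `save_progress`, `load_progress`, `resume`, and the JSON file layout are hypothetical names chosen for illustration, and the source is assumed to be any plain Python iterable.

```python
import itertools
import json
import os
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def save_progress(path: str, index: int) -> None:
    """Persist the source index (hypothetical helper, not the project's API)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"index": index}, f)
    # Rename over the old file so a crash mid-write leaves the previous state intact.
    os.replace(tmp, path)


def load_progress(path: str) -> int:
    """Return the saved index, or 0 if nothing has been saved yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["index"]


def resume(source: Iterable[T], index: int) -> Iterator[T]:
    """Return an iterator over `source` positioned at `index`.

    This is the naive approach: it consumes and discards `index` items,
    so the cost of resuming grows linearly with how far the run had progressed.
    """
    it = iter(source)
    # Standard itertools "consume" recipe: advance `it` by `index` items.
    next(itertools.islice(it, index, index), None)
    return it
```

The linear skip is the part most likely to hurt at scale: every restart re-reads everything before the saved index. A source that supports random access or its own checkpointing would avoid that, but whether the pipelines here can rely on such a source is an open question.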

@shahbuland
Collaborator Author

#31 partially addresses this. It needs more debugging to ensure it is consistent and fault tolerant across all pipelines.

@shahbuland
Collaborator Author

  • Need to add saving for client statistics

@shahbuland
Collaborator Author

Solved with #37
