[WIP] Keep track of loaded shards #1826
Conversation
@@ -706,6 +706,32 @@ def __iter__(self):
        return


class Tracker(object):
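The diff above only shows the first line of the new `Tracker` class. A minimal sketch of what a shard-progress tracker along these lines might look like (the method names and attributes here are assumptions for illustration, not the PR's actual implementation):

```python
class DataProgressTracker(object):
    """Records which dataset shard was most recently loaded, so that
    training can resume from (approximately) the same point in the data."""

    def __init__(self):
        self.last_shard = 0

    def update(self, shard_index):
        # Called each time a new shard is opened during iteration.
        self.last_shard = shard_index

    def state_dict(self):
        # Serialisable form, suitable for saving inside a checkpoint.
        return {"last_shard": self.last_shard}

    def load_state_dict(self, state):
        # Restore progress from a previously saved checkpoint.
        self.last_shard = state.get("last_shard", 0)
```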
Perhaps Tracker is a bit of a vague name? Suggestions: DataTracker or DataProgressTracker.
if tracker_queue is not None:
    data_tracker = tracker_queue.get()
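For context on the pattern the snippet uses: a locally initialised tracker is replaced by one pulled off a queue when available. A minimal self-contained sketch of that hand-off, with `queue.Queue` standing in for whatever inter-process queue the PR actually uses (all names here are assumptions):

```python
import queue

# Locally initialised default, as in the code above.
data_tracker = {"last_shard": 0}

# In the PR, this queue would be filled by another process (e.g. a data
# loader) sending back up-to-date tracker state.
tracker_queue = queue.Queue()
tracker_queue.put({"last_shard": 7})

if tracker_queue is not None and not tracker_queue.empty():
    # The queued state supersedes the default-initialised tracker.
    data_tracker = tracker_queue.get()

print(data_tracker["last_shard"])  # 7
```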
Doesn't this conflict with the data_tracker initialised above? (not sure) In which case would a data_tracker not be initialised locally but only be available in the queue?
Even though I am not entirely sure how this works, it seems good to me. You seem to track the latest shard that has been used during training and also save it in the checkpoints. When continuing training, the latest shard can then be used. Smart! I wrote some questions inline, but those are not critiques, rather questions to better understand what is going on.
This would allow restarting from (approximately) the same point in the data when continuing some training, instead of restarting from shard 0 each time.
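To make the resume behaviour described above concrete, here is a hedged sketch of saving the last-used shard index in a checkpoint and skipping already-consumed shards on restart. The checkpoint layout and helper names are assumptions for illustration, not the PR's actual format:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, model_state, last_shard):
    # Store the shard index alongside the model weights.
    with open(path, "wb") as f:
        pickle.dump({"model": model_state, "last_shard": last_shard}, f)

def resume_shards(path, shards):
    # On restart, begin from the recorded shard rather than shards[0].
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    start = ckpt.get("last_shard", 0)
    return shards[start:]

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint(path, {"w": [0.1]}, last_shard=2)
print(resume_shards(path, ["shard0", "shard1", "shard2", "shard3"]))
# ['shard2', 'shard3']
```

Note that this is only approximate resumption: progress within the current shard is lost, so training restarts at the beginning of the last recorded shard.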