-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Training: cache feature vectors for 2nd epoch onward #80
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The code may be ugly so far, but the results sure aren't... (remember, each subsequent epoch used to be as long as epoch 1) |
Merged
This was referenced Mar 11, 2024
…otely. OOP design can be better, but putting off a larger redesign for now.
…mp dir location, specify a temp dir location, or don't cache.
StephenChan
force-pushed
the
training-cache-features
branch
from
March 16, 2024 05:54
2f8c044
to
d007700
Compare
Also revise TrainClassifierMsg comments for the new optional arg.
Ready for review. Recent changes:
Still want to refactor adjacent classes/methods for better OOP design, but not urgent and will leave that for another time. |
StephenChan
changed the title
Training: cache feature vectors for 2nd epoch onward (very WIP)
Training: cache feature vectors for 2nd epoch onward
Mar 16, 2024
yeelauren
approved these changes
Mar 26, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Cache feature vectors in the local filesystem if they were loaded from remote storage (S3 or URL).
I think this will do what it intends to, but I couldn't figure out a way to do it without committing OOP crimes somewhere (
data_classes.ImageFeatures.load()
in this case). I think the better way to do this would involve passing aStorage
to TrainClassifierMsg instead of aDataLocation
, and putting the temporary directory attribute onto that Storage. That probably involves a much larger refactor overall where the designs ofStorage
,DataLocation
, and perhapsDataClass
are reworked. I have ideas for that but I don't think I'm up for that for the remainder of the month.Also want to make the caching optional (but on by default), so that it can be turned off in case filesystem space is a concern, as it may be for coralnet's largest sources (particularly since older sources have feature vectors about 8x in filesize).
Let me know if it'd be useful to merge this PR in the short term though.