-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TensorFlow log syncing bug #105
Conversation
AWS master
Ignore Emacs backup files
Hi, Thanks for submitting your pull request. This is awesome. Can you please add unit tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good overall!
Can you add a new unit test file to https://github.com/aws/sagemaker-python-sdk/tree/master/tests/unit with tests on the _sync_directories function? Here are some cases I can think of that would be good to cover:
- to_directory does not exist
- to_directory exists, child dirs within it don't exist
- to_directory exists with the same structure and files as from_directory
And from_directory should have at least one child dir with files in it.
@staticmethod | ||
def _sync_directories(from_directory, to_directory): | ||
"""Sync to_directory with from_directory by copying each file in | ||
to_directory with new contents. Why do this? Because TensorBoard picks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a sentence to the docstring which clarifies that this function will overwrite the contents of existing files in the to_directory, but does not delete the files? This is key because it explains why we implement it in this particular way, rather than using shutil.rmtree / shutil.copytree or similar.
Codecov Report
@@ Coverage Diff @@
## master #105 +/- ##
==========================================
+ Coverage 89.72% 89.74% +0.02%
==========================================
Files 34 34
Lines 2015 2039 +24
==========================================
+ Hits 1808 1830 +22
- Misses 207 209 +2
Continue to review full report at Codecov.
|
Ok I made the docstring more explicit and added unit tests for the _sync_directories method. Sorry for all the failing builds - I'm having trouble with the tox tests on my laptop. I didn't realize the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with you adding the context manager into tensorflow/estimator.py since we are only using it there at the moment; we can consider moving it somewhere else if we ever need it elsewhere.
I'd recommend putting the temporary_directory function as a "private" staticmethod in Tensorboard, just like _sync_directories, to make it clear that it's only intended to be used there at the moment.
I also have one more question below about the filecmp usage in the tests below. These comments are rather nitpicky though; everything looks great! Thanks for adding tests!
tests/unit/test_sync_directories.py
Outdated
for fname in comp.common_files: | ||
left_file = os.path.join(a, fname) | ||
right_file = os.path.join(b, fname) | ||
if not filecmp.cmp(left_file, right_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't dircmp already do a shallow comparison of the files to build the common_files list? Should you be passing in shallow=False here? (or just removing this check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common_files
is just a list of files with the same name (whether or not they're actually the same). But it looks like we can get rid of this by checking len(comp.diff_files)
. I'll make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for figuring this out and getting this in!
No problem - happy to help! |
Image classification notebooks
This is a solution to #26. As soon as
aws s3 sync
adds temp files to the TensorFlow logs directory, TensorBoard stops looking at the correct files for updates. This works around the issue by creating a temporary sync directory foraws s3 sync
, then synchronously copying files to the logs directory after the download finishes.