-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 FileObserver #542
S3 FileObserver #542
Conversation
I'm currently pretty puzzled at why one of the tests fails in the AppVeyor builds, despite passing in all the Linux ones. I have a Mac myself, and am not sure the best way to mock up a windows machine for debugging, since it doesn't seem like AppVeyor has a "debug build" option available. If anyone has any suggestions or best practices for how to reproduce this in a context where I can get a terminal and try to understand why it's happening, that'd be much appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice and clean PR! All issues are rather minor things. Please also add a section in the docs describing the observer https://github.com/IDSIA/sacred/blob/master/docs/observers.rst
|
||
@mock_s3 | ||
def test_raises_error_on_duplicate_id_directory(observer, sample_run): | ||
observer.started_event(**sample_run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be dependent on the order of test execution. It would be better to create a non failing run first and then try to create a second run with the same id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by why this would depend on order of test execution; I just tested changing the test name such that it runs either first or last in alphabetical test execution order, and it passed in both cases. I believe that the "create a non failing run and then create a second with the same ID" is what the test is currently doing, so I don't understand your comment well enough to respond to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have been a logic error on my side. For the file storage observer, this code would depend on the started_event
being the first executed in the directory to actually get the id 1. In this case mock_s3
probably provides a fresh mocked bucket for every test, so never mind.
Please only remove the z
from the test name.
sacred/observers/s3_observer.py
Outdated
# cloudtrail-s3-bucket-naming-requirements.html | ||
if len(bucket_name) < 3 or len(bucket_name) > 63: | ||
return False | ||
if '..' in bucket_name or '.-' in bucket_name or '-.' in bucket_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if '..' in bucket_name or '.-' in bucket_name or '-.' in bucket_name: | |
forbidden_words = {"..", ".-", "-.", "_"} | |
if any(word in bucket_name for word in forbidden_words): |
I believe the "cannot contain underscores" requirement was missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the "cannot contain underscores" requirement was handled by there being an explicit whitelist of allowed character types (lowercase characters, digits, dashes), where underscores don't fall into any of those whitelisted categories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I think the old code was incorrect in that it didn't have periods as one of the whitelisted characters. I've refactored this to have the notion of "labels", which are distinct alphanumeric chunks separated by periods, and I think this makes things cleaner because it matches the abstraction that AWS itself uses
sacred/observers/s3_observer.py
Outdated
return True | ||
|
||
|
||
class S3FileObserver(RunObserver): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring to the class and the create
method.
Regarding the question of inheriting from the file storage observer: There seems indeed to be a fair amount of code duplication, so having a base class that abstracts over object store like things would make sense. However, this could also be done in a separate PR if you do not want to do it now. Regarding AppVeyor I am also confused. Strange things tend to happen on windows. Unfortunately I also do not have a windows machine at my disposal, but I will let you know if I find something out. |
…cket exists method
…on is not either passed in or set in config file
Had some fun with black and py35 checks fighting about trailing commas after kwargs (see this bug: psf/black#759), for a long while black was complaining about things that if I fixed would induce py35 bugs), but now have all checks passing, and re-requesting review. |
@JarnoRFB I don't seem to have the button available that would let me officially re-request a review through Github's interface |
@decodyng thanks for updating! I take a look.
Yes I also had trouble with it, but the bug should be fixed in master (see psf/black#763). And this is also what the preconfigured pre-commit hook should use. Did you install black yourself or did you only use the pre-commit hook? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, except for the few minor comments I left! Looking forward to merging this.
|
||
@mock_s3 | ||
def test_raises_error_on_duplicate_id_directory(observer, sample_run): | ||
observer.started_event(**sample_run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have been a logic error on my side. For the file storage observer, this code would depend on the started_event
being the first executed in the directory to actually get the id 1. In this case mock_s3
probably provides a fresh mocked bucket for every test, so never mind.
Please only remove the z
from the test name.
docs/observers.rst
Outdated
.. code-block:: python | ||
|
||
from sacred.observers import S3Observer | ||
ex.observers.append(S3Observer.create(bucket='my-awesome-bucket', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please then also update the docs to use __init__
instead of create
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slightly tangential question: I notice as I'm editing the docs that they don't seem to presently be up to date for the existing observers (i.e. the current master docs say "Sacred ships with four observers" and then has a list that doesn't include the Slack, Telegram, or Neptune observers. I feel slightly weird about continuing the error to say "Sacred ships with five observers", but also don't feel conversant enough about how the Slack/Telegram/Neptune observers work to add the summary documentation for them, so by default I'll just up the number to five and add the S3Observer to the list, but just flagging this as something I wanted to make sure was intentional on the part of the maintainers.
Oh, I didn't run into info about installing the pre-commit hook, though possibly I just wasn't looking in the right place (possibly you're only shown contribution procedures when you first are submitting a PR, and it seems like Black was added in the last two weeks, so it wouldn't have been there when I opened the PR?) I do remember running black with py35 as a command line argument, but possibly I still did something wrong there; I'm not that familiar with black in general. ETA: Found it now; my bad for not looking in the reasonable place for it earlier! Is there a way to run the commit hook logic on code that existed before a commit? I just was able to push a commit where the current state of the code failed the black test, even though the error was not introduced by that particular commit. Also, I'm still a bit confused, because I've tried running black locally with both |
I believe we currently do not show contributing information to anybody, but it would be definitely a good idea to do so.
Yes
When you invoke black directly, you probably still use the stable version 19.3b0, which has this bug of adding commas after kwargs. This is unfortunately only fixed in the unreleased master branch of black.
Yes |
Ok looks good to me! Thanks for your first and valuable sacred contribution 🎉 and sorry for the trouble with black, we need to make the precommit procedure more explicit. |
Creates a S3FileObserver that uses boto3 to automatically sync all run file data to a S3 location. Doesn't handle any credentials or permissions itself, so requires the user to have gone through
aws configure
on command line, or to have in some other fashion generated~/.aws/config
and~/.aws/credentials
files with Access Key and Secret Key information. Implementsrun._id
iteration and safety checks on overriding run files in a fashion meant to mimic how FileStorageObserver does it. Adds a feature requested in #308 .Some Notes: