S3 FileObserver #542

Merged
merged 47 commits into IDSIA:master from HumanCompatibleAI:s3_observer on Aug 22, 2019
Conversation

decodyng
Contributor

Creates an S3FileObserver that uses boto3 to automatically sync all run file data to an S3 location. It doesn't handle any credentials or permissions itself, so it requires the user to have run aws configure on the command line, or to have in some other fashion generated ~/.aws/config and ~/.aws/credentials files with Access Key and Secret Key information. Implements run._id iteration and safety checks on overwriting run files in a fashion meant to mimic how FileStorageObserver does it. Adds a feature requested in #308.
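For readers less familiar with boto3, here is a minimal sketch of the kind of upload call such an observer might make; the helper name is made up for illustration and is not the PR's actual code:

```python
import boto3


def put_file_to_s3(bucket, key, local_path):
    # boto3 resolves credentials from ~/.aws/credentials and ~/.aws/config
    # (e.g. after running `aws configure`); nothing is passed explicitly here.
    s3 = boto3.resource("s3")
    with open(local_path, "rb") as f:
        s3.Object(bucket, key).put(Body=f)
```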

Some Notes:

  • Moto is used to mock S3 for testing purposes, which is why it's needed as a dependency
  • The addition of google_compute_engine as a dependency is in response to this weird Travis/Google images issue: "No module named google_compute_engine" when importing python boto 2 on new Trusty images, travis-ci/travis-ci#7940
  • I considered making S3FileObserver inherit from FileStorageObserver, since a number of methods (everything between save_cout and log_metrics) are direct copies from FSO and didn't need to be modified. It seemed like it might add more complexity than it solved, but I'm definitely open to the idea if you prefer it
  • I added the tests that made the most sense to me, but am definitely open to adding more

@coveralls

coveralls commented Jul 30, 2019

Coverage Status

Coverage decreased (-0.4%) to 85.041% when pulling a8c3f59 on HumanCompatibleAI:s3_observer into 6df989e on IDSIA:master.

@decodyng
Contributor Author

I'm currently pretty puzzled about why one of the tests fails in the AppVeyor builds, despite passing in all the Linux ones. I have a Mac myself, and am not sure of the best way to mock up a Windows machine for debugging, since it doesn't seem like AppVeyor has a "debug build" option available. If anyone has any suggestions or best practices for how to reproduce this in a context where I can get a terminal and try to understand why it's happening, that'd be much appreciated.

@JarnoRFB (Collaborator) left a comment

Very nice and clean PR! All issues are rather minor things. Please also add a section in the docs describing the observer: https://github.com/IDSIA/sacred/blob/master/docs/observers.rst


@mock_s3
def test_raises_error_on_duplicate_id_directory(observer, sample_run):
    observer.started_event(**sample_run)
Collaborator

This test seems to be dependent on the order of test execution. It would be better to create a non-failing run first and then try to create a second run with the same id.

Contributor Author

I'm confused about why this would depend on the order of test execution; I just tested changing the test name so that it runs either first or last in alphabetical test execution order, and it passed in both cases. I believe that "create a non-failing run first and then create a second with the same ID" is what the test currently does, so I don't understand your comment well enough to respond to it.

Collaborator

Must have been a logic error on my side. For the file storage observer, this code would depend on the started_event being the first executed in the directory to actually get the id 1. In this case mock_s3 probably provides a fresh mocked bucket for every test, so never mind.
Please only remove the z from the test name.
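For reference, a minimal sketch of the test shape being discussed, using moto's mock_s3 and pytest; the test name and the exact exception type raised on a duplicate id are assumptions here, not copied from the PR:

```python
import pytest
from moto import mock_s3


@mock_s3
def test_second_run_with_same_id_is_rejected(observer, sample_run):
    # The first run with a given _id succeeds and creates the S3 "directory".
    observer.started_event(**sample_run)
    # A second run reusing the same _id must not silently overwrite it.
    with pytest.raises(FileExistsError):  # assumed exception type
        observer.started_event(**sample_run)
```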

(Resolved line-level review threads on sacred/observers/s3_observer.py and tests/test_observers/test_s3_observer.py.)
# cloudtrail-s3-bucket-naming-requirements.html
if len(bucket_name) < 3 or len(bucket_name) > 63:
    return False
if '..' in bucket_name or '.-' in bucket_name or '-.' in bucket_name:
Collaborator

Suggested change:
- if '..' in bucket_name or '.-' in bucket_name or '-.' in bucket_name:
+ forbidden_words = {"..", ".-", "-.", "_"}
+ if any(word in bucket_name for word in forbidden_words):

I believe the "cannot contain underscores" requirement was missing.

Contributor Author

I think the "cannot contain underscores" requirement was handled by the explicit whitelist of allowed character types (lowercase characters, digits, dashes), since underscores don't fall into any of those whitelisted categories.

Contributor Author

That said, I think the old code was incorrect in that it didn't have periods as one of the whitelisted characters. I've refactored this around the notion of "labels", which are distinct alphanumeric chunks separated by periods, and I think this makes things cleaner because it matches the abstraction that AWS itself uses.
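To make the label idea concrete, here is a rough sketch of such a check written directly from the AWS naming rules; this is not the PR's code, and it omits the rule against names formatted like IP addresses:

```python
import re

# Each dot-separated "label" must start and end with a lowercase letter or digit
# and may contain hyphens in between; underscores and uppercase letters are not allowed.
LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def bucket_name_is_valid(bucket_name):
    if not 3 <= len(bucket_name) <= 63:
        return False
    labels = bucket_name.split(".")
    # An empty label (e.g. produced by "..") fails the regex, so those cases are covered too.
    return all(LABEL_RE.match(label) for label in labels)
```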

(Further resolved review threads on sacred/observers/s3_observer.py.)
return True


class S3FileObserver(RunObserver):
Collaborator

Please add a docstring to the class and the create method.

@JarnoRFB
Collaborator

Regarding the question of inheriting from the file storage observer: there does indeed seem to be a fair amount of code duplication, so having a base class that abstracts over object-store-like backends would make sense. However, this could also be done in a separate PR if you do not want to do it now.
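To illustrate the suggestion, one possible (purely hypothetical) shape for such a base class; the class and method layout here are invented for the example and are not part of the PR:

```python
from sacred.observers import RunObserver


class StoreObserverBase(RunObserver):
    """Shared logic for observers that write run files to some storage backend."""

    def save_file(self, filename, target_name=None):
        # Backend-specific: a local copy for FileStorageObserver, an upload for S3.
        raise NotImplementedError

    def save_cout(self):
        # Methods like this one, identical in both observers today, could live here once.
        ...
```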

Regarding AppVeyor, I am also confused. Strange things tend to happen on Windows. Unfortunately I also do not have a Windows machine at my disposal, but I will let you know if I find something out.

@decodyng
Contributor Author

Had some fun with black and the py35 checks fighting about trailing commas after kwargs (see this bug: psf/black#759); for a long while black was complaining about things that, if I fixed them, would introduce py35 bugs. But all checks are now passing, and I'm re-requesting review.

@decodyng
Contributor Author

@JarnoRFB I don't seem to have the button available that would let me officially re-request a review through GitHub's interface

@JarnoRFB
Collaborator

@decodyng thanks for updating! I'll take a look.

> Had some fun with black and the py35 checks fighting about trailing commas after kwargs (see this bug: psf/black#759); for a long while black was complaining about things that, if I fixed them, would introduce py35 bugs. But all checks are now passing, and I'm re-requesting review.

Yes, I also had trouble with it, but the bug should be fixed in master (see psf/black#763), and that is also what the preconfigured pre-commit hook should use. Did you install black yourself or did you only use the pre-commit hook?

@JarnoRFB (Collaborator) left a comment

Looks great, except for the few minor comments I left! Looking forward to merging this.

(Resolved review threads on sacred/observers/s3_observer.py.)

.. code-block:: python

    from sacred.observers import S3Observer
    ex.observers.append(S3Observer.create(bucket='my-awesome-bucket',
Collaborator

Please then also update the docs to use __init__ instead of create.
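For context, the updated docs snippet would then presumably read something like the following; the basedir argument name and the exact signature are assumptions here and may differ in the merged code:

```python
from sacred import Experiment
from sacred.observers import S3Observer

ex = Experiment('my-experiment')
# Constructor-style creation instead of S3Observer.create(...).
ex.observers.append(S3Observer(bucket='my-awesome-bucket',
                               basedir='/my-project/my-cool-experiment'))
```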

Contributor Author

A slightly tangential question: I notice while editing the docs that they don't seem to be up to date for the existing observers (i.e. the current master docs say "Sacred ships with four observers" and then give a list that doesn't include the Slack, Telegram, or Neptune observers). I feel slightly odd about perpetuating the discrepancy by just saying "Sacred ships with five observers", but I also don't feel conversant enough with how the Slack/Telegram/Neptune observers work to add summary documentation for them. So by default I'll just bump the number to five and add the S3Observer to the list, but I'm flagging this as something I wanted to make sure was intentional on the part of the maintainers.

@decodyng
Contributor Author

decodyng commented Aug 21, 2019

> @decodyng thanks for updating! I'll take a look.
>
> > Had some fun with black and the py35 checks fighting about trailing commas after kwargs (see this bug: psf/black#759); for a long while black was complaining about things that, if I fixed them, would introduce py35 bugs. But all checks are now passing, and I'm re-requesting review.
>
> Yes, I also had trouble with it, but the bug should be fixed in master (see psf/black#763), and that is also what the preconfigured pre-commit hook should use. Did you install black yourself or did you only use the pre-commit hook?

Oh, I didn't run into info about installing the pre-commit hook, though possibly I just wasn't looking in the right place (possibly contribution procedures are only shown when you first submit a PR, and it seems like black was added in the last two weeks, so it wouldn't have been there when I opened this PR?). I do remember running black with py35 as a command-line argument, but possibly I still did something wrong there; I'm not that familiar with black in general.

ETA: Found it now; my bad for not looking in the obvious place earlier! Is there a way to run the commit-hook logic on code that existed before a commit? I was just able to push a commit where the current state of the code failed the black check, even though the error was not introduced by that particular commit. Also, I'm still a bit confused: I've tried running black locally with both `black --target-version py35 --diff sacred/ tests/` and `black --target-version py36 --diff sacred/ tests/`, and it tries to add the comma after kwargs in both cases. Is it intentional that the .pre-commit-config.yaml file specifies a black language version of python3.6, whereas pyproject.toml specifies a target version of py35?

@JarnoRFB
Collaborator

> possibly contribution procedures are only shown when you first submit a PR

I believe we currently do not show contributing information to anybody, but it would definitely be a good idea to do so.

> Is there a way to run the commit-hook logic on code that existed before a commit?

Yes, `pre-commit run --all-files` should do the job.

> Also, I'm still a bit confused: I've tried running black locally with both `black --target-version py35 --diff sacred/ tests/` and `black --target-version py36 --diff sacred/ tests/`, and it tries to add the comma after kwargs in both cases.

When you invoke black directly, you probably still use the stable version 19.3b0, which has this bug of adding commas after kwargs. This is unfortunately only fixed in the unreleased master branch of black.

> Is it intentional that the .pre-commit-config.yaml file specifies a black language version of python3.6, whereas pyproject.toml specifies a target version of py35?

Yes, the language version of the pre-commit hook only determines in which virtual environment the hook will be run; target-version specifies the Python versions black should produce compatible code for.

@JarnoRFB
Collaborator

OK, looks good to me! Thanks for your first and valuable sacred contribution 🎉, and sorry for the trouble with black; we need to make the pre-commit procedure more explicit.

@JarnoRFB JarnoRFB merged commit 3be8642 into IDSIA:master Aug 22, 2019