Skip to content

implement dj.Bucket class to handle S3 external storage operations.#358

Merged
eywalker merged 9 commits intodatajoint:masterfrom
ixcat:master
Oct 20, 2017
Merged

implement dj.Bucket class to handle S3 external storage operations.#358
eywalker merged 9 commits intodatajoint:masterfrom
ixcat:master

Conversation

@ixcat
Copy link
Copy Markdown

@ixcat ixcat commented Aug 21, 2017

Per S3 external fire desired feature in #204

Currently not hooked into actual dj code; awaiting base external file logic 1st.

S3 functionality implemented via 'boto3' package; unit tests are currently
MOCKED using 'moto' S3 mock library due to difficulties w/r/t credential mgmt.

Currently not hooked into actual dj code; awaiting base external file logic 1st.

S3 functionality implemented via 'boto3' package; unit tests are currently
MOCKED using 'moto' S3 mock library due to difficulties w/r/t credential mgmt.
@ixcat
Copy link
Copy Markdown
Author

ixcat commented Aug 21, 2017

Apparently dependencies are not properly defined for the boto/moto libraries for the unit tests to work in the travis envirionment - investigating.

@ixcat
Copy link
Copy Markdown
Author

ixcat commented Aug 21, 2017

Noticed I did not incorporate API as defined in #204 here. Will fix to match after I determine the issue with travis.

Chris Turner added 2 commits August 21, 2017 11:51
Current environment mis-pulls in a /usr/share/google.* copy of a file,
which is not python 3x compatible, and so builds fail.

See also:

GoogleCloudPlatform/compute-image-packages#213
which outlines ubuntu cloud images are not updated.
@coveralls
Copy link
Copy Markdown

coveralls commented Aug 21, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.175% when pulling b2cae86 on ixcat:master into fb34601 on datajoint:master.

1 similar comment
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.4%) to 92.175% when pulling b2cae86 on ixcat:master into fb34601 on datajoint:master.

@coveralls
Copy link
Copy Markdown

coveralls commented Aug 21, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.175% when pulling 9acfbd6 on ixcat:master into fb34601 on datajoint:master.

@ixcat
Copy link
Copy Markdown
Author

ixcat commented Aug 21, 2017

On further thinking - not yet going to implement API as outlined in #204 until feedback; the items there might have deviated w/r/t 'base' external file implementation which is essentially the 'reference' spec.. any adjustments can be handled during actual pre-merge process.

@eywalker
Copy link
Copy Markdown
Contributor

eywalker commented Oct 2, 2017

I suggest that we close this PR for now and make a new one when it is actually ready for review and merge.

@dimitri-yatsenko dimitri-yatsenko added this to the Release 0.9 milestone Oct 2, 2017
@dimitri-yatsenko
Copy link
Copy Markdown
Member

Actually let's accept it. It does not hurt anything. Our next release 0.9 will need to handle S3 buckets.

self.connect()
r = self._s3.Object(self._bucket, rpath).delete()
try:
if r['ResponseMetadata']['HTTPStatusCode'] == 204:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return r['ResponseMetadata']['HTTPStatusCode'] == 204

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

included in new diff.

self.connect()
self._s3.Object(self._bucket, rpath).load()
except ClientError as e:
if e.response['Error']['Code'] == "404":
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following would be better

        if e.response['Error']['Code'] != "404":
            raise DataJointError('error checking remote file')
        return False

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some thoughts about this when implementing - to some extent it depends on

a) consistency with API in other extfile stuff
b) notion of default processing vs exceptions

this call is asking 'if a file exists' -
False indicates it does not; an exception indicates the program was not able to successfully check.

Essentially this logic flips the S3 API to more closely mimic something like stat(2) so that things like:

if not mybucket.stat('/remote/path'):
     mybucket.put('/local/path', '/remote/path')

work rather than needing to do:

try:
   mybucket.stat('/remote/path')
except aws.NoS3FileErrorExceptionGizmo:
   mybucket.put('/local/path','/remote/path')
finally:
   raise DataJointError('your s3 doesnt work')

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misunderstanding corrected, as is the code :)

try:
self._s3.Object(self._bucket, rpath).upload_file(lpath)
except:
raise DataJointError('Error uploading file')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make error message more informative

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new diff incorporates file paths and stringified exception to all error messages.

try:
self._s3.Object(self._bucket, rpath).download_file(lpath)
except Exception as e:
raise DataJointError('file download error')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make error message more informative

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new diff incorporates file paths and stringified exception to all error messages.

@coveralls
Copy link
Copy Markdown

coveralls commented Oct 20, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.039% when pulling 0772a41 on ixcat:master into 40bb5a7 on datajoint:master.

@coveralls
Copy link
Copy Markdown

coveralls commented Oct 20, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.047% when pulling 80c3b26 on ixcat:master into 40bb5a7 on datajoint:master.

else:
raise DataJointError('error checking remote file')
if e.response['Error']['Code'] != "404":
raise DataJointError('Error checking remote file', str(rpath),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout the module, I think it's better to use formatted strings than string sums. For example, this line should be

raise DataJointError('Error checking remote file {p} ({e})'.format(p=rpath, e=e))

It's more explicit and efficient.

Copy link
Copy Markdown
Author

@ixcat ixcat Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched.. and agreed (assuming you are correct) for the general case of 'how to manage strings in the codebase'.. that said if we are worried about string processing efficiency within exception constructors we have bigger problems on our hands :D

Copy link
Copy Markdown
Author

@ixcat ixcat Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disagree this is more explicit however (even if more efficient processing wise).. but in any event, no biggie and will go with the flow

@coveralls
Copy link
Copy Markdown

coveralls commented Oct 20, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.047% when pulling 30ade48 on ixcat:master into 40bb5a7 on datajoint:master.

@coveralls
Copy link
Copy Markdown

coveralls commented Oct 20, 2017

Coverage Status

Coverage decreased (-0.4%) to 92.047% when pulling c8a4763 on ixcat:master into 40bb5a7 on datajoint:master.

@eywalker eywalker merged commit 54cd18f into datajoint:master Oct 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants