implement dj.Bucket class to handle S3 external storage operations. #358
eywalker merged 9 commits into datajoint:master
Currently not hooked into actual dj code; awaiting base external file logic 1st. S3 functionality implemented via 'boto3' package; unit tests are currently MOCKED using 'moto' S3 mock library due to difficulties w/r/t credential mgmt.
Apparently the dependencies for the boto/moto libraries are not properly defined for the unit tests to work in the Travis environment; investigating.
Noticed I did not incorporate the API as defined in #204 here. Will fix to match after I determine the issue with Travis.
The current environment mistakenly pulls in a copy of a file under /usr/share/google.*, which is not Python 3.x compatible, so builds fail. See also GoogleCloudPlatform/compute-image-packages#213, which notes that Ubuntu cloud images are not updated.
This reverts commit 126d9fc.
On further thought: not going to implement the API as outlined in #204 until there is feedback; the items there might have deviated from the 'base' external file implementation, which is essentially the 'reference' spec. Any adjustments can be handled during the actual pre-merge process.
I suggest that we close this PR for now and make a new one when it is actually ready for review and merge.
Actually, let's accept it. It does not hurt anything. Our next release, 0.9, will need to handle S3 buckets.
datajoint/external.py
Outdated
self.connect()
r = self._s3.Object(self._bucket, rpath).delete()
try:
    if r['ResponseMetadata']['HTTPStatusCode'] == 204:
return r['ResponseMetadata']['HTTPStatusCode'] == 204
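The suggested one-liner can be sketched without touching S3. This is only an illustration: the helper name `delete_succeeded` and the hand-written response dicts are hypothetical, and only the `ResponseMetadata` / `HTTPStatusCode` keys come from the snippet above.

```python
def delete_succeeded(response):
    """Return True when a delete response reports HTTP 204 (No Content)."""
    # The comparison already yields a bool, so no if/else or try block is needed.
    return response['ResponseMetadata']['HTTPStatusCode'] == 204


# Hand-written stand-ins for a real response (assumed shape, not captured output).
ok = {'ResponseMetadata': {'HTTPStatusCode': 204}}
unexpected = {'ResponseMetadata': {'HTTPStatusCode': 500}}

print(delete_succeeded(ok))
print(delete_succeeded(unexpected))
```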
datajoint/external.py
Outdated
    self.connect()
    self._s3.Object(self._bucket, rpath).load()
except ClientError as e:
    if e.response['Error']['Code'] == "404":
I think the following would be better:
    if e.response['Error']['Code'] != "404":
        raise DataJointError('error checking remote file')
    return False
Had some thoughts about this when implementing. To some extent it depends on
a) consistency with the API in other extfile stuff
b) the notion of default processing vs. exceptions
This call is asking whether a file exists:
False indicates it does not; an exception indicates the program was not able to check successfully.
Essentially this logic flips the S3 API to more closely mimic something like stat(2), so that things like:
    if not mybucket.stat('/remote/path'):
        mybucket.put('/local/path', '/remote/path')
work rather than needing to do:
    try:
        mybucket.stat('/remote/path')
    except aws.NoS3FileErrorExceptionGizmo:
        mybucket.put('/local/path', '/remote/path')
    finally:
        raise DataJointError('your s3 doesnt work')
misunderstanding corrected, as is the code :)
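The agreed behaviour (False for a missing file, an exception when the check itself fails) can be sketched as follows. This is a minimal stand-alone illustration, not the merged code: `DataJointError` and `ClientError` here are local stand-ins (the latter only mimics the `response` attribute used in the diff), and `fake_load` is a hypothetical replacement for the `Object(...).load()` call so the example runs without S3.

```python
class DataJointError(Exception):
    """Stand-in for the DataJoint exception type."""


class ClientError(Exception):
    """Stand-in mimicking the `response` attribute read in the diff above."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {'Error': {'Code': code}}


def stat(load, rpath):
    """Return True if rpath exists, False if it does not; raise if the check fails."""
    try:
        load(rpath)  # hypothetical callable standing in for Object(...).load()
    except ClientError as e:
        if e.response['Error']['Code'] != "404":
            # Anything other than "not found" means the check could not be done.
            raise DataJointError(
                'Error checking remote file {p} ({e})'.format(p=rpath, e=e))
        return False
    return True


def fake_load(rpath):
    """Pretend bucket: one existing key, one forbidden key, everything else missing."""
    if rpath == '/remote/forbidden':
        raise ClientError("403")
    if rpath != '/remote/present':
        raise ClientError("404")


print(stat(fake_load, '/remote/present'))
print(stat(fake_load, '/remote/missing'))
```

With this shape, the `if not mybucket.stat(...)` idiom from the comment above works directly, and only genuine failures surface as exceptions.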
datajoint/external.py
Outdated
try:
    self._s3.Object(self._bucket, rpath).upload_file(lpath)
except:
    raise DataJointError('Error uploading file')
make error message more informative
new diff incorporates file paths and stringified exception to all error messages.
datajoint/external.py
Outdated
try:
    self._s3.Object(self._bucket, rpath).download_file(lpath)
except Exception as e:
    raise DataJointError('file download error')
make error message more informative
new diff incorporates file paths and stringified exception to all error messages.
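A rough sketch of what "file paths and stringified exception" in an error message might look like is given below. It is not the actual new diff: the `put` wrapper, the `upload` callable, and `failing_upload` are hypothetical stand-ins so the example runs without S3, and `DataJointError` is a local placeholder class.

```python
class DataJointError(Exception):
    """Stand-in for the DataJoint exception type."""


def put(upload, lpath, rpath):
    """Upload lpath to rpath, re-raising any failure with both paths and the cause."""
    try:
        upload(lpath, rpath)  # hypothetical callable standing in for upload_file
    except Exception as e:
        # Include both paths and the original error text in the message.
        raise DataJointError(
            'Error uploading file {lp} to {rp} ({e})'.format(lp=lpath, rp=rpath, e=e)
        ) from e


def failing_upload(lpath, rpath):
    """Simulated transfer that always fails."""
    raise OSError('connection reset')


try:
    put(failing_upload, '/local/path', '/remote/path')
except DataJointError as err:
    message = str(err)
    print(message)
```

Catching `Exception` rather than using a bare `except:` keeps `KeyboardInterrupt` and `SystemExit` from being converted into upload errors.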
datajoint/external.py
Outdated
    else:
        raise DataJointError('error checking remote file')
    if e.response['Error']['Code'] != "404":
        raise DataJointError('Error checking remote file', str(rpath),
Throughout the module, I think it's better to use formatted strings than string sums. For example, this line should be
    raise DataJointError('Error checking remote file {p} ({e})'.format(p=rpath, e=e))
It's more explicit and efficient.
switched.. and agreed (assuming you are correct) for the general case of 'how to manage strings in the codebase'.. that said if we are worried about string processing efficiency within exception constructors we have bigger problems on our hands :D
disagree this is more explicit however (even if more efficient processing wise).. but in any event, no biggie and will go with the flow
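One practical difference between the two styles can be shown with a plain `Exception`, on the assumption that `DataJointError` behaves like an ordinary exception subclass. The values of `rpath` and `cause` are made up for illustration.

```python
rpath = '/remote/path'
cause = ValueError('403')

# Several positional arguments are stored as a tuple, and str() shows that tuple.
as_args = Exception('Error checking remote file', str(rpath), str(cause))

# A single formatted string produces one readable message.
as_format = Exception(
    'Error checking remote file {p} ({e})'.format(p=rpath, e=cause))

print(str(as_args))
print(str(as_format))
```

The formatted version reads as a sentence in logs and tracebacks, whereas the multi-argument version prints as a tuple of its parts.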
Per the desired S3 external file feature in #204