S3 bindings #68
Conversation
Fixed some writing issues. Now done and ready for review. @JelleAalbers

Though I guess we wait for the automatic code review and Travis...
Looks good!
There are indeed no generic I/O tests, though there are tests for DataDirectory + local file storage. It would be useful to generalize these at some point. It may be a bit tricky for S3, since you'd have to set up a mock S3 provider.
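A full mock provider isn't always necessary. A minimal sketch (all names here are hypothetical, not taken from the PR) of an in-memory fake covering only the client calls the backend uses could make generalized I/O tests runnable without credentials:

```python
import io


class FakeS3Client:
    """In-memory stand-in for a boto3 S3 client, implementing only
    the handful of calls the backend needs. Response shapes loosely
    mirror the real API but are trimmed down for testing."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {object key: bytes}

    def create_bucket(self, Bucket):
        self._buckets.setdefault(Bucket, {})

    def list_buckets(self):
        # Real responses also carry 'CreationDate' and 'Owner';
        # only 'Name' is needed here.
        return {'Buckets': [{'Name': n} for n in self._buckets]}

    def put_object(self, Bucket, Key, Body):
        self._buckets[Bucket][Key] = Body

    def get_object(self, Bucket, Key):
        return {'Body': io.BytesIO(self._buckets[Bucket][Key])}
```

Tests could then inject an instance of this in place of `boto3.client('s3')` when constructing the S3 frontend.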
bk = self.backend_key(key_str)

# Get list of all buckets / (run_id, lineage)
objects_list = self.s3client.list_buckets()
I guess this is fine for now, but is there no way to access a bucket by name? The list of all buckets can get rather large. I guess we won't need this once we have a runs db, though.
You use Bucket=self.key later in the Saver's close, so apparently it is possible.
As far as I know, this is the only way with boto3's client API to determine whether a bucket exists. I could go the exception route (try the access and catch the exception), but I try not to use exceptions for normal cases.
Exceptions in normal cases are pythonic (https://docs.quantifiedcode.com/python-anti-patterns/readability/asking_for_permission_instead_of_forgiveness_when_working_with_files.html). But in this case, at least according to https://stackoverflow.com/questions/26871884, you get a generic ClientError which you then have to laboriously parse to make sure you're not catching other errors.
Anyway, it's fine, the run db should take care of locating data in the end.
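For reference, boto3 does offer client.head_bucket(Bucket=...), which on failure raises a ClientError carrying an HTTP status code in its response dict. A sketch of the parsing involved is below; the ClientError class here is a stand-in for botocore.exceptions.ClientError (same .response attribute) so the snippet runs without boto3 installed:

```python
class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError; the real class
    also exposes the error details via a .response dict."""

    def __init__(self, error_response, operation_name):
        self.response = error_response
        super().__init__(operation_name)


def bucket_exists(s3client, name):
    """Check a single bucket by name via head_bucket: a 404 means
    'no such bucket'; any other error is re-raised unchanged."""
    try:
        s3client.head_bucket(Bucket=name)
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            return False
        raise
```

With the real botocore ClientError imported instead of the stand-in, this avoids listing all buckets just to check for one.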
def _save_chunk(self, data, chunk_info):
    filename = '%06d' % chunk_info['chunk_i']

    with tempfile.SpooledTemporaryFile() as f:
This file-writing code (open a SpooledTemporaryFile, write to it, seek to 0, upload to S3) is repeated three times; maybe consider adding a new method?
I see what you mean, but it's hard to do: one is a download, one uploads JSON, one uploads a chunk. The only line they share is the seek (which may not even be needed...).
There's another JSON upload at the end of close. But if you think avoiding duplication increases complexity too much, just leave it for now.
Good point; within the Saver I can do that.
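The deduplication suggested above could look like the following sketch (the helper name and the callback shapes are assumptions, not code from the PR): the shared spool-write-seek-upload sequence becomes one method, and each call site supplies only its own write and upload steps.

```python
import tempfile


def upload_via_tempfile(write, upload):
    """Spool data to a temporary file, rewind, then hand the open
    file object to an uploader (e.g. s3client.upload_fileobj).
    'write' fills the file; 'upload' consumes it."""
    with tempfile.SpooledTemporaryFile() as f:
        write(f)
        f.seek(0)  # rewind so the uploader reads from the start
        upload(f)
```

Each of the three call sites would then pass only its own write callback, e.g. lambda f: f.write(json.dumps(md).encode()) for the JSON metadata case.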
Once this goes into release, I'll switch processing to using it.
We can now read from S3 storage. Be aware that you need credentials to do this. It uses boto3; see the documentation in the PR for more.
The unique strax key (e.g. 170521_0011_raw_records_58340a130a541c95997fd1c442930427b04eac30) is the bucket name, and the objects within it are chunk filenames. The layout should look very similar to the files I/O routines.

Are there no tests for I/O? I was wondering whether I should mock this.
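Based on the description above, the naming scheme is roughly as follows (a sketch: the key format is taken from the example key above and the '%06d' chunk format from the quoted _save_chunk code; the helper names are hypothetical):

```python
def bucket_name(run_id, data_type, lineage_hash):
    """The strax key doubles as the S3 bucket name: run id,
    data type and lineage hash joined by underscores."""
    return f'{run_id}_{data_type}_{lineage_hash}'


def chunk_object_key(chunk_i):
    """Objects inside a bucket are the zero-padded chunk
    filenames, matching the '%06d' format in _save_chunk."""
    return '%06d' % chunk_i
```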