First of all, my apologies if this is not the appropriate channel for questions.
After researching Metaflow, I believe its parallel S3 reading implementation is significantly faster than the alternatives (I'm looking for a replacement for Dask's internal s3fs reading logic).
However, I can't seem to read valid data from metaflow.S3. Here is a snippet that reproduces my issue:
from metaflow import S3

S3_PATH = "s3://s3-bucket/path/"
s3 = S3(s3root=S3_PATH)
s3 = s3.__enter__()  # issue is the same when using the context manager
data = s3.get_all()  # fetch all objects under the root
first_file = data[0]
first_file.text
The text representation of the files appears to be encoded in a way I can't figure out how to deserialize. How can I properly deserialize these string representations?
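For reference, here is a minimal sketch of how I'm inspecting the raw bytes instead of the decoded text (same S3_PATH placeholder as above; I'm assuming the blob and content_type properties on the returned objects expose the raw payload and the reported Content-Type):

from metaflow import S3

with S3(s3root=S3_PATH) as s3:
    obj = s3.get_all()[0]
    raw = obj.blob             # raw bytes of the downloaded object
    print(raw[:16])            # peek at the first bytes / magic number
    print(obj.content_type)    # Content-Type reported by S3, if any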
I also tried deserializing the locally downloaded data the way the Metaflow datastore does, by gunzipping it:
import gzip

with gzip.GzipFile(data[0].path, mode="rb") as f:
    r = f.read()
This yields: Not a gzipped file (b'15')
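For what it's worth, a gzip stream should start with the magic bytes \x1f\x8b, so checking the first two bytes of the downloaded file (re-reading the same data[0].path as above) confirms the data really isn't gzip:

with open(data[0].path, "rb") as f:
    head = f.read(2)
print(head)  # prints b'15', matching the error above; gzip would start with b'\x1f\x8b'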