Use of S3() within parallel_map() #2069

Consider this call:

    import metaflow

    with metaflow.S3() as s3i:
        result = s3i.info_many(s3_path, return_missing=True)

Can this call be run inside metaflow.multicore_utils.parallel_map?

i.e. parallel_map(wrapper_for_s3_info_many, s3_paths)

When I try, I get this error:
> 2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] metaflow.plugins.datatools.s3.s3.MetaflowS3URLException: Specify S3(run=self) when you use S3 inside a running flow. Otherwise you have to use S3 with full s3:// urls.
> 2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] Internal error

However, s3_paths = ["s3://path/to/something.jpg", "s3://path/to/something_else.jpg", ...], and I am certain that every path in s3_paths starts with "s3://".

Passing run=self to the S3 constructor inside the wrapper yields:
> 2024-09-30 23:21:21.832 [261699/start/3201250 (pid 1405459)] S3 non-transient error (attempt #1): s3op failed:
> 2024-09-30 23:21:21.913 [261699/start/3201250 (pid 1405459)] Invalid url: /
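
One possible explanation, offered as an assumption rather than a confirmed diagnosis: info_many expects an iterable of keys, while parallel_map hands the wrapper one element of s3_paths per call. If the wrapper forwards that single string directly, Python iterates it character by character, so each "key" would be one character such as "s" rather than a full s3:// URL. The sketch below sidesteps this by splitting the paths into batches and giving each worker a list. The names batched, info_for_batch, info_for_all and the batch_size default are hypothetical, not part of Metaflow.

```python
from itertools import islice


def batched(paths, size):
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def info_for_batch(batch):
    """Hypothetical wrapper: look up metadata for one batch of full s3:// URLs."""
    # Imported lazily so the batching helper above works without Metaflow.
    from metaflow import S3

    if isinstance(batch, str):
        # A bare string would otherwise be iterated character by character.
        batch = [batch]
    with S3() as s3:
        objs = s3.info_many(batch, return_missing=True)
        # Return plain tuples instead of S3Object instances to keep the
        # data sent back from each worker process simple.
        return [(o.url, o.exists, o.size if o.exists else None) for o in objs]


def info_for_all(paths, batch_size=100):
    """Hypothetical driver: fan the batches out across worker processes."""
    from metaflow.multicore_utils import parallel_map

    results = parallel_map(info_for_batch, list(batched(paths, batch_size)))
    # Flatten the per-batch lists into one list of (url, exists, size) tuples.
    return [row for rows in results for row in rows]
```

With this in place, the call inside the step would be info_for_all(s3_paths). Batching also keeps the number of S3 clients small, since each worker opens one client per batch rather than one per path.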