Description
Consider this call:
with metaflow.S3() as s3i:
    result = s3i.info_many(s3_path, return_missing=True)
Can this call be run inside metaflow.multicore_utils.parallel_map?
That is, parallel_map(wrapper_for_s3_info_many, s3_paths)
When I try, I get this error:
2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] metaflow.plugins.datatools.s3.s3.MetaflowS3URLException: Specify S3(run=self) when you use S3 inside a running flow. Otherwise you have to use S3 with full s3:// urls.
2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] Internal error
However, s3_paths=["s3://path/to/something.jpg","s3://path/to/something_else.jpg", ...], and I am certain that every path in s3_paths starts with "s3://".
Putting run=self in the S3 instantiation within the wrapper instead yields:
2024-09-30 23:21:21.832 [261699/start/3201250 (pid 1405459)] S3 non-transient error (attempt #1): s3op failed:
2024-09-30 23:21:21.913 [261699/start/3201250 (pid 1405459)] Invalid url: /
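One possibility worth ruling out, offered as an assumption and not as a confirmed diagnosis: parallel_map calls the wrapper once per element of s3_paths, so if the wrapper forwards that single string straight to info_many, the string would be iterated character by character, and single characters such as "s" or "/" are not full s3:// URLs. Below is a minimal sketch that hands each worker a list of URLs instead. The names make_batches and info_for_batch are hypothetical, and the batch size is arbitrary.

```python
from typing import Dict, List


def make_batches(paths: List[str], batch_size: int) -> List[List[str]]:
    """Split a flat list of S3 URLs into lists of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def info_for_batch(batch: List[str]) -> Dict[str, bool]:
    """Hypothetical wrapper: one S3 client per call, one list of URLs per call."""
    # Imported lazily so make_batches can be used without metaflow installed.
    from metaflow import S3

    with S3() as s3:
        # info_many expects an iterable of keys; a bare string passed here
        # would be iterated character by character.
        results = s3.info_many(batch, return_missing=True)
        # Return plain, picklable values (URL mapped to whether it exists).
        return {obj.url: obj.exists for obj in results}


# Intended usage inside a step (not executed here):
#   from metaflow.multicore_utils import parallel_map
#   per_batch = parallel_map(info_for_batch, make_batches(s3_paths, 100))
```

If the wrapper already receives a list per call, this sketch changes nothing and the cause lies elsewhere.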