Airflow Implicit rule on removing S3 path trailing slash #15664

@Isaacwhyuenac

Hi, I find the implicit removal of the trailing slash from an S3 path surprising and unnecessary.

Under this scheme, suppose S3 contains the following objects:

2021-04-30 10:38:27   11.7 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_31a98366-e1a0-428e-a57f-3c4b0fe2bb79
2021-04-30 10:38:27   49.1 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_55438602-3787-4c09-9fad-562e7a6786cb
2021-04-30 10:38:31   10.6 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_6773215f-1697-4c99-9e94-f7961e86af62
2021-04-30 10:38:31   27.1 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_69c952f5-97b4-45e9-b790-fc7830fb2150
2021-04-30 10:38:31  131.2 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_b4b995f5-211d-4d46-bd9a-86912b29d978
2021-04-30 10:38:27  166.2 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_bbcebd80-c280-4e66-9431-9a626df8bc33
2021-04-30 10:38:30  171.6 KiB agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/20210430_023708_00016_g2f4v_f4ef423f-cf70-4f71-960e-70f1bdddaf3d

2021-04-30 10:38:27   11.7 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_31a98366-e1a0-428e-a57f-3c4b0fe2bb79
2021-04-30 10:38:27   49.1 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_55438602-3787-4c09-9fad-562e7a6786cb
2021-04-30 10:38:31   10.6 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_6773215f-1697-4c99-9e94-f7961e86af62
2021-04-30 10:38:31   27.1 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_69c952f5-97b4-45e9-b790-fc7830fb2150
2021-04-30 10:38:31  131.2 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_b4b995f5-211d-4d46-bd9a-86912b29d978
2021-04-30 10:38:27  166.2 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_bbcebd80-c280-4e66-9431-9a626df8bc33
2021-04-30 10:38:30  171.6 KiB agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/20210430_023708_00016_g2f4v_f4ef423f-cf70-4f71-960e-70f1bdddaf3d

If we only want to match the prefix agg/user_segmentation/tag_article_user_analysis/, the current S3 path processor strips the trailing slash, so objects under agg/user_segmentation/tag_article_user_analysis_v2 will also be matched and removed. Developers should be free to choose the pattern they want to match instead of having one forced on them.
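To illustrate the effect, here is a minimal sketch that assumes S3 prefix listing behaves as a plain string-prefix match on object keys. It does not call Airflow or boto3; `match_prefix` is a hypothetical helper and the key names are shortened stand-ins for the listing above.

```python
# Shortened, hypothetical stand-ins for the object keys listed above.
keys = [
    "agg/user_segmentation/tag_article_user_analysis/partition_0=2021-04-29/part-a",
    "agg/user_segmentation/tag_article_user_analysis_v2/partition_0=2021-04-29/part-a",
]


def match_prefix(all_keys, prefix):
    """Return the keys that start with prefix (plain string-prefix match)."""
    return [key for key in all_keys if key.startswith(prefix)]


with_slash = "agg/user_segmentation/tag_article_user_analysis/"
# Simulates the implicit trailing-slash removal described in this issue.
without_slash = with_slash.rstrip("/")

matched_with_slash = match_prefix(keys, with_slash)
matched_without_slash = match_prefix(keys, without_slash)

print(matched_with_slash)     # only the tag_article_user_analysis/ key
print(matched_without_slash)  # also includes the tag_article_user_analysis_v2/ key
```

With the trailing slash kept, only the intended directory matches; once it is stripped, the `_v2` sibling matches too.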

I created a PR for this issue: #15609
