feat(ingest/s3): type aware directory sorting #8089
…s not a partition column
My main question is how best to handle folders with multiple equals signs.
if num_folder1 == num_folder2:
    return 0
else:
    return 1 if num_folder1 > num_folder2 else -1
I believe this can be written more concisely as num_folder1 - num_folder2.
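To illustrate the suggestion, here is a minimal sketch assuming both values are already integers and that the comparator is passed to `functools.cmp_to_key`. The function names `compare_verbose` and `compare_concise` are hypothetical and not taken from the PR.

```python
from functools import cmp_to_key


def compare_verbose(num_folder1: int, num_folder2: int) -> int:
    # Three-way comparison as written in the diff.
    if num_folder1 == num_folder2:
        return 0
    return 1 if num_folder1 > num_folder2 else -1


def compare_concise(num_folder1: int, num_folder2: int) -> int:
    # A comparator only needs a negative, zero, or positive result,
    # so the difference of two integers is sufficient.
    return num_folder1 - num_folder2


folders = [12, 2, 2022, 11, 2]
sorted_verbose = sorted(folders, key=cmp_to_key(compare_verbose))
sorted_concise = sorted(folders, key=cmp_to_key(compare_concise))
```

Both comparators should yield the same ordering, since `cmp_to_key` only looks at the sign of the returned value.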
if "=" in folder1 and "=" in folder2:
    if folder1.split("=", 1)[0] == folder2.split("=", 1)[0]:
        folder1 = folder1.split("=", 1)[1]
        folder2 = folder2.split("=", 1)[1]
What's the structure of these folder names? Is this preferred over using .rsplit("=", 1), which seems more flexible in case the folders have multiple equals signs?
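For reference, the two calls differ only when a name contains more than one equals sign. The folder names below are made up for illustration.

```python
single = "month=12"
multiple = "key=a=b"

# With one "=", both calls split at the same place.
single_split = single.split("=", 1)
single_rsplit = single.rsplit("=", 1)

# With several "=", split cuts at the first one and rsplit at the last one.
multiple_split = multiple.split("=", 1)    # ['key', 'a=b']
multiple_rsplit = multiple.rsplit("=", 1)  # ['key=a', 'b']
```

So `split` treats everything after the first equals sign as the value, while `rsplit` treats everything before the last equals sign as the key.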
Hive partitioning usually looks like this:
mydir/2022/12/11
or like this:
mydir/year=2022/month=12/day=11
I think these are the most common ones, and for the others it should be fine to compare as strings.
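A rough sketch of how a comparator could cover both layouts, under the assumptions discussed in this thread: strip a shared `key=` prefix, compare numerically when both remaining values are numbers, and otherwise fall back to string comparison. The function name `compare_folders` and the sample folder names are hypothetical and may not match the actual implementation in the PR.

```python
from functools import cmp_to_key


def compare_folders(folder1: str, folder2: str) -> int:
    # If both names look like key=value with the same key, compare the values only.
    if "=" in folder1 and "=" in folder2:
        key1, value1 = folder1.split("=", 1)
        key2, value2 = folder2.split("=", 1)
        if key1 == key2:
            folder1, folder2 = value1, value2

    # Compare as integers when both names consist only of decimal digits.
    if folder1.isdecimal() and folder2.isdecimal():
        return int(folder1) - int(folder2)

    # Otherwise fall back to plain string comparison.
    return (folder1 > folder2) - (folder1 < folder2)


plain = sorted(["12", "2", "11"], key=cmp_to_key(compare_folders))
hive = sorted(["month=12", "month=2", "month=11"], key=cmp_to_key(compare_folders))
```

With this sketch, `month=2` sorts before `month=11`, which plain string sorting would not do. Note that a directory mixing numeric and non-numeric names could still sort inconsistently, because the two comparison modes do not always agree.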
Adds a comparator to directory sorting that compares directory names as numbers when they are numeric.