Support encoded Partitions letter case handling. #381

Open
daehokimm opened this issue Dec 18, 2020 · 3 comments

Comments

@daehokimm

Problem

  • The encoded partition path can contain uppercase letters.
  • If the S3 path contains uppercase letters, AWS Athena cannot create metadata for that path. ref

I think the S3 sink's output would be more useful if the letter case of the encoded partition path could be controlled (or forced to lowercase).
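
To make the request concrete, here is a minimal sketch of the behavior being asked for. It is written in Python for brevity (the connector itself is Java). The function name `lowercase_encoded_partition` and the sample path are hypothetical and are not part of the connector's API.

```python
def lowercase_encoded_partition(encoded_partition: str) -> str:
    """Return the encoded partition path with every letter lowercased.

    Hypothetical helper: it only illustrates the requested option and
    does not correspond to any existing connector setting.
    """
    return encoded_partition.lower()


# Made-up example path containing an uppercase topic, field name, and value.
example = "topics/MyTopic/Region=US/year=2020"
print(lowercase_encoded_partition(example))
```

One caveat with this approach: two partitions whose paths differ only by letter case (for example `Region=US` and `region=us`) would be written to the same location, so an option like this would probably need to be opt-in.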

This was referenced Dec 21, 2020
@levzem
Contributor

levzem commented Dec 21, 2020

@daehokimm

Here are all of the cases I could think of that could cause uppercase letters in the encoded partition:

  1. uppercase topic - solution: use a non-uppercase topic
  2. partitioners
    • default: does not use uppercase - solution: N/A
    • timestamp: controlled by the path.format config - solution: just avoid uppercase letters there
    • field: the field name is uppercase - solution: use an SMT to convert the field name and value to lowercase
  3. topics.dir - solution: just use a lowercase topics.dir

Did I miss any? It seems there is a suitable solution for each case.
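
The cases listed above can be checked up front with a small script. This is only a sketch in Python: the function name `find_uppercase_sources` and its parameters are hypothetical, and the sample values are made up. The `path.format` case is deliberately left out, because date pattern letters are legitimately uppercase and only the literal parts of the format would need checking.

```python
def find_uppercase_sources(topic, topics_dir, field_names=()):
    """Return labels of the inputs that contain at least one uppercase letter.

    Hypothetical helper: the arguments mirror the topic name, the
    topics.dir setting, and the field names used by a field partitioner.
    """
    candidates = {"topic": topic, "topics.dir": topics_dir}
    for name in field_names:
        candidates[f"field:{name}"] = name
    return [
        label
        for label, value in candidates.items()
        if any(ch.isupper() for ch in value)
    ]


# Made-up values: an uppercase topic and one uppercase field name.
print(find_uppercase_sources("Orders", "topics", ["Region", "country"]))
```

A check like this only finds the problem; each flagged input still needs one of the fixes from the list (renaming the topic, changing topics.dir, or applying an SMT).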

@daehokimm
Author

Hi @levzem

Thanks for your reply.

I think most of the cases you listed can be solved technically. But there are operational difficulties that are not technical.
First, Kafka topics can contain capital letters. I'm currently building a Kafka platform and providing in-house developers with various Kafka clients and connectors. Those developers can create topics with capital letters at any time and, some time later, ask me to migrate them. It's a terrible thing, you know. And this can happen to anyone who is already running a Kafka cluster. 😭
Second, in the case of FieldPartitioner, JSON type support and nested value support are being developed in kafka-connect-storage-common. I think the SMT will need to become more complex once nested value support is applied.

Considering these and many other situations, the S3 connector could provide better usability if it offered an option to force the characters in the encoded partition to lowercase.

@daehokimm
Author

Hi @levzem

May I ask what you think about my suggestion?
