Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes GCS based ingestion #12445

Conversation

tejaswini-imply
Copy link
Member

@tejaswini-imply tejaswini-imply commented Apr 18, 2022

GCP allows buckets in GCS to contain underscores in their names. When this location is mapped to java.net.URI, URI#host comes out to be null as URI doesn't allow it to contain underscores which is being translated to bucket name hence mapping URI#authority to bucket name.

Description

Fixed CloudObjectLocation constructor from URI

  • Previously bucket variable of the CloudObjectLocation is being set by uri.getHost() value (but java.net.URI refuses to handle underscore in host since it's not a valid character and gcs allows underscores in bucket naming)

  • This modification allows uri.getAuthority() to be mapped to bucket name which is allowing underscores to be present (Since all 3 S3, GCS, Azure Cloud buckets uris won't contain port or user information it is assumed that authority will return nothing more than the bucket name)


Key changed/added classes in this PR
  • CloudObjectLocation

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Copy link
Contributor

kfaraz commented Apr 19, 2022

Thanks for the fix, @tejaswini-imply .
Please add some unit tests to validate that this works in the following cases:

  • with and without underscore
  • GCS, S3 and other implementations

@@ -71,7 +71,7 @@ public CloudObjectLocation(@JsonProperty("bucket") String bucket, @JsonProperty(

public CloudObjectLocation(URI uri)
{
this(uri.getHost(), uri.getPath());
this(uri.getAuthority(), uri.getPath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fall back to getAuthority() only if getHost() returns null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would be more comfortable with that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work as well.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kfaraz kfaraz merged commit 177e185 into apache:master Apr 21, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants