
Allow Cloud Deep Storage configs without segment bucket or path specified #9588

Merged
clintropolis merged 2 commits into apache:master from zachjsh:IMPLY-2574 on Apr 1, 2020

Conversation

@zachjsh
Contributor

zachjsh commented Mar 30, 2020

Description

This change fixes a recently introduced bug that causes ingestion to fail
when data is ingested from one of the supported cloud storages
(Azure, Google, S3) while another storage type is used for deep storage.
In this case, all segment killer implementations are instantiated. A recent
change made the SegmentKiller classes for the supported cloud storage types
depend on the deep storage configuration for that storage type being set,
which forced the deep storage bucket and prefix to be non-null. This caused
a NullPointerException to be thrown when the SegmentKiller classes were
instantiated during ingestion.

To fix this issue, the deep storage segment configs for the cloud storage
types supported in Druid now allow nullable bucket and prefix
configurations.
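The gist of the fix can be sketched as follows. This is an illustrative stand-in, not Druid's actual config classes: the class, field, and method names are hypothetical. The point is that bucket and prefix are no longer rejected at construction time; a missing configuration only becomes an error when the storage is actually used.

```java
// Hypothetical sketch of the fix; names are illustrative, not Druid's.
public class CloudSegmentConfigSketch {
    private final String bucket; // may be null when this storage is input-only
    private final String prefix; // may be null when this storage is input-only

    public CloudSegmentConfigSketch(String bucket, String prefix) {
        // No non-null validation here, so a SegmentKiller for an unused
        // cloud storage type can still be instantiated without an NPE.
        this.bucket = bucket;
        this.prefix = prefix;
    }

    public boolean isConfigured() {
        return bucket != null && prefix != null;
    }

    // The failure moves to the point of use: killing segments in a storage
    // whose deep-storage config was never set fails with a clear message.
    public String segmentPath(String segmentId) {
        if (!isConfigured()) {
            throw new IllegalStateException(
                "deep storage bucket/prefix not configured for this storage type");
        }
        return bucket + "/" + prefix + "/" + segmentId;
    }

    public static void main(String[] args) {
        // Input-only storage: instantiation succeeds where it previously threw.
        CloudSegmentConfigSketch inputOnly = new CloudSegmentConfigSketch(null, null);
        System.out.println("configured: " + inputOnly.isConfigured());

        // Fully configured deep storage behaves as before.
        CloudSegmentConfigSketch deep =
            new CloudSegmentConfigSketch("my-bucket", "druid/segments");
        System.out.println(deep.segmentPath("seg-1"));
    }
}
```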

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

zachjsh added 2 commits March 30, 2020 15:10

  • Allow Cloud SegmentKillers to be instantiated without segment bucket or path
  • Allow google deep storage bucket to be null
zachjsh changed the title from "Allow Cloud SegmentKillers to be instantiated without segment bucket or path" to "Allow Cloud configs without segment bucket or path specified" on Mar 30, 2020
zachjsh changed the title from "Allow Cloud configs without segment bucket or path specified" to "Allow Cloud Deep Storage configs without segment bucket or path specified" on Mar 30, 2020
@zachjsh
Contributor, Author

zachjsh commented Mar 30, 2020

Manual tests run:

  1. Start a local Druid cluster with S3 deep storage, and ingest data from Google Cloud Storage.

  2. Start a local Druid cluster with S3 deep storage, and ingest data from Azure storage.

Both tests succeeded with these changes and failed without them.
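A minimal sketch of the first scenario's common runtime properties, assuming standard Druid property names (bucket and prefix values are placeholders; property names may vary by Druid version):

```properties
# Deep storage lives in S3, so only the S3 bucket/prefix are configured.
druid.storage.type=s3
druid.storage.bucket=my-deep-storage-bucket
druid.storage.baseKey=druid/segments

# The Google extension is loaded only to read input data, so no
# druid.google.bucket / druid.google.prefix are set. Before this fix,
# instantiating the Google segment killer with these values unset threw
# a NullPointerException during ingestion.
druid.extensions.loadList=["druid-s3-extensions", "druid-google-extensions"]
```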

@jihoonson
Contributor

As I commented on a similar PR, I'm wondering whether it's better to split the inputSources and the deep storage types into different extensions. I think it's better to split them because:

  1. the configurations will still be non-null, so you don't have to add null checks everywhere, which is less error-prone.

  2. the error will still be thrown during initialization instead of when a relevant method is called while the cluster is running. This could matter because it is nice to fail fast when some configuration is accidentally wrong, so that users don't have to restart the cluster whenever they find a wrong configuration.

  3. if we unify all extension inputSources into a separate extension, users can just choose that new extension to use all input source types, which is better than listing all the different extensions in the load list.
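Under the split proposed above, the load list could name a single unified input-source extension plus only the deep storage extension actually in use. This is purely hypothetical: the unified extension name below did not exist at the time and is invented for illustration.

```properties
# Hypothetical load list if input sources were split into their own
# extension ("druid-cloud-input-sources" is an invented name):
druid.extensions.loadList=["druid-s3-extensions", "druid-cloud-input-sources"]
```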

@jihoonson
Contributor

jihoonson commented Mar 30, 2020

Please regard my comment as a non-blocker. The approach of this PR looks good to me as a short term solution.

@clintropolis
Member

clintropolis left a comment

lgtm, thanks 👍

@jihoonson
Contributor

jihoonson left a comment

LGTM

clintropolis merged commit e855c7f into apache:master on Apr 1, 2020
zachjsh deleted the IMPLY-2574 branch on April 1, 2020 at 19:35
clintropolis pushed a commit to clintropolis/druid that referenced this pull request Apr 1, 2020

Allow Cloud Deep Storage configs without segment bucket or path specified (apache#9588)
clintropolis added a commit that referenced this pull request Apr 2, 2020

Allow Cloud Deep Storage configs without segment bucket or path specified (#9588) (#9601)

Co-authored-by: zachjsh <zachjsh@gmail.com>
@jihoonson jihoonson added this to the 0.18.0 milestone Apr 21, 2020