
Allow Cloud Deep Storage configs without segment bucket or path specified #9588

Merged
clintropolis merged 2 commits into apache:master from zachjsh:IMPLY-2574 on Apr 1, 2020

Conversation

@zachjsh
Contributor

zachjsh commented Mar 30, 2020

Description

This change fixes a recently introduced bug that causes ingestion to fail
when data is ingested from one of the supported cloud storages
(Azure, Google, S3) while another storage type is used for deep storage.
In this case, all segment killer implementations are instantiated. A recent
change made the SegmentKiller classes for the supported cloud storage types
depend on the deep storage configuration for that storage type being set,
which forced the deep storage bucket and prefix to be non-null. This caused
a NullPointerException to be thrown when the SegmentKiller classes were
instantiated during ingestion.

To fix this issue, the deep storage segment configs for the cloud storage
types supported in Druid now allow nullable bucket and prefix
configurations.
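The gist of the fix can be sketched as follows. This is an illustrative stand-in, not Druid's actual config classes: the class, field, and method names are hypothetical. The point is that bucket and prefix are no longer rejected at construction time; a missing configuration only becomes an error when the storage is actually used.

```java
// Hypothetical sketch of the fix; names are illustrative, not Druid's.
public class CloudSegmentConfigSketch {
    private final String bucket; // may be null when this storage is input-only
    private final String prefix; // may be null when this storage is input-only

    public CloudSegmentConfigSketch(String bucket, String prefix) {
        // No non-null validation here, so a SegmentKiller for an unused
        // cloud storage type can still be instantiated without an NPE.
        this.bucket = bucket;
        this.prefix = prefix;
    }

    public boolean isConfigured() {
        return bucket != null && prefix != null;
    }

    // The failure moves to the point of use: killing segments in a storage
    // whose deep-storage config was never set fails with a clear message.
    public String segmentPath(String segmentId) {
        if (!isConfigured()) {
            throw new IllegalStateException(
                "deep storage bucket/prefix not configured for this storage type");
        }
        return bucket + "/" + prefix + "/" + segmentId;
    }

    public static void main(String[] args) {
        // Input-only storage: instantiation succeeds where it previously threw.
        CloudSegmentConfigSketch inputOnly = new CloudSegmentConfigSketch(null, null);
        System.out.println("configured: " + inputOnly.isConfigured());

        // Fully configured deep storage behaves as before.
        CloudSegmentConfigSketch deep =
            new CloudSegmentConfigSketch("my-bucket", "druid/segments");
        System.out.println(deep.segmentPath("seg-1"));
    }
}
```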

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

zachjsh added 2 commits March 30, 2020 15:10

  • Allow Cloud SegmentKillers to be instantiated without segment bucket or path
  • Allow google deep storage bucket to be null
zachjsh changed the title from "Allow Cloud SegmentKillers to be instantiated without segment bucket or path" to "Allow Cloud configs without segment bucket or path specified" on Mar 30, 2020
zachjsh changed the title from "Allow Cloud configs without segment bucket or path specified" to "Allow Cloud Deep Storage configs without segment bucket or path specified" on Mar 30, 2020
@zachjsh
Contributor, Author

zachjsh commented Mar 30, 2020

Manual tests run:

  1. Start a local Druid cluster with S3 deep storage, and ingest data from Google Cloud Storage.

  2. Start a local Druid cluster with S3 deep storage, and ingest data from Azure storage.

Both tests succeeded with these changes and failed without them.
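A minimal sketch of the first scenario's common runtime properties, assuming standard Druid property names (bucket and prefix values are placeholders; property names may vary by Druid version):

```properties
# Deep storage lives in S3, so only the S3 bucket/prefix are configured.
druid.storage.type=s3
druid.storage.bucket=my-deep-storage-bucket
druid.storage.baseKey=druid/segments

# The Google extension is loaded only to read input data, so no
# druid.google.bucket / druid.google.prefix are set. Before this fix,
# instantiating the Google segment killer with these values unset threw
# a NullPointerException during ingestion.
druid.extensions.loadList=["druid-s3-extensions", "druid-google-extensions"]
```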

@jihoonson
Contributor

As I commented on a similar PR, I'm wondering whether it's better to split the inputSources and the deep storage types into different extensions. I think it's better to split them because:

  1. the configurations will still be non-null, so you don't have to add null checks everywhere, which is less error-prone.

  2. the error will still be thrown during initialization instead of when a relevant method is called while the cluster is running. This could matter because it is nice to fail fast when some configuration is accidentally wrong, so that users don't have to restart the cluster whenever they find a wrong configuration.

  3. if we unify all extension inputSources into a separate extension, users can just choose that new extension to use all input source types, which is better than listing all the different extensions in the load list.
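Under the split proposed above, the load list could name a single unified input-source extension plus only the deep storage extension actually in use. This is purely hypothetical: the unified extension name below did not exist at the time and is invented for illustration.

```properties
# Hypothetical load list if input sources were split into their own
# extension ("druid-cloud-input-sources" is an invented name):
druid.extensions.loadList=["druid-s3-extensions", "druid-cloud-input-sources"]
```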

@jihoonson
Contributor

jihoonson commented Mar 30, 2020

Please regard my comment as a non-blocker. The approach of this PR looks good to me as a short term solution.

@clintropolis
Member

clintropolis left a comment

lgtm, thanks 👍

@jihoonson
Contributor

jihoonson left a comment

LGTM

clintropolis merged commit e855c7f into apache:master on Apr 1, 2020
zachjsh deleted the IMPLY-2574 branch on April 1, 2020 at 19:35
clintropolis pushed a commit to clintropolis/druid that referenced this pull request Apr 1, 2020

Allow Cloud Deep Storage configs without segment bucket or path specified (apache#9588)
clintropolis added a commit that referenced this pull request Apr 2, 2020

Allow Cloud Deep Storage configs without segment bucket or path specified (#9588) (#9601)

Co-authored-by: zachjsh <zachjsh@gmail.com>
@jihoonson jihoonson added this to the 0.18.0 milestone Apr 21, 2020