[Proposal] Consolidated segment metadata management #6849
Comments
I think this is a big simplicity win. Most people are already using replicated metadata databases with PITR (or managed options like Amazon RDS/Google Cloud SQL).
This is probably a good idea. The descriptors aren't used for anything besides the A question: how much of the
It's quite limited. Only As I noted in the proposal, I think replicating metadata store is a better solution for disaster recovery. I would imagine that deep storage is likely broken if the metadata store is broken by some sort of disaster and they are in the same data center.
I have seen people in the past use the insert-segment-to-db tool to recover segment entries in the metadata storage database.
@nishantmonu51 thank you for the comment! It makes sense and I will update the doc.
@jihoonson This was a cool change. I have a question regarding the existing descriptor.json files after the upgrade. Do they need to be manually removed? We use HDFS as a deep store so it would be nice to reduce the file count from Druid after upgrade validation. Thanks!
They don't need to be manually removed, but you can if you want to.
@capistrant as @gianm said, you don't have to. This issue exists only for HDFS deep storage. The rollback should be fine with other types of deep storage.
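For anyone who does want to reduce the file count after validating the upgrade, a dry-run listing of leftover descriptor files could look roughly like the sketch below. It is only an illustration under assumptions: the function name find_descriptor_files is made up here, the deep storage directory is assumed to be reachable as an ordinary filesystem path (for example through a mount), and any file whose name ends with descriptor.json is treated as a descriptor. It only lists files and never deletes anything.

```python
import os
import tempfile


def find_descriptor_files(root):
    """Return sorted paths of files under root whose name ends with 'descriptor.json'.

    Hypothetical helper: matching by file-name suffix is an assumption,
    not something confirmed by the proposal.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("descriptor.json"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Build a throwaway directory tree so the sketch can run anywhere.
    with tempfile.TemporaryDirectory() as root:
        segment_dir = os.path.join(root, "datasource", "interval", "version", "0")
        os.makedirs(segment_dir)
        for name in ("descriptor.json", "index.zip"):
            with open(os.path.join(segment_dir, name), "w") as f:
                f.write("{}")
        # Dry run: print candidates for review instead of removing them.
        for path in find_descriptor_files(root):
            print(path)
```

Reviewing the printed list before removing anything keeps the cleanup reversible, which matters if a rollback to an older version is still on the table.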
Motivation
Druid currently stores segment metadata in two places, i.e., metadata store and deep storage. In metadata store, segment metadata is stored in segments table. In deep storage, it's stored in descriptor.json file.
Druid core retrieves segment metadata only from the metadata store, and only insert-segment-to-db tool uses descriptor.json file to find segment files in deep storage.
However, storing metadata in two different places has several drawbacks.
insert-segment-to-db tool seems to be introduced for recovery when the metadata store is broken (#1861), but another approach should be employed to handle that kind of error, e.g., replicating metadata store.
Public Interfaces
DataSegmentFinder will be removed.
Proposed Changes
The segment metadata is stored only in the metadata store.
Compatibility, Deprecation, and Migration Plan
This is not a backward compatible change.
descriptor.json file is no longer stored in deep storage.
insert-segment-to-db tool will be removed.
Deep storage migration would become simpler since segment metadata needs to be updated in only metadata store.
Test Plan
N/A
Rejected Alternatives
N/A