Add feature to automatically remove datasource metadata based on retention period #11227
Conversation
      )
    );
    for (String datasourceMetadataInDb : datasources) {
      if (!excludeDatasources.contains(datasourceMetadataInDb)) {
Possible NPE: excludeDatasources is marked as nullable in the function definition.
excludeDatasources should be @NotNull.
Fixed.
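The NPE risk discussed above can be sketched as follows. This is a minimal stand-alone illustration, not Druid's actual code: the class and method names are hypothetical, and a null-tolerant guard is shown as one alternative to the @NotNull contract the review settled on.

```java
import java.util.Collections;
import java.util.Set;

public class ExcludeDatasourcesExample
{
  // If excludeDatasources were left nullable, calling contains() on it
  // directly would throw a NullPointerException. Normalizing null to an
  // empty set makes the membership check safe either way.
  static boolean isEligibleForRemoval(String datasource, Set<String> excludeDatasources)
  {
    final Set<String> exclusions =
        excludeDatasources == null ? Collections.emptySet() : excludeDatasources;
    return !exclusions.contains(datasource);
  }
}
```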
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
            .mapTo(String.class)
            .list()
      );
      return connector.getDBI().withHandle(
What happens if an exception is thrown while trying to delete the datasources? withHandle will throw a CallbackFailedException - is this handled somewhere else in the code?
Added a try-catch block.
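The fix can be sketched as below: catch the exception thrown by the JDBI callback so a failed delete cannot escape the cleanup duty. The CallbackFailedException class and the callback interface here are simplified local stand-ins for JDBI's types, kept self-contained for illustration.

```java
public class KillMetadataErrorHandlingExample
{
  // Local stand-in for JDBI's CallbackFailedException.
  static class CallbackFailedException extends RuntimeException
  {
    CallbackFailedException(String message)
    {
      super(message);
    }
  }

  // Simplified stand-in for a JDBI withHandle callback that deletes rows.
  interface DeleteCallback
  {
    int run();
  }

  // Returns the number of rows deleted, or 0 if the delete failed;
  // in the real duty the failure would also be logged, and the duty
  // simply retries on its next scheduled run.
  static int removeDatasourceMetadata(DeleteCallback callback)
  {
    try {
      return callback.run();
    }
    catch (CallbackFailedException e) {
      return 0;
    }
  }
}
```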
        handle -> {
          final PreparedBatch batch = handle.prepareBatch(
              StringUtils.format(
                  "DELETE FROM %1$s WHERE dataSource = :dataSource AND created_date < '%2$s'",
Why did you choose to build the delete statements one at a time instead of doing a batch delete? I think we could encapsulate the excludeDatasources logic in a WHERE clause of this delete statement instead. Something like DELETE FROM datasources WHERE created_date < "date" AND datasource NOT IN ("excludeDataSources").
This is to prevent the IN clause from becoming too large.
Also moved the filtering of the datasources outside the handle block.
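The approach described in this exchange can be sketched as: filter the exclusions in Java first, then bind one batch entry per surviving datasource, so no SQL IN clause ever grows with the number of excluded datasources. Names below are illustrative, not the PR's literal code.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DatasourceFilterExample
{
  // Filtering happens outside the database handle: only the datasources
  // not in the exclusion set are later added to the prepared delete
  // batch, one bound parameter per datasource.
  static List<String> datasourcesToDelete(List<String> datasourcesInDb, Set<String> excludeDatasources)
  {
    return datasourcesInDb.stream()
                          .filter(ds -> !excludeDatasources.contains(ds))
                          .collect(Collectors.toList());
  }
}
```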
      if ((lastKillTime + period) < System.currentTimeMillis()) {
        lastKillTime = System.currentTimeMillis();
        long timestamp = System.currentTimeMillis() - retainDuration;
nit: use consistent timestamp for all calculations
Suggested change:
-      if ((lastKillTime + period) < System.currentTimeMillis()) {
-        lastKillTime = System.currentTimeMillis();
-        long timestamp = System.currentTimeMillis() - retainDuration;
+      long currentTimeMillis = System.currentTimeMillis();
+      if ((lastKillTime + period) < currentTimeMillis) {
+        lastKillTime = currentTimeMillis;
+        long timestamp = currentTimeMillis - retainDuration;
Additional question: I notice this pattern in a few other coordinator duties. Are there any additional safeguards we need for a very large retainDuration? What happens if timestamp is calculated to be negative?
Added a safeguard so that the calculated timestamp can never be negative.
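The safeguard can be as simple as clamping the cutoff at zero. This is a sketch of the idea, not the PR's literal code:

```java
public class RetentionCutoffExample
{
  // If retainDuration exceeds the current epoch time, the subtraction
  // would go negative; clamping at 0 keeps the cutoff a valid timestamp
  // (and effectively makes the delete a no-op for any real created_date).
  static long cutoffTimestamp(long currentTimeMillis, long retainDuration)
  {
    return Math.max(0L, currentTimeMillis - retainDuration);
  }
}
```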
Note that I intentionally didn't specify in the docs that retainDuration has to be less than the current timestamp (although we do check against this condition to protect ourselves from unexpected behavior), since this is a very unlikely scenario and I don't want to make the docs unnecessarily verbose.
      Set<String> allDatasourceWithActiveSupervisor = allSupervisor.values()
                                                                   .stream()
      // Terminated supervisors will have their latest supervisorSpec as NoopSupervisorSpec
      // (NoopSupervisorSpec is used as a tombstone marker)
This logic is very similar to SQLMetadataSupervisorManager#removeTerminatedSupervisorsOlderThan. Should that logic be moved out of the metadata store layer and pulled into the KillSupervisors class instead? Should this logic be shared so that other callers can easily find the "active" supervisors?
I added some convenience methods, getLatestTerminatedOnly and getLatestActiveOnly, to SQLMetadataSupervisorManager.
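The tombstone-based filtering discussed above can be sketched as below. SupervisorSpec and NoopSupervisorSpec are simplified local stand-ins for Druid's classes, and the method name is illustrative:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ActiveSupervisorExample
{
  // Simplified stand-in for Druid's SupervisorSpec (functional here,
  // so an active spec can be written as a lambda in tests).
  interface SupervisorSpec
  {
    String getDataSource();
  }

  // Tombstone left behind as the latest spec when a supervisor is terminated.
  static class NoopSupervisorSpec implements SupervisorSpec
  {
    private final String dataSource;

    NoopSupervisorSpec(String dataSource)
    {
      this.dataSource = dataSource;
    }

    @Override
    public String getDataSource()
    {
      return dataSource;
    }
  }

  // Datasources whose latest spec is NOT the Noop tombstone still have
  // an active supervisor, so their metadata must be kept.
  static Set<String> datasourcesWithActiveSupervisor(Map<String, SupervisorSpec> latestSpecs)
  {
    return latestSpecs.values()
                      .stream()
                      .filter(spec -> !(spec instanceof NoopSupervisorSpec))
                      .map(SupervisorSpec::getDataSource)
                      .collect(Collectors.toSet());
  }
}
```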
LGTM after CI
[ERROR] Errors:
[ERROR] KillDatasourceMetadataTest.unnecessary Mockito stubbings » UnnecessaryStubbing
Description
We already have task log auto cleanup (#3677) and audit log auto cleanup (#11084). This PR adds a similar duration-based (time-to-retain) auto cleanup for the datasource metadata table, to clean up datasources that are no longer active -- meaning the datasource has no active supervisor running. (Note: datasource metadata only exists for datasources created from supervisors.)
This is useful when a Druid user has a high churn of tasks/datasources in a short amount of time, causing the metadata store size to grow uncontrollably.
This PR has: