
Delete expired known endpoints #2108

Merged: 1 commit merged into known-endpoints on Aug 12, 2020

Conversation

danielmarbach (Contributor)

No description provided.


try
{
var query = new IndexQuery
danielmarbach (Contributor, author) commented:

I first wanted to use something like

Operation operation = store
	.DatabaseCommands
	.DeleteByIndex(
		"Raven/DocumentsByEntityName",
		new IndexQuery
		{
			// needs an interpolated string ($) so expiryThreshold is actually substituted
			Query = $"Tag:KnownEndpoints AND LastSeen:[* TO {expiryThreshold.Ticks}]"
		},
		new BulkOperationOptions
		{
			AllowStale = false
		});

but then I wasn't able to find a delete-by-index on the DocumentDatabase type, and I realized that in order to filter by LastSeen we would require an index anyway.

Member commented:

Elsewhere in the app, we're making the assumption that the total number of known endpoints will generally be contained to < 1024. So why are we doing the select + delete ids in batch dance when we could do a DeleteByIndex using ExpiryKnownEndpointsIndex?

danielmarbach (Contributor, author) commented:

Cleaning runs in a bundle, where I only have access to the DocumentDatabase type. As written above, I couldn't find a delete-by-index there, so I used the same approach as in the audit cleaner and reused the batch size settings to chunk the deletes for convenience. Regarding the settings: yes, we kind of hardcoded the page size to 1024 for now and cut off the query, but we also introduced paging in the other PR, so I'm on the fence.

Do you know of a way of doing delete by query on the Document Database object by any chance?
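For reference, the chunked select-then-delete shape described above looks roughly like this. This is only a sketch: `QueryExpiredKnownEndpointIds` and `DeleteDocuments` are hypothetical placeholders for whatever the bundle's `DocumentDatabase` access actually exposes, not real RavenDB APIs.

```csharp
// Sketch of the batched cleanup loop; helper methods are placeholders, not RavenDB calls.
public static void Clean(int deletionBatchSize, DateTime expiryThreshold, CancellationToken token)
{
    List<string> ids;
    do
    {
        token.ThrowIfCancellationRequested();

        // 1. Page through the index, fetching at most one batch of ids.
        ids = QueryExpiredKnownEndpointIds(expiryThreshold, deletionBatchSize);

        // 2. Delete that batch before fetching the next page.
        DeleteDocuments(ids);
    }
    while (ids.Count == deletionBatchSize); // a short page means nothing expired is left
}
```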

Member commented:

Ah, I forgot this was in a bundle. 🤦

Member commented:

Have we considered just setting the Raven Expiration Date, sliding it forward for each new entry, and relying on the built-in retention bundle to do the job?

https://ravendb.net/docs/article-page/3.5/Csharp/server/bundles/expiration
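For context, the expiration bundle linked above works off a metadata key on each document. A minimal sketch of the suggestion, assuming a hypothetical `KnownEndpoint` document type and a configurable `retentionPeriod` (neither taken from this PR), might look like:

```csharp
// Sketch only: slides the expiry window forward on every update,
// letting the server-side expiration bundle delete the document later.
using (var session = store.OpenSession())
{
    var endpoint = session.Load<KnownEndpoint>(endpointId);
    endpoint.LastSeen = DateTime.UtcNow;

    // Raven-Expiration-Date is the metadata key the 3.5 expiration bundle reads.
    session.Advanced.GetMetadataFor(endpoint)["Raven-Expiration-Date"] =
        new RavenJValue(DateTime.UtcNow.Add(retentionPeriod));

    session.SaveChanges();
}
```

The bundle would then remove expired documents server-side, avoiding the need for a custom cleaner.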

danielmarbach (Contributor, author) commented:

I dismissed Expiration for now because, as of today, Expiration is a business feature: if someone changes the expiration time, it affects all cycles running after the date has been set. This would also be doable with Expiration, but we would then have to apply a set-based patch to the endpoints. I think this will likely come up again as a discussion point when we upgrade Raven, so I figured I'd just stick to the current pattern.

Comment on lines +19 to +20
logger.Debug("Trying to find expired KnownEndpoints to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));
KnownEndpointsCleaner.Clean(deletionBatchSize, database, threshold, token);
Member commented:

I wonder if it's wise to execute this every time, since in most systems this data (apart from updates to the LastSeen parameter) should remain relatively static. Maybe it should have a separate schedule, or run only every 10th time?

danielmarbach (Contributor, author) commented:

I saw no reason to diverge from the cleaner pattern we also use in the primary instance

public static Task<TimerJobExecutionResult> RunCleanup(int deletionBatchSize, DocumentDatabase database, Settings settings, CancellationToken token)

Right now it is assumed that the expiry interval and settings apply to all cleanups. Given that it is an async process running on a customer-defined schedule, I saw no problem with that, and it keeps the data consistent with the expiry of processed messages, essentially mimicking the current behaviour.

Member commented:

The collection is also likely to be relatively small so the overhead of checking it will be negligible compared to the audit cleanup


@@ -16,6 +16,8 @@ public static Task<TimerJobExecutionResult> RunCleanup(int deletionBatchSize, Do

logger.Debug("Trying to find expired ProcessedMessage and SagaHistory documents to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));
AuditMessageCleaner.Clean(deletionBatchSize, database, threshold, token);
logger.Debug("Trying to find expired KnownEndpoints to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));
Member commented:

Why have a separate log line for this and not just include it in the one above? The threshold is the same

danielmarbach (Contributor, author) commented:

I stuck to the pattern started in

logger.Debug("Trying to find expired FailedMessage documents to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));

logger.Debug("Trying to find expired EventLogItem documents to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));

logger.Debug("Trying to find expired ProcessedMessage and SagaHistory documents to delete (with threshold {0})", threshold.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture));

but even there it is not quite the same, because ProcessedMessages and SagaHistory share one log message. Happy to change.

@danielmarbach danielmarbach merged commit 193c3cf into known-endpoints Aug 12, 2020
@danielmarbach danielmarbach deleted the expire-knownendpoints branch August 12, 2020 06:09
danielmarbach (Contributor, author) commented:

Merging for now. Will consolidate the logging in the other PR

4 participants