
Add ability to schedule recurring incremental backups #359

Merged
merged 12 commits into from
Nov 9, 2021

Conversation

HoustonPutman
Contributor

@HoustonPutman HoustonPutman commented Oct 28, 2021

Resolves #303

Still needs a bit of love, but the bulk of the work should be there.

TODO:

  • Documentation & Changelog entry
  • Cleaning up logic around restarts
  • Rebase after other backup PRs are merged
  • Manual testing

@gerlowskija
Contributor

Whoa man, this is awesome. Thanks for picking this up - you took it right off my TODO list haha.

I've just taken a glance so far but intend to give it a more thorough review this afternoon (or Monday at latest).

One preliminary comment that I'm sure will put my foot solidly in my mouth once I actually review the code: I'm surprised to see that the scheduling functionality is exposed in the 'solrbackup' CRD as opposed to a new, separate resource type.

Did you consider a separate CRD at all, and if so, was there a reason you decided to go with the same CRD?

(When I thought of starting this work myself I assumed the backup-schedule functionality would have to be a separate CRD - so now I'm trying to recall whether that was just an assumption on my part or whether there was a particular reason behind that thought haha

Maybe it was just a conceptual misunderstanding or inflexibility on my part? In my head I've always conceptualized 'solrbackup' as being scoped to tracking the progress of a single, one-off backup. A user might be surprised that some solrbackup resources represent one-offs that will never impact the cluster again, and others represent ongoing periodic work that'll continue indefinitely.)

@HoustonPutman
Contributor Author

I never really considered a separate CRD. Generally CRDs are a necessary evil, so the fewer the better IMO. The reason this fits really well into a single CRD, in my opinion, is that the options would be identical between the two, except that a recurring backup has an additional setting for recurrence. Because of this, it makes a ton of sense to have them share the CRD. Otherwise every option we add to the SolrBackup CRD would also have to be added to a RecurringSolrBackup CRD.

Think of a Kubernetes headless Service vs. a regular Service (with a clusterIP). They are pretty different in usage, much further apart than SolrBackup and RecurringSolrBackup would be. However, they share a single resource type because they share almost every option and are used for the same reasons. Users can distinguish them easily enough by looking at the options. (Described more below.)

A CRD & controller pair is complex, the less complexity we add to the project, the better.

A user might be surprised that some solrbackup resources represent one-offs that will never impact the cluster again, and others represent ongoing periodic work that'll continue indefinitely.

We can fix this quite easily. When you run kubectl get solrbackups, the data it returns is not random; it is selected by us here. We can add a column (or columns) there to make it much clearer to the user what the solrbackup resource is doing. Maybe we add "Schedule" and/or "Next Backup Time". That way, when you run kubectl get solrbackups, it is very easy to tell which ones are one-off and which ones are recurring.
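For illustration, extra printer columns like these are usually added with kubebuilder markers on the CRD's Go type. The column names and JSON paths below are assumptions for the sake of the sketch, not the PR's actual code:

```go
// Hypothetical printer columns for `kubectl get solrbackups`; the spec and
// status field paths are assumptions and may not match the real SolrBackup CRD.
// +kubebuilder:printcolumn:name="Schedule",type="string",JSONPath=".spec.recurrence.schedule",description="Recurrence schedule, empty for one-off backups"
// +kubebuilder:printcolumn:name="Next Backup",type="string",JSONPath=".status.nextScheduledTime",description="Next scheduled backup time"
type SolrBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   SolrBackupSpec   `json:"spec,omitempty"`
	Status SolrBackupStatus `json:"status,omitempty"`
}
```

With markers like these, a one-off backup would simply show an empty Schedule column, making the two kinds of resources distinguishable at a glance.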

I also really like the idea of someone taking a backup and then realizing they need it to be done on a schedule. They wouldn't need to delete the SolrBackup resource or create something else. All they would have to do is edit their SolrBackup resource, adding the recurrence section. At that point, the controller would work fine and treat it as a recurring backup that has been taken once already.

@gerlowskija
Contributor

Cool - that all makes sense to me, thanks for clarifying!


@gerlowskija gerlowskija left a comment


This looks awesome!

All of the code itself looks good (aside from tests, which I imagine you just haven't gotten to yet). I do have two suggestions/questions about the feature overall, though:

  1. Confusingly, incremental backup 'deletion' doesn't really do a full delete: instead it removes only the metadata associated with the backup, potentially leaving index-files behind. Any backed up index-files that are no longer used must then be deleted by a separate garbage-collection process called "purging."
    IMO we should handle this purging process for users: either by triggering a "purge" on every backup, or by exposing a purgeEvery setting alongside maxSaved to allow a less aggressive approach.
  2. I can imagine a lot of users might use recurrence but want some way to temporarily disable it while they do a perf test, upgrade, bulk data loading, etc. What do you think about putting an "enabled" flag in BackupRecurrence to let users temporarily disable things without totally deleting the solrbackup?
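If the purgeEvery idea from point 1 were exposed in the CRD, it might sit next to maxSaved among the recurrence options. The field name and semantics below are assumptions for illustration, styled after the CRD's existing kubebuilder markers:

```go
// Hypothetical: run Solr's backup purge after every Nth recurring backup,
// garbage-collecting index files orphaned by deleted backup metadata.
// A value of 0 would disable purging entirely. (Assumed field, not in the PR.)
// +kubebuilder:validation:Minimum:=0
// +optional
PurgeEvery int `json:"purgeEvery,omitempty"`
```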

```go
// +kubebuilder:default:=10
// +kubebuilder:validation:Minimum:=1
// +optional
MaxSaved int `json:"maxSaved,omitempty"`
```
Contributor


[0] This is mostly subjective preference, but I'd vote for a smaller default value here. For collections with a lot of churn, 10 backups might be a lot of data to store by default. Would you be open to a lower default like 3 or 5?

Copy link
Contributor Author


Very open. I literally just chose a number here. I am very happy to defer to whatever you think it should be. Make the call and it is done.

Contributor Author


I set the default to 5.

@HoustonPutman
Contributor Author

  1. I'm content with a purgeEvery option in Solr. I'm not sure I understand why you would want un-garbage-collected index files lying around, but that's definitely a different issue. We can definitely go ahead and purge after every backup until the purgeEvery option is added. (Oh, now I realize you mean a purgeEvery option in the CRD. I think we can assume that would be equivalent to saveEvery.)
    I did talk with @anshumg about a ticket that saw some issues with purging, which makes me a bit wary of always doing it: https://issues.apache.org/jira/browse/SOLR-15706
    We should make sure that's not an issue before going forward with that.
  2. That's a good idea. I do think I want to have it "enabled" by default, which is easy to do with those kubebuilder annotations.
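The on/off switch from point 2 could be expressed with the same marker style as the other recurrence fields. A later comment mentions a "disabled" flag was ultimately added, so this sketch is a guess at its shape, not the merged code:

```go
// Hypothetical shape of the flag: a disabled recurrence keeps the SolrBackup
// resource (and its schedule) around, but the controller skips new backups.
// +kubebuilder:default:=false
// +optional
Disabled bool `json:"disabled,omitempty"`
```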

@gerlowskija
Contributor

gerlowskija commented Nov 2, 2021

now I realize you mean a purgeEvery option in the CRD. I think we can assume that would be equivalent to saveEvery.

Yep, that's what I was suggesting - an option in the CRD to expose the option that's already available in Solr. That SOLR-15706 you pointed to looks very serious 😬 . I'll take a look and hopefully get the Solr concern out of the way so we can consider exposing it in the operator.

When you mention "saveEvery" I assume you mean maxSaved? Or is there some other "saveEvery" param I missed?

@HoustonPutman
Contributor Author

When you mention "saveEvery" I assume you mean maxSaved? Or is there some other "saveEvery" param I missed?

I mis-typed. I meant maxSaved 🙂

@HoustonPutman
Contributor Author

@gerlowskija, given that https://issues.apache.org/jira/browse/SOLR-15706 is not released yet, maybe we just wait to add in the purge call until the next Solr Operator release.

@HoustonPutman
Contributor Author

@gerlowskija almost forgot, but I added the disabled flag and it works really well!

@HoustonPutman HoustonPutman merged commit ff3bcd1 into apache:main Nov 9, 2021
@HoustonPutman HoustonPutman deleted the scheduled-backups branch November 9, 2021 17:06

Successfully merging this pull request may close these issues.

Add scheduling support for backups