
Add ability to schedule recurring incremental backups #359

Merged
merged 12 commits into from
Nov 9, 2021

Conversation

HoustonPutman
Contributor

@HoustonPutman HoustonPutman commented Oct 28, 2021

Resolves #303

Still needs a bit of love, but the bulk of the work should be there.

TODO:

  • Documentation & Changelog entry
  • Cleaning up logic around restarts
  • Rebase after other backup PRs are merged
  • Manual testing

@gerlowskija
Contributor

Whoa man, this is awesome. Thanks for picking this up - you took it right off my TODO list haha.

I've just taken a glance so far but intend to give it a more thorough review this afternoon (or Monday at latest).

One preliminary comment that I'm sure will put my foot solidly in my mouth once I actually review the code: I'm surprised to see that the scheduling functionality is exposed in the 'solrbackup' CRD as opposed to a new, separate resource type.

Did you consider a separate CRD at all, and if so, was there a reason you decided to go with the same CRD?

(When I thought of starting this work myself I assumed the backup-schedule functionality would have to be a separate CRD - so now I'm trying to recall whether that was just an assumption on my part or whether there was a particular reason behind that thought haha

Maybe it was just a conceptual misunderstanding or inflexibility on my part? In my head I've always conceptualized 'solrbackup' as being scoped to tracking the progress of a single, one-off backup. A user might be surprised that some solrbackup resources represent one-offs that will never impact the cluster again, and others represent ongoing periodic work that'll continue indefinitely.)

@HoustonPutman
Contributor Author

I never really considered a separate CRD. Generally CRDs are a necessary evil, so the fewer the better IMO. The reason this fits really well into a single CRD, in my opinion, is that the options would be identical between the two, except that a recurring backup has an additional setting for recurrence. Because of this, it makes a ton of sense to have them share the CRD. Otherwise every option we add to the SolrBackup CRD would also have to be added to a RecurringSolrBackup CRD.

Think of a Kubernetes headless Service vs. a regular Service (with a clusterIP). They are pretty different in usage, much further apart than SolrBackup and RecurringSolrBackup would be. However, they share a single resource type because they share almost every option and are used for the same reasons. Users can distinguish them easily enough by looking at the options. (Described more below.)

A CRD & controller pair is complex, the less complexity we add to the project, the better.

A user might be surprised that some solrbackup resources represent one-offs that will never impact the cluster again, and others represent ongoing periodic work that'll continue indefinitely.

We can fix this quite easily. When you run kubectl get solrbackups, the data it returns is not random; it is selected by us here. We can add a column (or columns) there to make it much clearer to the user what the solrbackup resource is doing. Maybe we add "Schedule" and/or "Next Backup Time". That way, when you run kubectl get solrbackups, it is very easy to tell which ones are one-off and which ones are recurring.
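For illustration, extra printer columns like these are usually added with kubebuilder markers on the CRD's Go type. The column names and JSON paths below are assumptions for the sake of the sketch, not the PR's actual code:

```go
// Hypothetical printer columns for `kubectl get solrbackups`; the spec and
// status field paths are assumptions and may not match the real SolrBackup CRD.
// +kubebuilder:printcolumn:name="Schedule",type="string",JSONPath=".spec.recurrence.schedule",description="Recurrence schedule, empty for one-off backups"
// +kubebuilder:printcolumn:name="Next Backup",type="string",JSONPath=".status.nextScheduledTime",description="Next scheduled backup time"
type SolrBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   SolrBackupSpec   `json:"spec,omitempty"`
	Status SolrBackupStatus `json:"status,omitempty"`
}
```

With markers like these, a one-off backup would simply show an empty Schedule column, making the two kinds of resources distinguishable at a glance.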

I also really like the idea of someone taking a backup and then realizing they need it to be done on a schedule. They wouldn't need to delete the SolrBackup resource or create something else. All they would have to do is edit their SolrBackup resource, adding the recurrence section. At that point, the controller would work fine and treat it as a recurring backup that has been taken once already.

@gerlowskija
Contributor

Cool - that all makes sense to me, thanks for clarifying!


@gerlowskija gerlowskija left a comment


This looks awesome!

All of the code itself looks good (aside from tests, which I imagine you just haven't gotten to yet). I do have two suggestions/questions about the feature overall, though:

  1. Confusingly, incremental backup 'deletion' doesn't really do a full delete: instead it removes only the metadata associated with the backup, potentially leaving index-files behind. Any backed up index-files that are no longer used must then be deleted by a separate garbage-collection process called "purging."
    IMO we should handle this purging process for users: either by triggering a "purge" on every backup, or by exposing a purgeEvery setting alongside maxSaved to allow a less aggressive approach.
  2. I can imagine a lot of users might use recurrence but want some way to temporarily disable it while they do a perf test, upgrade, bulk data loading, etc. What do you think about putting an "enabled" flag in BackupRecurrence to let users temporarily disable things without totally deleting the solrbackup?
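If the purgeEvery idea from point 1 were exposed in the CRD, it might sit next to maxSaved among the recurrence options. The field name and semantics below are assumptions for illustration, styled after the CRD's existing kubebuilder markers:

```go
// Hypothetical: run Solr's backup purge after every Nth recurring backup,
// garbage-collecting index files orphaned by deleted backup metadata.
// A value of 0 would disable purging entirely. (Assumed field, not in the PR.)
// +kubebuilder:validation:Minimum:=0
// +optional
PurgeEvery int `json:"purgeEvery,omitempty"`
```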

```go
// +kubebuilder:default:=10
// +kubebuilder:validation:Minimum:=1
// +optional
MaxSaved int `json:"maxSaved,omitempty"`
```
Contributor


[0] This is mostly subjective preference, but I'd vote for a smaller default value here. For collections with a lot of churn, 10 backups might be a lot of data to store by default. Would you be open to a lower default like 3 or 5?

Copy link
Contributor Author


Very open. I literally just chose a number here. I am very happy to defer to whatever you think it should be. Make the call and it is done.

Contributor Author


I set the default to 5.

@HoustonPutman
Contributor Author

  1. I'm content with a purgeEvery option in Solr. I'm not sure I understand why you would want un-garbage-collected index files lying around, but that's definitely a different issue. We can definitely go ahead and purge after every backup until the purgeEvery option is added. (Oh, now I realize you mean a purgeEvery option in the CRD. I think we can assume that would be equivalent to saveEvery.)
    I did talk with @anshumg about a ticket that saw some issues with purging, which makes me a bit wary of always doing it: https://issues.apache.org/jira/browse/SOLR-15706
    We should make sure that's not an issue before going forward with that.
  2. That's a good idea. I do think I want to have it "enabled" by default, which is easy to do with those kubebuilder annotations.
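The on/off switch from point 2 could be expressed with the same marker style as the other recurrence fields. A later comment mentions a "disabled" flag was ultimately added, so this sketch is a guess at its shape, not the merged code:

```go
// Hypothetical shape of the flag: a disabled recurrence keeps the SolrBackup
// resource (and its schedule) around, but the controller skips new backups.
// +kubebuilder:default:=false
// +optional
Disabled bool `json:"disabled,omitempty"`
```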

@gerlowskija
Contributor

gerlowskija commented Nov 2, 2021

now I realize you mean a purgeEvery option in the CRD. I think we can assume that would be equivalent to saveEvery.

Yep, that's what I was suggesting - an option in the CRD to expose the option that's already available in Solr. That SOLR-15706 you pointed to looks very serious 😬 . I'll take a look and hopefully get the Solr concern out of the way so we can consider exposing it in the operator.

When you mention "saveEvery" I assume you mean maxSaved? Or is there some other "saveEvery" param I missed?

@HoustonPutman
Contributor Author

When you mention "saveEvery" I assume you mean maxSaved? Or is there some other "saveEvery" param I missed?

I mis-typed. I meant maxSaved 🙂

@HoustonPutman
Contributor Author

@gerlowskija, given that https://issues.apache.org/jira/browse/SOLR-15706 is not released yet, maybe we just wait to add in the purge call until the next Solr Operator release.

@HoustonPutman
Contributor Author

@gerlowskija almost forgot, but I added the disabled flag and it works really well!

@HoustonPutman HoustonPutman merged commit ff3bcd1 into apache:main Nov 9, 2021
@HoustonPutman HoustonPutman deleted the scheduled-backups branch November 9, 2021 17:06

Successfully merging this pull request may close these issues.

Add scheduling support for backups