Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drift correction for SQLUser spec.password? #292

Open
jcanseco opened this issue Oct 20, 2020 · 8 comments
Open

Drift correction for SQLUser spec.password? #292

jcanseco opened this issue Oct 20, 2020 · 8 comments
Labels
question Further information is requested

Comments

@jcanseco
Copy link
Member

Opening a new issue for this comment.

Background: KCC before v1.19.0 would constantly update SQLUser if a spec.password is specified since the only way to correct drift for spec.password is to send updates constantly. This is because, unlike most GCP resource fields, there is no way actually to detect drift for spec.password given that its current value cannot be read using the API (and understandably so).

Customers have identified this "constantly updating" behavior for SQLUser spec.password as problematic (Issue #153) since it spams the SQL operations logging with "Password changed" messages, which not only looks like a security concern but also renders the SQL operations logs difficult to use.

Other customers, however, actually prefer the "constantly updating" behavior (see the linked comment that motivated this GitHub issue).

@tonybenchsci, can you please confirm if my understanding of your use-case is accurate: you're restoring SQL instances from backups (like described here) which also reset user passwords, and you had been relying on the SQLUser spec.password's "constantly updating" behavior to set back the user passwords to their desired values as specified by the SQLUser resources?

@jcanseco jcanseco added the question Further information is requested label Oct 20, 2020
@mruoss
Copy link

mruoss commented Oct 20, 2020

Just writing down my thoughts as it was me opening #284.

I have come across a similar situation in a k8s operator of my own. Obviously I could not retrieve the plain password from the target system, but I could retrieve the hashed password. I solved the problem by hashing the password inside the operator code using the salt from the retrieved password hash and the same hashing algorithm as the target system. I would then compare the hashes in order to know if the password needs to be updated or not. I don't know if this is applicable to your case, but might be an approach.

@tonybenchsci
Copy link

tonybenchsci commented Oct 26, 2020

@jcanseco Yes, that is correct. Thank you for raising a separate issue for this! I guess the related general question is: Does drift correction detect deltas in referenced resources?

Although, that isn't what was happening before 1.19.0 either from my understanding. I don't think "not constantly updating SQLUser" broke our implementation, because the password (regardless of the loop frequency) used to get drift-corrected and now doesn't at all (even after the now-less-frequent Up-to-date). I suspect that as part of 1.19.0, something else other than the loop frequency also changed, that doesn't actually apply the full (User + secretRef) resources anymore.

@ryanbenchsci
Copy link

ryanbenchsci commented Oct 27, 2020

I read through #153 and I do understand that this is a problem without an easy, obvious, perfect solution for all users.

The change that was made as a result of #153 has the following costs and benefits:

  • Cost: Drift correction for SQLUser passwords is no longer functionality provided by KCC, where it was previously. This is a loss of functionality and should likely be classified as a breaking change. As a thought experiment, imagine if drift correction was removed for something other than a password and how that change would be viewed.
  • Benefit: The resource won't enter NotReady periodically, but only if cnrm.cloud.google.com/management-conflict-prevention-policy is disabled. With conflict prevention enabled, the resource still enters NotReady due to the update of the labels describing the lease.
  • Benefit: Less Cloud SQL log messages, but only if cnrm.cloud.google.com/management-conflict-prevention-policy is disabled. With conflict prevention enabled, a log message is still generated due to the update of the labels describing the lease.

Therefore, there are essentially no benefits for a user running with conflict prevention enabled (the default, I believe), but a substantial cost in loss of functionality.

@jcanseco
Copy link
Member Author

@tonybenchsci Referenced resources don't really have anything to do with the issue. Basically, KCC sends an update to GCP whenever it detects a diff between the resource's desired state and its actual state. However, the problem with SQLUser is that the actual state cannot be fully determined because its password cannot be read. Therefore, there is no way to know if the desired password is actually different from the actual password.

The previous approach was to just update the password on every reconciliation to ensure that drift is always corrected even if there wasn't actually any drift.

However, customers have complained about this since they noted that their SQL users on GCP were having their passwords updated even though they didn't change anything.

@ryanbenchsci Yes, we do see your point about this seeming like a breaking change. Originally, we considered the previous behavior a bug, but we recognize that we should revisit this behavior since we can see that it has caused problems, and we will make sure to be careful with making similar changes in the future.

We will need to sink time thinking about how best to address the issue. For now, can you see if the following workaround works for you?

While KCC no longer corrects drift if the password is changed from outside KCC (since it's not possible to read what the current password is), KCC will still send an update if it detects that the password may have been changed on KCC-side (e.g. if spec.password is modified or if the referenced Secret is modified in any way).

Therefore, you could trigger an update for SQLUser by modifying the Secrets. For example, one idea is to add an annotation to the Secrets containing some kind of unique backup ID or timestamp, but any modification will work.

Could you please see if that works for you?

@tonybenchsci
Copy link

tonybenchsci commented Oct 29, 2020

Thanks @jcanseco for getting back. Wouldn't it be a more complete approach for you guys to work with the CloudSQL team and fix their logging API so that unchanged passwords are not logged as changed? (I'm sure there might be complications depending on how the passwords are hashed/protected). I think those customers just didn't like the logs, where as we actually had functionality dependent on full resource drift correction.

So the workaround to add an annotation to the secret would require manual intervention every time the SQL instance is restored (which runs on a cronjob). I guess we could run kubectl annotate secret <secret-name> backupdate=yyyymmdd on cron as well...

Is it possible/advisable to mix native k8s resources like cronjobs in the same namespace where everything else is using cnrm CRDs?

@jcanseco
Copy link
Member Author

jcanseco commented Oct 29, 2020

Thanks @tonybenchsci, yes that is something we can consider, thanks for the suggestion. We will definitely see if we can work with the CloudSQL team in some way.

I guess we could run kubectl annotate secret backupdate=yyyymmdd on cron as well...

Yes this is the workaround we had in mind as well.

Is it possible/advisable to mix native k8s resources like cronjobs in the same namespace where everything else is using cnrm CRDs?

Yes I don't see the problem with doing so. In fact, I would assume you're already doing that, given that KCC resources that can reference Secrets can only reference Secrets within the same namespace.

@tonybenchsci
Copy link

@jcanseco Has there been any updates to reconcile logic on Nov 19, 2020?
We saw a big spike in reconcile errors on that date. And ever since the, the annotation workaround no longer works.

@jcanseco
Copy link
Member Author

Hi @tonybenchsci, none that I know of.

And ever since the, the annotation workaround no longer works.

Can you clarify what you mean by "no longer works"? Do you mean that modifying an annotation on the referenced Secret no longer triggers an update for the SQLUser?

We saw a big spike in reconcile errors on that date.

And can you specify what kind of errors you observed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants