zync needs plaintext tokens#4309
Conversation
929395c to
dda5113
Compare
|
I've been examining this approach and I think it could work. The main issues would be:
porta/app/workers/zync_worker.rb Lines 76 to 79 in 42de665 This is not necessarily bad, but I think it justifies adding some caching mechanism
retry_on ForbiddenError, wait: :polynomially_longer, attempts: 5
So summarizing, pros and cons of your approach and mine:
I wouldn't accept this approach unless we add some caching mechanism to avoid constant rotation of tokens and reduce the race condition window each rotation implies. Also, I think we definitely should add retrying logic to 403 errors if we go with this approach. @mayorova WDYT? |
|
On one hand, I like this solution, because it's simple and probably effective. On the other hand, I am concerned about potential race conditions, when a job in zync is already being executed with an "older" token, and in the meantime, this token is being deleted by porta, and thus is invalidated. |
Yeah, but I think the race condition is not the main concern. Caching the token during, say, 1 hour, will open the race condition window for a small fraction of a second once per hour. Besides, adding the retry logic to zync will fix the effects of the race condition probably all the times it happens. So I consider that problem is solved. For me the main concern is what I mentioned about the tokens, storing them plaintext in zync DB is wrong IMO, is an existing vector of risk that we could eliminate. If we accept storing tokens in plaintext in zync DB, then those tokens should be very short-lived. But then we would hit the problem of tokens expiring and zync not being able to refresh the tokens unless porta decides to send the new one. In practice, every time a job is queued in zync, it will happen that zync just received a request that included a new valid token. So zync failing to update due to expired tokens will only happen when the load is so high that the zync queue worker took more time to reach to the job than the time the token lived. Possible but unlikely I think. |
Yeah, but in every update in porta that triggers zync notification will rotate the token. Of course, we can't know how often these events will happen. But on the other hand, if we opt for the |
I think we can reach this agreement:
The expiration times for tokens should be the shorter possible that would leave zync reasonable time to do its best trying to process the jobs. That would be:
Jobs failing after that time, we assume it's fine to lose them. If new events come from porta they will include a new token and restore the provider. @akostadinov @mayorova WDYT? |
My suggestion is that in a separate PR, we switch zync to encrypt the tokens as passwords to prevent such storage. I think this is the standard way to do it. I'm not in favor of expiring tokens, that will require a separate mechanism to update them before expiry... I mean it is fine but more complex and not immediately necessary as previously we were still storing the tokens plaintext, so no regression in that regard. Also the tokens are read-only with limited scope. |
That would remove the need to make the tokens expirable. But it has it's own problems also like performance impact, maybe it's negligible but needs to be investigated. Also how would this be deployed, with or without a migration? and with or without downtime?
I my comments above I mention that we don't need that mechanism if we accept losing jobs until they succeeded to after a reasonable amount of time.
The read-only limited scope still grants access to client credentials for Cinstances. |
Alternative to #4304
@jlledom , how about something like this instead? I think there is some value in keeping the tokens for auth.
update: if needed we can also memcache the token plaintexts but I doubt it is really needed. Also we can check
Rails.configuration.zync.skip_non_oidc_applicationsbut either way, unless tests in stg show a significant performance difference, I wouldn't complicate things with such logic.