Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM #4194

Merged
merged 27 commits into from
Mar 3, 2023

Conversation

duongkame
Copy link
Contributor

@duongkame duongkame commented Jan 20, 2023

What changes were proposed in this pull request?

This is the implementation of symmetric SecretKeys lifecycle management in SCM, with SecretKeys synchronized between SCMs using Ratis (a few notes for SecretKey replication are here).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7734

How was this patch tested?

Unit-tested. Integration test will come per HDDS-7830)

@duongkame duongkame marked this pull request as ready for review January 25, 2023 16:59
@duongkame duongkame changed the title HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM (Draft) HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM Jan 26, 2023
public ByteString serialize(Object object)
throws InvalidProtocolBufferException {
try {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should stay away from ByteArrayOutputStream and ObjectOutputStream.
(1) Potential vulnerability due to code injection.
(2) No guarantee of compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I've updated to use protobuf.

try {
secretKeyManager.flushInitializedState();
} catch (TimeoutException e) {
throw new RuntimeException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it retry if time out? RuntimeException would crash the SCM. Probably makes sense for such a critical service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TimeoutException translated to RuntimeException would crash the scheduled task, not SCM.
And a TimeoutException doesn't indicate a failure, it just means the wait is over, and the ratis update flow would be still happening asynchronously.

The next rotation check will check if the state is initialized by ratis, otherwise, it'll retry.
Thay may, however, leave a window in which SecretKey state may not have been initialized and tokens can't be generated. For now, the rotation check is configured to run every 10m.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look into uncaught exception handling. Cashing SCM in the handler makes sense if keys cannot be managed.

Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duongkame thank you for working on this one, the code and the structure of the code looks well thought out.

As we discussed with @jojochuang we should not use Serializable, instead we probably should serialize and deserialize the things with protobuf as we do in other parts of the code.

Besides this please find my comments inline.

Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the code is differentiating current keys and all keys and has a lot of code to make the distinction between them. Would keeping the keys to always be a sorted list be cleaner? The index 0 or n-1 is always the current key and we copy and move the entire list around in the code as needed.
Half way done will take a deeper look at ratis integration and tests later.

</property>
<property>
<name>hdds.secret.key.expiry.duration</name>
<value>P7D</value>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We follow a different convention in other places, it might make sense to stick with TimeDurationUtil.java which is used in the same config file for other durations.

</property>
<property>
<name>hdds.secret.key.rotate.check.duration</name>
<value>PT10M</value>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be in hours considering a day is the default.

hadoop-hdds/common/src/main/resources/ozone-default.xml Outdated Show resolved Hide resolved
}

private void setFileOwnerPermissions() {
Set<PosixFilePermission> permissions = newHashSet(OWNER_READ, OWNER_WRITE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a static unmodifiable set.

Files.createFile(secretKeysFile);
}
Files.setPosixFilePermissions(secretKeysFile, permissions);
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlikely but include FileAlreadyExistsException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileAlreadyExistsException is a subclass of IOException, maybe we don't need to include both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense unless we want to differentiate. What would the error handling be if this step fails? Will go over it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any IOException will result in SCM crash because the secret keys cannot be saved to files as a reaction to a Ratis log entry.

As we check for the file existence before creating them, the window of error is small (as someone may create the same file at the very same time). Crashing SCM for this unlikely event is just fine to me.

LOG.info("Saved {} to file {}", secretKeys, secretKeysFile);
}

private void setFileOwnerPermissions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void setFileOwnerPermissions() {
private void createSecretKeysFile() {

}

private static class ManagedSecretKeyDto {
private UUID id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a monotonically increasing count to avoid depending on wallclock time to sort the keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all the time durations like rotation or expiry should be much longer than the accepted time skew.

Plus, a SecretKey is always generated on top of the existing agreed ones (we use the latest key timestamp to determine if it's time to generate a new key). And that way, the time order between SecretKey is always protected.


// First, update the SecretKey state to make it visible immediately on the
// current instance.
state.updateKeysInternal(currentKey, sortedKeys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the keys be agreed upon before serving them out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. As discussed offline, the keys should be agreed upon by all SCM parties before being visible to clients. A change has been made.

return true;
}

public synchronized void flushInitializedState() throws TimeoutException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a javadoc for what this method is expected to do

@duongkame
Copy link
Contributor Author

Currently, the code is differentiating current keys and all keys and has a lot of code to make the distinction between them. Would keeping the keys to always be a sorted list be cleaner? The index 0 or n-1 is always the current key and we copy and move the entire list around in the code as needed. Half way done will take a deeper look at ratis integration and tests later.

Thanks for the suggestion, @kerneltime. It was started in the most natural way which apparently not the smartest one. All the code related to the distinction has been cleaned up.

@duongkame
Copy link
Contributor Author

@duongkame thank you for working on this one, the code and the structure of the code looks well thought out.

As we discussed with @jojochuang we should not use Serializable, instead we probably should serialize and deserialize the things with protobuf as we do in other parts of the code.

Besides this please find my comments inline.

Thanks for the review @fapifta. I visited all the comments and took action.

@duongkame duongkame requested review from jojochuang, fapifta and kerneltime and removed request for jojochuang and fapifta February 4, 2023 01:22
@hemantk-12
Copy link
Contributor

@jojochuang and @fapifta Can you please look at the latest revision? Thanks.

@duongkame duongkame requested review from jojochuang and removed request for kerneltime February 11, 2023 01:39
@duongkame duongkame changed the base branch from master to HDDS-7733-Symmetric-Tokens February 15, 2023 19:09
Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @duongkame thank you for the continued work on this one, in general it is getting better and I like the approach and most of the code, though I have added some minor comments inline, please take a look.

* </ul>
*
* <p/>
* The overall design is documented <a href=http://surl.li/epvkc>here</a>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to export the document into a PDF, upload it to the JIRA, and link the JIRA ID from here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I pinned the link to the jira containing the design doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed in the document linked to the JIRA, it has a link to a Cloudera internal JIRA ticket ID in the first paragraph, that is not visible for others, we might edit that part of the upstream design doc to give the information what is hidden for non-Cloudera folks.

hadoop-hdds/common/src/main/resources/ozone-default.xml Outdated Show resolved Hide resolved
@duongkame
Copy link
Contributor Author

Hi @duongkame thank you for the continued work on this one, in general it is getting better and I like the approach and most of the code, though I have added some minor comments inline, please take a look.

Thanks again for the very thorough review, @fapifta. I've looked into them and made the necessary changes.

@fapifta
Copy link
Contributor

fapifta commented Feb 23, 2023

You're welcome!
In the meantime one thought does bother me a bit, and would like to understand the intent for that...

In the SecretKeyManager and the SecretKeyState, I have some doubts around the mechanism that generates a new key and does the rotation... it is hard to tell what exactly, it is just a bad feeling so far, so let me think out loud here:
If I understand correctly, SecretKeyManager will decide it should rotate, once the current key is older than 1 day.
Once the rotation happens eventually after 1d, the SecretKeyManager#checkAndRotate() method creates a new list of secret keys, it filters the oldest expired key from the list, and SecretKeyState#updateKeys is called with the keys in a sorted order, where the newest key is the first item in the list.
This call to updateKeys is replicated to the other SCMs, so all SCMs eventually will have the new set of keys.

So far this is clean on its own, the confusion on my side comes after this. The way of implementation suggests that the second proposal the pull model will be used on the DNs.
How this will look like?
I guess if the DN runs into a token, that is newer then the newest key it has, it will pull the list of the keys from SCM. While OMs will pull periodically from SCM.
If my understanding is correct, then operations wise we are good, and probably my mind stuck with the push model and the pre-generated and distributed next key.

During the thought experiment around this, I started to feel that it might not be correct to have a SecretKeyState that gets its internal representation from outside, and also misses proper encapsulation... With the current code this might not be a problem, but here are the concerns:

  1. the internal list is guarded by locking, but its reference is leaking via the getSortedKeys() method. Though the actual implementation used is an unmodifiable list as of now.
  2. updateKeys can override the full internal state

The first one might not be a problem at all at the end of the day, it might be, if someone carelessly change the internal representation from unmodifiable list to a mutable list impementation.
Second one on the other hand there might be a more interesting question... Let's do some thought experiment:
Let's say we have one SCM (SCM1) that is stopped for an extended period of time, and its raft logs are so old that when it gets back in service it downloads a new snapshot, and since that snapshot there was no rotation happening. At this time SCM2 is the leader, but after SCM1 comes up and syncronizes its data and becomes a proper follower, SCM2 goes down before there was a rotation of secret keys, and during election SCM1 becomes the elected leader.

In this scenario will SCM1 have the secretKeys properly? As based on the code my assumption is that in this case SCM1 will have 1 secret key in its list generated at startup (let's assume every key that was persisted has expired already), with that it will issue tokens, and DNs will be able to validate those tokens, but once keys are rotated, the list of old keys will be overridden in all SCMs.

After this has happened, let's assume a DN goes down, and comes up due to a restart for any reason. In this case if there is a client that has a token issued by SCM2 back when it was the leader, was not be able to read data from the restarted DN, as that DN will not be able to fetch the formerly valid token from anywhere, while those DNs that have the token in memory will work fine further causing intermittent read failures.

Do I miss something, or this is a legit scenario with this approach of managing the secret keys?
I see this happening because the keys are managed in a file, not in rocksDB, and so when SCM1 downloads the snapshot of the data to replay logs from that snapshot, it might not replay the transaction that updates the file, and the file is not in the snapshot, but putting the secret keys to rocksDB with that to the snapshot will hurt us later when it comes to conforming with FISMA for example.

@duongkame
Copy link
Contributor Author

duongkame commented Feb 24, 2023

Thanks again for being thorough, @fapifta.

  1. the internal list is guarded by locking, but its reference is leaking via the getSortedKeys() method. Though the actual implementation used is an unmodifiable list as of now.

This is just a design choice. I was considering a) creating a copy list to serve a query, or b) leveraging immutability. I notice when doing a memory scaling analysis in SCM, we're so easy to go with a) and that's something. Buffers, like pipeline datanodes, are just copied over and that creates unnecessary heat to GC. Immutability is a good alternative to deal with concurrency and should be used where possible. Yes, I know it's quite micro-optimized (or premature) and it's not that easy to spread that awareness in Java, i.e. there's no good implicit type for immutable data, which would be easier in the functional world. But sooner or later, we'll get into a state where we have to optimize such things given that SCM/OM is not horizontally scalable.

  1. updateKeys can override the full internal state

The first one might not be a problem at all at the end of the day, it might be, if someone carelessly change the internal representation from unmodifiable list to a mutable list impementation. Second one on the other hand there might be a more interesting question... Let's do some thought experiment: Let's say we have one SCM (SCM1) that is stopped for an extended period of time, and its raft logs are so old that when it gets back in service it downloads a new snapshot, and since that snapshot there was no rotation happening. At this time SCM2 is the leader, but after SCM1 comes up and syncronizes its data and becomes a proper follower, SCM2 goes down before there was a rotation of secret keys, and during election SCM1 becomes the elected leader.

In this scenario will SCM1 have the secretKeys properly? As based on the code my assumption is that in this case SCM1 will have 1 secret key in its list generated at startup (let's assume every key that was persisted has expired already), with that it will issue tokens, and DNs will be able to validate those tokens, but once keys are rotated, the list of old keys will be overridden in all SCMs.

After this has happened, let's assume a DN goes down, and comes up due to a restart for any reason. In this case if there is a client that has a token issued by SCM2 back when it was the leader, was not be able to read data from the restarted DN, as that DN will not be able to fetch the formerly valid token from anywhere, while those DNs that have the token in memory will work fine further causing intermittent read failures.

Do I miss something, or this is a legit scenario with this approach of managing the secret keys? I see this happening because the keys are managed in a file, not in rocksDB, and so when SCM1 downloads the snapshot of the data to replay logs from that snapshot, it might not replay the transaction that updates the file, and the file is not in the snapshot, but putting the secret keys to rocksDB with that to the snapshot will hurt us later when it comes to conforming with FISMA for example.

Indeed consistency is what I'm dead worrying it may not work (original thoughts document [here|https://docs.google.com/document/d/125D83cFu47AslupJ-3k0FRskgzpNbz4hzDC00dV5-O4/edit#heading=h.vzi1ddqxwlg0].
The design replies on 2 assumptions about Ratis:

a) When a node becomes a functioning leader, it has to catch up with the transactions committed by the previous leader.
b) A transaction (to update SecretKeyState) will not be (partially) lost.

If those are true, the circle of the SCM leader relies on its SecretKey state to create a new state and uses Raft to commit the state, is always guaranteed as consistent. And the choice to update the entire state (instead of partially building up) is indeed a way to simplify the update scenarios (SecretKeyState.updateKeys is always invoked from raft/ratis, except startup).

Let's say we have one SCM (SCM1) that is stopped for an extended period of time, and its raft logs are so old that when it gets back in service it downloads a new snapshot, and since that snapshot there was no rotation happening. At this time SCM2 is the leader, but after SCM1 comes up and syncronizes its data and becomes a proper follower, SCM2 goes down before there was a rotation of secret keys, and during election SCM1 becomes the elected leader.

In this scenario will SCM1 have the secretKeys properly?

Sorry that I don't get this very well. So, if SCM2 successfully performed a rotation, it means SCM2 already committed a transaction to update SecretKeyState (SCM2 never update its SecretKeyState state during rotation, but always via ratis). So, I assume when SCM1 takes over the leadership, it will catch up with the new SecretKeyState committed by SCM2.
And as you mentioned, as SecretKey is stored in a file, not rocksDB, we will need to integrate SecretKeyState change to Ratis snapshot, so that when SCM1 rejoins the cluster, it can catch up with the latest SecretKeys. This is planned as per HDDS-7945 (it doesn't have much content now, but we talked about that several weeks ago :P).

And again, I'm still a fresher in Ratis/raft, so all the above may be wrong. Please share your thoughts. Also, @szetszwo can correct me with the Ratis assumptions.

@fapifta
Copy link
Contributor

fapifta commented Feb 27, 2023

Hi @duongkame,

Thank you for your answers, I haven't noticed HDDS-7945, and that I believe is entitled to solve the consistency problem I tried to describe. The problem is when SCM2 takes on leadership, and does not get the previous state because it was not part of the snapshot, and the updates are not there during log replay, so after that it would overwrite the keys used with the next rotation. This problem should be solved, if the keys are syncronized within the snapshot as well in between the SCMs.

The first one with the unmodifiable list is perfectly fine this way, I just left the thought there, but did not wanted to propose any change just wanted to discuss and understand how you think about it, thank you for sharing the thought process of yours.

From my side, I think I am fine with the changes. +1.
The only thing remained, that is not to be fixed within the PR itself, but we should get rid of the internal JIRA link in the design doc added to the main JIRA ;)

@duongkame
Copy link
Contributor Author

Thanks, @fapifta. Totally agree that anything replicated by Ratis should be a part of Ratis snapshot, including SecretKeys.
Also, I'll clean up the design doc and create a final immutable version (pdf).
Just one more thing, can you indeed approve the PR?

Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions around failover testing but that can be addressed in a subsequent PR

@kerneltime
Copy link
Contributor

We can merge this in once @fapifta has approved it.
Since we have a developer branch now, we can merge more eagerly and file follow up jiras for work identified.

Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duongkame thank you for addressing my suggestions. +1

@duongkame duongkame merged commit 8f8cf6b into apache:HDDS-7733-Symmetric-Tokens Mar 3, 2023
@duongkame
Copy link
Contributor Author

Thanks @fapifta @kerneltime @jojochuang for the review.

duongkame added a commit to duongkame/ozone that referenced this pull request Mar 16, 2023
duongkame added a commit to duongkame/ozone that referenced this pull request May 9, 2023
duongkame added a commit to duongkame/ozone that referenced this pull request Jun 8, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jun 10, 2023
* master: (73 commits)
  HDDS-8587. Test that CertificateClient can store multiple rootCA certificates (apache#4852)
  HDDS-8801. ReplicationManager: Add metric to count how often replication is throttled (apache#4864)
  HDDS-8477. Unit test for Snapdiff using tombstone entries (apache#4678)
  HDDS-7507. [Snapshot] Implement List Snapshot API Pagination (apache#4065) (apache#4861)
  HDDS-8373. Document that setquota doesn't accept decimals (apache#4856)
  HDDS-8779. Recon - Expose flag for enable/disable of heatmap. (apache#4845)
  HDDS-8677. Ozone admin OM CLI command for block tokens (apache#4760)
  HDDS-8164. Authorize secret key APIs (apache#4597)
  HDDS-7945. Integrate secret keys to SCM snapshot (apache#4549)
  HDDS-8003. E2E integration test cases for block tokens (apache#4547)
  HDDS-7831. Use symmetric secret key to sign and verify token (apache#4417)
  HDDS-7830. SCM API for OM and Datanode to get secret keys (apache#4345)
  HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM (apache#4194)
  HDDS-8679. Add dedicated, configurable thread pool for OM gRPC server (apache#4771)
  HDDS-8790. Split EC acceptance tests (apache#4855)
  HDDS-8714. TestScmHAFinalization: mark testFinalizationWithRestart as flaky, enable other test cases
  HDDS-8787. Reduce ozone sh calls in robot tests (apache#4854)
  HDDS-8774. Log allocation stack trace for leaked CodecBuffer (apache#4840)
  HDDS-8729. Add metric for count of blocks pending deletion on datanode (apache#4800)
  HDDS-8780. Leak of ManagedChannel in HASecurityUtils (apache#4850)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 10, 2023
* tmp-dir-refactor: (73 commits)
  HDDS-8587. Test that CertificateClient can store multiple rootCA certificates (apache#4852)
  HDDS-8801. ReplicationManager: Add metric to count how often replication is throttled (apache#4864)
  HDDS-8477. Unit test for Snapdiff using tombstone entries (apache#4678)
  HDDS-7507. [Snapshot] Implement List Snapshot API Pagination (apache#4065) (apache#4861)
  HDDS-8373. Document that setquota doesn't accept decimals (apache#4856)
  HDDS-8779. Recon - Expose flag for enable/disable of heatmap. (apache#4845)
  HDDS-8677. Ozone admin OM CLI command for block tokens (apache#4760)
  HDDS-8164. Authorize secret key APIs (apache#4597)
  HDDS-7945. Integrate secret keys to SCM snapshot (apache#4549)
  HDDS-8003. E2E integration test cases for block tokens (apache#4547)
  HDDS-7831. Use symmetric secret key to sign and verify token (apache#4417)
  HDDS-7830. SCM API for OM and Datanode to get secret keys (apache#4345)
  HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM (apache#4194)
  HDDS-8679. Add dedicated, configurable thread pool for OM gRPC server (apache#4771)
  HDDS-8790. Split EC acceptance tests (apache#4855)
  HDDS-8714. TestScmHAFinalization: mark testFinalizationWithRestart as flaky, enable other test cases
  HDDS-8787. Reduce ozone sh calls in robot tests (apache#4854)
  HDDS-8774. Log allocation stack trace for leaked CodecBuffer (apache#4840)
  HDDS-8729. Add metric for count of blocks pending deletion on datanode (apache#4800)
  HDDS-8780. Leak of ManagedChannel in HASecurityUtils (apache#4850)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants