HDDS-8590. Implement a protocol call to get the rootCA from SCM#4934
Conversation
…nts_and_trust_managers_with_multiple_root_ca
…nts_and_trust_managers_with_multiple_root_ca
…or getting all root CA
…tion_for_multiple_root_ca
ashishkumar50
left a comment
There was a problem hiding this comment.
@Galsza, Thanks for working on this, Please find the comments inline.
| /** | ||
| * Get all root CA certificates known to SCM. | ||
| * | ||
| * @return String - pem encoded concatanation of root CA certificates |
There was a problem hiding this comment.
Typo correct to: concatenation
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/protocol/SCMSecurityProtocol.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Override | ||
| public synchronized List<String> getAllRootCaCertificates() |
There was a problem hiding this comment.
Do we need synchronized here? May not be required.
There was a problem hiding this comment.
Yes, in the end the rootCas might change once root ca rotation is implemented and this is going to fail with ConcurrentModificationException if we don't synchronize
There was a problem hiding this comment.
As the name suggest this method is intended to use just for retrieving Certificate. If rootCACertificate is going to change that will be in other method. I think we should protect this with lock and use read/write lock accordingly.
...ds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMSecurityProtocolServer.java
Outdated
Show resolved
Hide resolved
| if (storageContainerManager.getScmStorageConfig() | ||
| .checkPrimarySCMIdInitialized()) { | ||
| return CertificateCodec.getPEMEncodedString(rootCACertificate); | ||
| public synchronized String getRootCACertificate() throws IOException { |
There was a problem hiding this comment.
Do we need synchronized here?, The old existing method is without synchronized.
There was a problem hiding this comment.
Yes, since the list of the root cas can change. The old method didn't need synch because the whole server could only ever use one root ca
| return CertificateCodec.getPEMEncodedString(lastCert); | ||
| } | ||
|
|
||
| public synchronized void addNewRootCa(X509Certificate rootCaCertToAdd) { |
There was a problem hiding this comment.
I can see this is used only by test code, can you mark this with test annotation. Also synchronized is not required.
There was a problem hiding this comment.
This method is going to be used later once the root ca certificate rotation is implemented. It's here because it makes sense as well as we can ensure that the sync is properly placed.
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
Outdated
Show resolved
Hide resolved
| scmCertificateClient.getRootCACertificate() : | ||
| scmCertificateClient.getCACertificate(); | ||
| ArrayList<X509Certificate> rootCaList = new ArrayList<>(); | ||
| rootCaList.add(rootCaCert); |
There was a problem hiding this comment.
If rootCaCert is null based on above condition, in that case we should not add into rootCaList.
| } | ||
| } | ||
| return null; | ||
| return CertificateCodec.getPEMEncodedString(lastCert); |
There was a problem hiding this comment.
Before calling need to ensure lastCert is not NULL or else it will lead to NPE in CertificateCodec .getPEMEncodedString
There was a problem hiding this comment.
Thank you, that was a nice catch, I've added functionality to handle that
|
@fapifta @ChenSammi please take a look |
| } | ||
|
|
||
| @Override | ||
| public synchronized List<String> getAllRootCaCertificates() |
There was a problem hiding this comment.
As the name suggest this method is intended to use just for retrieving Certificate. If rootCACertificate is going to change that will be in other method. I think we should protect this with lock and use read/write lock accordingly.
| private final CertificateServer rootCertificateServer; | ||
| private final CertificateServer scmCertificateServer; | ||
| private final X509Certificate rootCACertificate; | ||
| private final List<X509Certificate> rootCACertificate; |
There was a problem hiding this comment.
Can we change the name to add suffix/prefix "list" to avoid confusion that it not just single Certificate.
| } | ||
|
|
||
| message SCMGetAllRootCaCertificatesResponseProto { | ||
| enum ResponseCode { |
There was a problem hiding this comment.
In the general SCMSecurityRequest/Response where we have the different request/Response types, and where you have added this protocol message as optional on the 11th position in line 86, we have a couple of fields: success, message, Status. I believe we should use that to convey success or failure related information, the submitRequest method on the client side, and the request handling on the server has the facilities to deal with errors and succes via these. Can you please use those instead of a specific ResponseCode here?
There was a problem hiding this comment.
I'm not sure if I managed to resolve this successfully. Do I understand it correctly that these specific response codes should only be used when we want to specify something that is not already covered in SCMSecurityRequest/Response?
Now the command type is set according to the rest of the examples and the ResponseCode handling should be dealt with in the central location.
|
@ashishkumar50 @Galsza I wanted to chime into the locking related discussion, but as I see it is already sorted out, let me tell it upfront, I am fine with either using synchronized or using a read/write lock, and whichever you agree on, I accept that. Two consideration that I would like to speak out loud here:
Considering the anticipated traffic, and the expected performance of this server side endpoint I think having a read/write lock is probably more than we need, as locking here seems to be a potential problem only because we are converting the certificates from X509Certificate to PEM encoded string within the lock. That should not be happening I believe, but to avoid that will require some further changes that are not in scope for this PR I believe. I am happy to hear your opinion, I wanted to share mine as once @Galsza asked me what he should do with concurrent access here, and I suggested that using synchronized would be enough, I wanted to detail why hoping to see if I can learn something I have not thought about and also I would like to understand your point of view. |
|
@fapifta, Thanks for the explanation. |
|
Thanks @ashishkumar50 and @fapifta for the conversation, I've updated the patch again with only synching for the |
|
@ashishkumar50, multiple reads are fine concurrently, but if we do not guard reads and writes with either a lock or by putting them into a synchronized method, then write may happen during reads, and that leads to a ConcurrentModificationException. Consider the following example which is a scheduling that can happen in an env if write happens during iteration: List<String> strs = new ArrayList<>();
strs.add("foo");
Semaphore sem = new Semaphore(1);
CountDownLatch latch = new CountDownLatch(2);
Runnable r1 = () -> {
try {
sem.acquire();
latch.countDown();
latch.await();
strs.add("bar");
sem.release();
} catch (Exception e) {
e.printStackTrace();
}
};
Runnable r2 = () -> {
try {
for (String s : strs) {
latch.countDown();
latch.await();
sem.acquire();
System.out.println(s);
sem.release();
}
} catch (Exception e) {
e.printStackTrace();
}
};
new Thread(r1).start();
new Thread(r2).start();This code produces a ConcurrentModificationException consistently. So @Galsza, I think, either we use a lock around the access to the rootCA certificate list, or we use synchronized, but we need to guard all accesses, as the addition of a new rootCA certificate will happen on a different thread than the one that serves requests and iterates over the list. I know that the chances to win the lottery is higher than to hit this here, but still, if we started to think about it, then let's get to the bottom of it, and have a proper solution. |
|
@fapifta It seems that the CI wasn't running properly anyway with synchronization only on the addRootCa. I've pushed a new version where we synch everywhere for it, but I'm still not entirely sure this should be the final solution. I'm going to see the what happens with the CI on my fork first. |
Yes correct, If request traffic is frequent there is a chance that read might fail because writes are happening on same object. But as you mentioned traffic is very less so the chance of happening both read/write together is rare and even if this happens there is a retry from client which will be success eventually. |
You can use either lock or synchronized to handle this, which will protect object. Either way is fine. May be can test locally by running multiple read/writes at same time. |
|
Thank you @Galsza for the patch. +1 As I see we got to an agreement regarding synchronization, and the rest of the comments I believe were addressed properly. |
What changes were proposed in this pull request?
For root CA cert rotation the clients need to be able to get all root CAs from SCM. The messaging is implemented here, the clients getting the root CA will be implemented in a later patch.
This can't be done through the old protocol messages, because older clients save the pem encoded string directly into a file and they read it back generating a cert path. If this pem encoded message contained multiple root ca then it would cause problems when read back and trying to generate a cert path with multiple root CAs in them. Therefore a new communication protocol is needed.
What is the link to the Apache JIRA
HDDS-8590
How was this patch tested?
Added a small unit test, CI run on my fork: (This is still in progress at the moment of opening this PR, but the previous workflow run was executed successfully and no untested changes were made after)
https://github.com/Galsza/ozone/actions/runs/5313261796