Skip to content

HDDS-8590. Implement a protocol call to get the rootCA from SCM#4934

Merged
fapifta merged 38 commits intoapache:masterfrom
Galsza:HDDS-8590_communication_for_multiple_root_ca
Jun 21, 2023
Merged

HDDS-8590. Implement a protocol call to get the rootCA from SCM#4934
fapifta merged 38 commits intoapache:masterfrom
Galsza:HDDS-8590_communication_for_multiple_root_ca

Conversation

@Galsza
Copy link
Contributor

@Galsza Galsza commented Jun 19, 2023

What changes were proposed in this pull request?

For root CA cert rotation the clients need to be able to get all root CAs from SCM. The messaging is implemented here, the clients getting the root CA will be implemented in a later patch.

This can't be done through the old protocol messages, because older clients save the pem encoded string directly into a file and they read it back generating a cert path. If this pem encoded message contained multiple root ca then it would cause problems when read back and trying to generate a cert path with multiple root CAs in them. Therefore a new communication protocol is needed.

What is the link to the Apache JIRA

HDDS-8590

How was this patch tested?

Added a small unit test, CI run on my fork: (This is still in progress at the moment of opening this PR, but the previous workflow run was executed successfully and no untested changes were made after)
https://github.com/Galsza/ozone/actions/runs/5313261796

Galsza added 30 commits May 30, 2023 13:49
…nts_and_trust_managers_with_multiple_root_ca
…nts_and_trust_managers_with_multiple_root_ca
Copy link
Contributor

@ashishkumar50 ashishkumar50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Galsza, Thanks for working on this, Please find the comments inline.

/**
* Get all root CA certificates known to SCM.
*
* @return String - pem encoded concatanation of root CA certificates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo correct to: concatenation

}

@Override
public synchronized List<String> getAllRootCaCertificates()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need synchronized here? May not be required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the end the rootCas might change once root ca rotation is implemented and this is going to fail with ConcurrentModificationException if we don't synchronize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name suggest this method is intended to use just for retrieving Certificate. If rootCACertificate is going to change that will be in other method. I think we should protect this with lock and use read/write lock accordingly.

if (storageContainerManager.getScmStorageConfig()
.checkPrimarySCMIdInitialized()) {
return CertificateCodec.getPEMEncodedString(rootCACertificate);
public synchronized String getRootCACertificate() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need synchronized here?, The old existing method is without synchronized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since the list of the root cas can change. The old method didn't need synch because the whole server could only ever use one root ca

return CertificateCodec.getPEMEncodedString(lastCert);
}

public synchronized void addNewRootCa(X509Certificate rootCaCertToAdd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this is used only by test code, can you mark this with test annotation. Also synchronized is not required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is going to be used later once the root ca certificate rotation is implemented. It's here because it makes sense as well as we can ensure that the sync is properly placed.

scmCertificateClient.getRootCACertificate() :
scmCertificateClient.getCACertificate();
ArrayList<X509Certificate> rootCaList = new ArrayList<>();
rootCaList.add(rootCaCert);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rootCaCert is null based on above condition, in that case we should not add into rootCaList.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice catch

}
}
return null;
return CertificateCodec.getPEMEncodedString(lastCert);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before calling need to ensure lastCert is not NULL or else it will lead to NPE in CertificateCodec .getPEMEncodedString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that was a nice catch, I've added functionality to handle that

@Galsza
Copy link
Contributor Author

Galsza commented Jun 20, 2023

@fapifta @ChenSammi please take a look

}

@Override
public synchronized List<String> getAllRootCaCertificates()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name suggest this method is intended to use just for retrieving Certificate. If rootCACertificate is going to change that will be in other method. I think we should protect this with lock and use read/write lock accordingly.

private final CertificateServer rootCertificateServer;
private final CertificateServer scmCertificateServer;
private final X509Certificate rootCACertificate;
private final List<X509Certificate> rootCACertificate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the name to add suffix/prefix "list" to avoid confusion that it not just single Certificate.

Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Galsza for the work on this one, I have a comment inline for the protocol changes, and I would like to share my thoughts on synchronized vs locking in a separate discussion shortly.

}

message SCMGetAllRootCaCertificatesResponseProto {
enum ResponseCode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the general SCMSecurityRequest/Response where we have the different request/Response types, and where you have added this protocol message as optional on the 11th position in line 86, we have a couple of fields: success, message, Status. I believe we should use that to convey success or failure related information, the submitRequest method on the client side, and the request handling on the server has the facilities to deal with errors and succes via these. Can you please use those instead of a specific ResponseCode here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I managed to resolve this successfully. Do I understand it correctly that these specific response codes should only be used when we want to specify something that is not already covered in SCMSecurityRequest/Response?

Now the command type is set according to the rest of the examples and the ResponseCode handling should be dealt with in the central location.

@fapifta
Copy link
Contributor

fapifta commented Jun 20, 2023

@ashishkumar50 @Galsza I wanted to chime into the locking related discussion, but as I see it is already sorted out, let me tell it upfront, I am fine with either using synchronized or using a read/write lock, and whichever you agree on, I accept that.

Two consideration that I would like to speak out loud here:

  • having a read/write lock makes the code a bit more longer, and with that it increases the time needed to understand what is happening and why, so I have no particular preference for read/write locks if the anticipated load on the endpoint is low.
  • This endpoint will not handle large traffic ever, and even if there might be a contention once services start to poll regularly this API, it does not block anything else, and all the clients are prepared to wait/retry so it is not a big deal. (Also the client side is not in a performance critical path either at the moment, especially as the clients are caching the data, and start off from the cached data, then refresh the cache in the background.)

Considering the anticipated traffic, and the expected performance of this server side endpoint I think having a read/write lock is probably more than we need, as locking here seems to be a potential problem only because we are converting the certificates from X509Certificate to PEM encoded string within the lock. That should not be happening I believe, but to avoid that will require some further changes that are not in scope for this PR I believe.

I am happy to hear your opinion, I wanted to share mine as once @Galsza asked me what he should do with concurrent access here, and I suggested that using synchronized would be enough, I wanted to detail why hoping to see if I can learn something I have not thought about and also I would like to understand your point of view.

@ashishkumar50
Copy link
Contributor

@fapifta, Thanks for the explanation.
My point is even if we add synchronize in both the methods getAllRootCaCertificates, addNewRootCa. It doesn't ensure read/write are safe enough. But as you mentioned since this has very low traffic and not on critical path
we can just add synchronize in addNewRootCa to ensure no issue while writing.
For read path I think we can remove synchronize, as multiple thread can read concurrently without any issue.

@Galsza
Copy link
Contributor Author

Galsza commented Jun 20, 2023

Thanks @ashishkumar50 and @fapifta for the conversation, I've updated the patch again with only synching for the addNewRootCa. I've run findbugs locally, it didn't show an error for it, hopefully it stays that way once this protocol is used not just in test environment.

@fapifta
Copy link
Contributor

fapifta commented Jun 20, 2023

@ashishkumar50, multiple reads are fine concurrently, but if we do not guard reads and writes with either a lock or by putting them into a synchronized method, then write may happen during reads, and that leads to a ConcurrentModificationException.

Consider the following example which is a scheduling that can happen in an env if write happens during iteration:

    List<String> strs = new ArrayList<>();
    strs.add("foo");

    Semaphore sem = new Semaphore(1);
    CountDownLatch latch = new CountDownLatch(2);

    Runnable r1 = () -> {
      try {
        sem.acquire();
        latch.countDown();
        latch.await();
        strs.add("bar");
        sem.release();
      } catch (Exception e) {
        e.printStackTrace();
      }
    };

    Runnable r2 = () -> {
      try {
        for (String s : strs) {
          latch.countDown();
          latch.await();
          sem.acquire();
          System.out.println(s);
          sem.release();
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    };

    new Thread(r1).start();
    new Thread(r2).start();

This code produces a ConcurrentModificationException consistently.

So @Galsza, I think, either we use a lock around the access to the rootCA certificate list, or we use synchronized, but we need to guard all accesses, as the addition of a new rootCA certificate will happen on a different thread than the one that serves requests and iterates over the list.

I know that the chances to win the lottery is higher than to hit this here, but still, if we started to think about it, then let's get to the bottom of it, and have a proper solution.

@Galsza
Copy link
Contributor Author

Galsza commented Jun 21, 2023

@fapifta It seems that the CI wasn't running properly anyway with synchronization only on the addRootCa. I've pushed a new version where we synch everywhere for it, but I'm still not entirely sure this should be the final solution. I'm going to see the what happens with the CI on my fork first.

@ashishkumar50
Copy link
Contributor

@ashishkumar50, multiple reads are fine concurrently, but if we do not guard reads and writes with either a lock or by putting them into a synchronized method, then write may happen during reads, and that leads to a ConcurrentModificationException.

Consider the following example which is a scheduling that can happen in an env if write happens during iteration:

    List<String> strs = new ArrayList<>();
    strs.add("foo");

    Semaphore sem = new Semaphore(1);
    CountDownLatch latch = new CountDownLatch(2);

    Runnable r1 = () -> {
      try {
        sem.acquire();
        latch.countDown();
        latch.await();
        strs.add("bar");
        sem.release();
      } catch (Exception e) {
        e.printStackTrace();
      }
    };

    Runnable r2 = () -> {
      try {
        for (String s : strs) {
          latch.countDown();
          latch.await();
          sem.acquire();
          System.out.println(s);
          sem.release();
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    };

    new Thread(r1).start();
    new Thread(r2).start();

This code produces a ConcurrentModificationException consistently.

So @Galsza, I think, either we use a lock around the access to the rootCA certificate list, or we use synchronized, but we need to guard all accesses, as the addition of a new rootCA certificate will happen on a different thread than the one that serves requests and iterates over the list.

I know that the chances to win the lottery is higher than to hit this here, but still, if we started to think about it, then let's get to the bottom of it, and have a proper solution.

Yes correct, If request traffic is frequent there is a chance that read might fail because writes are happening on same object. But as you mentioned traffic is very less so the chance of happening both read/write together is rare and even if this happens there is a retry from client which will be success eventually.
So we can decide whether we should fix this with no chance of failure because of concurrency.

@ashishkumar50
Copy link
Contributor

ashishkumar50 commented Jun 21, 2023

@fapifta It seems that the CI wasn't running properly anyway with synchronization only on the addRootCa. I've pushed a new version where we synch everywhere for it, but I'm still not entirely sure this should be the final solution. I'm going to see the what happens with the CI on my fork first.

You can use either lock or synchronized to handle this, which will protect object. Either way is fine. May be can test locally by running multiple read/writes at same time.

@fapifta
Copy link
Contributor

fapifta commented Jun 21, 2023

Thank you @Galsza for the patch. +1

As I see we got to an agreement regarding synchronization, and the rest of the comments I believe were addressed properly.
@ashishkumar50 please let us know if you are ok with merging this one, we are dealing with the next steps in the meantime, so it would be nice to have this one in soon if there are no further doubts/questions.

@ashishkumar50
Copy link
Contributor

@Galsza thanks for updating the patch.
@fapifta patch LGTM +1, we can merge.

@fapifta fapifta merged commit a6ed346 into apache:master Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants