Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-8879. Cleanup SecurityConfig and related class initialization #4921

Merged
merged 19 commits into from
Jun 22, 2023

Conversation

fapifta
Copy link
Contributor

@fapifta fapifta commented Jun 16, 2023

What changes were proposed in this pull request?

This is a fairly large refactor mainly due to moving files around in different modules/packages.

  • OzoneSecurityException moved from hdds-server-framework to hdds-common and from o.a.h.hdds.security to o.a.h.hdds.security.exception package.
  • KeyStoresFactory interface moved from hdds-server-framework to hdds-common, but remained in the same package along with its implementation
  • Reloading key and trust manager moved from hdds-server-framework to hdds-common, but remained in the same package
  • CAType enum moved from hdds-server-framework to hdds-common but remained in the same package
  • CertificateClient and CertificateNotification interfaces moved from hdds-server-framework to hdds-common, they have remained in the same package.
  • CertificateCode, CertificateSignRequest, SelfSignedCertificate is moved from hdds-server-framework to hdds-common they remained in the same package
  • CertificateException moved from hdds-server-framework to hdds-common but remained in the same package
  • SecurityConfig moved from o.a.h.hdds.security.x509 to o.a.h.hdds.security package.

Besides all these moves the following significant changes were made:

  • PemFileBasedKeyStoresFactory implementation was changed to avoid findbugs warning about exposing internal structure.
  • Everything under the packages org.apache.hadoop.hdds.security and any derived classes rely only on the SecurityConfig object, and the OzoneConfiguration and its descendants are not used there anymore.
  • block token duration related sanity check was bought into the SecurityConfig, sanity check had to be disabled because of this change in quite a lot of tests. Also this reveals a possible follow up JIRA, as we does not seem to have a similar check for container token lifetime.
  • OZONE_S3_AUTHINFO_MAX_LIFETIME_KEY removed from OzoneConfigKeys after removing the only usage of it in SecurityConfig along with the associated getter that is not used anymore anywhere in the code. It was not added to ozone-default.xml either.
  • TokenHelper used by EC reconstruction now uses SecurityConfig in its constructor, and its code have been changed to delegate understanding the configuration to the SecurityConfig object, and just use the proper getters to get values.
  • HddsDatanodeService lost its two static creation method, and now the costructors are called directly, as there was no further logic or any other use of the static creator methods.
  • CertificateClient construction has changed, it relies on a SecurityConfig instance from now on, and also it relies to get the SCMSecurityProtocol client from outside. This has an effect on all certificate client instantiation, and there was a need to change the order of doing UGI login and initializaton due to some assumptions in tests that I did not changed.
  • Moved the utility method that sets up the SCMSecurityProtocol client from HddsServerUtil to HASecurityUtils
  • OmCertificateClient now relies on OzoneManagerDetailsProto via its constructor, and does not call the static method on OzoneManager class directly to figure it out.

Why I would like to change all of these?

This is one of the steps to simplify and reduce external dependencies of the CertificateClient and related codebase. The final aim is to get to a stage where we can separate the code that is responsible to create the certificate sign request for a service and send it to the SCM, the code that is responsible to store the certificate and key material of a particular service, the code that is responsible to gather the rootCA certificate from SCM, and the code that provides means to set up the different type of secure connections.
Also, by centralizing all the security related configuration and its handling, we have a single pane of glass to see what configurations we use, and what rules we apply to these configuration values.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8879

How was this patch tested?

Existing JUnit and integration tests.

fapifta and others added 19 commits June 16, 2023 15:13
…security configuration like token enablement, security enablement amongst others.
…and make everything security related rely on SecurityConfig purely.
…he service user before initializing certificate client.
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fapifta for the cleanup, LGTM.

@Galsza
Copy link
Contributor

Galsza commented Jun 20, 2023

Thank you @fapifta for working on this. LGTM +1

@@ -472,31 +470,6 @@ public static SCMSecurityProtocolClientSideTranslatorPB getScmSecurityClient(
ugi == null ? UserGroupInformation.getCurrentUser() : ugi));
}

public static SCMSecurityProtocolClientSideTranslatorPB
Copy link
Contributor

@ChenSammi ChenSammi Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fapifta , I didn't see obvious benefit of moving this getScmSecurityClientWithFixedDuration from HddsServerUtil.java to HASecurityUtils, could you explain it a bit? Changing this function location, cause a lot of changes in DefaultCertificateClient and HddsDatanodeService. If there is no obvious benefit, I would strongly suggest keep the function in this file, so that all other service modules can call this function, instead of now, only the scm module can call this function.

Others looks good to me.

Copy link
Contributor Author

@fapifta fapifta Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was only referenced from HASecurityUtils#getRootCASignedSCMCert, hence the move and then I also made this private in HASecurityUtils.

I guess you are referring to getScmSecurityClientWithMaxRetry that is part of a change everywhere where we create a CertificateClient, as the creation of the ScmSecurityClient this way is moved out of the CertificateClient. By injecting the ScmSecurityClient via the constructor gives us the benefit of not having to be worried about initialization order in tests we do not need a setter just for testing, and we can easily inject mocking for the communicaiton via the constructor (actually we have to, which makes the test writer think about this, and do not just assume that the server communication would not be call from the test flow), also all the services can separately specify retries, timeouts and such, which can be a benefit in the future (but I admit it is not really a reason now).

Copy link
Contributor

@ChenSammi ChenSammi Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fapifta , Yeah, I mean getScmSecurityClientWithMaxRetry, not getScmSecurityClientWithFixedDuration. To support every service can have different scm client retires, timeouts, we can just override the getScmSecureClient() function, I just did it in SCMCertificateClient:)

I'm kind of think it would better to handle the scm client creation in Certificate client internally, instead of every time, it need create the scm client before instantiate a certificate client, so to avoid many duplicate codes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked into how we can bring back the SCMSecurityProtocolClient creation into the CertificateClient, and then I realized the main reason why it has to be outside, besides other advantages I think are there and I noted before.

It is using Hadoop RPC, with that it requires the full Configuration, and not just the SecurityConfig, so with that we need to have the full Configuration in the CertificateClient code again, and that means the bulk if this PR is not really meaningful anymore.
Actually the code of creation is not duplicated, we are calling the same utility method everywhere to create the client, and the provide it to the CertificateClient instance.

I have played around with the idea of having the SCMClientConfig in the constructor, and based on it create the client, but that one is also somewhat tedious, as either we take the Configuration and convert it internally, or we do something different...

One idea I can imagine working, where we take a ConfigurationSource in the DefaultCertificateClient constructor instead of SecurityConfig, and then in the DefaultCertificateClient we create the SecurityConfig and the proper SCMClientConfig and then use these further down the line. Unsure, but I can play around with this idea if it sounds more feasible, should I?

Copy link
Contributor

@ChenSammi ChenSammi Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand the reason. We can get ConfigurationSource from SecurityConfig.getConfiguration(). So the DefaultCertificateClient constructor can just use the SecurityConfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get ConfigurationSource from SecurityConfig.getConfiguration().

After this change we cannot, this method is being removed.

Copy link
Contributor

@ChenSammi ChenSammi Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then I guess this solution is the most fit one since we want to remove the usage of ConfigurationSource.

Thanks @fapifta for all the effort. I will merge it.

@ChenSammi ChenSammi merged commit 9d87d52 into apache:master Jun 22, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jun 25, 2023
* master: (96 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  HDDS-8879. Cleanup SecurityConfig and related class initialization (apache#4921)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants