
HDDS-8690. Ozone Support deletion related parameter dynamic configuration #4798

Merged · 30 commits · Oct 8, 2023

Conversation

xichen01 (Contributor) commented May 30, 2023

What changes were proposed in this pull request?

Support dynamic configuration of the following deletion-related parameters.
OM:

  • ozone.key.deleting.limit.per.task

SCM:

  • hdds.scm.block.deletion.per-interval.max

Datanode:

  • hdds.datanode.block.deleting.limit.per.interval
  • hdds.datanode.block.delete.threads.max
  • ozone.block.deleting.service.workers

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8690

How was this patch tested?

Unit tests.

@adoroszlai (Contributor) left a comment

Thanks @xichen01 for working on this. Mostly looks good, but I would prefer to implement this after #4774, #4794 (which both need some committer reviews), and a follow-up that connects the annotated configurations with the reconfiguration handler. The change will be much simpler then.

xichen01 (Contributor Author) commented Jun 5, 2023

Thanks @xichen01 for working on this. Mostly looks good, but I would prefer to implement this after #4774, #4794 (which both need some committer reviews), and a follow-up that connects the annotated configurations with the reconfiguration handler. The change will be much simpler then.

OK, I will update this PR after #4774 and #4794 are merged.

@adoroszlai marked this pull request as draft June 6, 2023 10:01
# Conflicts:
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
#	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
#	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/reconfig/TestOmReconfigure.java
#	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/reconfig/TestScmReconfigure.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
@xichen01 (Contributor Author):

@adoroszlai Should we maintain a list of reconfigurable properties describing which keys can be configured? That may require effort to keep the documentation in sync with the code.
Perhaps we could remove this section, since users can get exactly this information via the command?

## OM Reconfigurability
**Reconfigurable properties**
key | description
-----------------------------------|-----------------------------------------
ozone.administrators | OM startup user will be added to admin by default
>For example, modify `ozone.administrators` in ozone-site.xml and execute:

@adoroszlai (Contributor):

Should we maintain a list of reconfigurable properties describing which keys can be configured? That may require effort to keep the documentation in sync with the code. Perhaps we could remove this section, since users can get exactly this information via the command?

Yes, let's remove the specific properties. Documenting the way to explore the list of properties is enough. An outdated doc may be worse than no doc.

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfig.java
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
#	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
#	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/reconfig/TestScmReconfiguration.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/KeyDeletingService.java
@xichen01 marked this pull request as ready for review July 17, 2023 15:12
@xichen01 requested a review from adoroszlai July 17, 2023 15:13
@adoroszlai (Contributor) left a comment

Thanks @xichen01 for continuing work on this. Mostly looks good.

Comment on lines 2131 to 2137
private String reconfHddsScmBlockDeletionPerIntervalMax(String newVal) {
getConfiguration().set(HDDS_SCM_BLOCK_DELETION_PER_INTERVAL_MAX, newVal);

getScmBlockManager().getSCMBlockDeletingService()
.setBlockDeleteTXNum(Integer.parseInt(newVal));
return newVal;
}
Contributor:

We simplify reconfiguration of this property by using reconfigurable = true and making sure the same instance of scmConfig is used in SCMBlockDeletingService as the one being reconfigured in SCM.

Full change (also fixes TestScmReconfiguration failure):
adoroszlai@1ce698f
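
For context, the annotation-based approach looks roughly like the sketch below. The attribute names mirror Ozone's @Config annotation, but the class name, field name, default value and the exact shape of the reconfigurable attribute are illustrative assumptions here, not the code from the linked commit.

import org.apache.hadoop.hdds.conf.Config;
import org.apache.hadoop.hdds.conf.ConfigGroup;
import org.apache.hadoop.hdds.conf.ConfigTag;
import org.apache.hadoop.hdds.conf.ConfigType;

// Illustrative sketch only: marking a deletion-related property as reconfigurable
// lets the ReconfigurationHandler update it at runtime without a dedicated handler.
@ConfigGroup(prefix = "hdds.scm")
public class ScmDeletionConfigSketch {

  @Config(key = "block.deletion.per-interval.max",
      defaultValue = "100000",            // assumed default, for illustration only
      type = ConfigType.INT,
      tags = {ConfigTag.SCM},
      reconfigurable = true,              // the attribute this approach relies on
      description = "Maximum number of blocks deleted per interval by SCM.")
  private int blockDeletionLimit;

  public int getBlockDeletionLimit() {
    return blockDeletionLimit;
  }

  // A setter lets the handler push the new value into the live instance.
  public void setBlockDeletionLimit(int blockDeletionLimit) {
    this.blockDeletionLimit = blockDeletionLimit;
  }
}

With this in place, registering the same config object instance that SCMBlockDeletingService reads (rather than a fresh conf.getObject(...) copy) is what makes the runtime change visible to the service.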

Comment on lines 64 to 105
@Test
public void blockDeleteThreadMax() throws ReconfigurationException {
// Start the service and get the original pool size
ThreadPoolExecutor executor = ((DeleteBlocksCommandHandler)
getFirstDatanode().getDatanodeStateMachine().getCommandDispatcher()
.getDeleteBlocksCommandHandler()).getExecutor();
int originPoolSize = executor.getMaximumPoolSize();

// Attempt to increase the pool size by 1 and verify if it's successful
getFirstDatanode().getReconfigurationHandler().reconfigurePropertyImpl(
HDDS_DATANODE_BLOCK_DELETE_THREAD_MAX,
String.valueOf(originPoolSize + 1));
assertEquals(originPoolSize + 1, executor.getMaximumPoolSize());
assertEquals(originPoolSize + 1, executor.getCorePoolSize());

// Attempt to decrease the pool size by 1 and verify if it's successful
getFirstDatanode().getReconfigurationHandler().reconfigurePropertyImpl(
HDDS_DATANODE_BLOCK_DELETE_THREAD_MAX,
String.valueOf(originPoolSize - 1));
assertEquals(originPoolSize - 1, executor.getMaximumPoolSize());
assertEquals(originPoolSize - 1, executor.getCorePoolSize());
}

@Test
public void blockDeletingServiceWorkers() throws ReconfigurationException {
ScheduledThreadPoolExecutor executor = (ScheduledThreadPoolExecutor)
getFirstDatanode().getDatanodeStateMachine().getContainer()
.getBlockDeletingService().getExecutorService();
int originPoolSize = executor.getCorePoolSize();

// Attempt to increase the pool size by 1 and verify if it's successful
getFirstDatanode().getReconfigurationHandler().reconfigurePropertyImpl(
OZONE_BLOCK_DELETING_SERVICE_WORKERS,
String.valueOf(originPoolSize + 1));
assertEquals(originPoolSize + 1, executor.getCorePoolSize());

// Attempt to decrease the pool size by 1 and verify if it's successful
getFirstDatanode().getReconfigurationHandler().reconfigurePropertyImpl(
OZONE_BLOCK_DELETING_SERVICE_WORKERS,
String.valueOf(originPoolSize - 1));
assertEquals(originPoolSize - 1, executor.getCorePoolSize());
}
Contributor:

We can avoid duplication by converting to @ParameterizedTest: adoroszlai@5aeb768

Also fixes findbugs error:

M D RV: Return value of org.apache.hadoop.ozone.container.keyvalue.statemachine.background.BlockDeletingService.getBlockLimitPerInterval() ignored, but method has no side effect  At TestDatanodeReconfiguration.java:[line 58]

https://github.com/xichen01/ozone/actions/runs/5577408467/jobs/10190274215#step:6:2367

@@ -26,6 +26,7 @@
import static org.apache.commons.lang3.RandomStringUtils.randomAlphabetic;
import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ADMINISTRATORS;
import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_READONLY_ADMINISTRATORS;
import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_KEY_DELETING_LIMIT_PER_TASK;

Comment on lines 676 to 677
public DatanodeConfiguration getDfsConf() {
return dfsConf;
Contributor:

Nit: I think dnConf / getDnConf would be better.

@adoroszlai (Contributor) left a comment

Thanks @xichen01 for updating the patch.

Comment on lines 340 to 349
@Override
public synchronized <T> T getSingletonObject(Class<T> configurationClass) {
if (singletons.containsKey(configurationClass)) {
return (T) singletons.get(configurationClass);
}

T singletonObject = getObject(configurationClass);
singletons.put(configurationClass, singletonObject);
return singletonObject;
}
Contributor:

This can be implemented more simply:

Suggested change
@Override
public synchronized <T> T getSingletonObject(Class<T> configurationClass) {
if (singletons.containsKey(configurationClass)) {
return (T) singletons.get(configurationClass);
}
T singletonObject = getObject(configurationClass);
singletons.put(configurationClass, singletonObject);
return singletonObject;
}
public <T> T getSingletonObject(Class<T> configurationClass) {
return (T) singletons.computeIfAbsent(configurationClass,
c -> getObject(configurationClass));
}

xichen01 (Contributor Author):

Done.

Contributor:

It doesn't need to be synchronized, since it uses a ConcurrentHashMap's atomic operation.

Comment on lines 186 to 199
/**
* Returns a singleton instance of the given configuration class.
* If an instance of the class has already been created,
* it will be returned; otherwise, a new instance will be created,
* stored in a map for future retrieval.
*
* @param configurationClass The class for which a singleton
* instance is required
* @return a singleton instance of the given class
*/
default <T> T getSingletonObject(Class<T> configurationClass) {
return getObject(configurationClass);
}

Contributor:

I don't think this should be added to the ConfigurationSource interface, since it does not offer the "singleton" functionality.

xichen01 (Contributor Author):

getSingletonObject has been removed from ConfigurationSource.java

Contributor:

Thanks @xichen01 for making the change. On second thought, instead of adding a new method getSingletonObject, we should override getObject from ConfigurationSource with the singleton-based implementation. This way we can keep passing ConfigurationSource.

In the long run, we should move towards passing the type-safe config objects (DatanodeConfiguration, ScmConfig, etc.)
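
To illustrate the suggestion (this is a sketch, not the actual implementation), a singleton-style getObject override could look roughly like the following; the field name and the delegation to the interface's default method are assumptions.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hdds.conf.ConfigurationSource;

// Sketch only: cache one config object per class so every caller that receives
// this ConfigurationSource ends up sharing the instance that gets reconfigured.
public abstract class SingletonConfigSketch implements ConfigurationSource {

  private final Map<Class<?>, Object> configObjects = new ConcurrentHashMap<>();

  @Override
  @SuppressWarnings("unchecked")
  public <T> T getObject(Class<T> configurationClass) {
    // computeIfAbsent is atomic, so at most one instance is created per class.
    return (T) configObjects.computeIfAbsent(configurationClass,
        c -> ConfigurationSource.super.getObject(configurationClass));
  }
}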

xichen01 (Contributor Author):

If we change the logic of ConfigurationSource#getObject directly, might that cause some tests to fail? At the moment we can assume that each object returned by ConfigurationSource#getObject is independent and unaffected by the others.
If ConfigurationSource#getObject becomes a "singleton", tests may start affecting each other.

Should we keep this implementation (getSingletonObject) temporarily, and implement a "singleton" ConfigurationSource#getObject in another PR?

@ChenSammi (Contributor):

@xichen01 , could you rebase this patch when you have time?

@@ -335,4 +334,5 @@ private static void addDeprecatedKeys() {
OZONE_CONTAINER_COPY_WORKDIR)
});
}

Contributor:

Can you revert the changes in this file, since it doesn't actually change any content?

xichen01 (Contributor Author):

Done.

@@ -137,7 +146,7 @@ public class DatanodeConfiguration {
* missed. With max threads 5, optimistically DN can handle 1500 individual
* container delete tx in 60s with RocksDB cache miss.
*/
@Config(key = "block.delete.threads.max",
@Config(key = BLOCK_DELETE_THREAD_MAX,
@ChenSammi (Contributor) commented Sep 20, 2023

@xichen01 , I find that this property is registered with the ReconfigurationHandler but does not have the "reconfigurable = true" field, while the property "block.deleting.limit.per.interval" below has "reconfigurable = true" but is not registered with the ReconfigurationHandler.
Could you explain in a bit more detail what steps are required to make a property reconfigurable, and whether "reconfigurable = true" is a mandatory attribute for a reconfigurable property?

xichen01 (Contributor Author):

If changing a configuration value only requires updating the value itself, with no additional operations, we can just add reconfigurable = true to the @Config annotation; the reconfigurable configuration is then registered automatically.

      reconfigurationHandler =
          new ReconfigurationHandler("DN", conf, this::checkAdminPrivilege)
              .register(HDDS_DATANODE_BLOCK_DELETE_THREAD_MAX,
                  this::reconfigBlockDeleteThreadMax)
              .register(OZONE_BLOCK_DELETING_SERVICE_WORKERS,
                  this::reconfigDeletingServiceWorkers)
              .register(conf.getObject(DatanodeConfiguration.class));  // register the reconfigurable configuration

This configuration value can then be modified at runtime with the ozone admin reconfig command.

But if something extra needs to be done when the value changes, such as resizing the thread pool (block.delete.threads.max), we need to register the operation manually.
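
As an illustration of that second case, a manually registered handler might look roughly like the sketch below; the getter names used here (getConf(), getDeleteBlocksCommandHandler()) are assumptions for illustration, not the patch's exact code.

// Sketch only: besides persisting the new value, the handler also resizes the
// live thread pool so the change takes effect without a restart.
private String reconfigBlockDeleteThreadMax(String newVal) {
  int threads = Integer.parseInt(newVal);
  getConf().set(HDDS_DATANODE_BLOCK_DELETE_THREAD_MAX, newVal);

  ThreadPoolExecutor executor = getDeleteBlocksCommandHandler().getExecutor();
  // Core pool size must never exceed maximum pool size, so the order of the
  // two calls depends on whether the pool is growing or shrinking.
  if (threads > executor.getMaximumPoolSize()) {
    executor.setMaximumPoolSize(threads);
    executor.setCorePoolSize(threads);
  } else {
    executor.setCorePoolSize(threads);
    executor.setMaximumPoolSize(threads);
  }
  return newVal;
}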

Contributor:

So either register the property explicitly with a consumer handler, or set the "reconfigurable = true" attribute on the property. In the latter case the property will be auto-registered with a default consumer handler, right? @xichen01

xichen01 (Contributor Author):

Yes, if you set the "reconfigurable = true" attribute, the configuration will be handled automatically on "reconfigure".
But one point to note:

The object registered with the ReconfigurationHandler must be the same conf instance as the one used to read the value after reconfiguring.
For example:

      DatanodeConfiguration dnConf1 = conf.getObject(DatanodeConfiguration.class);
      DatanodeConfiguration dnConf2 = conf.getObject(DatanodeConfiguration.class);

      reconfigurationHandler =
          new ReconfigurationHandler("DN", conf, this::checkAdminPrivilege)
              .register(dnConf1);

After reconfiguring, only dnConf1 will see the changed value, because only dnConf1 was registered with the ReconfigurationHandler.

Contributor:

@xichen01 , thanks for the explanation.

@@ -260,7 +261,7 @@ public DatanodeStateMachine(DatanodeDetails datanodeDetails,

@VisibleForTesting
public DatanodeStateMachine(DatanodeDetails datanodeDetails,
ConfigurationSource conf) throws IOException {
OzoneConfiguration conf) throws IOException {
Contributor:

Why do we change the conf from ConfigurationSource to OzoneConfiguration? It is also changed in OzoneContainer.java below.

Contributor:

@xichen01 , is this one missed?

xichen01 (Contributor Author):

Done

## OM Reconfigurability
## Retrieve the reconfigurable properties list
To retrieve all the reconfigurable properties list for a specific component in Ozone,
you can use the command: `ozone admin reconfig properties --address=<ip:port>`.
Contributor:

ozone admin reconfig properties --address=<ip:port> doesn't work. The accepted format is the one you used in the examples below.

xichen01 (Contributor Author):

Already corrected

@@ -4704,6 +4707,14 @@ private String reconfOzoneReadOnlyAdmins(String newVal) {
return String.valueOf(newVal);
}

private String reconfOzoneKeyDeletingLimitPerTask(String newVal) {
getConfiguration().set(OZONE_KEY_DELETING_LIMIT_PER_TASK, newVal);
Contributor:

Please add a check that newVal is a positive value.
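
For illustration, the requested validation could look roughly like this (assuming Guava's Preconditions, which Ozone uses elsewhere; the actual patch may validate differently and also propagate the value to the key deleting service):

private String reconfOzoneKeyDeletingLimitPerTask(String newVal) {
  // Reject non-positive values before applying the new limit.
  Preconditions.checkArgument(Integer.parseInt(newVal) > 0,
      OZONE_KEY_DELETING_LIMIT_PER_TASK + " must be a positive integer, got: " + newVal);
  getConfiguration().set(OZONE_KEY_DELETING_LIMIT_PER_TASK, newVal);
  return newVal;
}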

xichen01 (Contributor Author):

Done.

xichen01 (Contributor Author):

I think maybe we can provide a conf.getUnsignedInt() for configurations that must be larger than 0; we could do the check inside conf.getUnsignedInt().
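
A purely hypothetical sketch of that idea follows; neither the method nor its signature exists today, and the names are made up to illustrate the proposal.

// Hypothetical helper (does not exist yet): read an int and enforce that it is
// strictly positive, failing fast with a clear error message.
public int getUnsignedInt(String key, int defaultValue) {
  int value = getInt(key, defaultValue);   // assumes an existing getInt(key, default)
  if (value <= 0) {
    throw new IllegalArgumentException(
        key + " must be a positive integer, but was " + value);
  }
  return value;
}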

Contributor:

You mean conf.setUnsignedInt? It's a good idea, but the Configuration class is in the Hadoop common module.

# Conflicts:
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/statemachine/background/BlockDeletingService.java
# Conflicts:
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/BlockDeletingService.java
ChenSammi (Contributor) commented Sep 22, 2023

@xichen01 , thanks for improving the patch. There is one comment that was missed. Also, could you rebase the patch, since HDDS-8869 is merged?

# Conflicts:
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
@xichen01 (Contributor Author):

@xichen01 , thanks for improving the patch. There is one comment that was missed. Also, could you rebase the patch, since HDDS-8869 is merged?

Those unnecessary modifications have been reverted.

ChenSammi (Contributor) commented Oct 8, 2023

@xichen01 , the latest patch looks good; could you rebase it to resolve the conflicts?

  private String reconfigBlockDeleteThreadMax(String value) {
    getConf().set(HDDS_DATANODE_BLOCK_DELETE_THREAD_MAX, value);
    conf.getObject(DatanodeConfiguration.class)
        .setBlockDeleteThreads(Integer.parseInt(value));
Contributor:

Looks like this line is covered by L684 already. It can be removed.

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/KeyDeletingService.java
@ChenSammi (Contributor):

@xichen01 , can you check the failed test "TestOzoneManagerHAKeyDeletion.testKeyDeletion"? It might be related to the code change.

xichen01 (Contributor Author) commented Oct 8, 2023

@xichen01 , can you check the failed test "TestOzoneManagerHAKeyDeletion.testKeyDeletion"? It might be related to the code change.

The master branch has this error too: https://github.com/apache/ozone/actions/runs/6439086990/job/17486681800

@ChenSammi (Contributor):

@xichen01 , can you check the failed test "TestOzoneManagerHAKeyDeletion.testKeyDeletion"? It might be related to the code change.

The master branch has this error too: https://github.com/apache/ozone/actions/runs/6439086990/job/17486681800

Yes, it's reproduced on the master branch. Created HDDS-9414 to track the failure.

xichen01 (Contributor Author) commented Oct 8, 2023

@xichen01 , can you check the failed test "TestOzoneManagerHAKeyDeletion.testKeyDeletion"? It might be related to the code change.

The master branch has this error too: https://github.com/apache/ozone/actions/runs/6439086990/job/17486681800

Yes, it's reproduced on the master branch. Created HDDS-9414 to track the failure.

@adoroszlai has created a ticket:
https://issues.apache.org/jira/projects/HDDS/issues/HDDS-9412

@ChenSammi merged commit f635904 into apache:master Oct 8, 2023
31 of 32 checks passed
ibrusentsev pushed a commit to ibrusentsev/ozone that referenced this pull request Nov 14, 2023