Merge release/2.6 into google/2.6 #15908
jolivier23 commented Feb 14, 2025
- DAOS-16936 object: disable object collective operation by default (#15793)
- DAOS-16807 client: intercept io_queue_init in libpil4dfs (#15784)
- DAOS-16968 test: Fix pool query tests (#15770)
- DAOS-15604 test: Address intermittent scrubber aggregation test failure. (#15696) (#15776)
- DAOS-16153 test: Do not run NLT fi tests for release builds. (#15171) (#15258)
- DAOS-16257 mercury: Update build.config to include ep flush patch. (#15824)
- DAOS-16312 control: Always use --force for dmg system stop (#15811)
- DAOS-16990 cart: workaround to CXI init errors with retrying HG init (#15833) (#15837)
- DAOS-17020 test: Increase unit test bdev memcheck timeout (#15835)
- DAOS-16768 pool: larger ABT ULT stack sizes (#15832)
- DAOS-17021 build: Tag 2.6.3 rc2 (#15842)
- DAOS-17059 client: fcntl 3rd parameter should be void * (#15869)
- DAOS-16969 test: Reduce cleanup operations for metadata.py test (#15779) (#15859)
- DAOS-16876 vos: remove DTX record after partial commit - b26 (#15858)
- DAOS-16876 vos: skip DTX record when load partial committed DTX - b26 (#15882)
- DAOS-17055 client: add a soft limit of 4k to nr ranges for list-io (#15879)
- DAOS-17060 build: Tag 2.6.3 rc3 (#15885)
- DAOS-17052 test: update expected msg for critical_integration (#15855)
…5793) It is a temporary workaround for the collective punch crash at large scale. Signed-off-by: Fan Yong <fan.yong@hpe.com>
Intercepting io_queue_init() is needed on Ubuntu. There is a compatibility issue when the pil4dfs interception library is used with the fio libaio engine. In some cases, fio initializes the aio context through io_queue_init() when loading the libaio engine. Although pil4dfs intercepts io_setup(), the io_setup() call made by io_queue_init() is sometimes not intercepted, which leaves an invalid aio context for I/O processing. Add an interception for io_queue_init() to cover this case. Signed-off-by: Jun Zeng <jun1.zeng@intel.com> Signed-off-by: Lei Huang <lei.huang@intel.com>
The backport from master missed a couple of pool tests that needed to be updated for the new JSON output from pool query. Query results always include a dead_ranks array, even when it's empty. Signed-off-by: Michael MacDonald <mjmac@google.com>
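A test that consumes the pool query output could check for the always-present array along the lines sketched below. Only the `dead_ranks` field name comes from the commit message; the top-level `response` object, the sample JSON, and the helper name `dead_ranks_from_query` are assumptions made for illustration, not the actual dmg output layout or test code.

```python
import json


def dead_ranks_from_query(raw_json: str) -> list:
    """Return the dead_ranks list from pool query JSON output.

    Assumes a hypothetical layout where the query result sits under a
    top-level "response" key. Raises KeyError if dead_ranks is absent,
    because the array is expected to be present even when empty.
    """
    data = json.loads(raw_json)
    response = data.get("response") or {}
    if "dead_ranks" not in response:
        raise KeyError("pool query output is missing 'dead_ranks'")
    return response["dead_ranks"]


# Hypothetical sample output: no dead ranks, but the key still exists.
sample = '{"response": {"dead_ranks": []}, "status": 0}'
print(dead_ranks_from_query(sample))
```

Comparing against an empty list rather than testing for a missing key is what lets the expected-output dictionaries in the tests stay uniform.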
…15824) Signed-off-by: Joseph Moore <joseph.moore@hpe.com>
Whenever stopping an engine process from within the control-plane, use SIGKILL rather than asking nicely (SIGTERM). This has been requested to try to avoid situations that could result in data loss. This change preserves the behaviour where ds_mgmt_drpc_prep_shutdown() and then ds_pool_disable_exclude() will be called during a controlled shutdown where dmg system stop is called with the new --full argument. Notable behavior changes with this PR:
* Always performs SIGKILL on dmg system stop unless the --full command option is supplied.
* Will attempt prepare shutdown to disable exclusions across the cluster during a “controlled” shutdown where dmg system stop is called with the --full option, but this should be regarded as experimental and not for use in production environments.
* The force option is a no-op and is retained for backward compatibility and future use.
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…15833) (#15837) This is a workaround for DAOS-16990 and DAOS-17011. When using the CXI provider, retry HG_Init_opt2() on error cases since it seems CXI has intermittent issues on initialization. A new environment variable is added (CRT_CXI_INIT_RETRY) to control the retry count (default is 3) and to be able to test future SS fixes without retry. Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@hpe.com>
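The retry behavior described above can be sketched as follows. This is an illustrative Python model, not the CaRT implementation (which is C and calls HG_Init_opt2()). The helper names `retry_count` and `init_with_retry` are hypothetical, and treating the count as the number of additional attempts after the first failure is an assumption; only the variable name CRT_CXI_INIT_RETRY and its default of 3 come from the commit message.

```python
import os

DEFAULT_RETRIES = 3  # default stated in the commit message


def retry_count(env=os.environ) -> int:
    """Read CRT_CXI_INIT_RETRY, falling back to the default on bad input."""
    raw = env.get("CRT_CXI_INIT_RETRY")
    if raw is None:
        return DEFAULT_RETRIES
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_RETRIES
    return max(value, 0)  # 0 disables retrying


def init_with_retry(init_fn, retries: int):
    """Call init_fn once, then up to `retries` more times on failure.

    Assumption: init_fn raises RuntimeError on an initialization error.
    """
    last_error = None
    for _attempt in range(1 + retries):
        try:
            return init_fn()
        except RuntimeError as err:
            last_error = err
    raise last_error
```

Under this reading, setting the variable to 0 gives a single attempt with no retry, which matches the stated goal of testing future fixes without the workaround.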
Increase the "Unit Test bdev with memcheck on EL 8.8" step timeout to be in sync with the master branch. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
With this change, three ULTs in pool and container code launched via
ds_pool_thread_collective() specify a larger ("deep") stack size of
64KiB rather than the default 16KiB stack size, i.e., the flags
parameter is specified as DSS_ULT_DEEP_STACK. The three ULT function
entrypoints are:
cont_open_one, cont_snap_update_one, and update_vos_prop_on_targets.
Before this change, intermittently in CI testing, shortly after
daos_engine startup, a dmg pool list (pool query on the back end)
would occasionally result in a segmentation fault in an engine, in
these three particular areas of the code. Specifically, the faults
occurred within the ABT thread create, inside ABTI_mem_pool_alloc().
This change is based on a guess that the stack size parameter may have
some effect.
Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
Tag second release build for 2.6.3. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
The third argument is of type "void *" in the libc source code. Using "va_arg(arg, int);" retrieves the wrong argument. Also return ENOTSUP for flock when compatible mode is not enabled. Signed-off-by: Lei Huang <lei.huang@hpe.com>
…) (#15859) Remove calls to cleanup methods for multiple containers and ior commands, since destroying the pool and issuing a single ior kill command handles them. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
Otherwise, the partially committed DTX entry will be re-committed when the container is reopened. Accessing the related dangling DTX record(s) may then trigger an assertion and cause corruption. Signed-off-by: Fan Yong <fan.yong@hpe.com>
…#15882) Skip existing partially committed DTX records that were generated by DAOS-2.6.3-rc{1,2} to avoid repeated DTX commit after engine upgrade. To be safe, the user/admin must explicitly set the server-side environment variable "DAOS_SKIP_OLD_PARTIAL_DTX" while upgrading from DAOS-2.6.3-rc{1,2}. The environment variable can be ignored when upgrading from earlier versions. Signed-off-by: Fan Yong <fan.yong@hpe.com>
…15879) For dfs_readx/writex and array_read/write operations, limit the number of IODs passed to DAOS to 16k if the range lengths are under 16 bytes (best effort checking). Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@hpe.com>
Tag third release build for 2.6.3. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
Updated the expected journalctl message from "exited with 0" to "killed", since #15811 changed the default dmg system stop to use --force. Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
Errors are: component not formatted correctly; ticket number prefix incorrect; PR title is malformatted (see https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments); unable to load ticket data.
…le/2.6 Revert e1393d8 as part of merge Change-Id: I7e6c15c07ad7fcb94622ec8d6081624641c44441 Signed-off-by: Jeff Olivier <jeffolivier@google.com>
8921034 to 1dd4576
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15908/3/execution/node/344/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15908/3/execution/node/319/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15908/3/execution/node/345/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15908/3/execution/node/316/log