Merge release/2.6 into google/2.6 #15914
Merged
jolivier23 merged 20 commits into google/2.6 on Feb 15, 2025
jolivier23 (Contributor) commented on Feb 14, 2025
- DAOS-16936 object: disable object collective operation by default (#15793)
- DAOS-16807 client: intercept io_queue_init in libpil4dfs (#15784)
- DAOS-16968 test: Fix pool query tests (#15770)
- DAOS-15604 test: Address intermittent scrubber aggregation test failure. (#15696) (#15776)
- DAOS-16153 test: Do not run NLT fi tests for release builds. (#15171) (#15258)
- DAOS-16257 mercury: Update build.config to include ep flush patch. (#15824)
- DAOS-16312 control: Always use --force for dmg system stop (#15811)
- DAOS-16990 cart: workaround to CXI init errors with retrying HG init (#15833) (#15837)
- DAOS-17020 test: Increase unit test bdev memcheck timeout (#15835)
- DAOS-16768 pool: larger ABT ULT stack sizes (#15832)
- DAOS-17021 build: Tag 2.6.3 rc2 (#15842)
- DAOS-17059 client: fcntl 3rd parameter should be void * (#15869)
- DAOS-16969 test: Reduce cleanup operations for metadata.py test (#15779) (#15859)
- DAOS-16876 vos: remove DTX record after partial commit - b26 (#15858)
- DAOS-16876 vos: skip DTX record when load partial committed DTX - b26 (#15882)
- DAOS-17055 client: add a soft limit of 4k to nr ranges for list-io (#15879)
- DAOS-17060 build: Tag 2.6.3 rc3 (#15885)
- DAOS-17052 test: update expected msg for critical_integration (#15855)
- DAOS-17108 common: suppress io.netty 4.1.115 CVE (#15889) (#15890)
…5793) It is a temporary workaround for the collective punch crash at large scale. Signed-off-by: Fan Yong <fan.yong@hpe.com>
Intercepting io_queue_init() is needed on Ubuntu. There is a compatibility issue when the pil4dfs interception library is used with the fio libaio engine. In some cases, fio initializes the aio context through io_queue_init() when loading the libaio engine. Although pil4dfs intercepts io_setup(), the io_setup() call made from inside io_queue_init() is sometimes not intercepted, which results in an invalid aio context for I/O processing. Add an interception for io_queue_init() so that this case works. Signed-off-by: Jun Zeng <jun1.zeng@intel.com> Signed-off-by: Lei Huang <lei.huang@intel.com>
The backport from master missed a couple of pool tests that needed to be updated for the new JSON output from pool query. Query results always include a dead_ranks array, even when it's empty. Signed-off-by: Michael MacDonald <mjmac@google.com>
…15824) Signed-off-by: Joseph Moore <joseph.moore@hpe.com>
Whenever the control plane stops an engine process, use SIGKILL rather than asking nicely (SIGTERM). This has been requested to try to avoid situations that could result in data loss. This change preserves the behaviour where ds_mgmt_drpc_prep_shutdown() and then ds_pool_disable_exclude() are called during a controlled shutdown, i.e. when dmg system stop is called with the new --full argument. Notable behavior changes with this PR:
* Always performs SIGKILL on dmg system stop unless the --full command option is supplied.
* Attempts a prepare-shutdown to disable exclusions across the cluster during a “controlled” shutdown (dmg system stop with --full), but this should be regarded as experimental and not for use in production environments.
* The --force option is a no-op and is retained for backward compatibility and future use.
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…15833) (#15837) This is a workaround for DAOS-16990 and DAOS-17011. When using the CXI provider, retry HG_Init_opt2() on error, since CXI appears to have intermittent initialization issues. A new environment variable, CRT_CXI_INIT_RETRY, controls the retry count (default is 3) and makes it possible to test future SS fixes without retries. Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@hpe.com>
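A minimal sketch of the retry pattern described above, assuming the CRT_CXI_INIT_RETRY value means the number of extra attempts after the first failed call, so that 0 disables retrying. The helpers `cxi_init_retry_count()`, `init_with_retry()` and the `flaky_init()` test stub are hypothetical, and `init_fn` only stands in for the real HG_Init_opt2() call; the actual CaRT code may parse and apply the variable differently.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Default retry count stated in the commit message. */
#define CXI_INIT_RETRY_DEFAULT 3

/*
 * Hypothetical parser for the CRT_CXI_INIT_RETRY value (pass the result of
 * getenv()). Falls back to the default when the value is unset, empty, or
 * not a non-negative integer.
 */
static int
cxi_init_retry_count(const char *val)
{
	char *end;
	long  n;

	if (val == NULL || *val == '\0')
		return CXI_INIT_RETRY_DEFAULT;
	errno = 0;
	n     = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' || n < 0 || n > INT_MAX)
		return CXI_INIT_RETRY_DEFAULT;
	return (int)n;
}

/*
 * Hypothetical wrapper: call init_fn once, then retry it up to `retries`
 * more times while it keeps failing. init_fn returns 0 on success.
 */
static int
init_with_retry(int (*init_fn)(void *), void *arg, int retries)
{
	int rc = init_fn(arg);

	while (rc != 0 && retries-- > 0) {
		fprintf(stderr, "init failed (rc=%d), retrying\n", rc);
		rc = init_fn(arg);
	}
	return rc;
}

/* Test stub (hypothetical): fails while *remaining_failures > 0. */
static int
flaky_init(void *arg)
{
	int *remaining_failures = arg;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -1;
	}
	return 0;
}
```

For example, with `fails = 2`, `init_with_retry(flaky_init, &fails, cxi_init_retry_count(getenv("CRT_CXI_INIT_RETRY")))` succeeds on the third attempt when the variable is unset.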
Increase the "Unit Test bdev with memcheck on EL 8.8" step timeout to be in sync with the master branch. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
With this change, three ULTs in pool and container code launched via
ds_pool_thread_collective() are changed to specify a larger ("deep")
stack size of 64KiB rather than the default 16KiB stack size, i.e., the
flags parameter is specified as DSS_ULT_DEEP_STACK. The three ULT
function entry points are:
cont_open_one, cont_snap_update_one, and update_vos_prop_on_targets.
Before this change, shortly after daos_engine startup in CI testing,
a dmg pool list (pool query on the back end) would intermittently
result in a segmentation fault in an engine, in these three particular
areas of the code. Specifically, the faults occurred during ABT thread
creation, inside ABTI_mem_pool_alloc().
This change is based on a guess that the stack size parameter may have
some effect.
Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
Tag second release build for 2.6.3. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
The third argument is of type "void *" in the libc source code, so "va_arg(arg, int);" retrieves the wrong argument. Also return ENOTSUP for flock when compatible mode is not enabled. Signed-off-by: Lei Huang <lei.huang@hpe.com>
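The fcntl fix comes down to how the optional third argument is read from a `va_list`. The sketch below is a hypothetical stand-in (`fcntl_third_arg()` is not a libpil4dfs function) that reads the argument as `void *`, the type used in the libc source according to the commit message. Reading it with `va_arg(ap, int)` instead fetches only an int-sized value, which can truncate a pointer such as a `struct flock *` on LP64 platforms like x86-64 Linux.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an intercepted fcntl(): it only shows how the
 * optional third argument is pulled from the variadic list. Reading it as
 * void * keeps the full pointer width of arguments like struct flock *.
 */
static void *
fcntl_third_arg(int fd, int cmd, ...)
{
	va_list ap;
	void   *arg;

	(void)fd;
	va_start(ap, cmd);
	arg = va_arg(ap, void *); /* not va_arg(ap, int) */
	va_end(ap);
	return arg;
}
```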
…) (#15859) Remove calls to cleanup methods for multiple containers and ior commands, since that cleanup is handled by destroying the pool and issuing a single ior kill command. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
Otherwise, the partially committed DTX entry will be re-committed when the container is reopened. Subsequently accessing the related dangling DTX record(s) may trigger an assertion and cause corruption. Signed-off-by: Fan Yong <fan.yong@hpe.com>
…#15882) Skip existing partially committed DTX records that were generated by DAOS-2.6.3-rc{1,2} to avoid repeated DTX commits after an engine upgrade. To be safe, the user/admin must explicitly set the server-side environment variable "DAOS_SKIP_OLD_PARTIAL_DTX" when upgrading from DAOS-2.6.3-rc{1,2}. The environment variable can be ignored when upgrading from earlier versions. Signed-off-by: Fan Yong <fan.yong@hpe.com>
…15879) For dfs_readx/writex and array_read/write operations, limit the number of IODs passed to DAOS to 16k when the range lengths are under 16 bytes (best-effort checking). Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@hpe.com>
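A hypothetical sketch of the size check described above. The names `list_io_over_soft_limit()`, `LIST_IO_NR_LIMIT` and `LIST_IO_TINY_RANGE_SZ` are illustrative, "16k" is read as 16 * 1024, and the sketch inspects every range length, whereas the commit describes the real check only as best effort. The PR title quotes a 4k limit while this message says 16k; the sketch follows the message body.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the commit message; the names are hypothetical. */
#define LIST_IO_NR_LIMIT      (16 * 1024) /* max IODs when ranges are tiny */
#define LIST_IO_TINY_RANGE_SZ 16          /* "under 16 bytes" */

/*
 * Hypothetical check: returns true when a list-io request with nr ranges
 * exceeds LIST_IO_NR_LIMIT and every range is shorter than
 * LIST_IO_TINY_RANGE_SZ bytes, i.e. when the soft limit would apply.
 */
static bool
list_io_over_soft_limit(size_t nr, const uint64_t *range_lens)
{
	if (nr <= LIST_IO_NR_LIMIT)
		return false;
	for (size_t i = 0; i < nr; i++)
		if (range_lens[i] >= LIST_IO_TINY_RANGE_SZ)
			return false;
	return true;
}
```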
Tag third release build for 2.6.3. Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
Updated the expected journalctl message from "exited with 0" to "killed", since #15811 changed the default dmg system stop to use --force. Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
Suppress io.netty:netty-common 4.1.115.Final CVE - no fix available
Suppress io.netty:netty-handler 4.1.100.Final CVE - fix available in 4.1.118.Final
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
…le/2.6 Revert e1393d8 Change-Id: Ica1c2d04f7f54d60a616282323824baae1350f72 Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments
Unable to load ticket data
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15914/1/testReport/
wangdi1 approved these changes on Feb 15, 2025
jxiong approved these changes on Feb 15, 2025