DAOS-11092 bio: restrict inflight ios #9667
daosbuild1
left a comment
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9667/3/display/redirect
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9667/4/display/redirect
See my last comment in DAOS-11092. I'm not sure why this PR would fix the segfault; I still suspect a stack overrun from EC rebuild/aggregation. Let's investigate further once Argobots is upgraded.
Added the missing 'pr' to the Test-tag.
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9667/7/execution/node/371/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9667/7/execution/node/367/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9667/7/execution/node/410/log
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9667/7/execution/node/386/log
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9667/10/execution/node/1368/log
Force-pushed from 3ee32cb to cb61bd5.
Force-pushed from 94387fa to 26f056a.
Force-pushed from 26f056a to 0467b1e.
Ping reviewers: this patch is now ready for review.
Stop issuing new NVMe I/O when the number of per-xstream inflight I/Os reaches a threshold.
Limit the number of batched blob unmap calls in bio_blob_unmap_sgl() to work around the mysterious segfault. The default max unmap count is 32, and it can be configured through DAOS_SPDK_MAX_UNMAP_CNT.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
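The first part of this change, stopping new NVMe I/O once a per-xstream inflight threshold is reached, can be pictured with a small sketch. It is not the DAOS bio code: `struct xs_io_ctxt`, `xs_poll_completions()` and `xs_submit_io()` are hypothetical names, and completion polling is simulated with counters instead of SPDK. It only shows the idea of reaping completions until the inflight count drops below the threshold before issuing another I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-xstream I/O context; the names are illustrative and
 * are not the actual DAOS bio structures. */
struct xs_io_ctxt {
	uint32_t inflight;     /* NVMe I/Os issued but not yet completed */
	uint32_t max_inflight; /* throttle threshold for this xstream */
	uint32_t completed;    /* completions reaped so far */
};

/* Stand-in for polling the NVMe completion queue: reaps up to 'budget'
 * completions. Here it only adjusts counters; no real I/O is involved. */
static uint32_t
xs_poll_completions(struct xs_io_ctxt *ctx, uint32_t budget)
{
	uint32_t n = ctx->inflight < budget ? ctx->inflight : budget;

	ctx->inflight -= n;
	ctx->completed += n;
	return n;
}

/* Account for one new I/O, but only after the inflight count has dropped
 * below the threshold, so this xstream never exceeds max_inflight. */
static void
xs_submit_io(struct xs_io_ctxt *ctx)
{
	assert(ctx->max_inflight > 0);
	while (ctx->inflight >= ctx->max_inflight) {
		/* Real code would likely yield the ULT here (for example with
		 * ABT_thread_yield()) and let the poller reap completions. */
		xs_poll_completions(ctx, 1);
	}
	ctx->inflight++;
	assert(ctx->inflight <= ctx->max_inflight);
}
```

With a threshold of 4, issuing 10 I/Os leaves at most 4 outstanding at any time; in a real xstream the wait would presumably yield so other ULTs keep running, rather than polling inline as this single-threaded sketch does.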
Stop issuing new NVMe I/O when the number of per-xstream inflight I/Os
reaches a threshold.
Limit the number of batched blob unmap calls in bio_blob_unmap_sgl()
to work around the mysterious segfault. The default max unmap count is
32, and it can be configured through DAOS_SPDK_MAX_UNMAP_CNT.
Test-tag: pr ec_aggregation_default ec_aggregation_time ec_online_rebuild_mdtest
Required-githooks: true
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
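The unmap cap can be sketched as follows, assuming the ranges of an SGL are issued one at a time and that the caller waits for each batch of at most `max_cnt` unmaps to complete before issuing more. Only `DAOS_SPDK_MAX_UNMAP_CNT` and the default of 32 come from the commit message; `struct unmap_range`, `unmap_max_cnt_parse()`, `unmap_sgl_batched()` and the stub callbacks are hypothetical stand-ins for `bio_blob_unmap_sgl()` and the SPDK blob unmap calls, and falling back to 32 on invalid values is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define UNMAP_CNT_DEFAULT 32 /* default max unmap count from the commit message */

/* Hypothetical extent to unmap; not the real bio SGL layout. */
struct unmap_range {
	uint64_t off;
	uint64_t len;
};

/* Parse a DAOS_SPDK_MAX_UNMAP_CNT value. Falling back to the default for
 * unset, zero, or malformed values is an assumption of this sketch. */
static unsigned int
unmap_max_cnt_parse(const char *s)
{
	char *end;
	unsigned long v;

	if (s == NULL || *s == '\0')
		return UNMAP_CNT_DEFAULT;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v == 0 || v > UINT_MAX)
		return UNMAP_CNT_DEFAULT;
	return (unsigned int)v;
}

static unsigned int
unmap_max_cnt_init(void)
{
	return unmap_max_cnt_parse(getenv("DAOS_SPDK_MAX_UNMAP_CNT"));
}

/* Callbacks standing in for submitting one blob unmap and for waiting
 * until every unmap issued so far has completed. */
typedef int (*unmap_issue_fn)(const struct unmap_range *r, void *arg);
typedef int (*unmap_wait_fn)(void *arg);

/* Issue unmaps for all ranges, but never more than max_cnt before waiting
 * for the outstanding ones to complete. */
static int
unmap_sgl_batched(const struct unmap_range *ranges, unsigned int nr,
		  unsigned int max_cnt, unmap_issue_fn do_issue,
		  unmap_wait_fn do_wait, void *arg)
{
	unsigned int i, batched = 0;
	int rc = 0, rc2;

	assert(max_cnt > 0);
	for (i = 0; i < nr && rc == 0; i++) {
		rc = do_issue(&ranges[i], arg);
		if (rc != 0)
			break;
		if (++batched == max_cnt) {
			rc = do_wait(arg);
			batched = 0;
		}
	}
	/* Drain the last partial batch, including unmaps issued before an error. */
	if (batched > 0) {
		rc2 = do_wait(arg);
		if (rc == 0)
			rc = rc2;
	}
	return rc;
}

/* Stub backend for illustration: tracks the peak number of unmaps
 * outstanding between waits and how many waits were needed. */
struct unmap_stub {
	unsigned int outstanding;
	unsigned int peak;
	unsigned int waits;
};

static int
stub_issue(const struct unmap_range *r, void *arg)
{
	struct unmap_stub *st = arg;

	(void)r;
	if (++st->outstanding > st->peak)
		st->peak = st->outstanding;
	return 0;
}

static int
stub_wait(void *arg)
{
	struct unmap_stub *st = arg;

	st->outstanding = 0;
	st->waits++;
	return 0;
}
```

With the default cap, 100 ranges are issued as three full batches of 32 plus a final batch of 4, so no more than 32 unmaps are ever outstanding at once. Whether the actual patch waits per batch or bounds the count in some other way is not stated in the commit message.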