DAOS-18597 ddb: enforce SPDK re-init rules#17838
Conversation
SPDK, in all known applications—including daos_engine—is initialized only once during the lifetime of a process. DDB uses SPDK differently, allowing the user to initialize and re‑initialize SPDK multiple times within the same process. However, SPDK does not fully support re‑initialization. At the moment, this issue appears only when the SPDK configuration uses the VMD subsystem. When a VMD‑enabled SPDK configuration is in use, two conditions must be respected: - The VMD‑enabled configuration must be used during the first SPDK initialization. The VMD subsystem and DPDK will not be initialized on subsequent SPDK re‑initializations. - After a VMD‑enabled configuration has been used, the user cannot re‑initialize SPDK—whether with VMD enabled or disabled. The internal state of the VMD subsystem and DPDK becomes unsafe to use, even if the next SPDK configuration does not include VMD. These rules if not adhered to causes DDB to crash. Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
|
Ticket title is 'ddb dev_list with --vos_path in interactive mode causes segmentation fault' |
|
|
||
| if (opt->db_path != NULL && strnlen(opt->db_path, PATH_MAX) != 0) { | ||
| memset(path_parts.vf_db_path, 0, sizeof(path_parts.vf_db_path)); | ||
| strncpy(path_parts.vf_db_path, opt->db_path, sizeof(path_parts.vf_db_path) - 1); |
There was a problem hiding this comment.
Can we move this section into above vos_path_parse(), then the other similar process in the patch can share that, and nobody will miss db_path.
| int rc; | ||
|
|
||
| rc = json_object_object_get_ex(vmd_subsystem, KEY_SUBSYSTEM_CONFIG, &config); | ||
| D_ASSERTF(rc == 1, "VMD subsystem does not have a '%s' key.\n", KEY_SUBSYSTEM_CONFIG); |
There was a problem hiding this comment.
Do not assertion since the input source maybe out of control. Similarly for other places.
There was a problem hiding this comment.
Done.
But I still left two cases where, in my opinion, we have all the reason to expect a given json_*() call to succeed. e.g.
methods_num = json_object_array_length(config);
for (int i = 0; i < methods_num; i++) {
struct json_object *method = json_object_array_get_idx(config, i);
D_ASSERT(method != NULL);The number of objects in the array is already established. If they are there it should be possible to get them one-by-one.
| is_vmd_enabled(struct json_object *vmd_subsystem) | ||
| { | ||
| struct json_object *config; | ||
| struct json_object *method; |
There was a problem hiding this comment.
NIT, the scope of the variable method can be reduced.
| } | ||
|
|
||
| static struct json_object * | ||
| get_vmd_subsystem(struct json_object *subsystems) |
There was a problem hiding this comment.
NIT, the scope of the variable subsystem and rc can be reduced
| memset(path_parts.vf_db_path, 0, sizeof(path_parts.vf_db_path)); | ||
| strncpy(path_parts.vf_db_path, pa.pa_db_path, | ||
| sizeof(path_parts.vf_db_path) - 1); | ||
| } |
There was a problem hiding this comment.
Should we also run vmd_wa_can_proceed() to update the status of the global variable.
If we are using a command file, several open and close could be done.
There was a problem hiding this comment.
If we are using a command file, several open and close could be done.
If I am not mistaken this is already covered. As far as I can tell the call sequence will go something like: main() -> parseOpts() -> runFileCmds() -> runCmdStr() -> app.RunCommand() -> ...
So, all open and close commands from a command file should go through ddb_run_open() as normal.
There was a problem hiding this comment.
Should we also run vmd_wa_can_proceed() to update the status of the global variable.
Done.
+ limit scope for some local variables Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
|
@knard38 wrote:
Lack of unit tests always come back to bite you and from experience short term workarounds are longer lived than expected. 😅 Let's see how the review and validation will go. I do not want to slow down the release or something and I have a few more things to take care of. If we end up still working on this review few days later it may give me time to write unit tests. If it makes sense. |
|
Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17838/3/testReport/ |
|
Test stage Unit Test completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17838/3/testReport/ |
| char *nvme_conf; | ||
| int rc; | ||
|
|
||
| D_ASPRINTF(nvme_conf, "%s/%s", db_path, VOS_NVME_CONF); |
There was a problem hiding this comment.
Where will nvme_conf be released?
|
|
||
| rc = json_object_object_get_ex(root, KEY_SUBSYSTEMS, &subsystems); | ||
| if (rc != JSON_TRUE) { | ||
| ddb_errorf(ctx, "File %s does not have '%s' key\n", nvme_conf, KEY_SUBSYSTEMS); |
There was a problem hiding this comment.
missing call of json_object_put(root) ?
There was a problem hiding this comment.
You are right! Done. Thank you.
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
…DDB-limit-VMD-usage Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
|
Test stage Unit Test completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17838/4/testReport/ |
|
Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17838/4/testReport/ |
|
it still has failed UT, please check |
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
…DDB-limit-VMD-usage Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
|
FYI master is now 3.0 development so, if needed, this will need to be backported to release/2.8 |
SPDK, in all known applications—including daos_engine—is initialized only once during the lifetime of a process. DDB uses SPDK differently, allowing the user to initialize and re‑initialize SPDK multiple times within the same process. However, SPDK does not fully support re‑initialization. At the moment, this issue appears only when the SPDK configuration uses the VMD subsystem. When a VMD‑enabled SPDK configuration is in use, two conditions must be respected: - The VMD‑enabled configuration must be used during the first SPDK initialization. The VMD subsystem and DPDK will not be initialized on subsequent SPDK re‑initializations. - After a VMD‑enabled configuration has been used, the user cannot re‑initialize SPDK—whether with VMD enabled or disabled. The internal state of the VMD subsystem and DPDK becomes unsafe to use, even if the next SPDK configuration does not include VMD. These rules if not adhered to causes DDB to crash. Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
SPDK, in all known applications—including daos_engine—is initialized only once during the lifetime of a process. DDB uses SPDK differently, allowing the user to initialize and re‑initialize SPDK multiple times within the same process. However, SPDK does not fully support re‑initialization. At the moment, this issue appears only when the SPDK configuration uses the VMD subsystem. When a VMD‑enabled SPDK configuration is in use, two conditions must be respected: - The VMD‑enabled configuration must be used during the first SPDK initialization. The VMD subsystem and DPDK will not be initialized on subsequent SPDK re‑initializations. - After a VMD‑enabled configuration has been used, the user cannot re‑initialize SPDK—whether with VMD enabled or disabled. The internal state of the VMD subsystem and DPDK becomes unsafe to use, even if the next SPDK configuration does not include VMD. These rules if not adhered to causes DDB to crash. Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com> Co-authored-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
SPDK, in all known applications—including daos_engine—is initialized only once during the lifetime of a process. DDB uses SPDK differently, allowing the user to initialize and re‑initialize SPDK multiple times within the same process.
However, SPDK does not fully support re‑initialization. At the moment, this issue appears only when the SPDK configuration uses the VMD subsystem. When a VMD‑enabled SPDK configuration is in use, two conditions must be respected:
These rules if not adhered to causes DDB to crash.
Steps for the author:
After all prior steps are complete: