Feature branch sync - pub/q2_dev to pub/telemetry #4367
Merged
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Merge pub/telemetry to pub/q2_dev
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Login compiler node and Slurm node: atomic lock for CUDA installation with drivers, DCGM and peermem
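The commit title does not say how the lock is implemented. As a minimal sketch, assuming several processes (for example on the login compiler node and Slurm nodes) may run the same CUDA installation step against shared state, an exclusive `fcntl.flock` on a lock file plus a completion marker lets only the first caller install. `install_once`, the lock and marker paths, and the `install` callback are hypothetical names, not the project's code.

```python
import fcntl
import os
from pathlib import Path
from typing import Callable


def install_once(lock_path: Path, marker_path: Path, install: Callable[[], None]) -> bool:
    """Hypothetical helper: run install() at most once across processes sharing lock_path.

    Returns True if this call performed the installation, False if a previous
    caller already completed it.
    """
    with open(lock_path, "a") as lock_file:
        # Block until this process holds an exclusive lock on the lock file.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            if marker_path.exists():
                return False  # someone else already finished the install
            install()
            # Write the marker only after install() returned without raising;
            # os.replace makes the marker appear atomically.
            tmp_path = marker_path.with_suffix(".tmp")
            tmp_path.write_text("done\n")
            os.replace(tmp_path, marker_path)
            return True
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
```

`flock` behaviour on network filesystems depends on the kernel and mount options, so a shared NFS install path would need that checked separately.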
* pub telemetry changes
* service to scrape metrics from OTEL collector
* vmservice to scrape metrics from OTEL collector
* update endpoints
* revert other changes
* revert merge changes as per head
* revert variable set
* revert changes
* revert changes
* pylint fixes
* ansible lint fixes
* updating completion message
* telemetry validation while preparing OIM
* update condition
* added check for LDMS
- Extract service readiness checks into a separate block/rescue pattern
- Add an SMD API health check before the discovery attempt
- Implement automatic retry on discovery failure with a service restart
- Increase the service check timeout to 2 minutes (12 retries × 10s delay)
- Prevent connection refused errors by ensuring the SMD endpoint is ready

This addresses the race condition where systemd marks the smd service as "started" but the HTTP endpoint at oimcp.oim.test:8443 is not yet accepting connections.
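The readiness logic above can be sketched in Python as a poll-then-act loop. This is an illustration under stated assumptions, not the playbook itself: the helper names are hypothetical, the 12 × 10 s defaults come from the commit message, and any SMD health URL passed to `http_ready` is an assumption (the host `oimcp.oim.test:8443` appears in the message; no health-check path does).

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical check: True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, connection refused and socket timeouts are all
        # OSError subclasses; treat every failure as "not ready yet".
        return False


def wait_until_ready(
    check: Callable[[], bool],
    retries: int = 12,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(delay)
    return False


def discover_with_retry(
    discover: Callable[[], bool],
    restart_service: Callable[[], None],
    ready: Callable[[], bool],
    max_attempts: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run discovery only once the service is ready; on failure, restart and retry."""
    for attempt in range(1, max_attempts + 1):
        if wait_until_ready(ready, sleep=sleep) and discover():
            return True
        if attempt < max_attempts:
            # Analogous to the "rescue" path: restart the service, then re-check readiness.
            restart_service()
    return False
```

Because the check, discovery and sleep functions are injected, the retry logic can be exercised without a live SMD, e.g. `wait_until_ready(lambda: http_ready(url))` for some real `url`. With sleeps only between attempts, the worst case is 12 checks and 11 delays.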
…ment (#4366)

1. Skip OME credential prompt when enable_bmc_discovery is false
   - Changed discovery credentials from mandatory to conditional_mandatory in credential utility vars (gated on enable_bmc_discovery)
   - Added set_fact in prepare_oim.yml to promote enable_bmc_discovery from namespaced to top-level scope before credential utility runs
2. Fix PARENT_SERVICE_TAG assignment in PXE mapping
   - Source changed from service_kube_control_plane to service_kube_node
   - Only slurm_node_aarch64 and slurm_node_x86_64 receive PARENT_SERVICE_TAG
   - All other roles (control_plane, kube_node, login, slurm_control) remain empty
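Both fixes reduce to small pieces of selection logic, sketched here in Python. The function names and the namespace key are hypothetical, and treating an undefined enable_bmc_discovery as false is an assumption; the role names, the flag and the service_kube_node source come from the commit message.

```python
from typing import Any, Dict

# Roles that receive PARENT_SERVICE_TAG, per the commit message.
SLURM_NODE_ROLES = frozenset({"slurm_node_aarch64", "slurm_node_x86_64"})


def promote_flag(host_vars: Dict[str, Any], namespace: str,
                 key: str = "enable_bmc_discovery") -> Dict[str, Any]:
    """Hypothetical analogue of the set_fact: copy a namespaced flag to top level.

    Returns a new dict and leaves the input unchanged.
    """
    promoted = dict(host_vars)
    nested = host_vars.get(namespace, {})
    if key in nested:
        promoted[key] = nested[key]
    return promoted


def discovery_credentials_required(host_vars: Dict[str, Any]) -> bool:
    """conditional_mandatory: prompt only when BMC discovery is enabled.

    Assumption: an undefined flag is treated as false.
    """
    return bool(host_vars.get("enable_bmc_discovery", False))


def parent_service_tag(role: str, service_kube_node_tag: str) -> str:
    """Slurm compute roles get the service_kube_node tag; all other roles stay empty."""
    return service_kube_node_tag if role in SLURM_NODE_ROLES else ""
```

Promoting the flag before the credential check mirrors the ordering the commit describes: the conditional can only see enable_bmc_discovery once it exists at top level.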
ochami SMD retries added after cloud-init service restart