Update ansible-lint.yml and pylint for pub/telemetry#4296
Merged
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
abhishek-sa1
approved these changes
Apr 21, 2026
priti-parate
approved these changes
Apr 21, 2026
abhishek-sa1
added a commit
that referenced
this pull request
Apr 21, 2026
* Update openchami git version (#4251)
* powerscale telemetry support with direct authentication mode
* use existing vmagent
* update messages in vars
* merge pub/q2_dev to pub/telemetry (#4254)
  - removing input template
  - Fix for pulp remote: RemoteArtifacts is 0 after repo migration
* Powerscale telemetry support using helm
* deploy powerscale telemetry using cloud-init
* offline deployment of powerscale telemetry
* fix for cert-manager failure
* powerscale telemetry deployment with telemetry namespace
* sync q2_dev changes (#4263)
  - removing input template
  - Fix for pulp remote: RemoteArtifacts is 0 after repo migration
  - Feature/ome discovery pxe mapping enhancements (#4245)
    - feat(discovery): OME static group extraction; PXE mapping IP/SU/parent-tag enhancements
      - ome_server_inventory.py: fix static group extraction (find the 'Static Groups' container by name and select only direct children via ParentId, avoiding system/nested groups); emit module.warn() for static groups that exist but have no devices assigned; fix idrac_hostname to read InstrumentationName/DnsName from the DeviceManagement ManagementType==2 entry instead of DeviceName, which returns the IP address
      - generate_pxe_mapping.py: derive ADMIN_IP from the first 2 octets of admin_network.subnet plus the last 2 octets of the BMC IP; derive IB_IP likewise from ib_network.subnet; skip IB_IP/IB_MAC when the server has no IB NIC (ib_nic_mac is empty); add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse the Scalable Unit from the BMC hostname, rejecting service-tag-only hostnames (idrac-JCGT033) and falling back to grp0 when no SU pattern is found; set GROUP_NAME to the extracted SU identifier (fallback: grp0); post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group; remove BMC_HOSTNAME from CSV headers and output rows; lint cleanup (remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse the ib_mac variable, suppress the broad-except pylint warning)
      - generate_pxe_mapping.yml: load network_spec.yml via include_vars; set admin_subnet and ib_subnet using selectattr on the Networks list; pass both subnets as parameters to the generate_pxe_mapping module
      - defaults/main.yml: add admin_subnet and ib_subnet default variables (empty string)
      - provision_validation.py: comment out validate_admin_ips_against_network_spec and its call site; ADMIN_IPs are now derived from subnet octets plus the BMC IP and will not necessarily fall within the primary_oim_admin_ip/netmask_bits range
    - refactor: rename the discovery/ directory to provision/ (git detected a rename, no content loss) and update input/network_spec.yml with the latest network configuration changes
    - Update discovery.yml
    - refactor: unify OME credentials into the get_config_credentials flow (add ome_ip, ome_username, ome_password to the omnia_credential.j2 template; add a 'discovery' service entry to omnia_credentials in update_config/vars/main.yml; add 'discovery' to the hardcoded service-key trigger list in fetch_credentials.yml; replace the custom vault logic in get_ome_credentials.yml with a unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml; update ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the separate .vault/ paths; delete .vault/ome_credentials.yml and .vault/.vault_password, which are no longer needed)
    - chore: update the copyright year from 2025 to 2026 in all ome_discovery files modified on this feature branch (library/generate_pxe_mapping.py, library/ome_server_inventory.py, tasks/generate_pxe_mapping.yml, tasks/get_ome_credentials.yml, defaults/main.yml, vars/main.yml)
    - fix: restore the discovery_validations role accidentally dropped during the discovery-to-provision rename; re-added under provision/roles/discovery_validations/ to resolve the PR merge conflict
    - chore: update the copyright year to 2026 in provision/roles/discovery_validations files
    - fix: remove the duplicate discovery_validations role; provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/, so the copy added to provision/ was redundant
    - feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev (replace the kubectl command with the kubernetes.core.k8s module for the iDRAC StatefulSet; preserve the existing replica count during the iDRAC StatefulSet upgrade; add LDMS store daemon check, restart, and readiness-wait tasks)
    - fix: quote the build_stream_job_id_absent message in provision_validations vars
    - feat: add discovery/roles/discovery_validations/vars/main.yml with task definitions for the validation flow, and discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with the upstream telemetry upgrade logic (replica preservation plus LDMS store)
    - fix: wrap a long line in fetch_credentials.yml to satisfy the yaml[line-length] lint
    - refactor: move ome_ip from credentials to discovery_config.yml (create input/discovery_config.yml for non-credential discovery settings such as ome_ip and future Magellan config; remove ome_ip from omnia_credential.j2 and the credential update vars; load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml; add discovery_config.yml to the provision_validations discovery_inputs; remove the redundant ib_subnet/admin_subnet defaults from ome_discovery)
    - fix: add a newline at the end of ome_discovery/defaults/main.yml
    - fix: override role_path to an absolute path for decrypt_include_encrypt.yml; role_path resolves to the ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly, so override it to the playbook_dir dirname (/opt/omnia/omnia)
    - fix: inline the credential loading to avoid the role_path resolution issue; role_path cannot be overridden in include_tasks vars, so replace the decrypt_include_encrypt.yml call with direct include_vars plus stat checks for encrypted vs. unencrypted credential files
    - fix: skip the load-failure rule in ansible-lint to avoid CI false positives; ansible-lint cannot resolve role_path-relative paths during static analysis in GitHub Actions, producing false load-failure errors for files that exist and work at runtime
    - Update ansible.cfg
    - refactor: rename discovery references to provision and add a discovery_config variable (rename discover_mapping_nodes.yml to provision_mapping_nodes.yml; replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in the provision roles; add subnet as a required field with IP-pattern validation in the network_spec schema; define the discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml, consistent with the provision_config pattern; rename discovery_inputs to provision_inputs in the validation vars; rename discovery_mech_mapping to provision_mech_mapping; update user-facing messages to reference provision.yml)
    - fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements (add ome_username and ome_password validation rules to credential_rules.json; add a 'discovery' tag to the prepare_oim omnia_run_tags so OME credentials are prompted; fix vault-encrypted credential loading in get_ome_credentials.yml using the decrypt-include-reencrypt pattern instead of the unsupported vault_password_file; import include_input_dir.yml in discovery.yml so input_project_dir is set; accept SU1-SU100, case-insensitive, in addition to grp0-grp100 for GROUP_NAME; switch the Magellan message to list format to avoid \n in debug output; remove escaped quotes from the discovery usage examples)
    - fix: extend SU group-name support to build_image validation and schemas (add the build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation, which was previously missing for aarch64 builds; update the GROUP_NAME patterns in the functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 alongside grp0-grp100; update INVALID_GROUP_NAME_MSG to reflect both accepted formats)
  - Cleanup discovery roles: move library modules, remove unused roles (#4261) (move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg; remove the unused discovery/roles/telemetry/ and discovery/roles/discovery_validations/ directories; load discovery_config.yml at playbook level in discovery.yml, consistent with how build_stream_config.yml is loaded in provision.yml; fix discovery_complete_msg formatting for readable Ansible output)
  - fix for set_pxe_boot.yml when a custom inventory is given (#4260): update generate_bmc_inventory.yml and pre_checks.yml; fix a lint issue
* resolving merge conflict
* revert openchami commit id
* resolving review comments
* addressing review comments
* fix for vmagent scraping powerscale metrics
* cleanup script correction for powerscale telemetry cleanup
* victoria operator and victoria log input validation
* victoria log input and input validation
* removing L2 validation for victoria log, which is not required
* input validation and review-comment addressing
* change idrac_telemetry_collection_type to telemetry_collection_type
* Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames
* VictoriaLogs container image references and default variable
* port check
* resolve merge conflict
* correction for schema
* Update telemetry_config.json
* Update validate_input.py
* merge conflict telemetry_prereq.yml
* change victoria_configurations to victoria_metrics_configurations
* remove deployment-mode input variable
* update for upgrade scenarios
* update comments
* resolving issues due to merge conflict
* victoria log changes
* victoria log cluster component and VLAgent deployment
* updating pod name
* removing the changes of adding cert
* removing victoria log pod validation playbook
* cleanup changes for victoria log
* Update ansible-lint.yml and pylint for pub/telemetry (#4296): update ansible-lint.yml and pylint.yml; fix ansible-lint and line-length issues

Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com>
Co-authored-by: Super User <root@testbed.omnia.test>
Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com>
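The generate_pxe_mapping.py changes described in the commit above combine two small algorithms: splicing ADMIN_IP from the first two octets of the admin subnet plus the last two octets of the BMC IP, and extracting the Scalable Unit from the BMC hostname with the regex (SU[A-Z]?\d+)(?=R\d+). A minimal standalone sketch of both, assuming dotted-quad IPv4 inputs; function names mirror the commit message but this is not the actual module code:

```python
import re

def calculate_admin_ip(subnet: str, bmc_ip: str) -> str:
    """Splice the first 2 octets of the subnet with the last 2 of the BMC IP."""
    net = subnet.split(".")[:2]    # e.g. "10.5.0.0"   -> ["10", "5"]
    host = bmc_ip.split(".")[2:]   # e.g. "172.16.12.34" -> ["12", "34"]
    return ".".join(net + host)

# Regex from the commit message: an SU token only counts when it is
# immediately followed by a rack token such as R12 (positive lookahead).
SU_RE = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")

def extract_su_from_hostname(hostname: str) -> str:
    """Return the SU identifier, or 'grp0' when no SU pattern is present
    (e.g. service-tag-only hostnames like idrac-JCGT033)."""
    match = SU_RE.search(hostname)
    return match.group(1) if match else "grp0"
```

For example, `calculate_admin_ip("10.5.0.0", "172.16.12.34")` yields `"10.5.12.34"`, and a hostname without an SU-then-rack pattern falls back to `grp0` exactly as the commit describes.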
abhishek-sa1
added a commit
that referenced
this pull request
Apr 27, 2026
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev: OpenCHAMI upgrade changes in prepare_oim and oim_cleanup
* Feature branch sync - pub/telemetry to pub/q2_dev (#4293)
* IB NIC IP assignment
* Update MinIO and registry images to fixed tagged versions; omnia core container tag and version to 2.2 and v2.2.0.0 (#4309)
* Minimal OS-only functional group enablement for x86_64 and aarch64
* Update image_package_collector.py
* Update provision_validation.py
* Minimal OS functional group updates in provision
* Minimal OS functional group upgrade
* Fix os_* package cross-contamination and remove stale discovery templates
* OpenCHAMI upgrade changes
* Update openchami container tags
* Update main.yml
* Update omnia version and core tag

Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com>
Co-authored-by: Super User <root@testbed.omnia.test>
Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
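The synced history above extends GROUP_NAME validation so the schemas accept SU1-SU100 (case-insensitive) alongside grp0-grp100. One way that combined pattern could look in Python; this is illustrative only, as the real patterns live in functional_groups_config.json and omnia_config.json:

```python
import re

# Illustrative union of the two accepted formats from the commit message:
# grp0-grp100 (lowercase, as before) or SU1-SU100 (case-insensitive).
GROUP_NAME_RE = re.compile(
    r"^(?:grp(?:[0-9]|[1-9][0-9]|100)"      # grp0 .. grp100
    r"|[Ss][Uu](?:[1-9]|[1-9][0-9]|100))$"  # SU1 .. SU100, any case
)

def is_valid_group_name(name: str) -> bool:
    """Return True when name matches either accepted GROUP_NAME format."""
    return GROUP_NAME_RE.fullmatch(name) is not None
```

With this pattern, `grp0`, `su42`, and `SU100` pass, while out-of-range names such as `grp101` or `SU0` are rejected, matching the updated INVALID_GROUP_NAME_MSG description.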
Rajeshkumar-s2
added a commit
that referenced
this pull request
Apr 30, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale teleemtry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale teleemtry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
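The ADMIN_IP/IB_IP octet splicing and the extract_su_from_hostname() behaviour described in the generate_pxe_mapping.py notes above can be sketched roughly as follows. The function names follow the commit notes, but the actual module code is not shown in this PR conversation, so treat this as an illustrative approximation:

```python
import re

# Per the commit notes: (SU[A-Z]?\d+)(?=R\d+), case-insensitive per a later fix
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)", re.IGNORECASE)


def calculate_admin_ip(admin_subnet, bmc_ip):
    """Splice the first 2 octets of the admin subnet with the last 2 of the BMC IP."""
    first_two = admin_subnet.split(".")[:2]
    last_two = bmc_ip.split(".")[2:]
    return ".".join(first_two + last_two)


def extract_su_from_hostname(hostname, default="grp0"):
    """Parse the Scalable Unit token (e.g. SU1 in SU1R2...) from a BMC hostname.

    Service-tag-only hostnames such as idrac-JCGT033 contain no SU token and
    fall back to the default group name, as described in the commit message.
    """
    match = SU_PATTERN.search(hostname)
    return match.group(1) if match else default
```

The same splicing with ib_network.subnet yields IB_IP; when ib_nic_mac is empty the row simply omits IB_IP/IB_MAC.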
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
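The "stat checks for encrypted vs unencrypted credential file handling" mentioned above come down to detecting the Ansible Vault format header before deciding how to include the file. The real flow uses Ansible tasks rather than a Python helper; this is only a minimal sketch of the detection itself:

```python
from pathlib import Path

# Vault-encrypted files always start with a header like "$ANSIBLE_VAULT;1.1;AES256"
VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_vault_encrypted(path):
    """Return True when the file begins with the Ansible Vault format header.

    Sniffing the first line is enough to choose between a plain include_vars
    and a decrypt-include-reencrypt flow; missing or empty files count as
    unencrypted here.
    """
    try:
        first_line = Path(path).read_text(encoding="utf-8").splitlines()[0]
    except (OSError, IndexError):
        return False
    return first_line.startswith(VAULT_HEADER)
```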
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin 
<158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
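The GROUP_NAME widening described above (accept SU1-SU100, case-insensitive, alongside grp0-grp100) maps to a pattern along these lines; the exact regex used in the functional_groups_config.json and omnia_config.json schemas may differ in detail, so this is only a sketch of the accepted formats:

```python
import re

# grp0..grp100 or SU1..SU100 (case-insensitive), per the validation changes above
GROUP_NAME_RE = re.compile(
    r"^(?:grp(?:[0-9]|[1-9][0-9]|100)|su(?:[1-9]|[1-9][0-9]|100))$",
    re.IGNORECASE,
)


def is_valid_group_name(name):
    """Accept grp0-grp100 and SU1-SU100 (any case); reject everything else."""
    return bool(GROUP_NAME_RE.fullmatch(name))
```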
Rajeshkumar-s2
added a commit
that referenced
this pull request
May 1, 2026
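Among the fixes listed above, fix(OMN01D-2231) describes build_device_group_map() tracking every static-group membership per device and returning a conflicts dict so main() can fail with an actionable error. A rough stand-alone sketch of that logic (the name follows the commit note; the real module also pulls memberships from the OME REST API, which is omitted here):

```python
def build_device_group_map(memberships):
    """Map each device service tag to a static group, collecting conflicts.

    `memberships` is an iterable of (service_tag, group_name) pairs. Devices
    that appear in more than one static group are returned in `conflicts`
    so the caller can fail with a clear error instead of silently keeping
    the first-seen group.
    """
    device_groups = {}
    for tag, group in memberships:
        device_groups.setdefault(tag, []).append(group)
    conflicts = {
        tag: groups for tag, groups in device_groups.items() if len(set(groups)) > 1
    }
    group_map = {tag: groups[0] for tag, groups in device_groups.items()}
    return group_map, conflicts
```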
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale teleemtry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale teleemtry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User 
<root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
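The commit messages above describe `extract_su_from_hostname()` with the regex `(SU[A-Z]?\d+)(?=R\d+)`, which rejects service-tag-only hostnames and falls back to `grp0`. A minimal Python sketch of that logic follows; only the regex and the `grp0` fallback come from the commit messages, while the function signature and default-argument shape are illustrative assumptions, not the module's actual interface:

```python
import re

# Regex from the commit message: capture "SU" + optional letter + digits,
# but only when immediately followed by a rack segment like "R2" (lookahead).
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")


def extract_su_from_hostname(hostname, default="grp0"):
    """Return the Scalable Unit token from a BMC hostname, or the default.

    Service-tag-only hostnames (e.g. idrac-JCGT033) contain no SU token,
    so they fall back to the default group, per the described behavior.
    """
    match = SU_PATTERN.search(hostname or "")
    return match.group(1) if match else default
```

For example, a hostname like `SU1R2U3-bmc` yields `SU1`, while `idrac-JCGT033` falls back to `grp0`.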
abhishek-sa1
added a commit
that referenced
this pull request
May 5, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB * Fix the image_groups name and stages in summary * Updating Catalog changes aligning to recent updates * Fix the pylint issues --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 
<sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
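The fix(OMN01D-2231) entry above says `build_device_group_map()` now tracks all group memberships per device and returns a conflicts dict instead of silently using the first-seen group. A minimal sketch of that technique follows; the function name comes from the commit message, but the `(device_id, group_name)` input shape and return structure are assumptions for illustration, not the module's actual signature:

```python
from collections import defaultdict


def build_device_group_map(memberships):
    """memberships: iterable of (device_id, group_name) pairs.

    Returns (device_to_group, conflicts). Devices that appear in more
    than one static group are collected in conflicts so the caller can
    fail with an actionable error, rather than silently keeping the
    first-seen group.
    """
    seen = defaultdict(list)
    for device_id, group in memberships:
        if group not in seen[device_id]:
            seen[device_id].append(group)
    device_to_group = {d: g[0] for d, g in seen.items() if len(g) == 1}
    conflicts = {d: g for d, g in seen.items() if len(g) > 1}
    return device_to_group, conflicts
```

A caller would then fail the run when `conflicts` is non-empty, listing each conflicting device and its groups, as the commit describes.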
Rajeshkumar-s2
added a commit
that referenced
this pull request
May 7, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale teleemtry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale teleemtry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
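Two of the generate_pxe_mapping.py enhancements described above lend themselves to a short sketch: the subnet-octet IP derivation and the SU extraction regex. The following is a minimal, hypothetical rendering — the regex and the grp0 fallback come from the commit message, while the exact function signatures and edge-case handling are assumed:

```python
import re

# ADMIN_IP / IB_IP derivation per the commit notes above: first two
# octets come from the network's subnet, last two from the BMC IP.
def derive_ip(subnet: str, bmc_ip: str) -> str:
    return ".".join(subnet.split(".")[:2] + bmc_ip.split(".")[2:])

# SU extraction: the regex (SU[A-Z]?\d+)(?=R\d+) from the commit
# message. Service-tag-only hostnames (e.g. 'idrac-JCGT033') have no
# SU pattern and fall back to the documented default, grp0.
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")

def extract_su_from_hostname(hostname: str, default: str = "grp0") -> str:
    match = SU_PATTERN.search(hostname or "")
    return match.group(1) if match else default
```

So an admin subnet of 10.5.0.0 combined with BMC IP 172.30.9.41 would yield an ADMIN_IP of 10.5.9.41, and a BMC hostname like SU1R2U5-bmc would map to group SU1.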
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB * Fix the image_groups name and stages in summary * Updating Catalog changes aligning to recent updates * Fix the pylint issues * Catalog Updates to support powerscale drivers --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A 
<abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
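The duplicate static-group detection described in the OMN01D-2231 entry above can be sketched roughly as follows. This is a hedged approximation: the real module also formats the actionable per-device error message, and the input shape assumed here (group name mapped to device service tags) is illustrative.

```python
from collections import defaultdict

# Sketch of the OMN01D-2231 conflict detection: track every static-group
# membership per device so devices assigned to more than one group can be
# reported instead of silently taking the first-seen group.
def build_device_group_map(group_members):
    """group_members: OME static group name -> list of device service tags."""
    memberships = defaultdict(list)
    for group, devices in group_members.items():
        for device in devices:
            memberships[device].append(group)
    device_map = {device: groups[0] for device, groups in memberships.items()}
    conflicts = {device: groups for device, groups in memberships.items()
                 if len(groups) > 1}
    return device_map, conflicts
```

A caller would fail the run whenever `conflicts` is non-empty, listing each conflicting device and its groups, as the commit message describes.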
abhishek-sa1 added a commit that referenced this pull request on May 7, 2026
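The commit lists above mention removing invisible Unicode LRM (U+200E) characters from victoria-operator template filenames. A minimal sketch of that kind of cleanup (the helper names are hypothetical):

```python
# Sketch of the U+200E (left-to-right mark) cleanup mentioned above.
# The character is invisible in most editors, so a filename containing
# it looks identical to the clean one but fails path lookups.
LRM = "\u200e"

def has_lrm(name: str) -> bool:
    return LRM in name

def strip_lrm(name: str) -> str:
    return name.replace(LRM, "")
```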
* victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * adding the cuda and nvidia driver version gate check * victoria log cluster component and VLAgent deployment * updating pod name * storage_config.yml added * Added logic for mounts * removing the changes of adding cert * victoria log changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * adding SAN for victoria agent * powerscale authorization input and input validation for powerscale authorization and powerscale telemetry (#4288) * input validation for powerscale authorization and powerscale telemetry * remove unused messages * fix code review comments * fix UT issues * input validation only when powerscale authorization is true * pylint
fixes * pylint fixes * update copyright year * Updating SAN changes for vmagent and vlagent * Updating the SAN for vlagent (#4298) * telemetry input restructure Signed-off-by: Abhishek S A <abhishek.sa3@dell.com> * update telemetry * Update victorialogs-operator-vlcluster.yaml.j2 * Update telemetry.sh.j2 * telemetry input restructure (#4299) * telemetry input restructure Signed-off-by: Abhishek S A <abhishek.sa3@dell.com> * update telemetry * Update victorialogs-operator-vlcluster.yaml.j2 * Update telemetry.sh.j2 --------- Signed-off-by: Abhishek S A <abhishek.sa3@dell.com> * build image fix * build image suffix * Update service_k8s.json Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * issue fix * issue fix * build image fix * Revert "issue fix" This reverts commit 847746c. * Revert changes in telemetry branch (#4308) * Revert "Merge branch 'pub/telemetry' into pub/telemetry" This reverts commit 9b8faf8, reversing changes made to 895354b. * update telemetry fix * Mounts level 1 complete * initial cuda dnf installation * updating arch files * changes with respect to new input structure * powerscale_metrics_enabled variable access * powerscale telemetry enabled metrics access * pylint fixes * changes with respect to new input structure (#4312) * changes with respect to new input structure * powerscale_metrics_enabled variable access * powerscale telemetry enabled metrics access * pylint fixes * vlagent with loadbalancer type * feat(mount_config): enhance per-node bind mounts and secure runcmd execution - Implement literal mustache preservation for cloud-init variables using !unsafe with from_yaml. - Generalize node_key support to allow full metadata paths (e.g., ds.meta_data.instance_data.local_ipv4). - Fix b64encode evaluation in permission and directory creation tasks to ensure valid shell command generation. - Centralize mount-specific runcmd execution in cloud-init templates using base64 encoding to prevent shell injection/parsing issues.
- Update storage_config schema to support flexible cloud-init variable references for node_key. - Add safety defaults for mount_on_oim selection to prevent 'attribute not found' errors. - Restore 'union' filter for mounts to maintain idempotency across multiple runs. * remove 9429 as it gets added by operator * updating input validation, and default vars for vector * remove trailing space * feat(storage,provision): add cloud-init swap support + schema/validation updates New tasks: provision/roles/mount_config/tasks/{swap_config.yml,process_single_swap.yml} Build per-functional-group swap entries in cloud_init_groups_dict Validate swap filename path and size format; include optional maxsize Idempotent merge via combine(recursive=True) Render swap block in cloud-init group templates Inject conditional swap: {filename, size, [maxsize]} across login, compiler, kube (control/node), slurm (control/node) for x86_64/aarch64 Only rendered when cloud_init_groups_dict[fg].swap is present Strengthen input validation (common_validation.py) Enforce no overlap of functional_group_prefix/group across swap entries Validate maxsize >= size (supports bytes or K/M/G/T, and “auto”) Add helper parsers and validators for size comparison Update storage_config schema for swap (storage_config.json) Remove deprecated swap.name field Required: filename, size oneOf: require either functional_group_prefix or group Preserve existing size semantics (supports units/auto) * powerscale syslog support * update copyright year * UT issues * remove TLS mode as it is not supported for syslog on VLAgent * make syslog source IP as optional * UT issue fixes * VLAgent to victoria log transformation * VLAgent to victoria logs cert issue * updating default version * removing vector-idrac * fix for the script does not push the new cert to the Kubernetes secret or restart pods * vast client installation * refactor: make determine_target_groups.yml generic for mounts and swap - Change mount_item reference to
config_item parameter - Update process_single_mount.yml to pass config_item: mount_item - Update process_single_swap.yml to pass config_item: swap_item - Enable reuse of target group determination logic This allows both mount and swap processing to use the same task. * single template way * Certificate resolution issue * rsyslog configuration * Added powervault to all * loadbalancer configuration * Added powervault to all functional groups * logs with loadbalancer configuration * get syslog with nodeport * syslog collection work on nodeport * ansible lint fix * update remotewrite url * Verify script * removing audit trail log * login compiler node cuda lock installation * updated the message for nodes where installation skips * Powervault iscsi packages added to all nodes * fix(OMN01D-2164): prompt OME credentials only when enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding.
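The swap validation noted above enforces maxsize >= size, where sizes may be plain byte counts or carry K/M/G/T suffixes, and the literal "auto" is exempt from the numeric comparison. A rough sketch of such a check (the helper names are illustrative, not the actual common_validation.py functions):

```python
import re

# Sketch of the swap size comparison described above. Sizes may be
# plain bytes or use K/M/G/T suffixes; "auto" skips the numeric check.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(value):
    """Return the size in bytes, or None for 'auto'."""
    if str(value).lower() == "auto":
        return None
    match = re.fullmatch(r"(\d+)([KMGT]?)", str(value).upper())
    if not match:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)

def maxsize_ok(size, maxsize):
    """Enforce maxsize >= size unless either side is 'auto'."""
    parsed_size, parsed_max = parse_size(size), parse_size(maxsize)
    return parsed_size is None or parsed_max is None or parsed_max >= parsed_size
```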
* fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) 
- grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 'abc') no longer appear in the PXE mapping * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. 
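The PARENT_SERVICE_TAG rule described in the OMN01D-2230 entry above — only child roles of the control plane inherit a parent tag, and only within the same GROUP_NAME — can be sketched like this. This is a hypothetical rendering of the post-processing step; the row dictionaries mirror the CSV columns named in the commit messages:

```python
# Sketch of the PARENT_SERVICE_TAG post-processing described above.
# The role names come from the commit messages; the row layout is a
# simplified stand-in for the generated PXE mapping rows.
CONTROL_PLANE_ROLE = "service_kube_control_plane_x86_64"
CHILD_ROLES_OF_CONTROL_PLANE = {"service_kube_node_x86_64"}

def assign_parent_tags(rows):
    # Map each GROUP_NAME to its control plane node's service tag.
    parents = {row["GROUP_NAME"]: row["SERVICE_TAG"]
               for row in rows
               if row["FUNCTIONAL_GROUP_NAME"] == CONTROL_PLANE_ROLE}
    # Only child roles of the control plane inherit a parent tag,
    # and only within their own GROUP_NAME.
    for row in rows:
        if row["FUNCTIONAL_GROUP_NAME"] in CHILD_ROLES_OF_CONTROL_PLANE:
            row["PARENT_SERVICE_TAG"] = parents.get(row["GROUP_NAME"], "")
    return rows
```

With this shape, control plane, slurm, and login rows keep an empty PARENT_SERVICE_TAG, matching the behavior the fix describes.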
* fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) 
- grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
    'abc') no longer appear in the PXE mapping
* revert: temporarily revert discovery fixes for PR workflow
* fix(OMN01D-2164): prompt OME credentials only when enable_bmc_discovery is true
  - Add an enable_bmc_discovery flag (default: false) to discovery_config.yml
  - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on the enable_bmc_discovery flag
  - Change discovery credentials from mandatory to conditional_mandatory, gated on ome_discovery_enabled
  - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled
* slurm cuda, dcgm, peermem lock installation
* vast enabled flag
* fix: remove unsupported service_kube_control_plane_first_x86_64 from SUPPORTED_FUNCTIONAL_GROUPS
* adding modprobe for peermem
* Victoria defect fixes
* victoria changes
* fix: Add duplicate mount point validation per functional group
  - Fix the pxe_mapping_file reference in the config and validation flow
  - Add validation to detect duplicate mount points per expanded functional group
  - Include entry names in error messages for clarity
  - Update schema and examples for consistency
* Revert "revert the vast enabled flag"
  This reverts commit d20d1fb.
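The fix(OMN01D-2227) and fix(OMN01D-2230) entries above revolve around extracting the Scalable Unit token from a BMC hostname with the regex `(SU[A-Z]?\d+)(?=R\d+)`. A minimal sketch of that helper follows; the function name and pattern come from the commit log, while the hostname formats and the use of a raw docstring (an alternative to escaping the backslash) are illustrative assumptions:

```python
import re

# Pattern taken from the commit log; hostname layouts below are assumed examples.
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")

def extract_su_from_hostname(hostname):
    r"""Extract the Scalable Unit (SU) token from a BMC hostname.

    The lookahead (?=R\d+) requires a rack token to follow. Using a raw
    docstring (r-prefix) avoids the Python 3.12+ SyntaxWarning for the
    invalid escape sequence '\d' that the original non-raw docstring hit.
    """
    match = SU_PATTERN.search(hostname or "")
    return match.group(1) if match else None
```

Hostnames without an `SU<n>R<n>` prefix simply yield `None`, which is where the GROUP_NAME fallback to the OME group name (and ultimately `grp0`) takes over.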
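The fix(OMN01D-2231) entry above has build_device_group_map() return a conflicts dict for devices assigned to multiple static groups. A hedged sketch of that shape, assuming `memberships` is an iterable of (device_id, group_name) pairs (the real module derives these from the OME REST API):

```python
def build_device_group_map(memberships):
    """Map each device to a static group and collect multi-group conflicts.

    Sketch only: `memberships` is an assumed (device_id, group_name) pair
    iterable standing in for the OME group/device API responses.
    """
    device_groups = {}
    for device_id, group_name in memberships:
        device_groups.setdefault(device_id, []).append(group_name)
    # First-seen group per device; callers fail fast when conflicts is non-empty.
    group_map = {dev: groups[0] for dev, groups in device_groups.items()}
    conflicts = {dev: groups for dev, groups in device_groups.items()
                 if len(groups) > 1}
    return group_map, conflicts
```

main() can then list every conflicting device and its groups in one actionable error, rather than silently emitting the first-seen group into the PXE mapping.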
* vast conditional installation
* refactor: Align storage_config.yml with authoritative schema
  - Remove extra mount_params profiles (vast_nfs_performance, powervault_iscsi, network_storage, local_storage, bind_mounts, scratch_storage, global)
  - Keep only essential profiles: default and vast_nfs
  - Simplify the mounts section with two core examples (nfs_slurm, nfs_k8s)
  - Remove VAST-specific entries and per-node bind mount examples
  - Comment out powervault_config entries (move to examples)
  - Comment out the swap section (move to examples)
  - Fix field names: group → groups in comments and examples
  - Remove the vast_enabled flag
  - Align with the storage_config.json schema (only mount_params, mounts, powervault_config, swap sections)
  - Update comments to reference correct field names (groups instead of group)
* comment updated configure-ib-network.sh.j2
  Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
* Reverted kube node changes
* Removed accidentally committed files
* Added ansible.cfg
* Ansible lint fixes
* lint issue
  Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
* dns for ib network
* adding cuda mount for slurm
* adding the dcgm_support
* adding variable approach
* merge conflict resolution
* revert condition
* input comment
* Adding new files
* ansible lint error fixes
* feat(discovery): robust OME pagination, GPU collection, and timestamped PXE mapping
  - Replace @odata.nextLink-based pagination with explicit $top/$skip OData parameters for deterministic, scalable device retrieval (8k-20k+ nodes)
  - Add a configurable page_size parameter (default 200, max 1000) to the ome_server_inventory module and wire it through the Ansible role defaults
  - Add retry with exponential backoff for transient HTTP 5xx/timeout failures
  - Return pagination stats (total_devices, page_size, total_pages, pages_fetched) in module output; display a summary in the playbook
  - Collect GPU vendor and type from OME devicePciDevice inventory; append GPU_VENDOR and GPU_TYPE columns to the PXE mapping CSV
  - Generate
    timestamped PXE mapping files (bmc_pxe_mapping_file_<ts>.csv) so each run produces a unique file
  - Handle OME returning a list instead of a dict from inventory endpoints
  - Fix telemetry_config.yml validation: point to the telemetry_validation module instead of the removed common_validation.validate_telemetry_config
  - Add 20 unit tests covering pagination, retry, clamping, and filtering
* variable changes wrt new telemetry_config.yml changes
* fix(lint): resolve yaml line-length violations (>160 chars)
* feat: implement host-specific mount map in cloud-init templates
  - Enable host-specific mount map generation from PXE mapping data
  - Add a vendor_data metadata structure to cloud-init runcmd for dynamic mounts
  - Update cloud-init variable syntax from Jinja2 template to cloud-init query
  - Fix a variable reference in build_host_mount_map.yml (PXE_GROUP -> GROUP_NAME)
  - Change mkdir commands to verbose mode (mkdir -pv)
  - Update pxe_mapping_file_path to use omnia_shared_path with path normalization
  This allows per-host mount configurations to be dynamically applied during node provisioning via cloud-init vendor_data metadata.
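The pagination commit above replaces @odata.nextLink traversal with explicit $top/$skip paging, clamps page_size to a 1000 maximum, and retries transient failures with exponential backoff. A minimal sketch of that loop, where `get_page(top, skip)` is a stand-in for the actual OME REST request (the endpoint and error types are assumptions, not the module's exact code):

```python
import time

def fetch_all_devices(get_page, page_size=200, max_retries=3):
    """Retrieve all devices via explicit $top/$skip paging with backoff.

    Sketch: `get_page(top, skip)` stands in for an OME REST call such as
    GET /api/DeviceService/Devices?$top=..&$skip=.. and returns one page
    of device records.
    """
    page_size = max(1, min(page_size, 1000))  # clamp to the documented max of 1000
    devices, skip = [], 0
    while True:
        for attempt in range(max_retries):
            try:
                page = get_page(page_size, skip)
                break
            except OSError:                    # stand-in for HTTP 5xx / timeout
                time.sleep(2 ** attempt)       # exponential backoff
        else:
            raise RuntimeError(f"page fetch failed after {max_retries} retries")
        devices.extend(page)
        if len(page) < page_size:              # short page => last page
            return devices
        skip += page_size
```

Because $skip advances deterministically, the same request sequence is produced on every run, which is what makes this tractable at 8k-20k+ nodes.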
* Removed group map, removed legacy code
* Update logical_validation.py - remove repetitive import
* Update telemetry_pod_cleanup.yaml.j2 - revert telemetry pod cleanup changes
* Update idrac_telemetry_statefulset.yaml.j2 - revert statefulset changes
* Update telemetry_prereq.yml - remove unnecessary comment
* Update csi_driver_powerscale.json - update URL
* Update service_k8s.json - update version as per release 1.16.3
* Update telemetry_config.yml - fix csm observability values.yaml version
* Update idrac_telemetry_statefulset.yaml.j2 - update variable parsing
* Update configure_powerscale_syslog.sh.j2 - fix syntax error for syslog
* removing dcgm_support variable
* removing stale entries
* remove default dns entry for ib network
  Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
* image layer initial commit
* minor changes
* mounting hpc_tools
* removing from control node
* Fix mount ordering and bind mount creation for cloud-init runcmd
  - Restructure runcmd to mount NFS shares before creating bind mount sources
  - Add explicit mount commands after each NFS fstab entry to ensure mounts are available before subdirectories are created on them
  - Add mkdir commands for bind mount target directories on mounted NFS
  - Consolidate mount -av at appropriate checkpoints to avoid premature mounting before all fstab entries and directories are ready
  - Update process_single_mount.yml to add mkdir for bind mount targets before fstab entries
  - Add a "Refresh mounts" task to append a final mount -av after all setup
  This ensures the correct sequence:
  1. Add the NFS mount to fstab
  2. Mount the NFS share
  3. Create subdirectories on the mounted NFS
  4. Add bind mount entries to fstab
  5. Mount the bind mounts
  Fixes mount failures: "special device does not exist" and "mount point does not exist" errors during cloud-init execution.
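The five-step sequence above can be sketched as a small builder that emits the cloud-init runcmd list in the required order. Everything here is illustrative — the argument names, fstab option strings, and directory layout are assumptions; only the ordering (NFS mounted before its bind-mount sources are created) reflects the fix:

```python
def build_mount_runcmd(nfs_src, nfs_target, bind_subdirs):
    """Build a cloud-init runcmd list that mounts NFS before creating
    bind-mount sources on it (sketch; paths/options are illustrative)."""
    cmds = [
        # Steps 1-2: add the NFS share to fstab and mount it immediately,
        # so the filesystem exists before anything is created beneath it.
        f"mkdir -pv {nfs_target}",
        f"echo '{nfs_src} {nfs_target} nfs defaults 0 0' >> /etc/fstab",
        f"mount {nfs_target}",
    ]
    for sub in bind_subdirs:
        # Steps 3-4: create the subdirectory on the mounted NFS share
        # (and the local target), then add its bind entry to fstab.
        cmds.append(f"mkdir -pv {nfs_target}/{sub} /{sub}")
        cmds.append(f"echo '{nfs_target}/{sub} /{sub} none bind 0 0' >> /etc/fstab")
    # Step 5: mount the bind entries only after every source exists.
    cmds.append("mount -av")
    return cmds
```

Reversing steps 2 and 3 is exactly what produced the "special device does not exist" failures: the bind source was created on the empty mount-point directory, then hidden once NFS mounted over it.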
* VMscraper to scrape metrics from OTEL collector (#4362)
* pub telemetry changes
* service to scrape metrics from OTEL collector
* vmservice to scrape metrics from otel collector
* update endpoints
* revert other changes
* revert merge changes as per head
* revert variable set
* revert changes
* revert changes
* pylint fixes
* ansible lint fixes
* updating completion message
* telemetry validation while prepare oim
* update condition
* added check for LDMS
* Fix: Add retry logic for ochami node discovery with SMD readiness checks
  - Extract service readiness checks into a separate block/rescue pattern
  - Add an SMD API health check before the discovery attempt
  - Implement automatic retry on discovery failure with service restart
  - Increase the service check timeout to 2 minutes (12 retries × 10s delay)
  - Prevent connection refused errors by ensuring the SMD endpoint is ready
  This addresses the race condition where systemd marks the smd service as "started" but the HTTP endpoint at oimcp.oim.test:8443 isn't accepting connections yet.
* Update smd api check command
* changed approach
* Cloud_init and other service failure messages
* Updated messages
* fix: BMC discovery credential prompting and PARENT_SERVICE_TAG assignment (#4366)
  1. Skip the OME credential prompt when enable_bmc_discovery is false
     - Changed discovery credentials from mandatory to conditional_mandatory in credential utility vars (gated on enable_bmc_discovery)
     - Added set_fact in prepare_oim.yml to promote enable_bmc_discovery from namespaced to top-level scope before the credential utility runs
  2.
     Fix PARENT_SERVICE_TAG assignment in the PXE mapping
     - Source changed from service_kube_control_plane to service_kube_node
     - Only slurm_node_aarch64 and slurm_node_x86_64 receive PARENT_SERVICE_TAG
     - All other roles (control_plane, kube_node, login, slurm_control) remain empty
* changing the script a little
* ansible lint
* ansible lint
* Update provision_mapping_nodes.yml
  Signed-off-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
* Refactor: Update K8s NFS configuration to use unified mounts structure
  - Replace nfs_client_params with the mounts list for consistency
  - Extract k8s_nfs_server_ip from the mounts source field (split on ':')
  - Extract k8s_client_mount_path from the mounts mount_point field
  - Introduce k8s_nfs_server_path as a unified source reference
  - Update the error message to use k8s_nfs_server_path for clarity
  - Comment out deprecated k8s_server_share_path and nfs_server_ip references
  - Consolidate 'mount -av' into a single set_fact call instead of a separate task
  - Remove the redundant debug task for single_mnt_runcmd
  - Add a loop_control label for better Ansible output readability
  - Improve code formatting for regex_replace operations (no functional change)
  - Streamline permission and bind mount runcmd list construction
* Updated slurm custom repo names
* Updated repo names
* Removed static mounts block
* rearrange args
* Removed unnecessary mounts
* Merge pull request #4371 from Kratika-P/pub/q2_dev
  Telemetry enabled flag set for creating and executing the telemetry.sh
* fix(provision): resolve K8s NFS mount race condition and improve Slurm support detection
  - Move mkdir for K8s NFS mount directories before fstab entries to prevent a race condition where mount attempts occur before target directories exist
  - Fix duplicate mount point names in storage_config.yml (ps3: slurm_login -> slurm_login_kube_node, ps4: login_node -> login_kubectrl) to avoid conflicts between functional groups
  - Add mkdir -p for /tmp/crio-storage, /var/lib/etcd, /var/lib/kubelet,
    /etc/kubernetes, /var/log/pods, /var/lib/packages BEFORE adding NFS entries to /etc/fstab
  - Affected templates:
    - ci-group-service_kube_control_plane_first_x86_64.yaml.j2
    - ci-group-service_kube_control_plane_x86_64.yaml.j2
    - ci-group-service_kube_node_x86_64.yaml.j2
  - Remove redundant 'mount -av' from the bind mount runcmd sequence (process_single_mount.yml)
    - Functional groups now handle mount -av independently
  - Fix slurm_support detection logic to check both software_config AND functional_groups
    - Initialize slurm_support to false, then set true only if:
      1. slurm_custom is in software_config.softwares, AND
      2. slurm_control_node_x86_64 exists in functional_groups
    - Apply the fix to both mount_config and slurm_config roles
    - Remove the unused slurm_support var from mount_config/vars/main.yml
  - Remove verbose flags from swap commands (fallocate -v, chmod -v, mkswap -v, swapon -v) in ci-group-login_compiler_node_aarch64.yaml.j2
  - Update the storage_config.json schema to reflect the new mount point naming conventions
  This commit addresses Issue #3 (race condition) and Issue #4 (missing directory creation) from the cloud-init analysis, preventing silent NFS mount failures on K8s nodes.
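The ochami/SMD readiness fix described earlier (12 retries × 10s delay, health check before discovery) is, in essence, a bounded readiness poll. A hedged sketch, with `probe` standing in for the SMD API health check (an HTTPS GET against the smd endpoint in the real playbook — the callable interface here is an assumption for testability):

```python
import time

def wait_for_service(probe, retries=12, delay=10, sleep=time.sleep):
    """Retry a readiness probe until it succeeds or attempts run out.

    Sketch: the 12 x 10s schedule mirrors the commit message. Returns the
    1-based attempt on which `probe` first returned True, or None if the
    service never became ready.
    """
    for attempt in range(1, retries + 1):
        if probe():
            return attempt
        if attempt < retries:
            sleep(delay)
    return None
```

Gating discovery on this poll (and restarting the service in the rescue path) is what closes the window where systemd reports smd as "started" before its HTTP endpoint accepts connections.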
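The K8s NFS refactor above extracts k8s_nfs_server_ip and the export path by splitting the mounts `source` field on ':'. A minimal sketch of that parsing, assuming the `<server_ip>:<path>` format described in the commit message (the function name and error text are illustrative):

```python
def parse_nfs_source(source):
    """Split a mounts 'source' field into (server_ip, export_path).

    Sketch: assumes the '<server_ip>:<path>' format from the unified
    mounts structure; raises on anything that lacks either half.
    """
    server_ip, _, export_path = source.partition(":")
    if not server_ip or not export_path:
        raise ValueError(f"expected '<server_ip>:<path>', got {source!r}")
    return server_ip, export_path
```

Using `partition` rather than `split(":")` keeps any further colons inside the path intact, since only the first separator is significant.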
* Added missing mount -a
* papi url changed
* Updated powervault to service_kube nodes
* LDAP home mount updated
* Input refinement
* Updated descriptions
* gpu defect - peermem defect - prolog removal
* defect fix for pxe validation
* lint fixes
* Templated swap part
---------
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com>
Description of the Solution
Update the telemetry branch in the ansible-lint workflow.