fix for set_pxe_boot.yml when custom inventory given #4260
Merged
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
abhishek-sa1 approved these changes on Apr 8, 2026
priti-parate approved these changes on Apr 8, 2026
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
abhishek-sa1 pushed a commit that referenced this pull request on Apr 9, 2026
* removing input template
* Fix for pulp remote RemoteArtifacts is 0 after repo migration
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* Feature/ome discovery pxe mapping enhancements (#4245)
  * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements
    ome_server_inventory.py:
    - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups
    - Emit module.warn() for static groups that exist but have no devices assigned
    - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address
    generate_pxe_mapping.py:
    - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP
    - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP
    - Skip IB_IP/IB_MAC when server has no IB NIC (ib_nic_mac is empty)
    - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found
    - Set GROUP_NAME to extracted SU identifier (fallback: grp0)
    - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group
    - Remove BMC_HOSTNAME from CSV headers and output rows
    - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning
    generate_pxe_mapping.yml:
    - Load network_spec.yml via include_vars
    - Set admin_subnet and ib_subnet using selectattr on Networks list
    - Pass both subnets as parameters to the generate_pxe_mapping module
    defaults/main.yml:
    - Add admin_subnet and ib_subnet default variables (empty string)
    provision_validation.py:
    - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range
  * refactor: rename discovery directory to provision, update network_spec.yml
    - Renamed discovery/ to provision/ (git detected as rename, no content loss)
    - Updated input/network_spec.yml with latest network configuration changes
  * Update discovery.yml
  * refactor: unify OME credentials into get_config_credentials flow
    - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template
    - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml
    - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml
    - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml
    - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the separate .vault/ paths
    - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed)
  * chore: update copyright year from 2025 to 2026 in modified files
    Updated copyright header in all ome_discovery files modified during this feature branch:
    - library/generate_pxe_mapping.py
    - library/ome_server_inventory.py
    - tasks/generate_pxe_mapping.yml
    - tasks/get_ome_credentials.yml
    - defaults/main.yml
    - vars/main.yml
  * fix: restore discovery_validations role missed during discovery-to-provision rename
    discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict.
  * chore: update copyright year to 2026 in provision/roles/discovery_validations files
  * fix: remove duplicate discovery_validations role (provision_validations already exists)
    provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant.
  * feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev
    - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet
    - Preserve existing replica count during iDRAC StatefulSet upgrade
    - Add LDMS store daemon check, restart, and readiness wait tasks
  * fix: quote build_stream_job_id_absent message in provision_validations vars
  * feat: add discovery/roles/discovery_validations and telemetry files
    - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow
    - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store)
  * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint
  * refactor: move ome_ip from credentials to discovery_config.yml
    - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config)
    - Remove ome_ip from omnia_credential.j2 and credential update vars
    - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml
    - Add discovery_config.yml to provision_validations discovery_inputs
    - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery
  * fix: add newline at end of ome_discovery/defaults/main.yml
  * fix: override role_path to absolute path for decrypt_include_encrypt.yml
    role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia).
  * fix: inline credential loading to avoid role_path resolution issue
    role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling.
  * fix: skip load-failure rule in ansible-lint to avoid CI false positives
    ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime.
  * Update ansible.cfg
  * Update ansible.cfg
  * refactor: rename discovery references to provision and add discovery_config variable
    - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml
    - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles
    - Add subnet as required field with IP pattern validation in network_spec schema
    - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern)
    - Rename discovery_inputs to provision_inputs in validation vars
    - Rename discovery_mech_mapping to provision_mech_mapping
    - Update user-facing messages to reference provision.yml
    Generated with [Devin](https://cli.devin.ai/docs)
    Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements
    - Add ome_username and ome_password validation rules to credential_rules.json
    - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted
    - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file)
    - Add include_input_dir.yml import to discovery.yml so input_project_dir is set
    - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME
    - Fix Magellan message to use list format (avoids \n in debug output)
    - Remove escaped quotes from discovery usage examples
    Generated with [Devin](https://cli.devin.ai/docs)
    Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  * fix: extend SU group name support to build_image validation and schemas
    - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds)
    - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100
    - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats
    Generated with [Devin](https://cli.devin.ai/docs)
    Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  ---------
  Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
  Co-authored-by: Super User <root@testbed.omnia.test>
  Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Cleanup discovery roles: move library modules, remove unused roles (#4261)
  * Cleanup discovery roles: move library modules, remove unused roles
    - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg
    - Remove unused discovery/roles/telemetry/ directory
    - Remove unused discovery/roles/discovery_validations/ directory
    - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml)
    - Fix discovery_complete_msg formatting for readable Ansible output
    Generated with [Devin](https://cli.devin.ai/docs)
    Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  * Remove unused discovery_validations role
    Generated with [Devin](https://cli.devin.ai/docs)
    Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  ---------
  Co-authored-by: Super User <root@testbed.omnia.test>
  Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* fix for set_pxe_boot.yml when custom inventory given (#4260)
  * Update generate_bmc_inventory.yml
    Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
  * Update pre_checks.yml
    Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
  * lint issue
    Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
  ---------
  Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>

---------

Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com>
Co-authored-by: Super User <root@testbed.omnia.test>
Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
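The address and group derivation described for generate_pxe_mapping.py in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual code: `derive_ip` is a hypothetical helper name, the sample addresses and the SU-style hostnames are made up, and matching is assumed to be case-sensitive. Only the regex `(SU[A-Z]?\d+)(?=R\d+)`, the function name `extract_su_from_hostname`, the `idrac-JCGT033` example, and the `grp0` fallback come from the commit text.

```python
import re

# Regex quoted in the commit message: an SU identifier that is immediately
# followed by a rack marker (R<digits>). The lookahead keeps the rack marker
# out of the captured group.
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")


def derive_ip(subnet: str, bmc_ip: str) -> str:
    """Join the first two octets of `subnet` with the last two octets of `bmc_ip`.

    Hypothetical helper: the real module's names and error handling may differ.
    """
    subnet_octets = subnet.split(".")
    bmc_octets = bmc_ip.split(".")
    if len(subnet_octets) != 4 or len(bmc_octets) != 4:
        raise ValueError("expected dotted-quad IPv4 strings")
    return ".".join(subnet_octets[:2] + bmc_octets[2:])


def extract_su_from_hostname(hostname: str, fallback: str = "grp0") -> str:
    """Return the Scalable Unit identifier in a BMC hostname, else the fallback."""
    match = SU_PATTERN.search(hostname)
    return match.group(1) if match else fallback


if __name__ == "__main__":
    # Sample values are invented for illustration.
    print(derive_ip("172.16.0.0", "10.3.1.25"))       # 172.16.1.25
    print(extract_su_from_hostname("idrac-SU1R2N3"))  # SU1
    print(extract_su_from_hostname("idrac-JCGT033"))  # grp0 (service-tag-only hostname)
```

Because the derived address reuses the BMC host octets, it can fall outside a narrower admin range, which is consistent with the commit's note that the admin-IP range validation was commented out.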
abhishek-sa1 added a commit that referenced this pull request on Apr 9, 2026
* Update storage_config.j2 and transform_storage_config.yml
* Update storage_config.j2
* Update storage_config.j2
* Ldms changes for upgrade
* Admin NIC state validation
* Update validation_utils.py
* upgrade idrac telemetry replica preserve changes
* localrepo upgrade migration
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* Update ansible-lint.yml
* Update pylint.yml
* Removed display task from playbook
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* pulp python content copy fix
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* Merge pull request #4242 from SOWJANYAJAGADISH123/pub/q2_dev
  pxe boot utility
* appended log directory details to migration msg
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* removing input template
* Fix for pulp remote RemoteArtifacts is 0 after repo migration
  Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
* Feature/ome discovery pxe mapping enhancements (#4245)
* Cleanup discovery roles: move library modules, remove unused roles (#4261)
* fix for set_pxe_boot.yml when custom inventory given (#4260)
* Delete examples/input_template directory
  Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
* Delete discovery/roles/discovery_validations/vars/main.yml
  Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
* Delete discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml
  Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
* Update pre_checks.yml
  Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
* Update pulp_repo_name_migration.py
  Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>

---------

Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Signed-off-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com>
Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com>
Co-authored-by: Super User <root@testbed.omnia.test>
Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
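The GROUP_NAME rule introduced under #4245 (accept SU1-SU100, case-insensitive, in addition to grp0-grp100) could be checked with a pattern along these lines. This is a hedged sketch: the function name `is_valid_group_name` and the regex are assumptions made for illustration, not the patterns stored in the project's JSON schemas, and it assumes that only the SU prefix is case-insensitive while `grp` stays lowercase.

```python
import re

# Assumed pattern, not copied from the project's schemas:
#   grp0 .. grp100  (lowercase prefix, no leading zeros)
#   SU1  .. SU100   (prefix matched case-insensitively via a scoped inline flag)
GROUP_NAME_PATTERN = re.compile(
    r"(?:grp(?:100|[1-9]?\d)|(?i:su)(?:100|[1-9]\d?))"
)


def is_valid_group_name(name: str) -> bool:
    """Return True when `name` is grp0-grp100 or SU1-SU100 (any case for SU)."""
    return GROUP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("grp0", "grp100", "SU1", "su42", "SU0", "grp101", "SU101"):
        print(candidate, is_valid_group_name(candidate))
```

Using `fullmatch` rather than `search` avoids accepting names that merely contain a valid group name as a substring.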
abhishek-sa1 added a commit that referenced this pull request on Apr 21, 2026
* Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale teleemtry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale teleemtry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects 
service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified 
files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. * feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in 
get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. * fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to 
credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook 
level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change 
idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: 
Kratika_Patidar <Kratika.Patidar@dell.com>
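The GROUP_NAME rule described in the commit message above (grp0-grp100, plus SU1-SU100 case-insensitively) could be checked roughly as follows. This is a minimal sketch, not the project's actual schema pattern: the names `GROUP_NAME_RE` and `is_valid_group_name` are hypothetical, and treating the `grp` prefix as lowercase-only is an assumption.

```python
import re

# Hypothetical pattern (not copied from the Omnia schemas):
#   grp0 .. grp100  -> lowercase "grp" prefix (assumed), no leading zeros
#   SU1  .. SU100   -> "SU" prefix matched case-insensitively, no leading zeros
GROUP_NAME_RE = re.compile(r"(?:grp(?:100|[1-9]?\d)|(?i:su)(?:100|[1-9]\d?))")


def is_valid_group_name(name: str) -> bool:
    """Return True when name is grp0-grp100 or SU1-SU100 (any case for SU)."""
    return GROUP_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("grp0", "grp100", "su7", "SU100", "grp101", "SU0"):
        print(candidate, is_valid_group_name(candidate))
```

Using `fullmatch` rather than `search` keeps values such as `grp1000` or `xSU1` from slipping through on a partial match.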
abhishek-sa1 added a commit that referenced this pull request Apr 27, 2026
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav 
<sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
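The ADMIN_IP/IB_IP derivation and the Scalable Unit extraction described in the generate_pxe_mapping.py notes above can be sketched as below. This is an illustrative approximation under stated assumptions, not the module's actual code: `derive_ip` is a hypothetical helper name, the sample addresses and the hostname `idrac-SU2R4N1` are made up, and only the regex `(SU[A-Z]?\d+)(?=R\d+)`, the function name `extract_su_from_hostname`, the `idrac-JCGT033` example, and the `grp0` fallback come from the commit message.

```python
import re

# Regex quoted in the commit message: an SU identifier that is
# immediately followed by a rack marker such as "R4".
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")


def derive_ip(subnet: str, bmc_ip: str) -> str:
    """Combine the first two octets of subnet with the last two of bmc_ip.

    Hypothetical helper; the real module may validate input differently.
    """
    subnet_octets = subnet.split(".")
    bmc_octets = bmc_ip.split(".")
    if len(subnet_octets) != 4 or len(bmc_octets) != 4:
        raise ValueError("expected dotted-quad IPv4 values")
    return ".".join(subnet_octets[:2] + bmc_octets[2:])


def extract_su_from_hostname(hostname: str, fallback: str = "grp0") -> str:
    """Return the SU identifier found in hostname, or the fallback group."""
    match = SU_PATTERN.search(hostname)
    return match.group(1) if match else fallback


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    print(derive_ip("10.5.0.0", "172.16.3.21"))
    print(extract_su_from_hostname("idrac-SU2R4N1"))
    print(extract_su_from_hostname("idrac-JCGT033"))
```

The lookahead `(?=R\d+)` is what rejects service-tag-only hostnames: without a rack marker directly after the SU token, nothing matches and the fallback group is used.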
Rajeshkumar-s2 added a commit that referenced this pull request Apr 30, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin 
<158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
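The Scalable Unit extraction described in the commit message above (the `(SU[A-Z]?\d+)(?=R\d+)` pattern with a `grp0` fallback) can be sketched roughly as follows. This is a minimal illustration, not the code merged in `generate_pxe_mapping.py`: the function body, the `DEFAULT_GROUP` constant, and the sample hostname `idrac-SU1R2N3` are assumptions made for the example. Only the regex, the `grp0` fallback, and the rejected hostname `idrac-JCGT033` come from the commit text. The pattern is written as a raw string, which also avoids the invalid-escape `SyntaxWarning` for `\d` that Python 3.12+ raises for non-raw string literals.

```python
import re

# Regex quoted in the commit message: an SU identifier that is immediately
# followed by a rack marker such as R2. The raw string keeps '\d' from being
# treated as an invalid escape sequence.
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")

# Fallback group named in the commit message (constant name is hypothetical).
DEFAULT_GROUP = "grp0"


def extract_su_from_hostname(hostname: str) -> str:
    """Return the SU identifier found in a BMC hostname, or the fallback."""
    match = SU_PATTERN.search(hostname or "")
    return match.group(1) if match else DEFAULT_GROUP


if __name__ == "__main__":
    # 'idrac-SU1R2N3' is a made-up hostname used only for illustration.
    for name in ("idrac-SU1R2N3", "idrac-JCGT033"):
        print(name, "->", extract_su_from_hostname(name))
```

A service-tag-only hostname contains no `SU<digits>R<digits>` run, so it falls through to the default group, which matches the behaviour the commit message describes.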
Rajeshkumar-s2 added a commit that referenced this pull request May 1, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User 
<root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
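The ADMIN_IP and IB_IP rule stated in the commit message above (first two octets from the network's subnet, last two octets from the BMC IP) could look something like the sketch below. It is an illustration under stated assumptions rather than the merged `calculate_admin_ip`/`calculate_ib_ip` code: the helper name `derive_ip_from_bmc` is hypothetical, the subnet is assumed to be a plain dotted-quad IPv4 string, and the sample addresses are invented.

```python
import ipaddress


def derive_ip_from_bmc(subnet: str, bmc_ip: str) -> str:
    """Combine the first two octets of `subnet` with the last two of `bmc_ip`.

    Both arguments are assumed to be dotted-quad IPv4 strings; anything else
    raises ValueError from the ipaddress module.
    """
    subnet_octets = str(ipaddress.IPv4Address(subnet)).split(".")
    bmc_octets = str(ipaddress.IPv4Address(bmc_ip)).split(".")
    return ".".join(subnet_octets[:2] + bmc_octets[2:])


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(derive_ip_from_bmc("172.16.0.0", "10.3.5.21"))
```

Because the result depends only on the subnet prefix and the BMC address, a derived IP can fall outside the `primary_oim_admin_ip`/`netmask_bits` range, which is the reason the commit message gives for commenting out `validate_admin_ips_against_network_spec`.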
abhishek-sa1 added a commit that referenced this pull request May 5, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
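The ADMIN_IP/IB_IP derivation and the Scalable Unit extraction described in the generate_pxe_mapping.py notes above can be sketched as follows. This is a minimal illustration, not the module's actual code: `derive_ip` is a hypothetical helper name, the signatures are assumptions, and the hostname `idrac-SU1R2N3` is an invented example. Only the regex and the `grp0` fallback come from the commit message.

```python
import re

# Regex quoted in the commit message: an SU identifier that is
# immediately followed by a rack marker such as R2 (lookahead only).
SU_PATTERN = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")


def derive_ip(subnet: str, bmc_ip: str) -> str:
    """Hypothetical helper: first two octets of `subnet` + last two of `bmc_ip`."""
    subnet_octets = subnet.split(".")
    bmc_octets = bmc_ip.split(".")
    if len(subnet_octets) != 4 or len(bmc_octets) != 4:
        raise ValueError("expected dotted-quad IPv4 values")
    return ".".join(subnet_octets[:2] + bmc_octets[2:])


def extract_su_from_hostname(hostname: str, default: str = "grp0") -> str:
    """Return the SU identifier found in `hostname`, or `default` if none."""
    match = SU_PATTERN.search(hostname)
    return match.group(1) if match else default


admin_ip = derive_ip("10.5.0.0", "172.16.3.42")
group_with_su = extract_su_from_hostname("idrac-SU1R2N3")
group_without_su = extract_su_from_hostname("idrac-JCGT033")
```

Because the host part is copied from the BMC address, the derived address depends only on the subnet's first two octets, which is why the commit notes that ADMIN_IPs no longer have to fall inside the primary_oim_admin_ip/netmask_bits range.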
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
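The last commit above distinguishes encrypted from unencrypted credential files but does not show how. One way to make that distinction, assuming the standard Ansible Vault file header (`$ANSIBLE_VAULT;`), is sketched below in Python. The function name and the file paths are hypothetical, and the actual role may detect the encrypted case differently.

```python
import os
import tempfile

# Files written by ansible-vault begin with this marker.
VAULT_HEADER = b"$ANSIBLE_VAULT;"


def is_vault_encrypted(path: str) -> bool:
    """Return True if the file at `path` starts with the vault header."""
    with open(path, "rb") as handle:
        return handle.read(len(VAULT_HEADER)) == VAULT_HEADER


# Small demonstration with two throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as workdir:
    plain_path = os.path.join(workdir, "plain.yml")
    vault_path = os.path.join(workdir, "vaulted.yml")
    with open(plain_path, "wb") as handle:
        handle.write(b"ome_username: admin\n")
    with open(vault_path, "wb") as handle:
        handle.write(b"$ANSIBLE_VAULT;1.1;AES256\n6162630a\n")
    plain_result = is_vault_encrypted(plain_path)
    vault_result = is_vault_encrypted(vault_path)
```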
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * removing victoria log pod validation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-length --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB NIC IP assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB * Fix the image_groups name and stages in summary * Updating Catalog changes aligning to recent updates * Fix the pylint issues --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A <abhishek.sa3@dell.com> Co-authored-by: pullan1 
<sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
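The duplicate static group check described in fix(OMN01D-2231) above can be sketched as follows. This is an illustration under assumptions: the commit message names `build_device_group_map()` and says it returns a conflicts dict, but the input shape (a list of `(group_name, device_id)` pairs), the return tuple, and the service tags `SVC001`/`SVC002` are hypothetical.

```python
from collections import defaultdict


def build_device_group_map(memberships):
    """Map each device to its static group and report multi-group devices.

    `memberships` is an assumed input shape: (group_name, device_id) pairs.
    Returns (device_groups, conflicts), where conflicts maps a device to
    every static group it was found in.
    """
    groups_by_device = defaultdict(list)
    for group_name, device_id in memberships:
        if group_name not in groups_by_device[device_id]:
            groups_by_device[device_id].append(group_name)

    device_groups = {}
    conflicts = {}
    for device_id, groups in groups_by_device.items():
        if len(groups) > 1:
            conflicts[device_id] = groups
        else:
            device_groups[device_id] = groups[0]
    return device_groups, conflicts


def format_conflicts(conflicts):
    """Build one error line per conflicting device, sorted by device id."""
    return "\n".join(
        f"{device_id}: {', '.join(groups)}"
        for device_id, groups in sorted(conflicts.items())
    )


memberships = [
    ("service_kube_control_plane_x86_64", "SVC001"),
    ("service_kube_node_x86_64", "SVC002"),
    ("service_kube_node_x86_64", "SVC001"),
]
device_groups, conflicts = build_device_group_map(memberships)
error_text = format_conflicts(conflicts)
```

Collecting every membership before deciding is what allows the caller to fail with the full list of conflicting groups rather than silently keeping the first one seen.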
Rajeshkumar-s2
added a commit
that referenced
this pull request
May 7, 2026
* Minimal OS-only functional group enablement (#4267) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * csi defect fix * nvidia dcgm install * Add 'provision' tag to omnia_run_tags Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Add 'provision' tag to omnia_run_tags (#4276) Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> * Fix incorrect file path in Podman login failure message (OMN01D-2166) The podman_login_fail_msg referenced a hardcoded path input/omnia_config_credentials.yml which does not exist. Updated it to use the dynamic input_project_dir variable so the error message now correctly points to the actual credentials file at <input_project_dir>/omnia_config_credentials.yml. Fixed in both prepare_oim and gitlab roles. 
* Merge pull request #4294 from mithileshreddy04/pub/q2_dev OpenCHAMI upgrade changes in prepare_oim and oim_cleanup * Feature branch sync - pub/telemetry to pub/q2_dev (#4293) * Update openchami git version (#4251) Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> * powerscale telemetry support with direct authentication mode * use existing vmagent * update messages in vars * merge Pub/q2 dev to pub/telemetry (#4254) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> * Powerscale telemetry support using helm * deploy powerscale telemetry using cloud-init * offline deployment of powerscale telemetry * fix for cert-manager failure * fix for cert manager failure * powerscale telemetry deployment with telemetry namespace * sync q2_dev changes (#4263) * removing input template * Fix for pulp remote RemoteArtifacts is 0 after repo migration Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> * Feature/ome discovery pxe mapping enhancements (#4245) * feat(discovery): OME static group extraction, PXE mapping IP/SU/parent tag enhancements ome_server_inventory.py: - Fix static group extraction: find 'Static Groups' container by name and select only direct children via ParentId; avoids picking system/nested groups - Emit module.warn() for static groups that exist but have no devices assigned - Fix idrac_hostname: read InstrumentationName/DnsName from DeviceManagement ManagementType==2 entry instead of DeviceName which returns the IP address generate_pxe_mapping.py: - ADMIN_IP: derive from first 2 octets of admin_network.subnet + last 2 of BMC IP - IB_IP: derive from first 2 octets of ib_network.subnet + last 2 of BMC IP - Skip IB_IP/IB_MAC 
when server has no IB NIC (ib_nic_mac is empty) - Add extract_su_from_hostname() with regex (SU[A-Z]?\d+)(?=R\d+) to parse Scalable Unit from BMC hostname; rejects service-tag-only hostnames (idrac-JCGT033) and falls back to grp0 when no SU pattern is found - Set GROUP_NAME to extracted SU identifier (fallback: grp0) - Post-process rows to assign PARENT_SERVICE_TAG from the service_kube_control_plane_x86_64 node within the same SU group - Remove BMC_HOSTNAME from CSV headers and output rows - Lint: remove dead try/except in calculate_admin_ip/calculate_ib_ip, reuse ib_mac variable, suppress broad-except pylint warning generate_pxe_mapping.yml: - Load network_spec.yml via include_vars - Set admin_subnet and ib_subnet using selectattr on Networks list - Pass both subnets as parameters to the generate_pxe_mapping module defaults/main.yml: - Add admin_subnet and ib_subnet default variables (empty string) provision_validation.py: - Comment out validate_admin_ips_against_network_spec function and its call site; ADMIN_IPs are now derived from subnet octets + BMC IP and will not necessarily fall within primary_oim_admin_ip/netmask_bits range * refactor: rename discovery directory to provision, update network_spec.yml - Renamed discovery/ to provision/ (git detected as rename, no content loss) - Updated input/network_spec.yml with latest network configuration changes * Update discovery.yml * refactor: unify OME credentials into get_config_credentials flow - Added ome_ip, ome_username, ome_password to omnia_credential.j2 template - Added 'discovery' service entry to omnia_credentials in update_config/vars/main.yml - Added 'discovery' to the hardcoded service key trigger list in fetch_credentials.yml - Replaced custom vault logic in get_ome_credentials.yml with unified decrypt_include_encrypt.yml call against omnia_config_credentials.yml - Updated ome_discovery/vars/main.yml to reference omnia_config_credentials_file and omnia_config_credentials_vault_key instead of the 
separate .vault/ paths - Deleted .vault/ome_credentials.yml and .vault/.vault_password (no longer needed) * chore: update copyright year from 2025 to 2026 in modified files Updated copyright header in all ome_discovery files modified during this feature branch: - library/generate_pxe_mapping.py - library/ome_server_inventory.py - tasks/generate_pxe_mapping.yml - tasks/get_ome_credentials.yml - defaults/main.yml - vars/main.yml * fix: restore discovery_validations role missed during discovery-to-provision rename discovery/roles/discovery_validations/ was accidentally dropped when renaming the discovery/ directory to provision/. Add it back under provision/roles/discovery_validations/ to resolve the PR merge conflict. * chore: update copyright year to 2026 in provision/roles/discovery_validations files * fix: remove duplicate discovery_validations role (provision_validations already exists) provision/roles/provision_validations/ is the correct renamed equivalent of discovery/roles/discovery_validations/. The discovery_validations copy added to provision/ was redundant. 
* feat: apply upstream telemetry upgrade changes from dell/omnia pub/q2_dev - Replace kubectl command with kubernetes.core.k8s module for iDRAC StatefulSet - Preserve existing replica count during iDRAC StatefulSet upgrade - Add LDMS store daemon check, restart, and readiness wait tasks * fix: quote build_stream_job_id_absent message in provision_validations vars * feat: add discovery/roles/discovery_validations and telemetry files - Add discovery/roles/discovery_validations/vars/main.yml with task definitions for validation flow - Add discovery/roles/telemetry/tasks/apply_telemetry_on_upgrade.yml with upstream telemetry upgrade logic (replica preservation + LDMS store) * fix: wrap long line in fetch_credentials.yml to satisfy yaml[line-length] lint * refactor: move ome_ip from credentials to discovery_config.yml - Create input/discovery_config.yml for non-credential discovery settings (ome_ip, future Magellan config) - Remove ome_ip from omnia_credential.j2 and credential update vars - Load ome_ip via include_vars from discovery_config.yml in get_ome_credentials.yml - Add discovery_config.yml to provision_validations discovery_inputs - Remove redundant ib_subnet/admin_subnet defaults from ome_discovery * fix: add newline at end of ome_discovery/defaults/main.yml * fix: override role_path to absolute path for decrypt_include_encrypt.yml role_path resolves to ome_discovery role path, causing encrypt_files_vars.yml to be looked up incorrectly. Override to playbook_dir dirname (/opt/omnia/omnia). * fix: inline credential loading to avoid role_path resolution issue role_path cannot be overridden in include_tasks vars. Replace the call to decrypt_include_encrypt.yml with direct include_vars using stat checks for encrypted vs unencrypted credential file handling. 
* fix: skip load-failure rule in ansible-lint to avoid CI false positives ansible-lint fails to resolve role_path relative paths during static analysis in GitHub Actions, causing false load-failure errors for files that exist and work at runtime. * Update ansible.cfg * Update ansible.cfg * refactor: rename discovery references to provision and add discovery_config variable - Rename discover_mapping_nodes.yml to provision_mapping_nodes.yml - Replace "discovery" terminology with "provision" across playbooks, vars, READMEs, and task names in provision roles - Add subnet as required field with IP pattern validation in network_spec schema - Define discovery_config variable in ome_discovery vars and use it in get_ome_credentials.yml (consistent with provision_config pattern) - Rename discovery_inputs to provision_inputs in validation vars - Rename discovery_mech_mapping to provision_mech_mapping - Update user-facing messages to reference provision.yml Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: credential rules, vault handling, GROUP_NAME validation, and discovery playbook improvements - Add ome_username and ome_password validation rules to credential_rules.json - Add 'discovery' tag to prepare_oim omnia_run_tags so OME credentials are prompted - Fix vault-encrypted credential loading in get_ome_credentials.yml (use decrypt-include-reencrypt pattern instead of unsupported vault_password_file) - Add include_input_dir.yml import to discovery.yml so input_project_dir is set - Accept SU1-SU100 (case-insensitive) in addition to grp0-grp100 for GROUP_NAME - Fix Magellan message to use list format (avoids \n in debug output) - Remove escaped quotes from discovery usage examples Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix: extend SU group name support to build_image validation and 
schemas - Add build_aarch_image tag to input_file_inventory so build_image_aarch64.yml runs provision_config validation (was missing, causing no validation to run for aarch64 builds) - Update GROUP_NAME patterns in functional_groups_config.json and omnia_config.json schemas to accept SU1-SU100 format alongside grp0-grp100 - Update INVALID_GROUP_NAME_MSG to reflect both accepted formats Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Cleanup discovery roles: move library modules, remove unused roles (#4261) * Cleanup discovery roles: move library modules, remove unused roles - Move ome_server_inventory.py and generate_pxe_mapping.py from discovery/roles/ome_discovery/library/ to common/library/modules/ so they are shared via the common module search path already configured in discovery/ansible.cfg - Remove unused discovery/roles/telemetry/ directory - Remove unused discovery/roles/discovery_validations/ directory - Load discovery_config.yml at playbook level in discovery.yml (consistent with how build_stream_config.yml is loaded in provision.yml) - Fix discovery_complete_msg formatting for readable Ansible output Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Remove unused discovery_validations role Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --------- Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix for set_pxe_boot.yml when custom inventory given (#4260) * Update generate_bmc_inventory.yml Signed-off-by: 
SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * Update pre_checks.yml Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * lint issue Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> * resolving merge conflict * revert openchami commit id * resolving review comments * addressing review comments * fix for vmagent scraping powerscale metrics * cleanup script correction for powerscale telemetry cleanup * victoria operator and victoria log input validation * victoria log input and input validation * removing L2 validation for victoria log which is not required * input validation and review comment addressing * change idrac_telemetry_collection_type to telemetry_collection_type * Remove invisible Unicode LRM (U+200E) characters from victoria-operator template filenames * VictoriaLogs container image references and default variable * port check * resolve merge conflict * correction for schema * Update telemetry_config.json * Update validate_input.py * merge conflict telemetry_prereq.yml * change victoria_configurations to victoria_metrics_configurations * remove deployment mode input variable * update for upgrade scenarios * update comments * update comment * resolving issues due to merge conflict * victoria log changes * victoria log cluster component and VLAgent deployment * updating pod name * removing the changes of adding cert * victoria log 
changes * remivng victoria log pod calidation playbook * cleanup changes for victoria log * Update ansible-lint.yml and pylint for pub/telemetry (#4296) * Update ansible-lint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * Update pylint.yml Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> * fixing ansible-lint * lint * line-lenght --------- Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> --------- Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: mithileshreddy04 <mithilesh.reddy@dell.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> * IB nic ip assignment * update MinIO and registry images to fixed tagged versions, omnia core container tag and version to 2.2 and v2.2.0.0 (#4309) * Minimal OS-only functional group enablement for x86_64 and aarch64 * Update image_package_collector.py * Update provision_validation.py * Minimal OS functional group updates in provision * Minimal OS functional group upgrade * Fix os_* package cross-contamination and remove stale discovery templates * OpenCHAMI upgrade changes * Update openchami container tags * Update main.yml * Update main.yml * Update main.yml * Update omnia version and core tag * vast client installation * single template way * fix(OMN01D-2164): prompt OME credentials only when 
enable_bmc_discovery is true - Add enable_bmc_discovery flag (default: false) to discovery_config.yml - Load discovery_config.yml in prepare_oim.yml and set ome_discovery_enabled based on enable_bmc_discovery flag - Change discovery credentials from mandatory to conditional_mandatory gated on ome_discovery_enabled - When enable_bmc_discovery is false, OME username/password prompts are skipped during prepare_oim even if ome_ip is pre-filled * fix(OMN01D-2168): fail explicitly when discovery_mechanism is not provided Replace meta: end_play with ansible.builtin.fail so the playbook exits with non-zero status and a clear error message when discovery_mechanism is missing, instead of silently succeeding. * fix(OMN01D-2169): add L1 input validation for discovery.yml - Create discovery_config.json schema with: - enable_bmc_discovery (boolean, required) - ome_ip (string, required; must be valid IPv4 when enable_bmc_discovery is true) - Register discovery_config in config.py files dict and input_file_inventory - Add 'discovery' tag and invoke validate_config.yml in discovery.yml before role execution (consistent with provision.yml pattern) - Add explicit ome_ip check before OME role inclusion for clear fail-fast error when ome_ip is empty with discovery_mechanism=ome * fix(OMN01D-2225): improve OME authentication and reachability error messages - ome_server_inventory.py: auth failure now tells user to verify ome_username/ome_password in omnia_config_credentials.yml and rerun - collect_inventory.yml: wrap wait_for in block/rescue so timeout gives actionable message pointing to discovery_config.yml and network check * fix(OMN01D-2226): correct discovery completion message next steps - Replace misleading 'Rename or copy' instruction with guidance to update pxe_mapping_file_path in provision_config.yml - Show full absolute path of generated file throughout - Add spacing between steps for readability * fix(OMN01D-2227): escape backslash in docstring to suppress SyntaxWarning 
Python 3.12+ warns about invalid escape sequence '\d' in non-raw string literals. The docstring in extract_su_from_hostname() contained (?=R\d+) which triggered SyntaxWarning during discovery execution. Escaped the backslash to (?=R\\d+) in the docstring. * fix(OMN01D-2230): correct GROUP_NAME and PARENT_SERVICE_TAG in PXE mapping Issue 1 - GROUP_NAME: - Add fallback: try extracting SU pattern from OME group name when BMC hostname has no SU pattern (covers hierarchical OME groups like SU1_slurm_node, SU2_compute, etc.) - grp0 remains the correct default for single-cluster environments Issue 2 - PARENT_SERVICE_TAG: - Define CHILD_ROLES_OF_CONTROL_PLANE set (service_kube_node_x86_64) - Only assign PARENT_SERVICE_TAG to rows whose FUNCTIONAL_GROUP_NAME is a child role of the control plane within the same GROUP_NAME - Control plane nodes, slurm nodes, login nodes, etc. no longer get an incorrect PARENT_SERVICE_TAG * fix(OMN01D-2231): detect and fail on duplicate OME static group assignments - build_device_group_map() now tracks all group memberships per device and returns a conflicts dict for devices in multiple static groups - main() fails with an actionable error listing each conflicting device and its groups, instead of silently using the first-seen group - Prevents incorrect FUNCTIONAL_GROUP_NAME override in PXE mapping * fix(OMN01D-2232): validate OME group names against supported functional groups - Define SUPPORTED_FUNCTIONAL_GROUPS set matching Omnia's known roles - Skip servers whose OME static group is not in the supported set - Emit a warning per skipped device listing the unsupported group name and the full set of supported groups - Unsupported groups (e.g. 
'abc') no longer appear in the PXE mapping * revert: temporarily revert discovery fixes for PR workflow * s3cmd configurations * update image tags * fix: Resolve FK constraint violation and catalog metadata persistence issues - Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller - Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role - Update cleanup_job.py to split semicolon-delimited paths and delete each individually - Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths - Add logging to FileArtifactStore.store() to debug file write failures - Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly) - Update build_stream container image tag to 1.1 Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully * Fix lint error * Modify the charlimit to 512 in DB * Fix the image_groups name and stages in summary * Updating Catalog changes aligning to recent updates * Fix the pylint issues * Catalog Updates to support powerscale drivers --------- Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Signed-off-by: pullan1 <sudha.pullalaravu@dell.com> Signed-off-by: Sujit Jadhav <sujit_jadhav@dell.com> Signed-off-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com> Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com> Co-authored-by: mcas <sakshi.s@dell.com> Co-authored-by: Super User <root@oim.omnia.test> Co-authored-by: snarthan <narthan.s@dell.com> Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com> Co-authored-by: balajikumaran-c-s <balajikumaran.cs@dellteam.com> Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com> Co-authored-by: Abhishek S A 
<abhishek.sa3@dell.com> Co-authored-by: pullan1 <sudha.pullalaravu@dell.com> Co-authored-by: Sujit Jadhav <sujit_jadhav@dell.com> Co-authored-by: Super User <root@testbed.omnia.test> Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com> Co-authored-by: Kratika_Patidar <Kratika.Patidar@dell.com> Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com> Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
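For readers following the OMN01D-2227 and OMN01D-2230 entries in the commit message, here is a minimal sketch of the GROUP_NAME resolution order they describe: SU pattern from the BMC hostname first, then from the OME group name, then grp0. The hostname pattern is the one quoted in the commit message. The group-name pattern, the function name `resolve_group_name`, and the sample hostnames are assumptions made for illustration and are not taken from the Omnia code.

```python
import re

# Hostname pattern as quoted in the commit message. The raw-string prefix keeps
# "\d" from being treated as an invalid escape sequence (SyntaxWarning on Python 3.12+).
SU_IN_HOSTNAME = re.compile(r"(SU[A-Z]?\d+)(?=R\d+)")

# ASSUMPTION: a leading SU token followed by "_" or end of string, chosen so that
# group names like "SU1_slurm_node" match. The real module may use a different pattern.
SU_IN_GROUP_NAME = re.compile(r"^(SU[A-Z]?\d+)(?=_|$)")

DEFAULT_GROUP = "grp0"


def resolve_group_name(bmc_hostname: str, ome_group_name: str = "") -> str:
    """Hypothetical helper: hostname SU first, then OME group name SU, else grp0."""
    match = SU_IN_HOSTNAME.search(bmc_hostname or "")
    if match:
        return match.group(1)
    match = SU_IN_GROUP_NAME.match(ome_group_name or "")
    if match:
        return match.group(1)
    return DEFAULT_GROUP
```

With these assumed inputs, a service-tag-only hostname such as `idrac-JCGT033` yields no hostname match, so the group name decides the result; when neither source carries an SU token, the helper falls back to `grp0`, which the commit message calls the correct default for single-cluster environments.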