
feat: DNS-based hostname resolution for Slurm/MPI via coresmd #4471

Merged
abhishek-sa1 merged 6 commits into pub/q2_dev from pr-4353-rebase-q2dev on May 20, 2026

@sujit-jadhav
Collaborator

Summary

Enables DNS-based hostname resolution for cluster compute nodes using the existing coresmd (CoreDNS + OpenCHAMI SMD plugin), replacing static /etc/hosts management.

What this PR does

DNS toggle via dns_config.yml

  • New input file input/dns_config.yml with dns_enabled (default: false) and dns_domain
  • JSON schema validation + validate_dns_config() for RFC 1035 domain checks
  • 13 unit tests for domain validation
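
For readers unfamiliar with the RFC 1035 rules, the kind of check described above can be sketched in Python as follows. This is an illustration only, not the PR's `validate_dns_config()`; the function name `is_valid_dns_domain` and the 253-character text limit are assumptions made for the example.

```python
import re

# One DNS label in the RFC 1035 "preferred name syntax": 1-63 characters,
# starts with a letter, ends with a letter or digit, hyphens only inside.
# (RFC 1123 later relaxed the first character to allow digits; this sketch
# keeps the stricter RFC 1035 form.)
_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_dns_domain(domain):
    """Return True if `domain` is a syntactically valid RFC 1035 name.

    Hypothetical helper for illustration; not the function from the PR.
    """
    if not isinstance(domain, str) or not domain:
        return False
    # Tolerate a single trailing dot (fully qualified form).
    name = domain[:-1] if domain.endswith(".") else domain
    # 253 characters is the usual limit for the dotted text form.
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match; an empty label ("a..b") fails.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A strict check like this rejects names such as `-bad.example.com` or `node..example.com` before they reach any resolver configuration.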

Cloud-init resolver configuration

  • When dns_enabled: true, all 8 cloud-init templates write /etc/resolv.conf pointing to OIM coresmd
  • When dns_enabled: false (default), traditional /etc/hosts behavior is preserved
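
The branch between the two behaviors can be pictured with a small Python sketch. The real logic lives in the cloud-init templates, so everything here is an assumption for illustration: the function names, the `search`/`nameserver` layout, and the shape of `ip_name_map` (IP address to short hostname).

```python
def render_resolv_conf(oim_ip, domain):
    """Build resolv.conf content that points a node at the OIM resolver."""
    return f"search {domain}\nnameserver {oim_ip}\n"


def render_hosts_entries(ip_name_map, domain):
    """Build /etc/hosts lines (IP, FQDN, short name) for the legacy path."""
    lines = [
        f"{ip} {name}.{domain} {name}"
        for ip, name in sorted(ip_name_map.items())  # sorted as strings
    ]
    return "\n".join(lines) + "\n"


def resolver_files(dns_enabled, oim_ip, domain, ip_name_map):
    """Return {path: content} for the file a node would receive.

    Hypothetical helper: with DNS enabled only resolv.conf is written,
    otherwise only the /etc/hosts entries are produced.
    """
    if dns_enabled:
        return {"/etc/resolv.conf": render_resolv_conf(oim_ip, domain)}
    return {"/etc/hosts": render_hosts_entries(ip_name_map, domain)}
```

Keeping the two paths mutually exclusive mirrors the opt-in design: a cluster that leaves `dns_enabled` at its default sees no change in what is written to the node.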

/etc/hosts skip logic

  • OIM update_hosts.yml — skips when DNS enabled
  • Slurm update_hosts_munge.yml — skips when DNS enabled
  • Node additions and removals are picked up automatically via SMD -> coresmd (30s cache refresh)

K8s integration

  • Patches K8s CoreDNS ConfigMap to forward dns_domain queries to OIM coresmd
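
As a rough picture of what such a patch amounts to, the Python sketch below appends a forwarding server block to a Corefile string. It is not the PR's Ansible task. The helper name `add_forward_block`, the `errors` and `cache 30` lines, and the sample addresses are assumptions; only the `forward . <ip>` directive reflects the behavior described above.

```python
def add_forward_block(corefile, dns_domain, oim_ip):
    """Append a server block forwarding `dns_domain` queries to `oim_ip`.

    Hypothetical helper for illustration. The operation is idempotent:
    if a block for the domain already exists, the input is returned as-is.
    """
    header = f"{dns_domain}:53 {{"
    if header in corefile:
        return corefile
    block = (
        f"{header}\n"
        "    errors\n"
        "    cache 30\n"          # assumed cache setting, not from the PR
        f"    forward . {oim_ip}\n"
        "}\n"
    )
    # Make sure the new block starts on its own line.
    sep = "" if not corefile or corefile.endswith("\n") else "\n"
    return corefile + sep + block
```

Making the patch idempotent matters because the playbook may run more than once against the same ConfigMap.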

Observability

  • coresmd exposes Prometheus metrics on port 9153
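
To inspect those metrics without a full Prometheus setup, a scrape can be parsed with a few lines of Python. This is a simplified sketch: it assumes the standard Prometheus text exposition format with no trailing timestamps, and the function name `parse_prometheus_text` is made up for the example.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {series: float}.

    Simplified sketch: comment lines (# HELP / # TYPE) are skipped and
    each sample line is assumed to end with its value (no timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

The input text could come from something like `urllib.request.urlopen("http://<oim-ip>:9153/metrics")`; the `/metrics` path is the usual CoreDNS default and is assumed here, not confirmed by this PR.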

Bug fix

  • Create coredhcp template directory before copy (does not exist in upstream deployment-recipes repo)

What was removed (refactored)

  • Standalone CoreDNS container (docker.io/coredns/coredns:1.12.1) — redundant with coresmd
  • Static zone file generation (generate_dns_zones.yml, forward_zone.j2, reverse_zone.j2, etc.)
  • Unused dns_config params (dns_ttl, dns_cache_ttl, dns_soa, dns_fabric_suffixes, dns_reverse_enabled)

Files changed

  • 19 files modified (+406/-17 lines net after refactoring)

Testing

  • python3 -m pytest common/library/modules/tests/test_dns_config_validation.py — 13 tests pass
  • Manual validation of YAML/JSON syntax for all modified files

Known limitations (future work)

  • No split-horizon/fabric-aware resolution (IB vs Ethernet) — coresmd returns admin IP only
  • No reverse DNS (PTR) records
  • dns_domain and cluster_domain should be linked (currently independent)

Comment thread input/dns_config.yml Outdated
Comment thread prepare_oim/roles/deploy_containers/openchami/tasks/deploy_openchami.yml Outdated
sujit-jadhav requested a review from abhishek-sa1 on May 19, 2026 08:59
sujit-jadhav and others added 5 commits May 19, 2026 14:37
Add optional additional_subnets configuration under admin_network in
network_spec.yml to support multi-RAC / multi-subnet PXE deployments
with CoreDHCP relay (giaddr-based routing).

Changes:
- network_spec.yml: add additional_subnets field with documentation
- network_spec.json: JSON schema validation for subnet entries
- en_us_validation_msg.py: error messages for subnet validation
- provision_validation.py: validate CIDRs, routers, ranges, overlaps
- configs.yaml.j2: emit coredhcp_subnets/coredhcp_subnet_pools vars
- coredhcp.yaml.j2: dual-mode template (positional args for v0.4.x,
  key=value format with subnet=/subnet_pool= for multi-subnet)
- deploy_openchami.yml: overlay coredhcp template after clone
- vars/main.yml: add template path variables
- test_additional_subnets_validation.py: 17 unit tests

Single-subnet (flat) deployments continue to use the original
positional-argument config format compatible with coresmd v0.4.x.
Multi-subnet requires coresmd with multi-subnet support (PR #61).
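
The checks listed for provision_validation.py (CIDRs, routers, ranges, overlaps) can be approximated with Python's standard `ipaddress` module. The sketch below is not the code from this commit; the function name and the entry keys `subnet`, `router`, `range_start` and `range_end` are hypothetical and chosen only for illustration.

```python
import ipaddress


def validate_additional_subnets(admin_cidr, subnets):
    """Return a list of error strings; an empty list means all checks passed.

    Hypothetical sketch. `admin_cidr` is assumed to be valid already; an
    invalid value raises ValueError from the ipaddress module.
    """
    errors = []
    seen = [("admin_network", ipaddress.ip_network(admin_cidr, strict=True))]
    for idx, entry in enumerate(subnets):
        label = f"additional_subnets[{idx}]"
        try:
            # strict=True rejects CIDRs with host bits set, e.g. 10.0.1.5/24.
            net = ipaddress.ip_network(entry["subnet"], strict=True)
            router = ipaddress.ip_address(entry["router"])
            start = ipaddress.ip_address(entry["range_start"])
            end = ipaddress.ip_address(entry["range_end"])
        except (KeyError, ValueError) as exc:
            errors.append(f"{label}: {exc}")
            continue
        if router not in net:
            errors.append(f"{label}: router {router} is outside {net}")
        # The order comparison only runs once both ends are inside the subnet.
        if start not in net or end not in net or start > end:
            errors.append(f"{label}: invalid range {start}-{end} for {net}")
        for other_label, other in seen:
            if net.overlaps(other):
                errors.append(f"{label}: {net} overlaps {other_label} ({other})")
        seen.append((label, net))
    return errors
```

Collecting every error in one pass, rather than failing on the first, lets a user fix a whole network_spec.yml in a single edit.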

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
Implement CoreDNS as the authoritative DNS server for cluster-internal
hostname resolution, replacing /etc/hosts-based management.

New input configuration:
- input/dns_config.yml: dns_enabled, dns_domain, dns_ttl, dns_cache_ttl,
  dns_fabric_suffixes, dns_soa, dns_reverse_enabled

Validation:
- JSON schema (dns_config.json) and validation logic (validate_dns_config)
- RFC 1035 domain validation, TTL range checks, SOA positive-int checks,
  fabric suffix format validation, reserved domain detection
- 33 unit tests covering all validation paths

CoreDNS deployment (OIM):
- Corefile.j2 template: file plugin for forward/reverse zones, cache,
  reload (10s), forward to upstream DNS
- Systemd quadlet (coredns.container.j2) for podman-managed container
- deploy_coredns.yml task: image pull, config generation, service start

DNS zone rendering pipeline:
- forward_zone.j2: SOA + NS + A records from ip_name_map
- reverse_zone.j2: SOA + NS + PTR records
- generate_dns_zones.yml: reads SMD inventory, renders zones
- generate_reverse_zone_additional.yml: per-additional-subnet reverse zones
- update_dns_zones.yml: lifecycle hook for node add/remove

Cloud-init templates (7 files):
- Conditional: resolv.conf pointing to OIM CoreDNS when dns_enabled,
  otherwise legacy /etc/hosts append

Slurm /etc/hosts management:
- update_hosts_munge.yml: skip /etc/hosts edits when dns_enabled
- update_hosts.yml: skip bulk /etc/hosts updates when dns_enabled

K8s CoreDNS integration:
- Forward dns_domain queries to OIM CoreDNS from K8s CoreDNS ConfigMap

Multi-subnet DHCP compatibility (PR #4352):
- Reverse zones generated for admin + additional subnets
- All variable names compatible with multi-subnet PR

Backward compatible: dns_enabled defaults to false, preserving existing
/etc/hosts behavior for users who do not opt in.

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
…ides DNS

Removed redundant CoreDNS container (docker.io/coredns/coredns:1.12.1)
and static zone file generation. The existing coresmd plugin in
OpenCHAMI already provides dynamic DNS from SMD inventory.

Removed:
- deploy_coredns.yml, Corefile.j2, coredns.container.j2
- generate_dns_zones.yml, generate_reverse_zone_additional.yml, update_dns_zones.yml
- forward_zone.j2, reverse_zone.j2
- coredns_image, coredns_config_dir, coredns_zone_dir vars

Kept:
- DNS validation (dns_config.yml, schema, tests, validate_dns_config)
- Cloud-init resolv.conf conditional (points nodes to OIM coresmd)
- Slurm/MPI /etc/hosts skip when dns_enabled

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
Removed unused parameters (dns_ttl, dns_cache_ttl, dns_reverse_enabled,
dns_fabric_suffixes, dns_soa) that were designed for the static zone
approach. With coresmd, DNS records are dynamic from SMD and these
params are no-ops.

Simplified: dns_config.yml, dns_config.json schema, validate_dns_config(),
error messages, and test suite (33 tests -> 13 focused domain tests).

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
The coredhcp/ directory does not exist in the upstream deployment-recipes
repo. Added file state=directory tasks before each copy to create the
parent directory on both localhost and OIM.

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
sujit-jadhav force-pushed the pr-4353-rebase-q2dev branch 2 times, most recently from fa5d5fd to b07f4dc on May 19, 2026 09:45
sujit-jadhav requested a review from snarthan on May 19, 2026 09:50
  connection: local

- name: Ensure coredhcp template directory exists on OIM
  ansible.builtin.file:

The two tasks below are duplicates and are not required.


- name: Ensure coredhcp template directory exists
  ansible.builtin.file:
    path: "{{ openchami_coredhcp_target | dirname }}"

There are now two coredhcp templates. Remove one and keep a single task for this.

sujit-jadhav force-pushed the pr-4353-rebase-q2dev branch from b07f4dc to e6674f2 on May 19, 2026 10:35
…fig.yml

Address review comments:
1. Removed dns_domain — domain comes from OIM metadata (domain_name).
   Cloud-init templates now use domain_name directly.
2. Moved coredhcp template tasks from deploy_openchami.yml to
   configs/ochami.yml where other config template drops live.
3. Eliminated dns_config.yml entirely — dns_enabled is now a simple
   boolean in provision_config.yml (already loaded by Ansible).
4. Removed dns_config.json schema, dns_config entry from config.py,
   validation file list, and test_dns_config_validation.py.
5. Simplified validate_dns_config() to no-op (schema handles type check).
6. Removed DNS domain error messages from en_us_validation_msg.py.

Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
sujit-jadhav force-pushed the pr-4353-rebase-q2dev branch from e6674f2 to 0aaa6bf on May 19, 2026 10:44
sujit-jadhav requested a review from abhishek-sa1 on May 19, 2026 10:59
@sujit-jadhav
Collaborator Author

prepare_config.yml, local_repo.yml, build image, and provision are all working fine.

After provisioning, /etc/resolv.conf is populated correctly. The PR is safe to merge.

sujit-jadhav requested a review from priti-parate on May 20, 2026 05:28
abhishek-sa1 merged commit 21cffe6 into pub/q2_dev on May 20, 2026
6 checks passed
abhishek-sa1 added a commit that referenced this pull request May 20, 2026
* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2

Signed-off-by: snarthan <narthan.s@dell.com>

* Update create_k8s_config_nfs.yml

helm directory created based on the version

Signed-off-by: snarthan <narthan.s@dell.com>

* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2

Signed-off-by: snarthan <narthan.s@dell.com>

* Update ci-group-service_kube_control_plane_x86_64.yaml.j2

Signed-off-by: snarthan <narthan.s@dell.com>

* Update software_config.json

Signed-off-by: snarthan <narthan.s@dell.com>

* Update template_rhel_10.0_multi_arch_software_config.json

Signed-off-by: snarthan <narthan.s@dell.com>

* Update template_rhel_10.0_x86-64_software_config.json

Signed-off-by: snarthan <narthan.s@dell.com>

* Update rhel_software_config.json

Signed-off-by: snarthan <narthan.s@dell.com>

* Vector-ome log support changes (#4483)

* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

* LDMS-Vector deployment (#4330)

* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update main.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update telemetry.sh.j2

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* conflict resolve

* conflict fix

* vector ome metrics changes

* Vector-OME deployment (#4394)

* conflict resolve

* conflict fix

* vector ome metrics changes

* Update vmagent-scrape-config.yaml.j2

* Replication factor update

update replicationFactor for vmstorage

* pod anti affinity changes for vlagent and vmagent

* vector-ome log support changes

* remover log readable fix

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>

* Merge pull request #4471 from dell/pr-4353-rebase-q2dev

feat: DNS-based hostname resolution for Slurm/MPI via coresmd

* Fix/validate reporting enhancement (#4481)

* Add result_detail field to stage response and enhance molecule test reporting

- Add result_detail JSONB field to GetStageResponse schema for detailed stage results including log_path, test_summary, and artifact_dir
- Update result_poller to populate result_detail with log_path (molecule_output.log) and report_path (test_report.json)
- Extract report_id and suite names from molecule_output.log using regex parsing
- Filter and copy current test run from shared test_report.json to artifact directory based on report_id
- Populate test_summary with scenario, molecule_command, duration_seconds, test_names, and report_id from
- Refactor test_summary structure in molecule execution: extract suite from header, reorganize test data with status, and reorder fields
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>

* Update validate stage to display raw test summary JSON instead of formatted fields

- Remove junit artifact report from deploy stage
- Simplify test summary display in validate stage to show raw JSON output using jq instead of parsing individual fields (total, passed, failed, skipped, errors)

Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>

* Clear log_file_path on stage retry and reorder test_summary fields in molecule execution

- Clear log_file_path field in validate use case when retrying a stage to prevent stale data from previous attempts
- Reorder test_summary population in molecule execution: move report_id and duration_seconds before tests array, move summary counts (total, passed, failed, skipped, errors) to end after tests

Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>

---------

Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>

* K8S Upgrade from 1.34.1 to 1.35.1 (#4458) (#4485)

* updated code for helm

* add code for upgrade_k8s.yml and upgrade.yml

* add custom module for k8s upgrade status

* add code for lock, workers upgrade

* add code upgrade

* add code for hopchain

* add upgrade code for control planes

* add code for each steps involved in upgrade

* add upgrade code

* fixed lint issues

* fix lint issues and code changes

* remove unused task files

* fix lint issues

* fix lint issues

* fix lint issues

* fix lint issues

* fix lint issues

* fix lint issues

* fix lint issues

* fix lint issues

* update code changes for helm

* fixes after integrating local repo and build image

* Update registry in calico images in service_k8s_v1.35.1.json

* Resolving merge conflicts

* update upgrade_k8s.yml

* fix lint issues

* adding bss and cloudinit changes

* remove service_k8s_v1.34.1

* lint issue fix

* lint issue fix

* remove input project dir from build image

* update component dependencies for k8s

* update setting hostname

---------

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>

---------

Signed-off-by: snarthan <narthan.s@dell.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Nagachandan-P <Nagachandan.p@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <sowjanya.jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
abhishek-sa1 added a commit that referenced this pull request May 20, 2026
* K8S Upgrade from 1.34.1 to 1.35.1 (#4458)

* updated code for helm

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* add code for upgrade_k8s.yml and upgrade.yml

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* add custom module for k8s upgrade status

Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>

* add code for lock, workers upgrade

Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>

* add code upgrade

Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>

* add code for hopchain

Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>

* add upgrade code for control planes

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* add code for each steps involved in upgrade

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* add upgrade code

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fixed lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues and code changes

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* remove unused task files

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fix lint issues

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* update code changes for helm

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* fixes after integrating local repo and build image

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* Update registry in calico images in service_k8s_v1.35.1.json

Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>

* Resolving merge conflicts

Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>

* update upgrade_k8s.yml

Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>

* fix lint issues

Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>

* adding bss and cloudinit changes

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* remove service_k8s_v1.34.1

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* lint issue fix

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* lint issue fix

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* remove input project dir from build image

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* update component dependencies for k8s

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

* update setting hostname

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>

---------

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>

* Feature branch sync - pub/q2_dev to pub/q2_upgrade (#4487)

---------

Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Signed-off-by: snarthan <narthan.s@dell.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Nagachandan-P <Nagachandan.p@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <sowjanya.jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>