feat: DNS-based hostname resolution for Slurm/MPI via coresmd #4471
Merged
Add optional additional_subnets configuration under admin_network in network_spec.yml to support multi-RAC / multi-subnet PXE deployments with CoreDHCP relay (giaddr-based routing).
Changes:
- network_spec.yml: add additional_subnets field with documentation
- network_spec.json: JSON schema validation for subnet entries
- en_us_validation_msg.py: error messages for subnet validation
- provision_validation.py: validate CIDRs, routers, ranges, overlaps
- configs.yaml.j2: emit coredhcp_subnets/coredhcp_subnet_pools vars
- coredhcp.yaml.j2: dual-mode template (positional args for v0.4.x, key=value format with subnet=/subnet_pool= for multi-subnet)
- deploy_openchami.yml: overlay coredhcp template after clone
- vars/main.yml: add template path variables
- test_additional_subnets_validation.py: 17 unit tests
Single-subnet (flat) deployments continue to use the original positional-argument config format compatible with coresmd v0.4.x. Multi-subnet requires coresmd with multi-subnet support (PR #61).
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
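Python's standard ipaddress module can approximate the subnet checks listed for provision_validation.py (CIDRs, routers, ranges, overlaps). This is a minimal sketch under stated assumptions, not the code from this PR. The entry keys subnet, router, range_start and range_end are hypothetical stand-ins for whatever fields additional_subnets actually defines in network_spec.yml.

```python
import ipaddress
from itertools import combinations


def validate_additional_subnets(subnets):
    """Return a list of error strings for hypothetical additional_subnets entries.

    Each entry is assumed to be a dict with 'subnet' (CIDR), 'router',
    'range_start' and 'range_end' keys; these key names are illustrative.
    """
    errors = []
    networks = []
    for idx, entry in enumerate(subnets):
        # strict=True rejects CIDRs with host bits set, e.g. 10.1.0.5/24.
        try:
            net = ipaddress.ip_network(entry["subnet"], strict=True)
        except (KeyError, ValueError) as exc:
            errors.append(f"entry {idx}: invalid subnet CIDR ({exc})")
            continue
        networks.append((idx, net))

        # The subnet's router must be an address inside that subnet.
        try:
            router = ipaddress.ip_address(entry["router"])
        except (KeyError, ValueError) as exc:
            errors.append(f"entry {idx}: invalid router ({exc})")
        else:
            if router not in net:
                errors.append(f"entry {idx}: router {router} not in {net}")

        # The DHCP range must lie inside the subnet and be ordered start <= end.
        try:
            start = ipaddress.ip_address(entry["range_start"])
            end = ipaddress.ip_address(entry["range_end"])
        except (KeyError, ValueError) as exc:
            errors.append(f"entry {idx}: invalid range ({exc})")
            continue
        if start not in net or end not in net:
            errors.append(f"entry {idx}: range {start}-{end} outside {net}")
        elif start > end:
            errors.append(f"entry {idx}: range start {start} after end {end}")

    # Pairwise overlap check across every subnet that parsed successfully.
    for (i, a), (j, b) in combinations(networks, 2):
        if a.overlaps(b):
            errors.append(f"entries {i} and {j}: {a} overlaps {b}")
    return errors
```

The sketch collects every problem instead of stopping at the first one, so a single run reports all misconfigured entries. Whether the PR's validator behaves the same way is not shown.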
Implement CoreDNS as the authoritative DNS server for cluster-internal hostname resolution, replacing /etc/hosts-based management.
New input configuration:
- input/dns_config.yml: dns_enabled, dns_domain, dns_ttl, dns_cache_ttl, dns_fabric_suffixes, dns_soa, dns_reverse_enabled
Validation:
- JSON schema (dns_config.json) and validation logic (validate_dns_config)
- RFC 1035 domain validation, TTL range checks, SOA positive-int checks, fabric suffix format validation, reserved domain detection
- 33 unit tests covering all validation paths
CoreDNS deployment (OIM):
- Corefile.j2 template: file plugin for forward/reverse zones, cache, reload (10s), forward to upstream DNS
- Systemd quadlet (coredns.container.j2) for podman-managed container
- deploy_coredns.yml task: image pull, config generation, service start
DNS zone rendering pipeline:
- forward_zone.j2: SOA + NS + A records from ip_name_map
- reverse_zone.j2: SOA + NS + PTR records
- generate_dns_zones.yml: reads SMD inventory, renders zones
- generate_reverse_zone_additional.yml: per-additional-subnet reverse zones
- update_dns_zones.yml: lifecycle hook for node add/remove
Cloud-init templates (7 files):
- Conditional: resolv.conf pointing to OIM CoreDNS when dns_enabled, otherwise legacy /etc/hosts append
Slurm /etc/hosts management:
- update_hosts_munge.yml: skip /etc/hosts edits when dns_enabled
- update_hosts.yml: skip bulk /etc/hosts updates when dns_enabled
K8s CoreDNS integration:
- Forward dns_domain queries to OIM CoreDNS from K8s CoreDNS ConfigMap
Multi-subnet DHCP compatibility (PR #4352):
- Reverse zones generated for admin + additional subnets
- All variable names compatible with multi-subnet PR
Backward compatible: dns_enabled defaults to false, preserving existing /etc/hosts behavior for users who do not opt in.
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
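In practice, the K8s CoreDNS integration adds a server block for the cluster domain to the Corefile held in the K8s CoreDNS ConfigMap, so that pods send those queries to the OIM. Below is a minimal sketch of such a block, rendered from Python. errors, cache and forward are standard CoreDNS plugins. The function name, domain and OIM address are hypothetical, and the sketch does not reproduce the PR's actual ConfigMap change.

```python
import ipaddress


def render_forward_block(domain: str, oim_ip: str, cache_seconds: int = 30) -> str:
    """Render a CoreDNS server block that forwards one zone to the OIM.

    The block would be appended to the Corefile stored in the K8s CoreDNS
    ConfigMap; errors, cache and forward are standard CoreDNS plugins.
    """
    ipaddress.ip_address(oim_ip)  # raises ValueError for a malformed address
    zone = domain.rstrip(".")  # accept "hpc.lab." as well as "hpc.lab"
    return (
        f"{zone}:53 {{\n"
        "    errors\n"
        f"    cache {cache_seconds}\n"
        f"    forward . {oim_ip}\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical values: cluster domain hpc.lab, OIM at 172.16.0.254.
    print(render_forward_block("hpc.lab", "172.16.0.254"))
```

Scoping the block to the cluster zone leaves the default "." server block, and therefore upstream resolution for everything else, untouched.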
…ides DNS
Removed redundant CoreDNS container (docker.io/coredns/coredns:1.12.1) and static zone file generation. The existing coresmd plugin in OpenCHAMI already provides dynamic DNS from SMD inventory.
Removed:
- deploy_coredns.yml, Corefile.j2, coredns.container.j2
- generate_dns_zones.yml, generate_reverse_zone_additional.yml, update_dns_zones.yml
- forward_zone.j2, reverse_zone.j2
- coredns_image, coredns_config_dir, coredns_zone_dir vars
Kept:
- DNS validation (dns_config.yml, schema, tests, validate_dns_config)
- Cloud-init resolv.conf conditional (points nodes to OIM coresmd)
- Slurm/MPI /etc/hosts skip when dns_enabled
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
Removed unused parameters (dns_ttl, dns_cache_ttl, dns_reverse_enabled, dns_fabric_suffixes, dns_soa) that were designed for the static zone approach. With coresmd, DNS records are generated dynamically from SMD, so these params are no-ops.
Simplified: dns_config.yml, dns_config.json schema, validate_dns_config(), error messages, and test suite (33 tests -> 13 focused domain tests).
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
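The domain checks that survive this simplification (RFC 1035 syntax plus reserved-domain detection, per the earlier commit) can be sketched as follows. This is an illustrative sketch, not validate_dns_config() itself. It applies the RFC 1035 preferred-name syntax, in which labels start with a letter. The reserved list, built from RFC 6761 special-use names plus local, is an assumption, since the commits do not say which domains are treated as reserved.

```python
import re

# RFC 1035 preferred name syntax for one label: a letter first, letters,
# digits or hyphens inside, a letter or digit last, 63 characters max.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Assumed reserved top-level names (RFC 6761 special-use names plus "local"
# from RFC 6762); the PR does not say which domains it rejects.
RESERVED_TLDS = {"localhost", "local", "test", "invalid", "example"}


def validate_domain(domain: str) -> list[str]:
    """Return a list of problems with a cluster DNS domain (empty if valid)."""
    name = domain.rstrip(".")  # tolerate a trailing root dot
    if not name:
        return ["domain must not be empty"]
    errors = []
    if len(name) > 253:
        errors.append("domain exceeds 253 characters")
    labels = name.split(".")
    for label in labels:
        # fullmatch avoids the trailing-newline leniency of "$".
        if not LABEL_RE.fullmatch(label):
            errors.append(f"invalid label: {label!r}")
    if labels[-1].lower() in RESERVED_TLDS:
        errors.append(f"reserved top-level name: {labels[-1]}")
    return errors
```

RFC 1123 later relaxed the leading-letter rule to allow digits. A validator that should accept names like 1node.lab would loosen the first character class accordingly.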
The coredhcp/ directory does not exist in the upstream deployment-recipes repo. Added ansible.builtin.file tasks (state=directory) before each copy to create the parent directory on both localhost and OIM.
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
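The fix follows a common idempotent pattern: create the destination's parent directory before copying into it. Below is a rough Python analogue of an ansible.builtin.file task with state=directory on the copy target's dirname, followed by the copy itself. The function name and paths are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_parent(src: Path, dest: Path) -> Path:
    """Copy src to dest, creating dest's parent directory first.

    mkdir(parents=True, exist_ok=True) plays the role of an Ansible
    file task with state=directory on the target's dirname: it creates
    missing parents and is a no-op when the directory already exists.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "coredhcp.yaml.j2"
        src.write_text("# template\n")
        # Hypothetical target whose coredhcp/ parent does not exist yet.
        dest = Path(tmp) / "deployment-recipes" / "coredhcp" / "coredhcp.yaml.j2"
        copy_with_parent(src, dest)
        copy_with_parent(src, dest)  # second run also succeeds: idempotent
        print(dest.read_text(), end="")
```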
Force-pushed from fa5d5fd to b07f4dc
```yaml
  connection: local

- name: Ensure coredhcp template directory exists on OIM
  ansible.builtin.file:
```
Collaborator
The two tasks below are duplicates and are not required.
```yaml
- name: Ensure coredhcp template directory exists
  ansible.builtin.file:
    path: "{{ openchami_coredhcp_target | dirname }}"
```
Collaborator
There are now two coredhcp templates. Remove one and keep a single task for this.
Force-pushed from b07f4dc to e6674f2
…fig.yml
Address review comments:
1. Removed dns_domain: the domain now comes from OIM metadata (domain_name). Cloud-init templates use domain_name directly.
2. Moved coredhcp template tasks from deploy_openchami.yml to configs/ochami.yml, where the other config template drops live.
3. Eliminated dns_config.yml entirely: dns_enabled is now a simple boolean in provision_config.yml (already loaded by Ansible).
4. Removed the dns_config.json schema, the dns_config entry from config.py and from the validation file list, and test_dns_config_validation.py.
5. Simplified validate_dns_config() to a no-op (the schema handles the type check).
6. Removed DNS domain error messages from en_us_validation_msg.py.
Signed-off-by: sujit-jadhav <sujit.jadhav@dell.com>
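After these changes, the node-side behavior reduces to a single branch on the dns_enabled boolean, with the search domain taken from domain_name. The real logic lives in the Jinja cloud-init templates. The Python sketch below only illustrates the idea; the function name, its arguments and the exact file contents are assumptions, not what the templates emit.

```python
import ipaddress


def render_resolver_config(dns_enabled, oim_ip, domain_name, hosts=None):
    """Return (path, content) for a node's name-resolution file.

    dns_enabled=True  -> /etc/resolv.conf pointing at the OIM (coresmd).
    dns_enabled=False -> legacy /etc/hosts lines, one per node in `hosts`.
    """
    # Mirrors the boolean type check the commit assigns to the schema.
    if not isinstance(dns_enabled, bool):
        raise TypeError("dns_enabled must be a boolean")
    if dns_enabled:
        ipaddress.ip_address(oim_ip)  # raises ValueError for a malformed address
        return "/etc/resolv.conf", f"search {domain_name}\nnameserver {oim_ip}\n"
    # Legacy format: "<ip> <fqdn> <short name>", sorted for stable output.
    lines = [
        f"{ip} {name}.{domain_name} {name}"
        for name, ip in sorted((hosts or {}).items())
    ]
    return "/etc/hosts", "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical OIM address and domain.
    print(render_resolver_config(True, "172.16.0.254", "hpc.lab")[1], end="")
```

Rejecting non-boolean values such as the string "true" matters here because a quoted YAML value would otherwise be truthy and silently select the DNS path.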
Force-pushed from e6674f2 to 0aaa6bf
Collaborator
Author
prepare_config.yml, local_repo.yml, buildimage, and provision are working fine. After provisioning, /etc/resolv.conf is populated correctly. The PR is safe to merge.
abhishek-sa1 approved these changes on May 20, 2026
abhishek-sa1 added a commit that referenced this pull request on May 20, 2026
* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update create_k8s_config_nfs.yml
  helm directory created based on the version
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update ci-group-service_kube_control_plane_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update template_rhel_10.0_multi_arch_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update template_rhel_10.0_x86-64_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update rhel_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Vector-ome log support changes (#4483)
* vector ldms configuration and deployment
* vector updates
* vector-ldms metrics changes and image change
* Update telemetry_prereq.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Set changed_when to false for telemetry deployment
  Prevent change detection for telemetry deployment.
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* vector-ldms review comments
* lint-fix
* LDMS-Vector deployment (#4330)
* vector ldms configuration and deployment
* vector updates
* vector-ldms metrics changes and image change
* Update telemetry_prereq.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Set changed_when to false for telemetry deployment
  Prevent change detection for telemetry deployment.
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* vector-ldms review comments
* lint-fix
---------
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Update main.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Update telemetry.sh.j2
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* conflict resolve
* conflict fix
* vector ome metrics changes
* Vector-OME deployment (#4394)
* conflict resolve
* conflict fix
* vector ome metrics changes
* Update vmagent-scrape-config.yaml.j2
* Replication factor update
  update replicationFactor for vmstorage
* pod anti affinity changes for vlagent and vmagent
* vector-ome log support changes
* remover log readable fix
---------
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
* Merge pull request #4471 from dell/pr-4353-rebase-q2dev
  feat: DNS-based hostname resolution for Slurm/MPI via coresmd
* Fix/validate reporting enhancement (#4481)
* Add result_detail field to stage response and enhance molecule test reporting
  - Add result_detail JSONB field to GetStageResponse schema for detailed stage results including log_path, test_summary, and artifact_dir
  - Update result_poller to populate result_detail with log_path (molecule_output.log) and report_path (test_report.json)
  - Extract report_id and suite names from molecule_output.log using regex parsing
  - Filter and copy current test run from shared test_report.json to artifact directory based on report_id
  - Populate test_summary with scenario, molecule_command, duration_seconds, test_names, and report_id from
  - Refactor test_summary structure in molecule execution: extract suite from header, reorganize test data with status, and reorder fields
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* Update validate stage to display raw test summary JSON instead of formatted fields
  - Remove junit artifact report from deploy stage
  - Simplify test summary display in validate stage to show raw JSON output using jq instead of parsing individual fields (total, passed, failed, skipped, errors)
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* Clear log_file_path on stage retry and reorder test_summary fields in molecule execution
  - Clear log_file_path field in validate use case when retrying a stage to prevent stale data from previous attempts
  - Reorder test_summary population in molecule execution: move report_id and duration_seconds before tests array, move summary counts (total, passed, failed, skipped, errors) to end after tests
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
---------
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* K8S Upgrade from 1.34.1 to 1.35.1 (#4458) (#4485)
* updated code for helm
* add code for upgrade_k8s.yml and upgrade.yml
* add custom module for k8s upgrade status
* add code for lock, workers upgrade
* add code upgrade
* add code for hopchain
* add upgrade code for control planes
* add code for each steps involved in upgrade
* add upgrade code
* fixed lint issues
* fix lint issues and code changes
* remove unused task files
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* update code changes for helm
* fixes after integrating local repo and build image
* Update registry in calico images in service_k8s_v1.35.1.json
* Resolving merge conflicts
* update upgrade_k8s.yml
* fix lint issues
* adding bss and cloudinit changes
* remove service_k8s_v1.34.1
* lint issue fix
* lint issue fix
* remove input project dir from build image
* update component dependencies for k8s
* update setting hostname
---------
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
---------
Signed-off-by: snarthan <narthan.s@dell.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Nagachandan-P <Nagachandan.p@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <sowjanya.jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
abhishek-sa1 added a commit that referenced this pull request on May 20, 2026
* K8S Upgrade from 1.34.1 to 1.35.1 (#4458)
* updated code for helm
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* add code for upgrade_k8s.yml and upgrade.yml
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* add custom module for k8s upgrade status
  Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
* add code for lock, workers upgrade
  Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
* add code upgrade
  Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
* add code for hopchain
  Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
* add upgrade code for control planes
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* add code for each steps involved in upgrade
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* add upgrade code
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fixed lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues and code changes
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* remove unused task files
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix lint issues
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* update code changes for helm
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fixes after integrating local repo and build image
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* Update registry in calico images in service_k8s_v1.35.1.json
  Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
* Resolving merge conflicts
  Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
* update upgrade_k8s.yml
  Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
* fix lint issues
  Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
* adding bss and cloudinit changes
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* remove service_k8s_v1.34.1
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* lint issue fix
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* lint issue fix
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* remove input project dir from build image
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* update component dependencies for k8s
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* update setting hostname
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
---------
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
* Feature branch sync - pub/q2_dev to pub/q2_upgrade (#4487)
* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update create_k8s_config_nfs.yml
  helm directory created based on the version
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update ci-group-service_kube_control_plane_first_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update ci-group-service_kube_control_plane_x86_64.yaml.j2
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update template_rhel_10.0_multi_arch_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update template_rhel_10.0_x86-64_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Update rhel_software_config.json
  Signed-off-by: snarthan <narthan.s@dell.com>
* Vector-ome log support changes (#4483)
* vector ldms configuration and deployment
* vector updates
* vector-ldms metrics changes and image change
* Update telemetry_prereq.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Set changed_when to false for telemetry deployment
  Prevent change detection for telemetry deployment.
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* vector-ldms review comments
* lint-fix
* LDMS-Vector deployment (#4330)
* vector ldms configuration and deployment
* vector updates
* vector-ldms metrics changes and image change
* Update telemetry_prereq.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Set changed_when to false for telemetry deployment
  Prevent change detection for telemetry deployment.
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* vector-ldms review comments
* lint-fix
---------
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Update main.yml
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* Update telemetry.sh.j2
  Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
* conflict resolve
* conflict fix
* vector ome metrics changes
* Vector-OME deployment (#4394)
* conflict resolve
* conflict fix
* vector ome metrics changes
* Update vmagent-scrape-config.yaml.j2
* Replication factor update
  update replicationFactor for vmstorage
* pod anti affinity changes for vlagent and vmagent
* vector-ome log support changes
* remover log readable fix
---------
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
* Merge pull request #4471 from dell/pr-4353-rebase-q2dev
  feat: DNS-based hostname resolution for Slurm/MPI via coresmd
* Fix/validate reporting enhancement (#4481)
* Add result_detail field to stage response and enhance molecule test reporting
  - Add result_detail JSONB field to GetStageResponse schema for detailed stage results including log_path, test_summary, and artifact_dir
  - Update result_poller to populate result_detail with log_path (molecule_output.log) and report_path (test_report.json)
  - Extract report_id and suite names from molecule_output.log using regex parsing
  - Filter and copy current test run from shared test_report.json to artifact directory based on report_id
  - Populate test_summary with scenario, molecule_command, duration_seconds, test_names, and report_id from
  - Refactor test_summary structure in molecule execution: extract suite from header, reorganize test data with status, and reorder fields
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* Update validate stage to display raw test summary JSON instead of formatted fields
  - Remove junit artifact report from deploy stage
  - Simplify test summary display in validate stage to show raw JSON output using jq instead of parsing individual fields (total, passed, failed, skipped, errors)
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* Clear log_file_path on stage retry and reorder test_summary fields in molecule execution
  - Clear log_file_path field in validate use case when retrying a stage to prevent stale data from previous attempts
  - Reorder test_summary population in molecule execution: move report_id and duration_seconds before tests array, move summary counts (total, passed, failed, skipped, errors) to end after tests
  Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
---------
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
* K8S Upgrade from 1.34.1 to 1.35.1 (#4458) (#4485)
* updated code for helm
* add code for upgrade_k8s.yml and upgrade.yml
* add custom module for k8s upgrade status
* add code for lock, workers upgrade
* add code upgrade
* add code for hopchain
* add upgrade code for control planes
* add code for each steps involved in upgrade
* add upgrade code
* fixed lint issues
* fix lint issues and code changes
* remove unused task files
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* fix lint issues
* update code changes for helm
* fixes after integrating local repo and build image
* Update registry in calico images in service_k8s_v1.35.1.json
* Resolving merge conflicts
* update upgrade_k8s.yml
* fix lint issues
* adding bss and cloudinit changes
* remove service_k8s_v1.34.1
* lint issue fix
* lint issue fix
* remove input project dir from build image
* update component dependencies for k8s
* update setting hostname
---------
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
---------
Signed-off-by: snarthan <narthan.s@dell.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Nagachandan-P <Nagachandan.p@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <sowjanya.jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
---------
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Vrinda_Marwah <Vrinda.Marwah@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Signed-off-by: Vrinda Marwah <vrinda.marwah@dell.com>
Signed-off-by: snarthan <narthan.s@dell.com>
Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Signed-off-by: pullan1 <sudha.pullalaravu@dell.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Nagachandan-P <Nagachandan.p@dell.com>
Signed-off-by: Nagachandan P <Nagachandan.p@dell.com>
Signed-off-by: SOWJANYAJAGADISH123 <sowjanya.jagadish@dell.com>
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Co-authored-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Vrinda_Marwah <vrinda.marwah@dell.com>
Co-authored-by: snarthan <narthan.s@dell.com>
Co-authored-by: Kratika Patidar <Kratika.Patidar@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: pullan1 <sudha.pullalaravu@dell.com>
Co-authored-by: Nagachandan-P <Nagachandan.p@dell.com>
Co-authored-by: SOWJANYAJAGADISH123 <Sowjanya.Jagadish@dell.com>
Co-authored-by: Mithilesh Reddy <mithilesh.reddy@dell.com>
Co-authored-by: Venu-p1 <236371043+Venu-p1@users.noreply.github.com>
Summary
Enables DNS-based hostname resolution for cluster compute nodes using the existing coresmd (CoreDNS + OpenCHAMI SMD plugin), replacing static /etc/hosts management.
What this PR does
DNS toggle via dns_config.yml
- input/dns_config.yml with dns_enabled (default: false) and dns_domain
- validate_dns_config() for RFC 1035 domain checks
Cloud-init resolver configuration
- With dns_enabled: true, all 8 cloud-init templates write /etc/resolv.conf pointing to OIM coresmd
- With dns_enabled: false (default), the traditional /etc/hosts behavior is preserved
/etc/hosts skip logic
- update_hosts.yml: skips when DNS is enabled
- update_hosts_munge.yml: skips when DNS is enabled
K8s integration
- Forwards dns_domain queries to OIM coresmd
Observability
Bug fix
What was removed (refactored)
- CoreDNS container (docker.io/coredns/coredns:1.12.1): redundant with coresmd
- Static zone file generation (generate_dns_zones.yml, forward_zone.j2, reverse_zone.j2, etc.)
- Unused DNS parameters (dns_ttl, dns_cache_ttl, dns_soa, dns_fabric_suffixes, dns_reverse_enabled)
Files changed
Testing
- python3 -m pytest common/library/modules/tests/test_dns_config_validation.py: 13 tests pass
Known limitations (future work)
- dns_domain and cluster_domain should be linked (currently independent)