
Merge pub/q2_dev to pub/build_stream #4418

Merged
abhishek-sa1 merged 95 commits into dell:pub/build_stream from Rajeshkumar-s2:pub/build_stream
May 13, 2026

@Rajeshkumar-s2
Collaborator

Description of the Solution

Merge pub/q2_dev to pub/build_stream, including fixes for issues found during testing.

Suggested Reviewers

If you wish to suggest specific reviewers for this solution, please include them in this section. Be sure to include the @ before the GitHub username.
@priti-parate @Venu-p1 @abhishek-sa1

jagadeeshnv and others added 30 commits April 17, 2026 19:42
…ecution

- Implement literal moustache preservation for cloud-init variables using !unsafe with from_yaml.
- Generalize node_key support to allow full metadata paths (e.g., ds.meta_data.instance_data.local_ipv4).
- Fix b64encode evaluation in permission and directory creation tasks to ensure valid shell command generation.
- Centralize mount-specific runcmd execution in cloud-init templates using base64 encoding to prevent shell injection/parsing issues.
- Update storage_config schema to support flexible cloud-init variable references for node_key.
- Add safety defaults for mount_on_oim selection to prevent 'attribute not found' errors.
- Restore 'union' filter for mounts to maintain idempotency across multiple runs.
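The base64 technique for mount-specific runcmd entries can be illustrated with a minimal Python sketch. It assumes the mount commands are assembled into one shell script and shipped as a single `echo <payload> | base64 -d | bash` runcmd line; the helper names and that exact wrapper command are hypothetical and are not taken from the Omnia templates (which use Ansible's `b64encode` filter rather than Python).

```python
import base64


def encode_runcmd(script: str) -> str:
    """Wrap a shell script as one base64-decoded runcmd line (hypothetical helper)."""
    if not script:
        raise ValueError("script must not be empty")
    # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) has no shell metacharacters,
    # so quotes, $VARS and {{ }} inside the script are never re-parsed by YAML
    # or by the shell that executes the runcmd entry.
    payload = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return f"echo {payload} | base64 -d | bash"


def decode_runcmd(line: str) -> str:
    """Recover the original script from a line produced by encode_runcmd."""
    payload = line.split()[1]
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    script = "mkdir -pv '/mnt/share dir'\nmount -t nfs 10.0.0.1:/export '/mnt/share dir'\n"
    print(encode_runcmd(script))
```

The design point is that only the decoded script is ever interpreted by a shell, so escaping concerns disappear from the template layer.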
…ion updates

- New tasks: provision/roles/mount_config/tasks/{swap_config.yml,process_single_swap.yml}
  - Build per-functional-group swap entries in cloud_init_groups_dict
  - Validate swap filename path and size format; include optional maxsize
  - Idempotent merge via combine(recursive=True)
- Render swap block in cloud-init group templates
  - Inject conditional swap: {filename, size, [maxsize]} across login, compiler, kube (control/node), slurm (control/node) for x86_64/aarch64
  - Only rendered when cloud_init_groups_dict[fg].swap is present
- Strengthen input validation (common_validation.py)
  - Enforce no overlap of functional_group_prefix/group across swap entries
  - Validate maxsize >= size (supports bytes or K/M/G/T, and “auto”)
  - Add helper parsers and validators for size comparison
- Update storage_config schema for swap (storage_config.json)
  - Remove deprecated swap.name field
  - Required: filename, size
  - oneOf: require either functional_group_prefix or group
  - Preserve existing size semantics (supports units/auto)
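The swap checks listed above (size format, `maxsize >= size`, exactly one of `functional_group_prefix`/`group`, and no functional group claimed by two swap entries) can be sketched in Python. This is a hypothetical illustration, not the code in `common_validation.py`: it assumes K/M/G/T are 1024-based, treats `auto` as incomparable (so the maxsize check is skipped), requires an absolute `filename`, and expands `functional_group_prefix` with a plain `startswith` match.

```python
import re

# Assumption: K/M/G/T are binary (1024-based) multipliers; a bare number is bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"^(\d+)([KMGT]?)$", re.IGNORECASE)


def parse_size(value):
    """Return the size in bytes, or None for 'auto'; raise ValueError otherwise."""
    text = str(value).strip()
    if text.lower() == "auto":
        return None
    match = _SIZE_RE.match(text)
    if not match:
        raise ValueError(f"invalid size {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


def target_groups(entry, functional_groups):
    """Expand a swap entry to the functional groups it applies to."""
    has_prefix = "functional_group_prefix" in entry
    has_group = "group" in entry
    if has_prefix == has_group:
        # Mirrors the schema's oneOf: exactly one selector must be present.
        raise ValueError("set exactly one of functional_group_prefix or group")
    if has_prefix:
        prefix = entry["functional_group_prefix"]
        return {fg for fg in functional_groups if fg.startswith(prefix)}
    return {entry["group"]}


def validate_swap(entries, functional_groups):
    """Return error strings for a list of hypothetical swap entries."""
    errors = []
    owner = {}  # functional group -> filename of the swap entry that claimed it
    for entry in entries:
        name = entry.get("filename", "<no filename>")
        missing = [key for key in ("filename", "size") if key not in entry]
        if missing:
            errors.append(f"{name}: missing required field(s): {', '.join(missing)}")
            continue
        if not str(name).startswith("/"):
            errors.append(f"{name}: filename must be an absolute path")
        try:
            size = parse_size(entry["size"])
            maxsize = parse_size(entry["maxsize"]) if "maxsize" in entry else None
            groups = target_groups(entry, functional_groups)
        except ValueError as exc:
            errors.append(f"{name}: {exc}")
            continue
        # 'auto' on either side makes the sizes incomparable, so skip the check.
        if size is not None and maxsize is not None and maxsize < size:
            errors.append(f"{name}: maxsize is smaller than size")
        for fg in sorted(groups):
            if fg in owner:
                errors.append(f"{name}: {fg} already gets swap from {owner[fg]}")
            else:
                owner[fg] = name
    return errors
```

Expanding every entry to concrete functional groups before comparing is what makes a prefix entry and an explicit group entry detectable as overlapping.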
- Change mount_item reference to config_item parameter
- Update process_single_mount.yml to pass config_item: mount_item
- Update process_single_swap.yml to pass config_item: swap_item
- Enable reuse of target group determination logic

This allows both mount and swap processing to use the same task.
Signed-off-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
- Fix pxe_mapping_file reference in config and validation flow
- Add validation to detect duplicate mount points per expanded functional group
- Include entry names in error messages for clarity
- Update schema and examples for consistency
- Remove extra mount_params profiles (vast_nfs_performance, powervault_iscsi, network_storage, local_storage, bind_mounts, scratch_storage, global)
- Keep only essential profiles: default and vast_nfs
- Simplify mounts section with two core examples (nfs_slurm, nfs_k8s)
- Remove VAST-specific entries and per-node bind mount examples
- Comment out powervault_config entries (move to examples)
- Comment out swap section (move to examples)
- Fix field names: group → groups in comments and examples
- Remove vast_enabled flag
- Align with storage_config.json schema (only mount_params, mounts, powervault_config, swap sections)
- Update comments to reference correct field names (groups instead of group)
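Detecting duplicate mount points per expanded functional group can be sketched in Python as follows. The entry shape (`name`, `mount_point`, and `groups` as a list of prefixes) and the prefix-expansion rule are assumptions for illustration; only the idea of expanding groups first and naming both clashing entries in the error comes from the commit message.

```python
from collections import defaultdict


def find_duplicate_mount_points(mounts, functional_groups):
    """Report mount points defined more than once for the same functional group.

    Assumes each mount entry has 'name', 'mount_point' and 'groups', where
    'groups' lists prefixes that are expanded against the known functional groups.
    """
    seen = defaultdict(dict)  # functional group -> {mount_point: entry name}
    errors = []
    for mount in mounts:
        expanded = {
            fg
            for prefix in mount["groups"]
            for fg in functional_groups
            if fg.startswith(prefix)
        }
        # Normalize a trailing slash so /home and /home/ count as the same point.
        point = mount["mount_point"].rstrip("/") or "/"
        for fg in sorted(expanded):
            if point in seen[fg]:
                errors.append(
                    f"mount point {point} for {fg} defined by both "
                    f"'{seen[fg][point]}' and '{mount['name']}'"
                )
            else:
                seen[fg][point] = mount["name"]
    return errors
```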
… issues

- Fix FK constraint by using shared session for image_group_repo and image_repo in ResultPoller
- Concatenate multiple S3 paths with semicolon delimiter to respect unique constraint uq_images_image_group_id_role
- Update cleanup_job.py to split semicolon-delimited paths and delete each individually
- Widen image_name column from VARCHAR(256) to VARCHAR(512) to accommodate semicolon-delimited paths
- Add logging to FileArtifactStore.store() to debug file write failures
- Fix URI construction in parse_catalog.py for existing artifacts (construct file:// URI directly)
- Update build_stream container image tag to 1.1

Tested with job 7da54d1f-ed26-41dd-b3ff-0386104e644d (image-build19) - ImageGroup and Images created successfully
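The semicolon convention lets a single image row per (image_group_id, role) carry several S3 paths without violating uq_images_image_group_id_role. A minimal Python sketch of the join/split round trip follows; it assumes individual paths never contain `;` and that the 512-character limit applies to the joined string. `delete_object` is a hypothetical stand-in for whatever per-object delete the cleanup job actually performs.

```python
from typing import Callable, Iterable, List

IMAGE_NAME_MAX_LEN = 512  # widened VARCHAR(512) image_name column
DELIMITER = ";"


def join_image_paths(paths: Iterable[str]) -> str:
    """Concatenate S3 paths into one image_name value."""
    paths = list(paths)
    if any(DELIMITER in p for p in paths):
        raise ValueError("S3 path must not contain ';'")
    value = DELIMITER.join(paths)
    if len(value) > IMAGE_NAME_MAX_LEN:
        raise ValueError(f"joined paths exceed {IMAGE_NAME_MAX_LEN} characters")
    return value


def split_image_paths(value: str) -> List[str]:
    """Inverse of join_image_paths; empty segments are ignored."""
    return [p for p in value.split(DELIMITER) if p.strip()]


def cleanup_image(value: str, delete_object: Callable[[str], None]) -> int:
    """Delete each path stored in one image_name value; return the count deleted."""
    paths = split_image_paths(value)
    for path in paths:
        delete_object(path)  # hypothetical per-object delete callback
    return len(paths)
```

Rejecting over-long joins up front surfaces the column limit as a clear error instead of a database failure at insert time.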
- Enable host-specific mount map generation from PXE mapping data
- Add vendor_data metadata structure to cloud-init runcmd for dynamic mounts
- Update cloud-init variable syntax from Jinja2 template to cloud-init query
- Fix variable reference in build_host_mount_map.yml (PXE_GROUP -> GROUP_NAME)
- Change mkdir commands to verbose mode (mkdir -pv)
- Update pxe_mapping_file_path to use omnia_shared_path with path normalization

This allows per-host mount configurations to be dynamically applied during
node provisioning via cloud-init vendor_data metadata.
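Generating a per-host mount map from PXE mapping data can be sketched in Python. The CSV column names (`HOSTNAME`, `GROUP_NAME`) and the mounts-by-group input shape are assumptions; only `GROUP_NAME` is named in the commit. A dict like the one returned here is the kind of structure that could then be serialized into vendor_data metadata for the node to read.

```python
import csv
import io


def build_host_mount_map(pxe_csv_text, mounts_by_group):
    """Map each host in a PXE mapping CSV to the mounts of its group.

    Assumptions: the CSV has HOSTNAME and GROUP_NAME columns, and
    mounts_by_group maps a group name to a list of mount dicts.
    """
    host_map = {}
    for row in csv.DictReader(io.StringIO(pxe_csv_text)):
        host = row["HOSTNAME"].strip()
        group = row["GROUP_NAME"].strip()
        # Copy the list so appending host-specific mounts does not affect other hosts.
        host_map[host] = list(mounts_by_group.get(group, []))
    return host_map
```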
jagadeeshnv and others added 25 commits May 7, 2026 23:00
…oot and defect fixes (dell#4373)

* pub telemetry changes

* service to scrape metrics from OTEL collector

* vmservice to scrape metrics from otel collector

* update endpoints

* revert other changes

* revert merge changes as per head

* revert variable set

* revert changes

* revert changes

* pylint fixes

* ansible lint fixes

* updating completion message

* telemetry validation during prepare_oim

* update condition

* added check for LDMS

* Fix for CrashLoopBackOff state on node reboot

* addressing review comment to move into vars

* remove rsyslog layer, update vmscraper and enabled external health monitor of csi driver

* fix for k8s_server_ip undefined variable

* fix for syntax error

* fix for UT issues - DNS resolution and keep powerscale configuration manual

* ansible lint fixes

* Native operator-based vmscraper

* remove unused syslog template

* fix for nfs_client_param check in telemetry config

* update storage_config variable

* fix for k8s_nfs_server_path undefined variable

* fix for kustomization error

* remove powerscale syslog configuration

---------

Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Add optional additional_subnets configuration under admin_network in
network_spec.yml to support multi-RAC / multi-subnet PXE deployments
with CoreDHCP relay (giaddr-based routing).

Changes:
- network_spec.yml: add additional_subnets field with documentation
- network_spec.json: JSON schema validation for subnet entries
- en_us_validation_msg.py: error messages for subnet validation
- provision_validation.py: validate CIDRs, routers, ranges, overlaps
- configs.yaml.j2: emit coredhcp_subnets/coredhcp_subnet_pools vars
- coredhcp.yaml.j2: dual-mode template (positional args for v0.4.x,
  key=value format with subnet=/subnet_pool= for multi-subnet)
- deploy_openchami.yml: overlay coredhcp template after clone
- vars/main.yml: add template path variables
- test_additional_subnets_validation.py: 17 unit tests

Single-subnet (flat) deployments continue to use the original
positional-argument config format compatible with coresmd v0.4.x.
Multi-subnet requires coresmd with multi-subnet support (PR dell#61).

Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Signed-off-by: Sujit Jadhav <sujit.jadhav@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
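The checks listed for `provision_validation.py` (valid CIDRs, routers, ranges, overlaps) map naturally onto Python's standard `ipaddress` module. The sketch below is hypothetical: the entry keys (`subnet`, `router`, `range_start`, `range_end`) are assumptions rather than the actual `network_spec.yml` field names, and it only checks that each router and range lies inside its subnet and that no two subnets (including the admin network) overlap.

```python
import ipaddress
from itertools import combinations


def validate_additional_subnets(entries, admin_cidr):
    """Return error strings for hypothetical additional_subnets entries."""
    errors = []
    networks = [("admin_network", ipaddress.ip_network(admin_cidr, strict=True))]
    for idx, entry in enumerate(entries):
        label = f"additional_subnets[{idx}]"
        try:
            # strict=True rejects CIDRs with host bits set, e.g. 10.0.0.1/24.
            net = ipaddress.ip_network(entry["subnet"], strict=True)
            router = ipaddress.ip_address(entry["router"])
            start = ipaddress.ip_address(entry["range_start"])
            end = ipaddress.ip_address(entry["range_end"])
        except (KeyError, ValueError) as exc:
            errors.append(f"{label}: {exc}")
            continue
        if router not in net:
            errors.append(f"{label}: router {router} is outside {net}")
        if start not in net or end not in net:
            errors.append(f"{label}: range {start}-{end} is outside {net}")
        elif start > end:
            errors.append(f"{label}: range start {start} is after end {end}")
        networks.append((label, net))
    # Every pair of subnets, including the admin network, must be disjoint.
    for (name_a, net_a), (name_b, net_b) in combinations(networks, 2):
        if net_a.overlaps(net_b):
            errors.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return errors
```

Collecting all errors instead of stopping at the first one matches how a config validator typically reports every problem in a single run.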
Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

* LDMS-Vector deployment (dell#4330)

* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update main.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update telemetry.sh.j2

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* conflict resolve

* conflict fix

* vector ome metrics changes

* Vector-OME deployment (dell#4394)

* conflict resolve

* conflict fix

* vector ome metrics changes

* Update vmagent-scrape-config.yaml.j2

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
defect fixes and merge conflict issues
---------

Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

* LDMS-Vector deployment (dell#4330)

* vector ldms configuration and deployment

* vector updates

* vector-ldms metrics changes and image change

* Update telemetry_prereq.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Set changed_when to false for telemetry deployment

Prevent change detection for telemetry deployment.

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* vector-ldms review comments

* lint-fix

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update main.yml

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* Update telemetry.sh.j2

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>

* conflict resolve

* conflict fix

* vector ome metrics changes

* Vector-OME deployment (dell#4394)

* conflict resolve

* conflict fix

* vector ome metrics changes

* Update vmagent-scrape-config.yaml.j2

* Replication factor update

update replicationFactor for vmstorage

---------

Signed-off-by: Kratika Patidar <Kratika.Patidar@dell.com>
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
Co-authored-by: Abhishek S A <abhishek.sa3@dell.com>
Co-authored-by: mcas <sakshi.s@dell.com>
Co-authored-by: Jagadeesh N V <39791839+jagadeeshnv@users.noreply.github.com>
Co-authored-by: priti-parate <140157516+priti-parate@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Co-authored-by: Sujit Jadhav <sujit.jadhav@dell.com>
* powerscale s3 support

* Update credential_rules.json

* Update s3_bucket.yml

* Update omnia.service.j2

* Update omnia.service.j2

* Update omnia.service.j2

* Update common_vars.yml

* ansible lint

* Update build_compute_image.yml

* Update common_vars.yml

* review comments

* jinja issue

* Update storage_config.json

* Update common_validation.py

* Update config.py

* Update storage_config.json

* Update common_validation.py
VLagent and VMagent pod anti-affinity changes
* Update fetch_packages.yml (dell#4326)

Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com>

* Update feature_request.md

Removing project field from Feature request issue template

Signed-off-by: Luke Wilson <luke.wilson@dell.com>

* Update bug_report.md

removing unnecessary entries

Signed-off-by: John Lockman <jlockman3@gmail.com>

---------

Signed-off-by: balajikumaran.cs <balajikumaran.c.s@gmail.com>
Signed-off-by: Luke Wilson <luke.wilson@dell.com>
Signed-off-by: John Lockman <jlockman3@gmail.com>
Co-authored-by: balajikumaran.cs <balajikumaran.c.s@gmail.com>
Co-authored-by: Luke Wilson <luke.wilson@dell.com>
Co-authored-by: John Lockman <jlockman3@gmail.com>
Signed-off-by: Rajeshkumar-s2 <rajeshkumar.s2@dell.com>
@Rajeshkumar-s2 requested review from abhishek-sa1 and priti-parate and removed request for abhishek-sa1 May 13, 2026 15:03
@abhishek-sa1 merged commit 300b4d3 into dell:pub/build_stream May 13, 2026
3 checks passed