ochami smd retries added after cloud-init service restart #4364

Merged
jagadeeshnv merged 5 commits into dell:pub/q2_dev from jagadeeshnv:pub/q2_dev
May 5, 2026

@jagadeeshnv
Collaborator

Issues Resolved by this Pull Request

Please be sure to associate your pull request with one or more open issues. Use the word Fixes as well as a hashtag (#) prior to the issue number in order to automatically resolve associated issues (e.g., Fixes #100).

Fixes #

Description of the Solution

Please describe the solution provided and how it resolves the associated issues.

Suggested Reviewers

If you wish to suggest specific reviewers for this solution, please include them in this section. Be sure to include the @ before the GitHub username.

jagadeeshnv and others added 2 commits May 5, 2026 13:07
- Extract service readiness checks into separate block/rescue pattern
- Add SMD API health check before discovery attempt
- Implement automatic retry on discovery failure with service restart
- Increase service check timeout to 2 minutes (12 retries × 10s delay = 120s)
- Prevent connection refused errors by ensuring SMD endpoint is ready

This addresses the race condition where systemd marks smd service as "started"
but the HTTP endpoint at oimcp.oim.test:8443 isn't accepting connections yet.
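
For readers without the diff open, the pattern described in the commit message could look roughly like the sketch below. It is not the merged code: the task names, the `smd_health_url` and `discovery_command` variables, the `validate_certs: false` setting, and the choice to restart `openchami.target` in the rescue path are all assumptions made for illustration. Only the 12 retries × 10s delay and the oimcp.oim.test:8443 endpoint come from the commit message.

```yaml
# Illustrative sketch only; variable names and task layout are hypothetical.
- name: Ensure SMD is reachable, then run discovery
  block:
    - name: Wait until the SMD endpoint answers over HTTP
      ansible.builtin.uri:
        # Hypothetical variable: a health URL under https://oimcp.oim.test:8443
        url: "{{ smd_health_url }}"
        method: GET
        validate_certs: false   # assumption: self-signed certificate on the endpoint
      register: smd_health
      until: (smd_health.status | default(0)) == 200
      retries: 12   # 12 retries x 10s delay = 2 minutes
      delay: 10

    - name: Run node discovery
      # Hypothetical placeholder for the actual discovery step
      ansible.builtin.command: "{{ discovery_command }}"
      changed_when: true

  rescue:
    - name: Restart the OpenCHAMI services after a failed attempt
      ansible.builtin.systemd:
        name: openchami.target
        state: restarted

    - name: Wait again for the SMD endpoint after the restart
      ansible.builtin.uri:
        url: "{{ smd_health_url }}"
        method: GET
        validate_certs: false
      register: smd_health_retry
      until: (smd_health_retry.status | default(0)) == 200
      retries: 12
      delay: 10

    - name: Retry node discovery once
      ansible.builtin.command: "{{ discovery_command }}"
      changed_when: true
```

Polling the HTTP endpoint rather than the systemd unit state is what closes the race: a unit can report "started" before the listener on port 8443 is accepting connections.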
@jagadeeshnv jagadeeshnv requested review from abhishek-sa1, priti-parate and snarthan and removed request for abhishek-sa1 May 5, 2026 07:40
Comment thread provision/roles/configure_ochami/tasks/provision_mapping_nodes.yml Outdated
@jagadeeshnv jagadeeshnv requested a review from abhishek-sa1 May 5, 2026 09:18
1. Check target status: systemctl status openchami.target
2. View logs: journalctl -u openchami.target -n 50
3. Check individual services: systemctl status smd bss cloud-init
4. Restart target: systemctl restart openchami.target
Collaborator

Can we also mention, as the next step for each failure, to rerun provision.yml?

@jagadeeshnv jagadeeshnv merged commit bc648c4 into dell:pub/q2_dev May 5, 2026
3 of 4 checks passed
@jagadeeshnv jagadeeshnv deleted the pub/q2_dev branch May 5, 2026 11:36