@saimanikant saimanikant commented Oct 13, 2025

Description

Changes include:

  • Regenerate the pydantic models.

  • Grab the location of the panic file by pinging the status endpoint.

  • If the worker dies, log a warning:
    check that the panic file size is greater than 0
    read it in line by line, or limit the size of the message (look at splitting on an empty line)

  • Feed the panic file into logging (at warning level) for DCS Monitor and OTEL to pick up.

  • Limit restarts of the worker to x (5).
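
For illustration only, the panic-file handling described above could be sketched roughly as follows. This is not the code in the PR: `read_panic_chunks`, `log_panic_file`, the logger name, and the `max_chars` default are all hypothetical, and the sketch assumes that "greater than 0" refers to the file size.

```python
import logging
from pathlib import Path

log = logging.getLogger("worker.panic")  # hypothetical logger name


def read_panic_chunks(path, max_chars=2000):
    """Return the file's blank-line-separated chunks, each capped at max_chars."""
    panic_file = Path(path)
    # Nothing to report if the file is missing or empty (assumed "greater than 0" check).
    if not panic_file.is_file() or panic_file.stat().st_size == 0:
        return []
    chunks, current = [], []
    with panic_file.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:  # read line by line instead of loading the whole file
            if line.strip():
                current.append(line.rstrip("\n"))
            elif current:
                # An empty line closes the current chunk.
                chunks.append("\n".join(current)[:max_chars])
                current = []
    if current:
        chunks.append("\n".join(current)[:max_chars])
    return chunks


def log_panic_file(path, logger=log, max_chars=2000):
    """Log each chunk at WARNING level and return how many were logged."""
    chunks = read_panic_chunks(path, max_chars)
    for chunk in chunks:
        logger.warning("Worker panic: %s", chunk)
    return len(chunks)
```

Splitting on empty lines keeps each log record to one block of the panic output, and the per-chunk cap bounds the size of any single message that a log collector has to ingest.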

Checklist

  • I have tested these changes locally.
  • I have added unit tests (if appropriate).
  • I have added necessary documentation or updated existing documentation.
  • I have linked the issue(s) addressed by this PR if any.

@saimanikant saimanikant marked this pull request as draft October 13, 2025 17:19
@saimanikant saimanikant marked this pull request as ready for review October 13, 2025 17:55
codecov-commenter commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 82.85714% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.81%. Comparing base (312f23b) to head (fcd9e93).

Files with missing lines                        Patch %   Lines
src/ansys/hps/data_transfer/client/client.py    85.18%    4 Missing ⚠️
src/ansys/hps/data_transfer/client/binary.py    75.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #294      +/-   ##
==========================================
- Coverage   84.16%   83.81%   -0.36%     
==========================================
  Files          17       17              
  Lines        1503     1538      +35     
==========================================
+ Hits         1265     1289      +24     
- Misses        238      249      +11     



def _monitor(self):
    restart_count = 0  # Initialize a counter for restarts
    max_restarts = 5  # Set the maximum number of restarts

Maybe we should make this configurable?
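
One way this could look, as a rough sketch only: the limit moves into a small config object with the current value as its default and an optional environment override. `MonitorConfig`, `from_env`, and the `WORKER_MAX_RESTARTS` variable are hypothetical names, not part of the library.

```python
import os
from dataclasses import dataclass

DEFAULT_MAX_RESTARTS = 5  # the value currently hard-coded in _monitor


@dataclass
class MonitorConfig:  # hypothetical name, not an existing class
    max_restarts: int = DEFAULT_MAX_RESTARTS

    @classmethod
    def from_env(cls, env=None, var="WORKER_MAX_RESTARTS"):
        """Build a config, reading the limit from an (assumed) environment variable."""
        env = os.environ if env is None else env
        raw = env.get(var)
        if raw is None:
            return cls()  # fall back to the default limit
        value = int(raw)  # raises ValueError for non-numeric input
        if value < 0:
            raise ValueError(f"{var} must be >= 0, got {value}")
        return cls(max_restarts=value)
```

With something like this, `_monitor` would read `config.max_restarts` instead of a local constant, and callers that never set the option keep today's behaviour.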

# log.debug("Worker log output stopped")

def _monitor(self):
    restart_count = 0  # Initialize a counter for restarts

Would it make sense to reset this when a worker starts up successfully?
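
A minimal sketch of what resetting could mean, assuming the monitor can tell a successful start apart from a failed one. The `monitor` function and its two callables are hypothetical stand-ins for the real worker handling, not the code in this PR.

```python
def monitor(start_worker, wait_for_exit, max_restarts=5):
    """Restart the worker until it exits cleanly or fails too often in a row.

    start_worker() returns True if the worker came up successfully.
    wait_for_exit() blocks until the worker stops; True means a clean exit.
    Returns True on a clean exit, False once the restart limit is exceeded.
    """
    restart_count = 0
    while True:
        if start_worker():
            restart_count = 0  # successful start: forget earlier failures
            if wait_for_exit():
                return True
        # Either the start failed or the worker died after starting.
        restart_count += 1
        if restart_count > max_restarts:
            return False
```

With the reset, the limit only counts consecutive failures, so a worker that recovers gets a fresh budget. The trade-off is that a worker which always starts fine but crashes later would never exhaust the limit in this sketch, so a time window or a minimum uptime before resetting may be worth considering.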

@saimanikant saimanikant merged commit ea5ff03 into main Oct 14, 2025
18 checks passed
@saimanikant saimanikant deleted the mguntupa/panic branch October 14, 2025 13:20