Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qa/vstart_runner: don't let command run after timeout #56888

Merged
merged 2 commits into from Apr 17, 2024

Conversation

rishabh-d-dave
Copy link
Contributor

@rishabh-d-dave rishabh-d-dave commented Apr 15, 2024

NOTE: Context: this issue was spotted and fixed while working on PR #56065 and PR #56066.

Parameter "timeout" is accepted by LocalRemote.run() but the method
doesn't do anything about it besides accepting it. Thus, this parameter
has no effect.

In LocalRemote.run(), pass this parameter to LocalRemoteProcess.wait()
and from this method pass it to subprocess.Popen.communicate(). Thus,
command will be terminated by subprocess module at seconds specified by
"timeout" parameter. IOW, "timeout" parameter will have an effect.

Fixes: https://tracker.ceph.com/issues/65533

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Apr 16, 2024

Ceph API test failure was related. Link to it - https://jenkins.ceph.com/job/ceph-api/72373/

Relevant part from consolue output -

2024-04-15 12:32:34,803.803 INFO:__main__:> rm -rf ./dev ./out
2024-04-15 12:32:34,808.808 INFO:__main__:
running vstart.sh now...
2024-04-15 12:32:34,808.808 INFO:__main__:> ../src/vstart.sh -n --nolockdep
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1552, in <module>
    exec_test()
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1402, in exec_test
    remote.run(args=args, env=vstart_env, timeout=(3 * 60))
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 452, in run
    return self._do_run(**kwargs)
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 491, in _do_run
    proc.wait(timeout)
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 252, in wait
    out, err = self.subproc.communicate(timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1152, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.10/subprocess.py", line 2004, in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
  File "/usr/lib/python3.10/subprocess.py", line 1196, in _check_timeout
    raise TimeoutExpired(
subprocess.TimeoutExpired: Command '../src/vstart.sh -n --nolockdep' timed out after 180 seconds

@rishabh-d-dave
Copy link
Contributor Author

@rishabh-d-dave
Copy link
Contributor Author

jenkins test ceph api

@rishabh-d-dave
Copy link
Contributor Author

jenkins test api

@rishabh-d-dave
Copy link
Contributor Author

1 similar comment
@rishabh-d-dave
Copy link
Contributor Author

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check arm64

@rishabh-d-dave
Copy link
Contributor Author

Ceph API too passed - https://jenkins.ceph.com/job/ceph-api/72470/.

This means issue on previous run was unrelated.

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Apr 16, 2024

@vshankar Let's merge this PR? Testing with teuthology aren't affected by it, so need to run them. And Ceph API tests are affected but all of them passed.

Parameter "timeout" is accepted by LocalRemote.run() but the method
doesn't do anything about it besides accepting it. Thus, this parameter
has no effect.

In LocalRemote.run(), pass this parameter to LocalRemoteProcess.wait()
and from this method pass it to subprocess.Popen.communicate(). Thus,
command will be terminated by subprocess module at seconds specified by
"timeout" parameter. IOW, "timeout" parameter will have an effect.

Fixes: https://tracker.ceph.com/issues/65533
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This issue was exposed by ceph API test failure. Link to this failure - https://jenkins.ceph.com/job/ceph-api/72373/

Copying traceback below from here https://jenkins.ceph.com/job/ceph-api/72373/consoleFull#-2010121386c212b007-e891-4176-9ee7-2f60eca393b7 -

2024-04-15 12:32:34,808.808 INFO:__main__:> ../src/vstart.sh -n
--nolockdep
Using guessed paths
/home/jenkins-build/build/workspace/ceph-api/build/lib/
['/home/jenkins-build/build/workspace/ceph-api/qa',
'/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3',
'/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File
"/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py",
line 1552, in <module>
    exec_test()
  File
"/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py",
line 1402, in exec_test
    remote.run(args=args, env=vstart_env, timeout=(3 * 60))
  File
"/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py",
line 452, in run
    return self._do_run(**kwargs)
  File
"/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py",
line 491, in _do_run
    proc.wait(timeout)
  File
"/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py",
line 252, in wait
    out, err = self.subproc.communicate(timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1152, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.10/subprocess.py", line 2004, in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
  File "/usr/lib/python3.10/subprocess.py", line 1196, in _check_timeout
    raise TimeoutExpired(
subprocess.TimeoutExpired: Command '../src/vstart.sh -n --nolockdep'
timed out after 180 seconds

Signed-off-by: Rishabh Dave <ridave@redhat.com>
@rishabh-d-dave
Copy link
Contributor Author

No changes to the patch. Added reference to tracker ticket.

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Apr 17, 2024

Leaving link to last Ceph API test run since it was successful - https://jenkins.ceph.com/job/ceph-api/72470

@rishabh-d-dave
Copy link
Contributor Author

@vshankar Let's merge this PR? Testing with teuthology aren't affected by it, so need to run them. And Ceph API tests are affected but all of them passed.

@vshankar

@vshankar
Copy link
Contributor

@vshankar Let's merge this PR? Testing with teuthology aren't affected by it, so need to run them. And Ceph API tests are affected but all of them passed.

Yes, I was waiting on clean jenkins test to give a merge ACK.

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Apr 17, 2024

Since Ceph API tests passed twice, I am sure that the dashboard tests has been fixed - https://jenkins.ceph.com/job/ceph-api/72470/, https://jenkins.ceph.com/job/ceph-api/72514/. Proceeding to merge.

@rishabh-d-dave rishabh-d-dave merged commit adaccd7 into ceph:main Apr 17, 2024
11 of 12 checks passed
@rishabh-d-dave rishabh-d-dave deleted the vstart-runnner-timeout branch April 17, 2024 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants