Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quincy: pybind/mgr/progress: enforced try and except on accessing event dictionary #44671

Merged
merged 1 commit into from Jan 26, 2022

Conversation

kamoltat
Copy link
Member

@kamoltat kamoltat commented Jan 19, 2022

There is a certain race condition scenario where
an event gets deleted while the progress module
iterates through the events dictionary,
without a try and except, this will cause
an unhandled exception error and will crash
the module.

This commit will enforce try and except
on every part of the code where we are accessing
the events dictionary.

Fixes: https://tracker.ceph.com/issues/53803

Backporting the relevant commits from master PR:
#44553

Signed-off-by: Kamoltat ksirivad@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

…onary

There is a certain race condition scenario where
an event gets deleted while the progress module
iterates through the ``events`` dictionary,
without a ``try and except``, this will cause
an unhandled exception error and will crash
the module.

This commit will enforce ``try and except``
on every part of the code where we are accessing
the ``events`` dictionary.

Fixes: https://tracker.ceph.com/issues/53803

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit b70d4a9)
@kamoltat kamoltat self-assigned this Jan 19, 2022
@kamoltat kamoltat added this to the quincy milestone Jan 19, 2022
@kamoltat
Copy link
Member Author

kamoltat commented Jan 20, 2022

https://pulpito.ceph.com/ksirivad-2022-01-19_20:56:10-rados:mgr-wip-ksirivad-quincy-backport-44553-distro-basic-smithi/6628033/
Failure due to PG recovery event not getting triggered. This is because no pgs were affected by the osd.0 being marked out, this happened because the osd.0 id was not in any of the PG's acting sets (both old osd map and new osd map).

@yuriw
Copy link
Contributor

yuriw commented Jan 24, 2022

From @ljflores

"@yuriweinstein @Nehaojha looks good from a RADOS standpoint.

Failures, unrelated:
Bug #53843: mgr/dashboard: Error - yargs parser supports a minimum Node.js version of 12. - Dashboard - Ceph
Bug #53827: cephadm exited with error code when creating osd. - Orchestrator - Ceph
Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Orchestrator - Ceph
Bug #53807: Dead jobs in rados/cephadm/smoke-roleless{...}: ingress jobs stuck - Orchestrator - Ceph

Details:
Bug_#53843: mgr/dashboard: Error - yargs parser supports a minimum Node.js version of 12. - Ceph - Mgr - Dashboard
Bug_#53827: cephadm exited with error code when creating osd: Input/Output error. Faulty NVME? - Infrastructure - Sepia
Bug_#49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Ceph - Orchestrator
Bug_#53807: Dead jobs in rados/cephadm/smoke-roleless{...} - Ceph - Orchestrator"

@neha-ojha
Copy link
Member

From @ljflores

"@yuriweinstein @Nehaojha looks good from a RADOS standpoint.

Failures, unrelated: Bug #53843: mgr/dashboard: Error - yargs parser supports a minimum Node.js version of 12. - Dashboard - Ceph Bug #53827: cephadm exited with error code when creating osd. - Orchestrator - Ceph Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Orchestrator - Ceph Bug #53807: Dead jobs in rados/cephadm/smoke-roleless{...}: ingress jobs stuck - Orchestrator - Ceph

Details: Bug_#53843: mgr/dashboard: Error - yargs parser supports a minimum Node.js version of 12. - Ceph - Mgr - Dashboard Bug_#53827: cephadm exited with error code when creating osd: Input/Output error. Faulty NVME? - Infrastructure - Sepia Bug_#49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Ceph - Orchestrator Bug_#53807: Dead jobs in rados/cephadm/smoke-roleless{...} - Ceph - Orchestrator"

@yuriw looks good to merge!

@kamoltat
Copy link
Member Author

Think merging this PR should be fine, the failure I posted above in the comments is more related to how the test case doesn't accurately evaluate when 0 PGs are affected due to marking osd in or out: https://tracker.ceph.com/issues/53984. Therefore, this PR is not affected if it merges first as it fixes another separate issue.

@yuriw
Copy link
Contributor

yuriw commented Jan 25, 2022

jenkins test api

1 similar comment
@epuertat
Copy link
Member

jenkins test api

@yuriw yuriw merged commit cf8f3e5 into ceph:quincy Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants