
mgr/diskprediction_local: Add tests for device failure predictors #37513

Closed
wants to merge 7 commits

Conversation

chauhankaranraj
Contributor

This PR adds a sample smartctl JSON file and unit tests that check that the Red Hat predictor and ProphetStor predictor models do not throw errors and yield valid outputs when using the versions of the Python packages specified in requirements.txt.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>

Hey @yaarith, I wrote some tests to address point 1 of this issue. Please lmk what you think :)

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


Add a sample smartctl JSON and unit tests to check that the Red Hat
predictor and ProphetStor predictor models function properly
and give valid outputs.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>

# predict and check if value is a valid output
pred = rh_predictor.predict(self.predict_datas)
self.assertIn(pred, self.valid_outputs)
Contributor

We should use the assertion to verify the specific output we expect the model to return for a specific input (rather than any of the model's possible outputs in general).

Same in def test_prophetstor_predictor(self).
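
For illustration, a minimal sketch of the suggested change, reusing rh_predictor and self.predict_datas from the snippet quoted above; the expected label "good" follows the healthy-disk sample discussed later in this conversation and is an assumption here, not necessarily the exact value asserted in the PR:

# predict and assert the specific output expected for the
# known-healthy sample input (expected label assumed)
pred = rh_predictor.predict(self.predict_datas)
self.assertEqual(pred, 'good')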

Contributor Author

Makes sense, I'll update it 😄

Contributor Author

Thanks for being patient. I've updated the tests to check for the specific expected output, and also added a check to ensure the models don't crash on an invalid JSON input.
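
For illustration, a rough sketch (not the exact test added in this PR) of what the invalid-input check might look like; the malformed input and the graceful-failure expectation below are assumptions based on the description above:

# feed a malformed smartctl JSON and make sure the predictor
# reports the problem instead of raising (input shape assumed)
invalid_input = [{'not_a_smartctl_field': 1}]
try:
    rh_predictor.predict(invalid_input)
except Exception as e:
    self.fail('predictor raised on invalid input: %s' % e)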

Karanraj Chauhan and others added 5 commits October 7, 2020 16:50
Updated sample input to be a JSON that we know is *unambiguously*
in a "good" health state.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
Updated tests to check whether the models specifically output "good",
and not just one of "good", "warning", "bad", for the sample input
JSON that is known to be representative of a healthy disk.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
Updated tests to check that the models don't throw errors, and instead
show a message and exit gracefully, when the input is an invalid
smartctl JSON.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
diskprediction_local unit tests do not have tox (make check)
declarations.

tox.ini installs the requirements needed by the diskprediction_local
module, so the tests run in the expected environment.

Tests can now be run with:
$ ./do_cmake
$ cd build
$ ctest -R run-tox-mgr-diskprediction_local -V

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
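
For context, a rough sketch of what such a tox.ini might contain; this is an assumption for illustration only, not the exact file added in this PR:

[tox]
envlist = py3
skipsdist = true

[testenv]
deps =
    -rrequirements.txt
    pytest
commands = pytest {posargs}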

In order to verify that the diskprediction_local models function properly
across all supported distributions, we can simply run their unit tests
via this rados suite.
We do not need to bring up a cluster, since we use smartctl JSON samples
to mock the input to the models.

This by itself will exercise the loading of the required dependencies,
and will provide us with an accurate indication of how the
diskprediction_local module behaves on each distribution.

Please note:
1. These tests will fail when:
  - sklearn or numpy are not installed.
  - sklearn or numpy are installed, but with different (older or newer)
    versions which cause the code to break (due to lack of backward
    compatibility, or API changes).

2. Currently the tests print warnings that several functions used by the
   models will be deprecated in newer sklearn versions, but the tests are
   considered to pass successfully.

3. Nautilus runs Python 2; sklearn 0.20 was the last version to support
   Python 2.7.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
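
For illustration, a rough sketch of the sample-JSON mocking approach described in the commit message above, reusing rh_predictor from the earlier snippet; the file name and the shape of the data handed to the predictor are assumptions:

import json
import os

# load the checked-in smartctl sample so no live cluster or real
# device is needed (file name assumed for illustration)
sample_path = os.path.join(os.path.dirname(__file__), 'sample_smartctl.json')
with open(sample_path) as f:
    predict_datas = json.load(f)

pred = rh_predictor.predict(predict_datas)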
@yaarith
Contributor

yaarith commented Oct 19, 2020

jenkins retest this please

Logic was missing, and diskprediction_local CMake tests could not
pass.

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
@yaarith
Contributor

yaarith commented Oct 20, 2020

Regarding backporting these tests to Octopus and Nautilus:

The missing sklearn dependency is fixed and merged in master:
#35648
but could not be backported to Octopus:
#37376
and probably cannot be backported to Nautilus for the same reason.
This means the tests will never pass in Octopus and Nautilus, so we will not backport them until we find a solution.

Whenever we do backport to Nautilus though, we should change:
https://github.com/ceph/ceph/pull/37513/files#diff-f3ba8fbe68e55784d2e2b3617712a2d6de00f1b18dfb447657d5f45d0fe96596R2
to:
envlist = py27

@yaarith
Contributor

yaarith commented Nov 23, 2020

Update: we are still waiting for the scikit-learn package to become available in EPEL 8.

@tchaikov
Contributor

tchaikov commented Dec 6, 2020

Update: we are still waiting for the scikit-learn package to become available in EPEL 8.

@yaarith I cannot find a Bugzilla ticket asking for python3-scikit-learn in EPEL 8, nor can I find an epel8 branch at https://src.fedoraproject.org/rpms/python-scikit-learn. Have we pinged the maintainer about creating the epel8 branch?

@yaarith
Contributor

yaarith commented Dec 7, 2020

@yaarith I cannot find a Bugzilla ticket asking for python3-scikit-learn in EPEL 8, nor can I find an epel8 branch at https://src.fedoraproject.org/rpms/python-scikit-learn. Have we pinged the maintainer about creating the epel8 branch?

Hi @ktdreyer, regarding @tchaikov's comment here, I'm not sure what the next steps are once we have the packages ready on our end.

@yaarith
Contributor

yaarith commented Jan 6, 2021

@tchaikov after a discussion with @ktdreyer, I contacted the maintainer, asking if he can maintain the EPEL (7 and 8) versions. Will update with progress.

@tchaikov
Contributor

FWIW, https://copr.fedorainfracloud.org/coprs/tchaikov/python-scikit-learn/

@stale

stale bot commented Jul 21, 2021

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jul 21, 2021
@rzarzynski rzarzynski requested a review from yaarith March 8, 2022 17:37
@stale stale bot removed the stale label Mar 8, 2022
@djgalloway djgalloway changed the base branch from master to main July 8, 2022 00:00
@djgalloway djgalloway requested a review from a team as a code owner July 8, 2022 00:00
@github-actions

github-actions bot commented Sep 6, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 6, 2022
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Oct 28, 2022