mgr/diskprediction_local: Add tests for device failure predictors #37513
Conversation
Add sample smartctl JSON and unit tests to check whether the Red Hat predictor and ProphetStor predictor models function properly and give valid outputs.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
# predict and check if value is a valid output
pred = rh_predictor.predict(self.predict_datas)
self.assertIn(pred, self.valid_outputs)
We should use the assert to verify the specific output we expect the model to return for a given input (instead of any of the possible outputs the model can produce in general).
Same in def test_prophetstor_predictor(self).
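The suggestion above can be sketched roughly as follows. This is a hypothetical illustration, not the actual test file: `FakeRHPredictor` is a stub standing in for diskprediction_local's Red Hat predictor, and the `rh_predictor` / `predict_datas` names simply mirror the snippet quoted earlier.

```python
import unittest

class FakeRHPredictor:
    """Hypothetical stand-in for diskprediction_local's Red Hat predictor."""
    def predict(self, datas):
        # A real model would score the smartctl attributes; this stub
        # always reports a healthy disk.
        return "good"

class TestRHPredictor(unittest.TestCase):
    def setUp(self):
        self.rh_predictor = FakeRHPredictor()
        # In the real test this would be built from the sample smartctl JSON.
        self.predict_datas = [{"smart_status": {"passed": True}}]

    def test_rh_predictor(self):
        pred = self.rh_predictor.predict(self.predict_datas)
        # Pin the exact label expected for this known-healthy sample,
        # instead of assertIn(pred, self.valid_outputs).
        self.assertEqual(pred, "good")

if __name__ == "__main__":
    unittest.main()
```

The key change is `assertEqual` against the one label the sample is known to deserve, so a model regression that flips a healthy disk to "warning" or "bad" fails the test instead of slipping through.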
Makes sense, I'll update it 😄
Thanks for being patient, I've updated the tests to check for a specific output, and also added a check to ensure the models don't crash on an invalid JSON input.
Updated sample input to be a smartctl JSON that we know is *unambiguously* in a "good" health state.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>

Updated tests to check that the models specifically output "good" (not just any of "good", "warning", "bad") for the sample input JSON, which is known to represent a healthy disk.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>

Updated tests to check that the models do not throw errors, but instead show a message and exit gracefully, when the input is an invalid smartctl JSON.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
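The invalid-input behavior described in that last commit can be sketched like this. `predict_from_smartctl` is a hypothetical wrapper for illustration, not the module's actual API; the point is that a parse failure produces a message and a clean early return rather than an uncaught exception.

```python
import json

def predict_from_smartctl(raw):
    """Hypothetical wrapper: parse smartctl output and return a health
    label, or None (with a message) instead of crashing on bad input."""
    try:
        data = json.loads(raw)
    except (ValueError, TypeError):
        # Invalid smartctl JSON: report it and bail out gracefully.
        print("invalid smartctl JSON; skipping prediction")
        return None
    if not isinstance(data, dict):
        print("unexpected smartctl JSON structure; skipping prediction")
        return None
    # A real caller would feed `data` to the predictor model here.
    passed = data.get("smart_status", {}).get("passed", False)
    return "good" if passed else "bad"
```

A test can then assert both paths: a malformed string yields `None` without raising, and a well-formed healthy sample yields "good".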
diskprediction_local unit tests do not have tox (make check) declarations. tox.ini installs the requirements needed by the diskprediction_local module, so the tests run in the expected environment. Tests can now be run with:

$ ./do_cmake
$ cd build
$ ctest -R run-tox-mgr-diskprediction_local -V

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
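For readers unfamiliar with how tox fits in here, a minimal sketch of such a declaration might look like the fragment below. The section name, paths, and use of pytest are assumptions for illustration; the real tox.ini under src/pybind/mgr differs.

```ini
# Hypothetical tox environment sketch -- not Ceph's actual tox.ini.
[tox]
envlist = py3
skipsdist = true

[testenv:py3]
deps =
    ; install the module's pinned deps (e.g. sklearn, numpy)
    -rdiskprediction_local/requirements.txt
    pytest
commands =
    pytest diskprediction_local/tests/
```

The important part is the `deps` line pulling in requirements.txt, which is what makes the CMake-driven `ctest -R run-tox-mgr-diskprediction_local` run in the environment the predictors expect.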
In order to verify that diskprediction_local models function properly across all supported distributions, we can simply run their unit tests via this rados suite. We do not need to bring up a cluster, since we use smartctl JSON samples to mock the input to the models. This by itself will mock the loading of the required dependencies, and will provide us with an accurate indication of the behavior of the diskprediction_local module on each distribution.

Please note:
1. These tests will fail when:
   - sklearn or numpy are not installed.
   - sklearn or numpy are installed, but with different (older or newer) versions that break the code (due to lack of backward compatibility, or API changes).
2. Currently the tests print warnings that several functions used by the models will be deprecated in newer sklearn versions, but the tests are considered to pass successfully.
3. Nautilus runs Python 2; sklearn 0.20 was the last version to support Python 2.7.

Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
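The dependency caveats in point 1 above can be surfaced up front with a small probe like the following. It is a sketch, not part of the suite: it only reports whether the two packages import and what versions they carry, without reproducing the exact pins from requirements.txt.

```python
import importlib

def installed_version(pkg):
    """Return the package's __version__, or None if it isn't importable."""
    try:
        mod = importlib.import_module(pkg)
    except ImportError:
        return None
    return getattr(mod, "__version__", None)

# The diskprediction_local tests fail if either package is missing, or
# if its version drifts from the one pinned in requirements.txt.
for pkg in ("sklearn", "numpy"):
    version = installed_version(pkg)
    if version is None:
        print("%s is not installed; diskprediction_local tests will fail" % pkg)
    else:
        print("%s %s" % (pkg, version))
```

Running this inside the tox environment (versus the bare system interpreter) makes it easy to see which of the two environments a test failure actually came from.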
jenkins retest this please
Logic was missing, and diskprediction_local CMake tests could not pass. Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
Regarding backporting these tests to Octopus and Nautilus: the missing sklearn dependency is fixed and merged in master. Whenever we do backport to Nautilus, though, we should change:
Update: We are still waiting for the availability of the scikit-learn package in EPEL 8.
@yaarith I cannot find a Bugzilla ticket asking for python3-scikit-learn in EPEL 8, nor can I find the epel8 branch at https://src.fedoraproject.org/rpms/python-scikit-learn. Have we pinged the maintainer about creating the epel8 branch?
Hi @ktdreyer, regarding @tchaikov's comment here, I'm not sure what the next steps are once we have the packages ready on our end.
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
This PR adds a sample smartctl JSON and unit tests to check that the Red Hat predictor and ProphetStor predictor models don't throw errors and yield valid outputs when using the versions of the Python packages specified in requirements.txt.
Fixes: https://tracker.ceph.com/issues/47448
Signed-off-by: Karanraj Chauhan <kachau@redhat.com>
Hey @yaarith, I wrote some tests to address point 1 of this issue. Please lmk what you think :)
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox