mgr/diskprediction Add diskprediction plugin service #22239
b847f8b
to
3cd1144
Compare
Discussed offline yesterday. Summary:
|
Blocked by this work: https://pad.ceph.com/p/smart |
92a3568
to
5bc4cbc
Compare
4d237ce
to
407813e
Compare
Discussed offline today:
@hsiang41 Let me know if I missed anything or got it wrong! |
doc/mgr/diskprediction.rst
Outdated
The connection settings can be configured on any machine with the proper cephx
credentials; they are usually the monitor node with client.admin keyring.
Run the following command to set up connection betweet Ceph system and
s/betweet/between/
8b07896
to
a2614c2
Compare
50ed034
to
fabaaca
Compare
Add plugin failed reason in the command status. Rename partial command prefix to be device. Change local predictor data related on the devicehealth history. Signed-off-by: Rick Chen rick.chen@prophetstor.com
fabaaca
to
8ba72c8
Compare
I'm getting
when using the local prediction mode. |
Hi Sage,
Did you pip install the requirements.txt that is stored in the diskprediction plugin's install path?
The old pickle did not support unicode. Please try the command below.
pip install -r <mgr installed path>/diskprediction/requirements.txt --upgrade
From: Sage Weil <notifications@github.com>
Sent: Saturday, September 1, 2018 6:21 AM
To: ceph/ceph <ceph@noreply.github.com>
Cc: Rick Chen <rick.chen@prophetstor.com>; Mention <mention@noreply.github.com>
Subject: Re: [ceph/ceph] mgr/diskprediction Add diskprediction plugin service (#22239)
I'm getting
$ bin/ceph device predict-life-expectancy WDC_WD6002FFWX-68TZ4N0_K1GX50LD
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2018-08-31 17:18:14.302 7faa22924700 -1 WARNING: all dangerous and experimental features are enabled.
2018-08-31 17:18:14.321 7faa22924700 -1 WARNING: all dangerous and experimental features are enabled.
2018-08-31 17:18:14.578 7faa22924700 0 mgrc start_command no mgr session (no running mgr daemon?), waiting
Error EINVAL: Traceback (most recent call last):
  File "/home/sage/src/ceph/src/pybind/mgr/diskprediction/module.py", line 329, in handle_command
    return fun(inbuf, cmd)
  File "/home/sage/src/ceph/src/pybind/mgr/diskprediction/module.py", line 298, in _predict_life_expectancy
    result = obj_predictor.query_info('', cmd['dev_id'], '')
  File "/home/sage/src/ceph/src/pybind/mgr/diskprediction/common/localpredictor.py", line 106, in query_info
    predicted_result = self._local_predict(predict_datas)
  File "/home/sage/src/ceph/src/pybind/mgr/diskprediction/common/localpredictor.py", line 74, in _local_predict
    return obj_predictor.predict(smart_datas)
  File "/home/sage/src/ceph/src/pybind/mgr/diskprediction/predictor/DiskFailurePredictor.py", line 210, in predict
    clf = joblib.load(modelpath)
  File "/usr/lib/python2.7/site-packages/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/lib/python2.7/site-packages/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/usr/lib64/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/site-packages/joblib/numpy_pickle.py", line 328, in load_build
    Unpickler.load_build(self)
  File "/usr/lib64/python2.7/pickle.py", line 1230, in load_build
    d = inst.__dict__
AttributeError: 'unicode' object has no attribute '__dict__'
when using the local prediction mode.
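The traceback above reaches the CLI user as a raw unpickling error. A minimal sketch of how a command handler could report a readable failure reason instead, assuming the ceph-mgr convention that a command handler returns a `(retval, stdout, stderr)` tuple. `run_prediction`, `load_model`, `predict`, and `broken_loader` are hypothetical names standing in for the joblib loader and the predictor; this is not the module's actual code.

```python
import errno


def run_prediction(load_model, predict, smart_data):
    """Return (retval, stdout, stderr); a model load failure becomes a readable reason."""
    try:
        model = load_model()
    except Exception as exc:  # e.g. the AttributeError raised while unpickling
        reason = 'unable to load prediction model: %s' % exc
        return -errno.EINVAL, '', reason
    return 0, str(predict(model, smart_data)), ''


def broken_loader():
    # Hypothetical stand-in that fails the same way as the traceback above.
    raise AttributeError("'unicode' object has no attribute '__dict__'")


print(run_prediction(broken_loader, None, []))
```

The handler still fails with EINVAL, but the stderr string names the failing step, which is easier to act on than a stack trace through pickle internals.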
google==2.0.1
google-api-python-client==1.7.3
google-auth==1.5.0
google-auth-httplib2==0.0.3
google-gax==0.12.5
googleapis-common-protos==1.5.3
grpc==0.3.post19
grpc-google-logging-v2==0.8.1
grpc-google-pubsub-v1==0.8.1
grpcio==1.14.1
mock==2.0.0
numpy==1.9.0
scipy==0.13.3
sklearn==0.0
|
@liewegas Did you pip install the requirements.txt that is stored in the diskprediction plugin's install path? The numpy library needs to be at least version 1.8.2.
Okay, so we have a larger challenge here of translating the requirements.txt into rpm package versions and adding them to ceph.spec.in and debian/control. I'm not sure where the pickle version you're referring to is coming from? |
The pickle support comes from a numpy-dependent library, and older versions of numpy did not support unicode. Can we modify install-deps.sh to pip install the requirements?
@jcsp I'm assuming the preferred path is to rely on installed packages for everything. This makes me a bit nervous as there are a lot of dependencies here. Is there an option to do a virtualenv and bundle the dependencies? |
@liewegas Bundling python dependencies is generally impractical if they have native code components (as e.g. numpy does). The good news is that scikit-learn (version 0.18.1) is already packaged in Fedora at least. The google/grpc bits I'm not so sure, but given that they're only used by DiskProphet's product users (right?) maybe we can worry less about them; perhaps any packaging that needs doing for that could happen outside of the upstream Ceph packaging.
@hsiang41 a couple of questions for you: I see that the requirements.txt is referencing sklearn==0.0, where the sklearn page on pypi says to use scikit-learn instead. Is there a significant difference in the interfaces?
local model -> recall: 63%, false alarm: 6.5%, accuracy: 78.25% |
ad9ac72
to
4921a4e
Compare
@jcsp The sklearn interface is the same as scikit-learn. I modified the requirements.txt to include scikit-learn, and also updated the dependent libraries as below:
55a274f
to
1b723ec
Compare
@hsiang41 it would be useful to work out how those versions relate to what's available in major distros, (e.g. centos7 has numpy 1.7.1, is that recent enough?) and whether the versions in distros are compatible with your code. The goal is to know whether we can just add dependency lines to the RPM packaging, or whether somebody would need to create special packages in order to use this ceph-mgr module on certain distros. Regarding the grpc dependencies, am I correct in thinking those are only relevant to people using your cloud engine? |
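One way to approach the question above is to compare each minimum version against what a distro ships. The sketch below is only illustrative: the helper names are hypothetical, the numpy floor of 1.8.2 and the CentOS 7 numpy 1.7.1 figure are the ones quoted in this thread, and the comparison is a naive numeric one that does not handle pre-release or epoch suffixes.

```python
def parse_version(text):
    """Turn '1.8.2' into (1, 8, 2); non-numeric characters are dropped (naive)."""
    parts = []
    for piece in text.split('.'):
        digits = ''.join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def meets_minimum(available, minimum):
    """True if the available version is at least the minimum."""
    a, m = parse_version(available), parse_version(minimum)
    width = max(len(a), len(m))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


def check_distro(minimums, distro_versions):
    """Return {package: (available, minimum)} for packages that are missing or too old."""
    short = {}
    for name, minimum in minimums.items():
        available = distro_versions.get(name)
        if available is None or not meets_minimum(available, minimum):
            short[name] = (available, minimum)
    return short


MINIMUMS = {'numpy': '1.8.2'}   # floor mentioned earlier in this thread
CENTOS7 = {'numpy': '1.7.1'}    # version quoted in the comment above
print(check_distro(MINIMUMS, CENTOS7))  # {'numpy': ('1.7.1', '1.8.2')}
```

Any package that shows up in the result would need either a newer distro package or a specially built one before a plain RPM dependency line could work.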
@jcsp I have tested the rpm packages below on my machine (CentOS Linux release 7.5.1804 (Core)), and they work with the local predictor. The grpc dependencies are used by the diskprediction plugin to push data into the cloud service, but I did not find any rpm package for grpc. Do you have any advice about this problem?
1b723ec
to
5933373
Compare
@@ -0,0 +1,1775 @@
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: mainServer.proto
can we include mainServer.proto in the source tree instead of the generated python binding?
@tchaikov If we add mainServer.proto to the source tree, we would need to add a converter script that turns the proto into py code in the ceph-mgr deploy script. Do we need to do this?
@hsiang41 i am not sure if i am following you. could you define "ceph-mgr deploy script"? is it the src/pybind/mgr/diskprediction/CMakeLists.txt?
@tchaikov My concern is that if the proto-to-python generation is added to CMakeLists.txt, building ceph would require installing the grpc library and the grpc plugins library. These libraries do not have rpm packages.
@hsiang41 i see.. so what we need is to have the grpc_tools.protoc and googleapis-common-protos python modules ready for compiling the .proto definition file. since this grpc server is hosted in the cloud, and the grpc service is only available to users of this cloud service, i'd suggest packaging it downstream.
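If the .proto file were shipped instead of the generated binding, the Python code could be regenerated with the grpcio-tools package, which exposes the compiler as `grpc_tools.protoc`. A minimal sketch, assuming `mainServer.proto` is available locally; `protoc_args`, `generate_bindings`, and the output directory are hypothetical names, extra include paths for googleapis-common-protos are not handled, and nothing here reflects how this PR actually builds the bindings.

```python
import os


def protoc_args(proto_file, out_dir):
    """Build the argument list for grpc_tools.protoc.main()."""
    proto_path = os.path.abspath(proto_file)
    return [
        'grpc_tools.protoc',                  # argv[0] placeholder
        '-I' + os.path.dirname(proto_path),   # directory searched for imports
        '--python_out=' + out_dir,            # message classes (*_pb2.py)
        '--grpc_python_out=' + out_dir,       # service stubs (*_pb2_grpc.py)
        proto_path,
    ]


def generate_bindings(proto_file, out_dir='.'):
    """Compile proto_file into Python bindings; returns protoc's exit code."""
    try:
        from grpc_tools import protoc
    except ImportError:
        raise RuntimeError('grpcio-tools is not installed; cannot compile %s'
                           % proto_file)
    return protoc.main(protoc_args(proto_file, out_dir))


if __name__ == '__main__':
    # Only print the command; call generate_bindings() where grpcio-tools exists.
    print(' '.join(protoc_args('mainServer.proto', '.')))
```

Keeping the import inside `generate_bindings` means a build without grpcio-tools fails only when regeneration is actually requested, which fits the suggestion to handle the grpc pieces downstream.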
@@ -0,0 +1,77 @@
{
my concern is the license of the pre-trained SVM models. because this dataset are pickled SVM classifiers, which by themselves are machine readable after being un-pickled. but what about its "source"? or are they in the source form already? if yes, how is user allowed to "modify" it in an effective way even he/she understands the SVM and the python language? as LGPL 2.1 requires the work to be accompanied with the source code of it. if we cannot provide the source, we will have to re-distribute these data files in a different license.
i had a hard time when preparing[0] a software package which used a statistical language model for debian. the packaged software was licensed under LGPL2.1 and CDDL. and the package was rejected by debian's FTP master because of the license of the pre-trained data: we licensed it under the same dual license.
yes, in this context, we are the upstream developers not downstream maintainer. but i think it's worthy of mention.
[0] https://lists.debian.org/debian-devel/2008/05/msg00005.html
Interesting thought. In my opinion, the original dataset would not constitute "source code" for licensing purposes, but I can see how there could be some debate. It is probably prudent to apply a different license to the model.
Perhaps just declare the model files as public domain -- @hsiang41 , what do you think?
Using a different license for the model files would be the easiest thing.
@jcsp Agreed. But I do not know how to do this. Do you have a sample? Or do I need to add some comment in my project?
@hsiang41 in the top level COPYING file you can see a list of various exceptions. I'd add a section at the bottom of that, and also add a COPYING file in your models/ directory that makes a statement that these particular files are donated by ProphetStor to be used by anyone for any purpose and you make no copyright claims over them.
@hsiang41 probably you could update https://github.com/ceph/ceph/blob/master/COPYING , https://github.com/ceph/ceph/blob/master/debian/copyright accordingly in this PR ? like
diff --git a/COPYING b/COPYING
index cd45ce086a..f0a37b8bf3 100644
--- a/COPYING
+++ b/COPYING
@@ -145,3 +145,7 @@ Files: src/include/timegm.h
Copyright (C) Copyright Howard Hinnant
Copyright (C) Copyright 2010-2011 Vicente J. Botet Escriba
License: Boost Software License, Version 1.0
+
+Files: src/pybind/mgr/diskprediction/predictor/models/*
+Copyright: None
+License: Public domain
OK, so this is the challenging part. If we have a dependency on python-scikit-learn, then we probably also need to be providing (+ therefore building, as we can't rely on third party repos) that package. I see that Fedora has a python-scikit-learn package, Ubuntu 16.04 has a python-sklearn package, and SUSE has it in tumbleweed+leap 15, but not in SLES. In the short term, the answer is probably to just include this module, but make clear to users that they will need to find their own python-scikit-learn packages before using it.
That's probably up to you, as it would only be diskprophet customers that are affected. Your options are basically to build packages yourself, or ask your customers to install using pip if they are comfortable with that. |
@votdev Can you help review my changes? They already follow your advice.
Refresh local predictor model. Signed-off-by: Rick Chen rick.chen@prophetstor.com
5933373
to
f3f595e
Compare
1. Refresh diskprediction plugin doc guide. 2. Change the COPYING file. Signed-off-by: Rick Chen rick.chen@prophetstor.com
Correct command "ceph device set-cloud-prediction-config' typo. Signed-off-by: Rick Chen rick.chen@prophetstor.com
Merged via #24104 |
The DiskProphet plugin service continuously collects time series data and sends it to a DiskProphet server. Users have the option to fetch the predicted health state of an OSD's physical disk. The physical disk prediction result is stored in the ceph device info (#22423).
The plugin has two modes.
Local - The plugin includes an internal predictor module, which uses device health data to make a simple prediction.
Cloud - This mode relies on data pushed by the plugin, including ceph cluster/mon/osd status and workload, to predict device health.
Signed-off-by: Rick Chen rick.chen@prophetstor.com
To test, ping sage in #ceph-devel for a credential to use, or see /ceph/diskprediction_config.txt on teuthology.