Add scikit-learn to CMSSW environment #2387
A new Issue was created by @gkasieczka. @davidlange6, @smuzaffar, @Degano, @davidlt, @Dr15Jones can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here: cms-sw/cmssw#13029 |
Dear BTV conveners (@acaudron @pvmulder @mverzett @imarches @carolinecollard, ...), would you lend your support to having a recent version of scikit-learn available in CMSSW? That would simplify training efforts for various MVAs. |
@davidlange6 your ACK will be required here. |
@jpata, that would indeed be a nice addition. I am currently running everything on a custom Anaconda build because of this. It would also be nice if the package were kept up to date. |
Can we also please update the version of matplotlib we ship in CMSSW? It's currently very old, and it's the perfect tool to plot results from scikit :) |
We did have version 1.4.1 of matplotlib in 2015, but it had to be reverted due to some issues. Sadly, the commit message does not say what they were, so it will be hard to follow up. @Degano, do you recall what the issues were? Other than that, all of this would first be tested in DEVEL IBs before proposing final changes to stable IBs. Keeping it up to date will require someone to maintain it, or at least someone to check for new versions/options and open CMSDIST issues to update it. |
I maintain the ML stack in the comp_gcc493 branch of cmsdist, and I am currently building recent versions of numpy, scipy, pandas, scikit-learn, xgboost, vw, etc. All of them are used by DCAFPilot, a package I'm responsible for. |
Hello, as the Tau conveners were pinged, I am answering: yes, we are interested in having scikit-learn available among the external packages of CMSSW, which would make it easier to study replacing our TMVA-based tools with ones based on scikit-learn. |
Matplotlib and numpy related changes have been merged in #2391 |
Merged in #2426 |
Dear all, can you let me know in which CMSSW version (or IB) scikit-learn will be included by default. Thank you very much, Christian |
At the moment, it's in the 81X integration builds. It will be part of 810pre10, and then it can be tested and backported to 80X if desired (I would desire it). david |
Hi @davidlange6, also pinging @gkasieczka.
Or am I looking in the wrong place? |
Hi, I would have expected it to show up in: |
@gkasieczka, it is only available in CMSSW_8_1_DEVEL_X IBs.
Can you please check and confirm that it works as expected? Then we can include it in the normal 80X IBs. |
Ah, thanks for checking. It seems we only put this into the DEVEL branches and not the mainstream ones. Not sure why. Will fix ASAP.
|
@smuzaffar Yes, the tests work. However, the DEVEL branch still takes it from @davidlange6 Thank you! |
Hi, I checked that scikit-learn is available in CMSSW_8_1_0_pre11, but I did not find root_numpy. Is it not needed? I think it is needed to convert ROOT trees into a form that sklearn can read. Is it possible to add it? Thanks. |
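For context on why root_numpy matters here: its root2array/tree2array functions return a NumPy structured array with one named field per branch, while scikit-learn estimators expect a plain 2D feature matrix X and a label vector y. A minimal sketch of that conversion, assuming a hand-built structured array stands in for the root2array output; the branch names pt, eta and is_signal and the helper to_feature_matrix are hypothetical.

```python
import numpy as np


def to_feature_matrix(events, feature_names, label_name):
    """Split a structured array (as root_numpy returns) into X and y.

    Hypothetical helper; field names are whatever branches were read.
    """
    # Stack the requested fields column by column: shape (n_events, n_features).
    X = np.column_stack([events[name].astype(np.float64) for name in feature_names])
    # The label branch becomes a 1D integer vector.
    y = events[label_name].astype(np.int64)
    return X, y


# Stand-in for root2array(...): three events with hypothetical branches.
events = np.array(
    [(45.2, 0.3, 1), (22.7, -1.1, 0), (60.1, 2.0, 1)],
    dtype=[("pt", "f4"), ("eta", "f4"), ("is_signal", "i4")],
)
X, y = to_feature_matrix(events, ["pt", "eta"], "is_signal")
print(X.shape, y.tolist())
```

The resulting X and y can then be passed to any estimator's fit(X, y).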
Hi, there is a problem in 810_pre12, shown below. It works OK in pre11. The problem seems to be due to scipy. How can I find out which versions of sklearn and scipy are included in pre11 and pre12? import sklearn |
Interesting, we definitely need a unit test. You can look at scram tool list.
|
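scram tool list shows which externals a release is configured with. As a complementary check from Python inside the release environment, a sketch like the following prints the versions that actually get imported. It assumes only importlib and the conventional __version__ attribute; module_versions is a hypothetical helper name. Running it in both pre11 and pre12 would show whether the sklearn or scipy versions differ.

```python
import importlib


def module_versions(names):
    """Return {module name: version string, or None if not importable or unversioned}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Missing module, or a compiled extension that fails to load
            # (e.g. an undefined symbol in a linked library).
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    for name, version in module_versions(["numpy", "scipy", "sklearn"]).items():
        print(name, version if version is not None else "not importable")
```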
Correct. In order to make sure things are working:
In this particular case, I guess py2-scipy was built against the system LAPACK library, while in CMSSW we use our own, which does not provide … Well, we don't run any tests while building RPMs, so we never know whether a package is working. |
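Because nothing is exercised at RPM build time, a breakage like a missing LAPACK symbol only surfaces when a user imports the module. A minimal post-build smoke test could import each package and call one LAPACK-backed routine. This is a sketch, not existing CMSSW or cmsdist tooling: check_import, CHECKS and main are hypothetical names, and it assumes it runs inside the release environment.

```python
import importlib


def check_import(module_name, probe=None):
    """Import module_name and optionally call probe(module); return (ok, message)."""
    try:
        module = importlib.import_module(module_name)
        if probe is not None:
            probe(module)
    except Exception as exc:  # ImportError for unresolved symbols, or a failing probe
        return False, "%s: %s: %s" % (module_name, type(exc).__name__, exc)
    return True, "%s: ok" % module_name


def _probe_scipy_linalg(linalg):
    # Solve a tiny diagonal system; scipy.linalg.solve goes through LAPACK.
    x = linalg.solve([[2.0, 0.0], [0.0, 4.0]], [2.0, 8.0])
    assert abs(x[0] - 1.0) < 1e-12 and abs(x[1] - 2.0) < 1e-12


# Hypothetical list of checks; extend with other externals as needed.
CHECKS = [
    ("numpy", None),
    ("scipy.linalg", _probe_scipy_linalg),
    ("sklearn", None),
]


def main():
    """Run all checks and return a process exit status (0 if everything passed)."""
    results = [check_import(name, probe) for name, probe in CHECKS]
    for _, message in results:
        print(message)
    return 0 if all(ok for ok, _ in results) else 1
```

A wrapper script could end with sys.exit(main()) so that a failing import marks the build or IB test as failed.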
So the problem is that we build against the wrong LAPACK library. @iahmad-khan, can you have a look at using the one we distribute when building py2-scipy? |
I am looking into it: "In lapack 3.6.0 some deprecated functions which are used in scipy were dropped. Have to use lapack 3.5.0 or lower in order to get things to work back or upgrade scipy." |
SciPy 0.16.1 (released 2015-10-24), which is in use now, is too old; the newest is SciPy 0.18.0 (released 2016-07-25). I am going to upgrade and test. |
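The diagnosis above boils down to a version check. A minimal sketch, assuming plain dotted numeric version strings and taking the thresholds from the comments: LAPACK newer than 3.5.0 combined with SciPy 0.16.1 is treated as incompatible. Extending the rule to SciPy releases older than 0.16.1, and treating 0.18.0 as unaffected, are assumptions that the planned upgrade test is meant to confirm. Versions are compared as integer tuples because plain string comparison would rank "0.9.0" above "0.18.0". The helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '0.16.1' into (0, 16, 1) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


# Thresholds taken from the discussion; hypothetical helper, not part of cmsdist.
LAST_SAFE_LAPACK = parse_version("3.5.0")
LAST_AFFECTED_SCIPY = parse_version("0.16.1")


def lapack_scipy_conflict(lapack_version, scipy_version):
    """True if LAPACK is newer than 3.5.0 while SciPy is 0.16.1 or older."""
    return (parse_version(lapack_version) > LAST_SAFE_LAPACK
            and parse_version(scipy_version) <= LAST_AFFECTED_SCIPY)


print(lapack_scipy_conflict("3.6.0", "0.16.1"))  # True: the combination described above
print(lapack_scipy_conflict("3.6.0", "0.18.0"))  # False under the assumed rule
```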
OK, so it's likely my fault for deciding to update LAPACK.
|
@davidlange6, OK, so should I downgrade LAPACK or upgrade scipy? |
Best to upgrade scipy if that is an option, yes.
|
Ok |
Add latest version of scipy to DEVEL IBs. |
OK, SciPy 0.18.0 (released 2016-07-25) is the latest. |
@aknayak if you want this working in future release, you (or someone else) will need to provide some QA to validate this machinery. |
I can do a quick test, let me know the version where it is included. |
Dear Colleagues,
I would like to request adding scikit-learn [1] to the CMSSW environment. scikit-learn is a widely used Python package for machine learning. It is also increasingly used in CMS, especially in b-tagging.
As pointed out by Valentin [2], spec files and RPMs already exist in a branch of cmsdist [3].
The spec for scikit-learn is here:
https://github.com/cms-sw/cmsdist/blob/comp_gcc493/py2-scikit-learn.spec
However, I would propose moving to the latest version of scikit-learn (0.17.1).
The key dependencies are numpy and scipy (both also have spec files in [3]).
I would also suggest including pandas [4,5], which makes it much more convenient to prepare input data for scikit-learn.
Thank you!
[1] http://scikit-learn.org/stable/
[2] https://hypernews.cern.ch/HyperNews/CMS/get/swDevelopment/3355/1.html
[3] https://github.com/cms-sw/cmsdist/tree/comp_gcc493
[4] http://pandas.pydata.org/
[5] https://github.com/cms-sw/cmsdist/blob/comp_gcc493/py2-pandas.spec