Add scikit-learn to CMSSW environment #2387

Closed
gkasieczka opened this issue Jul 13, 2016 · 33 comments

@gkasieczka

Dear Colleagues,

I would like to request adding scikit-learn [1] to the CMSSW environment. scikit-learn is a widely used Python package for machine learning. It is also increasingly used in CMS, especially in b-tagging.

As pointed out by Valentin [2], spec files and RPMs already exist in a branch of cmsdist [3].

The spec for scikit-learn is here:
https://github.com/cms-sw/cmsdist/blob/comp_gcc493/py2-scikit-learn.spec

I would, however, propose moving to the latest version of scikit-learn (0.17.1).

The key dependencies are numpy and scipy (both have spec files in [3] as well).

I would also suggest including pandas [4,5], which makes data handling much more convenient when preparing input for scikit-learn.

Thank you!

[1] http://scikit-learn.org/stable/
[2] https://hypernews.cern.ch/HyperNews/CMS/get/swDevelopment/3355/1.html
[3] https://github.com/cms-sw/cmsdist/tree/comp_gcc493
[4] http://pandas.pydata.org/
[5] https://github.com/cms-sw/cmsdist/blob/comp_gcc493/py2-pandas.spec

@cmsbuild
Contributor

A new Issue was created by @gkasieczka .

@davidlange6, @smuzaffar, @Degano, @davidlt, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here: cms-sw/cmssw#13029

@jpata

jpata commented Jul 13, 2016

Dear BTV conveners (@acaudron @pvmulder @mverzett @imarches @carolinecollard, ...)
Dear Tau conveners (@veelken, ...)

Would you lend your support to having a recent version of scikit-learn available with CMSSW? That would simplify training efforts for various MVAs.

@davidlt
Contributor

davidlt commented Jul 13, 2016

@davidlange6 your ACK will be required here.

@mverzett

@jpata, that would indeed be a nice addition. I am currently running everything on a custom anaconda build because of this.

It would also be nice if the package were kept up to date.

@blinkseb

Can we also please update the version of matplotlib we ship in CMSSW? It's currently very old, and it's the perfect tool to plot results from scikit :)

@davidlt
Contributor

davidlt commented Jul 13, 2016

We did have version 1.4.1 of matplotlib in 2015, but that had to be reverted due to some issues. Sadly, the commit message does not say what they were, so it will be hard to follow up. @Degano, do you recall what the issues were?

Other than that, all of this would be first tested in DEVEL IBs before proposing final changes to stable IBs.

Keeping it up to date will require someone to maintain it, or at least someone to check for new versions/options and open CMSDIST issues to update.

@davidlt
Contributor

davidlt commented Jul 13, 2016

@vkuznet
Contributor

vkuznet commented Jul 13, 2016

I maintain the ML stack within the comp_gcc493 repository, and I am currently building recent versions of numpy, scipy, pandas, scikit-learn, xgboost, vw, etc. All of them are used by the package I'm responsible for, DCAFPilot.

@mbluj

mbluj commented Jul 13, 2016

Hello,

As the Tau conveners were pinged, I am answering: yes, we are interested in having scikit-learn available among the external packages of CMSSW, which will make it easier to carry out studies towards replacing our TMVA-based tools with ones based on scikit-learn.
Best,
Michal

@ghost ghost assigned iahmad-khan Aug 1, 2016
@iahmad-khan
Contributor

Matplotlib and numpy related changes have been merged in #2391

@iahmad-khan
Contributor

Merged in #2426

@veelken

veelken commented Aug 4, 2016

Dear all,

Can you let me know in which CMSSW version (or IB) scikit-learn will be included by default?
One of the tau ID experts in the Tau POG would like to start using it.

Thank you very much,

Christian

@davidlange6
Contributor

At the moment, it's in the 81x integration build. It will be part of 810pre10, and then it can be tested and backported to 80x if desired (I would desire it).

david


@jpata

jpata commented Aug 25, 2016

Hi @davidlange6 , pinging also @gkasieczka

scikit-learn doesn't seem to be there in the 810_pre10, perhaps we want to reopen:

$ find /cvmfs/cms.cern.ch/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_0_pre10 -path "*scikit*" | wc -l
0

Or am I looking in the wrong place?

@gkasieczka
Author

Hi,

I would have expected it to show up in:
/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external
(where it isn't either).

@smuzaffar
Contributor

@gkasieczka , it is only available in CMSSW_8_1_DEVEL_X IBs

scram p CMSSW_8_1_DEVEL_X_2016-08-24-2300
cd CMSSW_8_1_DEVEL_X_2016-08-24-2300
cmsenv
python -c 'import sklearn; print "OK"'

Can you please check and confirm that it works as expected? Then we can include it in normal 80X IBs.

@davidlange6
Contributor

Ah, thanks for checking. It seems we only put this into the DEVEL branches and not the mainstream. Not sure why. Will fix asap.


@gkasieczka
Author

@smuzaffar Yes, the test works. However, the DEVEL branch still takes it from /cvmfs/cms-ib.cern.ch/, which is not guaranteed to be available on grid sites.

@davidlange6 Thank you!

@aknayak

aknayak commented Aug 31, 2016

Hi, I checked that scikit-learn is available in CMSSW_8_1_0_pre11, but I did not find root_numpy. Is it not needed? I think it is needed to convert ROOT trees into a form that sklearn can read. Is it possible to add it? Thanks.

@aknayak

aknayak commented Oct 6, 2016

Hi, there is a problem in 810_pre12, shown below; it works fine in pre11. The problem seems to be due to scipy. How can I find out which versions of sklearn and scipy are included in pre11 and pre12?

import sklearn
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scikit-learn/0.17.1-giojec/lib/python2.7/site-packages/sklearn/__init__.py", line 57, in <module>
from .base import clone
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scikit-learn/0.17.1-giojec/lib/python2.7/site-packages/sklearn/base.py", line 11, in <module>
from .utils.fixes import signature
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scikit-learn/0.17.1-giojec/lib/python2.7/site-packages/sklearn/utils/__init__.py", line 11, in <module>
from .validation import (as_float_array,
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scikit-learn/0.17.1-giojec/lib/python2.7/site-packages/sklearn/utils/validation.py", line 16, in <module>
from ..utils.fixes import signature
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scikit-learn/0.17.1-giojec/lib/python2.7/site-packages/sklearn/utils/fixes.py", line 324, in <module>
from scipy.sparse.linalg import lsqr as sparse_lsqr
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/sparse/linalg/__init__.py", line 109, in <module>
from .isolve import *
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/__init__.py", line 8, in <module>
from .lgmres import lgmres
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/lgmres.py", line 8, in <module>
from scipy.linalg import get_blas_funcs
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/linalg/__init__.py", line 172, in <module>
from .misc import *
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/linalg/misc.py", line 6, in <module>
from .lapack import get_lapack_funcs
File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/linalg/lapack.py", line 356, in <module>
from scipy.linalg import _flapack
ImportError: /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-scipy/0.16.1-giojec/lib/python2.7/site-packages/scipy/linalg/_flapack.so: undefined symbol: sgegv

@davidlange6
Contributor

interesting - we definitely need a unit test...

you can look at

scram tool list
scram tool info
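
The versions can also be queried from Python itself inside the release environment. This is only a minimal sketch, assuming the packages expose `__version__` (numpy, scipy and scikit-learn do); the helper name `report_versions` is hypothetical. It also reports the file each module was loaded from, since the install path (as in the traceback above) shows which external was actually picked up:

```python
import importlib


def report_versions(module_names):
    """Map each module name to (version, file path), or to an error string."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # a broken build also surfaces as ImportError
            report[name] = "import failed: %s" % exc
            continue
        version = getattr(module, "__version__", "unknown")
        # The file path shows which installation was actually picked up.
        path = getattr(module, "__file__", "unknown")
        report[name] = (version, path)
    return report


if __name__ == "__main__":
    for name, info in sorted(report_versions(["numpy", "scipy", "sklearn"]).items()):
        print(name, info)
```

Running this once in pre11 and once in pre12 and comparing the two outputs should show whether the scipy or scikit-learn build changed between them.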


@davidlt
Contributor

davidlt commented Oct 6, 2016

Correct. In order to make sure things are working:

  • Tests are needed to make sure all components are working correctly together (they aren't according to this issue)
  • Tests are needed to ensure that result continues to be valid while we are updating compilers and other libraries
  • Tests must be part of CMSSW IB otherwise no one will notice failures
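
As a starting point, such a test could be as small as an import check, since the failure reported here already happens at import time. The following is a sketch only: the module list is an assumption about what the release should ship, and `failed_imports` and `ImportSmokeTest` are hypothetical names, not existing CMSSW tests.

```python
import importlib
import unittest

# Hypothetical list; adjust to whatever the release is expected to ship.
MODULES = ["numpy", "scipy.linalg", "scipy.sparse.linalg", "sklearn"]


def failed_imports(module_names):
    """Import each module and return {name: error message} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return failures


class ImportSmokeTest(unittest.TestCase):
    def test_ml_stack_imports(self):
        # An empty dict means every module imported cleanly.
        self.assertEqual(failed_imports(MODULES), {})
```

Saved as a file, it can be run with `python -m unittest <module name>`. A missing symbol such as the one above would then show up as a test failure naming the affected module.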

In this particular case, I guess py2-scipy was built against the system lapack library, whereas in CMSSW we use our own, which does not provide the sgegv symbol: our CMSSW_8_1_0_pre12/external/slc6_amd64_gcc530/lib/liblapack.so.3 does not have it. It looks like this particular symbol has been deprecated in the lapack library.
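
Whether a given shared library exports a symbol can be checked from Python with `ctypes`. This is a rough sketch with a hypothetical helper name, `library_exports`. The exact symbol name to look for is an assumption: Fortran routines are usually exported with a trailing underscore, so both `sgegv` and `sgegv_` may be worth trying.

```python
import ctypes


def library_exports(library_path, symbol):
    """Return True if the shared library at library_path exports symbol.

    Passing library_path=None inspects the symbols visible to the running process.
    """
    library = ctypes.CDLL(library_path)
    # ctypes resolves symbols lazily on attribute access and raises
    # AttributeError when the dynamic linker cannot find the name.
    return hasattr(library, symbol)


# Example call (the path is a placeholder, not a verified location):
# library_exports("/path/to/liblapack.so.3", "sgegv_")
```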

Well, we don't run any tests while building RPMs, so we never know whether a package is working.

@davidlange6
Contributor

so the problem is that we build against the wrong lapack library - @iahmad-khan - can you have a look at using the one we distribute while building py2-scipy?

@iahmad-khan
Contributor

i am looking into it

"In lapack 3.6.0 some deprecated functions which are used in scipy were dropped. Have to use lapack 3.5.0 or lower in order to get things to work back or upgrade scipy."

@iahmad-khan
Contributor

SciPy 0.16.1 (released 2015-10-24), which is in use now, is too old; the newest is SciPy 0.18.0 (released 2016-07-25). I am going to upgrade and test.

@davidlange6
Contributor

ok - so likely my fault for thinking to update lapack.
#2527
my notes for why are lacking - hopefully this does not break something else (I think it's a problem for something else not yet introduced, but that's ok)


@iahmad-khan
Contributor

@davidlange6 , ok , so should i downgrade lapack or upgrade scipy?

@davidlange6
Contributor

best to upgrade scipy if that is an option. yes.


@iahmad-khan
Contributor

Ok

@davidlt
Contributor

davidlt commented Oct 6, 2016

Add latest version of scipy to DEVEL IBs.

@iahmad-khan
Contributor

ok , [SciPy 0.18.0 released 2016-07-25] is the latest.

@davidlt
Contributor

davidlt commented Oct 6, 2016

@aknayak if you want this working in future releases, you (or someone else) will need to provide some QA to validate this machinery.

@aknayak

aknayak commented Oct 6, 2016

I can do a quick test, let me know the version where it is included.
