implement lightweight CRAB distribution as part of CMSSW distribution #4844

Closed
belforte opened this issue Oct 31, 2019 · 86 comments

Comments

@belforte
Member

As per kickoff chat on Oct 30. Participants were:
Stefano, Marco Mascheroni, Leonardo, Bockjoo, Shahzad, David Lange.

Notes from the chat from Stefano. If anything wrong is written, my fault.
These are also in the Minutes in Indico:
https://indico.cern.ch/event/859940/

Thanks Leonardo for taking responsibility for this from now on.

CRAB Client is python 2.7 only, and its only dependency is pycurl.
Crab developers do not want to bring any binary.

We agreed to try to move CRAB Client into an external distributed via CMSSW
distribution release. The idea is that CRAB Client and CMSSW can be
executed in the same environment. Many users have python scripts where
they call CRAB python API while manipulating CMSSW configuration, we
do not want to break those.

CrabClient will not be part of the release, so CRAB version stays the same whichever
CMSSW version people use. CRAB version will be updated when the distribution is
updated, i.e. at the time of any new release build (usually every two weeks).
The distribution update will push out whatever CRAB tag we set in the spec file
(in CMSSW repository) via a PullRequest. The new version will be available in the
IB in the meanwhile.

Given the above constraints we will have two versions of CRAB Client in the
CMSSW distribution: pre and prod, or beta and prod, or whatever... which we
will update via separate PRs, so that we can update the PRE, make sure it
works, and only when sure, update production. If we push a bug in production
we may need to wait two weeks for the fix !!

If this works, Bockjoo can stop the current machinery for CRAB and will not need
to do anything.

We will most likely need to change/add some setup script in CRAB, details need
to be figured out.

Marco Mascheroni is available to help. Shahzad is willing to help us write
the needed spec file.

Leonardo is in charge of this effort and will communicate progress or problems.

Next week Marco will be at CERN and Leonardo, Shahzad and Marco will get
together to get things started.

We do not need to have a python3 compatible version. But of course in time
we may want to.

@lecriste
Contributor

Update from November 6th:
@smuzaffar will incorporate the current spec file in CMSSW and try to build RPMs. A crab-dev command will be available upon issuing cmsenv in order to perform the needed test. When we are happy it will be renamed crab and we can stop sourcing the current file in /cvmfs/cms.cern.ch/crab3/.
If this all works, the procedure to release a new client version will not change (apart from the spec file location).
A crab-pre command will also be available, pointing to the preprod RPM.
Need to make sure that the use case where the user runs a python script to build both CRAB config and CMSSW PSet as python objects and submit on the fly via CRAB client API is preserved.

@smuzaffar
Contributor

@lecriste @belforte please see cms-sw/cmsdist#5356
this is my first attempt to include crab client in the cmssw software stack. My initial understanding was that there are a few simple python scripts for crab client which only depend on pycurl, but it turned out (trying to build https://github.com/cms-sw/cmsdist/blob/comp_gcc630/crabclient.spec ) that it depends on many python and DMWM packages. For now I have simplified the build recipe ( https://github.com/cms-sw/cmsdist/pull/5356/files#diff-7d6e1f3d0fe3aaef69481cabb9985504 ) to just get the DMWM tools directly and only depend on py2-pycurl py2-python-cjson

Once the cmsdist PR is merged, the new crab wrapper scripts ( https://github.com/cms-sw/cms-common/blob/master/common/crab-prod {dev,pre} ) will be available under /cvmfs/cms-ib.cern.ch/week1/common and you should be able to run /cvmfs/cms-ib.cern.ch/week1/common/crab-prod. I have tested this locally and was able to run submit and query commands.

@belforte , this only solves the command-line usage of crab-client. The CRAB python API still needs more work.

@smuzaffar
Contributor

By the way, as mentioned during our meeting: can we change crab so that the crab wrapper script always unsets the cmssw env (i.e. by running eval scram unset -sh) and crab python sets it back whenever it has to deal with cmssw?

@belforte
Member Author

belforte commented Nov 15, 2019 via email

@belforte
Member Author

belforte commented Nov 15, 2019 via email

@smuzaffar
Contributor

crab wrapper script finds the latest version of crab ( https://github.com/cms-sw/cms-common/blob/master/common/crab-prod#L11-L15 ) and source its init.sh to set env for crab. This should set the correct path to python, pycurl and any other packages needed for crab itself.

I am not sure how exactly crab uses cmssw. If it is just running some cmssw commands to dump the cmssw configuration, then unsetting the cmssw env in the crab wrapper and setting it again (in a sub-shell/process) before calling cmssw will work. But if crab imports the cmssw python configuration then it will not work.
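
The "find the latest version" step of the wrapper depends on how version strings are ordered. Below is a minimal sketch in Python of one possible ordering. It assumes (this is not stated in the thread) that release candidates sort before the final tag and patches after it. `version_key` and `latest_version` are hypothetical names; the real wrapper is a shell script and may order versions differently.

```python
import re


def version_key(version):
    """Sort key: numeric fields first, then rc < final < patch (assumed convention)."""
    numbers, rank, serial = [], 1, 0  # rank 1 means "final tag, no suffix"
    for field in version.split("."):
        tagged = re.match(r"^(rc|patch)(\d+)$", field)
        if field.isdigit():
            numbers.append(int(field))
        elif tagged:
            rank = 0 if tagged.group(1) == "rc" else 2
            serial = int(tagged.group(2))
        # Any other field is ignored in this sketch.
    return (tuple(numbers), rank, serial)


def latest_version(versions):
    """Return the highest version string according to version_key."""
    return max(versions, key=version_key)
```

Called on the directory names of the installed crab client versions, `latest_version` would return the one whose `init.sh` the wrapper should source.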

@smuzaffar
Contributor

The problem with the python API is that when someone imports CRABClient from their python script, the python interpreter has already been started, so we cannot change the python version for crab and might not be able to unset the cmssw env

@belforte
Member Author

one thing at a time

  1. crab wrapper : I hope we can pitch it (since I never understood the intricate nesting done in there) - was hoping for @lecriste and @mmascher to take the lead here. Can you guys ?
  2. python API: python calls to CRAB/CMSSW are mixed in, but the only place where we need to do scram unsetenv is in a forked subprocess before calling dreaded grid clients like voms-proxy* and gfal* which have their own binary stack that we do not want to know about. That forked subprocess does not need the CRAB env either.
  3. python version: that's the only thing I really worry about, since at some point we may have a python 3 CRAB used with an old CMSSW release which has python 2.7 inside. Can we imagine writing CRAB Client so that it "always works" ? Or will we need to support two versions ? This would be a problem also with the current setup. Maybe/hopefully I am simply confused.
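
Point 2 can be sketched as follows: the environment is undone only inside a child shell, so the calling python process keeps its CMSSW/CRAB settings. This is a minimal sketch, not the CRABClient code. `run_outside_cmssw` is a hypothetical name, and the default prefix assumes that `scram unsetenv -sh` prints shell commands that can be passed to `eval`, as the wording in this thread suggests.

```python
import subprocess

# Assumption: this shell prefix undoes the CMSSW environment in the child shell.
UNSET_CMSSW_ENV = "eval `scram unsetenv -sh`; "


def run_outside_cmssw(command, prefix=UNSET_CMSSW_ENV):
    """Run `command` in a child shell after applying `prefix`.

    Only the child process sees the modified environment; the parent is untouched.
    Returns (exit_code, stdout, stderr).
    """
    proc = subprocess.Popen(prefix + command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

A call such as `run_outside_cmssw("voms-proxy-info")` would then run the grid client with the cleaned environment, while everything that follows in the python script still sees the original one.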

@belforte
Member Author

One problem here is that there's a large distance between what's needed to make CRABClient run, and what's declared in current spec files, and what's defined in current setup script.

FWIW I could run crab submit, crab status and crab checkwrite (hence I suspect everything) after:

  1. cmsenv
  2. export PATH=/afs/cern.ch/user/b/belforte/WORK/CRAB3/CRABClient/bin:$PATH
  3. export PYTHONPATH=<some stuff from github>:$PYTHONPATH
  4. where <some stuff from github> means
  • CRABClient (of course)
  • 3 files from CRABServer (including a PandaServerInterface.py which we should better deprecate)
  • WMCore (where we only use a very limited subset of course)
  • DBS/Client and DBS/PycurlClient

some additional definitions will be needed to e.g. make command autocompletion work, but I think that's what I mean when I say "CRAB Client is just a set of python scripts + pycurl"

question is: what's the simplest way to configure things so that it can easily be set up and run.

@belforte
Member Author

The "python version" question needs to be clarified.
How far back in python do we need to be compatible ?
e.g. we have code (which I'd like to get rid of) to protect in case of python 2.4, see #4856 (comment)

@smuzaffar
Contributor

I do not think we have any active cmssw release where we have python 2.4. CMSSW_5_3 uses python 2.6

@smuzaffar
Contributor

3 python version:
If you have two different versions of CRAB for python2 and 3 then yes, it will be hard to set up the env. How hard is it to have crab client (and its dependencies, e.g. CRABServer, DBS and WMCore) support both versions of python?

@belforte
Member Author

belforte commented Nov 18, 2019 via email

@amaltaro

Our goal for WMCore is to have it supporting both python2.7 and python3.x, at least until we can fully migrate it to python3.

@belforte
Member Author

belforte commented Nov 18, 2019 via email

@smuzaffar
Contributor

smuzaffar commented Dec 16, 2019

Crab client is now available as /cvmfs/cms-ib.cern.ch/week1/common/crab-{prod|dev|pre}. From lxplus, you should be able to run it without setting any env (you need the cmssw env for the crab-prod submit command only)

Can you please give it a try and see if you can run most of crab commands?

[a]

Traceback (most recent call last):
  File "crab/3.3.1912.rc1-cms/bin/crab", line 157, in <module>
    client = CRABClient()
  File "crab/3.3.1912.rc1-cms/bin/crab", line 49, in __init__
    self.subCommands = getAvailCommands()
  File "crab/3.3.1912.rc1-cms/lib/CRABClient/ClientUtilities.py", line 362, in getAvailCommands
    subcmdplugins = getPlugins(subcmdpath, subcmdname, ['SubCommand'])
  File "crab/3.3.1912.rc1-cms/lib/CRABClient/ClientUtilities.py", line 291, in getPlugins
    mod = __import__('%s.%s.%s' % (namespace, plugins, el[1]), globals(), locals(), el[1])
  File "crab/3.3.1912.rc1-cms/lib/CRABClient/Commands/getcommand.py", line 6, in <module>
    from CRABClient.UserUtilities import getMutedStatusInfo
  File "crab/3.3.1912.rc1-cms/lib/CRABClient/UserUtilities.py", line 13, in <module>
    from dbs.apis.dbsClient import DbsApi
  File "crab/3.3.1912.rc1-cms/lib/dbs/apis/dbsClient.py", line 8, in <module>
    import cjson
ImportError: No module named cjson

@smuzaffar
Contributor

In order to update crab in cmssw distribution one needs to request changes via PR for

For now all of these are using crab client 3.3.1912.rc1

@belforte
Member Author

belforte commented Dec 18, 2019

I can't understand this:
pycurl should comes from system too (it is available on both slc6 and slc7).

system pycurl does not have proper SSL support. Why do we build our own otherwise ?

@belforte
Member Author

belforte commented Dec 18, 2019

we should ask Yuyi to remove cjson from dbs. I'll follow up.

@belforte
Member Author

I tested crab-prod and while other commands seemed to work, submit fails
belforte@lxplus736/TC3> /cvmfs/cms-ib.cern.ch/week1/common/crab-prod submit
Will use CRAB configuration file crabConfig.py
Warning: Incompatible CRABClient version 3.3.1912.rc1
Server is saying that compatible versions are: [u'3.3.1911.rc1', u'3.3.1911', u'3.3.1911.patch1', u'3.3.1911.rc3']
Importing CMSSW configuration demoanalyzer.py
ERROR: ImportError: No module named FWCore.ParameterSet.Config

Please email hn-cms-computing-tools@cern.ch for support with the crab.log file or crab.log URL.
Client Version: 3.3.1912.rc1
The log file /afs/cern.ch/work/b/belforte/CRAB3/TC3/crab_20191218_234140/crab.log has been uploaded automatically to the CRAB cache.
Please email the following URL 'https://cmsweb.cern.ch/crabcache/logfile?name=2019-12-18_234143_crab.log&username=belforte' to hn-cms-computing-tools@cern.ch if you need help fr

@belforte
Member Author

@lecriste does it work for you ?

@smuzaffar
Contributor

@belforte , pycurl just needs to load libcurl.so. As we added the curl dependency for crab, pycurl will load cms libcurl (which has SSL support)

[a] System pycurl and system curl

lxplus> python
Python 2.7.5 (default, Aug  7 2019, 00:51:29) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycurl
>>> pycurl.version
'libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0'

[b] System pycurl and our curl

lxplus> source /cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/curl/7.59.0/etc/profile.d/init.sh 
lxplus> python
Python 2.7.5 (default, Aug  7 2019, 00:51:29) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycurl
>>> pycurl.version
'libcurl/7.59.0 OpenSSL/1.0.2d zlib/1.2.8'
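
The difference between [a] and [b] is the TLS library named in the `pycurl.version` string (NSS with the system curl, OpenSSL with the cms curl). A small helper can make that check explicit. This is a sketch: `tls_backend` is a hypothetical name and the list of known library names is an assumption, not an exhaustive one.

```python
def tls_backend(curl_version):
    """Return the TLS library named in a pycurl.version string, or None if none is found."""
    known = ("OpenSSL", "NSS", "GnuTLS", "LibreSSL")  # assumed, not exhaustive
    for token in curl_version.split():
        # Each token looks like "name/version", e.g. "OpenSSL/1.0.2d".
        name = token.split("/", 1)[0]
        if name in known:
            return name
    return None
```

In a session where pycurl is importable, `tls_backend(pycurl.version)` tells which libcurl build was actually loaded.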

@belforte
Member Author

Thanks, I think I understand the point about pycurl now.

I am fine with keeping CRAB Client compatible with system python,
although I must say that I was not expecting such a script which defines the env, runs, resets.
What happens to python scripts which were calling CRAB API's ? How do we run them ?

I am also a bit puzzled how I can go about testing new code, should I make a personal copy of /cvmfs/cms-ib.cern.ch/week1/common/_crab-startup and edit it to point PYTHONPATH to my git area ?

Simply stated, this looks different from what we talked about, or at least what I had in mind, and I am lost.

@smuzaffar
Contributor

We can move https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_1_X/master/_crab-startup.file into the crab repository and make it work in standalone mode too.

By the way, I noticed that crab needs to have the cmssw PYTHONPATH available when it reads the cmssw configuration. That was the reason you were not able to submit a job, as mentioned here ( #4844 (comment) ). I have proposed two changes

New crab startup script to set CRAB_CMSSW_SITE_PACKAGES env which CRABClient/JobType/CMSSWConfig.py can use to properly set the python before loading cmssw configuration.

With these two changes I was able to submit a test job

[a]
crab-prod submit -c ../conf/crab.py

Will use CRAB configuration file ../conf/crab.py
Importing CMSSW configuration /afs/cern.ch/work/m/muzaffar/crab/crab-conf/conf/pset_tutorial_analysis.py
Finished importing CMSSW configuration /afs/cern.ch/work/m/muzaffar/crab/crab-conf/conf/pset_tutorial_analysis.py
Sending the request to the server at cmsweb.cern.ch
Success: Your task has been delivered to the prod CRAB3 server.
Task name: 191219_171300:muzaffar_crab_test2_CMSSW_11_0_0_pre13_1000_step1
Project dir: crab_projects/crab_test2_CMSSW_11_0_0_pre13_1000_step1
Please use 'crab status -d crab_projects/crab_test2_CMSSW_11_0_0_pre13_1000_step1' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/muzaffar/crab/crab-conf/run/crab_projects/crab_test2_CMSSW_11_0_0_pre13_1000_step1/crab.log
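
The role of the `CRAB_CMSSW_SITE_PACKAGES` variable mentioned above can be sketched as below. This is only an illustration: the thread does not show the format of the variable or how `CRABClient/JobType/CMSSWConfig.py` consumes it. The sketch assumes a list of directories separated by `os.pathsep`, and both function names are hypothetical.

```python
import os
import sys


def cmssw_site_packages(environ=None):
    """Return the directories listed in CRAB_CMSSW_SITE_PACKAGES (assumed os.pathsep-separated)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CRAB_CMSSW_SITE_PACKAGES", "")
    return [path for path in raw.split(os.pathsep) if path]


def prepend_to_sys_path(paths):
    """Put `paths` at the front of sys.path, keeping their relative order."""
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Calling `prepend_to_sys_path(cmssw_site_packages())` just before importing the user's CMSSW configuration would make modules such as `FWCore.ParameterSet.Config` importable even when the wrapper has otherwise removed the CMSSW python path.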

@belforte
Member Author

All in all "starting with 2020" is a good point to change naming standard. We could e.g. drop the 3.3 which carries no info

@belforte
Member Author

@smuzaffar see answers/comments inline

@belforte , you asked for next steps.

* `export PATH=/cvmfs/cms-ib.cern.ch/week1/common:${PATH:+$PATH}` part will not be needed once we have a cmssw release built and installed with the new `crab-*` packages. Currently the default `crab-*` commands (i.e. you log in to lxplus and run `crab-*` without sourcing any setup script) come from `/cvmfs/cms.cern.ch`, which has an old version of crab client.

OK. But surely it will be good to remove those old commands (which I did not even know existed !), who can do this ?

* setting up `PYTHONPATH` can also be automatic. `cmsset_default.*` scripts allow us to source various startup scripts. This means once you log in to lxplus then `/cvmfs/cms.cern.ch/share/cms/crab-common/1.0/lib` can be there in PYTHONPATH. I am not sure if this will break anything. `/cvmfs/cms.cern.ch/share/cms/crab-common/1.0/lib` only contains the `CRABAPI` and `CRABClient` python modules, so ideally this should not break anything. But if we want to be on the safe side then we can ask users to source a script (just like they do now) to get crab client in PYTHONPATH.
  Let me know if you prefer the automatic setup or would you prefer that users always source some crab-setup script?

Let's stay with an explicit crab-setup for a while. In a few months, when users have transitioned and
we are confident there is no problem (even if we do not expect any) we can change.

* sourcing of `etc/crab-bash-completion.sh` has been fixed and will be part of next update. This should work for all `crab-*` commands.

Thanks !

@smuzaffar
Contributor

@belforte , the old commands are part of new distribution mechanism and were installed as a part of CMSSW_11_1_0_pre1 release. New crab version (and its crab-* commands) will be installed in default /cvmfs/cms.cern.ch once we deploy a new cmssw version.

@belforte
Member Author

@smuzaffar as I see that you are going ahead with more work here, let me ask before you finalize next release:
when is the good time to e.g. change crab-pre.spec or crab-dev.spec to use new CRABClient tag ?

@smuzaffar
Contributor

@belforte , I think I am mostly done with the changes. I was just waiting for CMSSW 11.1.0.pre2 to be out so that it can deploy the new crab-* versions.
Feel free to update the tag in cms-sw/cmsdist#5479 for crab-pre|dev now. If we manage to merge it today then it might end up in cmssw 11.1.0.pre2 release.

@lecriste
Contributor

@smuzaffar do you know if anything changed in
/cvmfs/cms.cern.ch/crab3/crab.sh
?
See:
https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/5490/1/1.html

@smuzaffar
Contributor

No, nothing changed due to new crab. Looks like /cvmfs/cms.cern.ch/crab3/crab_standalone.sh is missing or env CRAB_SOURCE_SCRIPT is not set properly before sourcing the crab.sh. I have no idea who creates/deploys this file

bash -ex  /cvmfs/cms.cern.ch/crab3/crab.sh
+ '[' -z '' ']'
+ CRAB_SOURCE_SCRIPT=/cvmfs/cms.cern.ch/crab3/crab_standalone.sh
++ getVariableValue '$CRABCLIENT_ROOT' bin
++ VARNAME='$CRABCLIENT_ROOT'
++ SUBDIR=bin
++ sh -c 'source /cvmfs/cms.cern.ch/crab3/crab_standalone.sh >/dev/null 2>/dev/null; if [ $? -eq 0 ] && [ -d $CRABCLIENT_ROOT ];                     then echo $CRABCLIENT_ROOT/bin; else exit 1; fi'
+ CRAB3_BIN_ROOT=

@smuzaffar
Contributor

looks like there are broken symlinks

lrwxrwxrwx. 1 cvmfs cvmfs    85 Dec 28 21:07 crab.csh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2001/etc/init-light.csh
lrwxrwxrwx. 1 cvmfs cvmfs    93 Dec 27 15:07 crab_pre.csh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2001.rc1/etc/init-light-pre.csh
lrwxrwxrwx. 1 cvmfs cvmfs    92 Dec 27 15:07 crab_pre.sh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2001.rc1/etc/init-light-pre.sh
lrwxrwxrwx. 1 cvmfs cvmfs    93 Jan 22 11:41 crab_pre_standalone.csh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2002.rc1/etc/profile.d/init.csh
lrwxrwxrwx. 1 cvmfs cvmfs    92 Jan 22 11:41 crab_pre_standalone.sh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2002.rc1/etc/profile.d/init.sh
lrwxrwxrwx. 1 cvmfs cvmfs    84 Dec 28 21:07 crab.sh -> /cvmfs/cms.cern.ch/crab3/slc7_amd64_gcc630/cms/crabclient/3.3.2001/etc/init-light.sh
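
A directory listing shows where each link points but not whether the target still exists. A short sketch of a check for dangling links follows; `broken_symlinks` is a hypothetical helper name, and the sketch makes no claim about which of the links listed above are actually broken.

```python
import os


def broken_symlinks(directory):
    """Return the sorted names of symlinks in `directory` whose target does not exist."""
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return sorted(broken)
```

Running `broken_symlinks("/cvmfs/cms.cern.ch/crab3")` on a machine with cvmfs mounted would list the setup scripts that currently point nowhere.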

@lecriste
Contributor

Ok, hopefully Bockjoo can help there.
By the way is this
https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/5489/1/2.html
expected?

@smuzaffar
Contributor

yes this is expected ... crab-prod/pre/dev commands which are available via /cvmfs/cms.cern.ch/common currently only work for slc7. No one should use these and we should not advertise these new commands to users.

New crab packaging in cmssw should work for any arch (slc6, slc7 , cc8) but those will only be available once cmssw 11.1.0.pre is deployed. Still I would suggest not to tell users to use new commands, I would like you to test them before you inform cms users about these

@belforte
Member Author

belforte commented Jan 22, 2020

@smuzaffar @amaltaro I have found a small sticky point.
CRABClient still has a corner case where it looks up DBS using WMCore DBSReader. As discussed elsewhere, that now requires the retry decorator [1] which
AFAICT is not part of CMSSW python [2]
On the COMP side, where services like CRAB TaskWorker run in their own python build, we add py2-retry to the spec file [3]. But we always run CRAB Client with CMSSW python.

I see 4 ways out and would like your opinion on which to choose or to suggest a better one

  1. change CRABClient to use DASGOCLIENT instead of DBS (good thing anyhow, but takes a bit of my time)
  2. change CRABClient to use DBS python API directly instead of wrapping it in WMCore's DBSReader (takes maybe same time from me as 1.)
  3. Shahzad adds py2-retry to CMSSW's python (maybe it can be useful elsewhere as well)
  4. There is some other function/method/decorator in CMSSW python which achieves the same result and Alan can try instead

Kindly let me know. I think we can build CRABClient with old WMCore 1.2.8 until this is sorted out, but should not get locked out of WMCore updates.

[1]
https://pypi.org/project/retry/
[2]

belforte@lxplus754/cmsdist> which python
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_6/external/slc7_amd64_gcc700/bin/python
belforte@lxplus754/cmsdist> python
Python 2.7.14+ (default, Mar 28 2018, 11:00:22) 
[GCC 7.3.1 20180127] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import retry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named retry
>>> 

[3]
https://github.com/cms-sw/cmsdist/blob/eff210dc14ebb01e9f0f0291c3ef84cc79bf64ac/crabtaskworker.spec#L16
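
For context on option 4, the behaviour of a retry decorator can be reproduced with the standard library alone. The sketch below is an illustration only: it is not the implementation of the `retry` package in [1], nor the code used in WMCore, and `simple_retry` with its parameters is a hypothetical name.

```python
import functools
import time


def simple_retry(tries=3, delay=0.0, exceptions=(Exception,)):
    """Decorator: call the function up to `tries` times, sleeping `delay` seconds between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: let the last error propagate
                    time.sleep(delay)
        return wrapper
    return decorator
```

A function decorated with `@simple_retry(tries=3, delay=1)` is attempted up to three times before its exception reaches the caller, which is the kind of protection a DBS lookup would need against transient failures.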

@belforte
Member Author

Currently my preference is for 3., so as to give me time to do 2. As to going to dasgoclient... it is appealing, but of all tools around, DBS Python API is the one with the best documentation IMHO, so it looks faster. The only drawback wrt using dasgoclient is that cjson is needed, but it is there already.

@smuzaffar
Contributor

opts 3 and 4 are not going to fix existing cmssw versions. Better to go for 1

@amaltaro

I'm not very fond of it, but another possibility would be to actually move that function getDataTiers with the retry decorator to a different place. And we leave the getDataTiers without the decorator inside the DBSReader module. Please let me know what you think, and I can make that change.

@smuzaffar
Contributor

By the way, new crabclient 3.3.2001 has been distributed via CMSSW 11.1.0.pre2 and is available on lxplus (under /cvmfs/cms.cern.ch). New crab-prod|dev|pre commands are now available for all archs (slc6, slc7 and cc8) on lxplus. One should be able to use these commands without sourcing any extra script (of course these commands will only work under cmssw env).

In order to use the crabclient python API, one needs to source /cvmfs/cms.cern.ch/common/crab-setup.[c]sh . It just adds the crab client's lib directory to PYTHONPATH . By default it sets crab prod, but one can source /cvmfs/cms.cern.ch/common/crab-setup.[c]sh dev|pre to get the pre or dev versions of crab client.

Can you please give it a try on lxplus. Again this is still under tests so please do not recommend these crab-prod|pre|dev commands to users.
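
Since the setup script only changes PYTHONPATH, its effect can be checked from python without importing anything. The following is a minimal sketch written for Python 3 (the client discussed here runs under python 2.7, so this is illustrative only); `importable` and `report` are hypothetical names.

```python
import importlib.util


def importable(module_name):
    """True if a top-level module can be located on sys.path, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(modules=("CRABClient", "CRABAPI")):
    """Map each module name to whether it is currently importable."""
    return {name: importable(name) for name in modules}
```

After sourcing the setup script, `report()` would be expected to show both modules as available; before sourcing it, both would be missing.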

@belforte
Member Author

thanks. Yes, I will test carefully and get back to you. I will also make a PR to roll back WMCore from 1.2.9 to 1.2.8 until I have changed the code in order not to need py2-retry. I will implement either 1. or 2. in my list. No action needed from @smuzaffar nor @amaltaro

@belforte
Member Author

@smuzaffar crab commands autocompletion does not work. I guess this line
complete -F _UseCrab_${CRABCLIENT_TYPE} -o filenames crab
https://github.com/cms-sw/cmsdist/blob/4b38b5c8ef9fde034348f1570e6ffeb8d2fc1786/crab/crab-setup.sh.file#L15
which appears now in /cvmfs/cms.cern.ch/common/crab-setup.sh should rather be in /cvmfs/cms.cern.ch/common/crab-prod|pre|dev . Well.. it is harmless in the crab-setup.sh, simply un-needed for using python API. I do not understand how the crab-* commands are created, so I do not know which change is needed. In any case, it is surely fine to execute it at all times, whether one is going to use python API or not.

Of course I keep testing by sourcing myself the complete command.

@belforte
Member Author

@smuzaffar aside from "it works", about the overall design. I.e. how it is supposed to work.

I do not see it as user-friendly that one has to source some script only when using the python API. Is this because of the worry that adding CRAB stuff to PYTHONPATH may break something in CMSSW ? In any case it may easily lead to confusion on the user side and, at a minimum, a lot of run/fail/oops/setup/runagain/swear-and-curse if not complaints.
Did I manage to confuse you one more time with some previous message ?

I would find it simpler if we start like now : user needs to source something after "cmsenv" to use CRAB (also avoids that they source current setup on top of what is in CMSSW). Then if/when you are confident that CRAB does not break anything, we can make it part of cmsenv and dummyfy the setup script.

Or am I confused and missing some important fact ?

@smuzaffar
Contributor

@belforte , looks like the new cms-common package was not installed for /cvmfs/cms.cern.ch . cms-common is a special package and needs special treatment from Bockjoo. I am going to ask him to update it.

New cms-common should automatically source the autocomplete scripts for crab-* commands (/cvmfs/cms.cern.ch/share/etc/profile.d/S99crab-env.sh ).

@smuzaffar
Contributor

No, you did not confuse me. Yes, for users the instructions will be that they have to source some setup scripts.

But during this test phase, I wanted to check the proof of concept that we can run command-line crab client without sourcing anything. For now sourcing /cvmfs/cms.cern.ch/common/crab-setup.sh only updates PYTHONPATH that is why I asked to source it (for your testing) for python API tests. But there is no harm to source it always.

@belforte
Member Author

belforte commented Jan 24, 2020 via email

@belforte
Member Author

@smuzaffar Bockjoo updated cvmfs and now the command completion works.
(even if I could not figure out how it is done!)
But I noticed a discrepancy:
when using e.g. crab-prod, python will run with this $PYTHONPATH
/cvmfs/cms.cern.ch/share/cms/crab-prod/3.3.2001/lib
while after source /cvmfs/cms.cern.ch/common/crab-setup.sh it is set to
/cvmfs/cms.cern.ch/share/cms/crab/1.0/lib/prod

I presume this is intentional, even if a bit confusing.
So I have "discovered" that after sourcing crab-setup.sh I need to do
import CRABClient
in python, before I can do
from dbs.apis.dbsClient import DbsApi
If that is going to stay, we'll have to revisit a bit the documentation.

I am indeed intrigued by the magic in /cvmfs/cms.cern.ch/share/cms/crab/1.0/lib/crab-proxy-package. But since I was not expecting it, and was imagining a more old fashioned $PYTHONPATH pointing to a lib directory with all dependencies in it, like /cvmfs/cms.cern.ch/share/cms/crab-prod/3.3.2001/lib, I am now puzzled at how to make it possible to use DBS3 scripts like those pointed to in https://twiki.cern.ch/twiki/bin/view/CMSPublic/Crab3DataHandling#Changing_a_dataset_or_file_statu

I was prepared to tell users to fetch them from GitHub, or to clone them to CRABClient/bin or whatever (they are extremely stable), but how do we run them ? Somehow inserting import CRABClient in there does not look good.
I agree that this is going beyond the scope "make crab command work inside CMSSW", my fault, I was wrongly assuming that it was going to come for free.

@smuzaffar
Contributor

smuzaffar commented Jan 24, 2020

@belforte , the cmsset_default.sh script, which is sourced when one logs in, has the functionality to source additional group startup scripts ( https://github.com/cms-sw/cms-common/blob/master/cmsset_default.sh#L24-L33 ) . I made use of it and created /cvmfs/cms.cern.ch/share/etc/profile.d/S99crab-env.sh . S99crab-env.sh sources the crab-bash-completion.sh.

Yes the python path is intentional: crab-* command-line clients use lib directly from the latest crabclient installation. This is possible because the crab-* clients do not pollute the env, so they can set PYTHONPATH to /cvmfs/cms.cern.ch/share/cms/crab-prod/3.3.2001/lib where we have a few python packages, e.g. dbs, Utils etc. Sourcing /cvmfs/cms.cern.ch/common/crab-setup.sh, on the other hand, pollutes the env (by setting PYTHONPATH), so I did not want to have these dbs, Utils modules available by default. This can break the env of any other package which needs dbs but a different version (which might not be compatible with the version from crab).

For the same reason, I only made CRABAPI and CRABClient available via PYTHONPATH (after sourcing crab-setup.sh). This distribution is supposed to provide the CRAB* API and not dbs. So users should import CRAB* first :-)

@belforte
Member Author

I understand. Thanks.
I will take care of DBS3SetDatasetStatus.py etc. via proper instructions in the twiki.
Hopefully we'll switch user produced files to Rucio this year and that will be obsolete.

@belforte
Member Author

I have updated the spec for dev with the new CRABClient tag which should work with new WMCore: cms-sw/cmsdist#5501
What's the procedure for testing that ? Will it appear automatically as crab-dev ?

@smuzaffar
Contributor

I have started the test. For now the test only makes sure that we can build it. If you can provide us a simple bash script to test a few commands then we can include that in cmssw to make sure we can run the new version of crab. The shell script could just run a few crab commands e.g.

crab-dev --version
crab-dev checkusername

you can also provide a python script to test CRAB Python API functionality.

By the way, why does crab always create a crab.log file? Sometimes I run it from a cvmfs directory and it fails to create this log file [a].

[a]
IOError: [Errno 30] Read-only file system: '/cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0_pre2/crab.log'
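
The smoke test requested above could also be driven from python. The sketch below runs the two commands quoted in this comment and records their exit codes; `run_smoke_tests` and `SMOKE_COMMANDS` are hypothetical names, and which commands make a meaningful test is left to the CRAB developers.

```python
import subprocess

# The two commands suggested in the comment above.
SMOKE_COMMANDS = [
    ["crab-dev", "--version"],
    ["crab-dev", "checkusername"],
]


def run_smoke_tests(commands):
    """Run each command; return a list of (command, exit_code).

    The exit code is None when the command could not be started at all.
    """
    results = []
    for command in commands:
        try:
            code = subprocess.call(command)
        except OSError:
            code = None
        results.append((command, code))
    return results
```

A build test could call `run_smoke_tests(SMOKE_COMMANDS)` and fail if any exit code is not 0. Running it from a writable directory avoids the crab.log problem shown in [a].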

@belforte
Member Author

belforte commented Jan 27, 2020 via email

@belforte
Member Author

there is no option in CRABClient to disable the log file. Let me know if you need it. I think it could be rather easy via an env.var, otherwise I have to learn the part of CRAB code which parses command arguments and add an option.
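
The env.var idea could look roughly like this. It is a sketch under stated assumptions, not CRABClient code: the variable name `CRAB_DISABLE_LOGFILE`, the function `make_logger` and the logger name are all hypothetical. The sketch also skips the file when the directory is not writable, which would cover the read-only cvmfs case.

```python
import logging
import os


def make_logger(logdir, environ=None, name="crab-sketch"):
    """Return a logger that writes crab.log in `logdir` unless disabled or not writable."""
    environ = os.environ if environ is None else environ
    logger = logging.getLogger(name)
    logger.handlers = []  # start from a clean state on repeated calls
    logger.addHandler(logging.StreamHandler())
    # Hypothetical switch: CRAB_DISABLE_LOGFILE=1 turns the log file off.
    disabled = environ.get("CRAB_DISABLE_LOGFILE") == "1"
    if not disabled and os.access(logdir, os.W_OK):
        logger.addHandler(logging.FileHandler(os.path.join(logdir, "crab.log")))
    return logger
```

With this approach no command-line parsing needs to change; the test infrastructure would only have to export the variable before running crab.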

@belforte
Member Author

Everything is now working satisfactorily in my opinion, and the new procedures to change, deploy and use CRAB Client are documented in the twiki
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CMSCrabClient

Latest CRAB Client works with latest WMCore (1.2.9) and in crab report compares the result from DBS Python API and DASGOCLIENT. When I am sure of my handling of dasgoclient, we can stop using the DBS API in the client, but keep it as a dependency for distribution.

Thanks all
