Extend WMSpec for non-trivial user jobs #211

Closed
spigad opened this issue Aug 28, 2010 · 22 comments

@spigad

spigad commented Aug 28, 2010

Ok, take the 2nd patch on top of the first.

@spigad

spigad commented Nov 16, 2010

spiga: To support a real analysis use case, we need to be able to handle user libraries, modules, etc.

All of this must be placed in the proper location in the runtime area on the WN. The piece of WMSpec that handles this still needs to be implemented.

@evansde77

evansde: Can you elaborate a bit on what is needed here?
At some point this should reduce to a tar/untar of a scram project area containing libs, files etc.
How much more detail about it is needed? (Could you outline the current crab process for my enlightenment?)

@ericvaandering

ewv: Dave, I think that is about it. In addition to libs, we tar up data/, I think, which may contain data files from users. Maybe we want to change that to make them specify each file individually? Probably my first step is to look in detail at what CRAB actually does now.

@evansde77

evansde: Is data under the scram project area as well?
Seems like if a user does:

scram project CMSSW CMSSW_X_Y_Z
cd CMSSW_X_Y_Z
#meddle with stuff...

Tarring up the CMSSW_X_Y_Z dir will get you every local change, FileInPath settings will work, and you don't really need to care too much about what's in there at all? (I implemented an input sandbox for the PA/RelVal stuff back in the day, before patch releases, using this and it worked pretty well. Do users stray outside this much?)

@ericvaandering

ewv: The problem with that is that you pack up all their src/ code too which is worthless. And with CRAB running the way it does, you also pack up any old CRAB output they have, so tarballs grow exponentially if you do the naive thing. So we ignore src/ and bin/ for sure and encourage people to submit from those directories.

We should take this opportunity to re-evaluate of course, but we do have a formula that works pretty well and that users are used to.

Let me do a little forensics and I'll post here exactly what CRAB is currently doing.

@ericvaandering

ewv: Ok, what crab is doing that may need to be done is:

  1. Check to see if the executable is part of the release. Usually this is cmsRun and is not put in the sandbox. For our initial stuff, this will DEFINITELY be cmsRun. I don't know if at some point we plan to support arbitrary scripts as jobs.
  2. tar up the user's lib/ and module/ (CMSSW_3_6_3/lib and CMSSW_3_6_3/module)
  3. crawl through CMSSW_3_6_3/src/ looking for directories named data/. Tar if found
  4. tar the user's CMSSW.py and CMSSW.pkl file
  5. We tar the crab.cfg file for some reason. To let the remote site try to understand what the user is doing?

I think 2 & 4 are the bare minimum. Something like 3 will be necessary.

CRAB2 also does things we definitely don't need to do:

a) Ship MonaLisa
b) Ship various ProdCommon, IMProv, and WMCore stuff
c) Ship various CRAB utilities like writeCfg, cmscp, etc.

All this is taken care of by existing WMCore infrastructure.
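
Steps 2 and 3 above, plus a user-settable list of extra files, can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_user_sandbox` and its parameters are hypothetical names, not CRAB or WMCore code, and the extra files are assumed to be given relative to the project area.

```python
import os
import tarfile


def build_user_sandbox(project_dir, tarball_path, extra_files=()):
    """Pack lib/, module/ and any src/**/data/ directories into a gzipped tarball.

    Hypothetical helper for illustration only; not CRAB or WMCore code.
    Paths inside the tarball are relative to project_dir.
    """
    members = []

    # Step 2: compiled user libraries and modules, if present.
    for name in ("lib", "module"):
        if os.path.isdir(os.path.join(project_dir, name)):
            members.append(name)

    # Step 3: crawl src/ for directories named data/; the source code is skipped.
    src_dir = os.path.join(project_dir, "src")
    for root, dirs, _files in os.walk(src_dir):
        if "data" in dirs:
            members.append(os.path.relpath(os.path.join(root, "data"), project_dir))
            dirs.remove("data")  # taken whole, so no need to descend into it

    # Assumed extension: extras for the odd file kept outside a data/ area.
    members.extend(extra_files)

    with tarfile.open(tarball_path, "w:gz") as tar:
        for member in members:
            tar.add(os.path.join(project_dir, member), arcname=member)
    return sorted(members)
```

Writing the tarball outside the project area keeps it from being swept into a later sandbox, which is one way to avoid the growing-tarball problem mentioned earlier in the thread.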

@spigad

spigad commented Nov 16, 2010

spiga: Replying to [comment:8 ewv]:

> Ok, what crab is doing that may need to be done is:
>
>   1. Check to see if the executable is part of the release. Usually this is cmsRun and is not put in the sandbox. For our initial stuff, this will DEFINITELY be cmsRun. I don't know if at some point we plan to support arbitrary scripts as jobs.
>   2. tar up the user's lib/ and module/ (CMSSW_3_6_3/lib and CMSSW_3_6_3/module)
>   3. crawl through CMSSW_3_6_3/src/ looking for directories named data/. Tar if found
>   4. tar the user's CMSSW.py and CMSSW.pkl file

Thanks Eric. In my original post I meant exactly your 2, 3 and 4. This is what I would like to have in CRAB_3 soon. I'm not sure whether Dave agrees; from what I remember, we were on the same page about this some time ago.

>   5. We tar the crab.cfg file for some reason. To let the remote site try to understand what the user is doing?

Kind of. I don't remember exactly who asked to have it at the WN level. If you want, I can look for the related ticket/mail. Personally I don't really see a use case.

> I think 2 & 4 are the bare minimum. Something like 3 will be necessary.
>
> CRAB2 also does things we definitely don't need to do:
>
> a) Ship MonaLisa
> b) Ship various ProdCommon, IMProv, and WMCore stuff
> c) Ship various CRAB utilities like writeCfg, cmscp, etc.
>
> All this is taken care of by existing WMCore infrastructure.

@ericvaandering

ewv: Actually, thinking about this, I think 4) is already taken care of by WMSpec, right? At least pulling the config out of a release is. It should be, at most, a minor change to use a user-supplied file.

@evansde77

evansde: Couple of comments/questions:

Agree that 2,3,4 need done. 4. is already done by the config cache stuff & sandbox building.

  1. does lib & module get all the python stuff (configs etc?)
  2. If the analysis type is a CMSSW thing, then it will be cmsRun, if we want to support others, we add steps to support them explicitly which will then implement stuff to find the binary (Eg: FWLite, or a Script type, guessing the script stuff will cover madgraph etc)
  3. If you default to picking up the lib & module, you could probably just add a user settable list of extras to include in the sandbox (this could handle the data use case as well?)

The WMCore code will be packaged by the agent by default until we stabilise enough to build the runtime RPMs from it. Any extras that Crab needs can be included in that as well down the road.

@ericvaandering

ewv: Replying to [comment:11 evansde]:

> Couple of comments/questions:
>
> Agree that 2,3,4 need done. 4. is already done by the config cache stuff & sandbox building.
>
>   1. does lib & module get all the python stuff (configs etc?)

We don't need the python because we ship the pickled config file, so it has already grabbed all the python objects it needs out of either the user's area or the system area. I assumed WMAgent was doing the same, but maybe not? In any case, I think we want to continue to do this, as we allow users (encourage them, actually) to have python programs as config files. By that, I mean their config can contain anything from if statements (on the simple side) to calls to web services (on the complicated end). I don't think we want any of that going on on the worker node.

>   2. If the analysis type is a CMSSW thing, then it will be cmsRun, if we want to support others, we add steps to support them explicitly which will then implement stuff to find the binary (Eg: FWLite, or a Script type, guessing the script stuff will cover madgraph etc)

This is what I had in mind. We should probably eventually supply (or get users to supply) steps for all common actions. Maybe we supply a generic script step as well, but that should be extremely rare.

>   3. If you default to picking up the lib & module, you could probably just add a user settable list of extras to include in the sandbox (this could handle the data use case as well?)

We could. I guess the question comes down to: is the advantage of doing this (picking up un-needed data areas) worth re-training users? Now if they change a data/ file in their own area, the job uses it, just like if they change a src/ file and recompile.

Personally I lean towards just packing up the data area as we do now.

BTW, we do have a user-settable thing as well (I forgot to include that) for the odd file that they store that is not in a data area. We should keep this, whether it makes it into the first attempt or not.

> The WMCore code will be packaged by the agent by default until we stabilise enough to build the runtime RPMs from it. Any extras that Crab needs can be included in that as well down the road.

Yup, agreed, which is why I put it in the "we don't need to do this" category. Only there for completeness of what we do now. :-)
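
The point about shipping a pickled config can be illustrated with a small sketch. It assumes a plain dict stands in for the real configuration object (an actual CMSSW config is a much richer object), and `build_config`, `freeze_config`, `load_config` and the dataset name are hypothetical, chosen only for this example.

```python
import pickle


def build_config(dataset):
    """Stand-in for a user's config program; runs only at submission time."""
    config = {"dataset": dataset, "maxEvents": 1000}
    # Arbitrary logic (if statements, even web-service calls) can shape the result.
    if "MinBias" in dataset:
        config["maxEvents"] = 100
    return config


def freeze_config(config):
    """Serialise the fully evaluated config so it can be shipped with the job."""
    return pickle.dumps(config)


def load_config(payload):
    """Worker-node side: restore the object without re-running the user logic."""
    return pickle.loads(payload)
```

The worker node only ever calls `load_config`, so whatever the config program did to compute its values happens once, on the submission side.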

@evansde77

evansde: Regarding config files:

ConfigCache can support python or pickled files. The interesting bit with the python code in the scram area is that you can decouple configs from a release and use the same one for several projects if you like. (Eg: the Conf/DataProcessing stuff makes the top level config look really lightweight for some of the standard things.)
Eg: someone has a doReco.py config in configcache, but customises the event content in their scram area.
Or: Rerun myanalysis.py but with some cut changed in the scram stuff.

Probably not a first order kind of implementation, but may come in handy down the road...
The config cache stuff should make multicrab style operations on the server that bit easier, since you use the same config and the same sandbox over multiple datasets, and can add datasets after the initial submission if the interface supports it.

(Could also completely bypass the config cache and stuff the pickled config in the sandbox if needed, lots of freedom in the new system, probably need to pick a first pass and work from that)

@ericvaandering

ewv: I'm making decent progress on this but before I get too far in, I want to make sure I'm not doing something too restrictive.

First, there is already a field in the Step for this. It's actually a list of sandboxes. Is there any reason it should be a list and not a single one? Does changing it mean a change to the underlying database?

Second, is there any reason that different steps in the job would need different sandboxes? Do we envision workflows like this?

@evansde77

evansde: Replying to [comment:14 ewv]:

> I'm making decent progress on this but before I get too far in, I want to make sure I'm not doing something too restrictive.
>
> First, there is already a field in the Step for this. It's actually a list of sandboxes. Is there any reason it should be a list and not a single one? Does changing it mean a change to the underlying database?

Basically it can be a list of URLs or something like that. Could be reduced to a single file if needed.
No DB stores this based on just the spec information so it's pretty free form.

> Second, is there any reason that different steps in the job would need different sandboxes? Do we envision workflows like this?

No specific workflow in mind, just making it easy to be flexible. Eg: things like Madgraph etc seem to have a bunch of input sandboxes so it seemed prudent, but nothing concrete.

@ericvaandering

ewv: Replying to [comment:15 evansde]:

> Replying to [comment:14 ewv]:
>
> > I'm making decent progress on this but before I get too far in, I want to make sure I'm not doing something too restrictive.
> >
> > First, there is already a field in the Step for this. It's actually a list of sandboxes. Is there any reason it should be a list and not a single one? Does changing it mean a change to the underlying database?
>
> Basically it can be a list of URLs or something like that. Could be reduced to a single file if needed.
> No DB stores this based on just the spec information so it's pretty free form.

Ok, what I'll do for now is keep the structure as a list but keep the implementation as a single local file. Should we later need to expand this, it should be easier. If we're really going to do URLs (which probably makes sense for MadGraph, since parts of it are centralized) then an untarUser.py script makes a lot more sense than the 5 lines of bash that I have now.

> > Second, is there any reason that different steps in the job would need different sandboxes? Do we envision workflows like this?
>
> No specific workflow in mind, just making it easy to be flexible. Eg: things like Madgraph etc seem to have a bunch of input sandboxes so it seemed prudent, but nothing concrete.

OK. Different sandboxes for different steps is also a minor perturbation for what I'm doing on the WN.
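
A Python unpacking script along the lines discussed here might look like the sketch below. It is not the script from the patch: `untar_user_sandboxes` is a hypothetical name, entries are assumed to be local tarball paths, and anything that looks like a URL is rejected rather than downloaded.

```python
import os
import tarfile


def untar_user_sandboxes(sandboxes, target_dir):
    """Unpack each local sandbox tarball into target_dir, in list order.

    Hypothetical sketch, not the script from the patch. Entries are treated as
    local file paths; URL entries would need a download step first.
    """
    unpacked = []
    for sandbox in sandboxes:
        if "://" in sandbox:
            raise NotImplementedError("remote sandboxes not handled: %s" % sandbox)
        with tarfile.open(sandbox) as tar:
            # Only trusted, user-built tarballs are expected here.
            tar.extractall(target_dir)
        unpacked.append(os.path.basename(sandbox))
    return unpacked
```

Keeping the argument a list means the single-file case and a later multi-sandbox case share one code path, and an empty list simply unpacks nothing.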

@ericvaandering

ewv: Please review the patch. Multiple tarballs per step are not supported throughout, but some of the necessary code is in place.

@sfoulkes

sfoulkes: I think it's better if the userSandbox parameter in StdBase defaulted to "None" so that we don't get a user sandbox for production jobs. Same goes in the Analysis Spec, if the user didn't pass in a sandbox we shouldn't be setting one for them.

I'm not sure that you're untarring the sandbox in the correct place. On the WN, CMSSW is run out of ./job/WMTaskSpace/stepName (I think), it looks like you're just untarring the sandbox in the base directory there.

@ericvaandering

ewv: Yeah, definitely the default should be changed.

It is getting untarred in the right place. The job sandbox untars in job/ and the CWD at the time of the untar is CMSSW_x_y_z

@sfoulkes

sfoulkes: Couple more issues:

  • Fix the comment in WMCore/WMSpec/Steps/Templates/CMSSW.py for setUserSandbox()
  • The code in WMCore/WMSpec/Steps/Executors/CMSSW.py in execute() will crash if a user sandbox isn't set.

You'll need to update your tree as well, Evans made some changes for the spec stuff to support multicore CMSSW and that caused your patch to not apply.

@ericvaandering

ewv: Replying to [comment:21 sfoulkes]:

>   • The code in WMCore/WMSpec/Steps/Executors/CMSSW.py in execute() will crash if a user sandbox isn't set.

No, because the same code sets it as a default to [] and ','.join([]) is '' which is exactly what I want. And setUserSandbox(None) returns; it doesn't actually replace the list with None.

@ericvaandering

ewv: Sorry formatting issue. Should read "is a blank string which is exactly..."
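
The behaviour described in this exchange can be reproduced with a small stand-in class. `StepSketch` and `sandboxArgument` are hypothetical names for illustration and do not reflect the actual WMCore classes; only the default of an empty list, the early return on None, and the join come from the discussion.

```python
class StepSketch:
    """Hypothetical stand-in for a step holding a list of user sandboxes."""

    def __init__(self):
        self.userSandboxes = []  # default: empty list, never None

    def setUserSandbox(self, sandbox):
        # Passing None (or an empty value) leaves the default list untouched.
        if not sandbox:
            return
        self.userSandboxes = [sandbox]

    def sandboxArgument(self):
        # ','.join([]) is the empty string, so no sandbox means a blank argument.
        return ",".join(self.userSandboxes)
```

With this shape, a production job that never sets a sandbox ends up with an empty string rather than an exception when the argument is built.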

@sfoulkes

sfoulkes: You're right, I'll apply the patch.

@ericvaandering

ewv: BTW, if you want to enable this in the injectAnalysis script you can just add

        "userSandbox" : '/uscms/home/ewv/crab-test/CMSSW_3_6_1_patch7/default.tgz',

which is a CRAB-produced tarball. I didn't check in the injectAnalysis since I couldn't easily produce a clean patch with just that one change.

@ghost ghost assigned ericvaandering Jul 24, 2012
This issue was closed.