Extend WMSpec for non-trivial user jobs #211
Comments
spiga: In order to support a real analysis use case we need to be able to handle user libraries, modules, etc. All this stuff must be placed in the proper location in the runtime area on the WN. The piece of WMSpec handling this part needs to be implemented.
evansde: Can you elaborate a bit on what is needed here?
ewv: Dave, I think that is about it. In addition to libs, we also tar up data/, which I think may contain data files from users. Maybe we want to change that to make them specify each file individually? Probably the first step of this for me is to look in detail at what CRAB actually does now.
evansde: Is data under the scram project area as well? (scram project CMSSW CMSSW_X_Y_Z) Tarring up the CMSSW_X_Y_Z dir will get you every local change, FileInPath settings will work, and you don't really need to care too much about what's in there at all. (I implemented an input sandbox for the PA/RelVal stuff back in the day, before patch releases, using this and it worked pretty well. Do users stray outside this much?)
ewv: The problem with that is that you also pack up all their src/ code, which is worthless. And with CRAB running the way it does, you also pack up any old CRAB output they have, so tarballs grow exponentially if you do the naive thing. So we ignore src/ and bin/ for sure and encourage people to submit from those directories. We should take this opportunity to re-evaluate, of course, but we do have a formula that works pretty well and that users are used to. Let me do a little forensics and I'll post here exactly what CRAB is currently doing.
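The packing strategy described above (ship the project area, but skip src/ and bin/) can be sketched roughly like this. This is a minimal illustration, not the actual CRAB code; the function name and excluded-directory set are assumptions for the sketch.

```python
import os
import tarfile

# Hypothetical sketch of CRAB-style sandbox packing: tar up the
# project area but skip src/ and bin/, which hold source code and
# compiled binaries the job does not need shipped.
EXCLUDED_TOP_DIRS = {"src", "bin"}

def pack_user_sandbox(project_dir, tarball_path):
    """Tar project_dir into tarball_path, skipping excluded top-level dirs."""
    base = os.path.basename(project_dir.rstrip("/"))

    def keep(tarinfo):
        # tarinfo.name looks like "CMSSW_X_Y_Z/lib/...": drop entries
        # whose first component below the project area is excluded.
        parts = tarinfo.name.split("/")
        if len(parts) > 1 and parts[1] in EXCLUDED_TOP_DIRS:
            return None
        return tarinfo

    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(project_dir, arcname=base, filter=keep)
```

Because `tarfile.TarFile.add` stops recursing into a directory whose filter returns `None`, excluded trees are skipped entirely rather than walked and discarded.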
ewv: Ok, what CRAB is doing that may need to be done is:
I think 2 & 4 are the bare minimum. Something like 3 will be necessary. CRAB2 also does things we definitely don't need to do: a) Ship MonaLisa. All of this is taken care of by existing WMCore infrastructure.
spiga: Replying to [comment:8 ewv]:
Thanks Eric. In my original post I meant exactly your 2, 3, and 4. This is what I would have in CRAB_3 soon. Not sure if Dave agrees; from what I remember we were on the same page about this some time ago.
Kind of. I don't remember exactly who asked to have it at the WN level. If you want I can look for the related ticket/mail; personally I don't really see a use case.
ewv: Actually, thinking about this, I think 4) is already taken care of by WMSpec, right? At least pulling the config out of a release is. It should be, at most, a minor change to use a user-supplied file.
evansde: Couple of comments/questions: Agree that 2, 3, and 4 need to be done. 4 is already done by the config cache stuff & sandbox building.
The WMCore code will be packaged by the agent by default until we stabilise enough to build the runtime RPMs from it. Any extras that CRAB needs can be included in that as well down the road.
ewv: Replying to [comment:11 evansde]:
We don't need the Python because we ship the pickled config file, so it has already grabbed all the Python objects it needs out of either the user's area or the system area. I assumed WMAgent was doing the same, but maybe not? In any case, I think we want to continue to do this, as we allow users (encourage them, actually) to have Python programs as config files. By that I mean their config can range from having if statements (on the simple side) to talking to web services (on the complicated end). I don't think we want any of that going on on the worker node.
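The "ship the pickle, not the Python" idea above can be sketched as follows. This is a hedged illustration only: the function names and the convention that the config script defines a top-level `process` object are assumptions for the sketch, not the actual CRAB/WMAgent code.

```python
import pickle

# Sketch: execute the user's Python config once on the client side,
# pickle the resulting object, and ship only the pickle. The worker
# node just unpickles; no user Python (if-statements, web calls, ...)
# ever runs there.

def freeze_config(config_path, pickle_path):
    """Run a Python config file client-side and pickle its result.
    Assumes (hypothetically) the script defines a top-level 'process'."""
    namespace = {}
    with open(config_path) as handle:
        exec(handle.read(), namespace)  # user logic executes here, client-side
    with open(pickle_path, "wb") as out:
        pickle.dump(namespace["process"], out)

def load_config(pickle_path):
    """Worker-node side: just unpickle; no user code is executed."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)
```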
This is what I had in mind. We should probably eventually supply (or get users to supply) steps for all common actions. Maybe we supply a generic script step as well, but that should be extremely rare.
We could. I guess the question comes down to: is the advantage of doing this (not picking up unneeded data areas) worth re-training users? Right now, if they change a data/ file in their own area, the job uses it, just like if they change a src/ file and recompile. Personally I lean towards just packing up the data area as we do now. BTW, we do have a user-settable thing as well (I forgot to include that) for the odd file that they store that is not in a data area. We should keep this, whether it makes it into the first attempt or not.
Yup, agreed, which is why I put it in the "we don't need to do this" category. It's only there for completeness of what we do now. :-)
evansde: Regarding config files: ConfigCache can support Python or pickled files. The interesting bit with the Python code in the scram area is that you can decouple configs from a release and use the same one for several projects if you like. (E.g. the Conf/DataProcessing stuff makes the top-level config look really lightweight for some of the standard things.) Probably not a first-order kind of implementation, but it may come in handy down the road. (We could also completely bypass the config cache and stuff the pickled config in the sandbox if needed; there's lots of freedom in the new system, so we probably need to pick a first pass and work from that.)
ewv: I'm making decent progress on this, but before I get too far in, I want to make sure I'm not doing something too restrictive. First, there is already a field in the Step for this. It's actually a list of sandboxes. Is there any reason it should be a list and not a single one? Does changing it mean a change to the underlying database? Second, is there any reason that different steps in the job would need different sandboxes? Do we envision workflows like this?
evansde: Replying to [comment:14 ewv]:
Basically it can be a list of URLs or something like that. Could be reduced to a single file if needed.
No specific workflow in mind, just making it easy to be flexible. E.g. things like MadGraph seem to have a bunch of input sandboxes, so it seemed prudent, but nothing concrete.
ewv: Replying to [comment:15 evansde]:
Ok, what I'll do for now is keep the structure as a list but keep the implementation as a single local file. Should we later need to expand this, it will be easier. If we're really going to do URLs (which probably makes sense for MadGraph, since parts of it are centralized), then an untarUser.py script makes a lot more sense than the 5 lines of bash I have now.
OK. Different sandboxes for different steps is also a minor perturbation for what I'm doing on the WN.
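A hypothetical untarUser.py along the lines discussed might look like the sketch below: each entry in the Step's sandbox list may be a local file or a URL, so remote ones are fetched first and then everything is extracted into the destination directory (the worker-node CWD). All names here are assumptions for illustration, not the committed script.

```python
import os
import shutil
import tarfile
import urllib.request

# Hedged sketch of a hypothetical untarUser.py: sandboxes may be
# local paths or URLs; fetch remote ones, then extract each into dest.

def fetch_and_untar(sandbox, dest="."):
    """Extract one user sandbox given as a local path or a URL."""
    local = sandbox
    if sandbox.startswith(("http://", "https://")):
        local = os.path.join(dest, os.path.basename(sandbox))
        with urllib.request.urlopen(sandbox) as response, \
                open(local, "wb") as out:
            shutil.copyfileobj(response, out)
    with tarfile.open(local) as tar:
        tar.extractall(path=dest)

def untar_user_sandboxes(sandboxes, dest="."):
    """Handle the list-of-sandboxes structure kept in the Step."""
    for sandbox in sandboxes:
        fetch_and_untar(sandbox, dest)
```

Keeping the Step field a list, as agreed above, means this loop works unchanged whether there is one sandbox or several.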
ewv: Please review the patch. Multiple tarballs per step are not supported throughout, but some of the necessary code is in place.
sfoulkes: I think it would be better if the userSandbox parameter in StdBase defaulted to None so that we don't get a user sandbox for production jobs. Same goes for the Analysis spec: if the user didn't pass in a sandbox, we shouldn't be setting one for them. I'm also not sure that you're untarring the sandbox in the correct place. On the WN, CMSSW is run out of ./job/WMTaskSpace/stepName (I think); it looks like you're just untarring the sandbox in the base directory there.
ewv: Yeah, the default should definitely be changed. The sandbox is getting untarred in the right place, though: the job sandbox untars in job/ and the CWD at the time of the untar is CMSSW_x_y_z.
sfoulkes: Couple more issues:
You'll need to update your tree as well; Evans made some changes to the spec stuff to support multicore CMSSW, and that caused your patch to not apply.
ewv: Replying to [comment:21 sfoulkes]:
No, because the same code sets it as a default to [] and ','.join([]) is '', which is exactly what I want. And setUserSandbox(None) just returns; it doesn't actually replace the list with None.
ewv: Sorry, formatting issue. That should read "is a blank string, which is exactly..."
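The behaviour Eric describes can be sketched with hypothetical names mirroring the thread: the sandbox field defaults to an empty list, setUserSandbox(None) is a no-op, and joining the empty list yields a blank string rather than the string "None".

```python
# Hedged sketch (names are illustrative, not the actual WMSpec API):
# the sandbox list defaults to [], None is ignored, and
# ','.join([]) == '' signals "no user sandbox" downstream.

class StepHelper:
    def __init__(self):
        self.user_sandboxes = []          # default: no user sandbox

    def setUserSandbox(self, sandbox):
        if sandbox is None:
            return                        # ignore None; keep the list intact
        self.user_sandboxes.append(sandbox)

    def sandboxArgument(self):
        # ','.join([]) is '' -- a blank string, not "None"
        return ",".join(self.user_sandboxes)
```

This is why production specs that never call setUserSandbox, or call it with None, end up with a harmless empty string instead of a spurious sandbox.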
sfoulkes: You're right, I'll apply the patch.
ewv: BTW, if you want to enable this in the injectAnalysis script you can just
which is a CRAB-produced tarball. I didn't check in the injectAnalysis since I couldn't easily produce a clean patch with just that one change. |
Ok, take the 2nd patch on top of the first.