
Job splitting for CRAB #253

Closed
AndreasAlbert opened this issue Mar 23, 2021 · 4 comments

@AndreasAlbert

Hi,

I'm trying to figure out how to run GoodnessOfFit toys over CRAB. I understand that I can run e.g. 25 toys in a single crab job like this:

combineTool.py \
    --job-mode crab3 \
    -M GoodnessOfFit \
    -d /path/to/card.root \
    -t 25
    # (other args)

This works fine. I would like to extend this though to be able to run many toys split over a number of jobs. I have tried two approaches to accomplish this:

  1. using the --merge argument. This does not seem to have any effect. I think this traces down to the fact that the combine tool thinks of my GoF command as "one entry in the job queue", rather than "25 independent entries".

  2. using the --custom-crab argument and specifying config.Data.totalUnits = 50 to submit e.g. 50 jobs. The submission works in this case, but the jobs fail because the script executed on the worker node tries to match the job ID to the job queue entries [1]. In the same vein as above, the job queue only has one entry here, so the script simply fails for all job IDs > 1.

Is there an existing well-defined way of doing this? If not, I can hotfix [1] for myself, but I'm not sure how to implement this in a sustainable way without creating spaghetti.

Any hints would be appreciated!

[1] https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/python/combine/CombineToolBase.py#L308

@ajgilbert
Collaborator

Does it work if you generate a set of jobs with different seeds, e.g. adding -s 1:50:1 ?
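For concreteness, a sketch of what that submission could look like, assuming the `start:stop:step` range syntax of the `-s` option in combineTool.py (the path and the other arguments are placeholders, as in the original command):

```shell
# -s 1:50:1 expands to seeds 1, 2, ..., 50, so combineTool.py generates
# 50 entries in the job queue, one per seed; each CRAB job then runs its
# own set of 25 toys with a distinct seed.
combineTool.py \
    --job-mode crab3 \
    -M GoodnessOfFit \
    -d /path/to/card.root \
    -t 25 \
    -s 1:50:1
```

This yields 50 x 25 = 1250 toys in total; merging the per-seed output files is left to a separate hadd step.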

@AndreasAlbert
Author

Thanks for this suggestion! While a bit unintuitive, it works fine.

If I wanted to implement this more cleanly (e.g. to ensure that I never accidentally submit two sets of toys with the same seed), I assume I'd have to change EnhancedCombine.run_method to create dummy lists of subbed_vars entries like in [1]?

[1] https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/python/combine/EnhancedCombine.py#L82-L85

@ajgilbert
Collaborator

I think if the seed is specified in any way, it's always going to be difficult to protect against accidental reuse. One option would be to use a random seed in each job (-s -1). This requires a slight syntax upgrade in combineTool.py, see #254. Would this work in your case?
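As a sketch of the random-seed approach: combine interprets `-s -1` as "draw the seed at random" rather than using a fixed value, so repeated submissions cannot collide on seeds. The single-job form is shown below; the exact syntax for generating several jobs that each draw their own random seed is the subject of the upgrade in #254, and is not reproduced here.

```shell
# -s -1 tells combine to pick a random seed for this job, so no two
# submissions can accidentally reuse the same toy seed. Path and other
# arguments are placeholders.
combineTool.py \
    --job-mode crab3 \
    -M GoodnessOfFit \
    -d /path/to/card.root \
    -t 25 \
    -s -1
```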

@AndreasAlbert
Author

Yes, that is exactly what I need! Thank you for thinking through it.
