
xAH and submission of multiple datasets #837

Closed
urania277 opened this issue Mar 11, 2017 · 7 comments

@urania277
Contributor

Not having modified xAH_run yet (and unfortunately not having much time to do it myself), I'm looking to see if anyone would be interested in implementing this:

https://twiki.cern.ch/twiki/bin/view/AtlasProtected/EventLoop#Processing_multiple_datasets_in

Thanks!
Caterina

@kratsg
Contributor

kratsg commented Mar 11, 2017

This would be fairly easy to implement in xAH_run.py, but it's a question of how we support this option, since it's very specific to prun. @kkrizka might have an idea.
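To illustrate the "prun-specific option" concern, here is a minimal argparse sketch of how such a flag could be scoped to the grid driver only; the subparser layout and the --inputSingleTask name are placeholders, not xAH_run.py's actual structure.

```python
# Minimal sketch (assumed layout, not xAH_run.py's actual code): scope a new
# flag to the prun driver via argparse subparsers so other drivers never see it.
import argparse

parser = argparse.ArgumentParser(description="xAH_run.py-style driver selection (sketch)")
parser.add_argument("--files", nargs="+", required=True,
                    help="input datasets or filelists")

drivers = parser.add_subparsers(dest="driver")
drivers.add_parser("direct")          # local running: no grid-only options
prun = drivers.add_parser("prun")     # grid running via prun
prun.add_argument("--inputSingleTask", metavar="NAME", default=None,
                  help="(hypothetical) submit all inputs as a single task named NAME")

args = parser.parse_args()
if getattr(args, "inputSingleTask", None):
    print("would group %d inputs into one task: %s"
          % (len(args.files), args.inputSingleTask))
```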

@kkrizka
Contributor

kkrizka commented Mar 14, 2017

How about adding an option to prun called --inputSingleTask name? Then the SampleHandler part of xAH_run.py (the `for fname in args.input_filename:` loop) would use SH::SampleGrid(name) instead of scanRucio.

This should work quite nicely, for data at least. I have to think a bit about how to deal with splitting the MC. We could either automatically create a separate output DS for each input DS, or add another option to xAH_run.py to enable the behaviour. @urania277, do you have a preference?

I will experiment with this tomorrow.
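For concreteness, here is a rough sketch of the SampleHandler branch proposed above, assuming an AnalysisBase setup with PyROOT bindings for SampleHandler and the SH::SampleGrid / nc_grid metadata convention described on the EventLoop twiki; the --inputSingleTask option and the stand-in args object are placeholders, not code from any branch.

```python
# Rough sketch of the proposed SampleHandler branch; assumes PyROOT bindings
# for SampleHandler (SH) and the nc_grid metadata convention from the twiki.
import ROOT
ROOT.xAOD.Init().ignore()

# Stand-ins for the xAH_run.py argparse results (illustration only).
class _Args:
    inputSingleTask = "my_combined_task"                   # hypothetical task name
    input_filename = ["data16_13TeV.periodA.DAOD_EXOT2",   # hypothetical datasets
                      "data16_13TeV.periodB.DAOD_EXOT2"]
args = _Args()

sh = ROOT.SH.SampleHandler()

if args.inputSingleTask:
    # One SH::SampleGrid holding every input dataset -> one prun/JEDI task.
    sample = ROOT.SH.SampleGrid(args.inputSingleTask)
    sample.meta().setString("nc_grid", ",".join(args.input_filename))
    sh.add(sample)
else:
    # Existing behaviour: one sample (and hence one task) per dataset.
    for fname in args.input_filename:
        ROOT.SH.scanRucio(sh, fname)

sh.setMetaString("nc_tree", "CollectionTree")
```

Since the grid driver submits one task per SampleHandler sample, packing several datasets into a single SampleGrid is what collapses them into a single submission.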

@kkrizka
Contributor

kkrizka commented Mar 14, 2017

I put an example implementation at https://github.com/kkrizka/xAODAnaHelpers/tree/singlejeditask . It creates a list of datasets per input filelist when the --singleTask option is added to the prun command.

It almost works, except that the first list of datasets it tries to submit crashes with the following traceback:

```
Traceback (most recent call last):
  File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/pyAmi/pyAMI-5.0.3.2/lib/argparse/__init__.py", line 1194, in __eq__
    return vars(self) == vars(other)
TypeError: vars() argument must have __dict__ attribute
TypeError: an integer is required
```

All subsequent lists are submitted correctly. More debugging is still needed.
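For readers following along, a hedged sketch of the behaviour described above, one grid sample per input filelist; this is an illustration under assumed conventions, not the code on the singlejeditask branch, and the filelist names are made up.

```python
# Illustration of "one SH::SampleGrid per input filelist": every dataset listed
# in a file becomes part of a single grid sample, hence a single prun task.
# Filelist names are hypothetical; the nc_grid convention is as on the twiki.
import os
import ROOT
ROOT.xAOD.Init().ignore()

sh = ROOT.SH.SampleHandler()

filelists = ["mc16_jz_slices.txt", "data16_periods.txt"]   # hypothetical inputs
for filelist in filelists:
    with open(filelist) as f:
        datasets = [line.strip() for line in f
                    if line.strip() and not line.startswith("#")]
    # Name the task after the filelist; all of its datasets go into one sample.
    name = os.path.splitext(os.path.basename(filelist))[0]
    sample = ROOT.SH.SampleGrid(name)
    sample.meta().setString("nc_grid", ",".join(datasets))
    sh.add(sample)
```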

@beojan
Contributor

beojan commented Dec 14, 2017

Has there been any progress on this? The lack of period containers for many (all?) derivations in release 21 makes the one-task-per-dataset workflow quite painful, due to retries and, worse, resubmissions if the pilot jobs for a task spuriously fail.

@ntadej
Contributor

ntadej commented Dec 14, 2017

Period containers should be available for all data periods after the re-derivation completes. Note that you usually have to request them through your derivation contacts.

@kratsg
Contributor

kratsg commented Dec 14, 2017

Hi @beojan, xAH is open source and xAH_run.py is written very nicely. If you think you can implement the feature or continue what @kkrizka started to fix things, it would be a good addition.

Unfortunately, many people are struggling with grid submissions like this for the same reason and there's not a lot one can do.

@jbossios
Contributor

Hi @beojan,

I just want to stress that you could prepare your own period containers as well. Of course it's quite a lot of work, but it's feasible.

Best,
Jona
