
xAH and submission of multiple datasets #837

Closed
urania277 opened this issue Mar 11, 2017 · 7 comments

@urania277
Contributor

Not having modified xAH_run yet (and unfortunately not having much time to do it myself), I'm looking to see if anyone would be interested in implementing this:

https://twiki.cern.ch/twiki/bin/view/AtlasProtected/EventLoop#Processing_multiple_datasets_in

Thanks!
Caterina

@kratsg
Contributor

kratsg commented Mar 11, 2017

This would be fairly easy to implement in xAH_run.py, but it's a question of how we support this option, since it's very specific to prun. @kkrizka might have an idea.
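To illustrate the "prun-specific option" concern, here is a minimal argparse sketch of how such a flag could be scoped to the grid driver only; the subparser layout and the --inputSingleTask name are placeholders, not xAH_run.py's actual structure.

```python
# Minimal sketch (assumed layout, not xAH_run.py's actual code): scope a new
# flag to the prun driver via argparse subparsers so other drivers never see it.
import argparse

parser = argparse.ArgumentParser(description="xAH_run.py-style driver selection (sketch)")
parser.add_argument("--files", nargs="+", required=True,
                    help="input datasets or filelists")

drivers = parser.add_subparsers(dest="driver")
drivers.add_parser("direct")          # local running: no grid-only options
prun = drivers.add_parser("prun")     # grid running via prun
prun.add_argument("--inputSingleTask", metavar="NAME", default=None,
                  help="(hypothetical) submit all inputs as a single task named NAME")

args = parser.parse_args()
if getattr(args, "inputSingleTask", None):
    print("would group %d inputs into one task: %s"
          % (len(args.files), args.inputSingleTask))
```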

@kkrizka
Contributor

kkrizka commented Mar 14, 2017

How about adding an option to prun called --inputSingleTask name? Then the SampleHandler part of xAH_run.py (the `for fname in args.input_filename:` loop) would use SH::SampleGrid(name) instead of scanRucio.

This should work quite nicely, for data at least. I have to think a bit about how to deal with splitting the MC. We could either automatically create a separate output DS for each input DS, or add another option to xAH_run.py to enable the behaviour. @urania277, do you have a preference?

I will experiment with this tomorrow.
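For concreteness, here is a rough sketch of the SampleHandler branch proposed above, assuming an AnalysisBase setup with PyROOT bindings for SampleHandler and the SH::SampleGrid / nc_grid metadata convention described on the EventLoop twiki; the --inputSingleTask option and the stand-in args object are placeholders, not code from any branch.

```python
# Rough sketch of the proposed SampleHandler branch; assumes PyROOT bindings
# for SampleHandler (SH) and the nc_grid metadata convention from the twiki.
import ROOT
ROOT.xAOD.Init().ignore()

# Stand-ins for the xAH_run.py argparse results (illustration only).
class _Args:
    inputSingleTask = "my_combined_task"                   # hypothetical task name
    input_filename = ["data16_13TeV.periodA.DAOD_EXOT2",   # hypothetical datasets
                      "data16_13TeV.periodB.DAOD_EXOT2"]
args = _Args()

sh = ROOT.SH.SampleHandler()

if args.inputSingleTask:
    # One SH::SampleGrid holding every input dataset -> one prun/JEDI task.
    sample = ROOT.SH.SampleGrid(args.inputSingleTask)
    sample.meta().setString("nc_grid", ",".join(args.input_filename))
    sh.add(sample)
else:
    # Existing behaviour: one sample (and hence one task) per dataset.
    for fname in args.input_filename:
        ROOT.SH.scanRucio(sh, fname)

sh.setMetaString("nc_tree", "CollectionTree")
```

Since the grid driver submits one task per SampleHandler sample, packing several datasets into a single SampleGrid is what collapses them into a single submission.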

@kkrizka
Contributor

kkrizka commented Mar 14, 2017

I put an example implementation at https://github.com/kkrizka/xAODAnaHelpers/tree/singlejeditask . It creates a list of datasets per input filelist when the --singleTask option is added to the prun command.

It almost works, except that the first list of datasets it tries to submit crashes with the following traceback:

```
Traceback (most recent call last):
  File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/pyAmi/pyAMI-5.0.3.2/lib/argparse/__init__.py", line 1194, in __eq__
    return vars(self) == vars(other)
TypeError: vars() argument must have __dict__ attribute
TypeError: an integer is required
```

All subsequent lists are submitted correctly. More debugging is still needed.
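For readers following along, a hedged sketch of the behaviour described above, one grid sample per input filelist; this is an illustration under assumed conventions, not the code on the singlejeditask branch, and the filelist names are made up.

```python
# Illustration of "one SH::SampleGrid per input filelist": every dataset listed
# in a file becomes part of a single grid sample, hence a single prun task.
# Filelist names are hypothetical; the nc_grid convention is as on the twiki.
import os
import ROOT
ROOT.xAOD.Init().ignore()

sh = ROOT.SH.SampleHandler()

filelists = ["mc16_jz_slices.txt", "data16_periods.txt"]   # hypothetical inputs
for filelist in filelists:
    with open(filelist) as f:
        datasets = [line.strip() for line in f
                    if line.strip() and not line.startswith("#")]
    # Name the task after the filelist; all of its datasets go into one sample.
    name = os.path.splitext(os.path.basename(filelist))[0]
    sample = ROOT.SH.SampleGrid(name)
    sample.meta().setString("nc_grid", ",".join(datasets))
    sh.add(sample)
```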

@beojan
Contributor

beojan commented Dec 14, 2017

Has there been any progress on this? The lack of period containers for many (all?) derivations in release 21 makes the one-task-per-dataset workflow quite painful, due to retries and, worse, resubmissions if the pilot jobs for a task spuriously fail.

@ntadej
Contributor

ntadej commented Dec 14, 2017

Period containers should be available for all data periods after the re-derivation completes. Note that you usually have to request them through your derivation contacts.

@kratsg
Contributor

kratsg commented Dec 14, 2017

Hi @beojan, xAH is open source and xAH_run.py is written very nicely. If you think you can implement the feature or continue what @kkrizka started to fix things, it would be a good addition.

Unfortunately, many people are struggling with grid submissions like this for the same reason and there's not a lot one can do.

@jbossios
Contributor

Hi @beojan,

I just want to stress that you could prepare your own period containers as well. Of course it's quite a lot of work, but it's feasible.

Best,
Jona
