Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SKIMS in Conf data processing 75 #10227

Merged

Conversation

franzoni
Copy link

FW port of #10226

Includes a new key (PhysicsSkims) to allow inclusion of SKIM: in the prompt processing, in the same step as RECO
added a new test which probes such new key for "ppRun2" "ppRun2B0T" scenarios
@cerminar

@hufnagel : can you please take a look and comment ? The integration with T0 needs to follow the same logic of the ALCA, using the skim matrix with @PRIMARYDATASET syntax. The committed example in run_CfgTest.sh demostrates the principle

@sextonkennedy : we spoke about this, do you have comments ?

@franzoni
Copy link
Author

please test

@cmsbuild cmsbuild added this to the Next CMSSW_7_5_X milestone Jul 15, 2015
@cmsbuild
Copy link
Contributor

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @franzoni (Giovanni Franzoni) for CMSSW_7_5_X.

SKIMS in Conf data processing 75

It involves the following packages:

Configuration/DataProcessing

@cmsbuild, @franzoni, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @Martin-Grunewald this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@franzoni
Copy link
Author

+1
scram build runtests : passed ok

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs once checked with relvals in the development release cycle of CMSSW or unless it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs once checked with relvals in the development release cycle of CMSSW (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar

@sextonkennedy
Copy link
Member

Hi @franzoni ,
I do have some comments for you. First Dirk is in AWS class at FNAL right now. I will try to make sure this doesn't fall in the cracks of his travel. He did promise to meet with me on this topic the last time we were both at CERN.

@sextonkennedy
Copy link
Member

The second comment I have comes from my notes on the conversation I had with Dirk about this. You've done part of what he suggested except what he called "PromptSkims" you call "PhysicsSkims" which is fine. What I don't see (and maybe I just missed it), is the place where the filter sequence you define is used to configure an output module that uses that filter sequence to select events. That's probably because that code is in the T0 code here: https://github.com/hufnagel/T0/blob/master/src/python/T0/RunConfig/RunConfigAPI.py#L432
where you see we need the association between the input primaryDataset, the SKIMname derived from that PD, the selectEvents sequence to use, and the DataTier to output where DataTiers are one of the set RECO, AOD, miniAOD, or a custom USER data tier. The USER tier would be hardest to deal with so hopefully the physics skims can use a standard tier? Once there is a dictionary that defines these associations the Tier0 code would just select which skim to run for a particular scenario, the code in the CMSSW release just provides the list of possibilities that the Tier0 code selects from. Does this make sense?

@franzoni
Copy link
Author

hello @sextonkennedy @hufnagel ,

is the place where the filter sequence you define is used to configure an output module that uses that
filter sequence to select events.

The ConfigBulder,
in presence of a step " -s SKIM:mySkim", takes care of

  1. adding output modules the the process, and
  2. configuring them to write the events which pass the skim filters, assign the datatier and the skim name,
    see the example:
    https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/d6c6667623f5c3684515fd8d85495a99/configFile of a recent test in relvals ( https://hypernews.cern.ch/HyperNews/CMS/get/dataopsrequests/7590.html )
    For wmagent, this has been suffiient to get the skims out, with correct naming and data tier, business as usual.

The USER tier would be hardest to deal with so hopefully the physics skims can use a standard tier?
Agreed.
Accurate data tier determination is taken care by the configuration of the skims as they are.

In light of what the ConfigBulder does, is this:

Once there is a dictionary that defines these associations the Tier0 code would just select which
skim to run for a particular scenario

actually necessary ? I'd imagine not.

I can find some more time to work on this today. I'd like to be done by tomorrow morning, for what is needed by the PdmV/PPD side,. Can we iterate, if needed ?

Cheers,
g.

@hufnagel
Copy link

That doesn't work for the Tier0. Tier0 is not WMAgent. WMAgent scans the configuration in ConfigCache and extract output module definitions. Tier0 constructs the config at runtime based on the information it passed into Config.DP.

Anything that happens behind the scene in ConfigBuilder to add "magic" output module parameters is a no-go for the Tier0. I need to know upfront what the output module parameters are going to be, at least at the level of the outputs dictionary syntax passed to ConfigBuilder supports.

@hufnagel
Copy link

After looking at the code, adding a physicskim step should NOT add any output modules. I need to add them myself from the Tier0 side by passing them in via the outputs dict that is passed to ConfigBuilder. That is the only supported way in the Tier0. Which means there needs to be some set if ConfigBuilder output parameters (dataTier0, eventContent, selectEvents) that gives me what we want for this output.

@cerminar
Copy link
Contributor

@hufnagel this is funny: we implement this to circumvent Tier0 shortcomings and stumble in new ones.
You have access to the CMSSW repository feel free to solve the problem. What you will get from us is the name of the skims (and nothing more). All the rest of the information to run them via WMAgent is in the release. The complicated dictionaries that you need to maintain need to be hidden somehow on your side.

@davidlange6
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants