Workflow mimicking UL MC production where HLT was run in a separate step #37564

Closed
makortel opened this issue Apr 14, 2022 · 11 comments

Comments

@makortel
Contributor

The Ultra Legacy MC production ran HLT as a separate step of the workflow because it had to be run using a different CMSSW release than the other steps of the workflow. This led to inefficiencies reported e.g. in https://indico.cern.ch/event/1078746/#16-improving-efficiency-of-hlt (the talk also discusses a way to improve the efficiency).

In order to work on improving the efficiency of such a setup, it is necessary to have workflow(s) defined in runTheMatrix that run HLT in a separate step similar to UL MC production (although staying in the same CMSSW release is probably sufficient in this case).

@makortel
Contributor Author

assign pdmv

@cmsbuild
Contributor

New categories assigned: pdmv

@bbilin,@wajidalikhan,@jordan-martins,@kskovpen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

@cms-sw/pdmv-l2 Could you help in defining such a workflow (or set of workflows) in runTheMatrix, please?

@kskovpen
Contributor

Sure! That's a very interesting study. Here comes the PR.

@srimanob
Contributor

srimanob commented Apr 17, 2022

Hi @makortel
Do I understand correctly that we need the 2-file solution?

Is that possible in our production system? It's possible for private production, for sure.
I remember that we chose our approach for UL because the 2-file solution was not available in production. But that was 4 years ago.

But if this is to be tested in IBs, I think it is a good idea and should be possible, as we deal with the file names directly.

@kskovpen
Contributor

Here comes a new version of the PR.

@makortel
Contributor Author

@srimanob

> Do I understand correctly that we need the 2-file solution?

Correct, this approach requires the RECO step to use the so-called "2 file solution".

> Is that possible in our production system? It's possible for private production, for sure. I remember that we chose our approach for UL because the 2-file solution was not available in production. But that was 4 years ago.

I can't really comment on that (it should be asked of WM and ComOps). Our primary goal is to do this in StepChain in a way that is transparent to WM. We'd just need WM to support specific customizations of the workflow steps when run in StepChain (see dmwm/WMCore#10819).

I recall @nsmith- mentioning at some point that this approach could be beneficial for TaskChain as well.

> But if this is to be tested in IBs, I think it is a good idea and should be possible, as we deal with the file names directly.

Testing in IBs is indeed the first step. Testing and deploying on the grid would need dmwm/WMCore#10819 to be resolved, but we didn't want to wait on that to get started.

(In general, we also think it would be very good to test in IBs the exact workflow setup(s) we run in production.)

@makortel
Contributor Author

Thanks @kskovpen for #37633. I want to follow up on the question of the data tier for the HLT-only step, mostly to clear up my confusion. What data tier was used for the separate HLT step in UL production?

@kskovpen
Contributor

Hi @makortel, the HLT step in UL has GEN-SIM-RAW as its data tier and RAWSIM as its event content. On a related topic, I didn't manage to drop SIM from the output of the HLT step: it seems that some Rivet routines use SIM information when running the RECO step. Moreover, the order of the files in --filein and --secondfilein surprisingly matters.
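
For readers who want to picture the setup described above, here is a minimal sketch in plain Python that only assembles (and does not run) two cmsDriver.py command lines: an HLT-only step written with the GEN-SIM-RAW data tier and RAWSIM event content, and a RECO step that reads two inputs via --filein and --secondfilein. The helper `cms_driver_command`, the file names, the step strings, and the AODSIM output of the RECO step are assumptions made for illustration; they are not taken from #37633. Conditions, era, and the HLT menu are left out, so the commands are not complete. The sketch also assumes that the HLT output is the primary input and the preceding step's output is the secondary one.

```python
import shlex


def cms_driver_command(name, step, filein, fileout, datatier, eventcontent,
                       secondfilein=None):
    """Build a cmsDriver.py argument list (hypothetical helper, nothing is executed)."""
    cmd = [
        "cmsDriver.py", name,
        "--step", step,
        "--filein", filein,
        "--fileout", fileout,
        "--datatier", datatier,
        "--eventcontent", eventcontent,
        "--mc",
        "--no_exec",  # only generate the configuration file, do not run cmsRun
    ]
    if secondfilein is not None:
        # Second input file for the "2-file solution"
        cmd += ["--secondfilein", secondfilein]
    return cmd


# HLT-only step: data tier and event content as quoted in the comment above.
# File names and the bare "HLT" step string are placeholders.
hlt_cmd = cms_driver_command(
    name="stepHLT",
    step="HLT",
    filein="file:digi.root",
    fileout="file:hlt.root",
    datatier="GEN-SIM-RAW",
    eventcontent="RAWSIM",
)

# RECO step reading two files. The choice of primary and secondary here is an
# assumption; the comment above notes that the order matters.
reco_cmd = cms_driver_command(
    name="stepRECO",
    step="RAW2DIGI,L1Reco,RECO",
    filein="file:hlt.root",
    fileout="file:reco.root",
    datatier="AODSIM",
    eventcontent="AODSIM",
    secondfilein="file:digi.root",
)

print(shlex.join(hlt_cmd))
print(shlex.join(reco_cmd))
```

Building the commands as argument lists keeps the two inputs easy to swap when checking how the order of --filein and --secondfilein affects the RECO step.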

@makortel
Contributor Author

The workflow was added in #37633.
