No compression defined in SKIM output #37150

Closed
srimanob opened this issue Mar 6, 2022 · 8 comments
srimanob commented Mar 6, 2022

It seems skim outputs do not have compression configured by default. This includes:

  • output modules that are cloned from AOD (e.g. EXOMONOPOLE)
  • newly defined output modules (e.g. BPHSkim)

cmsDriver.py step3 --conditions auto:run2_data -s RAW2DIGI,L1Reco,RECO,SKIM:EXOMONOPOLE+BPHSkim,PAT --datatier AOD,MINIAOD --eventcontent AOD,MINIAOD --data --process reRECO --scenario pp --era Run2_2018 --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run2_2018 --python MONOPOLE_cfg.py -n 500 --no_exec --filein file:root://eoscms.cern.ch//eos/cms/store/data/Run2018D/EGamma/RAW/v1/000/320/822/00000/1441432A-8997-E811-8BD0-FA163EA21B5C.root --fileout file:step3.root --nThreads 8

Comparing the Monopole skim output of the cmsDriver command above, with and without an explicit compression setting:

  • Output without compression: 1.86 MB/event
  • Output with LZMA-level4: 1.66 MB/event
  • Output with ZSTD-level4: 1.74 MB/event
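
To put the three numbers side by side, here is a small sketch that turns them into relative savings. It uses only the per-event sizes quoted above; the dictionary keys and the helper name `relative_saving` are made up for illustration, and "baseline" simply means the figure listed as "without compression".

```python
# Per-event sizes in MB, copied from the comparison above.
sizes_mb = {
    "baseline": 1.86,      # the figure listed as "without compression"
    "LZMA-level4": 1.66,
    "ZSTD-level4": 1.74,
}


def relative_saving(baseline: float, compressed: float) -> float:
    """Fractional size reduction relative to the baseline size."""
    return (baseline - compressed) / baseline


# Saving of each explicit setting with respect to the baseline.
savings = {
    name: relative_saving(sizes_mb["baseline"], size)
    for name, size in sizes_mb.items()
    if name != "baseline"
}

for name, saving in savings.items():
    print(f"{name}: {saving:.1%} smaller per event")
```

With these inputs LZMA-level4 comes out at roughly 11% smaller and ZSTD-level4 at roughly 6% smaller than the baseline.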

Should compression be configured by default for SKIM outputs?
Should the code be managed centrally (if possible), or should each skim take care of its own compression config?
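
For the "managed centrally" option, one possibility would be a customise function that loops over the output modules of the process. This is only a sketch under assumptions: it needs a CMSSW environment, the function name `customiseSkimCompression` is hypothetical, the LZMA/4 defaults are just the values tried above, and it touches every PoolOutputModule rather than only the skim ones. `compressionAlgorithm` and `compressionLevel` are the PoolOutputModule parameters.

```python
import FWCore.ParameterSet.Config as cms


def customiseSkimCompression(process, algorithm="LZMA", level=4):
    """Hypothetical helper: set the same compression on all pool outputs."""
    for name, module in process.outputModules_().items():
        # Leave other kinds of output module (e.g. DQM output) untouched.
        if module.type_() != "PoolOutputModule":
            continue
        module.compressionAlgorithm = cms.untracked.string(algorithm)
        module.compressionLevel = cms.untracked.int32(level)
    return process
```

A function like this could be passed to cmsDriver.py through `--customise`, in the same way as the RecoTLR customisation in the command above. The per-skim alternative would be to set the same two parameters directly in each skim's output definition.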

cmsbuild commented Mar 6, 2022

A new Issue was created by @srimanob Phat Srimanobhas.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented Mar 7, 2022

> It seems skim outputs do not have compression by default

Actually, the default behavior of PoolOutputModule is to compress with ZLIB-9 (and there is no way to disable compression). Changing the default to ZSTD-4 was agreed in the core software meeting two weeks ago (PR to be done soon), because it seems to be better than ZLIB-9 in all tested cases (as it is in this one).

I don't know the history, but I'd guess the default compression was deemed good enough for skims. How much disk (tape?) space do skims use compared to e.g. AOD? LZMA has a sizable computational cost (with a corresponding impact on CPU efficiency), so that tradeoff would require a careful study.
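
As a rough illustration of the size-versus-CPU tradeoff, the sketch below times ZLIB and LZMA from the Python standard library on the same buffer. Everything in it is an assumption for illustration: the payload is synthetic random data, not event data; `make_payload` and `measure` are made-up helpers; and the stdlib `zlib` level and `lzma` preset are not guaranteed to correspond to ROOT's settings of the same name. A real study would need actual skim output and job-level CPU measurements.

```python
import lzma
import random
import time
import zlib


def make_payload(n_bytes: int = 200_000, seed: int = 1) -> bytes:
    """Synthetic, moderately compressible bytes (NOT real event data)."""
    rng = random.Random(seed)
    # Only 16 distinct byte values, so about half of each byte is redundant.
    return bytes(rng.randrange(16) for _ in range(n_bytes))


def measure(name, compress, payload):
    """Return compressed/original size ratio and wall time for one codec."""
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "ratio": len(compressed) / len(payload),
        "seconds": elapsed,
    }


payload = make_payload()
results = [
    measure("zlib level 9", lambda data: zlib.compress(data, 9), payload),
    measure("lzma preset 4", lambda data: lzma.compress(data, preset=4), payload),
]

for r in results:
    print(f"{r['name']}: ratio {r['ratio']:.3f}, {r['seconds'] * 1e3:.1f} ms")
```

The absolute numbers depend on the machine and on the data, so only the relative size and time of the two codecs on the same input are meaningful here.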

makortel commented Mar 7, 2022

assign core, pdmv

cmsbuild commented Mar 7, 2022

New categories assigned: core,pdmv

@bbilin,@wajidalikhan,@jordan-martins,@Dr15Jones,@smuzaffar,@makortel,@kskovpen you have been requested to review this Pull request/Issue and eventually sign? Thanks

srimanob commented Mar 7, 2022

Hi @makortel
Ah, thanks for the clarification. The size of SKIM datasets varies from analysis to analysis. So the ZLIB-9 default is the same as for RECO, I presume. Then we may not need to do anything, since it is handled by default.

makortel commented Mar 7, 2022

Right, RECO uses the default as well:

# No compressionAlgorithm/compressionLevel here, so the PoolOutputModule default applies.
RECOEventContent = cms.PSet(
    outputCommands = cms.untracked.vstring('drop *'),
    splitLevel = cms.untracked.int32(0),
)

On the other hand, IIRC RECO has a rather limited lifetime (when produced on Tier0), whereas I'd imagine skims have a longer lifetime. Even though I'd be in favor of "just using the default" for skims, I'm curious whether any of them are stored on tape, or whether they are on disk only.

srimanob commented Mar 7, 2022

Given the clarification that compression is applied by default, I think this ticket can be closed.
