Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't handle column types redundantly anymore #401

Closed
wants to merge 9 commits into from
Closed

Don't handle column types redundantly anymore #401

wants to merge 9 commits into from

Conversation

guitargeek
Copy link

Hi NanoAOD devs!

I hope this is the right cmssw fork and branch for this PR.

Yesterday I wanted to introduce some new column types in my private nanoAOD productions (int16_t for example, to save a bit of space) and use them in the flat table producers. However, I realized that there are many parts of the NanoAOD code which have to be tweaked if you want to do this, as the way how column types are handled is not completely trivial.

One source of complication is that when you add a column to a flat table with addColumn(), you have to pass the type as a template argument as well as an enum value in the function parameters. After working a bit with the code, I understood that this is redundant, because the check_type function makes sure you always use the right enum value with the right template parameter. Therefore, we could just drop this enum parameter and deduce it from the template argument. In this situation, we would also not need check_type anymore.

The only tricky part are bool columns, which should actually be represented by a uint8_t vector. So far, the logic to take care of this had to be implemented in the plugins that made use of the FlatTable class, but I think I found a way to have this logic directly in the FlatTable class so one can just use addColumn<bool> to create bool columns and they will be internally stored in the uint8_t vector.

What do you think? This simplifies the type handling already quite a bit, and I think it's the good path towards a FlatTable class that will support all basic types that you can also store in TTrees.

I tested this with the local matrix tests so far, can the nano-bot tests still be done here? That would be very cool!

Thanks for considering this and cheers,
Jonas

@guitargeek
Copy link
Author

Since there is no bot here to mention the conveners: hi @peruzzim and @fgolf

@arizzi
Copy link

arizzi commented Sep 24, 2019

hi,
without looking at the code, did you check that it also still works with the nanoedm + merge step as performed in production?

triggering the bot meanwhile

@gpetruc-bot
Copy link

@guitargeek
Copy link
Author

guitargeek commented Sep 24, 2019

Hi @arizzi, yes I think that's what I did. I tested by stripping down the nano configs down to just the GenPart table for a quick check, and at the end of my cfg I did:

process.out = cms.OutputModule("NanoAODOutputModule",
    fileName = cms.untracked.string('out.root'),
    outputCommands = process.NanoAODEDMEventContent.outputCommands,
    compressionLevel = cms.untracked.int32(9),
    compressionAlgorithm = cms.untracked.string("LZMA"),
    dataset = cms.untracked.PSet(
        dataTier = cms.untracked.string('NANOAODSIM'),
        filterName = cms.untracked.string('')
    ),
)

It that what you mean?

@gpetruc-bot
Copy link

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

@arizzi
Copy link

arizzi commented Sep 24, 2019

nope, I mean using PoolOutputModule rather than NanoAodOutputModule and then convert to actual (flat) nano in the merge step.

@guitargeek
Copy link
Author

guitargeek commented Sep 24, 2019

Sorry for not being experienced with this yet. Alright I swapped NanoAODOutputModule for PoolOutputModule in my config and have now a root file that contains the flat table object as expected. What exactly do you mean by merge step? Is that something in CMSSW? Just one link pointing to some short example or explanation would be great, thanks in advance!

@arizzi
Copy link

arizzi commented Sep 24, 2019

https://github.com/cms-sw/cmssw/blob/master/Configuration/DataProcessing/python/Merge.py

@gpetruc-bot
Copy link

gpetruc-bot commented Sep 24, 2019 via email

@guitargeek
Copy link
Author

guitargeek commented Sep 24, 2019

Okay I used the mergeProcess function to dump a little config:

from Configuration.DataProcessing.Merge import mergeProcess

process = mergeProcess(
    ["file:out1.root", "file:out2.root", "file:out3.root"],
    process_name = "Merge",
    output_file = "Merged.root",
    output_lfn = None,
    newDQMIO = False,
    mergeNANO = True,
    bypassVersionCheck = False)

print(process.dumpPython())

It worked fine, and in the end I got once more a "flat ntuple".

Copy link

@gpetruc-bot gpetruc-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic test report for 1112208

Code integration

Code checks passed for this PR

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

Tests

  • Long test data102X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data106X (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data80Xhip (3000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94X2016 (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94Xv2 (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc102X (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc106X (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94X2016 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94Xv2 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Test mc_94Xv2: passed
  • Test mc_102X: passed
  • Test data_94X: passed
  • Test data_102X: passed

Disk size report

Sample kb/event ref kb/event diff
TTbar MC 102X 1.831 1.831 0.000 ( +0.0% )
TTbar MC 94Xv1 1.924 1.924 0.000 ( +0.0% )
TTbar MC 94Xv2 1.956 1.956 0.000 ( +0.0% )
TTbar MC 94X2016 1.745 1.746 -0.000 ( -0.0% )
TTbar MC 80X 1.902 1.900 0.001 ( +0.1% )
Data 102X 0.963 0.963 0.000 ( +0.0% )
Data 94Xv1 0.913 0.913 -0.000 ( -0.0% )
Data 80X 0.793 0.793 -0.000 ( -0.0% )
Data 80X, Mu Run2016E 0.775 0.775 0.000 ( +0.1% )

@gpetruc-bot
Copy link

@guitargeek
Copy link
Author

Hi @mariadalfonso, the commits here are not exactly the same as in the cms-sw PR (cms-sw#30436). There is just one additional commit from cms-sw#30273 on which my developments depended. I think we should also backport this boost commit in cms-sw together with the nano-types PR to really keep the nano-types changes nicely in sync.

@gpetruc-bot
Copy link

Copy link

@gpetruc-bot gpetruc-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic test report for 1812488

Code integration

Code checks passed for this PR

Code format passed for this PR

Tests

  • Long test data102X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data106Xul17 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data106Xul18 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94X2016 (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94Xv2 (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc102X (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc106Xul16 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc106Xul17 (9000 events): passed, with differences; dqm plots: all, diff
  • Long test mc106Xul18 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94X2016 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94Xv2 (9000 events): passed, no significant changes; dqm plots: all, diff
  • Test mc_94Xv2: passed
  • Test mc_102X: passed
  • Test data_94X: passed
  • Test data_102X: passed

Disk size report

Sample kb/event ref kb/event diff
TTbar MC 102X 1.998 1.997 0.001 ( +0.0% )
TTbar MC 94Xv1 2.054 2.054 0.000 ( +0.0% )
TTbar MC 94Xv2 2.095 2.095 -0.000 ( -0.0% )
TTbar MC 94X2016 1.889 1.886 0.002 ( +0.1% )
TTbar MC 80X 2.006 2.007 -0.001 ( -0.1% )
Data 102X 1.068 1.067 0.000 ( +0.0% )
Data 94Xv1 1.019 1.018 0.000 ( +0.0% )
Data 80X 0.870 0.870 0.000 ( +0.0% )

@mariadalfonso
Copy link

tests are successful,
the cms-sw#30436 can be merged

@guitargeek
Copy link
Author

Thanks, so I can close this PR I guess.

@guitargeek guitargeek closed this Jul 22, 2020
@guitargeek guitargeek deleted the master-cmsswmaster_nano-types branch July 24, 2020 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants