Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit test for trigger::TriggerEvent format #41631

Merged
merged 1 commit into from May 12, 2023

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented May 11, 2023

PR description:

Add a new unit test for the trigger::TriggerEvent raw data format. This generates a data file containing a trigger::TriggerEvent object with known content. Then it reads it. It verifies that when we read it we obtain values that match the known written values for all the data fields in the object. In particular all containers have content so all contained types are also read.

It also reads the old files in the DataFormats/HLTReco data repository which are listed in the shell script. The plan is that each time the data format of trigger::TriggerEvent is modified a file will added.

PR validation:

This only adds a unit test in DataFormats/HLTReco which passes. It shouldn't affect anything else.

@missirol
Copy link
Contributor

@cmsbuild , ping

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41631/35518

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

  • DataFormats/HLTReco (hlt)

@cmsbuild, @missirol, @Martin-Grunewald can you please review it and eventually sign? Thanks.
@silviodonato, @missirol, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@wddgit
Copy link
Contributor Author

wddgit commented May 11, 2023

please test with cms-data/DataFormats-HLTReco#1

@wddgit
Copy link
Contributor Author

wddgit commented May 11, 2023

Note this PR was motivated by discussions after the problems discussed in issues #41246 and #41348. It does not actually detected those problems very well, but as we discussed them we concluded we needed better tests of raw data formats if we we continue to guarantee that those formats would be backward compatible forever. I already implemented similar tests for the other 3 raw data formats (see #41565, #41494, and #4139) recently. I'll continue and consider adding some for Scouting formats next. I'm looking at what already exists for Scouting at the moment.

@wddgit
Copy link
Contributor Author

wddgit commented May 11, 2023

FYI @makortel

@missirol
Copy link
Contributor

I'll continue and consider adding some for Scouting formats next. I'm looking at what already exists for Scouting at the moment.

Thanks, @wddgit. Due to #41040, a test was added for Scouting in #41093, but the idea was to replace it with a test in the style of the ones you are implementing (see #41040 (comment)).

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bee537/32565/summary.html
COMMIT: 484edeb
CMSSW: CMSSW_13_2_X_2023-05-11-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41631/32565/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bee537/32565/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bee537/32565/git-merge-result

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 475 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3461906
  • DQMHistoTests: Total failures: 32
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3461851
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 797.1469999999998 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 11634.7,... ): 46.891 KiB CTPPS/common
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@missirol
Copy link
Contributor

+hlt

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@missirol
Copy link
Contributor

As usual, I have a naive question (not specific to this PR): would it help to have these unit tests run on more than 1 single event ?

My concern comes from #41025 (comment), where we spotted an issue with the initial implementation of an ioread rule only after we tested on more than 1 event.

@rappoccio
Copy link
Contributor

@missirol can you also sign cms-data/DataFormats-HLTReco#1 ?

@rappoccio
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit c13903f into cms-sw:master May 12, 2023
11 checks passed
@wddgit
Copy link
Contributor Author

wddgit commented May 12, 2023

Why did reading more than 1 event help in that case?

One possibility I can imagine is that some container is empty on most events and using many events ensured we hit an event where the container wasn't empty. This test guarantees all containers have content on the first event. So it wouldn't help in that case.

Or was was it something else? Or we don't know why?

@missirol
Copy link
Contributor

Unfortunately, I don't have much insight (that's why I'm asking). More expert-level comments were in #41025 (comment).

I took CMSSW_13_2_X_2023-05-10-2300, and reproduced what is described in #41025 (comment) [1] [2] [3] [4]. Comparing [3] and [4], I see that the mismatch for the collection

Product Label: "hltScoutingEgammaPacker" (type: "vector<Run3ScoutingElectron>")

appears in the 7th event, which is the 3rd event in which that collection is not empty. If I had done that check on less events, I would have missed that the clear() call was needed to get the correct results.

Maybe this is irrelevant for this PR, and I should have asked elsewhere. I'm just trying to understand better.

[1]

${CMSSW_BASE}/src/DataFormats/Scouting/test/scoutingCollectionsDumper.py -k Run3Scouting -i /eos/cms/store/user/cmsbuild/store/mc/Run3Winter21DRMiniAOD/VBFHToTauTau_M125_TuneCP5_14TeV-powheg-pythia8/GEN-SIM-DIGI-RAW/FlatPU30to80FEVT_112X_mcRun3_2021_realistic_v16-v1/270000/005b56c1-0107-46b3-9740-1c6efc559295.root.unused -v -1 -n 100

[2] cmssw41025_logs.tar.gz
[3] ref.txt (output of [1] in vanilla CMSSW_13_2_X_2023-05-10-2300)
[4] bad.txt (output of [1] after removing all clear() statements from here)

@aandvalenzuela
Copy link
Contributor

Hello,
This PR adds the unit test TestTriggerEventFormat that fails since this merge in ASAN IBs:

===== Test "TestTriggerEventFormat" ====
+ LOCAL_TEST_DIR=/build/avalenzu/ASAN/CMSSW_13_2_ASAN_X_2023-05-12-2300/src/DataFormats/HLTReco/test
+ cmsRun /build/avalenzu/ASAN/CMSSW_13_2_ASAN_X_2023-05-12-2300/src/DataFormats/HLTReco/test/create_TriggerEvent_test_file_cfg.py
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 15-May-2023 15:55:43.245 CEST
+ file=testTriggerEvent.root
+ cmsRun /build/avalenzu/ASAN/CMSSW_13_2_ASAN_X_2023-05-12-2300/src/DataFormats/HLTReco/test/test_readTriggerEvent_cfg.py testTriggerEvent.root
15-May-2023 15:55:47 CEST  Initiating request to open file file:testTriggerEvent.root
15-May-2023 15:55:48 CEST  Successfully opened file file:testTriggerEvent.root
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 15-May-2023 15:55:49.038 CEST
15-May-2023 15:55:49 CEST  Closed file file:testTriggerEvent.root
+ oldFiles='testTriggerEvent_CMSSW_13_0_0.root testTriggerEvent_CMSSW_13_1_0_pre3.root'
+ for file in $oldFiles
++ edmFileInPath DataFormats/HLTReco/data/testTriggerEvent_CMSSW_13_0_0.root
+ inputfile=
+ die 'Failure edmFileInPath DataFormats/HLTReco/data/testTriggerEvent_CMSSW_13_0_0.root' 1
+ echo Failure edmFileInPath DataFormats/HLTReco/data/testTriggerEvent_CMSSW_13_0_0.root: status 1
Failure edmFileInPath DataFormats/HLTReco/data/testTriggerEvent_CMSSW_13_0_0.root: status 1
+ exit 1

---> test TestTriggerEventFormat had ERRORS
TestTime:12
^^^^ End Test TestTriggerEventFormat ^^^^

It fails before running the cmsRun command. Could you please have a look at it?

Many thanks,
Andrea.

@wddgit
Copy link
Contributor Author

wddgit commented May 15, 2023

It is failing to find a data file that was recently added to the cms-data/DataFormats-HLTReco repository. The regular IBs are finding the file, so it should be there and functional. Is there something different about how the ASAN IBs find files? This doesn't seem to me to be an issue with the code in the PR. Anyone know how this works? The data file was added to the data repository at approximately the same time as the PR with the new unit test. It is possible it is a timing issue and this will just work with the next ASAN IB, but I don't how this works. Does someone need to do something to update the external dependency to the new version of the data repository?

@perrotta
Copy link
Contributor

@aandvalenzuela @wddgit this unit test was failing also in the normail IB in 2023-05-12-2300 IBs.
The cmsdist PR for the data files was originally forgotten to be merged, and only added since CMSSW_13_2_X_2023-05-14-0000
As you can see from the more recent bilds, this is fixed now

@aandvalenzuela
Copy link
Contributor

My bad @wddgit, @perrotta! I got the ASAN results from 2023-05-12-2300 as if they were from the latest IBs (since they are actually the latest ASAN IBs we have). Sorry for the inconvenience!

@wddgit
Copy link
Contributor Author

wddgit commented May 15, 2023

Maybe in the future as I do more of these, I should let the PR with the data files get merged first. Then wait a few days, then submit the other one instead of submitting them together...

@wddgit
Copy link
Contributor Author

wddgit commented May 17, 2023

@Dr15Jones @missirol @makortel Jumping back to the comment from @missirol 5 days ago. My understanding is this:

The clear is needed because the FWLite test is reusing the objects and the schema evolution code in ROOT doesn't do that clear for new data members (a little scary for FWLite...).

The clear isn't necessary in a cmsRun job because it makes a new object every event. Processing more than one event wouldn't help detect this issue in the new tests I am writing because these tests use cmsRun. If I wanted to detect this I would need to use FWLite (or something with similar behavior) in the read step of the test in addition to processing more than 1 event.

Do I understand correctly?

@makortel
Copy link
Contributor

The clear is needed because the FWLite test is reusing the objects and the schema evolution code in ROOT doesn't do that clear for new data members (a little scary for FWLite...).

What I understood in #41025 (comment) is that the way the schema was evolved in the PR, the clear() is needed in both cmsRun-way and FWLite-way (because of the affected data member being new, and how ROOT works underneath).

Only if the schema would have been evolved like I speculated in #41025 (comment), the clear() would not be needed for cmsRun, but would be needed for FWLite.

I think @missirol has a point, and the tests would be able cover more potential problems if they would have 2 events with different content (thanks for pointing it out!).

@wddgit
Copy link
Contributor Author

wddgit commented May 17, 2023

So I looked again. This is the relevant line of code:

https://cmssdt.cern.ch/dxr/CMSSW/source/IOPool/Input/src/RootDelayedReader.cc#86

For cmsRun, it appears to default construct a new object every time. All the vectors would already be empty and clear would do nothing. For cmsRun it isn't needed. I didn't test this. I could experiment if you are not convinced. Possibly I am missing something. @missirol Did you see the error in a cmsRun process reading the objects?

For FWLite the events wouldn't need to have different content to have the problem. A vector would just keep adding more elements each time a new event was read and the vector would grow and grow without a clear call in the iorule.

@makortel
Copy link
Contributor

In #41025 (comment) I understood that the actual vector to be cleared is one owned and managed by ROOT. In any case in the development of #41025 incorrect behavior was observed with cmsRun if the clear() calls were absent. Further, specific experimentation could be useful to clarify the situation.

@wddgit
Copy link
Contributor Author

wddgit commented May 17, 2023

I'll try some experiments. I've got to write code similar to that to write the scouting tests anyway. Maybe you are right. My understanding of what ROOT is doing internally is not very good. I'll let you know what I find.

@missirol
Copy link
Contributor

Did you see the error in a cmsRun process reading the objects?

I think the answer is 'no'.

Overall, I have a feeling I introduced confusion (if so, I apologise). I understand only now that FWLite and cmsRun can behave differently, in general.

[1] /eos/cms/store/user/cmsbuild/store/mc/Run3Winter21DRMiniAOD/VBFHToTauTau_M125_TuneCP5_14TeV-powheg-pythia8/GEN-SIM-DIGI-RAW/FlatPU30to80FEVT_112X_mcRun3_2021_realistic_v16-v1/270000/005b56c1-0107-46b3-9740-1c6efc559295.root.unused

@wddgit
Copy link
Contributor Author

wddgit commented May 17, 2023

@missirol Thanks for the clarification.

It will take a few days for me to work through and implement these new tests. Let's continue this conversation in that PR when it is submitted and I've experimented a bit with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants