Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated root to tip of branch v6-22-00-patches #6388

Conversation

smuzaffar
Copy link
Contributor

No description provided.

@smuzaffar
Copy link
Contributor Author

smuzaffar commented Nov 9, 2020

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 9, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 9, 2020

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_11_2_X/rootnext.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@smuzaffar
Copy link
Contributor Author

lets get it in for 23h00 Ib

@smuzaffar smuzaffar merged commit dc5b7e3 into cms-sw:IB/CMSSW_11_2_X/rootnext Nov 9, 2020
@cmsbuild
Copy link
Contributor

-1

Tested at: 3939110

CMSSW: CMSSW_11_2_ROOT622_X_2020-11-09-1100
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2fc353/10599/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test MagneticFieldEngineTestDriver had ERRORS

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2fc353/10599/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529296
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2529273
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 148 log files, 22 edm output root files, 35 DQM output files

@smuzaffar
Copy link
Contributor Author

@slava77 , can you please check these reco comparison differences ?

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

@slava77 , can you please check these reco comparison differences ?

Interesting

Reco comparison results: 8 differences found in the comparisons

it looks like all of the differences are appearing in particleFlowBlock linkData_ size, but then there is nothing downstream.

Screen Shot 2020-11-11 at 5 22 25 AM

IIRC there is no DQM for particleFlowBlock; so, seeing no differences in DQM would not reject some possible subtleties of the reco plotting script, which in this case would be running a more recent version of root after loading the older version root file first.

It's useful to make a manual check in "native" builds for the baseline and this PR test.
I see that the baseline is CMSSW_11_2_X_2020-11-09-1100.
Just in case, is the PR build available on CVMFS?
It looks like the PR is merged in IB/CMSSW_11_2_X/rootnext , which should be in CMSSW_11_2_ROOT622_X_2020-11-09-2300/slc7_amd64_gcc820; I guess I can check this instead.
I will do it some time later today.

@smuzaffar
Copy link
Contributor Author

Yes, you can use latest CMSSW_11_2_ROOT622_X IB .
I am opening a PR to test it in CMSSW_11_2_X (master Ibs)

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

I picked step3.root from 11634.0_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA for the baseline and the PR outputs.
For the baseline I inspected it in its original CMSSW_11_2_X_2020-11-09-1100,
while for this PR test output I used CMSSW_11_2_ROOT622_X_2020-11-09-2300.

The content of PFBlock::linkData_ looks different based on TTree::Draw (or Scan) used in the plotting script.

This data member type is an std::map<unsigned int, Link = struct {float distance; char test;}>.
It looks to me that with this PR the results are incorrect based on reported values in e.g. the first event 2nd block
en->Scan("recoPFBlocks_particleFlowBlock__RECO.obj[2].elements_@.size():recoPFBlocks_particleFlowBlock__RECO.obj[2].linkData_@.size():recoPFBlocks_particleFlowBlock__RECO.obj[2].linkData_[].first:recoPFBlocks_particleFlowBlock__RECO.obj[2].linkData_[].second.distance", "", "", 1,0)

  • the size is different 208 -> 80
  • the .first values do not match and the max value for the case with this PR is logically not possible from the way the algorithm filling these values works
  • the .second.distance field with this PR looks like random numbers in all cases

I can't recall offhand if anything else of type std::map is plotted in the reco comparisons; it may be that this is incidentally the only case.
At least for the PFBlocks, we do not read this data for further processing in the tested workflows; so, I can't be sure that the map reading is limited to FWLite (while it still works in full framework) or if there is a possibility that the std::map streaming is fully broken.

@smuzaffar
Copy link
Contributor Author

@slava77 , testing latest root v6.22 for native 11.2.X IBs shows no comparison differences #6404

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

@slava77 , testing latest root v6.22 for native 11.2.X IBs shows no comparison differences #6404

what's the difference between the builds?
IB/CMSSW_11_2_X/master...IB/CMSSW_11_2_X/rootnext looks perhaps only different in the root debug build flag
Is there anything else different?

@smuzaffar
Copy link
Contributor Author

yes only difference is root debug build flag in rootnext branch every thing else is identical

@slava77
Copy link
Contributor

slava77 commented Nov 12, 2020

@pcanal
I'm responding here to cms-sw/cmssw#30359 (comment) for a bit more continuity in the thread

@slava77 Does reading the failing map from a file written with an old version of ROOT & CMSSW work (or fails) when reading with the new FWLite? (i.e. is the file badly read or badly written?)

it looks like a case of both badly written and badly read: I can read the old file with the new IB in CMSSW_11_2_ROOT622_X_2020-11-09-2300, and the values in the map .second.distance are (accidentally?) reasonable, however the key values and the size is different

[1]

*    Row   * Instance * elements_@.size() * linkData_@.size() * linkData_[].first * linkData_[].second.distance *
*        0 *        0 *       132 *        23 *         1 * 0.0010000 *
*        0 *        1 *       132 *        23 *        40 * 0.0010000 *
*        0 *        2 *       132 *        23 *       257 * 0.0010000 *
*        0 *        3 *       132 *        23 *       513 * 0.0010000 *
*        0 *        4 *       132 *        23 *       769 * 0.0010000 *

[2]
vs native baseline file made and read in CMSSW_11_2_X_2020-11-09-1100

*        0 *        0 *       132 *       208 *        40 * 0.0010000 *
*        0 *        1 *       132 *       208 *        41 * 0.0010000 *
*        0 *        2 *       132 *       208 *        42 * 0.0010000 *
*        0 *        3 *       132 *       208 *        44 * 0.0010000 *
*        0 *        4 *       132 *       208 *        47 * 0.0010000 *
*        0 *        5 *       132 *       208 *       164 * 0.0010000 *
*        0 *        6 *       132 *       208 *       171 * 0.0161606 *
*        0 *        7 *       132 *       208 *       293 * 0.0010000 *
*        0 *        8 *       132 *       208 *       300 * 0.0144026 *
*        0 *        9 *       132 *       208 *       421 * 0.0010000 *

the above agrees with values printed from the job itself (running in CMSSW_11_2_X_2020-11-09-1100)

[3]
vs the file from PR read in ROOT622_X

*        0 *        0 *       132 *        80 *        10 * 2.162e+10 *
*        0 *        1 *       132 *        80 *        37 * 2.193e-08 *
*        0 *        2 *       132 *        80 *        40 *         0 *
*        0 *        3 *       132 *        80 *       164 * 9.758e-10 *
*        0 *        4 *       132 *        80 *       235 * -1.510681 *
*        0 *        5 *       132 *        80 *       253 * 1.0767518 *

[4]
I also remade the file in CMSSW_11_2_ROOT622_X_2020-11-09-2300 and get somewhat different random numbers while reading it in the same release

*        0 *        0 *       132 *       126 *         0 * -10.97802 *
*        0 *        1 *       132 *       126 *        40 * 0.1121101 *
*        0 *        2 *       132 *       126 *       256 * 1.691e+37 *
*        0 *        3 *       132 *       126 *       267 * -1.89e-30 *
*        0 *        4 *       132 *       126 *       372 * -0.216617 *
*        0 *        5 *       132 *       126 *       511 *         0 *

note that the printouts from inside the job agree with the baseline release as shown in [2] (so, at runtime the data in memory is the same in [4] and in [2])

and the same reproduced file made in CMSSW_11_2_ROOT622_X_2020-11-09-2300 now read in CMSSW_11_2_X_2020-11-09-1100

*        0 *        0 *       132 *       208 *        40 * 0.1121101 *
*        0 *        1 *       132 *       208 *        41 * -10.97802 *
*        0 *        2 *       132 *       208 *        42 * 1.238e-08 *
*        0 *        3 *       132 *       208 *        44 * -0.014939 *
*        0 *        4 *       132 *       208 *        47 * -1.45e-37 *
*        0 *        5 *       132 *       208 *       164 * 257592.01 *

note that the size and keys turns out to be right, but the last column is all wrong.

@pcanal
Copy link

pcanal commented Nov 12, 2020

So we need to solve this.

I am tad bit confused. "so, at runtime the data in memory is the same in [4] and in [2])" but the output for the column 5th column is different between the 2 Scan output so I am not sure whether "and keys turns out to be right" and the data is different for different run or if it is wrong.

Either way, it seems like it might be a long standing problem, so I only need to debug it in one release. Can I get the instruction to run the writing part with the text output to capture the values a printed by the job itself? I assume the reading part is simply, open file and run Scan.

@slava77
Copy link
Contributor

slava77 commented Nov 12, 2020

I used CMSSW_11_2_ROOT622_X_2020-11-09-2300
and reran step3*py on step2.root input from https://cmssdt.cern.ch/SDT/jenkins-artifacts/ib-baseline-tests/CMSSW_11_2_X_2020-11-09-1100/slc7_amd64_gcc820/-GenuineIntel/matrix-results/11634.0_TTbar_14TeV+2021+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/
which is the jenkins output of the baseline IB CMSSW_11_2_X_2020-11-09-1100

I guess this can be remade from scratch with runTheMatrix.py -l 11634.0

For the runtime, debugs can be added in RecoParticleFlow/PFProducer/plugins/PFBlockProducer.cc
PFBlockProducer::produce method
I simply had the following

    for (auto const& block : blocks) {
      str << "Block elements size "<<block.elements().size() << endl;
      str << "Block link data: size "<<block.linkData().size()<<" :"<<endl;
      for (auto const& link : block.linkData()) {
        str << "\t key "<<link.first<<" distance "<< link.second.distance<<" test "<<link.second.test<< endl;
      }
    }

    LogWarning("PFBlockProducer") << str.str() << endl;

which is more appropriate than the original for the purpose of this debugging.
Additionally, use process.particleFlowBlock.verbose = True in the config, which can as well be hardcoded if the c++ code printouts are needed.

HTH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants