Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various warning messages from MkFitOutputConverter #35798

Closed
mmusich opened this issue Oct 23, 2021 · 15 comments
Closed

Various warning messages from MkFitOutputConverter #35798

mmusich opened this issue Oct 23, 2021 · 15 comments

Comments

@mmusich
Copy link
Contributor

mmusich commented Oct 23, 2021

After integration of mkFit we observe a large number of warnings coming from MkFitOutputConverter of the type:

%MSG-w MkFitOutputConverter:   MkFitOutputConverter:detachedTripletStepTrackCandidates  23-Oct-2021 10:11:25 CEST Run: 1 Event: 10
Curvilinear error not pos-def
[         -nan        -nan        -nan        -nan        -nan
          -nan        -nan        -nan        -nan        -nan
          -nan        -nan        -nan        -nan        -nan
          -nan        -nan        -nan         nan         nan
          -nan        -nan        -nan         nan         nan ]
candidate ignored
%MSG

and:

%MSG-w MkFitOutputConverter:   MkFitOutputConverter:highPtTripletStepTrackCandidates  23-Oct-2021 10:11:25 CEST Run: 1 Event: 10
Backward fit of candidate 10 failed, ignoring the candidate
%MSG

examples:

wf. 11634.7 step3

wf 12434.0 step3

@mmusich
Copy link
Contributor Author

mmusich commented Oct 23, 2021

assign reconstruction

@cmsbuild
Copy link
Contributor

New categories assigned: reconstruction

@slava77,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

A new Issue was created by @mmusich Marco Musich.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich
Copy link
Contributor Author

mmusich commented Oct 23, 2021

@mmasciov @slava77 could you please have a look?

@mmusich
Copy link
Contributor Author

mmusich commented Oct 23, 2021

@VinInn FYI

@slava77
Copy link
Contributor

slava77 commented Oct 23, 2021

Uhm, 136.874 looks even worse

I think that for now we should just change this to LogInfo on the CMSSW side.

@jpata
Copy link
Contributor

jpata commented Nov 23, 2021

According to Javi at T0 in hypernews: https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2314/3/1/1/1/1/1/1/1/1/1/1/1.html this is an issue for T0 replays.

In the log file /afs/cern.ch/user/c/cmst0/public/PausedJobs/DQMseq/job_9738/tarball/job/WMTaskSpace/cmsRun1/cmsRun1-stdout.log, there's a lot of this

%MSG-w BasicTrajectoryState:  MkFitOutputConverter:pixelLessStepTrackCandidates  12-Nov-2021 01:30:07 CET Run: 317696 Event: 37435960
curv error not pos-def
[  0.000387528  0.00015181 0.000120542-2.72934e-05-9.24547e-07
    0.00015181 0.000101104 5.54339e-05-1.82003e-05-6.04576e-07
   0.000120542 5.54339e-05 4.47304e-05-9.97972e-06-3.35531e-07
  -2.72934e-05-1.82003e-05-9.97972e-06 4.07633e-06 9.49303e-08
  -9.24547e-07-6.04576e-07-3.35531e-07 9.49303e-08 -1.5703e-06 ]

Looks like it's coming from

edm::LogWarning("BasicTrajectoryState") << "local error not pos-def\n"

@mmasciov
Copy link
Contributor

@jpata , the amount of warnings is expected to drop significantly after PR #36075 (backport PR).
We have checked explicitly using the same configuration, and the number of warnings after such backport got reduced by ~15x. The source was understood and 'cured' by that PR.

@mmusich
Copy link
Contributor Author

mmusich commented Nov 25, 2021

I just wanted to check if there is a plan for further reduction.
By checking step3 of wf 136.874 there is still a bit of:

%MSG-w BasicTrajectoryState:  TrackProducer:pixelLessStepTracks  25-Nov-2021 06:22:51 CET Run: 319450 Event: 106497951
local error not pos-def
[   -0.0215424-1.64885e-05   0.0072324  0.00764519  -0.0222784
  -1.64885e-05  0.00100408 0.000317885   0.0213792-0.000340222
     0.0072324 0.000317885  0.00579943 -0.00439497  -0.0370556
    0.00764519   0.0213792 -0.00439497    0.476544    0.354667
    -0.0222784-0.000340222  -0.0370556    0.354667     6.33425 ]
pos/mom/mf  (-43.8802,-25.4802,208.054)   (-2.05221,-1.02749,8.73586)   (-0.0187514,-0.0108884,3.72672) 
%MSG

that I don't observe that much if I run the CKF counterpart:

cmsDriver.py step3  --conditions auto:run2_data -s RAW2DIGI,L1Reco,RECO,SKIM:ZElectron+SinglePhotonJetPlusHOFilter+EXOMONOPOLE,EI,PAT,ALCA:SiStripCalZeroBias+SiStripCalMinBias+SiStripCalSmallBiasScan+TkAlMinBias+EcalESAlign,DQM:@commonFakeHLT+@ecal+@egamma+@L1TEgamma --datatier RECO,MINIAOD,DQMIO --eventcontent RECO,MINIAOD,DQM --data  --process reRECO --scenario pp --era Run2_2018_noMkFit --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run2_2018 --hltProcess reHLT -n 100  --filein  file:step2.root  --fileout file:step3.root 

(for the record ~200 vs ~10)

@mmasciov
Copy link
Contributor

@mmusich, yes. PR #36246 halves the remaining warnings already.
We will try to reduce them further with minor future technical updates.

@perrotta
Copy link
Contributor

@mmusich, yes. PR #36246 halves the remaining warnings already. We will try to reduce them further with minor future technical updates.

I take one PR randomly, i.e. 136.793, and those TrackProducer:pixelLessStepTracks warnings increase after #36246 from 120 to 159.

In 10224.0 they increase from 0 to 6

@mmasciov how did you conclude that "PR #36246 halves the remaining warnings already" ?

@mmasciov
Copy link
Contributor

@perrotta
From step 3 of 136.874, which was originally pointed out by @mmusich. Here, the number of such warnings goes from 170 to 88.
We also observed a reduction of warnings when re-testing the data replay job mentioned here.

However, warnings are really track-specific. So, one tracking particle present in one case and not in another can produce an increase of warning messages in the former case.
Anyways, as mentioned above we will try to reduce these warnings further with minor future technical updates.

@slava77
Copy link
Contributor

slava77 commented Feb 17, 2022

I had another look at the origins of the remaining non-pos-def warnings, but the examples appeared to be of a similar kind that I noticed shortly after this issue was opened.

The posDef() checks in the *TrajectoryError cover only a simple check for the diagonal elements. So, the checks done in the MkFitOutputConverter are incomplete and some track states after a propagation begin to fail the posDef() checks.
(In a local test still) I have added a check for the matrix determinant to be non-negative and this apparently reduced the number of warnings roughly down to the rate seen in the noMkFit rerun of the workflows.
I used 20 events in wf 11834 and 100 each in 136.{793,874}.

If the internal validation confirms that there is no loss in efficiency and the computation of the determinant is cheap, I think that we will add a check on the determinant sign in the MkFitOutputConverter soon after #36546 is merged. A better understanding of the origins (or if the bad states can be detected already inside the mkFit routines where the covariance matrix is still a 6x6 [and supposedly det=0] ) will be on a longer time scale.

If someone has some pointers to literature related to preservation of positive-definiteness of the covariance in Kalman updates, please share. This may help resolving the problem at the origins.

@jpata
Copy link
Contributor

jpata commented Mar 8, 2022

+reconstruction

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2022

This issue is fully signed and ready to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants