New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error in IB test wf 13034.0 #29251
Comments
A new Issue was created by @silviodonato Silvio Donato. @Dr15Jones, @smuzaffar, @silviodonato, @makortel, @davidlange6, @fabiocos can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign alca, reconstruction |
In https://cmssdt.cern.ch/lxr/source/RecoLocalTracker/SiPixelRecHits/src/SiPixelTemplateReco.cc#1178 I notice that the value In the literature I find that assert((sigmaQ >= 0.) && (mpv >= 0.) && (kappa >= 0.01) && (kappa <= 10.)); That would be enough to remove the exception observed in the attached log. Of course, the calculations in @cmantill @pmaksim1 @tvami @mmusich @tsusa @OzAmram : could any of you please have a look and confirm whether the proposed quick fix above actually corresponds to what expected from that algo, or if some different fix should be applied instead? |
The short answer is that I think this fix makes sense, but I am checking with Morris to be sure. I'll let you know when he confirms. The longer explanation is: |
Ok so after a little closer inspection I think what Andrea wrote above is a little backwards. The CMSSW exception is thrown on line 1174, so it is the condition on 1173 that needs to be changed to fix the problem (note the ifndef on 1172). 1178 is what is run in the standalone code. So it is a little confusing that the exception is being thrown with kappa = 0.01 then, because that should not satisfy the (kappa < 0.01) condition that is being checked for. It might be due to some floating point rounding error or something. So we could try giving it a little buffer, by changing 1173 to be something like:
We can also make a similar change to 1178 to keep consistent. (Also I talked to Morris and he OK'ed this change) |
Lines 1173 to 1176 are accessed only if Of course, one could certainly relax both cases (CMSSW and STANDALONE) to take rounding errors into account. But the error reported in the log that started this issue wouldn't have been triggered in STANDALONE mode, while it did in CMSSW. |
Sorry @OzAmram : yes, you are right. In my post above I confused Yes: rounding approximations must be taken into account when there are non integer numbers. Instead of writing values as such, I'd rather define a const value epsilon (small) to be added to the actual bounds in the evaluation. |
Ok that sounds good to me. |
the assert will happen for (in some other code in CMSSW we had problems in the past where binning was decided by exact inequalities and missing a floating point value falling at the bin boundary, which would lead to problems. The appropriate solution is to simply do logically the right thing and allow equal value) |
@slava77 The assert on line 1178 is only executed in standalone mode. The CMSSW exception is being thrown based on line 1173 (Andrea confused them above). The CMSSW condition checks if In either case I think Andrea's idea of using an epsilon should work. Do you want me to test it and make PR? |
I thought I read it right, but then apparently I did it backwards as well
Looking at the code more carefully now, all values are double precision where the check is made. Was there a good motivation to switch between float and double here? |
Ah ok that makes sense. The SiPixelTemplate object stores the kappa and other vavilov parameters as floats (which are what is used in the interpolation), so we would need to change that rather than just the computation in vavilov_pars in order to get the 'double' value of 0.01. Indeed using the float value could lead to an error down the line based on the ROOT function kappa is being used for which does the same check of kappa < 0.01 (see https://github.com/root-project/root/blob/331efa4c00fefc38980eaaf7b41b8e95fcd1a23b/math/mathcore/src/TMath.cxx#L2941-L2943). |
Actually, my proposal to use 0.01f in the check isn't that much better without changing the precision of kappa itself. I see that the only places where kappa is used are:
But then, perhaps bumping it up to 0.010000001 in vavilov_pars if the float value is ==0.01f (both double and single precision above 0.01) is acceptable |
Does someone have a recipe to reproduce the error so I can check if the above fix does indeed fix it? When I ran |
It happens seldom. I can reproduce the issue after several jobs. RAW is kept, so you should be able to reproduce the issue. Please try raw from cmsDriver (same as wf 13034): Issue can be found in
HTH. |
+1
|
+1 |
This issue is fully signed and ready to be closed. |
As discussed during the last Core Software meeting, we are getting a seldom error in 13034.0 :
These are the log files of the latest IB tests with the error
CMSSW_11_1_X_2020-03-13-2300.log
CMSSW_11_1_X_2020-03-14-1100.log
CMSSW_11_1_X_2020-03-16-2300.log
CMSSW_11_1_X_2020-03-19-1100.log
CMSSW_11_1_X_2020-03-21-1100
CMSSW_11_1_X_2020-03-30-1100
The text was updated successfully, but these errors were encountered: