[HCAL] Sanitize Mahi HCAL local reconstruction pulse arrival time values #22394
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-22394/3619
A new Pull Request was created by @jaylawhorn (Jay Lawhorn) for master. It involves the following packages: RecoLocalCalo/HcalRecAlgos. @perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
The tests are being triggered in jenkins.
Comparison job queued.
Comparison is ready Comparison Summary:
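The differences are mostly as expected, impacting Hcal RecHit timing and higher level object timing that uses Hcal RecHits. I see one set of differences involving edmErrorSummaryEntries that aren't immediately obvious to me, here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_10_1_X_2018-02-28-1100+22394/25315/validateJR/all_OldVSNew_TTbarPUwf25202p0/ for example: https://user-images.githubusercontent.com/6333978/36845020-52478994-1d55-11e8-9987-7d69094bd5f9.png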
On 3/1/18 4:39 AM, Jay Lawhorn wrote:
> The differences are mostly as expected, impacting Hcal RecHit timing and
> higher level object timing that uses Hcal RecHits. I see one set of
> differences involving edmErrorSummaryEntries that aren't immediately
> obvious to me, here:
> https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_10_1_X_2018-02-28-1100+22394/25315/validateJR/all_OldVSNew_TTbarPUwf25202p0/
> for example:
> image
> <https://user-images.githubusercontent.com/6333978/36845020-52478994-1d55-11e8-9987-7d69094bd5f9.png>

this is very likely due to a difference in (randomized) PU test reads
unrelated to this PR.
[hopefully we can get rid of these false-positive differences at some
point soon]

> I'm not sure if someone can confirm these behave as expected when we
> stop returning NaN for RecHit timing? Maybe @abdoulline @deguio @igv4321 ?

It's obvious to me that NaNs should be removed.
The decision to truncate at +/-12.5 is less obvious.
Is the value still meaningful for out-of-time pulse fits?
If so, perhaps some broader coverage should be preserved (+/-50 maybe).
@slava77 Thanks for the clarification! So if the fit is putting the pulse in the in-time bunch crossing, however large the residuals, it seems misleading to me to return a "time" value that corresponds to an out-of-time pulse. (Also, the time value is less and less meaningful the farther it gets from the nominal, because it is based on the local derivative of the pulse at the nominal value.) If it were an out-of-time pulse, it would be assigned to an out-of-time bunch crossing, which we don't return. On a longer time scale we would like to fix this pulse arrival time to be useful and reasonable without any hard boundaries, either by including the arrival time as an explicit parameter in the fit with a Gaussian constraint, or by adding more pulse shapes to the fit (up to the # of bunch crossings), which would reduce the residuals problem. However, for now, we would prefer not to return information that could be misinterpreted. In any case, @jaehyeok sees that the large time values come from low energy RecHits (https://indico.cern.ch/event/708228/contributions/2907551/attachments/1605858/2547938/20180223_Jae_HCAL_MAHI.pdf slide 5).
On 3/1/18 6:20 AM, Jay Lawhorn wrote:
> @slava77 Thanks for the clarification!
> So if the fit is putting the pulse in the in-time bunch crossing,
> however large the residuals, it seems misleading to me to return a
> "time" value that corresponds to an out-of-time pulse. (Also, the time
> value is less and less meaningful the farther away from the nominal it
> gets because it is based on the local derivative of the pulse at the
> nominal value.) If it was an out-of-time pulse, it would be assigned to
> an out-of-time bunch crossing, which we don't return.

OK

> On a longer time scale we would like to fix this pulse arrival time to
> be useful and reasonable without any hard boundaries, either by
> including the arrival time as an explicit parameter in the fit with a
> gaussian constraint, or by adding more pulse shapes to the fit (up to
> the # of bunch crossings) which would reduce the residuals problem.
> However, for now, we would prefer to not return information that could
> be mis-interpreted.

Perhaps on a longer time scale we can even save OOT hits above some
threshold in a separate collection.
…
> @jaehyeok sees anyways that the large time
> values come from low energy RecHits
> (https://indico.cern.ch/event/708228/contributions/2907551/attachments/1605858/2547938/20180223_Jae_HCAL_MAHI.pdf
> slide 5).
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
+1
The current version of Mahi can return a pulse arrival time that is NaN because there is no check for division by zero; this PR adds that check. Additionally, we limit the pulse arrival time to +/- 12.5 ns from the nominal time, because a hit that the fit places in the in-time sample (bx=0) belongs in that sample. This removes large tails in the timing distribution induced by deviations from the default pulse shape template and/or by pulses in bunch crossings where the fit is not allowed to place a pulse.
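A minimal sketch of the two sanitization steps described above, not the actual Mahi code. It assumes the arrival time is obtained as a ratio whose denominator vanishes when there is no in-time energy; the function name, parameter names, and constants are hypothetical and chosen only for illustration. The -9999 default and the +/- 12.5 ns window are taken from this description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical constants for illustration; the values come from the PR text.
constexpr float kNoInTimeEnergy = -9999.f;  // default returned for zero in-time energy
constexpr float kHalfWindowNs = 12.5f;      // allowed deviation from the nominal time

// Assumed form: arrival time = numerator / denominator, where the
// denominator is zero when the fitted in-time amplitude is zero.
float sanitizeArrivalTime(float numerator, float denominator,
                          float halfWindowNs = kHalfWindowNs) {
  assert(halfWindowNs > 0.f);

  // Guard against division by zero instead of propagating NaN or inf.
  if (denominator == 0.f) {
    return kNoInTimeEnergy;
  }

  const float t = numerator / denominator;

  // Catch any remaining non-finite value (e.g. a NaN numerator).
  if (!std::isfinite(t)) {
    return kNoInTimeEnergy;
  }

  // Truncate to [-halfWindowNs, +halfWindowNs] around the nominal time.
  return std::max(-halfWindowNs, std::min(t, halfWindowNs));
}
```

With these assumptions, `sanitizeArrivalTime(30.f, 1.f)` is truncated to 12.5, and `sanitizeArrivalTime(1.f, 0.f)` yields the -9999 default rather than a non-finite value.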
For example, some recHits from the default release:
become
where -9999 is the default value corresponding to zero in-time energy.
The following plot demonstrates the truncation of the ~1% tails in the pulse shape arrival time:
@mariadalfonso @deguio @jaehyeok