Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more robust protection against infinite loops in Ecal multifit (7_4_12_patchX) #11368

Merged
merged 1 commit into from Sep 18, 2015

Conversation

bendavid
Copy link
Contributor

Previous protection only caught a subset of possible cases.

Even rarer infinite loops still observed in HLT. A more refined solution for this may be introduced later, but this should be very safe. (And an absolute cutoff has been added for the worst case scenario)

Can reproduce in 7_4_12_patch1 with
/afs/cern.ch/user/a/avetisya/public/StuckEvent/hlt.py

Still running some tests myself to make doubly sure results are unchanged for other rechits.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @bendavid (Josh Bendavid) for CMSSW_7_4_12_patchX.

more robust protection against infinite loops in Ecal multifit

It involves the following packages:

RecoLocalCalo/EcalRecAlgos

@cmsbuild, @cvuosalo, @slava77 can you please review it and eventually sign? Thanks.
@argiro this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.

Note that this branch is designed for requested bug fixes specific to the CMSSW_7_4_12 release.
If you wish to make a pull request for the CMSSW_7_4_X release cycle, please use the CMSSW_7_4_X branch instead

@bendavid
Copy link
Contributor Author

@emanueledimarco @lgray

@bendavid
Copy link
Contributor Author

For some more background, the first protection catches the cases where the loop returns to an identical state within a single iteration. The further observed cases have the loop cycling through repeating states with a period of 3 or 4 iterations.

Cutting this sequence off at an arbitrary place in the middle is undesirable, because one state in the repeating sequence represents de-facto convergence, the others not. Gradually loosening the convergence threshold should take care of this. (Then there is a hard cutoff at the very end)

@lgray
Copy link
Contributor

lgray commented Sep 18, 2015

@cmsbuild please test

@cmsbuild
Copy link
Contributor

The tests are being triggered in jenkins.

@slava77
Copy link
Contributor

slava77 commented Sep 18, 2015

why CMSSW_7_4_12_patchX as a target?
@davidlange6 please clarify what this branch/milestone is

@slava77
Copy link
Contributor

slava77 commented Sep 18, 2015

@smuzaffar you probably know as well if CMSSW_7_4_12_patchX is useable by general code integration

@bendavid
Copy link
Contributor Author

@davidlange6 asked me to make the pull request here in addition to elsewhere
This is needed ASAP for HLT and prompt reco. (but will also have to go in the main 74x, 75x, and 76x branches)

Let me know if you want me to immediately open all three in parallel.
I'll open the 7_4_X right away at least since it's just point and click.

@slava77
Copy link
Contributor

slava77 commented Sep 18, 2015

all 4 in parallel

there is no IB to test on this branch

@bendavid bendavid changed the title more robust protection against infinite loops in Ecal multifit more robust protection against infinite loops in Ecal multifit (7_4_12_patchX) Sep 18, 2015
@davidlange6
Copy link
Contributor

@slava77 - this is just a patches branch for prompt and hlt since I had started intrgrating requests for the next full build.

You can test in74x.

@bendavid
Copy link
Contributor Author

I've confirmed identity (with respect to 7_4_12) for all rechits in 10 events from the Zee skim.

@davidlange6
Copy link
Contributor

I will merge this and start a build while tests happen in 74x.

davidlange6 added a commit that referenced this pull request Sep 18, 2015
more robust protection against infinite loops in Ecal multifit (7_4_12_patchX)
@davidlange6 davidlange6 merged commit 38147a0 into cms-sw:CMSSW_7_4_12_patchX Sep 18, 2015
@bendavid
Copy link
Contributor Author

ugh, wait, the final protection doesn't actually have a break statement with it
One more pull request coming in a second...
(Didn't notice because the adaptive threshold already avoids the infinite loop on the tested event)

@slava77
Copy link
Contributor

slava77 commented Sep 18, 2015

On 9/18/15 1:22 PM, David Lange wrote:

@slava77 https://github.com/slava77 - this is just a patches branch
for prompt and hlt since I had started intrgrating requests for the next
full build.

I'm not really familiar with this machinery/logic.
Prompt was always integrated and picked off from the main "production"
branch.
This looks like a new policy or something adhoc.

David, could you please write up something on a twiki or send to
release-dataops
to have as a reference.

You can test in74x.


Reply to this email directly or view it on GitHub
#11368 (comment).

@bendavid
Copy link
Contributor Author

(the missing break statement is added to this branch here
#11371)

It's already included in the still-open 74x pull request.

@davidlange6
Copy link
Contributor

On Sep 18, 2015, at 8:34 PM, Slava Krutelyov notifications@github.com wrote:

On 9/18/15 1:22 PM, David Lange wrote:

@slava77 https://github.com/slava77 - this is just a patches branch
for prompt and hlt since I had started intrgrating requests for the next
full build.

I'm not really familiar with this machinery/logic.
Prompt was always integrated and picked off from the main "production"
branch.
This looks like a new policy or something adhoc.

David, could you please write up something on a twiki or send to
release-dataops
to have as a reference.

my goal is indeed to use the latest 74x, yes. But the concept of patches on the non-latest full build is not new. For documentation, you can look at my spring offline/computing week talk on releases for production (which is more complicated than what we actually have implemented, but the idea is the same)

https://indico.cern.ch/event/368377/session/2/contribution/1/attachments/732527/1005024/releases_offcomp_150317.pdf

You can test in74x.


Reply to this email directly or view it on GitHub
#11368 (comment).


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants