Replacing PR#5954: Threadsafe faster hcal25ns reco method2 #6150

@lihux25 lihux25 commented Oct 31, 2014

A new PR. This is the same as the latest update of PR#5954 and is meant to replace it (with a fresh pull request).

(ref: PR#5954: #5954)

@cmsbuild

A new Pull Request was created by @lihux25 (Hongxuan Liu) for CMSSW_7_3_X.

Replacing PR#5954: Threadsafe faster hcal25ns reco method2

It involves the following packages:

RecoLocalCalo/HcalRecAlgos
RecoLocalCalo/HcalRecProducers

@cmsbuild, @nclopezo, @StoyanStoynev, @slava77 can you please review it and eventually sign? Thanks.
@argiro this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@nclopezo, @ktf you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

slava77 commented Oct 31, 2014

Thanks, please close the other PR

bool fitStatus = false;
if(n_above_thr<=5){
// Set starting values and step sizes for parameters
double vstart[3] = {iniTimesArr[i_tsmax-1], tsMAX_NOPED, 0};

check that i_tsmax is above zero
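
A minimal sketch of such a guard, assuming the names from the quoted snippet (iniTimesArr, i_tsmax, tsMAX_NOPED). The helper fillStartValues and its nTimes parameter are hypothetical and not part of the PR; the point is only that iniTimesArr[i_tsmax-1] is read after the index has been checked.

```cpp
#include <cassert>

// Hypothetical helper, not part of the PR: fills the three start values for
// the fit and returns true, or returns false (leaving vstart untouched) when
// i_tsmax - 1 would fall outside [0, nTimes).
inline bool fillStartValues(const double* iniTimesArr, int nTimes, int i_tsmax,
                            double tsMAX_NOPED, double (&vstart)[3]) {
  assert(iniTimesArr != nullptr);
  if (i_tsmax <= 0 || i_tsmax > nTimes) {
    return false;  // i_tsmax == 0 would read iniTimesArr[-1]
  }
  vstart[0] = iniTimesArr[i_tsmax - 1];  // starting time
  vstart[1] = tsMAX_NOPED;               // starting amplitude
  vstart[2] = 0.0;                       // third parameter starts at zero
  return true;
}
```

A caller could skip the fit, or fall back to a default start time, when the helper returns false.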

slava77 commented Oct 31, 2014

Moving on to this PR (more comments were in #5954)

After running valgrind on 10 events in workflow 25202 (reconstruction only), I see about 26K valgrind errors about uninitialized conditions coming from Minuit2 and downstream.
It looks like many of the errors start from ROOT::Minuit2::MnUserParameterState::SetLimits.
I made a few suggestions in the code to be tested.
I think it's important to fix these.

Run valgrind with:

time valgrind --tool=memcheck --suppressions=$CMSSW_RELEASE_BASE/src/Utilities/ReleaseScripts/data/cms-valgrind-memcheck.supp --num-callers=20 --xml=yes --xml-file=valgrind-%p.xml cmsRun a.py >& a.log

(About 10 events should be enough; mind that valgrind is about 10-20 times slower than a regular run.)

You can parse the output XML file with something like the following (this selects Uninit* errors whose call stack contains ::doEvent and skips the ones with TObject; the TObject-related errors seem to be harmless):

grep "<fn\|</error\|<kind" valgrind-11018.xml  | awk '/<\/error>/{prn=0;}/kind>Uninit/{if(match(msg,"::doEvent")>0&& !match(msg, "TObject")){print msgP"\n======\n";} msg=$0; msgP=msg; prn=1;cnt=0;}/<fn>/{if (prn==1){msg=msg"\n"$0; if(cnt<5)msgP=msgP"\n"$0} if (prn==1)cnt++; else cnt=0;}' | sed -e 's?<fn>??g;s?</fn>??g;s?<text>??g;s?</text>??g;s/\&gt;/>/g;s/\&amp;/\&/g;s/\&lt;/</g' | less

NB: Prior to this PR there are 7 valgrind errors in HcalNoiseAlgo::pass*RBXRechitR45. They don't spread to downstream reco, but they still need to be fixed separately from this PR.

slava77 commented Oct 31, 2014

As for thread safety, I tried to run the static analyzer, but it failed to parse the most interesting part, HybridMinimizer.cc.
I'll wait for jenkins/cmsbuild to try it and see if we get more details.

@ktf @Dr15Jones

slava77 commented Oct 31, 2014

... I wouldn't be too surprised if the uninitialized variables are a source of the fit instabilities (the reason multiple scans are needed)
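
The thread does not say which variables were left uninitialized, so the following is only a generic sketch of the usual C++ remedy: give every field a default member initializer so that nothing reaches the minimizer with an indeterminate value. FitParameter and setLimits are hypothetical names for illustration; they are not Minuit2 types or calls.

```cpp
#include <cassert>

// Hypothetical parameter record, not a Minuit2 type. Default member
// initializers give every field a defined value even if a caller forgets
// to set it.
struct FitParameter {
  double value = 0.0;
  double step = 0.0;
  double lower = 0.0;
  double upper = 0.0;
  bool hasLimits = false;
};

// Stores limits on a parameter; the bounds must describe a non-empty interval.
inline void setLimits(FitParameter& p, double lower, double upper) {
  assert(lower < upper && "limits must describe a non-empty interval");
  p.lower = lower;
  p.upper = upper;
  p.hasLimits = true;
}
```

The same idea applies to plain arrays: writing `double vstart[3] = {};` value-initializes all three elements to zero.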

davidlt commented Oct 31, 2014

If you want to catch funny cases in the heap, use a secret weapon.

export MALLOC_CHECK_=3 
export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))

It forces allocated and freed memory to be filled with a specific byte pattern. If changing the MALLOC_PERTURB_ value causes a change in results, then your program has some random memory bugs. Of course, this should be used with cmsRunGlibC.

This effectively lets you set the value of uninitialized variables (if they are on the heap).
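
To make the effect concrete, here is a small emulation of the idea in C++. It does not use glibc's mechanism itself: perturbedAlloc and sumBytes are hypothetical helpers, and the fill rule (fresh memory set to the complement of the low byte of the perturb value) follows my reading of the glibc M_PERTURB documentation, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Emulation only: fills a fresh allocation with the complement of `perturb`,
// mimicking the documented effect of MALLOC_PERTURB_ on malloc'ed memory.
inline unsigned char* perturbedAlloc(std::size_t n, unsigned char perturb) {
  unsigned char* p = static_cast<unsigned char*>(std::malloc(n));
  if (p != nullptr) {
    std::memset(p, static_cast<unsigned char>(~perturb), n);
  }
  return p;
}

// Stands in for buggy code that consumes a buffer it never initialized.
inline unsigned long sumBytes(const unsigned char* buf, std::size_t n) {
  assert(buf != nullptr);
  unsigned long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += buf[i];
  }
  return sum;
}
```

Calling sumBytes on buffers obtained with two different perturb values gives two different sums, which is the kind of result change that points to a read of uninitialized heap memory.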

@@ -0,0 +1,966 @@
// Note copied and modifed from the Minuit2Minimizer to suit our purpose
// Implementation file for class HybridMinimizer

I see that you have removed the copyright notice.
Since this is essentially a copy of the Minuit2Minimizer, the copyright notice should be put back, and some notes should be added about the modifications made in this code.
Please also write a message on roottalk to investigate whether there are subtle reasons not to support changes in the fit strategy.

lihux25 commented Nov 2, 2014

Your comments are incorporated and valgrind tests are done in the latest update

cmsbuild commented Nov 2, 2014

Pull request #6150 was updated. @cmsbuild, @nclopezo, @StoyanStoynev, @slava77 can you please check and sign again.

@cmsbuild cmsbuild added the hold label Nov 3, 2014

cmsbuild commented Nov 3, 2014

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_3_X IBs unless changes are made or it breaks tests. This PR is put on hold by @slava77, who will have to remove the hold comment, or @nclopezo, @ktf, @davidlange6, @smuzaffar will have to merge it by hand.

cmsbuild commented Nov 4, 2014

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_3_X IBs unless changes are made (tests are also fine). This PR is put on hold by @slava77, who will have to remove the hold comment, or @nclopezo, @ktf, @davidlange6, @smuzaffar will have to merge it by hand.

davidlange6 added a commit that referenced this pull request Nov 4, 2014
…_method2

Replacing PR#5954: Threadsafe faster hcal25ns reco method2
@davidlange6 davidlange6 merged commit 675982e into cms-sw:CMSSW_7_3_X Nov 4, 2014
@wmtan wmtan mentioned this pull request Nov 4, 2014