Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SiPixelMonitorTrack invalid TSOS crash fix #14322

Merged

Conversation

schneiml
Copy link
Contributor

@schneiml schneiml commented May 2, 2016

This should fix the issue of PromptReco crashes in DQM code [0].

Not really tested yet, just the change suggested in the thread.

[0] https://hypernews.cern.ch/HyperNews/CMS/get/recoDevelopment/1440.html

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2016

A new Pull Request was created by @schneiml (Marcel Schneider) for CMSSW_8_0_X.

It involves the following packages:

DQM/SiPixelMonitorTrack

@cmsbuild, @dmitrijus, @vanbesien, @deguio, @davidlange6 can you please review it and eventually sign? Thanks.
@threus this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

cms-bot commands are list here #13028

@slava77
Copy link
Contributor

slava77 commented May 3, 2016

@cmsbuild please test

please make a PR for 81X as well. You should be able to use the same schneiml:sipixelmonitortrack-fix-tsos-crash topic branch

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12774/console

@dmitrijus
Copy link
Contributor

This change isn't final - that crash still happens, elsewhere, but for the same reasons.

@schneiml
Copy link
Contributor Author

schneiml commented May 3, 2016

This was the other one -- I don't see any other TrajectoryStateCombiners in DQM now.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2016

Pull request #14322 was updated. @cmsbuild, @dmitrijus, @vanbesien, @deguio, @davidlange6 can you please check and sign again.

@dmitrijus
Copy link
Contributor

I am running the failing job with the last commit on top - I will write back once it finishes (6 hours :( )

@schneiml
Copy link
Contributor Author

schneiml commented May 3, 2016

Have fun with my other PR's in the mean time: #14337 (8_1_X version) and #14230 (as usual).

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2016

@dmitrijus
Copy link
Contributor

Still segfaults, this time somewhere in pluginDQMTrackerMonitorTrack.so
I will try to skip the boring events - so it does not take 8 hours to verify.....

@schneiml
Copy link
Contributor Author

schneiml commented May 3, 2016

Ok, I suspected this one might be affected as well (it also does
residuals), but this time the TSOS (if it is still the problem) lives
somewhere in the Alignment/Validation code.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2016

Pull request #14322 was updated. @ghellwig, @cerminar, @dmitrijus, @cmsbuild, @franzoni, @deguio, @mmusich, @vanbesien, @davidlange6 can you please check and sign again.

@schneiml
Copy link
Contributor Author

schneiml commented May 4, 2016

Fixed one more. There is about 10 more TrajectoryStateCombiner in the codebase and none of them checks isValid() as of now. I hope we don't have to touch all of them...

@mmusich
Copy link
Contributor

mmusich commented May 4, 2016

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12809/console

@schneiml
Copy link
Contributor Author

schneiml commented May 4, 2016

@mmusich Testing is probably still to early, but we can hope that this actually works now ( @dmitrijus will test it in a configuration were it actually crashes, some Tier0 run)

@mmusich
Copy link
Contributor

mmusich commented May 4, 2016

@schneiml is this too early because you foresee more fixes in this branch?

@schneiml
Copy link
Contributor Author

schneiml commented May 4, 2016

This is a fix for an issue (in the description) in Tier0. It is pretty hard to reproduce and we are hitting more crashes after the first one, so there could be more fixes coming (or if things go well it works now).

@mmusich
Copy link
Contributor

mmusich commented May 4, 2016

@schneiml I followed the thread on the Tier-0 HN. On the alca side I am not aware of other code in Alignment/OfflineValidation other the one you fixed that is run at Tier-0. Still it would be good to protect for this use case. @ghellwig you might be interested.

@schneiml
Copy link
Contributor Author

schneiml commented May 4, 2016

@mmusich Mhh... Tier0 HN? I have the threads on Reco Dev and DQM Dev, are there even more? Might be useful to connect them.

Also @rovere mentioned #14060 which is related.

For reference, this gives an inclomplete list of broken usages:

http://cmslxr.fnal.gov/dxr/CMSSW_8_0_5/search?q=regexp%3A%27%28+combiner%7Ctsoscomb%29%28%5C%28%7C%5C.combine%29%27+regexp%3A%27isValid%27&case=true

(check that isValid() does not check the returned object -- some do, most don't)

There are even more usages with different names or no isValid at all, regex can't really catch that...

From the (Pixel-)DQM side we should have all track related things now, but there might be other stuff running in Tier0.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2016

@dmitrijus
Copy link
Contributor

dmitrijus commented May 4, 2016

No more crashes with the latest commit!
I will +1 after few more tests (after lunch).

Style issue: we don't enforce specific code style, and I will accept this PR either way,
but mixing tabs/spaces for indentation in one file is..... ugly:
https://github.com/schneiml/cmssw/blob/sipixelmonitortrack-fix-tsos-crash/DQM/SiPixelMonitorTrack/src/SiPixelTrackResidualSource.cc#L954 (this line uses a single tab, next line - spaces)

I've seen it in a few other files.

The easiest way is to fix/use: clang-format --style=Google -i *.cc
But, unfortunately, that will mess up this pull request... (...and should happen in a separate pull request, or for the new files and packages)

@schneiml
Copy link
Contributor Author

schneiml commented May 4, 2016

@dmitrijus urgh... I am aware of this issue, but I am fairly new with vim and did not get it to do sth. sane by default yet, so I fix things manually when I notice.

For the old DQM stuff, I lost most hope, but for the Phase1 stuff I should look into that. Probably for the next PR, I don't want to delay things and switch IBs again...

@deguio
Copy link
Contributor

deguio commented May 4, 2016

@schneiml
please don't forget to submit to 81x as well.
thanks,
F.

@dmitrijus
Copy link
Contributor

@schneiml Don't worry about it :)

@mmusich
Copy link
Contributor

mmusich commented May 4, 2016

+1
will re-sign if more fixes are needed depending on @dmitrijus tests outcome

@dmitrijus
Copy link
Contributor

+1
long running tests are okay!

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2016

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@davidlange6
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit df6505e into cms-sw:CMSSW_8_0_X May 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants