Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase1 Pixel DQM Bugfixes #14760

Merged
merged 8 commits into from Jun 11, 2016
Merged

Conversation

schneiml
Copy link
Contributor

@schneiml schneiml commented Jun 3, 2016

This PR fixes some smaller bugs that were found in the first version merged in #14586 (also @fioriNTU ).

  • (UPDATE: the explanation here is wrong, it seems this was all caused by the invalid refs fixed by this. I still don't understand why this happens, but it is undefined behaviour and therefore totally fine.) The most critical one was caused by the last-minute changes to TrackerTopology ( @davidlange6 ). The new code stored a edm::ESHandle to TT, and that seemed to silently become invalid on rare occasions. The fix is to use a bare pointer, which is a bad solution but works. While I can understand that the Handle or also the pointer gets invalid, what I do NOT understand is why this happens during the DQM booking phase (maybe also elsewhere, but definitely there) and only in one out of 6 plugins, that all run the same code. Maybe there is more to it. (The plugin in question is SiPixelPhase1RecHits, the symptom is that many histograms are not booked, and changing to pointers fixes it).
  • the other commits fix a crash on configurations with all MEs turned off and some cosmetic changes
  • finally there is a a fix for the issue reported by @slava77 today in Phase 1 Pixel DQM (number 4, the story continues) #14586 . With this fix the given command does still crash, but it is no longer the Phase1 Pixel DQMs fault (I hope).
  • I have no idea how @mtosi 's commit ends up in this PR. It is totally unrelated. (Update: fixed by rebase & force push)

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 3, 2016

A new Pull Request was created by @schneiml (Marcel Schneider) for CMSSW_8_1_X.

It involves the following packages:

DQM/SiPixelPhase1Common
DQM/SiPixelPhase1Digis
DQM/SiPixelPhase1RecHits
DQM/TrackingMonitorClient

@cmsbuild, @dmitrijus, @vanbesien, @davidlange6 can you please review it and eventually sign? Thanks.
@makortel, @threus this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

cms-bot commands are list here #13028

@slava77
Copy link
Contributor

slava77 commented Jun 3, 2016

finally there is a a fix for the issue reported by @slava77 today in #14586 . 
With this fix the given command does still crash, 
but it is no longer the Phase1 Pixel DQMs fault (I hope).

what is the reason for a crash now?
The data processing config worked before the #14586 was introduced.

@schneiml
Copy link
Contributor Author

schneiml commented Jun 3, 2016

Traceback (most recent call last):
  File "RunPromptRecoCfg.py", line 111226, in <module>
    process.patMetUncertaintySequenceNoHF = cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1NoHFpatMetUncertaintySequenceNoHF+process.patPFMetT1SmearNoHFpatMetUncertaintySequenceNoHF)
TypeError: bad operand type for unary +: 'Sequence'

I have honestly no idea what this sequence does, but to me this is a sign that it is not DQM related.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 3, 2016

Pull request #14760 was updated. @cmsbuild, @dmitrijus, @vanbesien, @davidlange6 can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Jun 3, 2016

On 6/3/16 6:51 AM, Marcel Schneider wrote:

|Traceback (most recent call last): File "RunPromptRecoCfg.py", line
111226, in process.patMetUncertaintySequenceNoHF =
cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1NoHFpatMetUncertaintySequenceNoHF+process.patPFMetT1SmearNoHFpatMetUncertaintySequenceNoHF)
TypeError: bad operand type for unary +: 'Sequence' |

I have honestly no idea what this sequence does, but to me this is a
sign that it is not DQM related.

Probably not, but I guess the only way is to check in a test build with
the phase1 DQM changes reverted


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#14760 (comment), or
mute the thread
https://github.com/notifications/unsubscribe/AEdcbt8LXiCogN4Bykb8VKx8qmFJEKjWks5qIDFvgaJpZM4ItYgj.

@schneiml
Copy link
Contributor Author

schneiml commented Jun 3, 2016

Are there instructions on how to do a clean revert (with cmssw packages etc.)?

@schneiml
Copy link
Contributor Author

schneiml commented Jun 7, 2016

@slava77 I am fairly certain the new crash was introduced by this #14608 .

  • In -pre5, things work, but the sequences in question are empty.
  • The PR added something to the sequences.
  • In -pre6, some of the cms.Sequences have a _seq of None, which is translated to the empty string.
  • This leads to syntax errors when a sequence of such sequences is dumpPythoned.

There are two bugs here: one is that the patMetUncertaintySequence contains invalid sequences with a _seq of None. This indicates a functional problem that has to be fixed. the other one is that dumpPython creates incorrect output in this cornercase. This is a purely technical cornercase that is usually not needed and does not really need a fix, though it is still a bug. I'd propose representing the invalid sequence as "cms.Sequence()" or sth instead of the empty string, but I am not certain what makes most sense there due to the mixin architecture. Maybe an early crash would be even better. (The line in question is https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/FWCore/ParameterSet/python/SequenceTypes.py#L238 )

Relevant lines in output:

[mschneid][8][9.22][lxplus011][.../tst/CMSSW_8_1_0_pre6]$ ag 'process.patMetUncertaintySequence' RunPromptRecoCfg.py 
111088:process.patMetUncertaintySequenceNoHF = cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1NoHFpatMetUncertaintySequenceNoHF+process.patPFMetT1SmearNoHFpatMetUncertaintySequenceNoHF)
111124:process.patMetUncertaintySequencePuppi = cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1PuppipatMetUncertaintySequencePuppi+process.patPFMetT1SmearPuppipatMetUncertaintySequencePuppi)
111130:process.patMetUncertaintySequence = cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1patMetUncertaintySequence+process.patPFMetT1SmearpatMetUncertaintySequence)
111190:process.fullPatMetSequence = cms.Sequence(process.patMetModuleSequence+process.patMetCorrectionSequence+process.patMetUncertaintySequence+process.patShiftedModuleSequence+process.slimmedMETs)
111226:process.fullPatMetSequencePuppi = cms.Sequence(process.patMetModuleSequencePuppi+process.patMetCorrectionSequencePuppi+process.patMetUncertaintySequencePuppi+process.patShiftedModuleSequencePuppi+process.patCaloMet)
111256:process.fullPatMetSequenceNoHF = cms.Sequence(process.patMetModuleSequenceNoHF+process.patMetCorrectionSequenceNoHF+process.patMetUncertaintySequenceNoHF+process.patShiftedModuleSequenceNoHF+process.patCaloMet)
[mschneid][8][8.44][lxplus011][.../tst/CMSSW_8_1_0_pre6]$ cd ../CMSSW_8_1_0_pre5/
[mschneid][8][7.37][lxplus011][.../tst/CMSSW_8_1_0_pre5]$ ag 'process.patMetUncertaintySequence' RunPromptRecoCfg.py 
104741:process.patMetUncertaintySequenceNoHF = cms.Sequence()
104897:process.patMetUncertaintySequencePuppi = cms.Sequence()
105116:process.patMetUncertaintySequence = cms.Sequence()
105587:process.fullPatMetSequence = cms.Sequence(process.patMetModuleSequence+process.patMetUncertaintySequence+process.patShiftedModuleSequence+process.patMetCorrectionSequence)
105623:process.fullPatMetSequencePuppi = cms.Sequence(process.patMetModuleSequencePuppi+process.patMetUncertaintySequencePuppi+process.patShiftedModuleSequencePuppi+process.patMetCorrectionSequencePuppi)
105653:process.fullPatMetSequenceNoHF = cms.Sequence(process.patMetModuleSequenceNoHF+process.patMetUncertaintySequenceNoHF+process.patShiftedModuleSequenceNoHF+process.patMetCorrectionSequenceNoHF)
[mschneid][8][9.27][lxplus011][.../tst/CMSSW_8_1_0_pre5]$ 

notice the cms.Sequence() in -pre5 and the ++ syntax error in -pre6.

@makortel
Copy link
Contributor

makortel commented Jun 7, 2016

the other one is that dumpPython creates incorrect output in this cornercase. This is a purely technical cornercase that is usually not needed and does not really need a fix, though it is still a bug. I'd propose representing the invalid sequence as "cms.Sequence()" or sth instead of the empty string, but I am not certain what makes most sense there due to the mixin architecture. Maybe an early crash would be even better

Having debugged these kind of cases (too) many times, I'd much prefer to get an early crash with a clear error message with as much useful information as possible to help fixing the cause. I think mapping this case to empty sequence would add confusion and make debugging more difficult.

@slava77
Copy link
Contributor

slava77 commented Jun 7, 2016

On 6/7/16 2:16 AM, Marcel Schneider wrote:

@slava77 https://github.com/slava77 I am fairly certain the new crash
was introduced by this #14608 #14608 .

  • In |-pre5|, things work, but the sequences in question are /empty/.
  • The PR added something to the sequences.
  • In |-pre6|, some of the |cms.Sequence|s have a |_seq| of |None|,
    which is translated to the empty string.
  • This leads to syntax errors when a sequence of such sequences is
    |dumpPython|ed.

Thanks for checking.
This should help in further investigation.

There are two bugs here: one is that the |patMetUncertaintySequence|
contains invalid sequences with a |_seq| of |None|. This indicates a
functional problem that has to be fixed. the other one is that
|dumpPython| creates incorrect output in this cornercase. This is a
purely technical cornercase that is usually not needed and does not
really need a fix, though it is still a bug. I'd propose representing
the invalid sequence as |"cms.Sequence()"| or sth instead of the empty
string, but I am not certain what makes most sense there due to the
mixin architecture. Maybe an early crash would be even better. (The line
in question is
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/FWCore/ParameterSet/python/SequenceTypes.py#L238
)

Relevant lines in output:

|[mschneid][8][9.22][lxplus011][.../tst/CMSSW_8_1_0_pre6]$ ag
'process.patMetUncertaintySequence' RunPromptRecoCfg.py
111088:process.patMetUncertaintySequenceNoHF =
cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1NoHFpatMetUncertaintySequenceNoHF+process.patPFMetT1SmearNoHFpatMetUncertaintySequenceNoHF)
111124:process.patMetUncertaintySequencePuppi =
cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1PuppipatMetUncertaintySequencePuppi+process.patPFMetT1SmearPuppipatMetUncertaintySequencePuppi)
111130:process.patMetUncertaintySequence =
cms.Sequence(process.ak4PFCHSL1FastL2L3CorrectorChain+process.ak4PFCHSL1FastL2L3ResidualCorrectorChain++process.patPFMetT1patMetUncertaintySequence+process.patPFMetT1SmearpatMetUncertaintySequence)
111190:process.fullPatMetSequence =
cms.Sequence(process.patMetModuleSequence+process.patMetCorrectionSequence+process.patMetUncertaintySequence+process.patShiftedModuleSequence+process.slimmedMETs)
111226:process.fullPatMetSequencePuppi =
cms.Sequence(process.patMetModuleSequencePuppi+process.patMetCorrectionSequencePuppi+process.patMetUncertaintySequencePuppi+process.patShiftedModuleSequencePuppi+process.patCaloMet)
111256:process.fullPatMetSequenceNoHF =
cms.Sequence(process.patMetModuleSequenceNoHF+process.patMetCorrectionSequenceNoHF+process.patMetUncertaintySequenceNoHF+process.patShiftedModuleSequenceNoHF+process.patCaloMet)
[mschneid][8][8.44][lxplus011][.../tst/CMSSW_8_1_0_pre6]$ cd
../CMSSW_8_1_0_pre5/
[mschneid][8][7.37][lxplus011][.../tst/CMSSW_8_1_0_pre5]$ ag
'process.patMetUncertaintySequence' RunPromptRecoCfg.py
104741:process.patMetUncertaintySequenceNoHF = cms.Sequence()
104897:process.patMetUncertaintySequencePuppi = cms.Sequence()
105116:process.patMetUncertaintySequence = cms.Sequence()
105587:process.fullPatMetSequence =
cms.Sequence(process.patMetModuleSequence+process.patMetUncertaintySequence+process.patShiftedModuleSequence+process.patMetCorrectionSequence)
105623:process.fullPatMetSequencePuppi =
cms.Sequence(process.patMetModuleSequencePuppi+process.patMetUncertaintySequencePuppi+process.patShiftedModuleSequencePuppi+process.patMetCorrectionSequencePuppi)
105653:process.fullPatMetSequenceNoHF =
cms.Sequence(process.patMetModuleSequenceNoHF+process.patMetUncertaintySequenceNoHF+process.patShiftedModuleSequenceNoHF+process.patMetCorrectionSequenceNoHF)
[mschneid][8][9.27][lxplus011][.../tst/CMSSW_8_1_0_pre5]$ |

notice the |cms.Sequence()| in |-pre5| and the |++| syntax error in |-pre6|.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#14760 (comment), or
mute the thread
https://github.com/notifications/unsubscribe/AEdcbrJTRcIKdixG8rn6BmLJPumMrfofks5qJTb5gaJpZM4ItYgj.

@dmitrijus
Copy link
Contributor

@schneiml can you take a look at #14800 ?

@schneiml
Copy link
Contributor Author

schneiml commented Jun 7, 2016

@dmitrijus My guess is that this PR also fixes this issue, at least for
the moment, but I cannot guarantee it.

On 07.06.2016 15:46, Dmitrijus wrote:

@schneiml can you take a look at #14800 ?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#14760 (comment)

The refs got invalid when the vector grows, which caused cms-sw#14800.
@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 8, 2016

Pull request #14760 was updated. @cmsbuild, @dmitrijus, @vanbesien, @davidlange6 can you please check and sign again.

@dmitrijus
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/13437/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2016

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@schneiml
Copy link
Contributor Author

schneiml commented Jun 9, 2016

@dmitrijus I was about to ping you for this one. As far as I see this should fix all known issues with the Phase1 DQM as of now.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2016

@davidlange6
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit b132d1c into cms-sw:CMSSW_8_1_X Jun 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants