Update edmStreamStallGrapher.py #15574

Merged
merged 3 commits into from Aug 24, 2016

Dr15Jones
Contributor

Updated edmStreamStallGrapher.py to be consistent with the recent changes to the framework. This includes:

  • changes to when the signals are emitted by the Tracer service
  • handling multiple modules running concurrently for the same Stream

The module was added to allow testing of stalling when running with multiple threads.
The update handles the framework change to when signals occur and allows multiple modules to run concurrently on each stream.
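
To illustrate the second point, tracking several modules running at once on one stream can be sketched by keeping a set of active modules per stream. This is a minimal sketch and not the code in edmStreamStallGrapher.py: the function name, the tuple layout (time, stream, module, transition), and the 'started'/'finished' strings are all assumptions made for the example.

```python
def max_concurrency_per_stream(steps, num_streams):
    """Return, for each stream, the largest number of modules seen running at once.

    `steps` is a hypothetical list of (time, stream, module, transition)
    tuples, where transition is 'started' or 'finished'.
    """
    # One set of currently running module names per stream.
    running = [set() for _ in range(num_streams)]
    peak = [0] * num_streams
    for _time, stream, module, trans in steps:
        if trans == "started":
            running[stream].add(module)
            peak[stream] = max(peak[stream], len(running[stream]))
        elif trans == "finished":
            # discard() does not raise if the module was never recorded.
            running[stream].discard(module)
    return peak


example_steps = [
    (0.0, 0, "a", "started"),
    (0.1, 0, "b", "started"),   # 'a' and 'b' overlap on stream 0
    (0.2, 1, "a", "started"),
    (0.3, 0, "a", "finished"),
    (0.4, 0, "b", "finished"),
    (0.5, 1, "a", "finished"),
]
example_peak = max_concurrency_per_stream(example_steps, 2)
```

With a single "current module" slot per stream, the second 'started' record on stream 0 would overwrite the first; a per-stream collection avoids that.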
@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_X.

It involves the following packages:

FWCore/Concurrency
FWCore/Framework

@cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@Dr15Jones
Contributor Author

please test

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

cmsbuild commented Aug 23, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14691/console

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

@cmsbuild
Contributor

waitTime = time - streamTime[s]
if trans == kFinished:
    if n != kSourceDelayedRead and n!=kSourceFindEvent and n!=kFinishInit:
        del modulesOnStream[n]
Contributor

Should I be able to run this file on a tracer output file generated in CMSSW_8_1_X_2016-08-18-1100?
I'm getting an error:

edmStreamStallGrapher.py", line 385, in <module>
    stalledModules = findStalledModules(processingSteps, numStreams)
edmStreamStallGrapher.py", line 158, in findStalledModules
    del modulesOnStream[n]
KeyError: 'siPixelDigis'

Contributor Author

It should work. Can you post the log file you used?
It is possible the MessageLogger changed the order of some printouts.

Contributor

It's the same tracer.log file that I sent you earlier by email; it is on cmsdev02.

Contributor Author

I was wrong: this version can only properly parse files created since CMSSW_8_1_X_2016-08-22-1100.

Contributor Author

I was able to parse the log file after filtering out the following message:

grep -v 'delayed processing event for module' just_tracer.log > noDelayed.log

Contributor Author

I found that your log file has one of the rare message inversions: a 'module finished' message appears before the corresponding 'module starting' message in the log output. I'll try putting in a protection for this.
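
One way such a protection could look is sketched below. It is not the change that went into the script; it only shows the general idea of tolerating a 'finished' record that arrives before its 'started' record instead of raising a KeyError. The names `pair_transitions`, `K_STARTED`, `K_FINISHED` and the (time, stream, module, transition) tuple layout are hypothetical.

```python
# Hypothetical transition codes for this sketch.
K_STARTED = 0
K_FINISHED = 1


def pair_transitions(steps):
    """Pair start/finish records per (stream, module), tolerating inverted order.

    Returns a list of (stream, module, duration) tuples.
    """
    running = {}        # (stream, module) -> time of the 'started' record
    early_finish = {}   # (stream, module) -> time of a 'finished' seen first
    durations = []
    for time, stream, module, trans in steps:
        key = (stream, module)
        if trans == K_STARTED:
            if key in early_finish:
                # Inverted pair: the finish was already logged.
                finish = early_finish.pop(key)
                durations.append((stream, module, max(0.0, finish - time)))
            else:
                running[key] = time
        elif trans == K_FINISHED:
            # pop() with a default avoids the KeyError that `del` would raise.
            start = running.pop(key, None)
            if start is None:
                early_finish[key] = time
            else:
                durations.append((stream, module, time - start))
    return durations


example_steps = [
    (1.0, 0, "a", K_STARTED),
    (2.0, 0, "a", K_FINISHED),
    (3.5, 0, "siPixelDigis", K_FINISHED),  # logged before its start
    (3.0, 0, "siPixelDigis", K_STARTED),
]
example_durations = pair_transitions(example_steps)
```

The smallest version of the same idea is to replace a bare `del d[key]` with `d.pop(key, None)`, at the cost of silently ignoring the unmatched record.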

Contributor Author

#15591 handles the message inversion cases

Contributor Author

You can see an updated image for that log at
https://www.dropbox.com/s/dyd1qgt1zh4h51z/stall.pdf?dl=0

I do notice that there are a large number of 'stalled' modules. Some of those are stream modules. My hypothesis is that so many messages arrive so quickly that some of them take an appreciable amount of time to be printed. The script interprets this delay as a module stall.
It may be necessary to move to a dedicated Service to gather the information needed.
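
To make the hypothesis concrete, here is a minimal sketch of gap-based stall detection, assuming a stall is flagged whenever the time between the previous record on a stream and the next 'started' record exceeds a threshold. The function name, the threshold value, the 'started'/'finished' strings, and the tuple layout are assumptions for illustration; the real script's logic may differ.

```python
def find_stalls(steps, num_streams, threshold=0.25):
    """Return {module: [wait times]} for starts that follow a long gap.

    `steps` is a hypothetical list of (time, stream, module, transition)
    tuples with transition 'started' or 'finished'.
    """
    stream_time = [0.0] * num_streams   # time of the last record per stream
    stalls = {}
    for time, stream, module, trans in steps:
        wait_time = time - stream_time[stream]
        if trans == "started" and wait_time > threshold:
            stalls.setdefault(module, []).append(wait_time)
        stream_time[stream] = time
    return stalls


example_steps = [
    (0.0, 0, "a", "started"),
    (1.0, 0, "a", "finished"),
    (1.5, 0, "b", "started"),      # 0.5 after the previous record: flagged
    (2.0, 0, "b", "finished"),
    (2.0625, 0, "c", "started"),   # 0.0625 gap: below the threshold
]
example_stalls = find_stalls(example_steps, 1)
```

Because the gap is computed purely from logged timestamps, any delay in producing a message shifts its timestamp and is indistinguishable from a real wait, which is consistent with the concern raised above.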

Contributor

Looks nice.
Please also send me the text file with stall times by module.

@cmsbuild
Contributor

@davidlange6
Contributor

+1

@cmsbuild cmsbuild merged commit 2b6f6c4 into cms-sw:CMSSW_8_1_X Aug 24, 2016
@Dr15Jones Dr15Jones deleted the updateEdmStreamStallGrapher branch August 24, 2016 17:05
4 participants