New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report CPU info to HTCondor monitoring #15767
Conversation
This adds CPU usage information to the HTCondor service reporting; it reports the entire CPU (user + system) usage for this process. As the "job CPU" information reported by the framework excludes the framework startup costs, the "total CPU" is slightly more than what the Timing service prints on stdout.
This commit has the framework forward the CPU model information to the HTCondor monitoring.
With this commit, getCPU() additionally returns all usage information from child processes.
A new Pull Request was created by @bbockelm (Brian Bockelman) for CMSSW_8_1_X. It involves the following packages: FWCore/Services @cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are list here #13028 |
please test |
The tests are being triggered in jenkins. |
private: | ||
int totalNumberCPUs_; | ||
double averageCoreSpeed_; | ||
bool reportCPUProperties_; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating the member data in the cpuInfo call is not thread safe. As far as I can tell, there is no need for totalNumberCPUs_
or averageCoreSpeed_
to be member data. It also looks like reportCPUProperties_
could be const
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return cpuInfoImpl(models, avgSpeed, nullptr); | ||
} | ||
|
||
bool CPU::cpuInfoImpl(std::string &result_models, double &result_avg_speed, Service<JobReport>* reportSvc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to do a lot more than just the work need to return result_models
and result_avg_speed
. I think the function should be better factored so that it doesn't know about the JobReport
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Now, one function will parse the cpuinfo file - each different statistic we want will utilize this information in a separate method.
Pull request #15767 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please check and sign again. |
…_cpu_info Conflicts: FWCore/Services/plugins/CPU.cc
Pull request #15767 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please check and sign again. |
please test |
The tests are being triggered in jenkins. |
@bbockelm how do we know if this is working? |
Hi Chris, Add the following two lines to your pset:
You should see the condor invocations echoed to stdout and the CPU information in the FJR. Brian |
+1 |
This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar |
+1 |
Hi @bbockelm If I add
to a job in CMSSW_8_1_0_pre13 (that has this PR) I get a failure
which looks like the TimingService is getting constructed before fillDescriptions has been called...this does not happen if the TimingService isn't included in the config. Found by a CRAB user, see this HN and the upstream thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2205/2/1.html Since this happens while constructing CondorStatusService, I strongly suspect this PR is responsible. |
Never mind, I had missed that this was fixed by #16311 |
This PR fixes #15763 and #15764 by adding CPU information to the HTCondor monitoring. In particular, it adds: