Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve command-line argument support for Python configs in cmsRun and edm tools #42650

Merged
merged 32 commits into from
Oct 11, 2023

Conversation

kpedro88
Copy link
Contributor

@kpedro88 kpedro88 commented Aug 23, 2023

PR description:

  1. Modify cmsRun to use a stricter syntax (already required for the use of VarParsing), eliminating the -p,--parameter-set flag and treating the Python config file name as a required positional parameter only. This enables treating everything after that parameter as arguments to the Python config file. (In practice, this required writing a simple custom parser for boost::program_options, which does not natively have the equivalent of argparse's REMAINDER type.)
    • Also modify cmsRun not to treat sys.exit(0) in Python as an exception, but rather just to stop execution and exit with code 0. This allows using cmsRun config.py --help in an intuitive way.
  2. Modify edmConfigDump to handle command-line arguments (either VarParsing or argparse, or anything else, in principle). Here, argparse's REMAINDER type is used.
    • edmConfigSplit and edmConfigHash modified for the same purpose. No other edm tools were identified for which such a modification makes sense.
    • edmParameterSetDump seems like a prototype of edmConfigDump that had not substantially changed in over a decade (the somewhat fragmented history: part 1, part 2, part 3, part 4), so it is removed as redundant.

Summary of interface changes:

  • cmsRun: -p,--parameter-set flags removed. config_file.py now a positional parameter that must come after any other arguments to cmsRun and before any arguments to the Python config file itself.
  • cmsRun: -c,--command flag added to allow passing a config as a string containing Python code, rather than a file (following the python -c interface). (@Dr15Jones realized while reviewing this PR that such functionality had been available from the pybind11 interface setup before it was modified here.)
  • When invoking cmsRun config_file.py ..., inside the Python config file, the contents of sys.argv are ["config_file.py",...] (consistent with the result of python3 config_file.py ...). Previously, when using cmsRun, the contents of sys.argv would be ["cmsRun","config_file.py",...].
  • PyBind11ProcessDesc: constructors modified with an additional bool isFile parameter, to allow for any config file names (rather than just ending in .py), as requested by Core.
  • readConfig() functions now use vector<string> input rather than int argc, char* argv[].

Unit tests in packages with dependencies on headers modified here are updated accordingly. Other unit tests will be updated in separate PRs.

PR validation:

Unit tests in the modified packages pass, including new tests of the relevant behavior from the modified executables.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Having this functionality available in all active release cycles would be nice, and I'm happy to prepare full or partial backports. However, the interface changes and the impacts on tests from various packages make this a difficult proposition to support centrally. It's possible that minimal branches can be provided for users who want this functionality in their own working areas.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42650/36703

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for master.

It involves the following packages:

  • FWCore/Framework (core)
  • FWCore/ParameterSet (core)
  • FWCore/ParameterSetReader (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @antoniovilela, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor

@cmsbuild, please test

@makortel
Copy link
Contributor

@cmsbuild, please abort

@makortel
Copy link
Contributor

test parameters:

  • full_cmssw = true

@makortel
Copy link
Contributor

@cmsbuild, please test

@kpedro88
Copy link
Contributor Author

@makortel does full_cmssw test every package? the build should work, but some unit tests that use --parameter-set will fail. My suggestion in #34469 (comment) was to make a separate PR that fixes all the tests outside of the packages modified here.

@makortel
Copy link
Contributor

does full_cmssw test every package?

IIRC yes. We wanted to see the scope of the failures that occur presently.

Copy link
Contributor

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kevin for the PR. Chris and I took a first look jointly. I think this development is in good direction, but some of the details need to be ironed out.

In addition of the more detailed comments below, we want

  • Test(s) that show that cmsRun config.py nonexistent=foo and cmsRun config.py --nonexistent give meaningful error messages (and non-zero exit code) when the config.py uses VarParsing or argparse, respectively.
  • Test(s) where the configuration file checks the sys.argv directly (without VarParsing or argparse). This kind of test could then clearly demonstrate the cmsRun optional arguments do not get passed to python.

if (!(pythonOptValues.size() == 1 and pythonOptValues[0] == kPythonOptDefault))
pythonArgs =
edm::vector_transform(pythonOptValues, [](const auto& arg) { return const_cast<char*>(arg.c_str()); });
pythonArgs.insert(pythonArgs.begin(), const_cast<char*>(fileName.c_str()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me it would be possible to avoid the const_casts by cleaning up python2 code from PyBind11ProcessDesc and updating the interface in between. But that is probably beyond the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to clean up our pybind interface if Core is in favor of modernizing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like just passing in an option to the constructor of PyBind11ProcessDesc saying what the string represents (a file name or an actual python commands) would work.

If you'd like to improve our use of pybind that would be appreciated :).

<test name="testFWCoreFrameworkVarparsing" command="cmsRun -n 1 ${LOCALTOP}/src/FWCore/Framework/test/test_varparsing.py threads=2"/>
<test name="testFWCoreFrameworkVarparsingHelp" command="cmsRun -n 1 ${LOCALTOP}/src/FWCore/Framework/test/test_varparsing.py --help"/>
<test name="testFWCoreFrameworkArgparse" command="cmsRun -n 1 ${LOCALTOP}/src/FWCore/Framework/test/test_argparse.py -n 2"/>
<test name="testFWCoreFrameworkArgparseHelp" command="cmsRun -n 1 ${LOCALTOP}/src/FWCore/Framework/test/test_argparse.py --help"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should ensure the cmsRun behaves as expected instead of just "doesn't crash".

<test name="testFWCoreParameterSetVarparsingDump" command="edmConfigDump -o dump_varparsing.py ${LOCALTOP}/src/FWCore/ParameterSet/test/test_varparsing.py output=foo intprod=2"/>
<test name="testFWCoreParameterSetVarparsingSplit" command="edmConfigSplit ${LOCALTOP}/src/FWCore/ParameterSet/test/test_varparsing.py output=foo intprod=2"/>
<test name="testFWCoreParameterSetArgparseDump" command="edmConfigDump -o dump_argparse.py ${LOCALTOP}/src/FWCore/ParameterSet/test/test_argparse.py -o foo -i 2"/>
<test name="testFWCoreParameterSetArgparseSplit" command="edmConfigSplit ${LOCALTOP}/src/FWCore/ParameterSet/test/test_argparse.py -o foo -i 2"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should ensure the behavior is as expected rather than just "doesn't crash".

std::vector<boost::program_options::option> final_opts_parser(std::vector<std::string>& args) {
std::vector<boost::program_options::option> result;
if (!args.empty() and args[0].size() >= kParameterSetSuffixLen and
args[0].compare(args[0].size() - kParameterSetSuffixLen, kParameterSetSuffixLen, kParameterSetSuffix) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require that there is no restriction at all on the configuration file name. However, all cmsRun optional parameters start with -. Therefore, if the args[0] doesn't begin with -, it has to be the file name. The only way to use a configuration file name starting with - would be to precede it with --. So, if args[0] == "--" and args.size() >= 2, then args[1] must be the file name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this logic as requested, but when I added tests for it, they failed. The CMSSW pybind interface apparently does not handle config file names as specified here:

// if it ends with py, it's a file
if (config.substr(config.size() - 3) == ".py") {
readFile(config);
} else {
readString(config);
}

@@ -102,6 +105,30 @@ namespace {
edm::EventProcessor* ep_;
};

std::vector<boost::program_options::option> final_opts_parser(std::vector<std::string>& args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how well this part of boost is documented, could you add some notes how the extra_style_parser gets called during parsing? It would greatly help future maintenance. I'm guessing the function gets called for the remaining args as long as it is non-empty, or the result is not-default-constructed, but I could easily be wrong or miss some detail.

process.source = cms.Source("EmptySource")

process.maxEvents.input = args.maxEvents
process.options.numberOfThreads = args.numThreads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make the ensuring the cmsRun behavior easier if the options of the configuration file would have different behavior than what cmsRun does (like with -n). You could e.g. print out the value and check the printout, or have the configuration file to throw an exception if the value is unexpected.

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eeb285/34548/summary.html
COMMIT: 471b1a5
CMSSW: CMSSW_13_3_X_2023-08-30-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42650/34548/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 96 errors in the following unit tests:

---> test TestIntegrationDropOnInput had ERRORS
---> test TestIntegrationEventHistory had ERRORS
---> test TestIntegrationEventSetup had ERRORS
and more ...

Comparison Summary

Summary:

  • You potentially removed 156 lines from the logs
  • Reco comparison results: 341 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3153095
  • DQMHistoTests: Total failures: 119
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3152954
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor

makortel commented Oct 5, 2023

+core

@jfernan2
Copy link
Contributor

jfernan2 commented Oct 5, 2023

+1

@tvami
Copy link
Contributor

tvami commented Oct 6, 2023

+analysis

@kpedro88
Copy link
Contributor Author

kpedro88 commented Oct 9, 2023

@makortel can you also sign for heterogeneous?

@makortel
Copy link
Contributor

+heterogeneous

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Copy link
Contributor

@cms-sw/orp-l2 Just to remind that this PR needs to be merged together with #42823

@antoniovilela
Copy link
Contributor

urgent

@rappoccio
Copy link
Contributor

+1

Requires co-dependency from #42823 to resolve errors.

@rappoccio
Copy link
Contributor

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants