[Discussion] Input parsing without Getkw #88

robertodr · 2017-08-17T11:22:35Z

This thread is a Request For Comments on how to rewrite input parsing.
@ilfreddy @arnfinn @bast Please comment on any of the following sections. @loriab I would also appreciate your input on this matter, if you have the time.

Goals to achieve

Make it easier for hosts to integrate with the library.
Remove the dependency on Getkw.
Bottom-up: reduce the number of steps a developer has to go through to introduce a new parameter into any of the core classes.
Top-down: reduce the distance an input parameter has to travel before it can get used in any of the core classes.
Automatize documentation of the input format.

Current situation

Input can go through two, mutually exclusive, routes:

An extra input file is parsed by a Python script and then read C++ side. This is the Getkw way of doing things. Python-side, options left blank are filled with a default. C++-side there's the Input class that contains all information read in. In the class, structs with input for specific classes (i.e. for the cavity, the Green's functions, the solvers etc.) are generated and passed onto the factory functions for object creation (look at the flow of the code in Meddle.cpp)
The host passes a struct with input parameters set, usually from its own input parser. The input so passed is only a subset of the possible input to PCMSolver. This makes the second route not usable for most (if not all of) the recent functionality introduced.

Drawbacks

There are basically two ways of providing input that need to be kept in sync and this means checking that one route doesn't override sensible defaults that would be set by the other. Needless to say, this is not very easy to do.
None of the current host programs is particularly thrilled about introducing an extra layer of input parsing. The best we can currently do is piggybacking[^1] on the host input file, as done in Psi4.
With Psi4 this is relatively easy: it's just integrating one small Python module into other Python modules. Plus, the look-and-feel of Psi4 and PCMSolver input are similar enough not to disconcert the user. Piggybacking on e.g. DALTON's input format would be harder to do.
Documentation of the input is done completely by hand. You can imagine how easy it is to miss stuff.
We depend on Getkw, which is basically abandonware. Getkw is not a lightweight dependency. Getkw depends on:
1. boost::any. Soon to be banned (together with all of Boost)
2. C++ exceptions. Banned.
3. Run-time type identification. Banned.
4. Python availability at runtime. This is mostly fine, but surprising for some hosts.
There a number of issues related, closely and less closely, to input reading. These all take a very long time to address, because it's boring stuff, but also because the current structure has become too complex:

Challenges

The input management for some host programs would have to be rewritten from scratch. I think the benefits would outweigh the annoyance though.
Getting interoperability of the C struct and the Fortran 90 user defined type right. The struct will need to have strings and array fields.

Notes

[^1] The input to PCMSolver is embedded in the host input and unpacked into a different input file in the scratch directory at run time. The PCMSolver Python script then does its own parsing.
[^2] The Meddle class is the one responsible for routing the computation to the correct internal objects. The C++ API, if you wish.

The text was updated successfully, but these errors were encountered:

ilfreddy · 2017-08-18T11:21:34Z

The input file is "touched" when the program is running. @robertodr said is because everything is made uppercase. Still that should happen in the cache and not directly to the file on disk. This is (mildly) annoying when one is performing many tests with small modifications to the same input, because the editor complains the file has been modified.
Feature request: make sure the original input file is not touched (an intermediate uppercase version could be created if need be).

robertodr · 2017-08-18T13:25:12Z

@ilfreddy I've made a separate issue #91 out of your request. Let's try to limit this thread for comments on input parsing 😉

loriab · 2017-08-18T16:56:15Z

Would you want to consider PCMSolver only taking JSON/py-dict input, with QC progs responsible for generating it from their native options parsing (perhaps this is "piggypacking"). Then perhaps PCMSolver additionally does a light str --> json or argparse --> json utility so it can be run independently.

Related discussion at https://github.com/MolSSI/QC_JSON_Schema , though admittedly Mol-centered at present

robertodr · 2017-08-19T10:49:58Z

Thanks @loriab. Admittedly, I'd prefer YAML (less curly braces around 😺), but anything that is a standard format and would relieve me from doing most of the work in defining it, is very welcome.

These are the flows of the input I am envisioning:

QC program needs to add a new input section for PCMSolver, with input parameters whose names and types are defined by the given version of the API.
QC program reads input and dumps the PCMSolver section into a prescribed format, let's say JSON. Notice that the QC program does not validate the input, but simply translates it to a standard format.
This can happen in one of two ways:
1. Disk-based PCMSolver adds nhomann/json as dependency and exposes a function pcmsolver_write_input_file() that accepts the (presumably long) list of input parameters (all plain-old data structures - PODs) read by the QC program. This function would validate the input and write it to file in the correct format. A call to pcmsolver_new() would then bear no arguments and look for the already validated input file in the current directory.
2. Software-based PCMSolver adds nhomann/json as dependency. I wrap it with a C API, in turn wrapped by a Fortran API. QC program calls pcmsolver_generate_input_json() that accepts the (presumably long) list of input parameters read by the QC program. This function would validate the input and return the JSON data structure for the input. Eventually, QC program calls pcmsolver_new(json_input) and gets a handle to a PCM computation.

QC program pseudocode for route i.

bool success = pcmsolver_write_input_file(/* PCM input parameters as PODs*/);
if (success) {
   pcmsolver_context * ctx = pcmsolver_new();
} else {
  // ABORT 
}

QC program pseudocode for route ii.

JSON * json_input = pcmsolver_generate_input_json(/* PCM input parameters as PODs */);
pcmsolver_context * ctx = pcmsolver_new(json_input);

Route ii. looks more elegant, but it will require more work in order to maintain the C and Fortran wrappers to the C++ JSON library.

robertodr added enhancement question labels Aug 17, 2017

robertodr mentioned this issue Nov 25, 2017

Overhaul Green's function factory functionality #107

Merged

7 tasks

robertodr mentioned this issue Apr 10, 2018

[WIP] Refactor input parsing #146

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Discussion] Input parsing without Getkw #88

[Discussion] Input parsing without Getkw #88

robertodr commented Aug 17, 2017

ilfreddy commented Aug 18, 2017

robertodr commented Aug 18, 2017

loriab commented Aug 18, 2017

robertodr commented Aug 19, 2017

[Discussion] Input parsing without Getkw #88

[Discussion] Input parsing without Getkw #88

Comments

robertodr commented Aug 17, 2017

Goals to achieve

Current situation

Drawbacks

Suggested solution

Challenges

Notes

ilfreddy commented Aug 18, 2017

robertodr commented Aug 18, 2017

loriab commented Aug 18, 2017

robertodr commented Aug 19, 2017