Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend ParameterHandler::parse_input() for file-type-independent parsing with inbuilt correctness check #9741

Merged
merged 1 commit into from Apr 13, 2020

Conversation

nfehn
Copy link
Contributor

@nfehn nfehn commented Mar 26, 2020

It is useful to have a parse_input() function that does not depend on the type of input file (prm, xml, json). It is also useful to provide the flexibility to check that mandatory parameters, i.e., those parameters declared with flag has_to_be_set = true in declare_entry() or add_parameter(), see PR #9718, are indeed found in the input file.

This PR depends on PR #9718.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 6, 2020

@bangerth, you mentioned concerns regarding the necessity to explicitly include the Assert in PR #9718. Maybe I have not given enough motivation for my point of view. Not only for why I believe it is really necessary, but also why it is a benefit for the deal.II library if the assert is used in an inbuilt way: I am referring to this discussion from PR #9718 and because this PR has been split off from PR #9718 during the discussion:

The way I approach this is like this:

Every addition of a function has a cost. It has to be maintained, its documentation has to be kept up to date, it expands the number of functions users have to look through to find what they are looking for, it tempts users to use this function even in cases where it doesn't quite do what they need and then prompts them to write work-arounds, etc. So adding a function is not free.

This has to be balanced with the question of whether the function (or a variation of it) is useful.

I claim that that the assert function is not actually useful enough to justify the cost.

I than looked at how tutorial step-60, https://www.dealii.org/developer/doxygen/deal.II/step_60.html, is dealing with this, i.e., making sure that parameters are initialized. I explicitly refer to an explicit example, because before we only talked vaguely about "user code" so far, which might also have been the cause for some of the controversial discussions.

  1. An additional parameter is defined that stores the initialization status

bool initialized = false;

  1. A ..._call_back()-type function is implemented that will be called at the end of the parse_input() function

parse_parameters_call_back.connect([&]() -> void { initialized = true; });

  1. The call-back function overwrites the state of the additional parameter initialized, but the user has to explicitly assert somewhere in the main function or in a setup phase (after constructors have been called)

AssertThrow(parameters.initialized, ExcNotInitialized());

Please also take into account the effort that has to be made for documenting all this in the tutorial step-60. I think the explanations in the tutorial and at other places in the deal.II docu show that a call_back() function is not self-explaining, i.e., it still needs to be communicated to the user where this function will be called apart from the usual function documentation (parameters, return type, etc.), because the user still wants to know where this function has its place in terms of a procedural design. Clearly, every new function added to the library has its costs, but I claim the costs we are currently willing to accept (maybe unknowingly) in the deal.II library (tutorials also have to be maintained, its functionality tested, etc.) are already higher than the costs of this PR and the related simplifications that can be made based on this PR within deal.II, not mentioning all the application codes, that are certainly a lot. The above example reveals another major benefit of this PR. Clearly, every user will be able to implement the above three lines of code and it will not cost much time (probably more for testing and documentation), but due to the rigidity of such a design a user will only introduce one global initialize variable, and the call_back() function actually does not make sure that the parameters have indeed been initialized, it only says that a parse_input() function has been called, which is error-prone in the current design of ParameterHandler, see PR #9718. Moreover, as every user is lazy and has limited time, it is very understandable that the user will not implement detailed assert-information as in PR #9718 with concrete hints where the problem is, but again rather one "global" assert like ExcNotInitialized. Finally, for the design proposed here with the assert-function called automatically, the users do not "have to look through to find what they are looking for". It explicitly avoids this, since the Assert is invoked by an additional parameter of the parse_input() function that the user will look up anyway in the docu when using parse_input(). In contrast, I had to look through several pages when going through tutorial step-60, in terms of which ingredients are needed for the assertion (steps 1-3 listed above), how is the call_back() function actually working, etc.

Put differently, it is difficult to judge whether a function in a library like deal.II is really necessary or useful, but it is more simple to judge whether it provides an improvement or an advancement.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 9, 2020

see also PR #9862

@nfehn
Copy link
Contributor Author

nfehn commented Apr 9, 2020

dealii << "Request for review" << std::flush

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and I like the option to have the additional checks.

Regarding your explanation of step-60: Could you sketch (in terms of code) how the new design would look like? This would make for a good "possibilities for extensions" part of the tutorial, showing an alternative way how to do these checks, and helping the library as a whole to establish best practices with parameter files.

As parameter handling is often a somewhat secondary topic for most of us, besides the actual research questions we try to answer with deal.II, I think some trial and error and even controversial discussions on this topic are fine, as long as we are polite to each other and honor the other's time in reviewing and commenting things.

@@ -0,0 +1,7 @@
New: There is a new function ParameterHandler::parse_input() that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean

Suggested change
New: There is a new function ParameterHandler::parse_input() that
New: There is a new function ParameterHandler::parse_input_any_type() that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this. I also renamed _all_entries_ into _mandatory_entries_ since "all" can be misunderstood.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 9, 2020

I hope my suggestions for improvements are not understood as criticism of step-60. Let me point out that I would have done it exactly the same way given what dealii currently offers! Only after the recent PRs it turned out that maybe some things can be done more elegantly.

I am unsure whether one should describe these changes as extensions of the tutorial since this would only affect the way how the code is written. The question is whether the class ParameterAcceptor wants to extend its functionality towards what is described here and in PR #9718. If yes, the sketch would look like this:

A correct initialization of parameters is then checked automatically, so the three steps described in the above comment could in principle be removed, as well as the related documentation. I can not give more concrete information as every user has a different understanding of what is a good default behavior, some would like to have the checks done automatically, but for example we decided to use assert_all_entries_are_set=false as default (for reasons of compatibility).

@nfehn
Copy link
Contributor Author

nfehn commented Apr 9, 2020

I have another question regarding the documentation: Should I add the same documentation text for the new parameter in all functions, or explain it for one function and refer to this one for the others?


/**
* Parse input from a specified @param parameter_file independently of the type
* of input file (prm, xml, json) being used. The user can specify whether
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one thing you could add here is to say that this new function explicitly goes by the filename extension - in the specific functions, I could run parse_input_from_json with a file called file.prm and everything worked if the content was json - such a mistake is certainly not good style, but it happens especially for things that get passed on between different people with different habits. For those folks, the comment would make sure that the selection is by the filename extension and not by the format in the content (which could technically also be done but we do not want to implement it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this. Regarding the documentation, I added documentation to those functions that also explained the parameter skip_undefined that has been there before, while I did not add documentation to those function that also do not explain skip_undefined.

@kronbichler
Copy link
Member

I have another question regarding the documentation: Should I add the same documentation text for the new parameter in all functions, or explain it for one function and refer to this one for the others?

We are not really consistent on this topic in deal.II I believe, but I would prefer if this is explained in verbose manner in one of the functions (maybe the topmost in terms of order in the file and doxygen documentation), and let the other functions refer here in the form of For a detailed explanation of the parameters @p skip_parameters and @p assert_mandatory_entries_are_found, see the ParameterHandler::parse_input() function above.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 10, 2020

Yes, I think the dcoumentation is in line with ParameterHandler, where other functions already refer to the first parse_input() function

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about assert_mandatory_entries_are_found parameter anymore. It had to be added all over the code. Why cannot the user call directly the assert function assert_that_entries_have_been_set()? In contrast, this is not possible in the case of skip_undefined since it influences the decision entry by entry and cannot done in a post-processing step.

But if others are fine with this, I won't stop this PR. If we should go with the assert_mandatory_entries_are_found parameter, I have the feeling that the number of arguments of the parse_input functions becomes too large. A solution would be to introduce a set_flags() method (as in the case of GridOut). Any opinions to that? We should make this change than in a follow-up PR before the release (since some functions did not even have skip_undefined in the last release).

* such mistakes would otherwise remain unrecognized.
*/
void
parse_input_any_type(const std::string parameter_file,
Copy link
Member

@peterrum peterrum Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this name reflect the purpose of this function better?

Suggested change
parse_input_any_type(const std::string parameter_file,
parse_input_from_file(const std::string parameter_file,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_file also holds for other parse_input_...() functions, so I don't think it is a better description for this particular function. I originally used just parse_input() but due to the i) function overloading of many parse_input-functions and ii) using default values for some function parameters, the code did no longer compile because of ambiguities (what makes things more complicated, there are dervied classes MultipleParameterLoop, ParameterAcceptor). I would agree on ...from_any_file()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed there is a parse_input()-function which takes a file name (

void
ParameterHandler::parse_input(const std::string &filename,
const std::string &last_line,
const bool skip_undefined)
{
PathSearch search("PARAMETERS");
std::string openname = search.find(filename);
std::ifstream file_stream(openname.c_str());
parse_input(file_stream, filename, last_line, skip_undefined);
}
). Is there a reason why you did not extend that function, but created a new function. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a notation consistent with the other parse_input() functions, one would probably call this function pasrse_input_from_prm()? To explain the names, I can only speculate that this has been the first function and json-, xml-versions were added later.

I intentionally created a new one because I would otherwise change the behavior of this function, do you agree? For example, I think this parse_input() function can be called without the .prm file ending. So codes calliing this function without file ending would then run into the Assert() implemented in the new function parse_input_any_type() if I had extended this existing function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! In think a minor change in behavior would be acceptable. An option would be that files ending with .json or .xml are treated as such and all other files (including the files without ending) are treated as prm files. The probability that someone is using the suffix .json/.xml for a prm-file is quite low, do yo agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. On the other hand, I actually would like to not have to make workarounds with implicit assumptions (like all other file endings are interpreted as prm; other file endings might be added in the future). We will right now understand why it is implemented like this, but this will soon get blurred and others will encounter a situation similar to the current one that needs to be improved. I would suggest that we open an issue with the question whether we are willing to make incompatible changes to ParameterHandler for a better design/interface. For now, to have all options for the future and If we don't like the ending ...any_type() because this has of course historical reasons (I fully agree in this point), I would suggest that we for now agree on another generic name like parse_parameters(), parse_file(), parse_input_file() (I think there are enough options) and deprecate/remove other parse functions in the long run depending on the outcome of the issue. For me it is not worth combining the new function with the existing parse_input() function just for reserving the name. Why? Then we still have to get rid of the other four parse_input() functions. This will otherwise lead to misunderstandings: For example, people will then ask themselves why is there a function parse_input_from_json() if parse_input() can handle everything. So I think we actually have to get rid of the multitude of parse_input_...() functions. If one can convince to deprecate/remove parse_input() functions with all the associated incompatibilities, from that perspective I think it does not make a big difference whether these are four or five functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct way forward would be to change parse_input. I had to read through all comments to understand what was going on.... parse_input_any_type does not ring any bell. I thought the function would allow to parse any parameter type, not any file type. It is clearly wrong that the parse_input function that takes a string would happily read a file.xml in the same way as file.prm.
In fact, now it would throw, and a user would have to understand that parse_input_any_type and parse_input are actually the exact same thing for prm and differ only for other file types, where one does not work, and the other has just a more complicated name.
My two cents would be to just change the behavior of the currently existing function (by extending it). It would be backward compatible, and would be much easier to understand than having parse_input_any_type that takes a string. I argue that parse_input_from_XXX still makes sense for input streams, since that’s the only possibility we have, but we should not make easy things complicated when it is unnecessary. I see no “implicit” assumptions, provided that the methods are well documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implicit assumptions and incompatibilities arise when you take the comments of @kronbichler and @peterrum together. I take this as an authorization and will change it accordingly.

Comment on lines 772 to 773
std::string file_ending =
parameter_file.substr(parameter_file.find_last_of(".") + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not completely safe. File names might be written in all caps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could convert the substring to small letters and then the comparison in the next lines could stay the same.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 11, 2020

I am not sure about assert_mandatory_entries_are_found parameter anymore. It had to be added all over the code. Why cannot the user call directly the assert function assert_that_entries_have_been_set()? In contrast, this is not possible in the case of skip_undefined since it influences the decision entry by entry and cannot done in a post-processing step.

In my opinion, the problem that you are describing is actually related to the design of ParameterHandler. One should have introduced a single parse_input() function for file-type-independent parsing very early in my opinion (or at the time when a second type of input-file was introduced), so that all other parse_input() functions are internal lmplementations of the ParameterHandler class, which are private and therefore easier to change or adapt to new requirements without raising compatibility issues. If we are willing to make some incompatible changes, we should go this way.

I don't like calling an assert...() function separately. This will imply that many users will not be aware of that functionality, but it is in the interest of every user that parameters are parsed correctly, and this is why I believe it should be a single function call or looking up a single function in the documentation. From a user perspective, it does not make a difference whether the checks are done parameter-wise or all at once in a postprocessing step, because this is a detail of the internal implementation. For this reason, I think skip_undefined and assert_mandatory_entries_are_set should be part of the same interface.

@nfehn nfehn changed the title new function ParameterHandler::parse_input() for file-type-independent parsing with inbuilt correctness check Extend ParameterHandler::parse_input() for file-type-independent parsing with inbuilt correctness check Apr 12, 2020
Copy link
Member

@luca-heltai luca-heltai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add three tests, that check that the new functionality actually does what is supposed to do?

@nfehn
Copy link
Contributor Author

nfehn commented Apr 12, 2020

I added a test. Don't you think one test should be enough? I think verification of the implementation of the different parse_input_...() functions should be done by other tests.

@luca-heltai
Copy link
Member

You have added an additional argument that is not tested, and you allow reading also XML files. Three tests check that Json, XML, and the new argument are correctly handled. Prm is already tested elsewhere. :-)

@nfehn
Copy link
Contributor Author

nfehn commented Apr 12, 2020

I would like to know what we can reasonably test here that is not already "tested" by the compiler and the two Asserts in this function. Let me mention that this PR has been split off from PR #9718 because @bangerth requested that. In #9718 I have also written a test for the assert_that_entries_have_been_set() function. Is it really necessary to write a test for the if-statement in front of the function call? If I have to test xml here, I also have to test prm here, because the implementation of this particular function (which we want to test) does not make a difference between prm and xml? I will write the tests if you want. I just have concerns whether this is really necessary and want find a good compromise because we probably do not test that tightly elsewhere.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 12, 2020

all tests added

@luca-heltai
Copy link
Member

/rebuild

@luca-heltai
Copy link
Member

clang-tidy complains about this:

clang-tidy-6.0 -header-filter=/jenkins/workspace/dealii-tidy_PR-9741/include/* -p=. -quiet /jenkins/workspace/dealii-tidy_PR-9741/source/base/mpi_noncontiguous_par/jenkins/workspace/dealii-tidy_PR-9741/source/base/parameter_handler.cc:520:71: error: 'find_last_of' called with a string literal consisting of a single character; consider using the more effective overload accepting a character [performance-faster-string-find,-warnings-as-errors]

Can you please address this?

Otherwise, this is good to go.

@luca-heltai luca-heltai merged commit 2f9a789 into dealii:master Apr 13, 2020
@luca-heltai
Copy link
Member

There is still something we need to address: currently this PR introduced a backward compatibility issue (I just realized this in some of my old codes, that now fail to run). The reason is that we never require the extension of the parameter files to be prm, and in some older versions of the library, we also used par as an extension. My proposal is to modify this PR and make sure that the default file type is prm, irrespective of the extension (to keep backward compatibility).
If we really want to enforce the extension to be prm, then we should do this in two steps: first we print a warning if the extension is not prm, and from version 9.3 we revert to the implementation proposed in this PR.

@nfehn
Copy link
Contributor Author

nfehn commented Apr 14, 2020

Sorry for the misunderstanding. These were exactly the implicit assumptions I was talking about

I see your point. On the other hand, I actually would like to not have to make workarounds with implicit assumptions (like all other file endings are interpreted as prm; other file endings might be added in the future). We will right now understand why it is implemented like this, but this will soon get blurred and others will encounter a situation similar to the current one that needs to be improved. I would suggest that we open an issue with the question whether we are willing to make incompatible changes to ParameterHandler for a better design/interface. For now, to have all options for the future and If we don't like the ending ...any_type() because this has of course historical reasons (I fully agree in this point), I would suggest that we for now agree on another generic name like parse_parameters(), parse_file(), parse_input_file() (I think there are enough options) and deprecate/remove other parse functions in the long run depending on the outcome of the issue. For me it is not worth combining the new function with the existing parse_input() function just for reserving the name. Why? Then we still have to get rid of the other four parse_input() functions. This will otherwise lead to misunderstandings: For example, people will then ask themselves why is there a function parse_input_from_json() if parse_input() can handle everything. So I think we actually have to get rid of the multitude of parse_input_...() functions. If one can convince to deprecate/remove parse_input() functions with all the associated incompatibilities, from that perspective I think it does not make a big difference whether these are four or five functions.

This is why I voted for having a fresh start with a new function name. Now (PR #9885) we have a kind of workaround with potential misunderstandings and again not a clean code, see concerns by @kronbichler

Maybe one thing you could add here is to say that this new function explicitly goes by the filename extension - in the specific functions, I could run parse_input_from_json with a file called file.prm and everything worked if the content was json - such a mistake is certainly not good style, but it happens especially for things that get passed on between different people with different habits. For those folks, the comment would make sure that the selection is by the filename extension and not by the format in the content (which could technically also be done but we do not want to implement it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants