New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend ParameterHandler::parse_input() for file-type-independent parsing with inbuilt correctness check #9741
Conversation
@bangerth, you mentioned concerns regarding the necessity to explicitly include the Assert in PR #9718. Maybe I have not given enough motivation for my point of view. Not only for why I believe it is really necessary, but also why it is a benefit for the deal.II library if the assert is used in an inbuilt way: I am referring to this discussion from PR #9718 and because this PR has been split off from PR #9718 during the discussion:
I than looked at how tutorial step-60, https://www.dealii.org/developer/doxygen/deal.II/step_60.html, is dealing with this, i.e., making sure that parameters are initialized. I explicitly refer to an explicit example, because before we only talked vaguely about "user code" so far, which might also have been the cause for some of the controversial discussions.
Please also take into account the effort that has to be made for documenting all this in the tutorial step-60. I think the explanations in the tutorial and at other places in the deal.II docu show that a Put differently, it is difficult to judge whether a function in a library like deal.II is really necessary or useful, but it is more simple to judge whether it provides an improvement or an advancement. |
see also PR #9862 |
dealii << "Request for review" << std::flush |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me and I like the option to have the additional checks.
Regarding your explanation of step-60: Could you sketch (in terms of code) how the new design would look like? This would make for a good "possibilities for extensions" part of the tutorial, showing an alternative way how to do these checks, and helping the library as a whole to establish best practices with parameter files.
As parameter handling is often a somewhat secondary topic for most of us, besides the actual research questions we try to answer with deal.II, I think some trial and error and even controversial discussions on this topic are fine, as long as we are polite to each other and honor the other's time in reviewing and commenting things.
@@ -0,0 +1,7 @@ | |||
New: There is a new function ParameterHandler::parse_input() that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean
New: There is a new function ParameterHandler::parse_input() that | |
New: There is a new function ParameterHandler::parse_input_any_type() that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this. I also renamed _all_entries_
into _mandatory_entries_
since "all" can be misunderstood.
I hope my suggestions for improvements are not understood as criticism of step-60. Let me point out that I would have done it exactly the same way given what dealii currently offers! Only after the recent PRs it turned out that maybe some things can be done more elegantly. I am unsure whether one should describe these changes as extensions of the tutorial since this would only affect the way how the code is written. The question is whether the class
A correct initialization of parameters is then checked automatically, so the three steps described in the above comment could in principle be removed, as well as the related documentation. I can not give more concrete information as every user has a different understanding of what is a good default behavior, some would like to have the checks done automatically, but for example we decided to use |
I have another question regarding the documentation: Should I add the same documentation text for the new parameter in all functions, or explain it for one function and refer to this one for the others? |
|
||
/** | ||
* Parse input from a specified @param parameter_file independently of the type | ||
* of input file (prm, xml, json) being used. The user can specify whether |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one thing you could add here is to say that this new function explicitly goes by the filename extension - in the specific functions, I could run parse_input_from_json
with a file called file.prm
and everything worked if the content was json - such a mistake is certainly not good style, but it happens especially for things that get passed on between different people with different habits. For those folks, the comment would make sure that the selection is by the filename extension and not by the format in the content (which could technically also be done but we do not want to implement it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this. Regarding the documentation, I added documentation to those functions that also explained the parameter skip_undefined
that has been there before, while I did not add documentation to those function that also do not explain skip_undefined
.
We are not really consistent on this topic in deal.II I believe, but I would prefer if this is explained in verbose manner in one of the functions (maybe the topmost in terms of order in the file and doxygen documentation), and let the other functions refer here in the form of |
Yes, I think the dcoumentation is in line with ParameterHandler, where other functions already refer to the first parse_input() function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about assert_mandatory_entries_are_found
parameter anymore. It had to be added all over the code. Why cannot the user call directly the assert function assert_that_entries_have_been_set()
? In contrast, this is not possible in the case of skip_undefined
since it influences the decision entry by entry and cannot done in a post-processing step.
But if others are fine with this, I won't stop this PR. If we should go with the assert_mandatory_entries_are_found
parameter, I have the feeling that the number of arguments of the parse_input
functions becomes too large. A solution would be to introduce a set_flags()
method (as in the case of GridOut
). Any opinions to that? We should make this change than in a follow-up PR before the release (since some functions did not even have skip_undefined
in the last release).
* such mistakes would otherwise remain unrecognized. | ||
*/ | ||
void | ||
parse_input_any_type(const std::string parameter_file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this name reflect the purpose of this function better?
parse_input_any_type(const std::string parameter_file, | |
parse_input_from_file(const std::string parameter_file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_file
also holds for other parse_input_...()
functions, so I don't think it is a better description for this particular function. I originally used just parse_input()
but due to the i) function overloading of many parse_input-functions and ii) using default values for some function parameters, the code did no longer compile because of ambiguities (what makes things more complicated, there are dervied classes MultipleParameterLoop, ParameterAcceptor). I would agree on ...from_any_file()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed there is a parse_input()
-function which takes a file name (
dealii/source/base/parameter_handler.cc
Lines 509 to 519 in 4d7085e
void | |
ParameterHandler::parse_input(const std::string &filename, | |
const std::string &last_line, | |
const bool skip_undefined) | |
{ | |
PathSearch search("PARAMETERS"); | |
std::string openname = search.find(filename); | |
std::ifstream file_stream(openname.c_str()); | |
parse_input(file_stream, filename, last_line, skip_undefined); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a notation consistent with the other parse_input() functions, one would probably call this function pasrse_input_from_prm()
? To explain the names, I can only speculate that this has been the first function and json-, xml-versions were added later.
I intentionally created a new one because I would otherwise change the behavior of this function, do you agree? For example, I think this parse_input() function can be called without the .prm file ending. So codes calliing this function without file ending would then run into the Assert()
implemented in the new function parse_input_any_type()
if I had extended this existing function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! In think a minor change in behavior would be acceptable. An option would be that files ending with .json
or .xml
are treated as such and all other files (including the files without ending) are treated as prm
files. The probability that someone is using the suffix .json
/.xml
for a prm
-file is quite low, do yo agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. On the other hand, I actually would like to not have to make workarounds with implicit assumptions (like all other file endings are interpreted as prm; other file endings might be added in the future). We will right now understand why it is implemented like this, but this will soon get blurred and others will encounter a situation similar to the current one that needs to be improved. I would suggest that we open an issue with the question whether we are willing to make incompatible changes to ParameterHandler
for a better design/interface. For now, to have all options for the future and If we don't like the ending ...any_type()
because this has of course historical reasons (I fully agree in this point), I would suggest that we for now agree on another generic name like parse_parameters()
, parse_file()
, parse_input_file()
(I think there are enough options) and deprecate/remove other parse functions in the long run depending on the outcome of the issue. For me it is not worth combining the new function with the existing parse_input()
function just for reserving the name. Why? Then we still have to get rid of the other four parse_input()
functions. This will otherwise lead to misunderstandings: For example, people will then ask themselves why is there a function parse_input_from_json()
if parse_input()
can handle everything. So I think we actually have to get rid of the multitude of parse_input_...()
functions. If one can convince to deprecate/remove parse_input()
functions with all the associated incompatibilities, from that perspective I think it does not make a big difference whether these are four or five functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct way forward would be to change parse_input
. I had to read through all comments to understand what was going on.... parse_input_any_type
does not ring any bell. I thought the function would allow to parse any parameter type, not any file type. It is clearly wrong that the parse_input
function that takes a string would happily read a file.xml
in the same way as file.prm
.
In fact, now it would throw, and a user would have to understand that parse_input_any_type
and parse_input
are actually the exact same thing for prm
and differ only for other file types, where one does not work, and the other has just a more complicated name.
My two cents would be to just change the behavior of the currently existing function (by extending it). It would be backward compatible, and would be much easier to understand than having parse_input_any_type
that takes a string. I argue that parse_input_from_XXX
still makes sense for input streams, since that’s the only possibility we have, but we should not make easy things complicated when it is unnecessary. I see no “implicit” assumptions, provided that the methods are well documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implicit assumptions and incompatibilities arise when you take the comments of @kronbichler and @peterrum together. I take this as an authorization and will change it accordingly.
source/base/parameter_handler.cc
Outdated
std::string file_ending = | ||
parameter_file.substr(parameter_file.find_last_of(".") + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not completely safe. File names might be written in all caps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could convert the substring to small letters and then the comparison in the next lines could stay the same.
In my opinion, the problem that you are describing is actually related to the design of I don't like calling an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add three tests, that check that the new functionality actually does what is supposed to do?
I added a test. Don't you think one test should be enough? I think verification of the implementation of the different parse_input_...() functions should be done by other tests. |
You have added an additional argument that is not tested, and you allow reading also XML files. Three tests check that Json, XML, and the new argument are correctly handled. Prm is already tested elsewhere. :-) |
I would like to know what we can reasonably test here that is not already "tested" by the compiler and the two Asserts in this function. Let me mention that this PR has been split off from PR #9718 because @bangerth requested that. In #9718 I have also written a test for the |
all tests added |
/rebuild |
clang-tidy complains about this:
Can you please address this? Otherwise, this is good to go. |
There is still something we need to address: currently this PR introduced a backward compatibility issue (I just realized this in some of my old codes, that now fail to run). The reason is that we never require the extension of the parameter files to be |
Sorry for the misunderstanding. These were exactly the implicit assumptions I was talking about
This is why I voted for having a fresh start with a new function name. Now (PR #9885) we have a kind of workaround with potential misunderstandings and again not a clean code, see concerns by @kronbichler
|
It is useful to have a
parse_input()
function that does not depend on the type of input file (prm, xml, json). It is also useful to provide the flexibility to check that mandatory parameters, i.e., those parameters declared with flaghas_to_be_set = true
indeclare_entry()
oradd_parameter()
, see PR #9718, are indeed found in the input file.This PR depends on PR #9718.