Ethdebug isoltest framework#16675
Conversation
clonker
left a comment
There was a problem hiding this comment.
Some minor comments.
Mainly it would be good to have descriptions of the test format in the header so one doesn't have to go look through the code to find out what options there are.
Beyond that quite a few things can be const qualified I think. But that's more of a nit and not an official style guide... yet. :P
| if (colonPos != std::string_view::npos && | ||
| stripped.find_first_not_of(" \t", colonPos + 1) == std::string_view::npos) |
There was a problem hiding this comment.
the formatting here is weird... :)
There was a problem hiding this comment.
Yes, it should be:
| if (colonPos != std::string_view::npos && | |
| stripped.find_first_not_of(" \t", colonPos + 1) == std::string_view::npos) | |
| if ( | |
| colonPos != std::string_view::npos && | |
| stripped.find_first_not_of(" \t", colonPos + 1) == std::string_view::npos | |
| ) |
Here and in a few other places.
9f4aacf to
bdc36c0
Compare
I applied all of the suggested changes, and went full |
clonker
left a comment
There was a problem hiding this comment.
You didn't fix the replace thing. Try defining the following test:
contract C {
function f() public pure {}
}
// ----
// .compilation:
// {
// "compiler": "<VERSION>"
// }
Simplified a bit since |
Sweet, TIL! |
6ac8d05 to
59617b9
Compare
59617b9 to
ed60a86
Compare
cameel
left a comment
There was a problem hiding this comment.
Just a few initial comments. Will add more later.
I wanted to quickly skim through this to understand the overall structure and give some comments on that but the code in some parts is not that easy to read in any way other than linearly due to the combination of auto and vague names.
| if (Json::accept(rawJSON)) | ||
| return rawJSON; |
There was a problem hiding this comment.
I don't like the O(N^2) complexity of this. To parse To parse N lines you'll run accept() N times, each time having the JSON library process everything again. It's just tests, but having fast tests is nice too.
IMO it would be much better to detect the end of JSON input based on indent and then parse it once.
There was a problem hiding this comment.
To make things simpler we could also assume that it must start already on the same line, i.e.:
// SomeContract.instructions[2].context: {
// "code": {
// "range": {"length": 83, "offset": 0},
// "source": {"id": 0}
// }
// }It will not let you format the JSON in weird ways, but that's actually good for readability anyway.
There was a problem hiding this comment.
We discussed it in the call, but I genuinely don't view this as much of a problem; replacing it with a parser would be less readable, and the only instances where this could potentially be an issue is with very large direct object comparison, which these isoltests are not really aimed at - their purpose is more to assert specific information - for everything else, there's always cmdlineTests if we're inclined to just dump the whole output. I added a comment noting this.
There was a problem hiding this comment.
It's not much of a parser really. We already have it in NatspecJSONTest::extractExpectationJSON() and it's comparable length to what you have here. Though might need a small adjustment to allow any unindented char to mark the end (back there I assumed it's always }).
The performance issue is trival to fix so I don't really see a reason to keep it.
But aside from performance, what bothers me here is that this is implemented again and with different behavior. And it's definitely not the last test we'll write that uses JSON expectations. I think we should put this in a common helper so that writing future tests requires less boilerplate.
There was a problem hiding this comment.
Overall my impression is that this test may be reinventing things that already exist. To be fair that's just from initial look so I may be wrong beyond this JSON parsing bit. But even if new utilities need to be added, I'd rather see them added to TestFileReader or as general helpers (depending on where they fit). I'll give you more specifics once I'm done with review.
But as a general point on my this bothers me, my biggest problem with the existing test cases is that they still require more boilerplate for writing new test cases than I think they should. I spent some time in the past trying to address it with my AnalysisFramework refactor and I just want to make sure we keep building on that instead of regressing.
There was a problem hiding this comment.
In that case, why would I move to a common function - it would make more sense to keep the logic in JSONExpectationTest and then refactor NatspecJSONTest to inherit from it?
There was a problem hiding this comment.
Sure, that would work too. It would achieve the goal of reusing the implementation.
I'm not sure if all tests that parse JSON out of expectations will fit JSONExpectationTest, but if you apply my path suggestions from #16675 (comment) that might make them general enough.
| auto const resolved = resolvePath(it->second, jsonPath); | ||
| if (!resolved) | ||
| { | ||
| allMatch = false; | ||
| continue; | ||
| } | ||
|
|
||
| Json value = *resolved; |
There was a problem hiding this comment.
Just wanted to say that this code is really annoying to read without backtracking and jumping around. If you want to use auto, you need to put more effort into including the missing information in your names. Like, what is a path, what is JSON, what is an expectation vs received input. it, resolved, obtained value are not good names because they require tracking too much context in your head.
2243a86 to
e31bbe9
Compare
There was a problem hiding this comment.
This is a complete review now. Overall, I did find some structural problems here that I'd like addressed, but I do like the concept. Now that I understand better what you intended with StandardJSONTest it makes much more sense.
Main problems I see:
- I really don't think placeholders in the current form make sense. They seem simple but have tons of caveats. I'd either rework it or replace it with a simple
<IGNORE>. - There are some issues with path processing (colons) and overall I think it could easily be made more flexible with a small syntax change.
- There's another performance issue due to eager calculation of outputs.
- I think JSON validation is broken for single-line values.
- Overall I'd scrap the single-/multi-line value distinction. There's a ton of branching based on that, but there's little reason not to treat all values uniformly.
| continue; | ||
|
|
||
| // Check if this is a multi-line value (line ends with ":" and no inline value after it) | ||
| auto const colonPos = stripped.rfind(':'); |
There was a problem hiding this comment.
I think this should be find(). If you use rfind() then how do you handle things like C.src: "1:2:3"?
There was a problem hiding this comment.
I see that this is less of an immediate problem than I expected because source locations are not using this notation in ethdebug. I still think this is something that needs to be fixed, even if just in a follow-up, because we'll run into it sooner or later. And I don't think the current implementation gives a clear error message about what happens if you do run into this limitation.
Unfortunately, fixing it is a bit problematic: technically, you can have an arbitrary number of colons on both sides. The path part of a qualified contract name is an arbitrary string and can contain colons (in addition to the separator, which is a colon too). I don't see a way to solve it fully other than some quoting+escaping for the qualified contract name.
We can go for a simplifying assumption that one side has no colons, which is basically what you did already, but I think that not supporting colons in the path would be a better choice.
| // Convert expectation path, e.g. "instructions[0].offset" to RFC 6901 JSON pointer, i.e. | ||
| // /instructions/0/offset to be later used in JSON lookups. |
There was a problem hiding this comment.
Why not just use the JSON pointer syntax in expectations if nlohmann-json supports it out of the box?
Admittedly, I find the . and [] syntax a bit more readable, but I'm not sure it always translates 1:1 or generalizes well and relying on some established notation might be simpler. Which is why I left that up to the implementation. What's your rationale here?
| if (_path[0] == '.') | ||
| { | ||
| // Global path: ".compilation.compiler.name" -> outputKey=".compilation", jsonPath="compiler.name" | ||
| // ".resources.compilation.sources" -> outputKey=".resources", jsonPath="compilation.sources" | ||
| auto const secondDot = _path.find('.', 1); | ||
| if (secondDot == std::string_view::npos) | ||
| return {std::string(_path), ""}; | ||
| return {std::string(_path.substr(0, secondDot)), std::string(_path.substr(secondDot + 1))}; | ||
| } | ||
| auto const [scope, jsonPath] = splitNonGlobalPath(_path); | ||
| return {std::string(scope), std::string(jsonPath)}; |
There was a problem hiding this comment.
I think we should give more flexibility here to the derived test case and make less assumptions about what paths look like in a general case. For example, to make it possible to put NatspecJSONTest on top of your base class as you suggested in #16675 (comment), we'd need to allow processing of something like this, which does does not fit the path format this function assumes:
// :C userdoc
// {
// "kind": "user",
// "methods": {},
// "version": 1
// }We could handle that easily by making splitNonGlobalPath() a bit more general. I.e. just give it every path, global or not, and have it explicitly return the qualified contract name and the name of the output. This way the derived class is responsible for the whole split and can do it its own way (or just yield to the default implementation).
For example NatspecJSONTest would have an implementation that gets :C userdoc and returns (:C, userdoc, ).
EthdebugTest would on the other hand do, e.g.,
a.sol:C.contract.name -> (a.sol:C, evm.bytecode.ethdebug, contract.name) or
.compilation:compiler.name -> ( , ethdebug.compilation, compiler.name).
The default implementation in JSONExpectationTest should support your syntax but with the whole triple spelled out explicitly:
// .ethdebug.compilation:compiler.name: solc
// a.sol:C:evm.bytecode.ethdebug:contract.name | length: 1or, with less colons:
// (ethdebug.compilation) compiler.name: solc
// (a.sol:C evm.bytecode.ethdebug) contract.name | length: 1or whatever other separators you think would work better.
The nice thing about it would be that in addition to the flexibility, this default format would require less work when creating a new test case. In most cases the verbose format would work just fine. You'd only have to spend time customizing it if you really wanted shorter paths.
There was a problem hiding this comment.
I see that you now did something similar to what I suggested above, but not exactly.
You now have splitPath(), but instead of giving you the full output names, the triple contains shortened names specific to the derived test case. Which is why you had to also implement a virtual fetchOutput() that can recognize these output names. In what I described above, fetchOutput() would be universal and sit in JSONExpectationTest. The logic for translating short names to long ones would be in splitPath(). This lets us minimize the amount of logic that has to be adjusted in the derived case.
Anyway, just wanted to make this clear, but it's also something we can adjust in the follow-up that will handle NatspecJSONTest so for the purposes of this PR what you have now is good enough.
e31bbe9 to
46f3f0a
Compare
There was a problem hiding this comment.
Assuming that the reusability/O(N^2) issues (#16675 (comment), #16675 (comment)) will be addressed in a follow-up, this is almost ready.
There are a few leftover things that are straightforward to do and I think they should be cleaned up before we merge (missing assert (#16675 (comment)), proper explanation of placeholders in the docstring (#16675 (comment)), confusing/incomplete explanation of what happens to lines not ending with a colon (#16675 (comment)), weird parsing of parentheses, generally it's all in the comments above and below).
Then there are two issues that may require some bigger adjustments to test logic:
- I don't think
rfind()will properly split lines with colons in the value (#16675 (comment)). - The multi-line/single-line status can be different between expectations and actual output and the test case is not accounting for that, which will sometimes result in a broken test on update.
Normally I'd prefer these flaws to be addressed here, but in the interest of making this test suite available quicker, I'd be fine with punting them to a follow-up. I.e. if you at least clean up the easy stuff and note the remaining stuff in the issue I expect to approve the PR the next time I look at it (which could even be later today).
| if (_multiLine) | ||
| return jsonPrint(_value, {JsonFormat::Pretty, 4}); | ||
| if (_value.is_string()) | ||
| // Preserve the on-disk style: bare-string values were written unquoted. | ||
| return _bareString ? _value.get<std::string>() : _value.dump(); |
There was a problem hiding this comment.
I see you changed to the flags after #16675 (comment), but I still think the idea of having a separate single- and multi-line mode is flawed.
So, again, what happens if you have a value written a single-line, bare string in the original test and when you run it it's actually multi-line JSON in the output? Won't you get a broken test as a result?
46f3f0a to
a8c31d0
Compare
cameel
left a comment
There was a problem hiding this comment.
I'll tentatively approve to signal that there are no major dealbreakers here, but the ( thing is really something that's better off addressed before we merge.
| if (_path[0] == ' ' || _path[0] == '\t') | ||
| BOOST_THROW_EXCEPTION(std::runtime_error( | ||
| "Path must not have leading whitespace: "s.append(_path) | ||
| )); |
There was a problem hiding this comment.
This is addressing #16675 (comment), right? This does not look like an assert.
Overall, you have zero asserts in this PR, which makes me think that you are very likely missing some :)
There was a problem hiding this comment.
I believe you requested most of the asserts be converted into exceptions for better error handling.
| auto const& content = _value.get_ref<std::string const&>(); | ||
| Json probe; | ||
| if (jsonParseStrict(content, probe)) | ||
| return _value.dump(); |
There was a problem hiding this comment.
I'll leave fixing this for the follow-up, but for now just wanted to point out how bad of an idea this is. This basically lets you use bare strings as a shortcut to avoid quoting. Which will work but only most of the time, but then you write e.g. true instead of "true" or null instead of "null" or 'abc' instead of "abc" and spend an hour pulling your hair and wondering why your expectation is not matching what the compiler spews out even though it looks identical.
IMO the right solution here is to only accept <IGNORE>, <SCOPE NOT FOUND> and <PATH NOT FOUND> as special cases and treat any other invalid JSON as actually invalid.
There was a problem hiding this comment.
Yeah, that's a fair point. I'll do that in a follow up.
|
|
||
| JSONExpectationTest::PathParts JSONExpectationTest::splitPath(std::string_view _path) const | ||
| { | ||
| auto const open = _path.find('('); |
There was a problem hiding this comment.
Variable names in this PR are just bad, but I'm determined to ignore that so let me instead annoy you with another comment on how they suck.
a8c31d0 to
afa9744
Compare
Closes #16502.
Supports the following:
Allows matching of specific portions of the output JSON, length checks on arrays, type checks. I could remove some of the current CLI tests, but would prefer to keep them in for the time being - most of the CLI tests requests outputs other than ethdebug, and the new EthdebugTest isoltest only supports reading of ethdebug data directly. In addition, there's no checks for errors like in syntax tests, which automatically disqualified migration of all of the 'invalid' CLI tests.