Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: env unit test RPC errors return a unique result: #4887

Merged
merged 1 commit into from Mar 19, 2024

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Jan 18, 2024

High Level Overview of Change

Changes the error used in unit tests submit returns an RPC error (including failed connections, etc.) from temINVALID to telLOCAL_ERROR. Also adds a few log messages that should be helpful to diagnose some test failure cases, and changes the description of telLOCAL_ERROR.

Context of Change

temINVALID is currently returned when submit gets an RPC failure, but it is also used in several places in the transaction engine. This can make diagnosing test failures, especially spurious test failures, confusing and annoying to debug.
telLOCAL_ERROR is not new, but it is not used anywhere in the transaction engine. This will make those types of errors distinct and easier to diagnose.
I did not create a new code, because the available numeric range for codes is large, but not unlimited, and it's wasteful to create a new one when there's one which fits the purpose already available.

Type of Change

  • [X ] Tests (you added tests for code that already exists, or your new feature included in this PR)

API Impact

None.

Before / After

No externally visible effects when things are working correctly.

Test Plan

Since this is limited to unit tests, there is no need for additional testing if those tests pass.

@ximinez ximinez added the Perf impact not expected Change is not expected to improve nor harm performance. label Jan 18, 2024
@ximinez
Copy link
Collaborator Author

ximinez commented Jan 18, 2024

@scottschurr @ckeshava The discussion in #4846 (comment) finally prompted me to clean up this branch enough to create a PR.

@ckeshava
Copy link
Collaborator

ckeshava commented Jan 18, 2024

I observed two other unit tests that are using temINVALID. I think they should use telLOCAL_ERROR instead. here is a branch with the change: https://github.com/ckeshava/rippled/tree/ed-localerr

I'd prefer to use telUNIT_TEST_ERR instead of telLOCAL_ERR, because it's more readable. Although telLOCAL_ERR hasn't been used, I'm guessing we can't alias it with any other name right?

@ckeshava
Copy link
Collaborator

ckeshava commented Jan 19, 2024

Sorry, my bad. Ignore the above modifications to the Ticket and PseudoTx test classes. Your changes are correct as is

@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.94%. Comparing base (af9cabe) to head (56f90dd).

❗ Current head 56f90dd differs from pull request most recent head 47b7535. Consider uploading reports for the commit 47b7535 to get more accurate results

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4887   +/-   ##
========================================
  Coverage    76.94%   76.94%           
========================================
  Files         1127     1127           
  Lines       131675   131690   +15     
  Branches     39609    39571   -38     
========================================
+ Hits        101320   101333   +13     
+ Misses       24446    24416   -30     
- Partials      5909     5941   +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good. Thanks for doing this.

That said, I think it could be better. I think that @ckeshava is right, it would be better to use a new TER code with a proper name.

We have not historically burned through through tel codes very quickly. Because we rely so much on our unit test framework, I think that sacrificing a tel code in order to be really clear would be very worthwhile. The top-most branch here has made that change: https://github.com/scottschurr/rippled/commits/ed-env-failures/

For one thing, I think it makes it much more obvious what's happening if someone is casually reading a unit test.

But I won't insist on it. This is a useful change either way.

@ximinez
Copy link
Collaborator Author

ximinez commented Feb 14, 2024

@scottschurr Ok, I'm convinced. I grabbed your commit.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks great. Thanks!

@intelliot intelliot added this to the 2.2.0 (Apr 2024) milestone Feb 16, 2024
@intelliot intelliot changed the title Env unit test RPC errors return a unique result: test: env unit test RPC errors return a unique result: Feb 17, 2024
@intelliot
Copy link
Collaborator

  • awaiting Passed label by ximinez

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Feb 21, 2024
@ximinez
Copy link
Collaborator Author

ximinez commented Feb 29, 2024

Suggested squash commit message:

test: Env unit test RPC errors return a unique result: (#4877)

* telENV_RPC_FAILED is a new code, reserved exclusively
  for unit tests when RPC fails unexpectedly. This will
  make those types of errors distinct and easier to test
  for when expect and/or diagnose when not.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I suspect that logging the returned JSON will help in some tricky debugging situations. Thanks for adding that.

env(noop(alice),
msig(demon, demon),
fee(3 * baseFee),
ter(telENV_RPC_FAILED));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ximinez Only the error code in Env.cpp's Env::parseResult should be returning telENV_RPC_FAILED. Why are all the other unit tests using this new error code?

For instance, if I use duplicate signers in an RPC call to a mainnet validator (non-unit test scenario), I should get temINVALID error code, not a telRNV_RPC_FAILED

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the other unit tests using this new error code?

Excellent question. The reason is that the RPC layer does some validation before ever invoking the transaction engine. While some of these checks are redundant, catching them early prevents a lot more work from being done later, and so saves time overall. In this example, I think the duplicate signer check is done before any signature verification, which is way more expensive.

Either way, this request is failing on those RPC checks, and so the result has no TER code. You can verify this yourself by changing the result to something else (e.g. tesSUCCESS), and noting that the new log message will show an RPC-level error, and not a transactor-level error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I understand.

From a code readability point of view, it doesn't help that telENV_RPC_FAILED is scattered across a few unit tests (unrelated to the unit testing skeleton code).

If performance were not an issue, I'd love to clean up these instances of RPC-level errors. alas ://

thanks for the explanation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know.... It might not be a bad idea to introduce a new "requirement" parameter, similar to ter() for RPC result code values.... Then the test could explicitly specify which kind of RPC failure it's expecting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially something not called ter() but rpcErr()? So if a particular RPC error was expected we could check for that. It might have the side effect of preventing the check for tesSUCCESS?

How hard would it be to code something like that up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Ed,
You have done a great job! your branch passes all the tests, I don't think I can do better 😅

I have a minor nit about the placement of a comment, you can find it here: https://github.com/ckeshava/rippled/pull/new/minorNit

Having said that, I wanted to discuss three things:

  1. I feel we can avoid using exception as a variable name (to avoid confusing with std::exception) here - develop...ximinez:rippled:ci/jtx_rpc#diff-19c35fddcebc11f5dd0b6b647315df1340bdddc6f9a79c3a6de4b6a98a43a5a5R49

  2. Regarding the presentation of RPC errors, you have structured it as one of two alternatives: <rpcCode, epcError> or <rpcError, epcExceptionMessage>. This does capture all the existing unit tests.
    But I came across three instances where <rpcCode, epcError, rpcExceptionMessage> are all returned by rippled. Here are some examples:

As part of future work, it would be nice if we can have consistency across all such instances. Would it be too hard to specify an rpcErrorCode in all such instances? I don't know.

  1. I realised that we don't unit-test rpc<error_code> nearly as much as the tem/tec error codes. A handful of unit tests expect rpc-related errors. I know that Bronek is working to add more unit tests. I feel such unit tests are useful to detect breaking of Public APIs. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thanks! I'll make those changes and open a PR!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to do a PoC for point-2 in my previous comment. Please check the top two commits: ximinez/rippled@ci/jtx_rpc...ckeshava:minorNit

I think it would be nice if all the error cases would return a jss::error_code, instead of only returning a jss::error or jss::error_message. What do you think?

I think this will be a non-trivial amount of work.

The Developer Growth team is working on an OpenAPI specification. I believe it would be useful if we can have a consistent set of error codes -- it would be easier to model the RPC error cases. (This is my read of the situation, I haven't discussed this with the team yet)

Copy link
Collaborator Author

@ximinez ximinez Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it would be a valuable addition, but I think adding new RPC codes is beyond the scope of my change, partly because it could change the RPC API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's true

return Json::Value();
}
}();
return postconditions(jt, ter_, didApply, jr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a useful change, thanks! it saves developers from printing out the suspicious outputs in debug sessions

}
{
// Put an invalid field in a Memo object.
JTx memoExtra = makeJtxWithMemo();
memoExtra
.jv[sfMemos.jsonName][0u][sfMemo.jsonName][sfFlags.jsonName] =
13;
env(memoExtra, ter(temINVALID));
env(memoExtra, ter(telENV_RPC_FAILED));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To emphasize my point: this test case underscores invalid fields in a memo object. It is more appropriate to return a temINVALID code, instead of the current change. That would help localize telENV_RPC_FAILED to a few places in the code base.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My explanation above covers this test case as well.

Copy link
Collaborator

@ckeshava ckeshava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few clarifying questions about the intended usage of telENV_RPC_FAILURES. Otherwise, the PR looks fine

* telENV_RPC_FAILED is a new code, reserved exclusively
  for unit tests when RPC fails. This will
  make those types of errors distinct and easier to test
  for when expected and/or diagnose when not.
* Output RPC command result when result is not expected.
@ximinez ximinez merged commit 0c32fc5 into XRPLF:develop Mar 19, 2024
14 of 17 checks passed
@ximinez ximinez deleted the env-failures branch March 19, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. Perf impact not expected Change is not expected to improve nor harm performance.
Projects
Status: Merged
Development

Successfully merging this pull request may close these issues.

None yet

5 participants