
RPC connection failure handling #74

Closed
dexX7 opened this issue Apr 24, 2015 · 19 comments

@dexX7
Member

dexX7 commented Apr 24, 2015

In BitcoinClient.java#L61-L67, when waiting for the server and a SocketException is caught, the stack trace should not be dumped if certain expected exception messages are identified. Suppressing errors while waiting is pretty straightforward.

However, those messages are localized, even without getLocalizedMessage(), and this causes a lot of output.

  1. Could we replace those checks and suppress all ConnectException? Or is there an alternative to using the exception messages?

  2. From a user's point of view, there are probably three relevant outcomes in this context, which should be handled differently:

  • server is not running at all, or the connection information is wrong -> "please start the server with the following options ..."
  • failed authorization -> "please use this or that rpc username and password"
  • the client really waits -> error suppression

A failed authorization can be identified as JsonRPCException with httpCode = 401, but the other two are probably more tricky: a refused connection expresses itself as ConnectException, but I was unable to see "waiting" in action.
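The three outcomes can be sketched as a small classifier. This is a minimal, standalone sketch: the 401 check and the ConnectException mapping are from the discussion above, but the class, enum, and method names are illustrative, not the actual OmniJ API.

```java
import java.net.ConnectException;

class RpcFailureKind {
    enum Kind { AUTH_FAILED, UNREACHABLE, STILL_WAITING }

    // httpCode would come from the JSON-RPC layer (e.g. a JsonRPCException);
    // cause is whatever Throwable interrupted the connection attempt
    static Kind classify(Throwable cause, int httpCode) {
        if (httpCode == 401) {
            return Kind.AUTH_FAILED;    // -> "please use this or that rpc username and password"
        }
        if (cause instanceof ConnectException) {
            return Kind.UNREACHABLE;    // -> "please start the server with the following options ..."
        }
        return Kind.STILL_WAITING;      // -> suppress and keep polling
    }

    public static void main(String[] args) {
        System.out.println(classify(null, 401));                 // AUTH_FAILED
        System.out.println(classify(new ConnectException(), 0)); // UNREACHABLE
    }
}
```

As noted above, the hard part is that UNREACHABLE and STILL_WAITING look identical from the client side until the server actually opens its RPC port.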

@msgilligan
Member

I developed this code largely by trial-and-error and I remember two lessons learned:

  1. Multiple exceptions could be thrown during the waiting process on a single platform
  2. The exceptions thrown are different on different platforms.

I am not confident that this code as written is definitive or correct. However, it seems to be working for our current use cases with the exception of your complaint about localized messages.

  1. Could we replace those checks and suppress all ConnectException? Or is there an alternative to using the exception messages?

This code is an artifact of my trial-and-error development. As I saw each new message and decided it was "normal" while waiting for a connection, I added it to the list of messages to be suppressed.

An easy way to resolve the output logging issue would be using debug logging and debug level settings. However, I would also like to have a better understanding of which error messages are expected while waiting for a server to start and which ones should be rethrown (if any). One alternative (only alternative?) to using the message would be to look at the subclasses of SocketException, such as ConnectException.
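The type-based alternative sidesteps the localized-message problem entirely, because ConnectException is a subclass of SocketException and its type does not depend on the locale. A runnable sketch (no OmniJ types involved):

```java
import java.net.ConnectException;
import java.net.SocketException;

class CatchByType {
    static String handle(SocketException e) {
        if (e instanceof ConnectException) {
            // expected while the server port is not yet open: suppress
            return "suppressed";
        }
        // anything else is unexpected: rethrow or dump the stack trace
        return "rethrow";
    }

    public static void main(String[] args) {
        // the message text is locale-dependent, the exception type is not
        System.out.println(handle(new ConnectException("Connexion refusée"))); // suppressed
        System.out.println(handle(new SocketException("Socket closed")));      // rethrow
    }
}
```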

  1. From a user's point of view, there are probably three relevant outcomes in this context, which should be handled differently

This is what I'm trying to do. Although the waitForServer case is a special case where some exceptions that would normally be considered hard errors should be ignored.

server is not running at all, or connection information are wrong

Unfortunately these two cases are not mutually exclusive. Since the server spends time scanning the blockchain before opening the RPC port, there is no way to know you have a correctly specified server until it finishes. I know that Bitcoin Core 0.10.x reduces this wait period, but I don't think it is eliminated entirely. I am also unsure how the Omni Core -startclean option affects RPC port opening.

I'm looking forward to seeing how 0.10.x works and seeing if we can reduce the timeout used with waitForServer.

@msgilligan
Member

@dexX7 I just made some improvements to this code:

You can see the improved test console output in Job 25.

@dexX7
Member Author

dexX7 commented Jun 28, 2015

Ah, that's awesome!

While the server is in warmup mode (i.e. still parsing, ...), the error code is -28:

if (    e.getMessage().contains("Verifying blocks") ||
        e.getMessage().contains("Parsing Omni Layer transactions") ||
        e.getMessage().contains("Still scanning.. at block") ||
        e.getMessage().contains("Loading addresses...")) {
    // Swallow the exception, but record its message as the current status
    status = e.getMessage();
}

Could we make the "waiting for sync/block" smarter, such that it works both ways?

@msgilligan
Member

If there's a consistent error code we should probably use that instead of the strings.
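Bitcoin Core does return a consistent JSON-RPC error code, -28 (RPC_IN_WARMUP), for all of the warmup-phase messages quoted above, so a single numeric comparison can replace the four string checks. A minimal sketch; the method name is illustrative, and the error code is assumed to be extracted from the parsed JSON-RPC error object:

```java
class WarmupCheck {
    // Bitcoin Core's RPC_IN_WARMUP error code
    static final int RPC_IN_WARMUP = -28;

    // jsonRpcCode: the "code" field of the JSON-RPC error object
    static boolean isWarmingUp(int jsonRpcCode) {
        return jsonRpcCode == RPC_IN_WARMUP;
    }

    public static void main(String[] args) {
        System.out.println(isWarmingUp(-28)); // true: swallow, record status message
        System.out.println(isWarmingUp(-5));  // false: a real error, rethrow
    }
}
```

Unlike the message strings, the code is not localized, so this check works regardless of the server's locale.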

Could we make the "waiting for sync/block" smarter, such that it works both ways?

I'm not sure what you mean by 'both ways'.

@dexX7
Member Author

dexX7 commented Jun 28, 2015

Ah, I was referring to this:

foundation.omni.test.consensus.chest.tmsc.CompareCoreChestSpec STANDARD_ERROR
    Jun 28, 2015 4:43:01 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNING: Waiting for server RPC ready:
    Jun 28, 2015 4:43:01 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNING: 
    RPC Ready.
    Jun 28, 2015 4:43:02 AM org.slf4j.Logger$info$0 call
    INFO: Reference current height: 362863
    Jun 28, 2015 4:43:02 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForBlock
    WARNING: Waiting for server to reach block 362863
    Jun 28, 2015 4:43:02 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForBlock
    WARNING: Server is at block 362863 returning 'true'.
    Jun 28, 2015 4:43:02 AM org.slf4j.Logger$info$0 call
    INFO: Current reference block height: 362863
    Jun 28, 2015 4:43:02 AM sun.reflect.NativeMethodAccessorImpl invoke0
    INFO: Bitcoin version: 100200
    Jun 28, 2015 4:43:02 AM sun.reflect.NativeMethodAccessorImpl invoke0
    INFO: Mastercore version: 0.0.10.0-dev

foundation.omni.test.consensus.chest.tmsc.CompareCoreChestSpec STANDARD_OUT
    Setup comparison height1 = 362863, height2 = 362862

foundation.omni.test.consensus.chest.tmsc.CompareCoreChestSpec > blockHeight 362863 == 362862 (CurrencyID:2) FAILED
    org.spockframework.runtime.SpockComparisonFailure

@dexX7
Member Author

dexX7 commented Jun 28, 2015

It would also be great to use a lower log level for the waiting messages, as they clutter the log:

foundation.omni.test.rpc.sto.SendToOwnersTestPlanRawSpec

foundation.omni.test.rpc.sto.SendToOwnersTestPlanRawSpec STANDARD_ERROR
    Jun 29, 2015 12:28:34 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNUNG: Waiting for server RPC ready...
    Jun 29, 2015 12:28:34 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNUNG: RPC Ready.
    Sender available balance = 0, reserved balance = 0 for the STO property
    ...

foundation.omni.test.rpc.sto.SendToManyOwnersSpec

foundation.omni.test.rpc.sto.SendToManyOwnersSpec STANDARD_ERROR
    Jun 29, 2015 12:28:51 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNUNG: Waiting for server RPC ready...
    Jun 29, 2015 12:28:51 AM com.msgilligan.bitcoin.rpc.BitcoinClient waitForServer
    WARNUNG: RPC Ready.
    PropertyType:1: start with n * 1 and send n * 1 to 70 owners
    ...

@msgilligan
Member

@dexX7 I'm still not sure what you mean by:

Could we make the "waiting for sync/block" smarter, such that it works both ways?

@msgilligan
Member

I added code to record the JSON-RPC error code and am now using -28 to check for "warming up" state: 937550d

I also changed the log levels to be less verbose by default: dc023ab

@dexX7
Member Author

dexX7 commented Jun 29, 2015

@msgilligan: when running consensus tests, the local server "waits for sync" until the current block height is reached, but I often see test failures because the remote server, mostly Chest, is still behind:

foundation.omni.test.consensus.chest.tmsc.CompareCoreChestSpec STANDARD_OUT
    Setup comparison height1 = 362863, height2 = 362862

This also introduces the chance that not only the heights differ, but also the data being compared.

What I meant with "both ways": could we wait in a similar manner, until the remote server catches up, too?

@dexX7
Member Author

dexX7 commented Jun 29, 2015

I also changed the log levels to be less verbose by default: dc023ab

Thanks! There are still two lines, which are shown, when running ./gradlew :omnij-rpc:regTest. I left comments inline. :)

@msgilligan
Member

What I meant with "both ways": could we wait in a similar manner, until the remote server catches up, too?

@dexX7 , see Issue #87

@msgilligan
Member

Thanks! There are still two lines, which are shown, when running ./gradlew :omnij-rpc:regTest. I left comments inline. :)

I responded to those comments, but should probably copy those responses here:

I need to think about this a little. I think I prefer more logging than you do. The solution is to (hopefully) agree on the correct levels for the messages and then have each of us choose a different setting of the level when running tests. (And I may choose a different level for running locally and for Jenkins.)

So, since the "RPC Status:" message is the most verbose of them all and can happen hundreds of times while syncing, it should be a higher level of verbosity than the "RPC Ready" message. I'm just not sure if they should be info/debug or debug/trace. I think this depends upon the default debug level, what we configure in Gradle, in our IDEs, etc.

@dexX7
Member Author

dexX7 commented Jun 29, 2015

then have each of us choose a different setting of the level when running tests

Good idea. To outline my perspective: I run omnij-rpc:regTest and omnij-rpc:consensusTest most of the time, and I'm interested in errors and test results. Successful RPC status messages are not relevant for me, because I expect it to be ready.

So, since the "RPC Status:" message is the most verbose of them all ...

This is a very reasonable approach to find a solution.

Since 0.0.10, we print the scanning/parsing progress to the console every 30 seconds during startup, but it is also used as the "warmup message" for the RPC warmup errors (-28), so this might be considered here.

... and can happen hundreds of times while syncing

Reporting it hundreds of times, in any case, seems a bit over the top.

if (seconds % MESSAGE_SECONDS == 0) {
    log.debug("RPC Status: " + status);
}

What if no time-based check were used here, but instead something like:

String statusNew;
// ...
if (!statusNew.equals(status)) {   // compare string contents, not references
    status = statusNew;
    log.debug("RPC Status: " + status);
}

... so that no message is swallowed, but the logs are also not spammed with duplicated messages?

Since it wouldn't be hundreds of messages anymore, a similar log level might then be used to report "RPC Ready" and "RPC Status", like "debug".
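The change-based suppression can be shown in a runnable sketch (class and field names are illustrative; a list stands in for the logger so the effect is visible):

```java
import java.util.ArrayList;
import java.util.List;

class StatusDedup {
    private String status = "";
    final List<String> logged = new ArrayList<>();

    // Record a line only when the status text actually changes; note
    // .equals() is required, since != on Strings compares references
    void report(String statusNew) {
        if (!statusNew.equals(status)) {
            status = statusNew;
            logged.add("RPC Status: " + status);
        }
    }

    public static void main(String[] args) {
        StatusDedup d = new StatusDedup();
        d.report("Verifying blocks...");
        d.report("Verifying blocks...");   // duplicate: suppressed
        d.report("Loading addresses...");
        System.out.println(d.logged.size()); // 2
    }
}
```

No status change is ever swallowed, but repeating identical warmup messages no longer spam the log.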

@msgilligan
Member

The use case I am most concerned with is running the consensus tests on the Jenkins server. For those tests we remove the Omni state files and it takes ~30 minutes to reach "RPC Ready". Since the jobs run automatically and remotely it is nice to have as detailed a log as possible when reviewing the results. Sometimes I leave a browser window open with the console log in order to monitor the status of the job, so seeing the scrolling status messages is helpful. Also, we generally look at formatted test results when the tests complete and only look at the logs for failures.

When I run the tests locally, I do prefer less verbosity -- but, in general, I'd rather have unwanted verbosity locally than miss critical information when a test fails on Jenkins.

I like your if (status != statusNew) solution and will try to implement that in addition to whatever other logging level changes we decide on.

I also think we should try to follow the semantic guidelines for the logging levels, so that when they are configured to run at a particular level the behavior makes sense. We can then set a default level in the Gradle file that makes sense. I believe the logging level can also be modified for a specific Java package, e.g. com.msgilligan.bitcoin.rpc
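How per-package levels are set depends on which slf4j backend is bound. Assuming logback (a common choice; this is a sketch, not the project's actual configuration), a logback.xml entry like this raises only the RPC client's package to DEBUG while keeping everything else at INFO:

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder><pattern>%d %-5level %logger - %msg%n</pattern></encoder>
  </appender>
  <!-- DEBUG only for the RPC client package -->
  <logger name="com.msgilligan.bitcoin.rpc" level="DEBUG"/>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

Each of us could then keep a different local copy of this file (or override it via a system property) without changing the levels coded into the messages themselves.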

The current levels pretty much conform to my understanding of the meanings of the various levels, though I know there are particular cases that are wrong, and I am certainly open to suggestions.

If I understand your preferences correctly, I think you prefer logging at the ERROR or WARN level. I need to learn (relearn) the details about configuring these levels via Gradle and/or IDEA.

@msgilligan
Member

I'm pretty sure slf4j uses the log4j logging levels, defined in Level:

INFO:

designates informational messages that highlight the progress of the application at coarse-grained level.

DEBUG:

designates fine-grained informational events that are most useful to debug an application.

@dexX7
Member Author

dexX7 commented Jun 29, 2015

Currently INFO is used, yes.

If I understand your preferences correctly, I think you prefer logging at the ERROR or WARN level.

Hehe sorry again, I wasn't clear: I'm interested in the test results and test progress, but I'm not interested in the RPC initialization, which I'd consider "one level below".

If we say that progress messages are INFO, then:

  • "RPC Ready": I would agree that this is a status message, but given that the focus is on testing, and assuming test progress sets the bar for INFO, it should be DEBUG.
  • "Waiting for server RPC ready...": it should be sufficient to report something if we really have to wait for the server, but if not, it's close to a TRACE message.
  • "RPC Status": assuming it only reports when there is a "new" message, it should be INFO. If we don't have to wait, it's not reached at all, and if we do have to wait, it's an "exceptional" situation one should be aware of, which qualifies it for something higher than DEBUG in my opinion.

So, since the "RPC Status:" message is the most verbose of them all ...

At this point I would disagree, and consider "RPC Status" as more important than "RPC Ready".

The "Waiting for server..." messages in Base[RegTest|MainNet]Spec are redundant.

@msgilligan
Member

I'm reluctant to use the TRACE level because it isn't standard to all logging implementations (slf4j is a facade and can work with several of the popular Java loggers underneath).

I'm hoping we can make it work with just INFO and DEBUG and the ability to customize the logging configuration to meet your needs. Let me experiment and learn about the options and I'll see what I can come up with.

At this point I would disagree, and consider "RPC Status" as more important than "RPC Ready".

I see your point.

The "Waiting for server..." messages in Base[RegTest|MainNet]Spec are redundant.

Removed. See aff3cbc.

@msgilligan
Member

@dexX7 Is this one completed in your opinion? Can we close it?

@dexX7
Member Author

dexX7 commented Sep 14, 2015

Yup, was resolved by #89.

@dexX7 dexX7 closed this as completed Sep 14, 2015