New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track error stats in the main socket event #141

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@bemasc
Contributor

bemasc commented Dec 4, 2018

This will allow us to determine what kinds of socket errors
are causing problems for users.

Track error stats in the main socket event
This will allow us to determine what kinds of socket errors
are causing problems for users.

@bemasc bemasc requested review from fortuna and dalyk Dec 4, 2018

@@ -164,11 +166,17 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
// We had a functional socket long enough to record statistics.
// Report the BYTES event :
// - VALUE : total transfer over the lifetime of a socket

This comment has been minimized.

@fortuna

fortuna Dec 5, 2018

Could you clarify what "transfer" is? It seems that's downloaded + uploaded bytes?

This comment has been minimized.

@bemasc

bemasc Dec 5, 2018

Contributor

Done

// We consider a termination event as a potentially recoverable failure if
// (1) It is HTTPS
// (2) Some data has been uploaded (i.e. the TLS ClientHello)
// (3) No data has been received
// (4) The socket was terminated by a TCP RST
// In this situation, the listener's uploadBuffer should be nonempty.
if (remoteServerPort == HTTPS_PORT && listener.uploadBytes > 0 && listener.downloadBytes == 0
&& listener.error != null && RESET_MESSAGE.equals(listener.error.getMessage())
&& initialError != null && RESET_MESSAGE.equals(initialError.getMessage())

This comment has been minimized.

@fortuna

fortuna Dec 5, 2018

Can't we check the type of the error instead? Is there any guarantee the message is stable and the same across Android versions? Isn't the message subjected to i18n?

This comment has been minimized.

@fortuna

fortuna Dec 5, 2018

Humm, it seems you get java.net.SocketException, which may not be fine grained enough.

This comment has been minimized.

@bemasc

bemasc Dec 5, 2018

Contributor

Yeah, I don't know of any other way to distinguish the different types of failures.

@@ -185,6 +193,10 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
event.putInt(Names.DURATION.name(), durationMs / 1000);
}
if (initialError != null) {
event.putInt(Names.ERROR.name(), initialError.getMessage().hashCode());

This comment has been minimized.

@fortuna

fortuna Dec 5, 2018

How do you envision us using this?
If we see 4723849237 errors with hash 21314809, what do we do?
Do we have a table of errors to hash?

This comment has been minimized.

@dalyk

dalyk Dec 5, 2018

This is my main question as well, but as long as we have a good answer, I think hashcodes make sense. Are the possible error strings we could be hashing limited to those generated by the Android OS plus anything generated by Intra and its third party dependencies? Do you think the list of error codes in bionic/libc/bionic/strerror.cpp is a reasonably comprehensive list for Android that we could use to generate an offline lookup table of hashcodes, or is there a better list?

This comment has been minimized.

@bemasc

bemasc Dec 5, 2018

Contributor

I envision us using this by keeping a small table of common errors strings and their hashes. Unfortunately, there's no single source for these strings. For example, "Connection reset" is just set as a literal in the middle of some JDK code: https://android.googlesource.com/platform/libcore/+/2c87ad3/ojluni/src/main/java/java/net/SocketInputStream.java#136

This comment has been minimized.

@dalyk

dalyk Dec 5, 2018

I suppose if we get into a situation where a certain hash shows up a ton but we can't figure out what it is, we could assume that the error message doesn't contain PII and temporarily introduce some code allowing us to record a decodable representation of that specific error message.

This comment has been minimized.

@fortuna

fortuna Dec 5, 2018

It may be worth adding a unit test where we check the numbers for a few errors we know about. That's a way to document the mapping. Otherwise, could you list a few of the common errors and their hash code?

@dalyk

dalyk approved these changes Dec 5, 2018

@@ -185,6 +193,10 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
event.putInt(Names.DURATION.name(), durationMs / 1000);
}
if (initialError != null) {
event.putInt(Names.ERROR.name(), initialError.getMessage().hashCode());

This comment has been minimized.

@dalyk

dalyk Dec 5, 2018

I suppose if we get into a situation where a certain hash shows up a ton but we can't figure out what it is, we could assume that the error message doesn't contain PII and temporarily introduce some code allowing us to record a decodable representation of that specific error message.

@bemasc

This comment has been minimized.

Contributor

bemasc commented Dec 7, 2018

Thanks for all the feedback here. It seems like this is not very appealing, and might not be the best way to go. I think #142 is probably a better solution. PTAL.

@bemasc bemasc closed this Dec 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment