Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track error stats in the main socket event #141

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Expand Up @@ -116,16 +116,18 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
download.setBufferSize(BUFFER_SIZE);

StatsListener listener = null;
Exception initialError = null;
try {
listener = runPipes(upload, download, remoteServerPort, SIMULATE_RESET);
initialError = listener.error;
// We consider a termination event as a potentially recoverable failure if
// (1) It is HTTPS
// (2) Some data has been uploaded (i.e. the TLS ClientHello)
// (3) No data has been received
// (4) The socket was terminated by a TCP RST
// In this situation, the listener's uploadBuffer should be nonempty.
if (remoteServerPort == HTTPS_PORT && listener.uploadBytes > 0 && listener.downloadBytes == 0
&& listener.error != null && RESET_MESSAGE.equals(listener.error.getMessage())
&& initialError != null && RESET_MESSAGE.equals(initialError.getMessage())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we check the type of the error instead? Is there any guarantee the message is stable and the same across Android versions? Isn't the message subjected to i18n?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, it seems you get java.net.SocketException, which may not be fine grained enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know of any other way to distinguish the different types of failures.

&& listener.uploadBuffer != null) {
// To attempt recovery, we
// (1) Close the remote socket (which has already failed)
Expand Down Expand Up @@ -164,11 +166,17 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
// We had a functional socket long enough to record statistics.
// Report the BYTES event :
// - VALUE : total transfer over the lifetime of a socket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify what "transfer" is? It seems that's downloaded + uploaded bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// - UPLOAD : Total bytes uploaded on this socket
// - PORT : TCP port number (i.e. protocol type)
// - DURATION: socket lifetime in seconds
// - DURATION : socket lifetime in seconds
// - ERROR: If present, the hashcode of the IOException message that ended the connection.
// Including a hashcode instead of the entire error message improves efficiency and
// ensures that we only see the common messages. It works because all Android versions
// use the same hashcode() implementation.

Bundle event = new Bundle();
event.putInt(Param.VALUE, listener.uploadBytes + listener.downloadBytes);
event.putInt(Names.UPLOAD.name(), listener.uploadBytes);

int port = listener.port;
if (port >= 0) {
Expand All @@ -185,6 +193,10 @@ public void doConnect(Session session, CommandMessage commandMessage) throws IOE
event.putInt(Names.DURATION.name(), durationMs / 1000);
}

if (initialError != null) {
event.putInt(Names.ERROR.name(), initialError.getMessage().hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you envision us using this?
If we see 4723849237 errors with hash 21314809, what do we do?
Do we have a table of errors to hash?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my main question as well, but as long as we have a good answer, I think hashcodes make sense. Are the possible error strings we could be hashing limited to those generated by the Android OS plus anything generated by Intra and its third party dependencies? Do you think the list of error codes in bionic/libc/bionic/strerror.cpp is a reasonably comprehensive list for Android that we could use to generate an offline lookup table of hashcodes, or is there a better list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I envision us using this by keeping a small table of common errors strings and their hashes. Unfortunately, there's no single source for these strings. For example, "Connection reset" is just set as a literal in the middle of some JDK code: https://android.googlesource.com/platform/libcore/+/2c87ad3/ojluni/src/main/java/java/net/SocketInputStream.java#136

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if we get into a situation where a certain hash shows up a ton but we can't figure out what it is, we could assume that the error message doesn't contain PII and temporarily introduce some code allowing us to record a decodable representation of that specific error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding a unit test where we check the numbers for a few errors we know about. That's a way to document the mapping. Otherwise, could you list a few of the common errors and their hash code?

}

FirebaseAnalytics.getInstance(context).logEvent(Names.BYTES.name(), event);
}
}
Expand Down Expand Up @@ -289,6 +301,11 @@ public void onTransfer(Pipe pipe, byte[] buffer, int bufferLength) {

@Override
public void onError(Pipe pipe, Exception exception) {
if (this.error != null) {
// When simulateReset is true, the simulated error can be followed by an actual error.
// We want to ignore the actual error so that the simulation works.
return;
}
this.error = exception;
}
}
Expand Down
2 changes: 2 additions & 0 deletions Android/app/src/main/java/app/intra/util/Names.java
Expand Up @@ -25,10 +25,12 @@ public enum Names {
DNS_STATUS,
DURATION,
EARLY_RESET,
ERROR,
LATENCY,
PORT,
RESULT,
RETRY,
SPLIT,
TRANSACTION,
UPLOAD,
}