Async DNS over TCP #25690

Merged
raboof merged 10 commits into master from asyncDnsRetryOverTcpWhenTruncated on Oct 16, 2018

Member

raboof commented Sep 27, 2018

Refs #25460

  • Basic functionality
  • Retry/reconnect logic on failures
  • Buffer/reassemble fragmented replies

The current implementation includes an integration test that needs network access and relies on receiving a truncated response when resolving the AAAA records for many.bzzt.net. We're considering building a Docker image with a DNS server that exhibits the behavior we want to test against.
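
For readers unfamiliar with truncation: a DNS server signals a truncated reply by setting the TC bit in the message header (RFC 1035, section 4.1.1), and that is the cue to retry over TCP. The following is a minimal sketch in plain Scala, without Akka; `TruncationCheck` and `isTruncated` are hypothetical names for illustration and are not taken from this PR.

```scala
object TruncationCheck {
  // The header flags occupy bytes 2 and 3 of a DNS message.
  // Within byte 2 the TC (truncated) bit is 0x02.
  def isTruncated(response: Array[Byte]): Boolean =
    response.length > 2 && (response(2) & 0x02) != 0

  def main(args: Array[String]): Unit = {
    // Hypothetical 12-byte headers: id 0x1234, QR bit set, with and without TC.
    val truncated = Array[Byte](0x12, 0x34, 0x82.toByte, 0x00, 0, 1, 0, 0, 0, 0, 0, 0)
    val complete  = Array[Byte](0x12, 0x34, 0x80.toByte, 0x00, 0, 1, 0, 0, 0, 0, 0, 0)
    println(isTruncated(truncated))
    println(isTruncated(complete))
  }
}
```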

@akka-ci akka-ci added the validating PR is currently being validated by Jenkins label Sep 27, 2018
}

"resolve queries that are too big for UDP" in {
val name = "many.bzzt.net"
Member

is this still up? Not working for me

Member Author

Should! (serving AAAA records)

@akka-ci akka-ci added needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR is currently being validated by Jenkins labels Sep 27, 2018

akka-ci commented Sep 27, 2018

Test FAILed.

@raboof raboof force-pushed the asyncDnsRetryOverTcpWhenTruncated branch from 47e7e4a to 42ed417 Compare September 27, 2018 09:31
@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) validating PR is currently being validated by Jenkins labels Sep 27, 2018

akka-ci commented Sep 27, 2018

Test PASSed.

@akka-ci akka-ci added validating PR is currently being validated by Jenkins and removed tested PR that was successfully built and tested by Jenkins labels Sep 27, 2018
@raboof changed the title from "[wip] Async DNS over TCP" to "Async DNS over TCP" Sep 27, 2018
@akka-ci akka-ci added needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR is currently being validated by Jenkins labels Sep 27, 2018

akka-ci commented Sep 27, 2018

Test FAILed.

@akka-ci akka-ci added the needs-attention Indicates a PR validation failure (set by CI infrastructure) label Sep 27, 2018

akka-ci commented Sep 27, 2018

Test FAILed.

Member Author

raboof commented Sep 27, 2018

(cancelled 2 builds to make some room on Jenkins, we're mostly interested in the newest commit after all)

@akka-ci akka-ci added tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) labels Sep 27, 2018

akka-ci commented Sep 27, 2018

Test PASSed.

Member

johanandren left a comment

Just a partial review...

minBackoff = 10.millis,
maxBackoff = 20.seconds,
randomFactor = 0.1
), "tcpDnsClientSupervisor")
Member

I'm guessing this only happens for the SRV use case, not normal DNS, so should we perhaps start it lazily instead of always?

Member Author

In theory it can also happen for large normal DNS responses, but I agree that is a corner case and it'd make sense to start this lazily

}
} else {
val (recs, additionalRecs) = if (msg.flags.responseCode == ResponseCode.SUCCESS) (msg.answerRecs, msg.additionalRecs) else (Nil, Nil)
self ! Answer(msg.id, recs, additionalRecs)
Member

Why do we need to reschedule it as a message? Couldn't we just deal with the result right away?

Member Author

in the truncated case, it's the TcpDnsClient that sends the Answer to the DnsClient

Member

Could it not get the replyTo out of the inflightRequests then send it back right away?

Member Author

@chbatey I'm not sure I follow. The reason we're sending Answer to self here is that it lets us share that logic with the code path where TcpDnsClient sends the Answer to us. We could instead put that logic in a helper function and call it both here and when we receive an Answer from TcpDnsClient. Do you think that would be clearer?

val connecting: Receive = {
  case CommandFailed(_: Connect) ⇒
    log.warning("Failed to connect to [{}]", ns)
    throw new AkkaException("Connecting failed")
Member

Isn't just throwing going to lead to it getting logged? Put the ns in the exception instead?

Member Author

Probably, but that might not be bad? Weird if connecting to the DNS server fails.

Member

Mostly thinking that if we get two errors in the log that is confusing but maybe it doesn't matter.

Member Author

Ah, good one, will remove the warning 👍


private def parseResponse(data: ByteString) = {
  val msg = Message.parse(data)
  log.debug(s"Decoded: $msg")
Member

debug("Decoded: {}", msg)

case msg: Message ⇒
  val bytes = msg.write()
  connection ! Tcp.Write(encodeLength(bytes.length))
  connection ! Tcp.Write(bytes)
Member

Do them with one write message?
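
For context on this suggestion: DNS over TCP prefixes every message with its length as a two-byte, big-endian integer (RFC 1035, section 4.2.2), so prefix and payload can be concatenated and handed over as one write. The sketch below uses plain arrays instead of Akka's `ByteString`; the body of `encodeLength` is an assumption about what such a helper could look like, and `TcpFraming` and `frame` are hypothetical names.

```scala
object TcpFraming {
  // Two-byte, big-endian length prefix. The range check is an assumption:
  // a two-byte prefix cannot describe more than 65535 bytes.
  def encodeLength(length: Int): Array[Byte] = {
    require(length >= 0 && length <= 0xFFFF, s"length out of range: $length")
    Array(((length >> 8) & 0xFF).toByte, (length & 0xFF).toByte)
  }

  // Build the complete frame up front so it can be sent with a single write.
  def frame(payload: Array[Byte]): Array[Byte] =
    encodeLength(payload.length) ++ payload
}
```

Building the frame once keeps the prefix and the payload together, so a failed write cannot leave a length prefix on the wire without its message.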

if (data.drop(prefixSize).length < expectedPayloadLength)
  context.become(ready(connection, data))
else {
  val payload = data.drop(prefixSize).take(expectedPayloadLength)
Member

Isn't there something with drop and take on ByteStrings being overly expensive?

Member Author

Since we're doing I/O anyway and we need to seek in the message while parsing references to domain names, I think the drop is worth it. The take we can skip, though.
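
The buffering being discussed here can be illustrated outside of Akka. This is a minimal sketch, assuming `Vector[Byte]` in place of `ByteString` and a two-byte length prefix; `FrameReassembly`, `feed` and `payloadLength` are hypothetical names, and unlike the excerpt above the sketch splits off every complete message in the buffer rather than just one.

```scala
import scala.annotation.tailrec

object FrameReassembly {
  val PrefixSize = 2

  // Reads the two-byte, big-endian length prefix at the start of the buffer.
  private def payloadLength(buffer: Vector[Byte]): Int =
    ((buffer(0) & 0xFF) << 8) | (buffer(1) & 0xFF)

  // Appends a newly received chunk to the bytes buffered so far and splits off
  // every complete message. Returns the complete payloads and the leftover bytes.
  def feed(buffered: Vector[Byte], chunk: Vector[Byte]): (Vector[Vector[Byte]], Vector[Byte]) = {
    @tailrec
    def loop(data: Vector[Byte], done: Vector[Vector[Byte]]): (Vector[Vector[Byte]], Vector[Byte]) =
      if (data.length < PrefixSize) (done, data) // prefix itself is incomplete
      else {
        val expected = payloadLength(data)
        val rest = data.drop(PrefixSize)
        if (rest.length < expected) (done, data) // keep buffering, prefix included
        else loop(rest.drop(expected), done :+ rest.take(expected))
      }
    loop(buffered ++ chunk, Vector.empty)
  }
}
```

A caller would keep the leftover bytes from one call and pass them as `buffered` to the next, which mirrors the `context.become(ready(connection, data))` step in the excerpt.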

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR is currently being validated by Jenkins labels Sep 28, 2018

akka-ci commented Sep 28, 2018

Test PASSed.

@akka-ci akka-ci removed the validating PR is currently being validated by Jenkins label Oct 2, 2018

akka-ci commented Oct 2, 2018

Test PASSed.

@akka-ci akka-ci added validating PR is currently being validated by Jenkins needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed tested PR that was successfully built and tested by Jenkins validating PR is currently being validated by Jenkins labels Oct 3, 2018

akka-ci commented Oct 3, 2018

Test FAILed.


akka-ci commented Oct 3, 2018

Test FAILed.

Member Author

raboof commented Oct 3, 2018

PLS BUILD

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) validating PR is currently being validated by Jenkins labels Oct 3, 2018

akka-ci commented Oct 3, 2018

Test PASSed.

Member

johanandren left a comment

LGTM

Member

chbatey left a comment

Nice, looks good


val idle: Receive = {
  case _: Message ⇒
    stash()
Member

I guess the stash will never get too big if the DNS server doesn't support TCP, as we'll restart with a backoff.
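
To give a feel for the restart delays implied by the settings quoted earlier in this review (`minBackoff = 10.millis`, `maxBackoff = 20.seconds`, `randomFactor = 0.1`), here is a sketch of a common exponential backoff with jitter. It is an illustration of the general technique and is not claimed to be the exact formula the supervisor uses; `BackoffSketch`, `delay` and the `random01` parameter are hypothetical.

```scala
import scala.concurrent.duration._

object BackoffSketch {
  // Delay before restart number `restartCount` (0-based): minBackoff doubled
  // per restart, capped at maxBackoff, then stretched by a jitter factor in
  // [1, 1 + randomFactor]. `random01` is a value in [0, 1], passed in so the
  // function stays deterministic. Note the jitter is applied after the cap.
  def delay(
      restartCount: Int,
      minBackoff: FiniteDuration,
      maxBackoff: FiniteDuration,
      randomFactor: Double,
      random01: Double): FiniteDuration = {
    val exponential = minBackoff.toMillis.toDouble * math.pow(2, restartCount)
    val capped = math.min(exponential, maxBackoff.toMillis.toDouble)
    (capped * (1.0 + random01 * randomFactor)).toLong.millis
  }
}
```

Under these assumptions the delay grows from 10 ms on the first restart to the 20 second cap after eleven doublings, so a server that never accepts TCP connections is retried only occasionally.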

Member

patriknw left a comment

LGTM

    val bytes = msg.write()
    connection ! Tcp.Write(encodeLength(bytes.length) ++ bytes)
  case CommandFailed(_: Write) ⇒
    throw new AkkaException("Write failed")
Member

does it include any reason for the failure that we want to have logged?

Member Author

Ha - I expected not, but CommandFailed actually has a cause: Option[Throwable]. I'll see if I can add a 'log' here.
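
One way the optional cause could end up in the failure message is sketched below. The `CommandFailed` case class here is a simplified stand-in written for this example: only the `cause: Option[Throwable]` field comes from the comment above, while the `command: String` field, `WriteFailure` and `describe` are hypothetical.

```scala
object WriteFailure {
  // Simplified stand-in for the failure event, not the real Akka class.
  final case class CommandFailed(command: String, cause: Option[Throwable])

  // Include the underlying reason when there is one, otherwise fall back
  // to the bare message.
  def describe(failure: CommandFailed): String =
    failure.cause match {
      case Some(t) => s"${failure.command} failed: ${t.getMessage}"
      case None    => s"${failure.command} failed"
    }
}
```

The resulting string could be logged or used as the exception message, whichever avoids reporting the same failure twice.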

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR is currently being validated by Jenkins labels Oct 15, 2018

akka-ci commented Oct 15, 2018

Test PASSed.

@raboof raboof merged commit 23b7f86 into master Oct 16, 2018
@johanandren johanandren deleted the asyncDnsRetryOverTcpWhenTruncated branch January 24, 2019 15:17
Labels
tested PR that was successfully built and tested by Jenkins

5 participants