Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traffic Router logs SocketTimeoutException for port 53 TCP health checks #7081

Closed
zrhoffman opened this issue Sep 21, 2022 · 4 comments · Fixed by #7146
Closed

Traffic Router logs SocketTimeoutException for port 53 TCP health checks #7081

zrhoffman opened this issue Sep 21, 2022 · 4 comments · Fixed by #7146
Assignees
Labels
improvement The functionality exists but it could be improved in some way. logging related to logging infrastructure Traffic Router related to Traffic Router

Comments

@zrhoffman
Copy link
Member

zrhoffman commented Sep 21, 2022

This Improvement request (usability, performance, tech debt, etc.) affects these Traffic Control components:

  • Traffic Router

Current behavior

To ensure that a Traffic Router can handle DNSSEC requests, a common practice is for an operator to create a health check that attempts to connect to Traffic Router over TCP on port 53, and, once successfully connected, close the request. When this happens, the following stack trace shows up in traffic_router.log:

ERROR 2022-09-21T18:44:42.289 [pool-13-thread-1] org.apache.traffic_control.traffic_router.core.dns.protocol.TCP - Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
	at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:388) ~[?:?]
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65) ~[?:?]
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:107) ~[?:?]
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:101) ~[?:?]
	at java.io.DataInputStream.readFully(DataInputStream.java:200) ~[?:?]
	at java.io.DataInputStream.readFully(DataInputStream.java:170) ~[?:?]
	at org.apache.traffic_control.traffic_router.core.dns.protocol.TCP$TCPSocketHandler.run(TCP.java:108) [classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

New behavior

org.apache.traffic_control.traffic_router.core.dns.protocol.TCP$TCPSocketHandler.run() should catch SocketTimeoutException and give a more descriptive error (including IP address).

Also, this case is also not currently recorded in access.log. Do we want to log this case there?

Steps to reproduce

  1. Start CDN in a Box
  2. Make an HTTP request against traffic router on port 53:
[zrhoffman@computer cdn-in-a-box]$ docker-compose exec trafficops curl -v trafficrouter:53
* Rebuilt URL to: trafficrouter:53/
*   Trying fc01:9400:1000:8::f...
* TCP_NODELAY set
* Connected to trafficrouter (fc01:9400:1000:8::f) port 53 (#0)
> GET / HTTP/1.1
> Host: trafficrouter:53
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host trafficrouter left intact
curl: (52) Empty reply from server
  1. Watch the Traffic Router logs
@zrhoffman zrhoffman added Traffic Router related to Traffic Router improvement The functionality exists but it could be improved in some way. logging related to logging infrastructure labels Sep 21, 2022
@tcfdev
Copy link
Collaborator

tcfdev commented Sep 21, 2022

I was able to reproduce this using both curl and nc utilities

@tcfdev
Copy link
Collaborator

tcfdev commented Sep 21, 2022

Additionally, if the connection is closed by the client prematurely, a different exception is thrown:

ERROR 2022-09-21T16:40:25.925 [pool-14-thread-78] org.apache.traffic_control.traffic_router.core.dns.protocol.TCP - null
java.io.EOFException: null
	at java.io.DataInputStream.readFully(DataInputStream.java:202) ~[?:?]
	at java.io.DataInputStream.readFully(DataInputStream.java:170) ~[?:?]
	at org.apache.traffic_control.traffic_router.core.dns.protocol.TCP$TCPSocketHandler.run(TCP.java:108) [classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

It would be beneficial to log a more descriptive error in this case as well (including IP address if available).

@zrhoffman
Copy link
Member Author

Additionally, if the connection is closed by the client prematurely, a different exception is thrown:

Reproducible by running

docker-compose exec trafficops sh -c '</dev/null nc trafficrouter 53'

@smalenfant
Copy link
Contributor

I used to turn those messages off in the past when that was a problem. I'm glad there is some resolution to this.

log4j.logger.com.comcast.cdn.traffic_control.traffic_router.core.dns.protocol.TCP=off

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement The functionality exists but it could be improved in some way. logging related to logging infrastructure Traffic Router related to Traffic Router
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants