-
Notifications
You must be signed in to change notification settings - Fork 6
refactor: split traceroute implementation from check #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ed internal package - Added TCP client implementation for traceroute and corresponding tests. - Introduced ICMP listener for reading ICMP messages with tests. - Created mock implementations for tracer and ICMP listener to facilitate testing. - Defined data structures for traceroute results, targets, and hops with tests. Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
This commit adds handling for ICMP Destination Unreachable messages when parsing the incoming ICMP packets. It ensures that the traceroute hop is correctly marked as reached if the code is 3 (Port Unreachable), which indicates that the destination itself is reachable but the specific port is not. Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
conn, err := dialer.DialContext(ctx, "tcp", addr.String()) | ||
switch { | ||
case err == nil: | ||
return tcpConn{conn: conn, port: port}, nil | ||
case errors.Is(err, unix.EADDRINUSE): | ||
// If the address is already in use, | ||
// we just retry with a new random port. | ||
return dialTCP(ctx, addr, ttl, timeout) | ||
default: | ||
return tcpConn{conn: conn, port: port}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I could simplify this to the following:
conn, err := dialer.DialContext(ctx, "tcp", addr.String()) | |
switch { | |
case err == nil: | |
return tcpConn{conn: conn, port: port}, nil | |
case errors.Is(err, unix.EADDRINUSE): | |
// If the address is already in use, | |
// we just retry with a new random port. | |
return dialTCP(ctx, addr, ttl, timeout) | |
default: | |
return tcpConn{conn: conn, port: port}, err | |
} | |
conn, err := dialer.DialContext(ctx, "tcp", addr.String()) | |
if errors.Is(err, unix.EADDRINUSE) { | |
// If the address is already in use, | |
// we just retry with a new random port. | |
return dialTCP(ctx, addr, ttl, timeout) | |
} | |
return tcpConn{conn: conn, port: port}, err |
Do you think it's more readable that way or should we keep it like it currently is?
// User error: we don't have the necessary capabilities | ||
// to open a raw socket for reading ICMP messages. | ||
case errors.Is(err, errICMPNotAvailable): | ||
return wrapError(ctx, err, "ICMP not available for reading") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this? Should we consider it an error or not?
With the current implementation, I treated it as an usual error that gets retried, but maybe we should just send a hop with minimal information on the channel and return nil or check errors.Is(err, errICMPNotAvailable)
further up the stack?
// Address is the target address to trace to. | ||
Address string `json:"address" yaml:"address" mapstructure:"address"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config API change is done consciously. I find the addr
ugly, but if you think we should keep it for backwards compatibility, I'm okay with changing it back to addr
in the struct field's tags.
Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Motivation
The current traceroute implementation in the Sparrow monitoring check was tightly coupled to the check logic and difficult to test or reuse. By extracting the core traceroute functionality into its own
internal/traceroute
package, we achieve:traceroute.Client
abstraction without dragging in all of the Sparrow check code. In the future, this package can be promoted to a standalone CLI tool or library for other Go projects.genericClient
abstraction lays the groundwork for supporting multiple traceroute protocols (TCP/UDP/ICMP) side-by-side, even though we currently only implement TCP.This refactor ensures that Sparrow’s traceroute check remains robust and that the underlying traceroute machinery can grow independently as needs evolve.
Changes
Note: I highly recommend reviewing this PR in your editor, you need to open this file where all the logic, that this PR tries to simplify, is currently buried.
In order to improve the traceroute check, I needed to split the traceroute logic into a dedicated package which will be in
internal
but could be promoted to a dedicated CLI/go library in the future.In the
internal/traceroute
package we now have the following types to split up the different parts of the traceroute into components that work symbiotically together:traceroute.Client
interface is the main interaction entrypoint for external users. For consumers of the package they just create a new client that has one simple run method, which allows to pass the targets to traceroute to and some additional optional options.traceroute.Client
is agenericClient
that will later do the hard-lifiting of deciding which traceroute client to use, for which protocol. Since we currently only have TCP traceroute, we could remove this and just use thetcpClient
.tcpClient
is responsible for implementing the TCP traceroute specific logic. Here is where the heavy-lifting is done to control the further parts of traceroute. To still be able to test this, I needed to leverage over anonymous functions on the struct that can be monkey patched in tests.tracer
is an abstraction to be able to test thehopper
. It shall be implemented by all traceroute clients that implement a certain communication protocol (TCP/UDP/ICMP).hopper
is a small abstraction to pull the ttl iterations and otel traces out of thetracer
s. It's also nice to have this because of easier unit testing.icmpListener
is responsible for privilege guarding the raw socket interactions and reading/closing the raw socket that receives the ICMP messages by intermediate routers. It returns a simpleicmpPacket
which contains all the parsed information received over the raw socket. Here I also fixed a bug, where ICMPTypeDestinationUnreachable wasn't correctly considered. If we receive this type of message with the code 3, it means that we reached the target, but didn't use the correct port. Since the traceroute was able to reached the target, we can safely set the reached boolean. For more on this, see: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml#icmp-parameters-codes-3For additional information look at the commits.
Tests done
E2E tests succeeded- Current traceroute e2e tests still use the old implementation, as this PR only introduces the refactor as new package without touching anything of the current traceroute check.✔️ Added Unit tests
I've copied some old unit tests from the old implementation and added some new unit tests for:
Unfortunately, it's still hard to test the ICMP related logic since its quite low-level. For a later PR, we could add an interface for
icmp.PacketConn
, which could then be mocked thus unit tested!🧑💼 Manual e2e tests
Since it was easiest to adjust the current traceroute check to test if the behavior still works, I did so in refactor/use-traceroute-pkg.
Logs
API response
TODO