Support monitoring TCP services #18

Closed
dchidell opened this issue Oct 4, 2020 · 5 comments
Labels
feature New feature or request

Comments

@dchidell
Contributor

dchidell commented Oct 4, 2020

Not all services are web based - but they can be checked fairly simply (i.e. see if they're at least running) by opening a TCP socket. If the socket opens successfully, the service is considered online.

This is crude - but at least opens the door to monitoring applications which are not based on HTTP.

This is pretty similar to this request: #5 but should be a lot more generic.

@TwiN TwiN changed the title Feature Request - Backend monitoring of TCP sockets Support monitoring TCP services Oct 4, 2020
@TwiN TwiN added the feature New feature or request label Oct 4, 2020
@TwiN
Owner

TwiN commented Oct 4, 2020

I'm all for it; however, I'm a little perplexed by the following statement:

> If the socket opens successfully, the service is considered online.

If the socket opens successfully, it just means that there's something listening on that port at the host specified; it doesn't necessarily mean that the service is healthy.

The code for this is quite simple, in fact:

conn, err := net.DialTimeout("tcp", "127.0.0.1:6379", 10 * time.Second)
if err != nil {
	// couldn't open connection :(
	panic(err)
}
fmt.Println("Managed to open connection successfully :)")
conn.Close()

But this is the least of our concerns, as the biggest concern lies with the conditions themselves. To be perfectly frank, the idea of supporting non-HTTP requests hadn't even crossed my mind when I started Gatus, and while the fault lies entirely with me on this one, it doesn't change the fact that we need to ask ourselves how we can support both the new pattern you're suggesting and HTTP.

Let's take a look at the current core configuration options:

services:
  - name: twinnation  
    url: "https://twinnation.org/health"
    interval: 30s     
    conditions:
      - "[STATUS] == 200"         
      - "[BODY].status == UP"     
      - "[RESPONSE_TIME] < 300"

We can already see a few problems here, namely the conditions and the url:

  • A few of the conditions use placeholders that are specific to HTTP requests, such as the [BODY], which uses JSONPath, and [STATUS], which is the HTTP status.
  • The URL is, well, a URL, but TCP doesn't really care about URLs; it only cares about the host name and the port.

Suggested solution

Coupled with forcing the prefix tcp:// for TCP services and http:// or https:// for HTTP services,
ignoring unnecessary placeholders and configuration options would definitely make the work easy.

I will introduce a new placeholder, [CONNECTED], which will return true only if a connection could be established.
It will work for both TCP and HTTP - probably UDP in the future too.

Here's what the configuration for a TCP service would look like:

services:
  - name: redis
    url: "tcp://127.0.0.1:6379"
    interval: 1m
    conditions:
      - "[CONNECTED] == true"

Of course, quite a few fields will be useless for TCP services, such as services[].body, services[].insecure,
services[].headers, services[].method and services[].graphql. Likewise, the [STATUS] and [BODY] placeholders will be useless.

I originally wanted to create a new top level field tcpServices[] instead of services[] just for TCP services, but then I realized that later on, if we want to add support for native MySQL queries, UDP, native Redis commands, etc., it would require adding one new top level field for each of them.

Instead, leveraging the prefix of the services[].url should be enough even for future use cases, e.g.:

  • tcp://127.0.0.1:6379
  • redis://127.0.0.1:6379
  • postgres://{user}:{password}@{hostname}:{port}/{database-name}
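
As a rough sketch of the prefix-based approach, the scheme of services[].url can be parsed once and used to pick the kind of check to run. The function name checkKind and the returned labels are hypothetical; only http, https and tcp are handled here, and anything else (redis://, postgres://, ...) is rejected until a dedicated check exists.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkKind is a hypothetical helper: it maps the scheme of a service URL
// to the kind of check that should be run for it.
func checkKind(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http", "https":
		return "http", nil
	case "tcp":
		return "tcp", nil
	default:
		// Future schemes would get their own case here.
		return "", fmt.Errorf("unsupported scheme %q in %q", u.Scheme, rawURL)
	}
}

func main() {
	urls := []string{
		"https://twinnation.org/health",
		"tcp://127.0.0.1:6379",
		"redis://127.0.0.1:6379",
	}
	for _, raw := range urls {
		kind, err := checkKind(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s uses the %s check\n", raw, kind)
	}
}
```

Adding a new protocol then means adding one case to the switch rather than one new top level field in the configuration.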

In any case, I've already started the work necessary and it'll be available shortly.

@TwiN TwiN closed this as completed in 3ecfe4d Oct 4, 2020
@TwiN
Owner

TwiN commented Oct 5, 2020

@dchidell I've pushed this into latest. Would you mind giving it a try locally and telling me what you think before I release this in v1.2.0?

@dchidell
Contributor Author

dchidell commented Oct 5, 2020

Tested it using the latest docker tag, and looks good to me! Thanks for getting that in, SUPER fast as well.

To your above point regarding if the service is listening / healthy - there are almost limitless ways to test if a TCP service is healthy / functional, but while checking the TCP port is crude, it's often effective (i.e. the service hasn't crashed, and is at least configured to listen on the right port). It's a good indicator if something is likely to work or not without delving into the complexity of different services.

I don't think this will work for UDP based services, since there's no socket to 'establish' as you get with TCP using the three-way handshake. I think you'd end up having to build a mechanism that sends a particular byte string and then expects some form of response depending on the service type. For TCP however, this works beautifully :)

@TwiN
Owner

TwiN commented Oct 5, 2020

For UDP, I was hoping to just send something and accept any answer as a valid answer, or perhaps support services[].body as what to write. Anyhow, one problem at a time 🤣

Thanks for your input! I'll release this in v1.2.0, probably today

@TwiN
Owner

TwiN commented Oct 6, 2020

Released in v1.2.0
