Implement persistence #83

Merged
1 commit merged into master from persistence on Feb 3, 2021

Conversation

TwiN (Owner) commented Feb 3, 2021

No description provided.

@TwiN TwiN mentioned this pull request Feb 3, 2021
@TwiN TwiN merged commit cd1430f into master Feb 3, 2021
@TwiN TwiN deleted the persistence branch February 3, 2021 04:23
TwiN (Owner, Author) commented Feb 4, 2021

Released in v2.1.0

rissson commented Mar 1, 2021

Quick question about this: how does it behave if several instances of Gatus are using the same file?

Thanks for your work!

TwiN (Owner, Author) commented Mar 2, 2021

@rissson Unfortunately, that won't do anything constructive.

The only interactions with the file are:

  1. At application start, read the file
  2. Every X minutes, auto-save
  3. On application stop (i.e. SIGINT/SIGTERM), try to save one more time before exiting

Every save operation overwrites the previously persisted data; thus, multiple applications pointing to the same file would just independently persist their data over each other's. Since Gatus only reads from the file once at application start, no application would be influenced by changes made to the file by the others.

TL;DR: Persistence is really just there to ensure that the data can be read on the next application start. Once the data is read, the entire dataset is in memory and any subsequent changes to the file are inconsequential. As a result, the current persistence implementation cannot be used for synchronization between multiple Gatus instances, which I assume is why you're asking.
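
For illustration, here is a minimal sketch of that read-once / overwrite-on-save pattern. This is not the actual Gatus code; the file path, data structure, and interval below are made up for the example:

    package main

    import (
        "encoding/json"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    // state is a stand-in for the dataset Gatus keeps in memory; the real
    // structures are different, this is only illustrative.
    type state struct {
        Results map[string][]string `json:"results"`
    }

    func load(path string) *state {
        s := &state{Results: map[string][]string{}}
        // 1. At application start, read the file once; if it's missing or
        //    unreadable, simply start with an empty dataset.
        if data, err := os.ReadFile(path); err == nil {
            _ = json.Unmarshal(data, s)
        }
        return s
    }

    func save(path string, s *state) {
        // Each save rewrites the entire file: last writer wins, nothing is merged.
        if data, err := json.Marshal(s); err == nil {
            _ = os.WriteFile(path, data, 0644)
        }
    }

    func main() {
        const file = "/data/gatus.json" // hypothetical path
        s := load(file)

        // 2. Every X minutes, auto-save the in-memory dataset.
        ticker := time.NewTicker(10 * time.Minute)
        defer ticker.Stop()

        // 3. On SIGINT/SIGTERM, try to save one more time before exiting.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)

        for {
            select {
            case <-ticker.C:
                save(file, s)
            case <-stop:
                save(file, s)
                return
            }
        }
    }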

rissson commented Mar 2, 2021

Actually, my question wasn't exactly about syncing multiple Gatus instances. My main problem is that my Gatus pods restart quite frequently:

devoups-gatus-6d6c579b67-4hmv4   1/1     Running   798        5d4h
devoups-gatus-6d6c579b67-82xw6   1/1     Running   797        5d4h
devoups-gatus-6d6c579b67-84r8s   1/1     Running   276        2d16h
devoups-gatus-6d6c579b67-n5ptf   1/1     Running   418        2d16h
devoups-gatus-6d6c579b67-w94wv   1/1     Running   418        2d16h

According to my logs, they get OOM-killed by the kernel. So the only thing I'm really looking for is whether they'll get some state back after they come back up; it doesn't really matter if they don't sync their state. The only question left is what happens if two instances try to write to the file at the same time.

TwiN (Owner, Author) commented Mar 3, 2021

@rissson If you care about persistence, you may want to use a StatefulSet instead, but before even worrying about that, I'd suggest that you use only a single pod for Gatus (unless you're using it for stress test purposes).

Furthermore, you should give the deployment a bit more memory if it's getting OOMKilled that often.

Would you mind showing me the output of kubectl get deploy devoups-gatus -o yaml as well as the output of kubectl top pods | grep devoups-gatus?

For reference, I use a single pod for a Gatus instance that has 50+ services to monitor and it hasn't restarted a single time in the past 26 days.

rissson commented Mar 3, 2021

Here is kubectl get deploy devoups-gatus -o yaml: http://ix.io/2Rza. kubectl top pods is somehow not working on my cluster. Looking at the processes manually on the nodes, they each use about 40M of RAM.

TwiN (Owner, Author) commented Mar 3, 2021

@rissson Since Gatus keeps all of its data in memory, if you have enough monitored services to cause the instance to use more than 30M of memory, I suggest that you increase your memory request to 40M and the limit to 100M.

i.e. Update the following:

        resources:
          limits:
            cpu: 200m
            memory: 50M
          requests:
            cpu: 50m
            memory: 20M

to

        resources:
          limits:
            cpu: 200m
            memory: 100M
          requests:
            cpu: 50m
            memory: 40M

As I've mentioned, I also suggest you decrease the number of replicas (spec.replicas) from 5 to 1: Gatus is meant to run as a single instance, and since instances don't share their data with each other, running multiple replicas doesn't add anything (unless you're using Gatus for stress-testing purposes, in which case multiple instances are reasonable).
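
Concretely, in the same Deployment manifest, that's just a one-field change (only the relevant part of the spec is shown):

        spec:
          replicas: 1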

P.S. For kubectl top pods to work, you must have metrics-server running in your cluster.

rissson commented Mar 13, 2021

I increased the limits as you suggested, no restart since. Again, thanks a lot for your wonderful work!

TwiN (Owner, Author) commented Mar 14, 2021

@rissson Glad to hear it! 🥳
