Implement persistence #83

Merged
1 commit merged into master from persistence on Feb 3, 2021

Conversation

TwiN (Owner) commented Feb 3, 2021

No description provided.

@TwiN TwiN mentioned this pull request Feb 3, 2021
@TwiN TwiN merged commit cd1430f into master Feb 3, 2021
@TwiN TwiN deleted the persistence branch February 3, 2021 04:23
TwiN (Owner, Author) commented Feb 4, 2021

Released in v2.1.0

rissson commented Mar 1, 2021

Quick question about this: how does it behave if several instances of Gatus are using the same file?

Thanks for your work!

TwiN (Owner, Author) commented Mar 2, 2021

@rissson Unfortunately, that won't do anything constructive.

The only interactions with the file are:

  1. At application start, read the file
  2. Every X minutes, auto-save
  3. On application stop (i.e. SIGINT/SIGTERM), try to save one more time before exiting

Every save operation overwrites the previously persisted data; thus, multiple applications pointing to the same file would just independently persist their data over each other's. Since Gatus only reads from the file once at application start, no application would be influenced by changes made to the file by the others.

TL;DR: Persistence is really just there to ensure that the data can be read on the next application start. Once the data is read, the entire dataset is in memory and any subsequent changes to the file are inconsequential. As a result, the current persistence implementation cannot be used for synchronization between multiple Gatus instances, which I assume is why you're asking.
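
For illustration, here is a minimal sketch of that read-once / overwrite-on-save pattern. This is not the actual Gatus code; the file path, data structure, and interval below are made up for the example:

    package main

    import (
        "encoding/json"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    // state is a stand-in for the dataset Gatus keeps in memory; the real
    // structures are different, this is only illustrative.
    type state struct {
        Results map[string][]string `json:"results"`
    }

    func load(path string) *state {
        s := &state{Results: map[string][]string{}}
        // 1. At application start, read the file once; if it's missing or
        //    unreadable, simply start with an empty dataset.
        if data, err := os.ReadFile(path); err == nil {
            _ = json.Unmarshal(data, s)
        }
        return s
    }

    func save(path string, s *state) {
        // Each save rewrites the entire file: last writer wins, nothing is merged.
        if data, err := json.Marshal(s); err == nil {
            _ = os.WriteFile(path, data, 0644)
        }
    }

    func main() {
        const file = "/data/gatus.json" // hypothetical path
        s := load(file)

        // 2. Every X minutes, auto-save the in-memory dataset.
        ticker := time.NewTicker(10 * time.Minute)
        defer ticker.Stop()

        // 3. On SIGINT/SIGTERM, try to save one more time before exiting.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)

        for {
            select {
            case <-ticker.C:
                save(file, s)
            case <-stop:
                save(file, s)
                return
            }
        }
    }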

rissson commented Mar 2, 2021

Actually, my question wasn't exactly about syncing multiple Gatus instances. My main problem is that my Gatus pods restart quite frequently:

devoups-gatus-6d6c579b67-4hmv4   1/1     Running   798        5d4h
devoups-gatus-6d6c579b67-82xw6   1/1     Running   797        5d4h
devoups-gatus-6d6c579b67-84r8s   1/1     Running   276        2d16h
devoups-gatus-6d6c579b67-n5ptf   1/1     Running   418        2d16h
devoups-gatus-6d6c579b67-w94wv   1/1     Running   418        2d16h

According to my logs, they get OOM-killed by the kernel. So the only thing I'm really looking for is whether they'll get some state back after they come back up; it doesn't really matter if they don't sync their state. The only question left is what happens if two instances try to write to the file at the same time.

TwiN (Owner, Author) commented Mar 3, 2021

@rissson If you care about persistence, you may want to use a StatefulSet instead, but before even worrying about that, I'd suggest that you use only a single pod for Gatus (unless you're using it for stress test purposes).

Furthermore, you should give the deployment a bit more memory if it's getting OOMKilled that often.

Would you mind showing me the output of kubectl get deploy devoups-gatus -o yaml as well as the output of kubectl top pods | grep devoups-gatus?

For reference, I use a single pod for a Gatus instance that has 50+ services to monitor and it hasn't restarted a single time in the past 26 days.

rissson commented Mar 3, 2021

Here is kubectl get deploy devoups-gatus -o yaml: http://ix.io/2Rza. kubectl top pods is somehow not working on my cluster. Looking at the processes manually on the nodes, they each use about 40M of RAM.

TwiN (Owner, Author) commented Mar 3, 2021

@rissson Since Gatus keeps all of its data in memory, if you have enough monitored services to cause the instance to use more than 30M of memory, I suggest that you increase your memory request to 40M and the limit to 100M.

i.e. Update the following:

        resources:
          limits:
            cpu: 200m
            memory: 50M
          requests:
            cpu: 50m
            memory: 20M

to

        resources:
          limits:
            cpu: 200m
            memory: 100M
          requests:
            cpu: 50m
            memory: 40M

As I've mentioned, I also suggest you decrease the number of replicas (spec.replicas) from 5 to 1: Gatus is meant to run as a single instance, and since instances don't share their data with each other, running multiple replicas doesn't add anything (unless you're using Gatus for stress-testing purposes, in which case multiple instances are reasonable).
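
Concretely, in the same Deployment manifest, that's just a one-field change (only the relevant part of the spec is shown):

        spec:
          replicas: 1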

P.S. For kubectl top pods to work, you must have metrics-server running in your cluster.

rissson commented Mar 13, 2021

I increased the limits as you suggested, no restart since. Again, thanks a lot for your wonderful work!

TwiN (Owner, Author) commented Mar 14, 2021

@rissson Glad to hear it! 🥳
