Implement persistence through application restart #32

Closed
cjheppell opened this issue Nov 13, 2020 · 6 comments
Labels
area/storage Related to storage feature New feature or request help wanted Extra attention is needed

Comments

@cjheppell
Contributor

cjheppell commented Nov 13, 2020

At the moment, it seems that the status results are stored in an in-memory map: https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L16

And also limited to only 20 results:
https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L59-L62

It'd be great if these were stored in a persistent data store instead (e.g. a database, or files on disk). Whilst Gatus currently only returns the last 20 results, it'd be nice to keep the history so that past outages can be reviewed. Storing the results in a database/persistence layer would be the first step towards enabling this.

Of course, the option to retain results in memory should also be kept as it makes Gatus very easy to get up and running.
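To make the request concrete, here is a minimal sketch of what a pluggable storage layer could look like, with the in-memory behaviour kept as one implementation. The names `Result`, `Store`, `MemoryStore` and `NewMemoryStore` are hypothetical and are not taken from the Gatus codebase; the cap of 20 mirrors the limit mentioned above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Result is a hypothetical, simplified stand-in for one health check result.
type Result struct {
	Timestamp time.Time
	Success   bool
}

// Store is a hypothetical abstraction over where results are kept.
// A file- or database-backed type could implement the same interface.
type Store interface {
	Insert(service string, r Result)
	Results(service string) []Result
}

// MemoryStore keeps at most maxResults results per service, in memory only.
type MemoryStore struct {
	mu         sync.RWMutex
	maxResults int
	results    map[string][]Result
}

// NewMemoryStore returns an empty store capped at maxResults per service.
func NewMemoryStore(maxResults int) *MemoryStore {
	return &MemoryStore{maxResults: maxResults, results: make(map[string][]Result)}
}

// Insert appends a result and evicts the oldest ones beyond the cap.
func (s *MemoryStore) Insert(service string, r Result) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rs := append(s.results[service], r)
	if len(rs) > s.maxResults {
		rs = rs[len(rs)-s.maxResults:]
	}
	s.results[service] = rs
}

// Results returns a copy of the stored results, oldest first.
func (s *MemoryStore) Results(service string) []Result {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Result, len(s.results[service]))
	copy(out, s.results[service])
	return out
}

func main() {
	var store Store = NewMemoryStore(20)
	for i := 0; i < 25; i++ {
		store.Insert("service-a", Result{Timestamp: time.Now(), Success: i%2 == 0})
	}
	fmt.Println(len(store.Results("service-a")))
}
```

With an interface like this, the in-memory store stays the zero-configuration default, and a persistent implementation can be swapped in later without touching the callers.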

@TwiN TwiN added the feature New feature or request label Nov 13, 2020
@TwiN
Owner

TwiN commented Nov 13, 2020

This has been on my list as well.

I've also entertained the idea of having another in-memory map that stores the last N outages for each service and dynamically generates a timeline instead of trying to persist the entire history, i.e.

Service A Outages
- From YYYY-MM-DD hh:mm:ss to YYYY-MM-DD hh:mm:ss

Though this wouldn't survive application restarts.

But yeah, thanks for opening this issue.

@TwiN TwiN added the help wanted Extra attention is needed label Nov 13, 2020
@GeorgFleig

Just a side note: Having a separate persistence layer (not in-memory) would be a step towards running Gatus in a high-availability setup to ensure monitoring works even when a host is down. In this case the state of performed requests of all instances would be shared and the dashboard could show them all.

Syncing execution of requests and sending of alerts would still be an open topic though 🙃

@TwiN TwiN changed the title Feature request - Persistent history of results Persistent result history Dec 31, 2020
@cjheppell
Contributor Author

As discussed in #66 (comment), the first step towards this will be introducing file-based persistence.

Later, other means of storage can be introduced if required (e.g. databases).

@yarhamjohn

I am taking a look at this with @cjheppell this afternoon

@TwiN TwiN linked a pull request Feb 2, 2021 that will close this issue
@TwiN TwiN changed the title Persistent result history Persist results to file Feb 3, 2021
@TwiN
Owner

TwiN commented Feb 3, 2021

Posting part of the comment I posted in #69 here for traceability:

I've been working on persistence, and I've tried and compared several different options:

  • memory-only (current implementation)
  • file-only (with bolt)
  • gocache (memory + file, but persistence is achieved through autosave)

The file-only option was pretty terrible in terms of performance, though that may have been because the implementation itself was a little lazy.

memory-only was used more as a baseline to compare performance, but since the purpose was to achieve persistence, it's a no-go.

Finally, gocache gave very good results. It's essentially on par with memory-only while providing the persistence required, but well, that's why I suggested using it in the first place.

To continue on that comment, the implementation I went for does not persist the data immediately. It dumps the data to a file every 7 minutes. I had initially tried to implement it through bolt only, but I was very unsatisfied with the performance of using file-only, so I decided to use a hybrid: in memory + occasional persistence.

I've renamed this issue to focus purely on persisting data so that it survives restarts, which in turn makes services with longer intervals more viable than before.
The ability to view older history is definitely also on my agenda, but since persistence is more important right now, I decided to focus only on developing that feature.

The ability to go back in history will be added in the future, but I don't have a specific date yet.

P.S. For those of you who want the ability to view older history over a long period right now as opposed to just persistence, it's not ideal, but you can probably leverage the /metrics endpoint and persist it through Prometheus. I didn't give an insane amount of love to the metrics, but if it's requested enough, I can work on improving the metrics exposed (unless somebody else wants to do it in my stead 🤷‍♂️)

@TwiN TwiN changed the title Persist results to file Implement persistence to survive application restart Feb 3, 2021
@TwiN TwiN changed the title Implement persistence to survive application restart Implement persistence through application restart Feb 3, 2021
@TwiN
Owner

TwiN commented Feb 4, 2021

Released in v2.1.0

@TwiN TwiN closed this as completed Feb 4, 2021
@TwiN TwiN added the area/storage Related to storage label Feb 1, 2023
4 participants