Implement persistence through application restart #32

cjheppell · 2020-11-13T14:39:57Z

At the moment, it seems that the status results are stored in an in-memory map: https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L16

And also limited to only 20 results:
https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L59-L62

It'd be great if these were stored in a persistent data store instead (e.g. a database or files on disk). Whilst Gatus currently only returns the last 20 results, it'd be nice to keep the history to review outages that might have occurred in the past. Storing the results in a database/persistence layer would be the first step towards enabling this.

Of course, the option to retain results in memory should also be kept as it makes Gatus very easy to get up and running.
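
To make the starting point concrete, here is a minimal sketch of the behaviour described above: results held in an in-memory map and trimmed to the most recent 20 per service. It is not the code from `watchdog.go`; the names `Result`, `Store`, `Add` and `Results` are hypothetical and chosen only for illustration.

```go
package main

import "sync"

// Result is a hypothetical, simplified stand-in for one health check result.
type Result struct {
	Success bool
}

// maxResults mirrors the limit of 20 results mentioned in this issue.
const maxResults = 20

// Store keeps the most recent results per service in memory only,
// so everything is lost when the process restarts.
type Store struct {
	mu      sync.Mutex
	results map[string][]Result
}

// NewStore returns an empty in-memory store.
func NewStore() *Store {
	return &Store{results: make(map[string][]Result)}
}

// Add appends a result and drops the oldest entries beyond maxResults.
func (s *Store) Add(service string, r Result) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rs := append(s.results[service], r)
	if len(rs) > maxResults {
		rs = rs[len(rs)-maxResults:]
	}
	s.results[service] = rs
}

// Results returns a copy of the stored results for a service, oldest first.
func (s *Store) Results(service string) []Result {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Result(nil), s.results[service]...)
}
```

Anything older than the last 20 entries is discarded on every `Add`, which is why reviewing past outages is not possible with this approach alone.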

TwiN · 2020-11-13T18:28:11Z

This has been on my list as well.

I've also entertained the idea of having another in-memory map that stores the last N outages for each service and dynamically generates a timeline instead of trying to persist the entire history, for example:

Service A Outages
- From YYYY-MM-DD hh:mm:ss to YYYY-MM-DD hh:mm:ss

Though this wouldn't survive application restarts.

But yeah, thanks for opening this issue.

GeorgFleig · 2020-12-11T16:33:50Z

Just a side note: Having a separate persistence layer (not in-memory) would be a step towards running Gatus in a high-availability setup to ensure monitoring works even when a host is down. In this case the state of performed requests of all instances would be shared and the dashboard could show them all.

Syncing execution of requests and sending of alerts would still be an open topic though 🙃

cjheppell · 2021-01-08T09:51:41Z

As discussed in a comment on #66, the first step towards this will be introducing file-based persistence.

Later, other storage backends can be introduced if required (e.g. databases).

yarhamjohn · 2021-01-08T14:21:02Z

I am taking a look at this with @cjheppell this afternoon.

TwiN · 2021-02-03T02:45:55Z

Posting part of the comment I posted in #69 here for traceability:

I've been working on persistence, and I've tried and compared several different options:

- memory-only (current implementation)
- file-only (with bolt)
- gocache (memory + file, but persistence is achieved through autosave)

The file-only option was pretty terrible in terms of performance, though that may have been because the implementation itself was a little lazy.

memory-only was used more as a baseline to compare performance, but since the purpose was to achieve persistence, it's a no-go.

Finally, gocache gave very good results. It's essentially on par with memory-only while providing the persistence required, which is why I suggested using it in the first place.

To continue on that comment, the implementation I went for does not persist the data immediately. It dumps the data to a file every 7 minutes. I had initially tried to implement it through bolt only, but I was very unsatisfied with the performance of using file-only, so I decided to use a hybrid: in memory + occasional persistence.

I've renamed this issue to focus purely on persisting data so that it survives restarts, which in turn makes services with longer intervals more viable than before.
The ability to view older history is definitely also on my agenda, but since persistence is more important right now, I decided to focus only on developing that feature.

The ability to go back in history will be added in the future, but I don't have a specific date yet.

P.S. For those of you who want the ability to view older history over a long period right now as opposed to just persistence, it's not ideal, but you can probably leverage the /metrics endpoint and persist it through Prometheus. I didn't give an insane amount of love to the metrics, but if it's requested enough, I can work on improving the metrics exposed (unless somebody else wants to do it in my stead 🤷‍♂️)

TwiN · 2021-02-04T00:23:35Z

Released in v2.1.0

TwiN added the feature label Nov 13, 2020

TwiN added the help wanted label Nov 13, 2020

cjheppell mentioned this issue Dec 30, 2020

Introduce Postgres persistence layer #66

Closed

TwiN changed the title ~~Feature request - Persistent history of results~~ Persistent result history Dec 31, 2020

cjheppell mentioned this issue Dec 31, 2020

Split memory map out of watchdog #67

Merged

yarhamjohn mentioned this issue Jan 8, 2021

Add file persistence #69

Closed

TwiN linked a pull request Feb 2, 2021 that will close this issue

Implement persistence #82

Closed

TwiN changed the title ~~Persistent result history~~ Persist results to file Feb 3, 2021

TwiN changed the title ~~Persist results to file~~ Implement persistence to survive application restart Feb 3, 2021

TwiN changed the title ~~Implement persistence to survive application restart~~ Implement persistence through application restart Feb 3, 2021

TwiN closed this as completed Feb 4, 2021

TwiN added the area/storage label Feb 1, 2023
