The current data strategy is essentially to keep everything in memory and back it up to boltdb as a failsafe only. This has a few issues, including long startup times, long save pauses, inability to share data between multiple instances, etc.
We would like to move to a redis datastore for bosun state data, but we also don't want to alienate users who prefer a standalone application. ledisdb gives us an in-process, redis-compatible data store. A standard redis client can talk to it, or the same client can be pointed at a real redis server.
redisHost = myRedis:6379
ledisDir = /opt/bosun/ledis_data
The default setup is:
ledisDir = ledis_data
I strongly recommend setting one of the above config items before rolling this change.
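To make the two settings above concrete, here is a minimal Go sketch of how a store could be selected from them. The Backend type and the chooseBackend helper are hypothetical names for illustration only, not code from this PR, and the precedence shown (redisHost wins when both are set) is an assumption rather than something stated above.

```go
package main

import "fmt"

// Backend is a hypothetical description of the store a redis client would use.
type Backend struct {
	Kind string // "redis" for an external server, "ledis" for the in-process store
	Addr string // host:port for redis, data directory for ledis
}

// defaultLedisDir mirrors the default given in the description (ledisDir = ledis_data).
const defaultLedisDir = "ledis_data"

// chooseBackend picks a store from the two config values.
// Assumption: a non-empty redisHost takes precedence over ledisDir.
func chooseBackend(redisHost, ledisDir string) Backend {
	if redisHost != "" {
		return Backend{Kind: "redis", Addr: redisHost}
	}
	if ledisDir == "" {
		// Neither option set: fall back to the default directory.
		ledisDir = defaultLedisDir
	}
	return Backend{Kind: "ledis", Addr: ledisDir}
}

func main() {
	fmt.Printf("%+v\n", chooseBackend("myRedis:6379", ""))
	fmt.Printf("%+v\n", chooseBackend("", "/opt/bosun/ledis_data"))
	fmt.Printf("%+v\n", chooseBackend("", ""))
}
```

With neither option set, the sketch falls back to a relative ledis_data directory, which is why setting one of the options explicitly before rolling out is the safer choice.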
We will likely only convert one data structure at a time in order to test things thoroughly.
This PR only migrates metric-metadata.
Read through the changes, nothing jumps out except the unused import statement that I commented about. Some more notes:
ledisDir = ../ledis_data
An hour in and the metadata tab started working, so I think it is just an scollector metadata issue.
The series type = auto and /api/metadata/get?metric=bosun.collect.sent route missing rate details mentioned above are still occurring.
Confirmed the series type is being set correctly now, but if I query for something that doesn't have rate metadata, I would expect to get the error message indicated above. It seems to just default to gauge now instead of warning that you have to manually select one.
Not sure if you still want comments here or on the other PR.
Converting metadata storage from in-memory to a redis-based model.
@gbrayut this should be the authority now, and should build. Dependencies have been previously vendored.
looks good now, the graph tab displays the correct error when there isn't metadata for gauge/counter.
Probably ready to start testing this on branchbosun, just make sure to check the host metadata tab to see if anything stops working there. I'm out in Seattle next week but ping me if you run into any issues.
not deleting from bolt, just setting flag.
Setting redis clientname
So, just to make sure I'm understanding this correctly: if we were to configure Bosun to interface with Redis, we'd be able to share the data across multiple Bosun instances and thus have a team of Bosun instances handling the work? Or am I completely off here?
@krutaw Redis does not give us clustering. It does position us to have a redis replica and an instance of bosun that only reads the state. I don't think we were looking towards active-active; a read-only redis replica would seem to be the next logical step for us, but we are not there yet.
The main reason we brought in redis was performance. We had everything in big blobs, and lock times would cause 30 second delays in places as we started to grow our instance.
That makes a lot of sense. Is that something that is on the radar? Or, more to the point, how do you guys handle the whole "High Availability" question regarding Bosun internally?
Currently manual failover and backups. Bosun also has to restart to change the config, which causes about a 20 second gap as it loads all the last data points from redis (although if you don't index your data to bosun, this doesn't matter).
In general though, I posted this earlier this week to show what we do at Stack: http://kbrandt.com/post/bosun_arch/
AWESOME post, seriously, thank you. So quick question, how do you detect when the bosun instance needs to be rebuilt from backup/restarted/etc?