
Need to backup MQTT server database #1213

Open
amcewen opened this issue Aug 10, 2019 · 15 comments

@amcewen (Member) commented Aug 10, 2019

Spinning this out of #1210 because it's not what caused the problem with the mqtt.local energy-usage messages, but it does need to be fixed.

@ajlennon changed the title from "Need to create more space for MQTT server database" to "Need to backup MQTT server database" on Aug 10, 2019

@ajlennon (Contributor) commented Aug 10, 2019

We don't need to create more space. We need a data retention policy.

In the immediate term we need to back up the data before the device falls over because the disk filesystem is full.

We knew this would happen, and we are now getting to the point where we need to address it.

The data retention policy should answer the following questions:

  • We currently generate a lot of data; I think the power-sensing devices publish at 1s intervals. Is this useful to us? (From conversations with @goatchurchprime it may be, as he can do some analysis he couldn't do with less granular data.)

  • We are not deleting any data, so at some point we will fill up the available uSD card storage on the RPi used for mqtt.local. Should we delete old data?

  • It strikes me that a good solution would be to keep recent data in its most granular form for analysis and average older data out, say over 5-minute blocks (see the sketch at the end of this comment). This may not be appropriate though; we need input from @goatchurchprime.

In the meantime we should back up all the data collected thus far, in case changes to mqtt.local cause the loss of the current database.
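
For reference, one way to implement the "average older data into 5-minute blocks" idea in InfluxDB 1.x would be a continuous query writing into a longer-lived retention policy. This is only a sketch: the policy names and the 7-day raw-data window are invented, and we'd need to check it against what balena-sense actually writes before using it.

CREATE RETENTION POLICY "raw_7d" ON "balena-sense" DURATION 7d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "downsampled" ON "balena-sense" DURATION INF REPLICATION 1

CREATE CONTINUOUS QUERY "cq_5m_mean" ON "balena-sense"
BEGIN
  SELECT mean(*) INTO "balena-sense"."downsampled".:MEASUREMENT FROM /.*/ GROUP BY time(5m), *
END

With something like that in place, full-resolution data would age out after a week while the 5-minute averages were kept indefinitely.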

@ajlennon (Contributor) commented Aug 10, 2019

Some background info: the uSD filesystem currently looks like this:

Filesystem                         Size  Used Avail Use% Mounted on
devtmpfs                           480M     0  480M   0% /dev
/dev/disk/by-partuuid/921093ea-02  300M  217M   64M  78% /mnt/sysroot/active
/dev/disk/by-label/resin-state      19M  222K   17M   2% /mnt/state
none                               300M  217M   64M  78% /
tmpfs                              488M  332K  488M   1% /dev/shm
tmpfs                              488M  9.9M  479M   3% /run
tmpfs                              488M     0  488M   0% /sys/fs/cgroup
/dev/mmcblk0p1                      40M  8.2M   32M  21% /mnt/boot
tmpfs                              488M     0  488M   0% /tmp
tmpfs                              488M   28K  488M   1% /var/volatile
/dev/mmcblk0p3                     300M  2.1M  278M   1% /mnt/sysroot/inactive
/dev/mmcblk0p6                      28G  1.4G   25G   6% /mnt/data

Looking at this, it may be less of a problem than I'd thought, depending on where the InfluxDB data is being held. It looks like a 32GB card, and if the d/b is on /mnt/data we have a huge amount of headroom.

(This doesn't remove the need to back up the data to ensure we don't lose it!)

@ajlennon (Contributor) commented Aug 10, 2019

Playing with the InfluxDB shell to generate some metrics on current usage:

bash-4.4#  influx -execute 'SHOW DATABASES'
name: databases
name
----
balena-sense
_internal

Looking at the database size within the InfluxDB container, we get:

bash-4.4# pwd
/data/influxdb/data
bash-4.4# du -sh *
14.6M	_internal
536.3M	balena-sense
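
It's probably also worth checking how long the database is configured to keep data; presumably it's just the default autogen retention policy with an infinite duration, which would explain why nothing is ever expired. Something along these lines, in the same shell, should confirm it:

influx -execute 'SHOW RETENTION POLICIES ON "balena-sense"'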

We can use balena inspect to determine the mount point for the /data partition within the host OS.
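
A minimal version of that, assuming the balena engine CLI on the host mirrors Docker's volume subcommands and that the volume in question is the 2_sense-data one listed below, would be something like:

balena volume inspect --format '{{ .Mountpoint }}' 2_sense-data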

root@mqtt:~# ls /var/lib/docker/volumes/2_sense-data/_data/influxdb/data/
_internal  balena-sense

Following on from this, it looks like there is some bind mounting or similar going on to map that location onto the /mnt/data mount point:

root@mqtt:/var/lib/docker/volumes/2_sense-data# ls /mnt/data/docker/volumes/2_sense-data/_data/influxdb/data/balena-sense/
_series  autogen

So it seems as though we have about half a gig in use so far, and plenty of space.

@ajlennon (Contributor) commented Aug 10, 2019

Details on backing up an online InfluxDB are in the InfluxDB backup/restore documentation.

The command seems to be something like:

influxd backup -portable <path-to-backup>

So I ran a test backup command in the InfluxDB container, which gives:

bash-4.4# influxd backup -portable /data/influxdb-backup-20190810.dat
2019/08/10 16:06:54 backing up metastore to /data/influxdb-backup-20190810.dat/meta.00
2019/08/10 16:06:54 No database, retention policy or shard ID given. Full meta store backed up.
2019/08/10 16:06:54 Backing up all databases in portable format
2019/08/10 16:06:54 backing up db=
2019/08/10 16:06:54 backing up db=balena-sense rp=autogen shard=2 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00002.00 since 0001-01-01T00:00:00Z
2019/08/10 16:06:56 backing up db=balena-sense rp=autogen shard=6 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00006.00 since 0001-01-01T00:00:00Z
2019/08/10 16:06:58 backing up db=balena-sense rp=autogen shard=11 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00011.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:02 backing up db=balena-sense rp=autogen shard=18 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00018.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:08 backing up db=balena-sense rp=autogen shard=26 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00026.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:15 backing up db=balena-sense rp=autogen shard=34 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00034.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:24 backing up db=balena-sense rp=autogen shard=42 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00042.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:33 backing up db=balena-sense rp=autogen shard=50 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00050.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:42 backing up db=balena-sense rp=autogen shard=58 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00058.00 since 0001-01-01T00:00:00Z
2019/08/10 16:07:59 backing up db=balena-sense rp=autogen shard=66 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00066.00 since 0001-01-01T00:00:00Z
2019/08/10 16:08:19 backing up db=balena-sense rp=autogen shard=74 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00074.00 since 0001-01-01T00:00:00Z
2019/08/10 16:08:31 backing up db=balena-sense rp=autogen shard=82 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00082.00 since 0001-01-01T00:00:00Z
2019/08/10 16:08:45 backing up db=balena-sense rp=autogen shard=90 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00090.00 since 0001-01-01T00:00:00Z
2019/08/10 16:08:57 backing up db=balena-sense rp=autogen shard=98 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00098.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:09 backing up db=balena-sense rp=autogen shard=106 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00106.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:17 backing up db=balena-sense rp=autogen shard=114 to /data/influxdb-backup-20190810.dat/balena-sense.autogen.00114.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:24 backing up db=_internal rp=monitor shard=112 to /data/influxdb-backup-20190810.dat/_internal.monitor.00112.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:24 backing up db=_internal rp=monitor shard=113 to /data/influxdb-backup-20190810.dat/_internal.monitor.00113.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:24 backing up db=_internal rp=monitor shard=115 to /data/influxdb-backup-20190810.dat/_internal.monitor.00115.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:25 backing up db=_internal rp=monitor shard=116 to /data/influxdb-backup-20190810.dat/_internal.monitor.00116.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:25 backing up db=_internal rp=monitor shard=117 to /data/influxdb-backup-20190810.dat/_internal.monitor.00117.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:25 backing up db=_internal rp=monitor shard=118 to /data/influxdb-backup-20190810.dat/_internal.monitor.00118.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:25 backing up db=_internal rp=monitor shard=119 to /data/influxdb-backup-20190810.dat/_internal.monitor.00119.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:26 backing up db=_internal rp=monitor shard=120 to /data/influxdb-backup-20190810.dat/_internal.monitor.00120.00 since 0001-01-01T00:00:00Z
2019/08/10 16:09:26 backup complete:
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.meta
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s2.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s6.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s11.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s18.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s26.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s34.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s42.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s50.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s58.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s66.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s74.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s82.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s90.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s98.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s106.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s114.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s112.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s113.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s115.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s116.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s117.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s118.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s119.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.s120.tar.gz
2019/08/10 16:09:26 	/data/influxdb-backup-20190810.dat/20190810T160654Z.manifest

So this results in a 432MB backup folder in the host OS, here:

root@mqtt:~#  du -sh /var/lib/docker/volumes/2_sense-data/_data/influxdb-backup-20190810.dat/
432M	/var/lib/docker/volumes/2_sense-data/_data/influxdb-backup-20190810.dat/
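
For completeness, the matching restore (into a fresh instance, or with renamed target databases if restoring alongside live data) should be the portable counterpart from the same InfluxDB 1.x docs, i.e. something like:

influxd restore -portable /data/influxdb-backup-20190810.dat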

In the simplest case it would be helpful to batch up backups to a robust backing store on the local network somewhere.
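
A rough sketch of what that could look like, run nightly from cron inside (or alongside) the InfluxDB container. The NAS hostname, remote user and paths are all placeholders, and it assumes rsync is available:

#!/bin/sh
# Take a portable InfluxDB backup and push it to a backing store on the local network.
# Example crontab entry: 30 2 * * * /data/backup-influx.sh
STAMP=$(date +%Y%m%d)
DEST="/data/influxdb-backup-${STAMP}.dat"

influxd backup -portable "$DEST"

# Only delete the local copy once the transfer has succeeded.
rsync -a "$DEST" backup@nas.local:/backups/mqtt-influxdb/ && rm -rf "$DEST"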

It would be more interesting to do something HA, as I think you were suggesting, and have real-time backups to another instance. Comments, @MatthewCroughan?

@MatthewCroughan commented Aug 11, 2019

For this amount of data and throughput, I think Influx is intended either to be set up for HA or to be centralized on a more powerful, higher-capacity system, like most things. Clearly it's fine for taking measurements and displaying them via Grafana on a Pi, but long-term use produces a volume of writes that would be liable to kill an SD card or USB drive anyway. So keeping it in its current state, processing and storing as much as it does, I think is unrealistic.

Alternatively, and I think this is the best option: we could do all of this with a 3D-printed case to support a 1TB hard drive (since I have lots of those); then we'd have 1TB and could endeavour to set up the future distributed network. We could very easily distribute the filesystem using GlusterFS; we wouldn't even have to think about it. Distributing the application (Influx) I have no idea about, and you would know more about that.

I'm happy to set up a container on my server in the space, which has lots of ECC RAM and storage, install Influx, and then we can just point all the sensors there. In fact, if I call it mqtt.local and we stop the one running on the Pi, they should all transition over, if we trust mDNS. We'll have to coordinate that and make sure there aren't any issues, though. Let me know when you want to do that, or if you do. @ajlennon

@ajlennon (Contributor) commented Aug 11, 2019

My preference would be to leave the d/b running on the box as it is and have the database replicated in real time to your box. Can you set that up?
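
For what it's worth, one mechanism that might get part of the way there on InfluxDB 1.x OSS (which doesn't do clustering itself) is a subscription, which forwards every incoming write to an additional endpoint. A hedged sketch, with the destination address made up:

CREATE SUBSCRIPTION "replica" ON "balena-sense"."autogen" DESTINATIONS ALL 'http://replica-host:8086'

That would only cover writes from the moment it is created; the existing history would still need a one-off backup/restore to seed the second instance, and whether a plain second InfluxDB can ingest the forwarded writes directly (rather than something like influxdb-relay in front of it) would need verifying.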

@MatthewCroughan commented Aug 11, 2019

@ajlennon

I don't think we can do it in real time; I don't understand how this could be done. No matter what, we're going to lose data. Imagine we start the transfer now: any data gathered in the meantime will be lost.

I can imagine a scenario where we do what I suggested: perform the transfer at the same time as cutting mqtt.local off from the network, then spawn the container with the hostname mqtt.local. But no matter what we do, we're still going to lose some data.

@ajlennon (Contributor) commented Aug 12, 2019

The point of replication is that we don't lose data...

@MatthewCroughan commented Aug 12, 2019

@ajlennon Okay, then how do we back up what is currently being written?

@ajlennon (Contributor) commented Aug 12, 2019

@MatthewCroughan commented Aug 12, 2019

All of that seems a bit complicated just to save us from losing 10-20 minutes of sensor data, but I'm willing to try it, though you'll have to wait until I have a few free hours in the week, unless you want to try to do it yourself. I can give you access to the aforementioned 1TB Influx container soon if you want to go ahead with that.

@ajlennon (Contributor) commented Aug 12, 2019

Haven't got time at the moment. Replication or a similar mechanism is how we should be doing things.

@MatthewCroughan commented Aug 12, 2019

I'm all for doing things the right way; if it's not so urgent, I agree we should do it that way. Let me know when/if you're doing this so I can learn from it. I'll also give it a go at some point.

@ajlennon (Contributor) commented Aug 12, 2019

I have no more magical insights into this than you. All I will be doing is googling and reading :)

@MatthewCroughan commented Aug 12, 2019

Well, I'll give it a go soon. Though I have other issues I'm working on and other priorities, I can guarantee I'll get to it within 7 days. I'll post back here with where I'm at with it.
