Potentially doing more harm than good by increasing bytes written? #173

Open
kwinz opened this issue Sep 19, 2021 · 3 comments
kwinz commented Sep 19, 2021

log2ram is supposed to be useful for e.g. protecting the SD card of a Raspberry Pi from wearing out.
Normally the log is synced to the SD card continuously, causing write amplification, because the SD card has to read and write back a whole block for each small update. log2ram solves this by writing the logs to a tmpfs and persisting them to the SD card only periodically, e.g. once a day or once a week. So far so great!

But by default Raspbian is configured to rotate logs. Usually this is harmless:
every logfile (daemon, debug, dpkg, kern, syslog, user, ...) and every version of it is just renamed with an incremented number, and by default version 2 and up are compressed.
E.g. syslog -> syslog.1, ..., syslog.3.gz -> syslog.4.gz, ...
Renames don't cause excessive writes; they are basically free.

But as far as I know, log2ram uses rsync to copy those changes to the SD card. link to source
I think rsync is smart enough not to touch files that are already present and identical, but it has no way of tracking renames. source
The previously free renames get turned into expensive rewrites of all the log files on every rotation! It will always rewrite syslog.1, syslog.2.gz, syslog.3.gz, syslog.4.gz and so on up to syslog.7.gz on every log rotation. By default that turns 2 writes (once uncompressed and once compressed) into 8 writes (twice uncompressed and 6 times compressed).

According to this, rsync has a flag -y, --fuzzy that tells rsync to look for a basis file for any destination file that is missing. But I am unsure whether that just reduces the data transferred from source to destination or whether it would actually reduce writes (doubtful).

Is this correct? If so, could you please warn about this effect prominently in the README.md and suggest how to work around it? I know from issue #65 that log2ram is a KISS project and wants to minimize its impact on other OS components. But if the above is a real issue, please provide at least a simple workaround or a sample /etc/logrotate config in the README.md, since the problem affects everybody with a default install.
If this isn't actually an issue, please explain in the documentation why it isn't.

Other possibly related issues:
#64, #65 (old log), #92 (fuzzy param causing issues)
I also found #45 and #81.

Owner

azlux commented Oct 12, 2021

Hi,
I appreciate the work you have done searching for information here.

Choosing good parameters has been a long discussion with the rsync developers. We debated --inplace versus --fuzzy: two options that don't do the same work but cannot work well together.
As you have seen, --fuzzy can cause many issues with new files.

I just saw an rsync trick on the link you provided where people run rsync several times; that could be an improvement.

To reassure you about the potential harm: log2ram may write a full file every time, but it does so only once a day. That is still better than writing small changes minute after minute, because the SD card opens a whole segment block on every write. That's my own observation.

Regarding your concern, the workaround for you is to disable the systemd timer job. That way, logs will be written only on shutdown/reboot.
To be transparent with you: I use log2ram on all my devices, but I haven't improved it in a long time, since I consider that it works fine. I'm quite sure there are improvements to be made; you've pointed one out here.
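A sketch of that workaround (the unit name log2ram-daily.timer is an assumption based on the packaging; verify it on your system first):

```shell
# Check which log2ram timers exist on this system:
systemctl list-timers | grep log2ram
# Stop the periodic RAM-to-disk sync; the log2ram service's stop action
# still persists logs at shutdown/reboot.
sudo systemctl disable --now log2ram-daily.timer
```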

Best regards,
Azlux

@twojstaryzdomu

log2ram-daily.service runs once a day, not really a big concern. How many writes are you expecting to save?

To support renames, one could run md5 checksums on source & target files, match them up and rename them before rsync runs, as follows:

# join <(find /var/log -maxdepth 1 -type f -exec md5sum \{\} \; | sort ) <(find /var/hdd.log -maxdepth 1 -type f -exec md5sum \{\} \; | sort )
0efb70e261500beeb7384858ff8203c1 /var/log/wtmp.1 /var/hdd.log/wtmp.1
237be6c96f54e6f4eb97469936cbe94a /var/log/alternatives.log /var/hdd.log/alternatives.log
32039827158e13fff19201d6571b938e /var/log/log2ram.log.7.gz /var/hdd.log/log2ram.log.6.gz
3eb732e864756efdb66dde6738f587f7 /var/log/messages.1 /var/hdd.log/messages
535e7a5baa38ee54c45595baf1eaecdf /var/log/log2ram.log.5.gz /var/hdd.log/log2ram.log.4.gz
65452d110561a3f75517dd100c48d5f7 /var/log/debug.1 /var/hdd.log/debug
675b2f8c2707bb4ed86852e7e25ed410 /var/log/log2ram.log.3.gz /var/hdd.log/log2ram.log.2.gz
aee61bc37790a1a0f7f89d958ec91ccc /var/log/kern.log.1 /var/hdd.log/kern.log
aee61bc37790a1a0f7f89d958ec91ccc /var/log/kern.log.1 /var/hdd.log/syslog
aee61bc37790a1a0f7f89d958ec91ccc /var/log/syslog.1 /var/hdd.log/kern.log
aee61bc37790a1a0f7f89d958ec91ccc /var/log/syslog.1 /var/hdd.log/syslog
c33e1bcb601de863cf1f08423f46bc92 /var/log/log2ram.log.4.gz /var/hdd.log/log2ram.log.3.gz
dde1a82969480a2c7e8f45e046f3df22 /var/log/log2ram.log.6.gz /var/hdd.log/log2ram.log.5.gz
e4868925f2d32b0dd404826aa6a53b27 /var/log/dpkg.log /var/hdd.log/dpkg.log

Do note the duplicates, which will result in unnecessary renames (kern.log & syslog). Getting around that would require additional code and rigorous testing.

Something that could work would be the following code:

# join <(find /var/log -maxdepth 1 -type f -exec md5sum \{\} \; | sort ) <(find /var/hdd.log -maxdepth 1 -type f -exec md5sum \{\} \; | sort )  | cut -f2,3 -d' ' | while read a b; do echo mv ${a/\log/hdd.log} ${b}.temp; done; find /var/hdd.log -name '*.temp' | while read f; do echo mv ${f%.temp} ${f}; done
mv /var/hdd.log/wtmp.1 /var/hdd.log/wtmp.1.temp
mv /var/hdd.log/alternatives.log /var/hdd.log/alternatives.log.temp
mv /var/hdd.log/log2ram.log.7.gz /var/hdd.log/log2ram.log.6.gz.temp
mv /var/hdd.log/messages.1 /var/hdd.log/messages.temp
mv /var/hdd.log/log2ram.log.5.gz /var/hdd.log/log2ram.log.4.gz.temp
mv /var/hdd.log/debug.1 /var/hdd.log/debug.temp
mv /var/hdd.log/log2ram.log.3.gz /var/hdd.log/log2ram.log.2.gz.temp
mv /var/hdd.log/kern.log.1 /var/hdd.log/kern.log.temp
mv /var/hdd.log/kern.log.1 /var/hdd.log/syslog.temp
mv /var/hdd.log/syslog.1 /var/hdd.log/kern.log.temp
mv /var/hdd.log/syslog.1 /var/hdd.log/syslog.temp
mv /var/hdd.log/log2ram.log.4.gz /var/hdd.log/log2ram.log.3.gz.temp
mv /var/hdd.log/log2ram.log.6.gz /var/hdd.log/log2ram.log.5.gz.temp
mv /var/hdd.log/dpkg.log /var/hdd.log/dpkg.log.temp

Is it worth it? It would create all sorts of issues demanding testing to ensure no data loss occurs. For example, you could lose POSIX attributes if two or more files with the same md5sum are matched up out of order (like kern.log & syslog in the example above) when they differ only in permissions or owner.

If your logs are large, it will run for a long time. Worse, if a log is being written to and md5sum gets stuck on it, it will run forever. Getting around that in a shell script is no trivial matter.
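Building on the two-phase .temp idea above, here is a hedged sketch (the function name `rename_matches` and the duplicate-checksum filter are my additions, not log2ram's code). It uses GNU `uniq -u -w32` to drop any checksum that appears more than once on a side, which sidesteps the kern.log/syslog ambiguity at the cost of leaving such files for rsync to handle:

```shell
# Sketch only: rename files on the disk side to match identical-content
# files on the RAM side, so rsync sees rotated files as already present.
rename_matches() {
    local src=$1 dst=$2 sum srcname dstname f
    # Join "md5 name" lists by checksum; uniq -u -w32 discards checksums
    # appearing more than once on a side (e.g. identical kern.log/syslog).
    join <(cd "$src" && find . -maxdepth 1 -type f -exec md5sum {} \; | sort | uniq -u -w32) \
         <(cd "$dst" && find . -maxdepth 1 -type f -exec md5sum {} \; | sort | uniq -u -w32) |
    while read -r sum srcname dstname; do
        [ "$srcname" = "$dstname" ] && continue
        # Phase 1: park under a .tmp name so rotation chains
        # (syslog.1 -> syslog.2 -> ...) cannot clobber each other.
        mv "$dst/$dstname" "$dst/$srcname.tmp"
    done
    # Phase 2: drop the .tmp suffix.
    for f in "$dst"/*.tmp; do
        [ -e "$f" ] && mv "$f" "${f%.tmp}"
    done
}
```

Checksumming large logs is still slow; comparing file sizes first, or restricting this to the *.gz rotations, would cut the cost.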

Limaa added a commit to Limaa/log2ram that referenced this issue Feb 8, 2022
- Adds support for missing options in the /etc/log2ram.conf (USE_RSYNC and MAIL)
- apt-key command has been deprecated, as mentioned by issue azlux#173. Changing it to use a keyring in /usr/share/keyrings

thomas725 commented Oct 21, 2022

Thank you for publishing your findings!

I guess that since I'm using btrfs with compression enabled I wouldn't need the compression part of logrotate, but I think logrotate is also the thing that takes care of deleting old logs, is that correct?

Do you have any suggestions on how to work around this problem? Could we maybe get logrotate to use timestamp strings instead of an incrementing number that renames all old logs on every run?

UPDATE: I've checked the file /etc/logrotate.conf on my system and found that it already documents the dateext option, which should prevent the repeated rewriting of whole logfiles on every run of logrotate. To force it to take effect, I'll put it after the include /etc/logrotate.d line to override contrary settings from specific applications. I've also put nocompress there so logs aren't written three times (on creation, on renaming with the date addition, and a third time when delaycompress is set, as many apps do).
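A minimal sketch of that override (untested; note that directives inside the per-application files under /etc/logrotate.d may still take precedence over these globals for their own logs, so verify with a dry run, `logrotate -d /etc/logrotate.conf`):

```conf
# /etc/logrotate.conf (excerpt)
weekly
rotate 4

include /etc/logrotate.d

# placed after the include, as suggested above:
dateext        # rotate syslog -> syslog-20221021 instead of syslog.1
nocompress     # skip the extra compressed rewrite
```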
