Corrupted file not detected by cvmfs_server check -a #3553

Open
DrDaveD opened this issue Mar 25, 2024 · 5 comments

DrDaveD (Contributor) commented Mar 25, 2024

I found another case of doubled data on the Nebraska primary stratum 1, in the sft.cern.ch repository. The file was from 2021, so the corruption happened prior to the fix in #2991. The weirdest thing is that cvmfs_server check -a did not detect it. The last time the check was completed on sft.cern.ch was March 10, 2024. Oh, I see that there's a check option -i that says "check data integrity" and I have not been using that. Does it not even check the file sizes without -i? What does it check by default? It takes a terribly long time to check even without -i; I wonder how long it will take with it.

The corruption was also slightly different from before, but probably not different enough to be useful. This time there were two doubled 4096-byte blocks: bytes 4*4096 (16384) through 6*4096 (24576) were copied into bytes 6*4096 (24576) through 8*4096 (32768).
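For anyone poking at the attached files, here is a minimal sketch (not part of cvmfs; the 4096-byte block size and the "adjacent copy" pattern are taken from the description above) of scanning a file for runs of blocks that are an exact copy of the blocks immediately before them. Legitimate files with genuinely repeated content (e.g., runs of zeros) will also match, so this is only a heuristic.

```python
import sys

BLOCK = 4096

def find_doubled_blocks(path, run_length=2):
    """Return byte offsets where `run_length` consecutive 4096-byte blocks
    are immediately repeated, the corruption pattern described above."""
    with open(path, "rb") as f:
        data = f.read()
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    hits = []
    for i in range(len(blocks) - 2 * run_length + 1):
        if blocks[i:i + run_length] == blocks[i + run_length:i + 2 * run_length]:
            hits.append(i * BLOCK)
    return hits

if __name__ == "__main__":
    for offset in find_doubled_blocks(sys.argv[1]):
        print(f"possible doubled region starting at byte offset {offset}")
```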

Just in case it's helpful for testing a change to check, here is a zip file containing the good and corrupted forms of the file. The files were in the /srv/cvmfs/sft.cern.ch/data/1d subdirectory on the backup and primary stratum 1s, respectively.

jblomer (Member) commented Apr 4, 2024

Without -i, the check only looks at the catalogs and checks the meta-data for consistency (e.g., valid file-system structure, correct accounting in the summary data, etc.). I think we could have another option to check the file sizes. This will be significantly more expensive because we need to stat (or HEAD) all referenced objects, but it will still be short of a full hash verification.

jblomer self-assigned this Apr 4, 2024

DrDaveD (Contributor, Author) commented Apr 4, 2024

By default, without the -c option, it already checks for the existence of data "chunks" using HEAD requests. I found that out with strace. Doesn't HEAD return the file size? The sizes might as well be checked at the same time.

I don't have exact timings on the -i integrity check, but it appears to add far less time than the regular check. The -i option translates to running cvmfs_swissknife scrub before cvmfs_swissknife check. On the machine where I have enabled integrity checks, a cvmfs_swissknife check on lhcb.cern.ch is currently running; it started at 21:03 two days ago (it is now 11:35), and according to the log the integrity check started at 01:41 that morning. So the integrity check took less than 19.5 hours, while the regular check has so far been running for over 36.5 hours. The logs only go back a month, so I can't check exactly how long the previous run took. However, if I sort all the "last_check" times in .cvmfs_status.json on the identical sister machine, it appears that the last check for lhcb.cern.ch there ran from Feb 27 11:42 to Mar 6 05:03, so more than 7.5 days. This was with cvmfs-server-2.10.1.
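A minimal sketch of that kind of listing (not the exact method used above; it assumes each repository's .cvmfs_status.json lives under /srv/cvmfs/<repo>/ and is a flat JSON object with a "last_check" field, and the sort is only chronological if the timestamps are in a lexicographically sortable format):

```python
import glob
import json
import os

def last_checks(base="/srv/cvmfs"):
    """Collect (last_check, repo) pairs from each repository's status file."""
    results = []
    for path in glob.glob(os.path.join(base, "*", ".cvmfs_status.json")):
        repo = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            status = json.load(f)
        if "last_check" in status:
            results.append((status["last_check"], repo))
    return sorted(results)

if __name__ == "__main__":
    for when, repo in last_checks():
        print(when, repo)
```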

DrDaveD (Contributor, Author) commented Apr 4, 2024

On the other hand, if -i is used, is there any need to also check the "chunks"? Maybe the -i option should imply the -c option.

DrDaveD (Contributor, Author) commented Apr 5, 2024

And it's the lack of -c that was taking most of the time! I switched to cvmfs_server check -aic, and the lhcb.cern.ch integrity and regular check took just slightly over 24 hours instead of 7.5 days.

DrDaveD (Contributor, Author) commented Apr 16, 2024

Jakob and I discussed this, and he said that there is still value in doing the HEAD requests (i.e., without -c), but at the same time they should check the Content-Length header to verify the size. The scrub will not detect missing files, so although it's good to have -i, -c should not be used if you want to find all classes of errors.
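A minimal sketch of that idea, not the cvmfs implementation: issue a HEAD request per referenced object and compare Content-Length with the size recorded in the catalog. The data/<two hex chars>/<rest of hash> URL layout is assumed here based on the data/1d subdirectory mentioned earlier, and the hash and size in the usage comment are made up. This keeps the cost at one HEAD per object rather than downloading and hashing everything as the scrub does.

```python
from urllib.request import Request, urlopen

def head_size_ok(base_url, object_hash, expected_size):
    """Check that an object exists and its Content-Length matches the
    size recorded in the catalog."""
    # Object path layout is an assumption for illustration.
    url = f"{base_url}/data/{object_hash[:2]}/{object_hash[2:]}"
    req = Request(url, method="HEAD")
    with urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    return length is not None and int(length) == expected_size

# Hypothetical usage (server name, hash, and size are illustrative):
# head_size_ok("http://stratum1.example.org/cvmfs/sft.cern.ch",
#              "1d3fe0c1deadbeef...", 24576)
```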
