Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

Add crc data checksum to uv Snapshot #201

Closed
MathieuBordere opened this issue Apr 30, 2021 · 10 comments
Closed

Add crc data checksum to uv Snapshot #201

MathieuBordere opened this issue Apr 30, 2021 · 10 comments
Assignees
Labels
Feature New feature, not a bug

Comments

@MathieuBordere
Copy link
Contributor

MathieuBordere commented Apr 30, 2021

If I'm not mistaken we calculate crc checksums for the snapshot.meta and segment files, but perform no integrity checking on the snapshot data files. This should be introduced to detect disk corruptions.

@freeekanayaka
Copy link
Contributor

FWIW the rational behind that was that the application should be entirely responsible for what is contained in the snapshot data file, and possibly come up with its own way to ensure data integrity. YMMV

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Apr 30, 2021

Sometimes we receive issues where one of the snapshot files seems to cause issues, and it's difficult to pinpoint where it went wrong. If we know if what we read from disk is the same as what we received from the application it could already eliminate a potential source of error. That was a bit my reasoning, don't know if you think it makes sense.

@freeekanayaka
Copy link
Contributor

Do we have more specific data about the issues we saw? Like a copy of the data directory of the broken deployment (or at least of the snapshot file)

@MathieuBordere
Copy link
Contributor Author

Yes, we have one. Not sure if I can share though (if you're interested), will check.

@freeekanayaka
Copy link
Contributor

freeekanayaka commented Apr 30, 2021

It's okay not to share, I wouldn't have time to check it, just wondering if you have more details about what's wrong with the file, e.g. if you used a debugger to understand where SQLite fails when opening it (assuming that's what's happening).

@MathieuBordere
Copy link
Contributor Author

A dump of the db reveals errors like this:

sqlite> SELECT id, instance_id, name, creation_date, stateful, description, expiry_date  FROM instances_snapshots WHERE instance_id = 14 AND id > 2360 AND id < 2600;
2368|14|snapshot-65|2021-04-17 19:00:08.875615999+00:00|0||2021-05-17 12:00:08.874972385-07:00
2380|14|snapshot-66|2021-04-18 01:00:17.732227872+00:00|0||2021-05-17 18:00:17.731749164-07:00
Error: database disk image is malformed

and

sqlite> pragma integrity_check;
*** in database main ***
On tree page 126 cell 65: Rowid 2359 out of order 
On tree page 43 cell 106: Rowid 71835 out of order
On tree page 43 cell 97: Rowid 71291 out of order 
On tree page 43 cell 91: Rowid 70960 out of order 
On tree page 43 cell 90: Rowid 70911 out of order 
On tree page 43 cell 79: Rowid 70272 out of order 
On tree page 43 cell 76: Rowid 70097 out of order 
On tree page 43 cell 51: Rowid 63037 out of order 
On tree page 43 cell 50: Rowid 62740 out of order 
On tree page 354 cell 51: Rowid 68748 out of order
row 66 missing from index sqlite_autoindex_instances_snapshots_1
wrong # of entries in index sqlite_autoindex_instances_snapshots_1
row 66 missing from index sqlite_autoindex_storage_volumes_snapshots_2
row 66 missing from index sqlite_autoindex_storage_volumes_snapshots_1
row 93 missing from index sqlite_autoindex_storage_volumes_snapshots_2
--- continues ----
row 241 missing from index sqlite_autoindex_storage_volumes_snapshots_2 
row 243 missing from index sqlite_autoindex_storage_volumes_snapshots_2 
row 248 missing from index sqlite_autoindex_storage_volumes_snapshots_2

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Apr 30, 2021

This is when trying to load the latest snapshot, the oldest snapshot is fine and is also +- 0.7MB bigger (5.5MB vs 4.8MB, not necessarily relevant info).

@freeekanayaka
Copy link
Contributor

Interesting that a couple of rows where returned, so the file is not complete garbage. I guess the database schema is not corrupted? SELECT * FROM sqlite_master WHERE type='table' or something like that.

@MathieuBordere
Copy link
Contributor Author

Yes, that runs fine.

@MathieuBordere
Copy link
Contributor Author

Checksum has been added as part of #207.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Feature New feature, not a bug
Projects
None yet
Development

No branches or pull requests

2 participants