
cmd/subd: export a merkle hash of on-disk state #551

Open
masiulaniec opened this issue Jan 24, 2019 · 17 comments

masiulaniec commented Jan 24, 2019

It would be good to have a metric that holds, for example, a float32 obtained by taking a 4-byte prefix of a SHA-512 of the Merkle hash of the entire file system. Such values could be logged to time-series databases and used by monitoring systems to make sure that the managed hosts converge to the same bits. Subd already scans the file system and calculates hashes, so deriving a Merkle hash should not introduce much extra overhead.

rgooch commented Jan 24, 2019

I like the basic idea. Does it need to be a Merkle tree hash? That would require storing hashes in the directory inodes.
A more straightforward implementation might be to have a modified hasher which hashes each hash that is computed.
Note that, either way, this would only expose a hash of all the file data. Inode metadata would not be captured.

@masiulaniec

Agreed on all counts. I just wanted to put the basic idea in your head. I think metadata ought to be part of the hash.

rgooch commented Jan 24, 2019

Metadata will be more complicated, but I agree that it's the kind of thing you'd want. Perhaps mtime data should not be included, though?

@masiulaniec

I can see wanting to exclude mtime for computed files.

rgooch commented Jan 25, 2019

What about mtime for regular files?

@masiulaniec

For regular files, mtime is image-defined and enforced just like any other attribute. I would include it.

rgooch commented Jan 25, 2019

If mtimes for regular files are included in the hash, they will also be included for computed files, because as far as the sub is concerned, they are just regular files. It's only the Dominator that knows that they are computed files.

@masiulaniec

Ack. So I don't see a reason to exclude mtime from the hash. We plan to do horizontal checks (host vs. host) and vertical checks (host vs. image).

rgooch commented Jan 28, 2019

The mtime difference for computed files will make that difficult.

@masiulaniec

The computed files will all have equal mtime thanks to os.Chtimes, no?

rgooch commented Jan 30, 2019

The mtime for computed files is taken from the current time when the Dominator sees that the computed file contents need to be changed. So, in practice, every sub is going to have a different mtime for a particular computed file. There is no horizontal consistency.

@masiulaniec

I can see two options: a) set the mtime anyway (I realize this could confuse tools such as rsync), b) present the hasher with zero mtime for computed files.

@masiulaniec

I understand option b) would require the Dominator to start revealing to subd that certain files are computed, a classification detail that is currently beautifully hidden.

masiulaniec commented Jan 30, 2019

Your suggestion of excluding mtime from hash computation sounds pragmatic: it would allow the feature to be implemented without expanding interfaces but would not preclude including mtime later if a clean design is found.

rgooch commented Jan 31, 2019

Yes, excluding mtime from the hash seems the best for now. I'm reluctant to complicate subd unless it's essential.

@masiulaniec

Alternatively, the metric could be emitted at the level of the dominator server where the distinction between regular and computed files can still be made.

rgooch commented Apr 13, 2019

Hm. Maybe we should take a step back and look at the problem you're trying to solve? Do you want to ensure that all machines converge to the required state, and have alerting for machines which do not converge (after N attempts, say)? If that's what you're looking for, then the Dominator already knows this. It's currently presented in the dashboard, and it could be exposed via metrics too.
