-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for hash chaining to detect modifications in postings #2300
base: master
Are you sure you want to change the base?
Conversation
The following details of a posting contribute to its hash: fullname of account string representation of amount Each posting hashes contributes to the transaction hash, which is compromised of: previous transaction’s hash (as encountered in parsing order) actual date optional auxiliary date optional code payee hashes of all postings Note that this means that changes in the “code” or any of the comments
At the moment only "sha512" or "SHA512" is accepted, but this could extend to more algorithms in the future.
Also, support matching provided hashes against a prefixed of the generated hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jwiegley, reviewing as requested - no C++ code review, just some high level thoughts:
The following details of a posting contribute to its hash:
fullname of account
string representation of amountEach posting hashes contributes to the transaction hash, which is compromised
of:previous transaction’s hash (as encountered in parsing order)
actual date
optional auxiliary date
optional code
payee
hashes of all postingsNote that this means that changes in the “code” or any of the comments
Maybe the above details should appear in docs also ? Apologies if I missed it.
Posting's "string representation of amount" - that's the representation the journal file, I assume (not what print or reg would show).
--hashes option requires an argument to specify the algorithm
At the moment only "sha512" or "SHA512" is accepted, but this could extend to
more algorithms in the future.
Overall comments:
Cool feature!
As we discussed in chat, one obvious user benefit it promises is being able to warn when any past entries have changed in the journal files. VCS users can already detect this before commit, but this does not require a VCS and would be available to every Ledger user without setup. VCS users who don't check the diff before committing might find it helpful to avoid accidentally committing fat-finger edits, eg.
I think users fairly often want to clean up small mistakes, whitespace, or even make bigger cleanups to old files and entries. And the commit messages above make me think this will be very sensitive - to any edits, to changes in hashing algorithm, to any rearrangement of included files or to different order of file arguments on the command line (because of "in parsing order"). So it's my guess users of this will quite often need to regenerate hashes for all of their data. Maybe that's not a problem, I'm not totally clear on the workflow. I imagine it would mean replacing at least some explicit Hash metadata values (tags) in journal entries in all old files (and committing those changes in VCS).
It seems to me to be a prototype that will need field testing and tweaking to find its best design and usage patterns. Possibly it's worth signalling this status to users by mentioning "Experimental" in descriptions.
As mentioned in chat Tackler has some similar-ish features described at https://tackler.e257.fi/docs/auditing - perhaps not this exactly, but there might be some interesting related ideas there.
Hope this helps! I appreciate this exploration and will follow with interest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this bring ledger closer to being a triple-entry accounting system? 😃
I left some comments from first glance below, and will take a closer look when trying out the proposed changes on my machine.
Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>
Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>
Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>
Thanks for the context on sha512_256, @jwiegley. I did a bit of research into libraries supporting SHA-512/256, e.g. libtomcrypt, Botan-3, pycryptodome. These libraries support it in a different manner than this PR suggests in the sense, that a truncated hash "is not equivalent to simply truncating the output digest" (pycryptodome). Possibly this is "for users to be able to distinguish between a SHA-512 digest which has been truncated and a SHA512/256 digest, [offering] new initialization constants, analogous to those used in SHA-384." — https://eprint.iacr.org/2010/548.pdf I took the liberty to hack on a little Nix Flake to get a feel for the API of the different implementations and evaluate the feasibility of base64 encoding. The Flake uses LibreSSL, Botan-3, and tomcrypt to compute the SHA-512 and SHA-512/256 hashes for the given arguments and print the hashes as a hex and a base64 encoded string. What are your thoughts on using one of the aforementioned libraries or another third-party implementation? nix build https://projects.surryhill.net/ledger/sha512-test-0.1.3.tgz
./result/bin/sha512-test 'Heureka!'
input = Heureka!
LibreSSL
sha512 = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518f366ee9e2ba086346bdcc0561a522db4b3bdebc53483199f58ac7139531ded7c
sha512/evp = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
sha512_256 = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518
sha512_256/evp = N/A
Botan3
SHA-512 = 83BCF89B75E21AB7D9FE332A6F82CA4D1E94EC587CEC1E137D50087FCC6B7518F366EE9E2BA086346BDCC0561A522DB4B3BDEBC53483199F58AC7139531DED7C
SHA-512.b64 = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
SHA-512/256 = 5EABC68E077C6338A305D388F20AE3A04200F6D942164FFBFD659E345C39D0A7
SHA-512/256.b64 = XqvGjgd8YzijBdOI8grjoEIA9tlCFk/7/WWeNFw50Kc=
tomcrypt
SHA-512 = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518
SHA-512.b64 = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
SHA-512/256 = 5eabc68e077c6338a305d388f20ae3a04200f6d942164ffbfd659e345c39d0a7
SHA-512/256.b64 = XqvGjgd8YzijBdOI8grjoEIA9tlCFk/7/WWeNFw50Kc= If you'd like to inspect the small utility closer have a look at
|
Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwiegley I see several changes to the newly added src/sha512.cc
file. I'd prefer to treat it as third-part code that is integrated into ledger verbatim (just like utfcpp) or even better replace it with a third-party library (see my previous comment) that provides calculation of SHA-512 and SHA-512/256 ideally with support for base64 encoding.
What are your thoughts?
I can reduce the number of changes down to just |
@afh I've reverted all of my changes to the SHA512 code. What I would like to understand now is why the flake build here on GitHub fails, when it succeeds just fine on my machine, using either |
34b55d7
to
8d43d4c
Compare
I was able to reproduce the build failure in a Linux VM, and found that all we're missing are two standard system headers. |
I'm not really excited about new dependencies, they come with so many other costs (maintenance, licensing, keeping up-to-date, etc). This is a stable, simple algorithm, and we can crack the can on using a 3rd party library if it becomes a popular feature and people end up wanting other algorithms besides the default ones offered. |
I'm current awaiting a close review of |
I have this on my agenda, yet it'll likely take me until after the holidays and probably New Year's before I get to this. 🎄🎇 |
The following details of a posting contribute to its hash:
Each posting hashes contributes to the transaction hash, which is compromised of:
Note that this means that changes in the “code” or any of the comments