Add support for hash chaining to detect modifications in postings #2300

jwiegley · 2023-11-23T00:51:04Z

The following details of a posting contribute to its hash:

fullname of account
string representation of amount

Each posting's hash contributes to the transaction hash, which is composed of:

previous transaction’s hash (as encountered in parsing order)
actual date
optional auxiliary date
optional code
payee
hashes of all postings

Note that this means that changes in the “code” or any of the comments
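Here is a minimal C++ sketch of the chaining described above. It is not the PR's xact_t::hash. The Posting/Xact structs, the newline field separator and the digest() helper are hypothetical. digest() is a 64-bit FNV-1a stand-in so the sketch needs no crypto library; the PR uses SHA-512. Sorting the posting hashes is one assumed way to get the posting-order independence described by a later commit in this PR ("Make xact hashes independent of posting order").

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Stand-in digest (64-bit FNV-1a, hex-encoded) so the sketch needs no crypto
// library. It is NOT cryptographic; the PR uses SHA-512 here.
std::string digest(const std::string& data) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
    }
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx", static_cast<unsigned long long>(h));
    return buf;
}

// Hypothetical, simplified stand-ins for ledger's post_t and xact_t.
struct Posting {
    std::string account_fullname;  // e.g. "Expenses:Food"
    std::string amount_text;       // amount as written, e.g. "$10.00"
};

struct Xact {
    std::string date;                     // actual date
    std::optional<std::string> aux_date;  // optional auxiliary date
    std::optional<std::string> code;      // optional code
    std::string payee;
    std::vector<Posting> postings;
};

// Posting hash: full account name plus the amount's string representation.
std::string posting_hash(const Posting& p) {
    return digest(p.account_fullname + '\n' + p.amount_text);
}

// Transaction hash: previous hash, dates, code, payee, posting hashes.
// Sorting the posting hashes makes the result independent of posting order.
// Absent optional fields are encoded as empty strings (an assumption).
std::string xact_hash(const std::string& previous, const Xact& x) {
    std::vector<std::string> posts;
    for (const Posting& p : x.postings) posts.push_back(posting_hash(p));
    std::sort(posts.begin(), posts.end());

    std::string data = previous + '\n' + x.date + '\n' + x.aux_date.value_or("") +
                       '\n' + x.code.value_or("") + '\n' + x.payee;
    for (const std::string& h : posts) data += '\n' + h;
    return digest(data);
}

// Chain transactions in parsing order; the first has an empty predecessor.
std::vector<std::string> chain(const std::vector<Xact>& xacts) {
    std::vector<std::string> hashes;
    std::string previous;
    for (const Xact& x : xacts) {
        previous = xact_hash(previous, x);
        hashes.push_back(previous);
    }
    return hashes;
}
```

Each transaction folds in its predecessor's hash. Editing one entry therefore changes its hash and every hash after it. Reordering transactions, for example by passing files in a different order, changes the chain as well.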


At the moment only "sha512" or "SHA512" is accepted, but this could extend to more algorithms in the future.
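The following sketch validates the --hashes argument as described. It accepts exactly the two spellings "sha512" and "SHA512". The HashType enum and parse_hash_type() are hypothetical names, not the PR's code.

```cpp
#include <optional>
#include <string>

// Hypothetical enum; more algorithms could be added here later.
enum class HashType { Sha512 };

// Accepts exactly the spellings named in the commit message; anything else
// yields std::nullopt so the caller can report an error.
std::optional<HashType> parse_hash_type(const std::string& arg) {
    if (arg == "sha512" || arg == "SHA512")
        return HashType::Sha512;
    return std::nullopt;
}
```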

Also, support matching provided hashes against a prefix of the generated hash.
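The sketch below assumes hashes are compared as hex strings. hash_matches() is a hypothetical helper. Rejecting an empty value (which would otherwise match everything) and comparing case-sensitively are also assumptions of this sketch.

```cpp
#include <string>

// True if the (possibly abbreviated) hash recorded in the journal is a
// prefix of the freshly generated hash.
bool hash_matches(const std::string& provided, const std::string& generated) {
    return !provided.empty() &&
           provided.size() <= generated.size() &&
           generated.compare(0, provided.size(), provided) == 0;
}
```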

simonmichael

Hi @jwiegley, reviewing as requested. No C++ code review, just some high-level thoughts:

The following details of a posting contribute to its hash:

fullname of account
string representation of amount

Each posting's hash contributes to the transaction hash, which is composed of:

previous transaction’s hash (as encountered in parsing order)
actual date
optional auxiliary date
optional code
payee
hashes of all postings

Note that this means that changes in the “code” or any of the comments

Maybe the above details should also appear in the docs? Apologies if I missed it.

Posting's "string representation of amount": that's the representation in the journal file, I assume (not what print or reg would show).

--hashes option requires an argument to specify the algorithm
At the moment only "sha512" or "SHA512" is accepted, but this could extend to more algorithms in the future.

Overall comments:

Cool feature!

As we discussed in chat, one obvious user benefit it promises is being able to warn when any past entries in the journal files have changed. VCS users can already detect this before committing, but this does not require a VCS and would be available to every Ledger user without setup. VCS users who don't check the diff before committing might also find it helpful for avoiding accidentally committed fat-finger edits, for example.

I think users fairly often want to clean up small mistakes or whitespace, or even make bigger cleanups to old files and entries. The commit messages above make me think this will be very sensitive: to any edits, to changes in the hashing algorithm, to any rearrangement of included files, and to a different order of file arguments on the command line (because of "in parsing order"). So my guess is that users of this will quite often need to regenerate hashes for all of their data. Maybe that's not a problem; I'm not totally clear on the workflow. I imagine it would mean replacing at least some explicit Hash metadata values (tags) in journal entries in all old files (and committing those changes in VCS).

It seems to me to be a prototype that will need field testing and tweaking to find its best design and usage patterns. Possibly it's worth signalling this status to users by mentioning "Experimental" in descriptions.

As mentioned in chat, Tackler has some similar-ish features described at https://tackler.e257.fi/docs/auditing. Perhaps not exactly this, but there might be some interesting related ideas there.

Hope this helps! I appreciate this exploration and will follow with interest.

afh

Does this bring ledger closer to being a triple-entry accounting system? 😃

I left some comments from first glance below, and will take a closer look when trying out the proposed changes on my machine.

afh · 2023-12-07T13:48:05Z

Thanks for the context on sha512_256, @jwiegley.

I did a bit of research into libraries supporting SHA-512/256, e.g. libtomcrypt, Botan 3, and pycryptodome. These libraries implement it differently from this PR, in the sense that a truncated hash "is not equivalent to simply truncating the output digest" (pycryptodome).

Possibly this is "for users to be able to distinguish between a SHA-512 digest which has been truncated and a SHA512/256 digest, [offering] new initialization constants, analogous to those used in SHA-384." — https://eprint.iacr.org/2010/548.pdf
Is it also meant to thwart length-extension attacks? (see https://news.ycombinator.com/item?id=21981874)
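To illustrate the difference, here is a compact, dependency-free SHA-512 sketch. It is a from-scratch illustration following FIPS 180-4, not the PR's src/sha512.cc, and the function names are hypothetical. SHA-512/256 runs the same compression function but from a different initial state, then keeps the first 256 bits. That state is derived by hashing the string "SHA-512/256" with SHA-512's initial value XORed word-wise with 0xa5a5a5a5a5a5a5a5. truncated_sha512() instead simply cuts the SHA-512 digest in half.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

using State = std::array<std::uint64_t, 8>;

// SHA-512 initial hash value (FIPS 180-4).
const State SHA512_IV = {0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b,
                         0xa54ff53a5f1d36f1, 0x510e527fade682d1, 0x9b05688c2b3e6c1f,
                         0x1f83d9abfb41bd6b, 0x5be0cd19137e2179};

// SHA-512 round constants (FIPS 180-4).
const std::uint64_t K[80] = {
    0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc,
    0x3956c25bf348b538, 0x59f111f1b605d019, 0x923f82a4af194f9b, 0xab1c5ed5da6d8118,
    0xd807aa98a3030242, 0x12835b0145706fbe, 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2,
    0x72be5d74f27b896f, 0x80deb1fe3b1696b1, 0x9bdc06a725c71235, 0xc19bf174cf692694,
    0xe49b69c19ef14ad2, 0xefbe4786384f25e3, 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65,
    0x2de92c6f592b0275, 0x4a7484aa6ea6e483, 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5,
    0x983e5152ee66dfab, 0xa831c66d2db43210, 0xb00327c898fb213f, 0xbf597fc7beef0ee4,
    0xc6e00bf33da88fc2, 0xd5a79147930aa725, 0x06ca6351e003826f, 0x142929670a0e6e70,
    0x27b70a8546d22ffc, 0x2e1b21385c26c926, 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df,
    0x650a73548baf63de, 0x766a0abb3c77b2a8, 0x81c2c92e47edaee6, 0x92722c851482353b,
    0xa2bfe8a14cf10364, 0xa81a664bbc423001, 0xc24b8b70d0f89791, 0xc76c51a30654be30,
    0xd192e819d6ef5218, 0xd69906245565a910, 0xf40e35855771202a, 0x106aa07032bbd1b8,
    0x19a4c116b8d2d0c8, 0x1e376c085141ab53, 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8,
    0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb, 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3,
    0x748f82ee5defb2fc, 0x78a5636f43172f60, 0x84c87814a1f0ab72, 0x8cc702081a6439ec,
    0x90befffa23631e28, 0xa4506cebde82bde9, 0xbef9a3f7b2c67915, 0xc67178f2e372532b,
    0xca273eceea26619c, 0xd186b8c721c0c207, 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178,
    0x06f067aa72176fba, 0x0a637dc5a2c898a6, 0x113f9804bef90dae, 0x1b710b35131c471b,
    0x28db77f523047d84, 0x32caab7b40c72493, 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c,
    0x4cc5d4becb3e42b6, 0x597f299cfc657e2a, 0x5fcb6fab3ad6faec, 0x6c44198c4a475817};

std::uint64_t rotr(std::uint64_t x, int n) { return (x >> n) | (x << (64 - n)); }

// Pad msg and run the SHA-512 compression function starting from state h.
State sha512_core(const std::string& msg, State h) {
    std::vector<unsigned char> m(msg.begin(), msg.end());
    const std::uint64_t bits = static_cast<std::uint64_t>(msg.size()) * 8;
    m.push_back(0x80);                                   // a single 1 bit
    while (m.size() % 128 != 112) m.push_back(0);        // zero fill
    for (int i = 0; i < 8; ++i) m.push_back(0);          // high 64 bits of length
    for (int i = 7; i >= 0; --i) m.push_back(static_cast<unsigned char>(bits >> (8 * i)));
    for (std::size_t off = 0; off < m.size(); off += 128) {
        std::uint64_t w[80];
        for (int t = 0; t < 16; ++t) {                   // big-endian message words
            w[t] = 0;
            for (int j = 0; j < 8; ++j) w[t] = (w[t] << 8) | m[off + 8 * t + j];
        }
        for (int t = 16; t < 80; ++t) {                  // message schedule
            std::uint64_t s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7);
            std::uint64_t s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6);
            w[t] = w[t - 16] + s0 + w[t - 7] + s1;
        }
        State v = h;                                     // working variables a..h
        for (int t = 0; t < 80; ++t) {
            std::uint64_t S1 = rotr(v[4], 14) ^ rotr(v[4], 18) ^ rotr(v[4], 41);
            std::uint64_t ch = (v[4] & v[5]) ^ (~v[4] & v[6]);
            std::uint64_t t1 = v[7] + S1 + ch + K[t] + w[t];
            std::uint64_t S0 = rotr(v[0], 28) ^ rotr(v[0], 34) ^ rotr(v[0], 39);
            std::uint64_t maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            for (int i = 7; i > 0; --i) v[i] = v[i - 1];  // b..h take a..g
            v[4] += t1;                                  // e = d + t1
            v[0] = t1 + S0 + maj;                        // a = t1 + t2
        }
        for (int i = 0; i < 8; ++i) h[i] += v[i];
    }
    return h;
}

// Hex-encode the first `bytes` bytes of the state, each word big-endian.
std::string to_hex(const State& h, std::size_t bytes) {
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < bytes; ++i) {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>((h[i / 8] >> (56 - 8 * (i % 8))) & 0xff));
        out += buf;
    }
    return out;
}

std::string sha512(const std::string& m) { return to_hex(sha512_core(m, SHA512_IV), 64); }
// SHA-512 cut to 32 bytes, like the LibreSSL "sha512_256" line in the flake output.
std::string truncated_sha512(const std::string& m) { return to_hex(sha512_core(m, SHA512_IV), 32); }
// Real SHA-512/256: IV from the SHA-512/t IV generation function, then truncate.
std::string sha512_256(const std::string& m) {
    State iv = SHA512_IV;
    for (auto& word : iv) word ^= 0xa5a5a5a5a5a5a5a5;
    return to_hex(sha512_core(m, sha512_core("SHA-512/256", iv)), 32);
}
```

For "Heureka!", sha512_256() should reproduce the Botan3/tomcrypt SHA-512/256 value in the output below. truncated_sha512() should instead match LibreSSL's sha512_256 line, which is plain truncation. The two differ, which is exactly the distinction pycryptodome draws.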

I took the liberty of hacking on a little Nix flake to get a feel for the APIs of the different implementations and to evaluate the feasibility of base64 encoding. The flake uses LibreSSL, Botan 3, and tomcrypt to compute the SHA-512 and SHA-512/256 hashes of the given arguments and prints each hash as a hex string and as a base64-encoded string.

What are your thoughts on using one of the aforementioned libraries or another third-party implementation?

nix build https://projects.surryhill.net/ledger/sha512-test-0.1.3.tgz
./result/bin/sha512-test 'Heureka!'
input           = Heureka!
LibreSSL
sha512          = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518f366ee9e2ba086346bdcc0561a522db4b3bdebc53483199f58ac7139531ded7c
sha512/evp      = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
sha512_256      = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518
sha512_256/evp  = N/A
Botan3
SHA-512         = 83BCF89B75E21AB7D9FE332A6F82CA4D1E94EC587CEC1E137D50087FCC6B7518F366EE9E2BA086346BDCC0561A522DB4B3BDEBC53483199F58AC7139531DED7C
SHA-512.b64     = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
SHA-512/256     = 5EABC68E077C6338A305D388F20AE3A04200F6D942164FFBFD659E345C39D0A7
SHA-512/256.b64 = XqvGjgd8YzijBdOI8grjoEIA9tlCFk/7/WWeNFw50Kc=
tomcrypt
SHA-512         = 83bcf89b75e21ab7d9fe332a6f82ca4d1e94ec587cec1e137d50087fcc6b7518
SHA-512.b64     = g7z4m3XiGrfZ/jMqb4LKTR6U7Fh87B4TfVAIf8xrdRjzZu6eK6CGNGvcwFYaUi20s73rxTSDGZ9YrHE5Ux3tfA==
SHA-512/256     = 5eabc68e077c6338a305d388f20ae3a04200f6d942164ffbfd659e345c39d0a7
SHA-512/256.b64 = XqvGjgd8YzijBdOI8grjoEIA9tlCFk/7/WWeNFw50Kc=

If you'd like to inspect the small utility more closely, have a look at main.cc below, or download and unpack the flake archive from https://projects.surryhill.net/ledger/sha512-test-0.1.3.tgz and open main.cc in your $EDITOR.

main.cc

#include <cstdlib>   // malloc, calloc, free
#include <cstring>   // memcpy
#include <string>
#include <sstream>
#include <iomanip>
#include <iostream>
#include <vector>

#include <openssl/crypto.h>
#include <openssl/sha.h>

#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>

#include <botan-3/botan/hash.h>
#include <botan-3/botan/hex.h>
#include <botan-3/botan/base64.h>

#include <tomcrypt.h>

// Convert buffer to hex string. Kudos to https://github.com/ledger/ledger/pull/2300
std::string bufferToHex(const unsigned char* buffer, std::size_t size) {
    std::ostringstream oss;
    oss << std::hex << std::setfill('0');
    for(std::size_t i = 0; i < size; ++i)
        oss << std::setw(2) << static_cast<int>(buffer[i]);
    return oss.str();
}

// Encode input as base64 using LibreSSL BIO. Kudos to https://ioncannon.net/programming/34/howto-base64-encode-with-cc-and-openssl/
char *bio_base64(const unsigned char *input, int length) {
  BIO *bmem, *b64;
  BUF_MEM *bptr;
 
  b64 = BIO_new(BIO_f_base64());
  bmem = BIO_new(BIO_s_mem());
  b64 = BIO_push(b64, bmem);
  BIO_write(b64, input, length);
  BIO_flush(b64);
  BIO_get_mem_ptr(b64, &bptr);
 
  char *buff = (char *)malloc(bptr->length);
  memcpy(buff, bptr->data, bptr->length-1);
  buff[bptr->length-1] = 0;
 
  BIO_free_all(b64);
 
  return buff;
}

// Encode input as base64 using LibreSSL EVP. Kudos to mtrw https://stackoverflow.com/a/60580965
// Returns a std::string so the caller no longer leaks the calloc'd buffer.
std::string evp_base64(const unsigned char *input, int length) {
  const auto pl = 4*((length+2)/3);
  auto output = reinterpret_cast<char *>(calloc(pl+1, 1)); // +1 for the terminating null that EVP_EncodeBlock adds on
  const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char *>(output), input, length);
  if (pl != ol) { std::cerr << "Whoops, encode predicted " << pl << " but we got " << ol << "\n"; }
  std::string result(output);
  free(output);
  return result;
}

// Encode input as base64 using tomcrypt. Kudos to https://techoverflow.net/2012/11/20/cc-base64-codec-using-libtomcrypt/
std::string encodeBase64(const char* input, const unsigned long inputSize) {
    unsigned long outlen = inputSize + (inputSize / 3.0) + 16;
    unsigned char* outbuf = new unsigned char[outlen]; //Reserve output memory
    base64_encode((unsigned char*) input, inputSize, outbuf, &outlen);
    std::string ret((char*) outbuf, outlen);
    delete[] outbuf;
    return ret;
}

int main(int argc, char*argv[]) {

  // Setup program arguments
  std::vector<std::string> args;
  if (argc > 1)
    args.assign(argv+1, argv + argc);

  // Initialize LibreSSL
  OPENSSL_init_crypto(0, NULL);

  // Initialize Botan3
  const auto bt_sha512_256 = Botan::HashFunction::create_or_throw("SHA-512-256");
  const auto bt_sha512     = Botan::HashFunction::create_or_throw("SHA-512");

  // Initialize tomcrypt
  register_all_ciphers();
  register_all_hashes();
  int tc_sha512_256 = find_hash("sha512-256");
  int tc_sha512     = find_hash("sha512");

  for (const auto& arg : args) {
    std::cout << "input           = " << arg << std::endl;

    // Compute Hash using LibreSSL
    const unsigned char* input = (const unsigned char*)arg.c_str();
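    // Use a caller-provided buffer rather than SHA512()'s non-thread-safe static one.
    unsigned char hv[SHA512_DIGEST_LENGTH];
    SHA512(input, arg.length(), hv);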
    std::cout << "LibreSSL" << std::endl
      << "sha512          = " << bufferToHex(hv, 64)
      << std::endl
      << "sha512/evp      = " << evp_base64(hv, SHA512_DIGEST_LENGTH)
      << std::endl
      << "sha512_256      = " << bufferToHex(hv, 32)
      << std::endl
      << "sha512_256/evp  = " << "N/A"
      // bio_base64 includes a new line every 64 bytes, which is impractical
      // for ledger's use-case, i.e. single line checksum.
      //<< std::endl
      //<< "sha512/bio      = " << bio_base64(hv, SHA512_DIGEST_LENGTH)
      << std::endl;

    // Compute Hash using Botan3
    bt_sha512->update(input, arg.length());
    auto bt_sha512f = bt_sha512->final();
    std::cout << "Botan3" << std::endl
      << "SHA-512         = " << Botan::hex_encode(bt_sha512f)
      << std::endl
      << "SHA-512.b64     = " << Botan::base64_encode(bt_sha512f)
      << std::endl;

    bt_sha512_256->update(input, arg.length());
    auto bt_sha512_256f = bt_sha512_256->final();
    std::cout
      << "SHA-512/256     = " << Botan::hex_encode(bt_sha512_256f)
      << std::endl
      << "SHA-512/256.b64 = " << Botan::base64_encode(bt_sha512_256f)
      << std::endl;

    // Compute Hash using tomcrypt
    // Stack buffer: nothing to free when we `continue` on error.
    unsigned char out[MAXBLOCKSIZE];
    unsigned long outl = sizeof(out);
    if (hash_memory(tc_sha512, input, arg.length(), out, &outl) != CRYPT_OK)
      continue;
    std::cout << "tomcrypt" << std::endl
      << "SHA-512         = " << bufferToHex(out, outl)  // all outl bytes, not just 32
      << std::endl
      << "SHA-512.b64     = " << encodeBase64((const char*)out, outl)
      << std::endl;

    outl = sizeof(out);  // reset the buffer capacity before reusing it
    if (hash_memory(tc_sha512_256, input, arg.length(), out, &outl) != CRYPT_OK)
      continue;
    std::cout
      << "SHA-512/256     = " << bufferToHex(out, outl)
      << std::endl
      << "SHA-512/256.b64 = " << encodeBase64((const char*)out, outl)
      << std::endl;
  }

  OPENSSL_cleanup();
  return 0;
}
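As the comment in main.cc notes, BIO base64 wraps its output across lines, which doesn't suit a single-line checksum. A single-line encoder doesn't need a library at all. (OpenSSL's EVP_EncodeBlock, used by evp_base64 above, also emits one line.) Below is a minimal sketch using the standard RFC 4648 alphabet with '=' padding. base64_encode_line() is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Single-line base64 (RFC 4648 standard alphabet, '=' padding, no line breaks).
std::string base64_encode_line(const unsigned char* data, std::size_t len) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(4 * ((len + 2) / 3));
    for (std::size_t i = 0; i < len; i += 3) {
        // Pack up to three bytes into a 24-bit group, big-endian.
        std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        if (i + 1 < len) n |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        if (i + 2 < len) n |= data[i + 2];
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += (i + 1 < len) ? alphabet[(n >> 6) & 63] : '=';
        out += (i + 2 < len) ? alphabet[n & 63] : '=';
    }
    return out;
}
```

Fed the 32-byte SHA-512/256 digest of "Heureka!", this should yield the same single-line string as the Botan3 and tomcrypt SHA-512/256.b64 lines in the output above.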

afh

@jwiegley I see several changes to the newly added src/sha512.cc file. I'd prefer to treat it as third-party code that is integrated into ledger verbatim (just like utfcpp), or, even better, to replace it with a third-party library (see my previous comment) that provides SHA-512 and SHA-512/256, ideally with support for base64 encoding.

What are your thoughts?

jwiegley · 2023-12-12T16:45:51Z

@jwiegley I see several changes to the newly added src/sha512.cc file. I'd prefer to treat it as third-party code that is integrated into ledger verbatim (just like utfcpp), or, even better, to replace it with a third-party library (see my previous comment) that provides SHA-512 and SHA-512/256, ideally with support for base64 encoding.

What are your thoughts?

I can reduce the number of changes down to just s/uint8_t/unsigned char/, which I can even do before including it elsewhere, so let me try removing all changes to it.

jwiegley · 2023-12-12T17:32:15Z

@afh I've reverted all of my changes to the SHA512 code. What I would like to understand now is why the flake build here on GitHub fails, when it succeeds just fine on my machine, using either nix build or nix develop followed by make.

jwiegley · 2023-12-12T19:01:08Z

I was able to reproduce the build failure in a Linux VM, and found that all we're missing are two standard system headers.

jwiegley · 2023-12-12T21:08:44Z

What are your thoughts on using one of the aforementioned libraries or another third-party implementation?

I'm not really excited about new dependencies; they come with so many other costs (maintenance, licensing, keeping up to date, etc.). This is a stable, simple algorithm, and we can open that can and use a 3rd-party library if this becomes a popular feature and people end up wanting algorithms beyond the default ones offered.

jwiegley · 2023-12-21T20:08:52Z

I'm currently awaiting a close review of xact_t::hash from @afh before merging this in.

afh · 2023-12-21T22:17:18Z

I have this on my agenda, yet it'll likely take me until after the holidays and probably New Year's before I get to this. 🎄🎇

jwiegley added 6 commits November 22, 2023 16:50

--hashes option requires an argument to specify the algorithm

0ddea87

At the moment only "sha512" or "SHA512" is accepted, but this could extend to more algorithms in the future.

Add positive and negative tests for the --hashes option

e2e4716

Add documentation for the --hashes option

853374b

Make xact hashes independent of posting order

55287a0

Also, support matching provided hashes against a prefix of the generated hash.

Improvements to the hashing tests

8f2d712

jwiegley self-assigned this Nov 27, 2023

jwiegley requested review from simonmichael and afh November 27, 2023 20:58

simonmichael reviewed Nov 28, 2023

doc/ledger3.texi (resolved)

test/baseline/opt-hashes-neg.test (resolved)

doc/ledger3.texi (outdated, resolved)

jwiegley added 2 commits November 27, 2023 22:44

Add further documentation on the --hashes option

0c808e0

Add support for --hashes=sha512_256 as another algorithm

28c10eb

afh reviewed Dec 6, 2023

doc/ledger.1 (outdated, resolved)

doc/ledger.1 (resolved)

doc/ledger.1 (outdated, resolved)

doc/ledger3.texi (outdated, resolved)

doc/ledger3.texi (outdated, resolved)

afh reviewed Dec 6, 2023

src/sha512.cc (resolved)

jwiegley and others added 3 commits December 6, 2023 14:11

Update doc/ledger.1

d8c341d

Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>

Update doc/ledger.1

d4eff3d

Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>

Update src/sha512.cc

845ccb5

Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>

afh reviewed Dec 7, 2023

src/sha512.cc (outdated, resolved)

afh added the enhancement New feature or request label Dec 7, 2023

afh added this to the 3.4 milestone Dec 10, 2023

jwiegley and others added 8 commits December 11, 2023 10:58

Update src/sha512.cc

ce5664e

Co-authored-by: Alexis Hildebrandt <afh@surryhill.net>

Minor doc update

b5161c9

Type signature fix

c729035

Revert a type change

0f723d2

Try something else

43a173d

Fix return type of SHA512

8b346cf

Include stdint.h in sha512.cc

82f98d7

Include sys/types.h

c5fa5fa

jwiegley requested a review from simonmichael December 12, 2023 03:26

jwiegley requested a review from afh December 12, 2023 03:26

afh reviewed Dec 12, 2023

jwiegley added 3 commits December 12, 2023 08:48

Remove most changes to sha512.cc

2499963

Change one prototype

bea3498

Revert all changes to sha512.c

8d43d4c

jwiegley force-pushed the johnw/hashes branch from 34b55d7 to 8d43d4c December 12, 2023 18:59

Add two missing system headers to sha512.cc

4514248

jwiegley requested a review from afh December 12, 2023 20:40

jwiegley added 3 commits December 13, 2023 12:47

Rename SHA-512/256 to the more appropriate SHA-512Half

f24640b

Add whitespace to xact_t::hash

cf0fadf

Another whitespace change

60010af

jwiegley added 4 commits January 4, 2024 12:35

Merge remote-tracking branch 'origin/master' into johnw/hashes

2f732b9

Expand the size of an arbitrary safety limit

a098d7f

Change an assertion into an if test

baddd0e

Merge remote-tracking branch 'origin/master' into johnw/hashes

33a70ca
