Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add binary representation of hash-size #399

Merged
merged 3 commits into from
Jun 8, 2019
Merged

Conversation

afbjorklund
Copy link
Contributor

@afbjorklund afbjorklund commented Apr 28, 2019

Fixed size efficient binary representation data, instead of the variable size hexadecimal string:

typedef uint32_t binary[5]; // 20 bytes: 16 for hash + 4 for size
void format_hash_as_binary(binary result, const unsigned char *hash, int size);

This is suitable for machine use, while the other is more suitable for human readability as string:

char *format_hash_as_string(const unsigned char *hash, int size);

As discussed in #218

Can be used as key, when not needed as filename. Data structure is aligned and avoids allocation.

@jrosdahl
Copy link
Member

Sorry, I don't quite understand. What will this be used for?

@afbjorklund
Copy link
Contributor Author

It was just trying to show clearly (in code), what was mentioned in passing in the comments.

Also split it up into smaller patches for delivery, and found the alignment issue by accident...

@jrosdahl
Copy link
Member

jrosdahl commented May 7, 2019

Still not sure if I understand. Do you want me to merge this?

the alignment issue by accident...

Which issue is that?

@afbjorklund
Copy link
Contributor Author

Still not sure if I understand. Do you want me to merge this?

It can wait, there is nothing in the current code that is using it.

It's going to be more interesting when the storage API is up

the alignment issue by accident...

Which issue is that?

Clang complained, when writing an integer to a byte array...

So changed the definition of the type, from 20 x 1 to 5 x 4 :

-typedef uint8_t binary[20];
+typedef uint32_t binary[5];

@jrosdahl jrosdahl added the work in progress Do not merge label May 11, 2019
@afbjorklund
Copy link
Contributor Author

afbjorklund commented May 11, 2019

As per discussion in #404, one interesting variation could be an improved text representation...
Current format (of the binary hash) is mostly intended for debugging purposes, like for test/debug.

If we do still need a string format, for instance for a file-based storage, we could try something new.
One approach used for other projects, when going to larger checksums (avoids line breaks) is base32.

This extends the 0-9a-f (16 digits) used for normal hexadecimal, to use 0-9A-V (32 digits) "hex"

So instead of 31d6cfe0d16ae931b73c59d7e0c089c000000000 (40 characters)
we can write it as 67BCVO6HDBKJ3DPSB7BU1G49O0000000 (only 32 characters)

The main reason to use plain old hexadecimal is for portability and accessability. But if not needed ?

@jrosdahl
Copy link
Member

If we do still need a string format, for instance for a file-based storage, we could try something new.
[...] The main reason to use plain old hexadecimal is for portability [...] But if not needed ?

Did you mean (backward) compatibility with "portability"? If so: Before we can discuss a replacement of the text representation, I think that we first need to come to a consensus regarding whether it is OK to break compatibility with old versions or not since changing the default backend's first-level storage directory names breaks compatibility. My current view on this is described in #416 (comment).

@afbjorklund
Copy link
Contributor Author

[...] The main reason to use plain old hexadecimal is for portability [...] But if not needed ?

Did you mean (backward) compatibility with "portability"?

Not necessarily.

I meant that hexadecimal is available everywhere, like in hexdump or in *sum commands or openssl.
But if one chooses a custom representation, then you also need a custom tool to work with it as well.

Then again the binary keys (i.e. full 8-bit range) aren't really easy to work with (without a tool) either.
I still think they should be the main entry point to the storage API, rather than using readable strings.

Will comment in #416

@afbjorklund
Copy link
Contributor Author

I removed the base32hex function, it's not really needed if the keys are indeed binary.

Fixed size efficient binary representation data,
instead of the variable size hexadecimal string.

This is suitable for machine use, while the other
is more suitable for human readability as string.
Make sure that output is NUL-terminated
jrosdahl added a commit to jrosdahl/ccache that referenced this pull request Jun 8, 2019
@jrosdahl jrosdahl added the improvement Improvement that is not a bug fix or new feature label Jun 8, 2019
@jrosdahl jrosdahl added this to the 3.8 milestone Jun 8, 2019
@jrosdahl jrosdahl merged commit 2b568ba into ccache:master Jun 8, 2019
@jrosdahl
Copy link
Member

jrosdahl commented Jun 8, 2019

Thanks!

llunak pushed a commit to llunak/ccache that referenced this pull request Nov 2, 2019
* Add binary representation of hash-size

Fixed size efficient binary representation data,
instead of the variable size hexadecimal string.

This is suitable for machine use, while the other
is more suitable for human readability as string.

* Promote hex format as a global function

* Add unittest for the format_hex function

Make sure that output is NUL-terminated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement that is not a bug fix or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants