Rudxain/xorsum

xorsum

[Logo: XOR symbol at upper-left corner, plus-sign at bottom-right corner]

Algorithm

It uses the XOR-cipher to compute a checksum digest. Basically, it splits the data into non-overlapping chunks (padding the remainder with 0s), where each chunk's length equals the digest size, and XORs all chunks together into an output chunk (the final digest).
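The chunk-and-XOR description above can be sketched in a few lines. This is a minimal illustration, not the crate's actual code: `xor_digest` is a hypothetical helper, and it skips explicit padding because XORing with 0 leaves a byte unchanged.

```rust
/// XOR-folds `data` into a digest of `len` bytes.
/// `xor_digest` is a hypothetical helper for illustration, not the crate's API.
fn xor_digest(data: &[u8], len: usize) -> Vec<u8> {
    assert!(len > 0, "digest length must be non-zero");
    // The digest starts as all zeros.
    let mut digest = vec![0u8; len];
    // Byte `i` of the input lands at position `i % len` of its chunk.
    // Zero-padding the last chunk is implicit: XOR with 0 changes nothing.
    for (i, byte) in data.iter().enumerate() {
        digest[i % len] ^= *byte;
    }
    digest
}

fn main() {
    // "aaaa" folded into 4 bytes and into 8 bytes.
    assert_eq!(xor_digest(b"aaaa", 4), vec![0x61u8; 4]);
    assert_eq!(
        xor_digest(b"aaaa", 8),
        vec![0x61u8, 0x61, 0x61, 0x61, 0, 0, 0, 0]
    );
    // An empty input leaves the digest at all zeros.
    assert_eq!(xor_digest(b"", 4), vec![0u8; 4]);
    println!("ok");
}
```

Indexing with `i % len` is equivalent to splitting into chunks first, and it avoids allocating a padded copy of the input.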

This isn't a good hash function (HF). It lacks the Avalanche Effect, because flipping 1 input bit flips exactly 1 output bit.
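To make the missing Avalanche Effect concrete, the sketch below flips a single input bit and counts how many digest bits change. Both `xor_digest` and `bit_distance` are hypothetical helpers written for this illustration, and the input string is arbitrary.

```rust
/// XOR-folds `data` into `len` bytes (hypothetical helper, not the crate's API).
fn xor_digest(data: &[u8], len: usize) -> Vec<u8> {
    assert!(len > 0, "digest length must be non-zero");
    let mut digest = vec![0u8; len];
    for (i, byte) in data.iter().enumerate() {
        digest[i % len] ^= *byte;
    }
    digest
}

/// Counts the bits that differ between two equal-length byte slices.
fn bit_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    let original = b"hello world".to_vec();
    let mut flipped = original.clone();
    flipped[5] ^= 0b0000_0100; // flip exactly one input bit

    let distance = bit_distance(&xor_digest(&original, 8), &xor_digest(&flipped, 8));
    // A hash with good avalanche would change about half of the 64 bits.
    assert_eq!(distance, 1);
    println!("digest bits changed: {}", distance);
}
```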

Program

The raw digest size is 8 octets by default, but can be set to any valid usize value with the --length option. The printed size is 16 bytes by default, because hexadecimal encoding writes each octet as 2 ASCII characters.
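The doubling from raw size to printed size comes from the hex encoding alone. A minimal sketch, assuming lowercase hex as in the examples in this README (`to_hex` is a hypothetical helper, not the crate's API):

```rust
/// Encodes each byte as two lowercase hex characters.
/// `to_hex` is a hypothetical helper for illustration.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // An 8-octet raw digest...
    let raw = [0x61u8, 0x61, 0x61, 0x61, 0, 0, 0, 0];
    let printed = to_hex(&raw);
    // ...becomes 16 ASCII characters when printed.
    assert_eq!(printed, "6161616100000000");
    assert_eq!(printed.len(), 2 * raw.len());
    println!("{}", printed);
}
```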

Why 8B?

That was a somewhat arbitrary decision. I chose 8 because it's the geometric mean of 4 and 16, the digest sizes of CRC32 and MD5, respectively. 8B is easier to implement (in many langs) than 16B when a constant fixed size is desired, because it fits in a u64.

The I.V. is hardcoded to be 0.
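As an illustration of why a fixed 8-byte digest is convenient, the whole state can live in one `u64` accumulator that starts at the zero I.V. The sketch below is an assumption about how such a variant could look, not the crate's implementation; `xor_digest_u64` is a hypothetical name, and big-endian packing is chosen so that the hex output keeps the byte order of the input.

```rust
/// Fixed-size variant: folds `data` into a single u64.
/// `xor_digest_u64` is a hypothetical helper, not the crate's API.
fn xor_digest_u64(data: &[u8]) -> u64 {
    let mut acc: u64 = 0; // the I.V. is 0
    for chunk in data.chunks(8) {
        // Zero-pad a short final chunk to the full 8 bytes.
        let mut buf = [0u8; 8];
        buf[..chunk.len()].copy_from_slice(chunk);
        // Big-endian keeps the first input byte as the first printed byte.
        acc ^= u64::from_be_bytes(buf);
    }
    acc
}

fn main() {
    let digest = xor_digest_u64(b"aaaa");
    assert_eq!(format!("{:016x}", digest), "6161616100000000");
    println!("{:016x}", digest);
}
```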

Name and behavior heavily influenced by

Usage

To install latest release from crates.io registry:

cargo install xorsum

This isn't guaranteed to be the latest version, but it'll always compile.

To install latest dev crate from GH:

cargo install --git https://github.com/Rudxain/xorsum.git

This is the most recent ("cutting-edge") version. Compilation isn't guaranteed, semver may be broken, and --help may not reflect actual program behavior. This one has a very unstable/experimental API (especially lib.rs).

To get already-compiled non-dev executables, go to GH releases. *.elf files are only compatible with GNU/Linux x64, and *.exe files are only compatible with Windows x64. These aren't setup/installer programs; they are the same executables cargo would install, so you should run them from a terminal, not click them.

For a Llamalab Automate implementation, visit XOR hasher.

Argument "syntax":

xorsum [OPTIONS] [FILE]...

For ℹinfo about options, run:

xorsum --help

Examples

Regular use

# let's create an empty file named "a"
echo -n > a
xorsum --length 4 a
# output will be "00000000 a" (without quotes)

# write "aaaa" to this file and rehash it
echo -n aaaa > a
xorsum a -l 4
#out: "61616161 a"
# because "61" is the hex value of the UTF-8 char "a"

# same result when using stdin
echo -n aaaa | xorsum -l4
#61616161 -

xorsum a --brief #`-l 8` is implicit
#6161616100000000

Note

echo -n behaves differently depending on the OS and the echo implementation: it might still emit a line ending such as \n (LF) or \r\n (CR-LF). The outputs shown in the examples are the (usually desired) result of NOT including an EOL.

PowerShell will ignore -n because echo is an alias of Write-Output, which doesn't recognize -n. Write-Host -NoNewline can't be piped or redirected, so it's not a good alternative.

Emulating 🏔AE

--length doesn't truncate the output:

xorsum some_big_file -bl 3 #"00ff55"
xorsum some_big_file -bl 2 #"69aa" NOT "00ff"

As you can see, -l can return very different hashes from the same input. This property can be exploited to emulate the Avalanche Effect (to some extent).
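The reason a shorter digest is not a prefix of a longer one is that each length regroups the input bytes differently. A minimal sketch with a made-up 6-byte input (`xor_digest` is a hypothetical helper, not the crate's API):

```rust
/// XOR-folds `data` into `len` bytes (hypothetical helper, not the crate's API).
fn xor_digest(data: &[u8], len: usize) -> Vec<u8> {
    assert!(len > 0, "digest length must be non-zero");
    let mut digest = vec![0u8; len];
    for (i, byte) in data.iter().enumerate() {
        digest[i % len] ^= *byte;
    }
    digest
}

fn main() {
    // Arbitrary sample input, chosen only for illustration.
    let data = [0x01u8, 0x02, 0x03, 0x04, 0x05, 0x06];

    // Length 3 pairs bytes (0,3), (1,4), (2,5).
    let d3 = xor_digest(&data, 3);
    assert_eq!(d3, vec![0x05u8, 0x07, 0x05]);

    // Length 2 groups bytes (0,2,4) and (1,3,5) instead.
    let d2 = xor_digest(&data, 2);
    assert_eq!(d2, vec![0x07u8, 0x00]);

    // So the 2-byte digest is not the first 2 bytes of the 3-byte one.
    assert_ne!(d2, d3[..2].to_vec());
    println!("ok");
}
```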

Finding corrupted bytes

If you have 2 copies of a file and 1 is corrupted, you can attempt to "🔺️triangulate" the index of a corrupted byte, without manually searching the entire file. This is useful when dealing with big raw-binary files.

xorsum a b
#6c741b7863326b2c a
#6c74187863326b2c b
# the 0-based index is 2 when using `-l 8`
# mathematically, i mod 8 = 2

xorsum a b -l 3
#3d5a0a a
#3d590a b
# i mod 3 = 1

xorsum a b -l 2
#7f12 a
#7c12 b
# i mod 2 = 0

# you can repeat this process with different `-l` values to narrow down the index.
# IIRC, using primes gives you more info about the index

There are programs (like diff) that compare bytes for you, and are more efficient and user-friendly. But if you are into math puzzles, this is a good way to pass the time by solving systems of linear modular equations 🤓.
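The congruences from the example (i mod 8 = 2, i mod 3 = 1, i mod 2 = 0) can also be solved by brute force. The sketch below assumes exactly one corrupted byte and a made-up file size of 100 bytes; `candidate_indices` is a hypothetical helper. Note that the `-l 2` result adds nothing here, because 2 divides 8, while the coprime lengths 8 and 3 together pin the index down modulo 24.

```rust
/// Lists every index below `file_len` that satisfies all
/// `(digest_length, remainder)` pairs.
/// `candidate_indices` is a hypothetical helper; it brute-forces the search
/// instead of applying the Chinese Remainder Theorem directly.
fn candidate_indices(congruences: &[(u64, u64)], file_len: u64) -> Vec<u64> {
    (0..file_len)
        .filter(|i| congruences.iter().all(|&(m, r)| i % m == r))
        .collect()
}

fn main() {
    // (digest length, position of the differing digest byte) pairs
    // read off the three xorsum runs above.
    let congruences: [(u64, u64); 3] = [(8, 2), (3, 1), (2, 0)];

    // The 100-byte file size is an assumption for illustration only.
    let candidates = candidate_indices(&congruences, 100);

    // Every candidate is congruent to 10 modulo 24.
    assert_eq!(candidates, vec![10u64, 34, 58, 82]);
    println!("{:?}", candidates);
}
```

Each additional digest length that is coprime to the previous ones shrinks the candidate list further, until only one index below the file size remains.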

💭Thoughts

I was surprised I couldn't find any implementation of a checksum algorithm completely based on XOR, so I posted this for the sake of completeness, and because I'm learning Rust. I also made this for low-power devices, despite only compiling for x64 (this will probably change in the future, so don't worry).

⚠DISCLAIMER

  1. DO NOT SHARE CHECKSUMS OF PRIVATE DATA. You might be leaking sensitive information. Small sums and bigger files tend to be safer, because the sbox will (probably) have enough bytes to "mix well".
  2. This program is not production-ready. The version should be 0.x.y to reflect the incompleteness of the code. I'm sorry for the inconvenience and potential confusion.