
How to compare binary files to check if they are the same?

John Hau edited this page Mar 9, 2022 · 2 revisions

The standard Unix diff will show whether the files are the same or not:

    [me@host ~]$ diff 1.bin 2.bin
    Binary files 1.bin and 2.bin differ

Use the cmp command. It exits cleanly if the files are byte-for-byte equal; otherwise it prints where the first difference occurs and exits with a non-zero status. I have a shell script that runs:

    cmp "$1" "$2" && echo "identical" || echo "different"
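The one-liner above also prints "different" when cmp itself fails, for example because one of the files does not exist. A minimal sketch that separates the three cases, assuming only that cmp follows the usual convention of exit status 0 for identical, 1 for different, and greater than 1 for trouble (compare_files is a hypothetical helper name, not part of cmp):

```shell
#!/bin/sh
# compare_files is a hypothetical helper, not something cmp provides.
# cmp -s prints nothing; only its exit status is inspected:
#   0 = identical, 1 = different, >1 = trouble (e.g. unreadable file).
compare_files() {
    cmp -s "$1" "$2"
    status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error: could not compare files" >&2 ;;
    esac
    return "$status"
}
```

It would be called as compare_files 1.bin 2.bin, and its return value can be used directly in an if statement.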

I found Visual Binary Diff was what I was looking for, available on:

Ubuntu:

    sudo apt install vbindiff

Arch Linux:

    sudo pacman -S vbindiff

Mac OS X via MacPorts:

    port install vbindiff

Mac OS X via Homebrew:

    brew install vbindiff

Use sha1 to generate a checksum for each file, then compare the two checksums:

    sha1 [FILENAME1]
    sha1 [FILENAME2]
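Comparing the two checksums by eye is error-prone, so the comparison can be scripted. The sketch below is an assumption-laden illustration: it uses sha1sum from GNU coreutils rather than the sha1 command shown above (which is not available on every system), and same_sha1 is a hypothetical helper name.

```shell
#!/bin/sh
# same_sha1 is a hypothetical helper; it assumes sha1sum (GNU coreutils)
# is installed. Returns 0 if the checksums match, 1 if they differ,
# and 2 if either file cannot be read.
same_sha1() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of the output,
    # so only the hash (first field) has to be extracted.
    sum1=$(sha1sum < "$1" | cut -d ' ' -f 1)
    sum2=$(sha1sum < "$2" | cut -d ' ' -f 1)
    [ "$sum1" = "$sum2" ]
}
```

For a one-off comparison of two local files, cmp is usually faster because it can stop at the first differing byte; checksums are more useful when the files live on different machines.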

I ended up using hexdump to convert the binary files to their hex representation and then opened them in meld / kompare / any other diff tool. Unlike you, I was after the differences in the files.

    hexdump tmp/Circle_24.png > tmp/hex1.txt
    hexdump /tmp/Circle_24.png > tmp/hex2.txt

    meld tmp/hex1.txt tmp/hex2.txt

Use hexdump -v -e '/1 "%02x\n"' if you want to diff and see exactly which bytes were inserted or removed. – William Entriken Mar 17, 2017 at 21:13

You can use the MD5 hash function to check whether two files are the same. This does not show where the files differ at a low level, but it is a quick way to compare two files.

    md5 [FILENAME1]
    md5 [FILENAME2]

Use the cmp command. Refer to Binary Files and Forcing Text Comparisons for more information.

    cmp -b file1 file2

diff with the following options does a binary comparison that only checks whether the files differ at all, and it also reports when the files are the same:

    diff -qs {file1} {file2}

If you are comparing two files with the same name in different directories, you can use this form instead:

    diff -qs {file1} --to-file={dir2}

Try diff -s

Short answer: run diff with the -s switch.

Long answer: read on below.

Here's an example. Let's start by creating two files with random binary contents:

    $ dd if=/dev/random bs=1k count=1 of=test1.bin
    1+0 records in
    1+0 records out
    1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0100332 s, 102 kB/s

    $ dd if=/dev/random bs=1k count=1 of=test2.bin
    1+0 records in
    1+0 records out
    1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0102889 s, 99,5 kB/s

Now let's make a copy of the first file:

    $ cp test1.bin copyoftest1.bin

Now test1.bin and test2.bin should be different:

    $ diff test1.bin test2.bin
    Binary files test1.bin and test2.bin differ

... and test1.bin and copyoftest1.bin should be identical:

    $ diff test1.bin copyoftest1.bin

But wait! Why is there no output?!?

The answer is: this is by design. There is no output on identical files.

But there are different exit codes:

    $ diff test1.bin test2.bin
    Binary files test1.bin and test2.bin differ

    $ echo $?
    1

    $ diff test1.bin copyoftest1.bin

    $ echo $?
    0

Now fortunately you don't have to check exit codes each and every time, because you can just use the -s (or --report-identical-files) switch to make diff more verbose:

    $ diff -s test1.bin copyoftest1.bin
    Files test1.bin and copyoftest1.bin are identical

My favourite ones use the xxd hex dumper from the vim package:

1. using vimdiff (part of vim)

    #!/bin/bash
    FILE1="$1"
    FILE2="$2"
    vimdiff <( xxd "$FILE1" ) <( xxd "$FILE2" )

2. using diff

    #!/bin/bash
    FILE1="$1"
    FILE2="$2"
    diff -W 140 -y <( xxd "$FILE1" ) <( xxd "$FILE2" ) | colordiff | less -R -p ' | '

    md5sum binary1 binary2

If the md5sums are the same, the binaries are the same (accidental collisions are extremely unlikely, though MD5 is not safe against deliberately crafted ones).

E.g.

    root@TinyDistro:~# md5sum new*
    89c60189c3fa7ab5c96ae121ec43bd4a  new.txt
    89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt
    root@TinyDistro:~# cat new*
    aa55 aa55 0000 8010 7738
    aa55 aa55 0000 8010 7738

    root@TinyDistro:~# cat new*
    aa55 aa55 000 8010 7738
    aa55 aa55 0000 8010 7738
    root@TinyDistro:~# md5sum new*
    4a7f86919d4ac00c6206e11fca462c6f  new.txt
    89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt

How do I compare binary files in Linux?

This will print the offset and bytes in hex:

    cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'

Or do $1-1 to have the first printed offset start at 0:

    cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1-1, strtonum(0$2), strtonum(0$3)}'

Unfortunately, strtonum() is specific to GAWK, so for other versions of awk (e.g., mawk) you will need to use an octal-to-decimal conversion function. For example, broken out for readability:

    cmp -l file1.bin file2.bin |
        mawk 'function oct2dec(oct,    dec) {
                  for (i = 1; i <= length(oct); i++) {
                      dec *= 8
                      dec += substr(oct, i, 1)
                  }
                  return dec
              }
              {
                  printf "%08X %02X %02X\n", $1, oct2dec($2), oct2dec($3)
              }'
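Because cmp -l prints one line per differing byte, counting those lines gives a rough measure of how different two files are. A minimal sketch, assuming the files are the same length or that only the common prefix matters (count_diff_bytes is a hypothetical helper name):

```shell
#!/bin/sh
# count_diff_bytes is a hypothetical helper, not part of cmp.
# cmp -l emits one line per differing byte position; the "EOF on ..."
# notice for files of unequal length goes to stderr and is discarded.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

Note that a single inserted or deleted byte shifts everything after it, so nearly every later position is likely to be reported as different; the count is mostly meaningful for in-place changes.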

diff + xxd

Try diff in the following combination of zsh/bash process substitution:

    diff -y <(xxd foo1.bin) <(xxd foo2.bin)

Where:

* -y shows you differences side-by-side (optional).
* xxd is a CLI tool to create a hexdump output of the binary file.
* Add -W200 to diff for wider output (of 200 characters per line).
* For colors, use colordiff as shown below.

colordiff + xxd

If you have colordiff, it can colorize diff output, e.g.:

    colordiff -y <(xxd foo1.bin) <(xxd foo2.bin)

Otherwise install via: sudo apt-get install colordiff.

vimdiff + xxd

You can also use vimdiff, e.g.

    vimdiff <(xxd foo1.bin) <(xxd foo2.bin)

Method that works for byte addition / deletion

    diff <(od -An -tx1 -w1 -v file1) \
         <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

    for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
    for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

If you also want to see the ASCII version of the character:

    bdiff() (
      f() (
        od -An -tx1c -w1 -v "$1" | paste -d '' - -
      )
      diff <(f "$1") <(f "$2")
    )

    bdiff file1 file2
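The commands above rely on process substitution, which is a bash/zsh feature. For a plain POSIX sh, the same one-byte-per-line idea can be sketched with temporary files. This assumes GNU od (for the -w1 option) and mktemp are available; byte_diff is a hypothetical helper name.

```shell
#!/bin/sh
# byte_diff is a hypothetical helper. It dumps each file as one hex byte
# per line (no addresses), then runs a plain text diff on the dumps, so
# an inserted or removed byte shows up as a single added or deleted line.
byte_diff() {
    tmpdir=$(mktemp -d) || return 2
    od -An -tx1 -v -w1 "$1" | tr -d ' ' > "$tmpdir/a"
    od -An -tx1 -v -w1 "$2" | tr -d ' ' > "$tmpdir/b"
    diff "$tmpdir/a" "$tmpdir/b"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

The line numbers in the diff output correspond to 1-based byte positions in the respective files.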

Short answer

    vimdiff <(xxd -c1 -p first.bin) <(xxd -c1 -p second.bin)

When using hexdumps and a text diff to compare binary files, especially with xxd, additions and removals of bytes become shifts in addressing, which can make the changes difficult to see. This method tells xxd not to output addresses and to output only one byte per line, which in turn shows exactly which bytes were changed, added, or removed. You can find the addresses later by searching for the interesting sequences of bytes in a more "normal" hexdump (output of xxd first.bin).


I'd recommend hexdump for dumping binary files to textual format and kdiff3 for diff viewing.

    hexdump myfile1.bin > myfile1.hex
    hexdump myfile2.bin > myfile2.hex
    kdiff3 myfile1.hex myfile2.hex

hexdiff is a program designed to do exactly what you're looking for.

Usage:

    hexdiff file1 file2

It displays the hex (and 7-bit ASCII) of the two files one above the other, with any differences highlighted. Look at man hexdiff for the commands to move around in the file; a simple q will quit.
