Command line utility to get the delta between 2 file. Includes signature, delta and patch commands.
./deltagopher --block-size 256 old.txt new.txt
It will display the diff, but deleted and to copy content won't be displayed. If you want a full report see the next command.
- [ 647-657 ] ----------
+ [ 726-789 ] um vulputate gravida. Vestibulum ante ipsum primis in faucibus
c [ 779-789 ] ----------
where -, + and c - to delete, to insert and to copy,
./deltagopher --full --block-size 256 old.txt new.txt
./deltagopher signature --block-size 256 --filename old.txt
./deltagopher delta --sigfile signature.yml new.txt
You need a Golang version 1.16 or greater installed on your PC.
- To install dependencies, run from root directory:
go get
- To build the CLI
go build
deltagopher uses the Rolling Hash hash function to find a delta, it provides a way to efficiently traverse the sequence and calculate a new hash in O(1) time using the old hash. The total time for calculating a delta is O(n old file) + O(n changed file), where n in a length of the file.
Here is how it's done:
- Obtain a Signature from the first file. Signature represents a hashed chunks of the file with some metadata of chunks size, hashing algorythms and so on. Each chunk hashed in 2 ways - weak and strong. Weak hash is a adler32 checksum, strong hash - md5 hash.
type Checksum struct {
WeakCheaksum uint32 `yaml:"weak"`
StrongChecksum [16]byte `yaml:"strong"`
Start int `yaml:"start"`
End int `yaml:"end"`
}
- Signature can be saved into a file (json/yml), and used to generate a Delta. Delta represents the difference between a new changed version of the file and the Signature of old file.
type Delta struct {
Inserted []*SingleDelta `yaml:"insert,omitempty"`
Deleted []*SingleDelta `yaml:"delete,omitempty"`
Copied []*SingleDelta `yaml:"copy,omitempty"`
}
type SingleDelta struct {
WeakCheaksum uint32 `yaml:"weak,omitempty"`
StrongChecksum *[16]byte `yaml:"strong,omitempty"`
Start int `yaml:"start"`
End int `yaml:"end"`
// For Inserted
DiffBytes []byte `yaml:"diff,omitempty"`
}
- Delta is generated using the following algorithm:
- Load the first chunk of the file into buffer length Signature.blockSize
- Generate the weak hash for the chunk
- Search for this hash in the Signature.checksums from old file
- If it exists in Signature - it's a potential candidate to be a hash of the same text as for an old file.
To see if it's exactly the text we had in the old file we compare strong hashes:
- if md5 hash old == md5 hash new => mathing, to be copied from the old file.
- if md5 hash old != md5 hash new => consider this block as new and to be inserted
- If it's not presented in old hashes - this block of the file to be inserted to the Delta
- Seek the reader pointer to size of the chunk to skip the chunk we already processed.
- If not - start rolling across the file content using Rolling Hash (adler32). On each step generate a new hash of the sliding window (it moves byte by byte), search for it in old checksums and generate a Delta on the fly.
- If it exists in Signature - it's a potential candidate to be a hash of the same text as for an old file.
To see if it's exactly the text we had in the old file we compare strong hashes:
- If some of the old hashes weren't found in the new file at all - they are to be deleted.