
curtismenmuir/go-file-diff


go-file-diff

💥 Important

  • This project has been built using Go version: go1.18.3
  • This project diffs an Original and an Updated version of a file to produce a changeset describing how to update the Original version to sync the latest changes.
  • This project implements a rolling hash algorithm over 16-byte chunks to evaluate differences between the 2 files.
    • The rolling hash algorithm is based on the Rabin–Karp algorithm.
    • A stronger SHA-256 hash of each 16-byte chunk is also compared, to reduce the impact of collisions in the rolling hash.
  • This project is based on the rdiff application.
  • The Delta changeset evaluates:
    • Chunk changes and/or additions
    • Chunk removals
    • Additions between chunks that shift the position of the original chunks
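The weak + strong hashing described above can be sketched in Go as follows. This is a minimal illustration, not this project's code: it assumes a Rabin–Karp polynomial hash with base 257 and modulus 1,000,000,007, and the helper names (`hashWindow`, `outWeight`, `roll`, `sameChunk`) are hypothetical. Sliding the 16-byte window by one byte removes the outgoing byte's contribution and appends the incoming byte in constant time; the loop checks that every rolled value equals a full recomputation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hypothetical parameters for illustration; the project's actual values may differ.
const (
	windowSize = 16            // bytes per chunk
	base       = 257           // polynomial base
	modulus    = 1_000_000_007 // large prime keeping every product inside a uint64
)

// hashWindow computes the Rabin–Karp hash of a full window:
// h = b[0]*base^(n-1) + b[1]*base^(n-2) + ... + b[n-1]  (mod modulus).
func hashWindow(b []byte) uint64 {
	var h uint64
	for _, c := range b {
		h = (h*base + uint64(c)) % modulus
	}
	return h
}

// outWeight returns base^(windowSize-1) mod modulus: the weight of the
// oldest byte, which is subtracted when the window slides.
func outWeight() uint64 {
	w := uint64(1)
	for i := 1; i < windowSize; i++ {
		w = w * base % modulus
	}
	return w
}

// roll slides the window one byte in O(1): remove `out`, then append `in`.
func roll(h uint64, out, in byte, w uint64) uint64 {
	h = (h + modulus - uint64(out)*w%modulus) % modulus
	return (h*base + uint64(in)) % modulus
}

// sameChunk treats a weak-hash match only as a candidate and confirms it
// with SHA-256, mirroring the weak + strong check described above.
func sameChunk(a, b []byte) bool {
	return hashWindow(a) == hashWindow(b) && sha256.Sum256(a) == sha256.Sum256(b)
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	w := outWeight()
	h := hashWindow(data[:windowSize])
	for i := 1; i+windowSize <= len(data); i++ {
		h = roll(h, data[i-1], data[i+windowSize-1], w)
		if h != hashWindow(data[i:i+windowSize]) {
			panic("rolling hash disagrees with full recomputation")
		}
	}
	fmt.Println("rolled hash matched a full recomputation at every offset")
	fmt.Println(sameChunk(data[:windowSize], []byte("the quick brown ")))     // true
	fmt.Println(sameChunk(data[:windowSize], data[windowSize:2*windowSize])) // false
}
```

The rolling update is what makes scanning an Updated file at every byte offset affordable; the SHA-256 comparison is only paid when a weak hash already matches.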

📝 Description

This project can be used to compare 2 versions of a file, establish what has changed, and produce a Delta changeset of how the Original version can be patched to sync the latest changes.

  • NOTE: patch functionality is out of scope for now, but will be added in the future!

This can be used with 2 files on the same machine, or used to update files across different machines.

Example flow for distributed files

  • Machine 1 and Machine 2 both have a copy of the same file
  • Machine 1 has local updates to the file and these should be synced with Machine 2
  • Machine 2 generates a Signature of its original copy of the file:
    • ./go-file-diff -signatureMode -original=original.txt -signature=sig.txt
  • Machine 2 sends the Signature file to Machine 1
  • Machine 1 generates a Delta of its local changes using the provided Signature file:
    • ./go-file-diff -deltaMode -signature=sig.txt -updated=updated.txt -delta=delta.txt
  • Machine 1 returns the Delta file to Machine 2
  • Machine 2 uses the Delta file to Patch its original version of the file to sync the latest changes
    • NOTE: Patch functionality coming soon!
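The Signature and Delta steps of this flow can be sketched in memory, in the style of rdiff. This is an assumption-laden illustration rather than this project's implementation: the types and functions (`chunkSig`, `op`, `signature`, `delta`) and the hash constants are hypothetical, the on-disk Signature and Delta file formats are not shown, and a trailing partial chunk of the Original is skipped for brevity, so those bytes always travel as literals. The Updated file is scanned with a 16-byte rolling window; a weak-hash hit is confirmed with SHA-256 before emitting a COPY, otherwise the byte is buffered into a literal INSERT.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hypothetical parameters; the project's actual constants may differ.
const chunkSize, base, modulus = 16, 257, 1_000_000_007

// chunkSig is one Signature entry: the weak and strong hash of a chunk.
type chunkSig struct {
	weak   uint64
	strong [sha256.Size]byte
}

// op is one Delta instruction: COPY chunk copyIndex, or INSERT literal if copyIndex == -1.
type op struct {
	copyIndex int
	literal   []byte
}

// weakHash is a Rabin–Karp polynomial hash of b.
func weakHash(b []byte) uint64 {
	var h uint64
	for _, c := range b {
		h = (h*base + uint64(c)) % modulus
	}
	return h
}

// signature hashes every full chunk of the original; a trailing partial chunk is skipped.
func signature(original []byte) []chunkSig {
	var sigs []chunkSig
	for i := 0; i+chunkSize <= len(original); i += chunkSize {
		c := original[i : i+chunkSize]
		sigs = append(sigs, chunkSig{weakHash(c), sha256.Sum256(c)})
	}
	return sigs
}

// delta scans updated with a rolling window; weak-hash hits are confirmed with SHA-256.
func delta(sigs []chunkSig, updated []byte) (ops []op) {
	index := make(map[uint64][]int) // weak hash -> candidate chunk indexes
	for i, s := range sigs {
		index[s.weak] = append(index[s.weak], i)
	}
	pow := uint64(1) // base^(chunkSize-1): weight of the byte leaving the window
	for k := 1; k < chunkSize; k++ {
		pow = pow * base % modulus
	}
	var lit []byte
	flush := func() { // emit buffered literal bytes as a single INSERT
		if len(lit) > 0 {
			ops = append(ops, op{copyIndex: -1, literal: lit})
			lit = nil
		}
	}
	h, fresh, i := uint64(0), true, 0 // fresh: recompute h from scratch (start, or after a COPY)
	for i+chunkSize <= len(updated) {
		window := updated[i : i+chunkSize]
		if fresh {
			h, fresh = weakHash(window), false
		}
		m := -1
		for _, c := range index[h] {
			if m < 0 && sigs[c].strong == sha256.Sum256(window) {
				m = c
			}
		}
		if m >= 0 {
			flush()
			ops = append(ops, op{copyIndex: m})
			i, fresh = i+chunkSize, true
			continue
		}
		lit = append(lit, updated[i])
		if i+chunkSize < len(updated) { // roll: drop updated[i], add updated[i+chunkSize]
			h = (h + modulus - uint64(updated[i])*pow%modulus) % modulus
			h = (h*base + uint64(updated[i+chunkSize])) % modulus
		}
		i++
	}
	lit = append(lit, updated[i:]...) // leftover tail shorter than one chunk
	flush()
	return ops
}

func main() {
	original := []byte("0123456789abcdefGHIJKLMNOPQRSTUV") // two 16-byte chunks
	updated := []byte("0123456789abcdef+new+GHIJKLMNOPQRSTUV")
	for _, o := range delta(signature(original), updated) {
		if o.copyIndex >= 0 {
			fmt.Printf("COPY chunk %d\n", o.copyIndex)
		} else {
			fmt.Printf("INSERT %q\n", o.literal)
		}
	}
}
```

With these sample inputs the sketch prints `COPY chunk 0`, `INSERT "+new+"` and `COPY chunk 1`: the inserted bytes shift the second original chunk, but the rolling scan still finds it. Original chunks that never appear in a COPY are implicitly removed.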

🔜 Future Improvements

  • Add Dockerfile
    • Use docker-compose for mounting host volume into container?
  • Implement Patch functionality
  • Performance testing
  • Set up CI pipeline
    • CircleCI free account?
  • Set up Go channels to process Signature Weak + Strong hashes concurrently?
  • Add default file names for Signature + Delta files?
    • EG: signature_yyyy_mm_dd_hh_mm_ss + delta_yyyy_mm_dd_hh_mm_ss

⬆️ How to Set Up Project

Step 1: git clone this repo

Step 2: Ensure Go v1.18.3 is installed & configured on the machine

  • NOTE: Go should be installed with gvm to manage multiple Go versions

Step 3: Download deps: go mod download

▶️ How to Run Project for Development

Step 1: Complete Setup instructions above

Step 2: Run app: go run . <CMD Args>

  • EG go run . -signatureMode -original=original.txt -signature=sig.txt -v
  • NOTE: See CMD Commands section below for more details

🚀 How to Run Project for Release

Step 1: Complete Setup instructions above

Step 2: Build release app: go build

Step 3: Run release app: ./go-file-diff <CMD Args>

  • EG ./go-file-diff -signatureMode -original=original.txt -signature=sig.txt -v
  • NOTE: See CMD Commands section below for more details

💡 CMD Commands

| Command | Example usage | Description |
| --- | --- | --- |
| -signatureMode | -signatureMode | Enables Signature generation. |
| -deltaMode | -deltaMode | Enables Delta generation. |
| -original | -original=SomeFile.txt | Name of Original file used for Signature generation. |
| -signature | -signature=SomeFile.txt | Name of Signature file. In Signature mode, this is used as the Output file. In Delta mode, this is used as an Input file. |
| -updated | -updated=SomeFile.txt | Name of Updated file used for Delta generation. |
| -delta | -delta=SomeFile.txt | Name of Delta file. In Delta mode, this is used as the Output file. |
| -v | -v | Enables verbose logging. |

NOTE: Use relative file paths to access files in folders other than the application's folder. EG:

  • ./SomeFolder/SomeFile.txt
  • ../../AnotherFile.txt
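The flags in the table can be declared with Go's standard `flag` package, which accepts both the `-flag` and `-flag=value` forms used above. This is a sketch only: `config` and `parseArgs` are hypothetical names, and the rule for which flags each mode requires is inferred from the table rather than taken from the project's source.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config is a hypothetical struct mirroring the CMD Commands table.
type config struct {
	signatureMode, deltaMode, verbose   bool
	original, signature, updated, delta string
}

// parseArgs declares the flags from the table and checks (by assumption)
// that each enabled mode has the file names it needs.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("go-file-diff", flag.ContinueOnError)
	fs.BoolVar(&c.signatureMode, "signatureMode", false, "Enables Signature generation.")
	fs.BoolVar(&c.deltaMode, "deltaMode", false, "Enables Delta generation.")
	fs.StringVar(&c.original, "original", "", "Name of Original file used for Signature generation.")
	fs.StringVar(&c.signature, "signature", "", "Signature file: output in Signature mode, input in Delta mode.")
	fs.StringVar(&c.updated, "updated", "", "Name of Updated file used for Delta generation.")
	fs.StringVar(&c.delta, "delta", "", "Delta file: output in Delta mode.")
	fs.BoolVar(&c.verbose, "v", false, "Enables verbose logging.")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	switch {
	case !c.signatureMode && !c.deltaMode:
		return c, errors.New("enable -signatureMode and/or -deltaMode")
	case c.signatureMode && (c.original == "" || c.signature == ""):
		return c, errors.New("-signatureMode requires -original and -signature")
	case c.deltaMode && (c.signature == "" || c.updated == "" || c.delta == ""):
		return c, errors.New("-deltaMode requires -signature, -updated and -delta")
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

For example, `parseArgs([]string{"-deltaMode", "-signature=sig.txt"})` returns an error because `-updated` and `-delta` are missing.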

💻 Example Usage

  • Signature Mode: ./go-file-diff -signatureMode -original=original.txt -signature=sig.txt -v
  • Delta Mode: ./go-file-diff -deltaMode -signature=Outputs/sig.txt -updated=updated.txt -delta=delta.txt -v
  • Signature + Delta Mode: ./go-file-diff -signatureMode -deltaMode -original=original.txt -signature=sig.txt -updated=updated.txt -delta=delta.txt -v

🚨 Unit Tests

  • Run Tests: go test ./...
  • Run Tests with Coverage: go test ./... -coverprofile cp.out
  • View coverage report in Browser: go tool cover -html=cp.out

👮 Linting

About

Rolling hash based file diffing algorithm written in Golang
