Fast, content-based duplicate file detector with caching and more!

MarcinOrlowski/dhunter

Introduction

dhunter (pronounced "The Hunter") is a [d]uplicate [hunter] utility designed to help scan and process large sets of files. It matches duplicate files by content and uses smart caching to speed up directory scanning, change detection, and processing.

Features

  • Content-based file matching (SHA-256)
  • Designed to work with large amounts of data:
    • caches folder scanning results for quick reuse/rescan
    • directory scanning can be aborted and resumed at any time
  • Smart content filters:
    • Ignores zero-length files and symlinks
    • Ignores folders like .git, .cvs, .svn
    • Supports file-size-based (min and/or max) filtering
    • Per-folder exclusion via a .dhunterignore file
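
To illustrate how the matching and filtering features listed above fit together, here is a minimal sketch in plain Python. It is not dhunter's actual code or API: the names `find_duplicates`, `sha256_of`, `min_size`, `max_size` and `IGNORED_DIRS` are hypothetical, and caching, resumable scans and .dhunterignore handling are left out because their details are not described here.

```python
import hashlib
import os
from collections import defaultdict

# Folder names taken from the feature list; the constant name is hypothetical.
IGNORED_DIRS = {".git", ".cvs", ".svn"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=0, max_size=None):
    """Return {sha256: [paths]} for every hash shared by two or more files."""
    by_hash = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks
            size = os.path.getsize(path)
            if size == 0 or size < min_size:
                continue  # skip zero-length files and files below the minimum
            if max_size is not None and size > max_size:
                continue  # skip files above the maximum
            by_hash[sha256_of(path)].append(path)
    # Keep only hashes that occur more than once, i.e. real duplicates.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates("/some/folder", min_size=1024)` would group files of at least 1 KiB by content hash, regardless of file name or location. Filtering by size before hashing keeps the expensive step (reading file contents) limited to files that can actually be reported.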

Credits and license

  • Written and copyrighted ©2018-2019 by Marcin Orlowski
  • dhunter is open-source software licensed under the MIT license