
dupfi - A Duplicate Finder

dupfi is a pure shell script that finds identical files across your drives, with no dependencies other than a *NIX-like system.

The collection of files is done by the find utility; see its man page for details. dupfi works in two main steps:

  1. Collect files with the same size
  2. Filter these files by identical checksum

The resulting list is saved for further processing or, alternatively, printed immediately.
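
To make the idea concrete, here is a minimal sketch of the same two-step approach in plain shell. This is not dupfi's actual code; it assumes GNU find/coreutils, uses md5sum as the checksum tool, and tolerates spaces but not newlines in file names:

# Step 1: list size and path of every non-empty regular file,
# then keep only files whose size occurs more than once
find . -type f -size +0c -printf '%s\t%p\n' > /tmp/sizes
awk -F'\t' 'NR==FNR {count[$1]++; next} count[$1] > 1 {print $2}' \
    /tmp/sizes /tmp/sizes > /tmp/candidates

# Step 2: checksum only the candidates and print groups with equal sums
# (an md5 sum is 32 hex chars, so uniq compares just that prefix)
xargs -d '\n' md5sum < /tmp/candidates | sort | uniq -w32 --all-repeated=separate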

Usage

dupfi [<options>] <path>... [<find-options>]

Some options are:

-d    Be dull: calc the checksum only for the first 1M of each file
-p    Print the result list but do not save it; implies -q
-q    Be as quiet as possible
-s    Show the last or current results with less
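
For example, one could scan a directory and just print the findings without saving them, or page through the last saved results (the path here is only illustrative):

dupfi -p ~/Pictures
dupfi -s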

Examples

Well, OK, I do not have that much stuff on my laptop, in particular no big files where calculating the checksum takes notable time, but excluding hidden directories, a check of my home tree takes slightly more than one minute.

[ ~]$ time dupfi . ! -path '*/.*'
Execute: find  . ! -path \*/.\* -type f -size +0c
Collect files.....................................[DONE] Found 11679 files
Sort and filter files by same size................[DONE] Found 6290 candidates
Calc checksum for file 6290 of 6290...............[DONE]
Sort and filter files by same checksum............[DONE] Found 2135 duplicates
Result saved as: /home/lot/.dupfi/duplicates

real    1m5,496s

When looking only for PDF files, the check is done in about one second.

[ ~]$ time dupfi . '! -path */.* -iname *.pdf'
Execute: find  . ! -path \*/.\* -iname \*.pdf -type f -size +0c
Collect files.....................................[DONE] Found 1065 files
Sort and filter files by same size................[DONE] Found 27 candidates
Calc checksum for file 27 of 27...................[DONE]
Sort and filter files by same checksum............[DONE] Found 20 duplicates
Result saved as: /home/lot/.dupfi/duplicates

real    0m1,214s
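
Since everything after the paths is handed to find, any find test can serve as a filter; for example, to check only files bigger than 10M (plain find syntax, not a dupfi-specific option):

dupfi . '-size +10M'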

Running the same check on JPG files takes about six seconds. Interestingly, all candidates turn out to be duplicates, and the checksum calculation takes most of the time. This is where the option -d comes in, which sadly gives a real benefit only for notably big files.

[ ~]$ time dupfi . '! -path */.* -iname *.jpg'
Execute: find  . ! -path \*/.\* -iname \*.jpg -type f -size +0c
Collect files.....................................[DONE] Found 637 files
Sort and filter files by same size................[DONE] Found 180 candidates
Calc checksum for file 180 of 180.................[DONE]
Sort and filter files by same checksum............[DONE] Found 180 duplicates
Result saved as: /home/lot/.dupfi/duplicates

real    0m5,897s
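
For completeness, the same run with -d would hash only the first 1M of each file; the command line would be (timing not measured here):

time dupfi -d . '! -path */.* -iname *.jpg'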

Install

Copy dupfi somewhere into your $PATH and ensure it is executable; that's all.
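
For example, assuming ~/bin is a directory in your $PATH (just one common choice):

install -m 755 dupfi ~/bin/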

TODOs and Thoughts

  • Think about how to further process the duplicate files
  • Add some more useful examples of how to filter something
  • Test on various platforms

Credits

The idea for dupfi was inspired by answers in a Stack Exchange thread, so thanks go there!

All contributors of stackexchange.com, bash-hackers.org, dict.cc and many more.

License

GNU General Public License (GPL), Version 2.0
