dupes - WIP
A command-line tool for finding duplicate files
dupes helps you find duplicate files.
This is a work in progress.
$ npm install -g dupes
dupes [directory] will search that directory for duplicate files and write a dupes.json results file.
Omitting [directory] will run the command on the current directory.
dupes-read [filePath] will open that directory's .json file and output its results.
Omitting [filePath] will open 'dupes.json' in the current directory.
Run dupes -h and dupes-read -h for more options.
How does it work?
dupes works by running a list of differentiation methods. Each one separates the current pools of files into smaller pools, and pools of size 1 are then omitted. The current differentiation methods are:
- File size - Very fast, filters out a lot of files, but obviously still leaves different files together.
- MD5 of the whole file - Slower, but definitive.
Each method has a different trade-off between speed and accuracy. By starting with the cheapest methods, we make sure that the more expensive ones are executed on fewer files. By finishing with a whole-file checksum, we ensure that files reported as identical really are identical (barring an MD5 collision).
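The pooling idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual dupes source: the names refinePools, bySize, byMD5 and findDuplicates are hypothetical, and files are read fully into memory for brevity, which a real tool would likely avoid for large files.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Split every pool into smaller pools keyed by keyFn(file), then drop
// pools of size 1: a file that is alone in its pool has no duplicate.
// (Hypothetical helper, not part of dupes.)
function refinePools(pools, keyFn) {
  const refined = [];
  for (const pool of pools) {
    const byKey = new Map();
    for (const file of pool) {
      const key = keyFn(file);
      if (!byKey.has(key)) byKey.set(key, []);
      byKey.get(key).push(file);
    }
    for (const group of byKey.values()) {
      if (group.length > 1) refined.push(group);
    }
  }
  return refined;
}

// Cheap method: file size in bytes.
const bySize = (file) => fs.statSync(file).size;

// Expensive method: MD5 of the whole file contents.
const byMD5 = (file) =>
  crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex');

// Run the methods in order, cheapest first, starting from one big pool.
function findDuplicates(files, methods = [bySize, byMD5]) {
  return methods.reduce(
    (pools, method) => refinePools(pools, method),
    [files]
  );
}
```

Calling findDuplicates with a list of file paths returns an array of pools, where each pool is an array of paths whose contents share the same size and the same MD5. Because the size pass runs first, the MD5 is only computed for files that share their size with at least one other file.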
- Allow rewriting of dupes.json files if they are signed
- Implement an option to run only on files from an old result
- Implement better ways to read results (largest files, largest groups)
- Add option to limit files checked to certain sizes.
- Add option to include or exclude files/folders by name/regex, and set default excludes.
- Turn this list into GitHub issues.
- Find duplicate folders.
- Check if separateFilesByHeadMD5 (commented out) will work better with bigger file size limitations.
- Try searching for a hash function faster than MD5.
- dupes-gui, a web GUI to write and read dupes results in the browser with express.
- Change array style to comma-before.