Duplicate File Finder

This script scans a directory tree for files with a given extension and identifies duplicates among them. It compares files by their SHA-256 hashes and writes the duplicate matches to a CSV file.
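The hash-based comparison described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `sha256_of` and `find_duplicates`, the chunk size, and the use of `Path.rglob` are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path, ext: str) -> Dict[str, List[Path]]:
    """Group files under root that have the given extension by digest.

    Hypothetical helper: only digests shared by two or more files are kept.
    """
    groups = defaultdict(list)
    # rglob walks the whole tree below root; the leading dot is optional.
    for path in root.rglob(f"*.{ext.lstrip('.')}"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("/path/to/directory"), "pdf")` would return a mapping from each shared digest to the files that produced it. Files with identical content always hash to the same digest, so grouping by digest avoids comparing every pair of files directly.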

File signatures courtesy of: fleep @ua-nick

Prerequisites

Python 3.8 or higher

Installation

Clone the repository:

git clone https://github.com/dfirsec/dup_file_finder.git

Navigate to the project directory:

cd dup_file_finder

Install the dependencies using Poetry:

poetry install

Usage

Activate the virtual environment:

poetry shell

Run the script with the following command:

python dup_file_finder.py dirpath ext

dirpath: The directory path to scan for duplicate files.
ext: The file extension to scan for.
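For readers curious how two positional arguments like these are typically handled, here is a small sketch using the standard library's `argparse`. It assumes nothing about how dup_file_finder.py actually parses its command line; the function name `parse_args` and the help strings are illustrative only.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two positional arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Find duplicate files with a given extension."
    )
    # Both arguments are positional and required, matching the usage line.
    parser.add_argument("dirpath", type=Path,
                        help="directory path to scan for duplicate files")
    parser.add_argument("ext", help="file extension to scan for, e.g. pdf")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Scanning {args.dirpath} for .{args.ext} files")
```

With this layout, omitting either argument makes `argparse` print a usage message and exit instead of running with a missing value.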

Example

python dup_file_finder.py /path/to/directory pdf

This will scan the specified directory for PDF files and identify duplicate matches. The results will be saved to a CSV file named duplicate_matches.csv in the results directory.
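Writing the matches to results/duplicate_matches.csv could look roughly like the sketch below. The column layout (`sha256`, `path`) and the function name `write_matches` are assumptions for illustration; the file the script actually produces may use different columns.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_matches(groups: Dict[str, List[str]],
                  out_dir: Path = Path("results")) -> Path:
    """Write one row per duplicate file and return the CSV path.

    Hypothetical layout: a header row, then (digest, path) pairs.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "duplicate_matches.csv"
    # newline="" lets the csv module control line endings itself.
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sha256", "path"])  # assumed header, not confirmed
        for digest, paths in groups.items():
            for path in paths:
                writer.writerow([digest, path])
    return out_file
```

Keeping one file per row means the result can be sorted or filtered by digest in a spreadsheet to see each group of duplicates together.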

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please create an issue or submit a pull request.

License

This project is licensed under the MIT License.
