
A Python command-line tool that identifies and groups similar images using average hashing. It supports single-level and recursive directory scanning and an adjustable similarity threshold, and presents results in JSON format. It is well suited to image deduplication, organization, and content-based retrieval tasks.


blackmonk13/similar_images


Similar Images



This is a Python application that identifies and groups similar images within a specified directory. It uses image hashing, specifically average hashing, to compare images efficiently. Directories can be scanned at a single level or recursively, and the similarity threshold can be adjusted to control how sensitive the comparison is. Results are written in JSON format, which makes the groups of similar images easy to inspect and to process further.
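Average hashing is a well-known technique: an image is shrunk to a small grayscale grid (commonly 8x8), each pixel is compared with the grid's mean brightness, and the resulting bits form a 64-bit fingerprint. Two images are then compared by counting the bits in which their fingerprints differ (the Hamming distance). The sketch below illustrates that idea in plain Python. It assumes the image has already been resized and converted to grayscale, and the names `average_hash` and `hamming_distance` are illustrative, not this project's API; the project itself may compute hashes differently (for example through a library).

```python
def average_hash(pixels):
    """Return an integer average hash for a grid of grayscale values (0-255).

    Assumes `pixels` is a list of rows that has already been downscaled,
    e.g. to 8x8, which yields a 64-bit hash.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Each pixel contributes one bit: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# A simple 8x8 brightness gradient, a uniformly brightened copy, and an inverted copy.
gradient = [[row * 8 + col for col in range(8)] for row in range(8)]
slightly_brighter = [[value + 5 for value in row] for row in gradient]
inverted = [[255 - value for value in row] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(slightly_brighter)))
print(hamming_distance(average_hash(gradient), average_hash(inverted)))
```

A uniform brightness shift leaves every pixel on the same side of the mean, so the brightened copy hashes identically (distance 0), while the inverted copy differs in all 64 bits. A similarity threshold would then presumably act as a cutoff on this distance, although this README does not spell out exactly how the tool's threshold is applied.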


🧐 About

The project addresses the challenge of identifying and organizing similar images within large datasets. Comparing and categorizing images by hand is laborious and impractical once a collection contains a significant number of images. Image hashing reduces each image to a compact fingerprint that can be compared quickly, which makes this process more efficient and accurate. The tool is particularly useful for image deduplication, image organization, and content-based image retrieval, for individuals and organizations working with extensive image collections.

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See Deployment for notes on how to deploy the project on a live system.

Prerequisites

To get started with the Similar Images project, you need the following software installed on your local machine:

  1. Git - Download & Install Git
  2. Python (version 3.6 or higher) - Download & Install Python

Installing

Here's a step-by-step guide to setting up a development environment for the Similar Images project:

  1. Clone the repository:
     git clone https://github.com/blackmonk13/similar_images.git
  2. Navigate to the project directory:
     cd similar_images
  3. Set up a Python virtual environment (optional but recommended):
     python3 -m venv env
  4. Activate the virtual environment.

     On Windows:

     .\env\Scripts\activate

     On macOS and Linux:

     source env/bin/activate
  5. Install the dependencies from the requirements.txt file:
     pip install -r requirements.txt
  6. Run the project using the command described in the Usage section.

🎈 Usage

To use Similar Images, run the following command in your terminal:

python -m similar_images -t 1 -r -o json -f output.json path/to/your/image/directory

Replace path/to/your/image/directory with the path to the directory containing the images you want to analyze. The -t flag sets the similarity threshold (default: 10), the -r flag enables recursive scanning of subdirectories (without it, only the top level of the directory is scanned), the -o flag sets the output format (default: json), and the -f flag specifies the path of the output file.
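For readers who want to script around the tool or simply see how the flags fit together, here is a minimal sketch of how the documented options could be parsed with Python's standard argparse module, together with a helper that lists image files either at the top level or recursively. It is a hypothetical reconstruction, not the project's actual parser: the destination names (threshold, recursive, output_format, output_file), the csv choice for -o, and the set of image extensions are all assumptions.

```python
import argparse
from pathlib import Path

# Hypothetical set of extensions treated as images; the real tool may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def build_parser():
    """Build a parser mirroring the flags documented in this README (a sketch)."""
    parser = argparse.ArgumentParser(prog="similar_images")
    parser.add_argument("directory", help="directory containing the images to analyze")
    parser.add_argument("-t", dest="threshold", type=int, default=10,
                        help="similarity threshold (documented default: 10)")
    parser.add_argument("-r", dest="recursive", action="store_true",
                        help="also scan subdirectories")
    parser.add_argument("-o", dest="output_format", choices=["json", "csv"],
                        default="json", help="output format (default: json)")
    parser.add_argument("-f", dest="output_file", help="path of the output file")
    return parser


def find_images(directory, recursive=False):
    """List image files directly in `directory`, or in all subdirectories if `recursive`."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)


# Parse the example command line from the Usage section.
args = build_parser().parse_args(
    ["-t", "1", "-r", "-o", "json", "-f", "output.json", "path/to/your/image/directory"]
)
print(args.threshold, args.recursive, args.output_format, args.output_file)
```

Modeling -r as a store_true switch matches the single-level versus recursive behavior described above: pathlib's glob("*") only looks at the top level, while rglob("*") walks every subdirectory.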

Depending on the -o option, the application writes a JSON or CSV file containing the groups of similar images found in the specified directory.
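One simple way to turn per-image hashes into groups is a greedy pass: each image joins the first existing group whose first member is within the threshold (measured as Hamming distance), otherwise it starts a new group, and single-image groups are dropped. The sketch below assumes the hashes are already computed as integers and serializes the result with the standard json module. The grouping strategy, the function name group_similar, and the JSON shape (a list of lists of file names) are illustrative assumptions; the project's actual algorithm and output schema may differ.

```python
import json


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def group_similar(hashes, threshold=10):
    """Greedily group paths whose hash is within `threshold` bits of a group's first member.

    `hashes` maps file paths to integer hashes; only groups with two or more
    members are returned.
    """
    groups = []
    for path, image_hash in hashes.items():
        for group in groups:
            if hamming_distance(image_hash, hashes[group[0]]) <= threshold:
                group.append(path)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([path])
    return [group for group in groups if len(group) > 1]


# Made-up example hashes: b.jpg differs from a.jpg by 1 bit, c.jpg by 32 bits.
example_hashes = {"a.jpg": 0xFFFF0000, "b.jpg": 0xFFFF0001, "c.jpg": 0x0000FFFF}
print(json.dumps(group_similar(example_hashes, threshold=2), indent=2))
```

Comparing against each group's first member keeps the pass simple, but it means the result can depend on the order in which files are visited; a more thorough approach would compare all pairs and merge connected images.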

🚀 Deployment

You can find the latest wheel (.whl) file on the project's Releases page. To install it on a live system:

Unix-like systems (macOS, Linux)

Run the following command in your terminal (it requires curl and jq to be installed):

curl -s https://api.github.com/repos/blackmonk13/similar_images/releases/latest | jq -r '.assets[] | select(.name | endswith(".whl")) | .browser_download_url' | xargs pip install

Windows

Open PowerShell and run the following command:

(Invoke-WebRequest -Uri "https://api.github.com/repos/blackmonk13/similar_images/releases/latest" -UseBasicParsing | ConvertFrom-Json).assets | Where-Object { $_.name -like "*whl" } | ForEach-Object { pip install $_.browser_download_url }

Either command installs the project and its dependencies, so you can run the similar_images command directly from your terminal, PowerShell, or Command Prompt.
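If neither curl and jq nor PowerShell is convenient, the same steps can be performed with Python's standard library alone. The sketch below queries the same GitHub releases endpoint used in the commands above, picks the assets whose names end in .whl, and installs them with the running interpreter's pip. The helper names wheel_urls and install_latest_wheel are hypothetical, and the script assumes network access and that the latest release actually contains a wheel.

```python
import json
import subprocess
import sys
import urllib.request

LATEST_RELEASE_URL = "https://api.github.com/repos/blackmonk13/similar_images/releases/latest"


def wheel_urls(release):
    """Return download URLs of all .whl assets in a GitHub release JSON object."""
    return [
        asset["browser_download_url"]
        for asset in release.get("assets", [])
        if asset["name"].endswith(".whl")
    ]


def install_latest_wheel():
    """Fetch the latest release metadata and pip-install each wheel it contains."""
    with urllib.request.urlopen(LATEST_RELEASE_URL) as response:
        release = json.load(response)
    urls = wheel_urls(release)
    if not urls:
        sys.exit("No .whl asset found in the latest release.")
    for url in urls:
        # Use the current interpreter's pip so the package lands in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", url])
```

Calling install_latest_wheel() from a Python session (for example inside the activated virtual environment) performs the installation; keeping wheel_urls separate from the network call makes the asset filtering easy to check on its own.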

⛏️ Built Using

✍️ Authors

See also the list of contributors who participated in this project.

Contributing

We welcome contributions from the community! If you'd like to contribute to Similar Images, please follow these steps:

  1. Fork the repository and create a new branch for your changes.
  2. Commit your changes and push them to your fork.
  3. Open a pull request against the main branch of the original repository.

Please make sure that your contributions adhere to the project's coding style and guidelines.

Before submitting a pull request, please make sure that:

  1. Your changes do not introduce any new bugs or regressions.
  2. Your code is well-documented and easy to understand.
