📝 Changelog - IMG Dataset Refiner (v3.0 Pro)
This major update turns the tool into a professional data-engineering suite for AI training datasets. It adds visual analysis, automated image processing, and assistance from locally hosted AI models.
🤖 New AI Features (Local Assistant via API)
- Ollama / LM Studio Integration: Native support for running language models (LLMs) and vision models (VLMs) directly on the dataset through a local API.
- Auto-Tagging / Super OCR (VLM): Full generation of captions or precise extraction of text embedded in the image.
- Reality Check & Hallucination Hunter (VLM): The AI compares each caption to its image and automatically removes tags that describe elements not visible in the image.
- Concept Isolator (VLM): The AI describes the environment and ignores the central subject, ideal for preparing training data for character LoRAs.
- Visual Translator (Booru ↔ Natural): Intelligent conversion of tag lists into fluent complete sentences (optimized for Flux and SD3).
- Tag Sorting & Standardization: Restructuring tags by order of importance and automatic correction of semantic errors.
- Custom Prompt & Templates: Ability to create your own AI queries (with the {tags} variable), choose the injection mode (Replace, Add) and save your own AI recipes.
- Advanced Error Management: When the API crashes or times out on an image, the tool skips that image without interrupting the batch, then lists every failure in a detailed final report.
- Semantic Bias Analysis: Generation of a detailed report by an LLM on the quality and potential biases of your dataset.
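
To make the custom prompt and error-management entries above more concrete, here is a minimal sketch of how a {tags} template, the two injection modes, and a fault-tolerant batch loop could fit together. It is not the tool's actual code: the function names (build_prompt, inject, run_batch), the lowercase mode strings, and the ask_model callable are hypothetical, and the sketch assumes the model replies with a comma-separated tag list.

```python
from typing import Callable


def build_prompt(template: str, tags: list[str]) -> str:
    """Fill the {tags} placeholder with a comma-separated tag list."""
    return template.replace("{tags}", ", ".join(tags))


def inject(existing: list[str], generated: list[str], mode: str) -> list[str]:
    """Merge model output into the existing tags.

    "replace" discards the old tags; "add" appends only tags not already present.
    """
    if mode == "replace":
        return list(generated)
    if mode == "add":
        merged = list(existing)
        for tag in generated:
            if tag not in merged:
                merged.append(tag)
        return merged
    raise ValueError(f"unknown injection mode: {mode!r}")


def run_batch(
    items: dict[str, list[str]],
    template: str,
    ask_model: Callable[[str], str],  # hypothetical wrapper around the local API
    mode: str = "add",
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Process every image; record failures instead of aborting the batch."""
    results: dict[str, list[str]] = {}
    errors: dict[str, str] = {}
    for name, tags in items.items():
        try:
            reply = ask_model(build_prompt(template, tags))
        except Exception as exc:  # timeout, connection reset, bad response...
            errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        # Assumption: the reply is a comma-separated list of tags.
        generated = [t.strip() for t in reply.split(",") if t.strip()]
        results[name] = inject(tags, generated, mode)
    return results, errors
```

In this sketch, the errors dictionary plays the role of the final report: after the loop it maps each skipped image to the exception that caused it, while results holds the updated tags for every image that succeeded.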
🖼️ New Pre-processing & Image Features
- Visual Duplicate Tracking (Perceptual Hashing): New scanner powered by ImageHash capable of detecting near-identical images (even if cropped or resized). Side-by-side interface for easy deletion.
- Mass Resizing & Formatting: Fast conversion of an entire folder (e.g., to 1024x1024 in WebP) via Pillow.
- Smart Face Crop (OpenCV): Intelligent crop option that detects faces to automatically center the crop around the main subject.
- Automatic Alpha Management: Transparent backgrounds (PNG) are automatically flattened onto a pure white background, which most training pipelines expect.
- Batch Renaming: Built-in tool to rename all images and their associated .txt files with a common prefix.
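
As an illustration of the batch renaming entry above, the following sketch renames every image in a folder together with its same-named .txt caption. It is an assumption about how such a step could work, not the tool's implementation: the batch_rename function, the IMAGE_SUFFIXES set, and the prefix_0001 naming pattern are all hypothetical.

```python
from pathlib import Path

# Assumption: these are the image extensions the dataset uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def batch_rename(folder: Path, prefix: str, start: int = 1) -> list[tuple[str, str]]:
    """Rename images and their paired .txt captions to '<prefix>_<NNNN>'."""
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    # Build the full plan first so nothing is touched if a conflict is found.
    plan: dict[Path, Path] = {}
    for index, image in enumerate(images, start=start):
        new_stem = f"{prefix}_{index:04d}"
        plan[image] = image.with_name(new_stem + image.suffix)
        caption = image.with_suffix(".txt")
        if caption.exists():
            # setdefault: a caption shared by two images is only moved once.
            plan.setdefault(caption, caption.with_name(new_stem + ".txt"))

    # Refuse to overwrite files that are not part of the rename itself.
    for target in plan.values():
        if target.exists() and target not in plan:
            raise FileExistsError(target)

    # Two-phase rename: go through temporary names so that a target which is
    # also a pending source (e.g. when re-running with the same prefix) is safe.
    staged = []
    for i, (source, target) in enumerate(plan.items()):
        temp = source.with_name(f"__renaming_{i}__{source.name}")
        source.rename(temp)
        staged.append((temp, target))
    for temp, target in staged:
        temp.rename(target)

    return [(source.name, target.name) for source, target in plan.items()]
```

Calling batch_rename(Path("dataset"), "cat") on a hypothetical dataset folder would return the list of (old name, new name) pairs, which is convenient for logging or for offering an undo.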
🧬 New Analytics & UX Features
- Name Change: "Datasets Images EditSelect" officially becomes "IMG Dataset Refiner".
- Intellisense (Autocomplete): A native JavaScript snippet is injected into the viewer, so the tool now suggests existing keywords from your dataset as you type.
- Co-occurrence Matrix (Concept Bleeding): New interactive Plotly chart to spot if two tags (e.g., a character and a clothing item) appear together too often.
- Resolution Analyzer (Bucketing): New scatter plot to check the dimension distribution of your images against standard training "buckets".
- Exclusion Matrix (Anti-Heatmap): List of pairs of very frequent tags that never appear together, to help detect gaps in the dataset.
- Logical Contradiction Hunter: Offline verification script that flags glaring inconsistencies (e.g., day and night tagged on the same image).
- Onboarding & Contextual Tools: Added quick start guide dropdown menus and interactive tooltips to guide new users.
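
The co-occurrence and exclusion analyses listed above both reduce to counting how often tags and tag pairs appear across captions. The sketch below shows one possible way to compute them with the standard library only; the function names, the 0.9 threshold, and the min_count defaults are illustrative assumptions, and the plotting step (Plotly in the tool) is left out.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(captions: list[list[str]]) -> tuple[Counter, Counter]:
    """Count images per tag and images per (alphabetically ordered) tag pair."""
    tag_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tags in captions:
        unique = sorted(set(tags))  # each tag counts once per image
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def bleeding_pairs(tag_counts, pair_counts, threshold=0.9, min_count=2):
    """Pairs where the rarer tag appears with the other in >= threshold of its images."""
    flagged = []
    for (a, b), together in pair_counts.items():
        rarer = min(tag_counts[a], tag_counts[b])
        if rarer >= min_count and together / rarer >= threshold:
            flagged.append((a, b, together / rarer))
    return sorted(flagged, key=lambda item: (-item[2], item[0], item[1]))


def never_together(tag_counts, pair_counts, min_count=3):
    """Pairs of frequent tags that never share an image."""
    frequent = sorted(t for t, n in tag_counts.items() if n >= min_count)
    return [pair for pair in combinations(frequent, 2) if pair_counts[pair] == 0]
```

A pair returned by bleeding_pairs suggests that a model may learn the two concepts as one, while a pair returned by never_together points to a combination the dataset never shows.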