VisionSort

Rename hundreds of screenshots in minutes instead of hours — powered by local AI.

VisionSort scans a folder of images, sends each one to a vision model running on your machine through Ollama, and proposes descriptive filenames. You review the suggestions in a clean web interface, tweak anything you want, then apply all renames with one click.

Everything runs locally. No uploads, no API keys, no cloud services.

What It Does

  • Analyzes images with AI — A local vision model looks at each image and generates a short, descriptive filename (e.g., Screenshot_2025-03-14_091823.png → vscode-python-debugging.png).
  • Batch processing — Point it at a folder with thousands of images and walk away. A progress bar with ETA keeps you informed.
  • Review before applying — Every suggestion shows up as an editable card. Approve, edit, skip, or re-analyze individual files. Bulk-approve when you're happy.
  • Two modes — Rename files in place, or move them to a target directory.
  • Duplicate detection — Finds visually similar images using perceptual hashing so you can clean up before renaming.
  • NSFW detection — Flags sensitive content automatically with an nsfw- prefix.
  • Persistent image IDs — Embeds a unique identifier into each image's metadata (EXIF for JPEG/WebP, tEXt chunk for PNG). Files keep their identity across renames and moves.
  • Undo support — Every Apply generates a PowerShell undo script and JSON manifest. One click to reverse.
  • Pause & resume — Pause analysis mid-batch and pick up where you left off.
  • History — Full audit trail of all operations in a searchable carousel.
  • Dark & light mode — Follows your system preference, with a manual toggle.
  • System monitoring — Live CPU, RAM, and GPU usage in the header.
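
The duplicate-detection feature above relies on perceptual hashing. VisionSort's exact implementation lives in its duplicates.py service; as a minimal illustration of the technique, here is a pure-Python average hash ("aHash") over an 8×8 grayscale grid — in practice an image library such as Pillow would do the downscaling first:

```python
def average_hash(pixels):
    """Compute a 64-bit average hash from an 8x8 grayscale grid.

    `pixels` is an 8x8 list of lists of 0-255 brightness values. Each bit
    records whether that pixel is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= avg else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance means visually similar images."""
    return bin(h1 ^ h2).count("1")

# Two grids differing in a single pixel hash to nearby values.
grid_a = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
grid_b = [row[:] for row in grid_a]
grid_b[0][0] = 255 - grid_b[0][0]
assert hamming_distance(average_hash(grid_a), average_hash(grid_b)) <= 2
```

Because near-identical images produce hashes within a few bits of each other, duplicates can be grouped by thresholding the Hamming distance rather than comparing raw pixels.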

Requirements

  • Python 3.11+
  • Ollama running with at least one vision model
  • NVIDIA GPU recommended for faster inference (works on CPU too, just slower)

Quick Start

Windows

git clone https://github.com/IamAngusU/vision-sort.git
cd vision-sort
start.bat

macOS / Linux

git clone https://github.com/IamAngusU/vision-sort.git
cd vision-sort
chmod +x start.sh
./start.sh

The start script creates a virtual environment, installs dependencies, and opens the browser automatically.

Pull a vision model (if you haven't already)

ollama pull qwen2.5vl:7b

VisionSort will also suggest models directly in the UI and can pull them for you.

Manual setup

If you prefer to set things up yourself:

git clone https://github.com/IamAngusU/vision-sort.git
cd vision-sort
pip install -r requirements.txt
python run.py

Then open http://localhost:8899.

How To Use

  1. Start Ollama — Make sure Ollama is running before you start VisionSort.
  2. Select a folder — Browse to the directory with your screenshots using the sidebar.
  3. Pick a model — Choose from the installed vision models or pull a new one.
  4. Start analysis — Hit "Start" and watch the AI work through your images.
  5. Review suggestions — Edit or approve the generated names. Use filters and search to find specific files.
  6. Apply — Click "Apply Approved" to rename/move all approved files at once.
  7. Stop the server — Click "Stop Server" in the sidebar or close the terminal.

Stopping the Server

Three ways to stop VisionSort:

  • Click Stop Server in the sidebar
  • Press Ctrl+C in the terminal
  • Close the terminal window

Configuration

All settings are optional and controlled through environment variables:

| Variable                    | Default       | Description                    |
|-----------------------------|---------------|--------------------------------|
| VISION_SORT_OLLAMA_URL      | auto-detected | Ollama API endpoint            |
| VISION_SORT_DEFAULT_MODEL   | qwen2.5vl:7b  | Default vision model           |
| VISION_SORT_HOST            | 127.0.0.1     | Server bind address            |
| VISION_SORT_PORT            | 8899          | Server port                    |
| VISION_SORT_BASE_PATH       |               | Base path for reverse proxy    |
| VISION_SORT_DATA_DIR        | data          | State and thumbnail storage    |
| VISION_SORT_DEBUG           | false         | Enable debug mode and API docs |
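
Environment-driven settings like these are typically read once at startup with per-variable fallbacks. A minimal sketch of the pattern — the `Settings` dataclass and `load_settings` below are illustrative, not VisionSort's actual config.py:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    host: str = "127.0.0.1"
    port: int = 8899
    default_model: str = "qwen2.5vl:7b"
    base_path: str = ""
    data_dir: str = "data"
    debug: bool = False

def load_settings(env=os.environ) -> Settings:
    """Build Settings from VISION_SORT_* variables, falling back to defaults."""
    return Settings(
        host=env.get("VISION_SORT_HOST", "127.0.0.1"),
        port=int(env.get("VISION_SORT_PORT", "8899")),
        default_model=env.get("VISION_SORT_DEFAULT_MODEL", "qwen2.5vl:7b"),
        base_path=env.get("VISION_SORT_BASE_PATH", "").rstrip("/"),
        data_dir=env.get("VISION_SORT_DATA_DIR", "data"),
        debug=env.get("VISION_SORT_DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dict.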

Project Structure

vision-sort/
├── start.bat / start.sh      # One-click launcher
├── run.py                     # Entry point
├── requirements.txt
├── src/
│   ├── app.py                 # FastAPI application factory
│   ├── config.py              # Environment-driven configuration
│   ├── models/
│   │   └── schemas.py         # Data models
│   ├── routers/
│   │   ├── events.py          # Server-Sent Events (live updates)
│   │   ├── files.py           # File management & thumbnails
│   │   ├── models.py          # Ollama models & system info
│   │   └── process.py         # Analysis, rename, undo
│   ├── services/
│   │   ├── analyzer.py        # Vision AI pipeline
│   │   ├── duplicates.py      # Perceptual hash duplicate detection
│   │   ├── image_id.py        # Persistent EXIF/PNG metadata IDs
│   │   ├── history.py         # Operation audit trail
│   │   ├── ollama_client.py   # Ollama HTTP client
│   │   ├── renamer.py         # Rename, move, backup, undo
│   │   ├── scanner.py         # Directory scanning
│   │   └── state.py           # Thread-safe state persistence
│   └── static/                # Web interface (vanilla JS, no build step)
└── data/                      # Auto-created: state, thumbnails

Reverse Proxy (nginx)

To serve behind nginx at /vision-sort:

location /vision-sort/ {
    proxy_pass http://127.0.0.1:8899/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Connection '';
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 86400s;
}

Then start the server with the matching base path:

VISION_SORT_BASE_PATH=/vision-sort python run.py

License

PolyForm Noncommercial 1.0.0 — free for personal and non-commercial use. Commercial use is not permitted.
