"iu" is an experiment that started with this tweet.
The goal is to do research around a tool to index and search your image collection.
"iu" is not intended for production use, and perhaps never will be.
The name comes from "mu", which is a mail indexer that inspired this project. "mu" means maildir utils, so I guess "iu" means "image utils".
- Just a command line tool
- Targeted at the average person collecting lots of photos over the years
- Basic integration with other tools, e.g. opening search results in an album viewer
- Reasonably fast indexing when re-indexing from scratch
- Very fast indexing when a couple of new photos are added to the collection
- Some basic features when indexing:
  - Camera model
  - Date
  - Album (?)
- Some fancy indexing features I expect to add at some point:
  - Offline reverse geo-location: turn GPS data into place names
  - Offline automatic tagging: recognize basic entities (food, guitars, animals, bikes, cars, colors) and index on the object word
  - Search for similar images, to detect duplicates while sorting my collection
  - Find low-quality images, to be used when curating my camera inbox
  - OCR: index on words in the image
  - Recognize people and index them
- Ultra fast searching
You need:
Once you satisfy those requirements:
$ cmake -S . -B build
$ cmake --build build
or
$ cd build
$ cmake ..
$ make
$ cmake --build build --target data
or
$ cd build
$ make data
$ cd build
$ src/iu index --root ~/Pictures
...
indexed: 15465 files
$ cd build
$ src/iu find "camera:powershot"
8725 result found
0: docid /home/foo/1.jpg
...
real 0m0.013s
user 0m0.008s
sys 0m0.005s
- Without many optimizations, I can index 15k files (50G) in 2.7s on an old X230 laptop with an SSD (libexif backend).
- Adding offline geolocation over 121k places brings that up to 16s.
- Indexing is built on top of Xapian, a free and open-source probabilistic information retrieval library.
The idea of using [SQLite] was considered too.
- Metadata from photos is retrieved using libexif.
exiv2 was tested and, while its API and format coverage were wider, it was much slower.
- Examination of images is done with the help of the Open Computer Vision Library (OpenCV).
Uses data from reverse_geocode, which, in turn, comes from geonames.org. CC-BY licence.
It is a dumb search by distance and it is not optimized yet.
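To make "dumb search by distance" concrete, here is a minimal sketch of a linear nearest-place scan. It is not the code used by "iu": the `Place` struct and the `haversine_km` and `nearest_place` functions are hypothetical names, and I am assuming the haversine formula as the distance measure.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// Hypothetical record type; the real table comes from reverse_geocode / geonames.org.
struct Place {
    std::string name;
    double lat;  // degrees
    double lon;  // degrees
};

// Great-circle distance in kilometres (haversine formula, spherical Earth).
double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    constexpr double kEarthRadiusKm = 6371.0;
    constexpr double kDegToRad = 3.14159265358979323846 / 180.0;
    const double dlat = (lat2 - lat1) * kDegToRad;
    const double dlon = (lon2 - lon1) * kDegToRad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                         std::sin(dlon / 2) * std::sin(dlon / 2);
    // Clamp to guard against rounding pushing the value slightly above 1.
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::min(1.0, a)));
}

// Unoptimized linear scan over every place.
// Returns nullptr when the table is empty.
const Place* nearest_place(const std::vector<Place>& places, double lat, double lon) {
    const Place* best = nullptr;
    double best_km = std::numeric_limits<double>::infinity();
    for (const Place& p : places) {
        const double d = haversine_km(lat, lon, p.lat, p.lon);
        if (d < best_km) {
            best_km = d;
            best = &p;
        }
    }
    return best;
}
```

Every photo with GPS data costs one pass over the whole table, which is consistent with indexing getting noticeably slower once geolocation is enabled. A spatial index such as a k-d tree would be one possible optimization, but that is outside this sketch.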
Right now the technique is that we convert the photo location into a label (place name) and add this name to the index as a term. Therefore the place is passed into the query.
An alternative approach I am exploring is to pass the place on the command line, separately from the query, and to use Xapian's geospatial support (i.e. LatLongDistancePostingSource), adding this posting source to the query object.
I will start this exploration by adding the location as a value to the document.
Uses the Berkeley Vision and Learning Center (BVLC) Caffe GoogLeNet model, and the word list from ImageNet.
I would still like to allow dropping models and label lists into a directory and have the indexer pick them up automatically.
Uses BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), a No-Reference Image Quality Assessment (NR-IQA) algorithm, as implemented in OpenCV contrib.
We use the trained model provided in the /samples/ directory, trained on the LIVE-R2 database as in the original implementation.
Right now we don't do anything with this except adding the word "blurry" to the index. In theory I should add the quality score as a value.
Right now, if you add "-b" (browse) to a search, it will pass the list of files in the result to eog. This does not work well, as there is a limit on the number of files, and if there are no results, eog will still show other files. I am looking for a good replacement.
Hopefully I don't need to write my own.
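Until a replacement viewer turns up, the two problems above could be softened on the "iu" side. The following is only a sketch of that idea, not what "-b" does today: `viewer_command` and its `max_files` parameter are hypothetical, and the right cap depends on the viewer and the system's argument length limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Builds the argument list for an external viewer such as eog.
// Returns an empty vector when there is nothing to show, so the caller
// can skip launching the viewer instead of letting it open other files.
std::vector<std::string> viewer_command(const std::string& viewer,
                                        const std::vector<std::string>& files,
                                        std::size_t max_files) {
    std::vector<std::string> argv;
    if (files.empty() || max_files == 0) {
        return argv;
    }
    argv.push_back(viewer);
    // Cap the number of files to stay under the (assumed) viewer limit.
    const std::size_t n = std::min(files.size(), max_files);
    argv.insert(argv.end(), files.begin(),
                files.begin() + static_cast<std::ptrdiff_t>(n));
    return argv;
}
```

The caller would check for an empty result before spawning the viewer. Truncating the list silently drops results, so a real implementation would probably want to tell the user how many files were left out.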
- (C) 2020 Duncan Mac-Vicar P.
- "iu" is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
- "iu" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.