
images indexer/searcher

License: Unknown, GPL-3.0 licenses found (COPYING.GPLv2, COPYING.GPLv3)

dmacvicar/iu


iu


"iu" is an experiment that started with this tweet.

The goal is to do research around a tool to index and search your image collection.

"iu" is not intended for production use, and perhaps never will be.

The name comes from "mu", which is a mail indexer that inspired this project. "mu" means maildir utils, so I guess "iu" means "image utils".

What should "iu" be?

  • Just a command line tool
  • Targeted at the average person collecting lots of photos over the years
  • Basic integration with other tools, e.g. opening query search results in some album viewer
  • Reasonably fast indexing when re-indexing from scratch
  • Very fast indexing when a couple of new photos are added to the collection
  • Some basic features when indexing:
    • Camera model
    • Date
    • Album (?)
  • Some fancy indexing features I expect to add at some point:
    • Offline reverse geo-location: Turn GPS data into place names
    • Offline automatic tagging: Recognize basic entities (food, guitars, animals, bikes, cars, colors) and index on the object word
    • Search similar images, to detect duplicates while sorting my collection
    • Find images with low quality, to be used when curating my camera inbox
    • OCR, index on words in the image
    • Recognize people and index them
  • Ultra fast searching

Building from source

You need:

Once you satisfy those requirements

$ cmake -S . -B build
$ cmake --build build

or

$ mkdir build
$ cd build
$ cmake ..
$ make

Running

Getting data files

$ cmake --build build --target data

or

$ cd build
$ make data

Indexing images

$ cd build
$ src/iu index --root ~/Pictures
...
indexed: 15465 files

Search

$ cd build
$ time src/iu find "camera:powershot"
8725 result found
0: docid /home/foo/1.jpg
...
real    0m0.013s
user    0m0.008s
sys     0m0.005s
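Field queries such as `camera:powershot` are commonly implemented by indexing each field value with a term prefix and mapping the field name to that prefix at query time (Xapian's `QueryParser::add_prefix()` performs this mapping). The sketch below only illustrates the mapping in plain C++. The prefix table and `to_prefixed_term` are hypothetical and may not match the prefixes iu actually uses:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical field -> term prefix table; iu's real prefixes may differ.
const std::map<std::string, std::string> kFieldPrefixes = {
    {"camera", "XCAMERA"},
    {"place", "XPLACE"},
};

// Turn a "field:value" token into a prefixed, lowercased term.
// Returns std::nullopt for tokens without a colon or with an unknown field.
std::optional<std::string> to_prefixed_term(const std::string& token) {
    const auto colon = token.find(':');
    if (colon == std::string::npos) return std::nullopt;
    const auto it = kFieldPrefixes.find(token.substr(0, colon));
    if (it == kFieldPrefixes.end()) return std::nullopt;
    std::string value = token.substr(colon + 1);
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return it->second + value;
}
```

The indexer would have to apply the same prefix and lowercasing when adding terms. Otherwise `camera:PowerShot` would never match the indexed term.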

Performance

  • Without many optimizations, I can index 15k files (50 GB) in 2.7s on an old X230 laptop with an SSD (libexif backend).
  • Adding offline geolocation over 121k places brings that up to 16s.

Implementation Notes

Technologies

  • Indexing is built on top of Xapian, a free and open-source probabilistic information retrieval library.

    Using SQLite was considered too.

  • Metadata from photos is retrieved using libexif.

    exiv2 was tested, and while its API and format coverage were wider, it was much slower.

  • Examination of images is done with the help of OpenCV (Open Source Computer Vision Library).

Reverse geocoding index

Uses data from reverse_geocode, which in turn comes from geonames.org (CC BY license).

It is a dumb search by distance and it is not optimized yet.
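A dumb search by distance can be as simple as comparing the photo's GPS position against every place and keeping the closest one. The sketch below is an illustration under assumptions: the `Place` struct, the use of the haversine great-circle distance and the function names are hypothetical, and iu may measure distance differently:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// One row of a places table: a name plus coordinates in degrees.
struct Place {
    std::string name;
    double lat;
    double lon;
};

// Great-circle distance in kilometres using the haversine formula.
double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    constexpr double kEarthRadiusKm = 6371.0;
    constexpr double kDegToRad = 3.14159265358979323846 / 180.0;
    const double dlat = (lat2 - lat1) * kDegToRad;
    const double dlon = (lon2 - lon1) * kDegToRad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                         std::sin(dlon / 2) * std::sin(dlon / 2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// Naive reverse geocoding: scan every place and keep the closest.
// Returns nullptr when the table is empty.
const Place* nearest_place(const std::vector<Place>& places, double lat, double lon) {
    const Place* best = nullptr;
    double best_km = std::numeric_limits<double>::infinity();
    for (const auto& p : places) {
        const double d = haversine_km(lat, lon, p.lat, p.lon);
        if (d < best_km) {
            best_km = d;
            best = &p;
        }
    }
    return best;
}
```

This costs one distance computation per place for every geotagged photo. A spatial index such as a k-d tree would avoid the full scan.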

Right now the technique is to convert the photo location into a label (place name) and add this name to the index as a term. The place is therefore matched as an ordinary term in the query.

An alternative approach I am exploring is to allow passing the place as a separate command-line argument, outside the query, and use Xapian's geospatial support (i.e. LatLongDistancePostingSource), adding this posting source to the query object.

I will start this exploration by adding the location as a value to the document.
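With the location stored as a document value, the posting-source approach ranks photos by their distance from a query centre instead of matching a place-name term. Xapian documents the weight of `LatLongDistancePostingSource` as `k1 * (distance + k1)^-k2` (distance in metres, defaults k1 = 1000, k2 = 1). Please verify this against the Xapian version in use. The plain-C++ sketch below only imitates that weighting plus an optional maximum range. `GeoHit`, `distance_weight` and `rank_by_distance` are hypothetical, and distances are assumed to be precomputed:

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// A photo whose location was stored as a value; distance_m is its
// (precomputed) distance in metres from the query centre.
struct GeoHit {
    std::string path;
    double distance_m;
    double weight = 0.0;
};

// Distance decay of the form k1 * (d + k1)^-k2. With k2 == 1 the weight
// is 1 at the centre and 0.5 at a distance of k1 metres.
double distance_weight(double distance_m, double k1 = 1000.0, double k2 = 1.0) {
    return k1 * std::pow(distance_m + k1, -k2);
}

// Drop hits beyond max_range_m (0 means unlimited), weight the rest,
// and sort them closest-first (highest weight first).
std::vector<GeoHit> rank_by_distance(std::vector<GeoHit> hits, double max_range_m) {
    hits.erase(std::remove_if(hits.begin(), hits.end(),
                              [&](const GeoHit& h) {
                                  return max_range_m > 0 && h.distance_m > max_range_m;
                              }),
               hits.end());
    for (auto& h : hits) h.weight = distance_weight(h.distance_m);
    std::sort(hits.begin(), hits.end(),
              [](const GeoHit& a, const GeoHit& b) { return a.weight > b.weight; });
    return hits;
}
```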

Automatic labeling

Uses the Berkeley Vision and Learning Center (BVLC) Caffe GoogLeNet model and the word list from ImageNet.
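Turning the network output into index words means picking the most probable classes and looking up their names. A minimal sketch follows. It assumes the output is already available as one probability per ImageNet class; running the model itself, for example through OpenCV's dnn module, is not shown. It also assumes the word list uses the common `synset_words.txt` layout: a WordNet id followed by comma-separated names. `label_from_synset_line`, `top_labels` and the thresholds are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Parse a line like "n02123045 tabby, tabby cat" and keep the first
// human-readable name ("tabby").
std::string label_from_synset_line(const std::string& line) {
    const auto space = line.find(' ');
    const std::string names = (space == std::string::npos) ? line : line.substr(space + 1);
    return names.substr(0, names.find(','));
}

// Return the labels of the k most probable classes whose probability is
// at least min_prob, most probable first.
std::vector<std::string> top_labels(const std::vector<float>& probs,
                                    const std::vector<std::string>& labels,
                                    std::size_t k, float min_prob) {
    std::vector<std::size_t> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });
    std::vector<std::string> out;
    for (std::size_t i = 0; i < k && probs[idx[i]] >= min_prob; ++i) {
        out.push_back(labels.at(idx[i]));
    }
    return out;
}
```

The probability cut-off keeps low-confidence guesses out of the index. A sensible value would need tuning on a real collection.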

I would still like to allow dropping models and label lists into a directory and have the indexer pick them up automatically.

Quality classification

Uses BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), a No-Reference Image Quality Assessment (NR-IQA) algorithm, as implemented in OpenCV contrib.

We use the trained model provided in the /samples/ directory, trained on the LIVE-R2 database as in the original implementation.

Right now we don't do anything with this except add the word "blurry" to the index. In theory I should store the score as a value.
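Storing the BRISQUE score as a value would let queries filter or sort by quality instead of relying on a single "blurry" word. Xapian provides `sortable_serialise()` for this purpose: it encodes a number so that string order matches numeric order. The sketch below is not that function. It is a plain-C++ illustration of the same order-preserving encoding idea (`order_preserving_encode` is a hypothetical name, and NaN is not handled):

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Encode a double so that comparing the resulting 8-byte strings byte by
// byte gives the same order as comparing the numbers.
std::string order_preserving_encode(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    constexpr std::uint64_t kSign = 0x8000000000000000ULL;
    // Non-negative numbers: set the sign bit so they sort after negatives.
    // Negative numbers: flip every bit so larger magnitudes sort first.
    bits = (bits & kSign) ? ~bits : (bits | kSign);
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {  // big-endian: most significant byte first
        out[i] = static_cast<char>(bits & 0xFF);
        bits >>= 8;
    }
    return out;
}
```

Encoded this way, a score range (for example "everything below 40") can be checked with plain byte-wise string comparisons.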

Browsing photos

Right now if you add "-b" (browse) to a search, it will pass the list of files in the result to eog. This does not work well, as there is a limit on the number of files, and if there are no results, eog will still show other files. I am looking for a good replacement.

Hopefully I don't need to write my own.
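Both eog problems could be worked around on the iu side. The viewer should never be launched when there are no results, and long result lists can be split across several launches. A sketch follows. It assumes the file limit comes from the operating system's command-line length limit (ARG_MAX on POSIX systems), which may not be the actual cause. `batch_paths` and the byte budget are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split result paths into batches whose combined argument size stays
// under max_bytes, so each batch fits in one viewer invocation.
// An empty result list yields no batches, so the caller never starts the
// viewer without file arguments. A single path longer than max_bytes
// still gets a batch of its own.
std::vector<std::vector<std::string>> batch_paths(const std::vector<std::string>& paths,
                                                  std::size_t max_bytes) {
    std::vector<std::vector<std::string>> batches;
    std::size_t used = 0;
    for (const auto& p : paths) {
        const std::size_t cost = p.size() + 1;  // +1 for the argument's terminating NUL
        if (batches.empty() || used + cost > max_bytes) {
            batches.emplace_back();
            used = 0;
        }
        batches.back().push_back(p);
        used += cost;
    }
    return batches;
}
```

Each batch would become one viewer launch. An empty batch list means there is nothing to show.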

License

  • (C)2020 Duncan Mac-Vicar P.

  • "iu" is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

  • "iu" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
