This project is a RESTful face recognition system. The inference backend currently supports ONNX and TensorRT models.
My first attempt was to save face embedding vectors into a MongoDB collection and query the nearest neighbor of a detected face from the collection. I wrote an aggregation query, but the response took a very long time even though the collection held only 100+ documents.
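What that aggregation has to do is essentially a brute-force scan: compare the query embedding against every stored vector. The sketch below shows the equivalent computation in numpy, with random vectors standing in for real arcface embeddings and a hypothetical nearest() helper (not part of this repo):

```python
import numpy as np

# Hypothetical stand-in for the MongoDB collection: 100 stored 512-d
# face embeddings, L2-normalized so dot product equals cosine similarity.
rng = np.random.default_rng(0)
db = rng.standard_normal((100, 512)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

def nearest(query: np.ndarray) -> int:
    """Return the index of the stored embedding with the highest cosine similarity."""
    q = query / np.linalg.norm(query)
    return int(np.argmax(db @ q))

print(nearest(db[42]))  # 42 -- a stored vector is its own nearest neighbor
```

In numpy this scan is fast for 100 vectors; doing the same arithmetic document-by-document inside an aggregation pipeline is what makes the database approach slow.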
I could not find an off-the-shelf solution for high-dimensional vector similarity search in a database system. PostgreSQL's cube extension only supports vectors of up to 100 dimensions. There are ways to work around this limit and use a 128-d vector, but Postgres still won't handle 512-d vectors. Recent face recognition models such as ArcFace and CosFace embed a face picture into a 512-d vector, which is far beyond what Postgres can handle.
Optimizing a database engine for this particular task is beyond my reach for now, so I looked into in-memory approximate nearest neighbor (ANN) libraries instead. A few established tools are available: Annoy from Spotify, FAISS from Facebook, and nmslib from a group of Ph.D. students (later adopted by AWS for their Elasticsearch service). I picked nmslib for this repo because it installs easily with "pip install nmslib".
The REST API is served by FastAPI. To start the server, simply run
uvicorn app.main:app --port 8080 --host 127.0.0.1
You can install the IP Webcam app on your smartphone and start a streaming server. Once IP Webcam is running, run the test_video.py script and check the face recognition results on your screen. A real-time sample result is available in the out2.avi file. I ran my tests on an Nvidia Jetson Xavier (operating system: JetPack 4.6).
If you want to add new faces, you can build a new pkl file from people's names and the face embedding vectors generated by the arcface model.
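A minimal sketch of building such a pkl file, assuming a simple dict layout of names plus an embedding matrix (the actual layout this repo expects may differ, and the random vectors stand in for real arcface outputs):

```python
import pickle
import numpy as np

# Hypothetical enrollment data: two names and their 512-d embeddings.
names = ["alice", "bob"]
rng = np.random.default_rng(0)
vecs = rng.standard_normal((2, 512)).astype(np.float32)

# Write the pkl file.
with open("faces.pkl", "wb") as f:
    pickle.dump({"names": names, "embeddings": vecs}, f)

# Read it back to verify the round trip.
with open("faces.pkl", "rb") as f:
    data = pickle.load(f)
print(data["names"])  # ['alice', 'bob']
```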
However, it appears that neither FAISS nor nmslib supports on-the-fly CRUD operations. If you know how to do a 512-d vector nearest-neighbor search in a database, or what AWS does in their face recognition service, please let me know.