Any plans to speed up recognition? #99

Closed
schrodog opened this issue Nov 19, 2018 · 46 comments
Labels: enhancement
@schrodog

Right now, using howdy with certainty = 5, it usually takes 2-3 seconds or more to authenticate, which is considerably slower than Windows Hello (< 2s on average). Any plans to further improve the algorithm?

@boltgolt
Owner

Not right now, no. I'm not experienced enough with machine learning to create my own model, and if I did, it would probably be slower than the one used right now. @dmig has made some general optimizations, but those are more about making the code readable and won't speed things up that much.

You can set end_report to true with sudo howdy config to get a more detailed time report after authentication. For me this returns:

Time spend
  Starting up: 135ms
  Opening the camera: 260ms
  Importing face_recognition: 1426ms
  Searching for known face: 301ms

So by far the most time is spent importing the face_recognition module, which is where we could definitely save a lot. If anyone has ideas to speed this up, or other modules to use, let me know.
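For reference, this setting lives in Howdy's config file, which sudo howdy config opens in an editor. A minimal sketch, assuming the section is named [debug] (it may differ between versions):

    [debug]
    # print a timing report like the one above after each authentication
    end_report = true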

@boltgolt added the enhancement label Nov 19, 2018
@dmig
Contributor

dmig commented Nov 19, 2018

For me, a noticeable speedup was gained by rebuilding dlib with full optimisations (see its manual). Using CUDA should also give a performance boost.
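For anyone wanting to try this, a rebuild of the python bindings could look like the sketch below. This is a sketch under the assumption that your dlib version's setup.py accepts --set to forward CMake options (recent versions do); DLIB_USE_CUDA only helps with a CUDA toolchain installed:

    # build dlib's python bindings with SIMD (and optionally CUDA) enabled
    git clone https://github.com/davisking/dlib.git
    cd dlib
    sudo python3 setup.py install \
        --set USE_AVX_INSTRUCTIONS=1 \
        --set USE_SSE4_INSTRUCTIONS=1 \
        --set DLIB_USE_CUDA=1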

@dmig
Contributor

dmig commented Nov 19, 2018

Oh, btw: using a normal cam (not the IR one) slows down recognition, but gives a more reliable result.

@dmig
Contributor

dmig commented Nov 19, 2018

My results:

Time spend
  Starting up: 92ms
  Opening the camera: 435ms
  Importing face_recognition: 1070ms
  Searching for known face: 91ms

Resolution
  Native: 374x340
  Used: 374x340

Frames searched: 1 (11.01 fps)
Dark frames ignored: 0
Certainty of winning frame: 2.38
Winning model: 0 ("Initial model")

This is probably the best time; it can be up to 2 seconds sometimes. Notice the resolution: I set max_height = 374 in the config to avoid unnecessary scaling.
I'll try to find out whether the face_recognition load time can be reduced.
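For reference, that setting as it would appear in the config file; the [video] section name is an assumption and may vary per Howdy version:

    [video]
    # cap the frame height used for recognition; matching the camera's
    # native height avoids a rescaling pass on every frame
    max_height = 374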

@dmig
Contributor

dmig commented Nov 19, 2018

... rebuilding dlib with full optimizations...

Looking at how debian/postinst builds dlib, I realized that I had already gone one step further: I had installed one of dlib's optional dependencies, OpenBLAS, which is pretty well optimized and gave me better recognition speed.
But there are more options:

All of them except the first are available as packages in the standard repositories. If dlib finds none of them, it issues a warning at the configure stage and lets you proceed, probably using some fallback code.

I'll run some tests and later rewrite the installation script to pick the best available libraries. I have a good feeling about this.
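In the meantime, the accidental OpenBLAS speedup can be reproduced by hand; a sketch for Ubuntu/Debian, assuming dlib is rebuilt from its source tree so the configure step can detect the library:

    # install OpenBLAS from the standard repos ...
    sudo apt install libopenblas-dev
    # ... then rebuild dlib so its configure step picks it up
    cd dlib && sudo python3 setup.py install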

@boltgolt
Owner

Nice work! Does it speed up the per-frame recognition times or the face_recognition import? How much does installing OpenBLAS add to the installation time?

@dmig
Contributor

dmig commented Nov 19, 2018

I can't tell how much, since I didn't measure. I'll do that later. The libraries are installed as packages from the Ubuntu repos.

@sapjunior

Maybe we should build howdy as a background service, since most of the time is spent loading the library. The service would wake up the camera upon an authentication request.

@dmig
Contributor

dmig commented Nov 20, 2018

That's possible, but I don't like the approach. I'll try to reduce the amount of data loaded. It's also possible to rely on the filesystem cache to keep the libraries in memory most of the time.

@boltgolt
Owner

I've also been thinking about offloading the actual recognition to an (optional) daemon. However, compare.py uses about 217MB of memory right now, which I think is a bit much to keep reserved. If we could do it through a cache, it would be the best of both worlds.

There's also the possibility of multithreading. Right now Howdy uses just 1 core and pushes it to maximum load, but doesn't touch any of the other 7 on my machine because everything runs in a single thread. I think splitting the recognition into a separate process would also allow us to start multiple instances, increasing the number of frames we can process.
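As a rough illustration of that idea (with stand-in helpers, not Howdy's actual code), multiprocessing would let each frame be searched on its own core:

    import multiprocessing as mp

    def scan_frame(frame):
        # stand-in for the real per-frame work (face detection plus
        # encoding comparison); returns a fake certainty value
        return sum(frame) % 7

    if __name__ == "__main__":
        frames = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # stand-ins for camera frames
        # one worker per core: several frames get searched at once
        with mp.Pool() as pool:
            certainties = pool.map(scan_frame, frames)
        print(min(certainties))  # in Howdy, a lower certainty is a better match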

@dmig
Contributor

dmig commented Nov 21, 2018

@boltgolt that's what I'm addressing right now. It's possible to reduce the memory footprint, though probably not by much; I need to test.
Can you tell me why you're using 2 versions of Python at the same time? I see no reason for that.

@boltgolt
Owner

Howdy depends on pam_python, which handles the conversation with the central authentication system. It's a bit outdated though and can only run python2 scripts, which is why pam.py runs in a different Python version than the rest of the script.

I'm looking at alternatives in #9. In my opinion a C++ PAM module that starts compare.py would be the most efficient solution.

@dmig
Contributor

dmig commented Nov 21, 2018

Ok, let me ask another way: why do you use python3? Python2 would be sufficient.

@boltgolt
Owner

boltgolt commented Nov 21, 2018

Python2 will be dropped in 2020, about a year from now. I don't think it's very future-proof, and all python packages should be moving to 3 relatively soon.

@dmig
Contributor

dmig commented Nov 21, 2018

Until pam_python gets updated to python3 in the system repo, we can stick to 2.7. No need to waste time/RAM spawning a python3 interpreter.

@boltgolt
Owner

boltgolt commented Nov 21, 2018

I don't particularly like the idea of rewriting everything except pam.py in python2, which is slowly fading away. A more structural move would be to ditch the pam_python part and go directly from C++ to python3.

@dmig
Contributor

dmig commented Nov 21, 2018

You mean pam_exec? It's installed by default, so that's one dependency fewer.

@boltgolt
Owner

No, I mean writing a custom PAM module for Howdy alone. The major upside is that it should also be able to ask for password input simultaneously while Howdy is running.

@dmig
Contributor

dmig commented Nov 21, 2018

So why stick to python code then?
Rewritten completely in C, it would provide the best possible performance.

@boltgolt
Owner

boltgolt commented Nov 21, 2018

The honest answer is that I started this as a small script just for myself, and I'm much more familiar with python. It's the same reason we're still depending on pam_python to this day: I'm not comfortable at all with C++. That doesn't mean we shouldn't switch, but my contributions would be much smaller in C++.

@sapjunior

sapjunior commented Nov 21, 2018

I found a fork of pam_python which supports python3 here: https://github.com/minrk/pamela (used in JupyterHub).

@sapjunior

sapjunior commented Nov 22, 2018

@dmig @boltgolt
Please take a look at my repository https://github.com/sapjunior/dlibFaceRecognition. I wrote a stripped-down version of face_recognition that loads only the necessary dlib models; the original ageitgey/face_recognition also loads unused ones (the CNN detector and the 68-point face landmark predictor). This stripped-down version reduces the total process time (loading library and model + opening the camera + recognition) to only ~1.4s on my notebook. I also applied some optimizations, using numpy broadcasting instead of a simple for-loop to compare faces (sketched after the timings below).

Original
Time spend
Starting up: 297ms
Opening the camera: 357ms
Importing face_recognition: 1028ms
Searching for known face: 2191ms

Resolution
Native: 340x340
Used: 340x340

Improved version
Load Library and Model in 0.7182595729827881 s
Open Camera in 0.36762571334838867 s
Winning Model 0 Distance 0.22640754844688699
Face Recognition in 0.29424214363098145 s
Total Time 1.3801851272583008 s
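The broadcasting optimization mentioned above boils down to computing the distance to every known encoding in one vectorized call instead of a python-level loop; a minimal sketch with random stand-in data:

    import numpy as np

    # stand-ins: 5 stored 128-d face encodings plus one freshly captured encoding
    known_encodings = np.random.rand(5, 128)
    new_encoding = np.random.rand(128)

    # broadcasting subtracts new_encoding from every row at once; the
    # per-row euclidean norm then yields all face distances in one call
    distances = np.linalg.norm(known_encodings - new_encoding, axis=1)
    best = int(np.argmin(distances))
    print(best, distances[best])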

@dmig
Contributor

dmig commented Nov 22, 2018

@sapjunior great! I'm looking into ditching the face_recognition packages completely, because they are just dlib examples wrapped in a package.

@dmig
Contributor

dmig commented Nov 28, 2018

Ok, tests finished. Here is a small report.

Test HW: Core i7-8550U, 8GB, SSD (SATA, 6 Gbit/s), no GPU acceleration used (unfortunately, dlib supports neither OpenCL itself nor any OpenCL-backed libraries).
Test software:

  • Ubuntu 18.10 x64 with kernel 4.18.19
  • 6 virtualenvs (python2 and python3) with different dlib builds and otherwise identical dependencies
  • 200 sample images from an IR webcam (face recognition fails on ~60 of them), no dark frames
  • face models from a Howdy installation
  • every test was repeated 10 times

I built dlib with the following configurations:

Results:
[image: timing graphs, one per test]

Each graph represents the numbers for a single test. Bars denote median values; smaller is better.
All numbers are in milliseconds unless noted otherwise. Blue bars are Python2, red are Python3.

Conclusion:
I was surprised by the atlas results and would recommend adding it as a package dependency or installing it at build time. I wouldn't recommend the Intel MKL libs because of their size and some packaging problems.

More tests needed:

  • on CUDA enabled hardware
  • possibly with other dlib build flags on older CPUs without AVX/SSE4 support -- a different configuration may win there

If someone wants to repeat the tests, I'll write some instructions and publish all the scripts.

@dmig
Contributor

dmig commented Nov 28, 2018

Next step: get rid of the face_recognition packages and reduce load time.

@dmig
Contributor

dmig commented Nov 29, 2018

More test results. I've rewritten the test the same way as @sapjunior's example.
[image: timing graphs]

Face recognition was measured differently here, so face search + face recognition together still take the same time.

Most notable change: memory consumption is reduced by ~70MB.

@boltgolt
Owner

Amazing work man! Very nice and readable, with great graphs.

I agree that atlas is the obvious choice looking at the data. MKL also looks quite good in a lot of tests, but I trust your judgement.

Would adding atlas be as simple (at least for Ubuntu) as adding libatlas3-base to the dependencies and a -DDLIB_USE_BLAS flag to the dlib build command?

@sapjunior

sapjunior commented Dec 1, 2018

I think ATLAS is the best choice, because MKL is quite large and difficult to install. By the way, I'm trying to port MobileFaceNet from https://github.com/deepinsight/insightface using the OpenCV 4.0 ONNX dnn module. This network is quite fast (~40ms inference time) and looks promising in terms of accuracy as well.

@dmig
Contributor

dmig commented Dec 1, 2018

@boltgolt all these libraries are mutually exclusive. There is no flag to tell dlib to use a particular one; there is just a simple routine that detects which libraries are present. Moreover, these libraries use the alternatives mechanism to provide libblas.so and liblapack.so, so installing another one may break existing software.
IMO a better way would be to add a line to debian/control:

Recommends: libatlas-base-dev | libopenblas-dev | liblapack-dev
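For the curious, the alternatives mechanism can be inspected and switched by hand; a sketch in which the alternative name (libblas.so.3-x86_64-linux-gnu) is how it appears on recent Ubuntu releases and varies by release and architecture:

    # list the registered BLAS implementations
    update-alternatives --list libblas.so.3-x86_64-linux-gnu
    # pick one interactively, e.g. ATLAS or OpenBLAS
    sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu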

@dmig
Contributor

dmig commented Dec 1, 2018

@sapjunior MKL is not really difficult to install -- it is available from the system repositories, but one of its dependencies is packaged the wrong way, so one extra package must be installed manually.
The overall size of all the libmkl-dev dependencies is huge, and this may increase load time on an HDD.

@dmig
Contributor

dmig commented Dec 1, 2018

BTW, this may help HDD owners:

  • https://github.com/kokoko3k/gopreload -- a nice and simple preloader; add the installed dlib and face_recognition_models (especially!) to its preload list
  • the old preload daemon -- I didn't find a way to point it at dlib/face_recognition_models, so let's hope it picks them up automatically

@boltgolt added this to the Release 2.5.0 milestone Dec 2, 2018
@boltgolt
Owner

boltgolt commented Dec 2, 2018

I'll add them as recommended packages for Debian in the next release, thanks!

I'd say we add the optimized face_recognition @sapjunior made directly into the code here. Then we either keep howdy in memory as a whole, or just the .dat files.

@dmig
Contributor

dmig commented Dec 7, 2018

I'd like to hear some opinions about the face_recognition_models package, which contains 4 .dat files. It's simple to install, but we only ever need 2 of the 4 files at a time: 1 common file, plus a different file depending on CNN or HOG recognition.

I see 2 options: leave it as a dependency, or ditch it and download the required .dat files from https://github.com/davisking/dlib-models (their origin)?

Right now I lean towards the second, because we already download dlib itself.
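If the second option wins, fetching the models could look like the sketch below; the repo publishes bz2-compressed .dat files, but the URL pattern and the exact set of files needed for HOG vs CNN mode are assumptions to verify:

    base=https://github.com/davisking/dlib-models/raw/master
    # common file: the 128-d face encoder used by both modes
    wget $base/dlib_face_recognition_resnet_model_v1.dat.bz2
    # landmark predictor for the HOG path (the CNN path would need
    # mmod_human_face_detector.dat.bz2 instead)
    wget $base/shape_predictor_5_face_landmarks.dat.bz2
    bunzip2 *.dat.bz2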

@boltgolt
Owner

boltgolt commented Dec 7, 2018

If we ditch face_recognition completely, we could implement the code from @sapjunior I mentioned in my last comment and just ship the +/- 30MB of data files in the package itself. That would eliminate about 2/3 of the required space and keep everything easy to install.

@dmig
Contributor

dmig commented Dec 8, 2018

Ok, great. I've made the modifications to the code and am testing them right now.

@dmig
Contributor

dmig commented Dec 8, 2018

Also, I added an option to use CNN recognition. The test runs 12-15 times slower on my hardware and consumes 200-600MB of RAM (as opposed to 120MB for HOG), but maybe someone with CUDA enabled will find it useful.
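For context, switching between the two detector flavours in dlib is a one-line difference; a minimal sketch, assuming the mmod_human_face_detector.dat model file from dlib-models is present locally:

    import dlib

    # HOG detector: CPU-friendly, bundled inside dlib itself
    hog_detector = dlib.get_frontal_face_detector()

    # CNN detector: more robust, but an order of magnitude slower on CPU;
    # mainly worthwhile with a CUDA-enabled dlib build
    cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

    # both are invoked the same way on an image array:
    #   faces = hog_detector(frame, 1)
    #   faces = cnn_detector(frame, 1)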

@dmig
Contributor

dmig commented Dec 9, 2018

Ok, the new code is in my repository: https://github.com/dmig/howdy. It's still not thoroughly tested, so I'm not making a PR yet.
Authentication is now almost as fast as Windows Hello.

I want to test a pure python2 version of the pam module: this should reduce RAM usage by 8-16MB and slightly reduce startup time. The subprocess.call() in pam.py is probably a blocker for #9.

@boltgolt
Owner

boltgolt commented Dec 9, 2018

Changes look very nice! The speedup would be very welcome, and completely dropping face_recognition is a huge improvement. Open a PR whenever you're ready.

I'm still against moving to python2 though. It's a step backwards in tech that 8 to 16 MB of memory doesn't compensate for. Still working on a PAM module in C++ that should probably speed up startup time in a similar fashion.

@dmig
Contributor

dmig commented Dec 9, 2018

I'm still against moving to python2 though. It's a step backwards in tech that 8 to 16 MB of memory doesn't compensate for.

Python2 would not be a step backward, but a removal of ad-hoc code. Also, the python code could be modified to run on both versions.

Still working on a PAM module in C++ that should probably speed up startup time in a similar fashion.

That would be great to have, and it would give the same memory/startup improvements.

@dmig
Contributor

dmig commented Dec 11, 2018

I've managed to reduce auth time to less than 1 second:

$ sudo ls
Time spent
  Starting up: 99ms
  Open cam + load libs: 690ms
    Opening the camera: 690ms
    Importing recognition libs: 548ms
  Searching for known face: 161ms

Resolution
  Native: 374x340
  Used: 374x340

Frames searched: 2 (12.45 fps)
Dark frames ignored: 0 
Certainty of winning frame: 3.431
Winning model: 3 ("Model #4")
Identified face as dmig
...

A small gain compared to the previous result, around 250ms:

Time spent
  Starting up: 101ms
  Opening the camera: 434ms
  Importing libs: 535ms
  Searching for known face: 1447ms

Some more tests and I'll make a PR.

@boltgolt
Owner

boltgolt commented Dec 11, 2018

That's almost exactly as long as Windows Hello (on my machine), very nice work!

@dmig
Contributor

dmig commented Dec 11, 2018

That wasn't the final result. This probably is:

$ sudo ls
Time spent
  Starting up: 95ms
  Open cam + load libs: 434ms
    Opening the camera: 434ms
    Importing recognition libs: 392ms
  Searching for known face: 63ms

Resolution
  Native: 374x340
  Used: 374x340

Frames searched: 2 (31.61 fps)
Dark frames ignored: 1 
Certainty of winning frame: 2.641
Winning model: 4 ("Model #5")
Identified face as dmig
autocomplete  debian  LICENSE  README.md  src

@timwelch

I installed your branch to see what might need to change for compatibility with the ffmpeg work I've been doing. It looks like you call a video_capture.grab() function instead of .read(). In my ffmpeg_reader.py class, we'll need to add that function as a sort of redirect to .read():

	def grab(self):
		""" Redirect grab() to read() for compatibility; grab() should
		return a success flag, so pass read()'s flag through. """
		ret, _ = self.read()  # assumes read() returns (ret, frame)
		return ret

Other than that it seems to work, and although the ffmpeg methods are slower than OpenCV, the new face recognition saves around 500ms on average off the total time on my machine.

Kudos for the hard work!

@timwelch

timwelch commented Dec 16, 2018

In the spirit of speeding things up... I found a way to ditch ffmpeg, which is horrendously slow, by using a pure-python v4l2 implementation instead (for my own HP IR camera). :-)

I ran a handful of tests. :-)

  • Note 1: I haven't compiled dlib with any tweaks, if you did that along the way; I only pulled down your git repo and started using it. So if there are more speedups to be had from that, this could be even faster!

  • Note 2: Starting a new thread to load the new face recognition does make things faster overall, but for some reason it slows down the rest of the code. Initially the pyv4l2 loading took 226ms, but running in parallel with the face recognition imports (I'm imagining) nearly doubles that to 475ms. Not quite sure what to make of that. If I remove the threading and run the code as before, loading the face recognition after the pyv4l2 camera open, the total time is ~890ms. (The arrangement is sketched right after these notes.)

  • Note 3: Taking the threading idea further, I ran a test pulling the actual video_capture initialization into a thread as well... That slowed things down more. So there is definitely a cost to threading, with diminishing returns.
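The arrangement from Note 2 looks roughly like the sketch below, with stand-in sleeps in place of the real imports and camera setup:

    import threading
    import time

    libs = {}

    def import_recognition_libs():
        # stand-in for the heavy dlib / model imports
        time.sleep(0.5)
        libs["ready"] = True

    def open_camera():
        # stand-in for the pyv4l2 camera initialisation
        time.sleep(0.2)

    loader = threading.Thread(target=import_recognition_libs)
    loader.start()   # heavy imports proceed in the background ...
    open_camera()    # ... while the camera opens in the main thread
    loader.join()    # wait for the imports before searching frames
    assert libs["ready"]

The slowdown of the camera open in Note 2 may simply be GIL contention: a real import executes a lot of python bytecode, which competes with the main thread for the interpreter lock.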

Original code with FFMPEG class (2210 ms):

-> sudo echo original-with-ffmpeg
Time spent
  Starting up: 118ms
  Open cam + load libs: 1007ms
    Opening the camera: 1007ms
    Importing recognition libs: 988ms
  Searching for known face: 97ms
  Total time: 2210ms

Resolution
  Native: 352x352
  Used: 320x320

Frames searched: 2 (20.60 fps)
Dark frames ignored: 1 
Certainty of winning frame: 2.583
Winning model: 0 ("Initial model")
Identified face as tim
original-with-ffmpeg

Original code with new pyv4l2 class (1565 ms):

-> sudo echo original-with-pyv4l2
Time spent
  Starting up: 117ms
  Open cam + load libs: 1124ms
    Opening the camera: 226ms
    Importing recognition libs: 1124ms
  Searching for known face: 98ms
  Total time: 1565ms

Resolution
  Native: 352x352
  Used: 320x320

Frames searched: 1 (10.22 fps)
Dark frames ignored: 0 
Certainty of winning frame: 3.228
Winning model: 0 ("Initial model")
Identified face as tim
original-with-pyv4l2

New recog code with new pyv4l2 class WITHOUT threading (885 ms):

-> sudo echo new-recog-with-pyv4l2
Time spent
  Starting up: 159ms
  Open cam + load libs: 382ms
    Opening the camera: 257ms
    Importing recognition libs: 382ms
  Searching for known face: 87ms
  Total time: 885ms

Resolution
  Native: 352x352
  Used: 320x320

Frames searched: 2 (22.98 fps)
Dark frames ignored: 1 
Certainty of winning frame: 2.450
Winning model: 1 ("Model #2")
Identified face as tim
new-recog-with-pyv4l2

New recog code with new pyv4l2 class WITH THREADING (748 ms):

-> sudo echo new-recog-with-pyv4l2
Time spent
  Starting up: 139ms
  Open cam + load libs: 522ms
    Opening the camera: 475ms
    Importing recognition libs: 522ms
  Searching for known face: 87ms
  Total time: 748ms

Resolution
  Native: 352x352
  Used: 320x320

Frames searched: 2 (22.98 fps)
Dark frames ignored: 1 
Certainty of winning frame: 2.542
Winning model: 1 ("Model #2")
Identified face as tim
new-recog-with-pyv4l2

@boltgolt
Owner

Looks even better than the FFmpeg option! Again, very nice work!

The performance improvement is definitely worth it, so I'd add it as a new recorder, just like FFmpeg. I think it's best to keep the FFmpeg option available as it is now on the dev branch, for compatibility with non-v4l devices among other things.

@boltgolt
Owner

boltgolt commented Jan 2, 2019

Sorry for the delay, but the code written by @dmig has now been merged into the dev branch. There were a few issues (a deprecated GitHub API endpoint, for instance), but those are fixed now. The speed upgrade is incredible: most of my tests succeeded in less than 0.3 seconds, which is just amazing.

Because this thread was mostly about those changes and the author's goal (<2 seconds) has been met, I'm closing this issue.
Thanks again all for your work!
