
face landmark tracking over video #1

Open · stanchiang opened this issue Oct 25, 2017 · 17 comments

@stanchiang

Hi, I'm playing with the Vision framework and can use the face landmark feature to get the position of facial features in real time. However, I have to run the detector on every frame, which makes the real-time face mask jittery.

Any ideas on how we could optimize the landmark detection in a real-time feed using only the iOS frameworks?

FYI, I tried the object tracker, but it wasn't as impressive as it could be. Maybe you've had better luck?

Thanks

@Willjay90
Owner

Willjay90 commented Oct 25, 2017

Not a big problem; all you have to do is use VNDetectFaceLandmarksRequest and handle the landmarks it finds.

I'll update a new version, you can check it out :)

BTW, I get every frame in real time from AVCaptureVideoDataOutputSampleBufferDelegate, and that's where I perform my VNRequest.

If you want to read a saved video, that's another story, but you can run the same VNRequest!
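
For reference, a minimal sketch of that per-frame setup (assuming a `ViewController` that owns the capture session; `handleLandmarks` is a placeholder for whatever drawing you do):

```swift
import AVFoundation
import Vision

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Each camera frame arrives here as a CMSampleBuffer.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // One landmarks request per frame; results are VNFaceObservations
        // whose .landmarks property holds the facial-feature regions.
        let request = VNDetectFaceLandmarksRequest { request, _ in
            guard let faces = request.results as? [VNFaceObservation] else { return }
            self.handleLandmarks(faces) // placeholder: draw the landmarks
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}
```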

@stanchiang
Author

Yeah, I'm already doing that. But I have the request run on every new frame, so the tracking is jittery from frame to frame because we aren't using any data from the previous frame.

@stanchiang
Author

Doing a new detection on every frame is different from detecting once and then tracking subsequent frames. I can already do the former; I'm trying to see if the latter is possible in the Vision framework, because it would perform more smoothly.
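
Roughly what I mean, as an untested sketch using Vision's tracking API: one face detection seeds a VNTrackObjectRequest, and a VNSequenceRequestHandler updates the box on subsequent frames (the 0.3 confidence threshold is arbitrary):

```swift
import Vision

// Sketch only: detect once, then track, re-detecting when tracking is lost.
final class FaceTracker {
    private let sequenceHandler = VNSequenceRequestHandler()
    private var trackingRequest: VNTrackObjectRequest?

    func process(_ pixelBuffer: CVPixelBuffer) {
        if let tracking = trackingRequest {
            // Subsequent frames: cheap update of the tracked bounding box.
            try? sequenceHandler.perform([tracking], on: pixelBuffer)
            if let face = tracking.results?.first as? VNDetectedObjectObservation,
               face.confidence > 0.3 {
                tracking.inputObservation = face   // feed the result forward
            } else {
                trackingRequest = nil              // lost the face; re-detect
            }
        } else {
            // First frame (or after a loss): run the full, slower detection.
            let detect = VNDetectFaceRectanglesRequest()
            try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
                .perform([detect])
            if let face = detect.results?.first as? VNFaceObservation {
                trackingRequest = VNTrackObjectRequest(detectedObjectObservation: face)
            }
        }
    }
}
```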

@Willjay90
Owner

Are you doing detection on a saved video?

@stanchiang
Author

No, real time.

@Willjay90
Owner

Oh, I get it. You want to do something like motion tracking (the smooth way) instead of just re-detecting every single frame.

@stanchiang
Author

yea :)

@stanchiang
Author

I think we want something like
https://github.com/HalfdanJ/ofxFaceTracker2:
"The face detection in ofxFT2 is considerably slower than ofxFT, but it can easily run on a background thread. The landmark detection (finding the actual details on the face) is faster and more robust."

or https://github.com/hrastnik/face_detect_n_track, but with face landmark detection included:
"The algorithm I came up with is a hybrid using Haar cascades with template matching as a fallback when Haar fails."

Based on https://developer.apple.com/documentation/vision/vndetectfacelandmarksrequest, splitting the face detection (slow) from the landmark detection (fast) seems possible:
"If you've already located all the faces in an image, or want to detect landmarks in only a subset of the faces in the image, set the inputFaceObservations"

@Willjay90
Owner

Willjay90 commented Oct 26, 2017

There are a lot of vision libraries, including the Google Vision API. I don't know exactly how the Vision framework differs from these libraries, but these APIs are all based on single-image input: you have to feed in an image every single time/frame.

Also, I tested the demo video from dlib C++ with my app (updated, with landmarks). It works pretty well.

The VNDetectFaceLandmarksRequest documentation just says that you should

either use face observations output by a VNDetectFaceRectanglesRequest or manually create VNFaceObservation instances with the bounding boxes of the faces you want to analyze.

In my project, I just take the first option. You can definitely run it on a background thread. However, if you want to update the UI (draw landmarks on screen), you have to do that on the main thread.
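
A sketch of that threading split, inside the delegate class, assuming the sample-buffer delegate was registered on a background queue (e.g. `output.setSampleBufferDelegate(self, queue: visionQueue)`); `drawLandmarks` is a placeholder:

```swift
import AVFoundation
import Vision

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Already off the main thread: the delegate queue is a background queue.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    let request = VNDetectFaceLandmarksRequest()
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        .perform([request])
    let faces = request.results as? [VNFaceObservation] ?? []

    // UI work must hop back to the main queue.
    DispatchQueue.main.async {
        drawLandmarks(faces) // placeholder for the drawing code
    }
}
```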

@shaibt

shaibt commented May 17, 2018

For real-time video you also need to take into account the difference between the video frame rate and the Vision API sample rate.
Some of the jitter you're experiencing could also be caused by the face bounding box / landmark updates running at a lower frequency than the video; that's a performance issue. For 30 fps you need the Vision API to update in less than 0.033 sec, plus you need to account for the context switch to the main queue for drawing.
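
One way to check whether you're within that budget (a sketch; CACurrentMediaTime is just a monotonic clock suitable for short intervals):

```swift
import QuartzCore
import Vision

// Measure how long one Vision pass takes and compare it against the
// 30 fps frame budget of 1/30 ≈ 0.033 s.
func timedDetect(_ pixelBuffer: CVPixelBuffer) {
    let request = VNDetectFaceLandmarksRequest()
    let start = CACurrentMediaTime()
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        .perform([request])
    let elapsed = CACurrentMediaTime() - start
    if elapsed > 1.0 / 30.0 {
        print("Too slow for 30 fps: \(Int(elapsed * 1000)) ms per frame")
    }
}
```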

@Onotoko

Onotoko commented May 23, 2018

Hi shaibt,
I'm new to iOS development, so could you please show me how to do this: "For 30fps you need the Vision API to update in less than 0.033 sec + take into account context switch to main Q for drawing"?
Thanks, shaibt

@shaibt

shaibt commented May 23, 2018

Hi hanoi2018,

To be clear, what I meant is that your device has to perform face detection in under 1/30 sec for it to run on all frames at 30 fps; it mainly depends on your device's processing power and Apple's SW/HW optimizations. There is probably little you can do yourself to achieve it.
I haven't tested with an iPhone X/8 yet, so I don't know what the peak Vision API performance is on those devices.

@Onotoko

Onotoko commented May 23, 2018

Hi shaibt, thanks for responding.

@ailias

ailias commented Jul 24, 2018

Have you achieved your goal of dealing with the jitter? I hope you can share your method. Thank you.

@Onotoko

Onotoko commented Jul 24, 2018

Hi ailias, I haven't achieved it yet.

@MiraMirror

@stanchiang
Hello, have you had any luck with this issue? I read through this thread and tried using the previous frame's face rectangle for the analysis of the next frame, but the result is not satisfying.

@pablogeek

Same issue here. I'm processing every CVPixelBuffer with VNDetectFaceRectanglesRequest and saving it to disk after applying a blur filter. This works really well on an iPhone XS, but it doesn't perform well on a regular iPad. Any recommendations?
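
For context, this is roughly the per-frame step, as a simplified sketch (my real code renders the result back into a buffer before writing to disk):

```swift
import CoreImage
import Vision

// Sketch: blur every detected face region in one frame.
func blurFaces(in pixelBuffer: CVPixelBuffer) -> CIImage {
    let image = CIImage(cvPixelBuffer: pixelBuffer)

    let detect = VNDetectFaceRectanglesRequest()
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        .perform([detect])

    var output = image
    for case let face as VNFaceObservation in detect.results ?? [] {
        // Vision boxes are normalized [0, 1]; convert to pixel coordinates.
        let box = VNImageRectForNormalizedRect(face.boundingBox,
                                               Int(image.extent.width),
                                               Int(image.extent.height))
        let blurredFace = image
            .clampedToExtent()
            .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: 20])
            .cropped(to: box)
        output = blurredFace.composited(over: output)
    }
    return output
}
```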
