
Improve dlib (dlib_face_recognition_resnet_model_v1) with Asian faces #1407

Closed
gustavomr opened this issue Jul 11, 2018 · 38 comments

@gustavomr

Hi,

We saw that dlib has some accuracy problems with Asian faces. Could you retrain dlib to get better results by including this dataset (http://afad-dataset.github.io)?

Thanks.

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

@davisking what do you mean by identity information? I don't understand.

Thanks.

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

I got it! I think this dataset has a bunch of images of different people.
Do you have any plans to make dlib more accurate on Asian people?

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

How much data do you need (per person, and how many people)?
e.g. 1000 people and 100 images per person?

@davisking
Owner

davisking commented Jul 11, 2018 via email

@PawanWagh

I am willing to provide the data, because I have around 10k+ unique identities, but only about 10 faces each. Could we do some kind of image transformation to generate the required number of images from these 10?
Is that sufficient, or do you need a more precise dataset?
Thanks.
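(For reference: simple geometric and photometric transformations can stretch 10 images per identity into many more, though, as the discussion above suggests, metric learning benefits most from many distinct identities rather than more copies of the same faces. A rough NumPy sketch of such augmentation; `augment` is a hypothetical helper for illustration, not a dlib API:)

```python
import numpy as np

def augment(image, n_variants, seed=0):
    """Generate n_variants augmented copies of a face crop (H x W x 3 uint8).

    A minimal sketch: mirroring, random cropping, and brightness jitter.
    Real pipelines add rotations, color shifts, synthetic occlusions, etc.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    variants = []
    for _ in range(n_variants):
        img = image.copy()
        if rng.random() < 0.5:
            img = img[:, ::-1]                    # horizontal mirror
        dy = int(rng.integers(0, h // 10 + 1))    # crop up to ~10% per side
        dx = int(rng.integers(0, w // 10 + 1))
        img = img[dy:h - dy, dx:w - dx]
        img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)  # brightness jitter
        variants.append(img.astype(np.uint8))
    return variants
```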

@PawanWagh

I have already trained a model for Asian faces with 98.18% accuracy. This is the output after training:

saving network
Testing network on imagenet validation dataset...
val top5 accuracy:  1
val top1 accuracy:  0.981811

but I am not able to use this model. If it succeeds, I am willing to train on a larger and larger dataset and make it available here. Please see issue #1368 if you can help me.

@dlib-issue-bot
Collaborator

Warning: this issue has been inactive for 55 days and will be automatically closed on 2018-09-07 if there is no further activity.

If you are waiting for a response but haven't received one it's likely your question is somehow inappropriate. E.g. you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's documentation, or a Google search.

@dlib-issue-bot
Collaborator

Notice: this issue has been closed because it has been inactive for 58 days. You may reopen this issue if it has been closed in error.

@helloall1900

@davisking Will this help?
deepinsight/insightface#256

@gustavomr
Author

gustavomr commented Sep 26, 2018

@davisking
Owner

That sounds useful, although the website appears to be down.

@gustavomr
Author

@davisking the website is slow, but it's working. I just tested it.

@gustavomr
Author

@davisking Hi, could you consider training and evaluating dlib with this dataset? Has anyone tried to do this?

@JamieKitson

It looks like the same dataset is available here (as "Glint") amongst others:

https://github.com/deepinsight/insightface/wiki/Dataset-Zoo

@JamieKitson

Once you register and log in to the trillionpairs website, you are directed to download the data from:

https://drive.google.com/drive/folders/1ADcZugpo8Z6o5q1p2tIAibwhsL8DcVwH

@davisking
Owner

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@JamieKitson

@davisking Which step(s) (and/or specific models) in the following process will the retraining help with?

face detection -> encoding -> clustering

I realised that I had been using sklearn's DBSCAN and that once I switched to dlib's Chinese whispers algorithm the results were much better, at least for male Asian faces. Results for female Asian faces were still pretty bad.
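(For reference, dlib's Python bindings expose this clustering step as `dlib.chinese_whispers_clustering(descriptors, threshold)`. The core idea can be sketched in pure Python as a toy re-implementation, not dlib's actual code: build a graph linking descriptors closer than a threshold, then let every node repeatedly adopt the most common label among its neighbors.)

```python
import random

def chinese_whispers(descriptors, threshold=0.6, iterations=20, seed=0):
    """Toy Chinese whispers clustering over a thresholded distance graph.

    descriptors: list of equal-length numeric tuples/lists.
    Returns one cluster label per descriptor.
    """
    rng = random.Random(seed)
    n = len(descriptors)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Link every pair of descriptors closer than the threshold.
    neighbors = [[j for j in range(n)
                  if j != i and dist(descriptors[i], descriptors[j]) < threshold]
                 for i in range(n)]
    labels = list(range(n))          # start: every node is its own cluster
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)
        for i in order:
            if neighbors[i]:
                counts = {}
                for j in neighbors[i]:
                    counts[labels[j]] = counts.get(labels[j], 0) + 1
                labels[i] = max(counts, key=counts.get)  # adopt majority label
    return labels
```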

@davisking
Owner

We aren't talking about improving the detector. This dataset would make the part that answers questions like "are these two images the same person" more accurate.
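(In other words, the recognition model maps each aligned face to a 128-D descriptor, and verification is just a distance check. A minimal sketch assuming the descriptors have already been computed; dlib's model notes suggest roughly 0.6 as the Euclidean distance threshold for "same person":)

```python
import math

# dlib's face_recognition_model_v1 maps an aligned face chip to a
# 128-D vector; distances under ~0.6 are commonly treated as a match.
SAME_PERSON_THRESHOLD = 0.6

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(desc_a, desc_b, threshold=SAME_PERSON_THRESHOLD):
    """Answer 'are these two face descriptors the same person?'"""
    return euclidean(desc_a, desc_b) < threshold
```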

@gustavomr
Author

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@davisking any plans to have it done?

@davisking
Owner

I'll get to it when I get to it.

@ASShusharin

Hi Davis,
Is there any progress on training a model for Asian faces?

@davisking
Owner

I still haven't gotten to it. I have many other responsibilities and making this model is not super high on my priority list.

@bertytobing

Could you tell me the step-by-step process to retrain the model with additional Asian faces? Thank you.

@davisking
Owner

davisking commented Feb 1, 2019 via email

@bertytobing

Can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking

@sleebapaul

@davisking "The Diversity in Faces (DiF) is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of 1 million human facial images." - from IBM

Link to dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/

Link to paper: https://www.research.ibm.com/artificial-intelligence/publications/paper/?id=Diversity-in-Faces

@srego

srego commented Apr 22, 2019

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@davisking Thank you so much for putting all of this together! Would it be possible for you to share your entire dataset? Thanks.

@davisking
Owner

davisking commented Apr 23, 2019 via email

@ybloch

ybloch commented May 7, 2019

Hello everyone, I'm also facing the same problem right now. Has anyone here extended @davisking's model with Asian faces?

@fabner

fabner commented Jun 5, 2019

Yes, this is an issue I'm running into also. The model produces a lot of false positives when comparing Asian faces.

@benitofe

Same question... Can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking

@tiago-alves

tiago-alves commented Oct 10, 2019

I trained a model using the dnn_metric_learning_on_images_ex.cpp code. I only increased the iterations-without-progress limit to 10000.

I used a dataset with 63K identities that is a combination of VGG2, Asian Celeb, and the cleaned Microsoft dataset. I also did some manual cleaning. All identities have at least 50 images.

The result was not good. I downloaded the FaceScrub dataset and generated 5K random genuine pairs and 5K random distractor pairs. EER, FMR100, and FMR1000 were all worse on this new model.

I don't know what went wrong, since I was expecting a good improvement.

@davisking , do you have any advice on this? Should I increase the batch size (this would increase training time a lot)?
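(For anyone reproducing this kind of evaluation: EER, FMR100, and FMR1000 can be computed directly from the genuine and impostor distance scores. A rough sketch under the usual definitions, where FMR100/FMR1000 are the lowest false non-match rates at thresholds with a false match rate of at most 1%/0.1%; this is an illustrative helper, not code from the thread:)

```python
def eval_metrics(genuine, impostor):
    """Compute (EER, FMR100, FMR1000) from distance scores.

    genuine / impostor: distances for mated / non-mated pairs
    (lower = more similar). Sweeps every observed score as a threshold.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    eer = fmr100 = fmr1000 = 1.0
    best_gap = float("inf")
    for t in thresholds:
        fmr = sum(d <= t for d in impostor) / len(impostor)  # false matches
        fnmr = sum(d > t for d in genuine) / len(genuine)    # false non-matches
        if abs(fmr - fnmr) < best_gap:                       # EER: FMR ~= FNMR
            best_gap = abs(fmr - fnmr)
            eer = (fmr + fnmr) / 2
        if fmr <= 0.01:
            fmr100 = min(fmr100, fnmr)
        if fmr <= 0.001:
            fmr1000 = min(fmr1000, fnmr)
    return eer, fmr100, fmr1000
```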

@basit26374

basit26374 commented Apr 14, 2020

@tiago-alves How long did the training take, and which hardware (GPU and CPU specifications) did you use?

@JoeQian

JoeQian commented Apr 29, 2020

I trained a model using the dnn_metric_learning_on_images_ex.cpp code. I only increased the iterations-without-progress limit to 10000.

I used a dataset with 63K identities that is a combination of VGG2, Asian Celeb, and the cleaned Microsoft dataset. I also did some manual cleaning. All identities have at least 50 images.

The result was not good. I downloaded the FaceScrub dataset and generated 5K random genuine pairs and 5K random distractor pairs. EER, FMR100, and FMR1000 were all worse on this new model.

I don't know what went wrong, since I was expecting a good improvement.

@davisking , do you have any advice on this? Should I increase the batch size (this would increase training time a lot)?

I remember he said 1000 identities with 100 photos each is the minimum. Any progress?

@tiago-alves

tiago-alves commented Apr 29, 2020

@basit26374 , I used a VM on Google Cloud with 8 vCPUs and 4 GPUs. I implemented data augmentation with glasses/sunglasses and hairstyles, and this adds some extra time for sure. If you are going to train on a big dataset, be prepared to wait several days for it to complete. At some point it becomes very difficult to converge.

@JoeQian , using this approach with the data augmentation I mentioned, and after some more data cleaning, I was able to achieve a big improvement in accuracy (on a benchmark dataset I created).

Basically my advice is: keep getting new images and growing your dataset with different people. I will probably double my dataset size in the next two months. I will let you know once I have new numbers.
