
Improve dlib (dlib_face_recognition_resnet_model_v1) with Asian faces #1407

Closed
gustavomr opened this issue Jul 11, 2018 · 38 comments

@gustavomr

Hi,

We saw that dlib has some accuracy problems with Asian faces. Could you retrain dlib to get better results by including this dataset (http://afad-dataset.github.io)?

Thanks.

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

@davisking what do you mean by identity information? I don't understand.

Thanks.

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

I got it! I think this dataset has a bunch of images of different people.
Do you have any plans to make dlib more accurate on Asian people?

@davisking
Owner

davisking commented Jul 11, 2018 via email

@gustavomr
Author

How much data do you need (per person, and how many people)?
e.g. 1000 people and 100 images per person?

@davisking
Owner

davisking commented Jul 11, 2018 via email

@PawanWagh

I am willing to provide the data, because I have around 10k+ unique identities, but only about 10 faces each. Could we do some kind of image transformation to generate the required number of images from these 10?
Is that sufficient, or do you need a more precise dataset?
Thanks.
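(For reference: simple geometric and photometric transformations can stretch 10 images per identity into many more, though, as the discussion above suggests, metric learning benefits most from many distinct identities rather than more copies of the same faces. A rough NumPy sketch of such augmentation; `augment` is a hypothetical helper for illustration, not a dlib API:)

```python
import numpy as np

def augment(image, n_variants, seed=0):
    """Generate n_variants augmented copies of a face crop (H x W x 3 uint8).

    A minimal sketch: mirroring, random cropping, and brightness jitter.
    Real pipelines add rotations, color shifts, synthetic occlusions, etc.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    variants = []
    for _ in range(n_variants):
        img = image.copy()
        if rng.random() < 0.5:
            img = img[:, ::-1]                    # horizontal mirror
        dy = int(rng.integers(0, h // 10 + 1))    # crop up to ~10% per side
        dx = int(rng.integers(0, w // 10 + 1))
        img = img[dy:h - dy, dx:w - dx]
        img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)  # brightness jitter
        variants.append(img.astype(np.uint8))
    return variants
```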

@PawanWagh

I have already trained a model for Asian faces with 98.18% accuracy. This is the output after training:

saving network
Testing network on imagenet validation dataset...
val top5 accuracy:  1
val top1 accuracy:  0.981811

but I am not able to use this model. If it succeeds, I am willing to train on a larger and larger dataset and make it available here. Please see issue #1368 if you can help me.

@dlib-issue-bot
Collaborator

Warning: this issue has been inactive for 55 days and will be automatically closed on 2018-09-07 if there is no further activity.

If you are waiting for a response but haven't received one it's likely your question is somehow inappropriate. E.g. you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's documentation, or a Google search.

@dlib-issue-bot
Collaborator

Notice: this issue has been closed because it has been inactive for 58 days. You may reopen this issue if it has been closed in error.

@helloall1900

@davisking Will this help?
deepinsight/insightface#256

@gustavomr
Author

gustavomr commented Sep 26, 2018

@davisking
Owner

That sounds useful, although the website appears to be down.

@gustavomr
Author

@davisking the website is slow, but it's working. I just tested it.

@gustavomr
Author

@davisking Hi, could you consider training and evaluating dlib with this dataset? Has anyone tried to do this?

@JamieKitson

It looks like the same dataset is available here (as "Glint") amongst others:

https://github.com/deepinsight/insightface/wiki/Dataset-Zoo

@JamieKitson

Once you register and log in to the trillionpairs website, you are directed to download the data from:

https://drive.google.com/drive/folders/1ADcZugpo8Z6o5q1p2tIAibwhsL8DcVwH

@davisking
Owner

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@JamieKitson

@davisking Which step(s) (and/or specific models) in the following process will the retraining help with?

face detection -> encoding -> clustering

I realised that I had been using sklearn's DBSCAN and that once I switched to dlib's Chinese whispers algorithm the results were much better, at least for male Asian faces. Results for female Asian faces were still pretty bad.
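(For reference, dlib's Python bindings expose this clustering step as `dlib.chinese_whispers_clustering(descriptors, threshold)`. The core idea can be sketched in pure Python as a toy re-implementation, not dlib's actual code: build a graph linking descriptors closer than a threshold, then let every node repeatedly adopt the most common label among its neighbors.)

```python
import random

def chinese_whispers(descriptors, threshold=0.6, iterations=20, seed=0):
    """Toy Chinese whispers clustering over a thresholded distance graph.

    descriptors: list of equal-length numeric tuples/lists.
    Returns one cluster label per descriptor.
    """
    rng = random.Random(seed)
    n = len(descriptors)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Link every pair of descriptors closer than the threshold.
    neighbors = [[j for j in range(n)
                  if j != i and dist(descriptors[i], descriptors[j]) < threshold]
                 for i in range(n)]
    labels = list(range(n))          # start: every node is its own cluster
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)
        for i in order:
            if neighbors[i]:
                counts = {}
                for j in neighbors[i]:
                    counts[labels[j]] = counts.get(labels[j], 0) + 1
                labels[i] = max(counts, key=counts.get)  # adopt majority label
    return labels
```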

@davisking
Owner

We aren't talking about improving the detector. This dataset would make the part that answers questions like "are these two images the same person" more accurate.
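(In other words, the recognition model maps each aligned face to a 128-D descriptor, and verification is just a distance check. A minimal sketch assuming the descriptors have already been computed; dlib's model notes suggest roughly 0.6 as the Euclidean distance threshold for "same person":)

```python
import math

# dlib's face_recognition_model_v1 maps an aligned face chip to a
# 128-D vector; distances under ~0.6 are commonly treated as a match.
SAME_PERSON_THRESHOLD = 0.6

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(desc_a, desc_b, threshold=SAME_PERSON_THRESHOLD):
    """Answer 'are these two face descriptors the same person?'"""
    return euclidean(desc_a, desc_b) < threshold
```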

@gustavomr
Author

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@davisking any plans to have it done?

@davisking
Owner

I'll get to it when I get to it.

@ASShusharin

Hi Davis,
Is there any progress on training a model for Asian faces?

@davisking
Owner

I still haven't gotten to it. I have many other responsibilities and making this model is not super high on my priority list.

@bertytobing

Could you tell me the step-by-step process to retrain the model with additional Asian faces? Thank you.

@davisking
Owner

davisking commented Feb 1, 2019 via email

@bertytobing

Can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking

@sleebapaul

@davisking "The Diversity in Faces (DiF) is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of 1 million human facial images." - from IBM

Link to dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/

Link to paper: https://www.research.ibm.com/artificial-intelligence/publications/paper/?id=Diversity-in-Faces

@srego

srego commented Apr 22, 2019

I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.

@davisking Thank you so much for putting all of this together! Would it be possible for you to share your entire dataset? Thanks.

@davisking
Owner

davisking commented Apr 23, 2019 via email

@ybloch

ybloch commented May 7, 2019

Hello everyone, I'm also facing the same problem right now. Has anyone here extended @davisking's model with Asian faces?

@fabner

fabner commented Jun 5, 2019

Yes, this is an issue I'm running into also. The model produces a lot of false positives when comparing Asian faces.

@benitofe

Same question... Can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking

@tiago-alves

tiago-alves commented Oct 10, 2019

I trained a model using the dnn_metric_learning_on_images_ex.cpp code. I only increased the iterations-without-progress limit to 10000.

I used a dataset with 63K identities that is a combination of VGG2, Asian Celeb, and the cleaned Microsoft dataset. I also did some manual cleaning. All identities have at least 50 images.

The result was not good. I downloaded the FaceScrub dataset and generated 5K random genuine pairs and 5K random distractor pairs. EER, FMR100, and FMR1000 were all worse on this new model.

I don't know what went wrong, since I was expecting a good improvement.

@davisking , do you have any advice on this? Should I increase the batch size (this would increase training time a lot)?
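(For anyone reproducing this kind of evaluation: EER, FMR100, and FMR1000 can be computed directly from the genuine and impostor distance scores. A rough sketch under the usual definitions, where FMR100/FMR1000 are the lowest false non-match rates at thresholds with a false match rate of at most 1%/0.1%; this is an illustrative helper, not code from the thread:)

```python
def eval_metrics(genuine, impostor):
    """Compute (EER, FMR100, FMR1000) from distance scores.

    genuine / impostor: distances for mated / non-mated pairs
    (lower = more similar). Sweeps every observed score as a threshold.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    eer = fmr100 = fmr1000 = 1.0
    best_gap = float("inf")
    for t in thresholds:
        fmr = sum(d <= t for d in impostor) / len(impostor)  # false matches
        fnmr = sum(d > t for d in genuine) / len(genuine)    # false non-matches
        if abs(fmr - fnmr) < best_gap:                       # EER: FMR ~= FNMR
            best_gap = abs(fmr - fnmr)
            eer = (fmr + fnmr) / 2
        if fmr <= 0.01:
            fmr100 = min(fmr100, fnmr)
        if fmr <= 0.001:
            fmr1000 = min(fmr1000, fnmr)
    return eer, fmr100, fmr1000
```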

@basit26374

basit26374 commented Apr 14, 2020

@tiago-alves How long did the training take, and which hardware (GPU and CPU specifications) did you use?

@JoeQian

JoeQian commented Apr 29, 2020

I trained a model using the dnn_metric_learning_on_images_ex.cpp code. I only increased the iterations-without-progress limit to 10000.

I used a dataset with 63K identities that is a combination of VGG2, Asian Celeb, and the cleaned Microsoft dataset. I also did some manual cleaning. All identities have at least 50 images.

The result was not good. I downloaded the FaceScrub dataset and generated 5K random genuine pairs and 5K random distractor pairs. EER, FMR100, and FMR1000 were all worse on this new model.

I don't know what went wrong, since I was expecting a good improvement.

@davisking , do you have any advice on this? Should I increase the batch size (this would increase training time a lot)?

I remember he said 1000 identities with 100 photos each is the minimum. Any progress?

@tiago-alves

tiago-alves commented Apr 29, 2020

@basit26374 , I used a VM on Google Cloud with 8 vCPUs and 4 GPUs. I implemented data augmentation with glasses/sunglasses and hairstyles, and this adds some extra time for sure. If you are going to train on a big dataset, be prepared to wait several days for it to complete. At some point it becomes very difficult to converge.

@JoeQian , using this approach with the data augmentation I mentioned, and after some more data cleaning, I was able to achieve a big improvement in accuracy (on a benchmark dataset I created).

Basically my advice is: keep getting new images and growing your dataset with different people. I will probably double my dataset size in the next two months. I will let you know once I have new numbers.
