I decided to work on a larger and simpler dataset: CelebA. We reviewed not only the recommended GAN literature (https://paperswithcode.com/paper/progressive-growing-of-gans-for-improved) but also VAEs and beta-VAEs.
For the discriminative model, run https://github.com/Kata5/DeepLearning2021/blob/main/DeepLearning2021_Milestone1_2.ipynb
For the generative model, run https://github.com/Kata5/DeepLearning2021/blob/main/DeepLearning2021Milestone03.ipynb
Please find the results of the VAE parameter optimization in the https://github.com/Kata5/DeepLearning2021/tree/main/VAE_results folder.
The latent dimensions were explored by hand, and the weights of the best model are stored at https://github.com/Kata5/DeepLearning2021/blob/main/VAE_results/vae_best.pt (also uploaded to Google Drive so that it can be downloaded from the notebook as well).
In the first milestone we prepared balanced training / validation / test splits for the later classification task using a splitting function over the dataset. Some basic exploratory data analysis was also performed.
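A splitting function of this kind can be sketched as below. This is a minimal illustration only; the function name, split ratios, and per-label stratification are assumptions, not the exact code from the milestone notebook.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train=0.8, val=0.1, seed=42):
    """Split (sample, label) pairs into balanced train/val/test sets.

    Hypothetical helper: shuffling and splitting is done per label so
    that each split keeps the same label proportions as the full set.
    """
    by_label = defaultdict(list)
    for s, y in zip(samples, labels):
        by_label[y].append(s)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for y, group in by_label.items():
        rng.shuffle(group)
        n_train = int(len(group) * train)
        n_val = int(len(group) * val)
        splits["train"] += [(s, y) for s in group[:n_train]]
        splits["val"] += [(s, y) for s in group[n_train:n_train + n_val]]
        splits["test"] += [(s, y) for s in group[n_train + n_val:]]
    return splits

# Toy usage: 100 samples with two balanced labels.
splits = stratified_split(list(range(100)), [i % 2 for i in range(100)])
```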
In this milestone:
- we built a discriminative model to learn the labels of the dataset - DeepLearning2021_Milestone1_2.ipynb
- a deep generative model (i.e. a variational autoencoder) was trained to learn the features of the faces - DeepLearning2021Milestone02.ipynb (due to a hardware error, the version present at the deadline became corrupted; checking the preceding version shows that the model was working before the deadline as well)
The discriminative model used transfer learning based on an Inception v3 model pretrained on ImageNet; the last layers were trained on the dataset with an early-stopping callback. With the first naive model, a precision of 90% was reached (for more details see the documentation below).
A deep generative model, i.e. a variational autoencoder, was built to learn the features of the faces. To discover the meaning of the latents, a latent traversal was implemented. The latent space of the variational autoencoder consisted of 64 dimensions, and some of them coded interpretable features of the faces, e.g.:
- the 16th dimension seems to code the color of the hair
- the 26th dimension seems to code the existence of bangs
- the 2nd dimension seems to code the rotation of the face
- the 12th dimension seems to code the width of the face
- the 9th dimension seems to code the thickness of the hair (not completely disentangled from the hair color)
1.) Color of the hair
2.) Existence of bangs
3.) Rotation of the face
4.) Thickness of the hair
5.) Width of the face
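A latent traversal like the ones above can be sketched as follows, assuming a PyTorch VAE with a 64-dimensional latent space. The toy linear decoder here is a stand-in for the trained decoder (which would be loaded from vae_best.pt); the image size and the dimension index are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy decoder standing in for the trained VAE decoder; in practice the
# real decoder would be restored from vae_best.pt.
LATENT_DIM = 64
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 64 * 64), nn.Sigmoid())

def latent_traversal(decoder, base_z, dim, values):
    """Decode copies of base_z with one latent dimension swept over values."""
    images = []
    with torch.no_grad():
        for v in values:
            z = base_z.clone()
            z[0, dim] = v          # vary only the chosen dimension
            images.append(decoder(z).reshape(3, 64, 64))
    return torch.stack(images)

# Sweep e.g. the 16th dimension (hair color in the model above) from -3 to 3.
base_z = torch.zeros(1, LATENT_DIM)
frames = latent_traversal(decoder, base_z, dim=16,
                          values=torch.linspace(-3, 3, 7))
```

Plotting the resulting frames side by side produces the traversal strips referenced above, making it visible which factor a given dimension controls.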
Please find the details in https://github.com/Kata5/DeepLearning2021/blob/main/Documentation2.pdf