## Tiny Portraits Project

* A low-resource deep learning/computer vision dataset
* Christian Bracher, Zalando Research
* August-September 2021

### Synopsis

*Tiny Portraits* is an annotated face image dataset sourced from the 
well-known *CelebA* dataset, which has been used in thousands of papers.  We
rendered smaller images (108 x 84 pixels) and retained only a subset of the 
data and attributes, so that meaningful experiments can be conducted even without 
GPU acceleration.  (If you have access to GPUs, by all means feel free to use them in 
your exploration.)

Although the images may look similar, please note that *Tiny Portraits* is not just 
a re-rendering of the 'aligned' version of the original data, but was created 
independently.  Similarly, attribute labels have been processed directly from the
original annotation data.

### Preparation:  Unpacking images

If you have not done so yet, please unpack the image archives first.<br>
A simple tool to do this is included in the repository:
[Unpacking notebook](./Tiny%20Portraits%20-%20Unpack%20Thumbnail%20Images.ipynb)

### Dataset sampler

This notebook gives an overview of the dataset and demonstrates
a default way to access image and attribute information.

* Present random examples (images and attributes)
* Create randomized 'wallpapers' displaying a mosaic of faces

If you wish, you can use the notebook as a starting point for developing your own
algorithms and code.  It is 
[part of the repository](./Tiny%20Portraits%20-%20Sampling%20Notebook.ipynb).

## The Challenge

We are interested in learning how you investigate a set of data and build useful
computer vision tools on it.  Critical thinking is most welcome, as is showcasing
your coding capabilities and knowledge of Deep Learning/Computer Vision algorithms
and packages.  Communication is an important aspect of the job, so please explain
your reasoning and the approaches you choose, and illustrate your findings and
their limitations.  We will give you a lot of leeway in how you tackle the challenge,
but here are some topics you should consider:

* **Dataset properties**<br>
  Examine the peculiar qualities of this dataset: is it balanced, 
  are there outliers, etc.
* **Gender classifier**<br>
  Build a classifier algorithm that learns to assign a gender attribute to a
  face image.  How well does this work?  Does it generalise to other portraits?
* **Hair colour classifier**<br>
  How do you have to change the gender detection model to predict hair colour?
  What remains the same?  What is now different?
* **A fairness problem**<br>
  In many applications, one wants to learn one attribute from data (here, the 
  hair colour) without discriminating on a 'protected' attribute (here, gender).
  In other words, we are looking for a model that, for a given sample, can find other 
  samples with similar hair colour without regard to the gender of the sample.
  Discuss to what degree this may be possible.  Is your hair colour model 
  gender-biased?  Based on your ideas, tweak your model to remove or reduce this bias.
* **Portraits from scratch**<br>
  Sketch how you would build and train a model that can create face thumbnails
  'from scratch' starting from the *Tiny Portraits* dataset.
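
For the dataset balance and fairness questions above, one possible starting point is
to compare how often each class is predicted within each group.  The following is a
minimal sketch in plain Python, not a prescribed approach: the function names
(`class_frequencies`, `prediction_rates_by_group`, `parity_gap`) and the toy labels
are hypothetical and are not taken from the dataset or its notebooks.

```python
from collections import Counter, defaultdict


def class_frequencies(labels):
    """Relative frequency of each label value (a quick balance check)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def prediction_rates_by_group(predictions, groups):
    """For each group value, the fraction of its samples assigned to each class."""
    per_group = defaultdict(list)
    for pred, grp in zip(predictions, groups):
        per_group[grp].append(pred)
    return {grp: class_frequencies(preds) for grp, preds in per_group.items()}


def parity_gap(predictions, groups):
    """Per predicted class: largest minus smallest prediction rate across groups."""
    rates = prediction_rates_by_group(predictions, groups)
    gaps = {}
    for cls in set(predictions):
        # A class never predicted for a group has rate 0.0 in that group.
        cls_rates = [group_rates.get(cls, 0.0) for group_rates in rates.values()]
        gaps[cls] = max(cls_rates) - min(cls_rates)
    return gaps


# Toy data, invented for illustration only (not taken from Tiny Portraits).
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]
hair_pred = ["blond", "blond", "blond", "dark", "blond", "dark", "dark", "dark"]

balance = class_frequencies(hair_pred)
gap = parity_gap(hair_pred, gender)
print(balance)
print(gap)
```

Note that a gap in prediction rates can reflect an imbalance in the data as well as a
bias in the model, so it is worth comparing it against the same statistic computed on
the ground-truth labels.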
  
### How your solution will be evaluated

We already touched on this above, but let us summarize the criteria we're interested in:

* **Code quality**<br>
  Is your code correct?  Is it efficient?  Is it easy to follow?
  Does it lend itself to sharing?<br>
  (*Note*: For your solution, feel free to use the standard computer vision/deep 
  learning/data science frameworks, such as `PIL`/`Pillow`, `OpenCV`, 
  `TensorFlow`, `PyTorch`, `numpy`, `pandas`, and `matplotlib`.)
* **Novelty of solution**<br>
  How interesting is your solution?  How well does it fit the problem at hand?
* **Scientific quality**<br>
  Is your solution *ad hoc*, or based on overarching principles of machine learning?
  Why did you select it?  How well does it perform?  What further steps would you take next?
* **Sharing of insights**<br>
  Are you able to convey your ideas?  Do you clearly present your findings?

### How to submit

Please zip your materials (code, notebooks, results, etc.) into a single archive
and **submit it by e-mail** to [team-z-research@zalando.de](mailto:team-z-research@zalando.de).<br>
Your submission will be visible to all members of the *Zalando Research* team.  We respect
your privacy and will not share or further distribute your work except to parties involved
in the hiring process, like the Zalando recruiting team.

Feel free to keep a copy of your work, but please do not share your results externally!