## Tiny Portraits Project

* A low-resource deep learning/computer vision dataset
* Christian Bracher, Zalando Research
* August-September 2021

### Unpack image thumbnails from archives

* This notebook is a utility to unpack the image archives stored in the GitHub repository
* Each archive contains up to 2,048 thumbnails in PNG format, size 108 x 84 pixels
* These image files are extracted into a folder `Tiny_Portraits_Images`
* Total number of archives: 66
* Total number of image files: 134,734
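
The totals above can be cross-checked with a little arithmetic. The sketch below assumes that every archive except the last one is full (2,048 images); under that assumption it derives how many images the final archive would have to hold. The variable names are illustrative and do not appear elsewhere in the notebook.

```python
# Figures quoted in the list above
ARCHIVE_COUNT = 66
IMAGES_PER_FULL_ARCHIVE = 2048
TOTAL_IMAGES = 134_734

# Assumption: all archives except the last one are completely filled
images_in_full_archives = (ARCHIVE_COUNT - 1) * IMAGES_PER_FULL_ARCHIVE
images_in_last_archive = TOTAL_IMAGES - images_in_full_archives

print(images_in_full_archives)  # 133120
print(images_in_last_archive)   # 1614
```

Note that 66 full archives would hold 135,168 images, which is more than the stated total, so at least one archive must be partially filled.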

In [1]:
import numpy as np
import os
import zipfile

### Image and ZIP locations

* The ZIP archives are stored in a folder `Tiny_Portraits_Zipfiles`.<br>
  They are consecutively numbered, with names of the format `Tiny_Portraits_Archive_nnn.zip`,
  where `nnn` runs from `000` to `065`.
* Face thumbnails in PNG format, size 108 x 84 pixels, are placed in a sub-folder
  `Tiny_Portraits_Images`, which is created if necessary. File names have the form
  `Tiny_Portraits_dddddd.png`, where `dddddd` is the number of the source image in the *CelebA* dataset,
  shifted down by one: `000001.jpg` becomes `Tiny_Portraits_000000.png`, etc.
  Note that the numbers are **not** consecutive, as only a subset of *CelebA* images has been processed.
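
The naming rule can be made concrete with two small helpers. This is a minimal sketch: the function names `thumbnail_name` and `celeba_name` are hypothetical and not part of the notebook, and the code assumes six-digit *CelebA* file names as in the example above.

```python
import os

def thumbnail_name(celeba_filename):
    """Map a CelebA source name to the thumbnail name (number shifted down by one)."""
    stem, _ = os.path.splitext(os.path.basename(celeba_filename))
    return 'Tiny_Portraits_{:06d}.png'.format(int(stem) - 1)

def celeba_name(thumbnail_filename):
    """Inverse mapping: recover the CelebA source name from a thumbnail name."""
    stem, _ = os.path.splitext(os.path.basename(thumbnail_filename))
    number = int(stem.rsplit('_', 1)[-1])   # digits after the last underscore
    return '{:06d}.jpg'.format(number + 1)

print(thumbnail_name('000001.jpg'))               # Tiny_Portraits_000000.png
print(celeba_name('Tiny_Portraits_000000.png'))   # 000001.jpg
```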

In [3]:
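thumbnail_directory = './Tiny_Portraits_Images/'
# NOTE: the text above calls the archive folder `Tiny_Portraits_Zipfiles`;
# adjust this path if your local copy uses that name.
zipfile_directory   = './Tiny_Portraits_Zip/'

# Create the output folder if it does not exist yet
os.makedirs(thumbnail_directory, exist_ok=True)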

#### Load archive information

* There should be 66 ZIP archives with images.

In [5]:
# Load ordered list of archives
archive_file_list  = [file for file in sorted(os.listdir(zipfile_directory))
                      if file.endswith('.zip')]
archive_file_count = len(archive_file_list)

# Check for completeness
assert archive_file_count == 66, 'Expected 66 archives, found {}'.format(archive_file_count)
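
If the completeness check fails, it helps to know which archives are absent. The following is a sketch only: `missing_archives` is a hypothetical helper, and it assumes the archive naming scheme described earlier (`Tiny_Portraits_Archive_nnn.zip`, numbered from `000`).

```python
import os

def missing_archives(directory, expected_count=66):
    """Return the sorted names of expected archives that are not in `directory`."""
    expected = {'Tiny_Portraits_Archive_{:03d}.zip'.format(i)
                for i in range(expected_count)}
    present = set(os.listdir(directory))
    return sorted(expected - present)

# Possible usage (assumes `zipfile_directory` is defined as in the cell above):
# print(missing_archives(zipfile_directory))
```

An empty list means all expected archives were found.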

### Extract image files from archives

All contents are extracted to `thumbnail_directory`.  
There should be a total of 134,734 images in PNG format.

In [None]:
for index in range(archive_file_count):

    # Verify archive name
    zip_archive_name = 'Tiny_Portraits_Archive_{:03d}.zip'.format(index)
    assert archive_file_list[index] == zip_archive_name, 'Unexpected archive: ' + archive_file_list[index]

    # Extract image files
    with zipfile.ZipFile(os.path.join(zipfile_directory, zip_archive_name), 'r') as archive:
        archive.extractall(thumbnail_directory)

    # Indicate progress (count all PNG files extracted so far)
    num_images = len([file for file in os.listdir(thumbnail_directory) if file.endswith('.png')])
    print('\rUnzipping archive #{:3d} ... Total images # {:6d}'.format(index, num_images), end = '')

# Verify number of images
assert num_images == 134734, 'Expected 134734 images, found {}'.format(num_images)

print('\n*** DONE ***')

Unzipping archive # 18 ... Total images #  38912
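
The expected total can also be checked before anything is unpacked by counting the PNG entries listed in each archive. This is a rough sketch using `zipfile.ZipFile.namelist()`; the helper name `count_png_members` is hypothetical, and it simply counts every member whose name ends in `.png`.

```python
import os
import zipfile

def count_png_members(directory):
    """Count PNG entries across all ZIP archives in `directory` without extracting."""
    total = 0
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.zip'):
            continue  # skip anything that is not a ZIP archive
        with zipfile.ZipFile(os.path.join(directory, name), 'r') as archive:
            total += sum(1 for member in archive.namelist()
                         if member.endswith('.png'))
    return total

# Possible usage (assumes `zipfile_directory` is defined as in the cells above):
# print(count_png_members(zipfile_directory))
```

Reading the member list only touches each archive's directory, so this check is much faster than a full extraction.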

### Workbench