## Make your own face dataset (Week 12) - Step 2

#### **Designed by Joon Son Chung, November 2020**

This notebook provides the script to crop the face region from each image in the downloaded dataset and save the crop with the face centered in the image. Only images with exactly one face detection are kept; images with no detection or with two or more detections are discarded.
Note that the script only considers files with the `.jpg` extension.
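
Because only `.jpg` files are picked up, it can help to check which extensions are actually present before running the rest of the notebook. The following is a minimal sketch, not part of the original script: `count_extensions` is a hypothetical helper, and it assumes the one-folder-per-identity layout (`root/<identity>/<file>`) implied by the `/*/*.jpg` pattern used later.

```python
import os
from collections import Counter

def count_extensions(root):
    """Count files by extension in root/<identity>/<file> (hypothetical helper)."""
    counts = Counter()
    for identity in sorted(os.listdir(root)):
        folder = os.path.join(root, identity)
        if not os.path.isdir(folder):
            continue  # files sitting directly in root are not matched by '/*/*.jpg'
        for name in os.listdir(folder):
            if os.path.isfile(os.path.join(folder, name)):
                # The extension is kept case-sensitive on purpose: on Linux,
                # glob treats '.JPG' and '.jpg' as different extensions.
                counts[os.path.splitext(name)[1]] += 1
    return counts
```

For example, `print(count_extensions('./original_images'))` after the extraction step lists every extension found. Anything other than `.jpg` (such as `.JPG`, `.jpeg` or `.png`) would be skipped by the script on Colab.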

The face detector can be downloaded from [here](http://www.robots.ox.ac.uk/~joon/data/s3fd_facedet.zip).

First, connect to Google Drive and extract the face detector.

If your images are already stored on Google Drive, uncomment the following line in the code cell below and change the path accordingly.

```
# orig_path = os.path.join(GDRIVE_HOME,'SNU/Face_Dataset')
```



In [None]:
from google.colab import drive
from zipfile import ZipFile
from tqdm import tqdm
import os, glob, sys, shutil, time
import numpy as np
import torch
import cv2

# mount Google Drive
drive.mount('/content/drive', force_remount=True)

# path of the data directory relative to the home folder of Google Drive
GDRIVE_HOME = '/content/drive/My Drive'
FOLDER      = 'MLVU/your_dataset'

# The following 4 paths (together with FOLDER above) are the only parts of the code that you need to change. You can simply run the rest.
data_dir        = os.path.join(GDRIVE_HOME,FOLDER) 
detector_path   = os.path.join(GDRIVE_HOME,'MLVU/s3fd_facedet.zip') # Location of the face detector
orig_path       = './original_images' # Location to temporarily store the original images. No need to change this. 
temp_path       = './cropped_images' # Location to temporarily store your cropped images. No need to change this. 

assert os.path.exists(detector_path), "[!] Detector zip not found: %s. Enter a valid path." % detector_path
assert os.path.exists(data_dir), "[!] Data directory not found: %s. Enter a valid path." % data_dir

with ZipFile(data_dir+'/original_data.zip', 'r') as zipObj:
  zipObj.extractall(orig_path)
print('Zip extraction complete')

# If you have downloaded the images directly onto Google Drive, uncomment the following line
# orig_path = os.path.join(GDRIVE_HOME,'SNU/Face_Dataset')

# Extract the detector code and model (from the first assignment) into ./detectors
with ZipFile(detector_path, 'r') as zipObj:
  zipObj.extractall('detectors')
print('Zip extraction complete')

files = glob.glob(orig_path+'/*/*.jpg')
print(len(files),'original images found.')

Now, load the detector model. **You do not need to change this section.**

In [None]:
!pwd
sys.path.append('detectors')
from detectors import S3FD

# Load the face detector on the GPU (you can ignore this part, but the Colab runtime must have a GPU enabled)
DET = S3FD(device='cuda')

Here, we define the data loader for reading the images. **You do not need to change this section.**

In [None]:
class your_dataset(torch.utils.data.Dataset):
    def __init__(self, data_path):

        self.data   = glob.glob(data_path+'/*/*.jpg')

        print('%d files in the dataset'%len(self.data))

    def __getitem__(self, index):

      fname = self.data[index]
      image = cv2.imread(fname)                          # BGR image, cropped and saved later with cv2.imwrite
      image_np = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # RGB copy, passed to the face detector

      return image, image_np, fname

    def __len__(self):
      return len(self.data)

dataset = your_dataset(orig_path)
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, num_workers=10)

We now crop the faces and save them to a temporary folder. This step should take around 10 minutes for 5,000 images. **You do not need to change this section.**
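
To make the crop geometry in the next cell easier to follow, here is a minimal pure-Python sketch of the same arithmetic. `square_crop_box` and `fits` are hypothetical helpers, not part of the notebook, and the bounding box is assumed to be `(x1, y1, x2, y2, ...)` in pixels, as the indexing in the loop suggests.

```python
def square_crop_box(bbox, pad=100):
    """Return (x0, y0, x1, y1) of the square crop in padded-image coordinates."""
    # Center of the detection, shifted by the border added on each side.
    cx = int((bbox[0] + bbox[2]) / 2) + pad
    cy = int((bbox[1] + bbox[3]) / 2) + pad
    # Half the side of the square: half of the longer bounding-box edge.
    half = int(max(bbox[3] - bbox[1], bbox[2] - bbox[0]) / 2)
    return cx - half, cy - half, cx + half, cy + half

def fits(box, width, height, pad=100):
    """True if the crop lies entirely inside the padded image."""
    x0, y0, x1, y1 = box
    return x0 >= 0 and y0 >= 0 and x1 <= width + 2 * pad and y1 <= height + 2 * pad

# A 100x140 detection in a 300x300 image gives a 140x140 square crop.
box = square_crop_box((40, 60, 140, 200))
print(box, fits(box, width=300, height=300))  # (120, 160, 260, 300) True
```

The border exists so that a face close to the image edge can still be cropped as a full square. If the square extends beyond the border, the start index becomes negative; NumPy slicing wraps negative indices around instead of raising an error, so such a crop would be empty or wrong. A check like `fits` is one way to detect that case explicitly.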

In [None]:
for data in tqdm(loader):
  
  image     = data[0][0].numpy()
  image_np  = data[1][0].numpy()
  fname     = data[2][0]

  bboxes = DET.detect_faces(image_np, conf_th=0.9, scales=[0.5])

  # Keep only images with exactly one face detection (images with no detection or with two or more detections are skipped)
  if len(bboxes) == 1:

    try:

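      bsi = 100  # border size in pixels, padded on every side of the image

      # center of the bounding box, shifted by the border
      sx = int((bboxes[0][0]+bboxes[0][2])/2) + bsi
      sy = int((bboxes[0][1]+bboxes[0][3])/2) + bsi
      # half the side of the square crop (half of the longer bounding-box edge)
      ss = int(max((bboxes[0][3]-bboxes[0][1]),(bboxes[0][2]-bboxes[0][0]))/2)

      # pad height and width with a constant gray value (110); channels are not padded
      image = np.pad(image,((bsi,bsi),(bsi,bsi),(0,0)), 'constant', constant_values=(110,110))

      # square crop around the face center, resized to 240x240
      face = image[int(sy-ss):int(sy+ss),int(sx-ss):int(sx+ss)]
      face = cv2.resize(face,(240,240))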

      outname = fname.replace(orig_path,temp_path)

      os.makedirs(os.path.dirname(outname), exist_ok=True)

      cv2.imwrite(outname,face)

    except Exception as e:  # a bare except would also swallow KeyboardInterrupt

      print('Error on %s: %s'%(fname, e))

Check the output files. Then zip and save to Google Drive. **You do not need to change this section.**
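
Once the archive has been written, it can be worth confirming what ended up inside it. The following is a minimal sketch using only the standard library. `images_per_identity` is a hypothetical helper, and it assumes the archive contains one top-level folder per identity, which is what zipping the cropped-image folder with `root_dir` should produce.

```python
import zipfile
from collections import Counter

def images_per_identity(zip_path):
    """Count .jpg entries per top-level folder inside a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.split('/')
            # Keep only entries of the form '<identity>/<file>.jpg';
            # directory entries such as '<identity>/' are ignored.
            if len(parts) == 2 and parts[1].endswith('.jpg'):
                counts[parts[0]] += 1
    return counts
```

For example, `images_per_identity(data_dir + '/cropped_data.zip')` returns a mapping from folder name to image count. Folders with very few remaining images stand out immediately, since many of their originals had zero or multiple detections.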

In [None]:
output_files = glob.glob(temp_path+'/*/*.jpg')

print('%d cropped images found. Now zipping. '%len(output_files))

shutil.make_archive(data_dir+'/cropped_data', 'zip', root_dir=temp_path)