<br>

# Multi-Face Detector
---

<br>

<br>

## Dataset Preparation <br><br>

[WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/index.html) <br><br>

- 32,203 images containing 393,703 labeled faces <br>
- train/validation/test split ratio of 40%/10%/50% <br><br>


Download the following four files: <br><br>

- (images) WIDER Face Training Images <br>
- (images) WIDER Face Validation Images <br>
- (images) WIDER Face Testing Images <br>
- (labels) Face annotations (wider_face_split) <br><br>


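The labels are bounding box information. <br>
Each face is described by 10 numbers: (x0, y0) is the top-left corner of the box, (w, h) its width and height, and the remaining six values are attribute labels. <br><br>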

x0, y0, w, h, blur, expression, illumination, invalid, occlusion, pose
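As an illustration, one annotation line can be split into those ten named fields. This is a minimal sketch: the helper name `parse_face_line` is hypothetical, the attribute columns are kept as raw integers rather than decoded into their category meanings, and the sample line reuses the box of the first image printed later in this notebook with illustrative attribute values.

```python
# Field order of one WIDER FACE annotation line, as listed above
FIELDS = ["x0", "y0", "w", "h", "blur", "expression",
          "illumination", "invalid", "occlusion", "pose"]


def parse_face_line(line):
    """Hypothetical helper: map one 10-number annotation line to named fields."""
    values = [int(v) for v in line.split()]  # split() also drops a trailing space
    if len(values) != len(FIELDS):
        raise ValueError("expected 10 numbers, got {}".format(len(values)))
    return dict(zip(FIELDS, values))


face = parse_face_line("449 330 122 149 0 0 0 0 0 0 ")
print(face["x0"], face["y0"], face["w"], face["h"])  # 449 330 122 149
```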


<br>

<br>

## Dataset Preprocessing 1: bbox Conversion <br><br>

Parse the label files that describe the bboxes, <br>
then convert the bbox data into the format needed downstream.
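Each annotation file (for example `wider_face_train_bbx_gt.txt`) lists, per image, the image path, the face count N, and then N box lines. An image with N = 0 still carries one all-zero placeholder line, which a parser has to consume. A minimal sketch of that layout, reading from any iterable of lines so it can be tried without the dataset; `iter_widerface` and the sample text are hypothetical.

```python
import io


def iter_widerface(lines):
    """Hypothetical sketch: yield (image_path, [[x0, y0, w, h], ...]) per image."""
    it = iter(lines)
    for path in it:
        path = path.strip()
        if not path:
            continue  # tolerate blank lines
        num_faces = int(next(it))
        boxes = []
        # An image with 0 faces still has one placeholder box line to skip
        for _ in range(max(num_faces, 1)):
            x0, y0, w, h = (int(v) for v in next(it).split()[:4])
            if num_faces > 0 and w > 0 and h > 0:  # drop degenerate boxes
                boxes.append([x0, y0, w, h])
        yield path, boxes


SAMPLE_TEXT = (
    "0--Parade/a.jpg\n"
    "2\n"
    "449 330 122 149 0 0 0 0 0 0\n"
    "10 20 0 5 0 0 0 0 0 0\n"
    "0--Parade/b.jpg\n"
    "0\n"
    "0 0 0 0 0 0 0 0 0 0\n"
)
for path, boxes in iter_widerface(io.StringIO(SAMPLE_TEXT)):
    print(path, boxes)
# 0--Parade/a.jpg [[449, 330, 122, 149]]
# 0--Parade/b.jpg []
```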

<br>

In [1]:
```python
# Read the first four numbers (x0, y0, w, h) of one annotation line as integers

def get_box(data):
    x0 = int(data[0])
    y0 = int(data[1])
    w = int(data[2])
    h = int(data[3])
    return x0, y0, w, h
```

In [3]:
```python
# Parse the per-image bbox annotations in wider_face_train_bbx_gt.txt and
# return a list of (image path, [[x0, y0, w, h], ...]) tuples, one per image

def parse_widerface(config_path):
    boxes_per_img = []
    with open(config_path) as fp:
        line = fp.readline()  # image path
        while line:
            num_of_obj = int(fp.readline())  # number of faces in this image
            boxes = []
            for _ in range(num_of_obj):
                obj_box = fp.readline().split(' ')
                x0, y0, w, h = get_box(obj_box)
                if w == 0 or h == 0:
                    # skip degenerate boxes with no width or no height
                    continue
                # No cap on the number of faces per image is applied: every valid box is kept
                boxes.append([x0, y0, w, h])
            if num_of_obj == 0:
                # An image with 0 faces still has one all-zero placeholder line;
                # consume it (without adding a zero-size box) so the next
                # readline() returns the next image path
                fp.readline()
            boxes_per_img.append((line.strip(), boxes))
            line = fp.readline()

    return boxes_per_img
```

<br>

#### Converting the bbox Representation <br><br>

x, y, w, h -> x_min, y_min, x_max, y_max
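The conversion is x_max = x + w and y_max = y + h, which is exactly what `xywh_to_voc` does per box. A minimal list-based sketch of both directions; the function names are hypothetical. Note that some codebases use the inclusive convention x + w - 1 instead; this notebook uses x + w.

```python
def xywh_to_xyxy(boxes):
    """Convert [[x, y, w, h], ...] to [[x_min, y_min, x_max, y_max], ...]."""
    return [[x, y, x + w, y + h] for x, y, w, h in boxes]


def xyxy_to_xywh(boxes):
    """Inverse: [[x_min, y_min, x_max, y_max], ...] to [[x, y, w, h], ...]."""
    return [[x0, y0, x1 - x0, y1 - y0] for x0, y0, x1, y1 in boxes]


# Box of the first converted sample printed further below: w = 571 - 449, h = 479 - 330
print(xywh_to_xyxy([[449, 330, 122, 149]]))  # [[449, 330, 571, 479]]
```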

<br>

In [4]:
```python
# Read an image file and decode it as a 3-channel JPEG
# Returns (error flag, raw encoded bytes, decoded image tensor or None)

import logging
import tensorflow as tf

def process_image(image_file):
    image_string = tf.io.read_file(image_file)
    try:
        image_data = tf.image.decode_jpeg(image_string, channels=3)
        return 0, image_string, image_data
    except tf.errors.InvalidArgumentError:
        logging.info('{}: Invalid JPEG data or crop window'.format(image_file))
        return 1, image_string, None
```

In [5]:
```python
# Convert boxes from (x, y, w, h) to (x_min, y_min, x_max, y_max) and
# collect them, together with the image size, in a VOC-style dict

def xywh_to_voc(file_name, boxes, image_data):
    shape = image_data.shape
    image_info = {}
    image_info['filename'] = file_name
    image_info['width'] = shape[1]
    image_info['height'] = shape[0]
    image_info['depth'] = 3

    difficult = []
    classes = []
    xmin, ymin, xmax, ymax = [], [], [], []

    for box in boxes:
        classes.append(1)     # every box is a face (class 1)
        difficult.append(0)
        xmin.append(box[0])
        ymin.append(box[1])
        xmax.append(box[0] + box[2])
        ymax.append(box[1] + box[3])
    image_info['class'] = classes
    image_info['xmin'] = xmin
    image_info['ymin'] = ymin
    image_info['xmax'] = xmax
    image_info['ymax'] = ymax
    image_info['difficult'] = difficult

    return image_info
```

In [6]:
```python
# Print the converted bbox data for the first few images

import os
import tensorflow as tf
dataset_path = os.getenv('HOME') + '/aiffel/face_detector/widerface'
anno_txt = 'wider_face_train_bbx_gt.txt'
file_path = 'WIDER_train'
for i, info in enumerate(parse_widerface(os.path.join(dataset_path, 'wider_face_split', anno_txt))):
    print('--------------------')
    image_file = os.path.join(dataset_path, file_path, 'images', info[0])
    error, image_string, image_data = process_image(image_file)
    if error:
        continue  # image_data is None when decoding failed
    boxes = xywh_to_voc(image_file, info[1], image_data)
    print(boxes)
    if i > 3:
        break
```

```
--------------------
{'filename': '/home/ssac29/aiffel/face_detector/widerface/WIDER_train/images/0--Parade/0_Parade_marchingband_1_849.jpg', 'width': 1024, 'height': 1385, 'depth': 3, 'class': [1], 'xmin': [449], 'ymin': [330], 'xmax': [571], 'ymax': [479], 'difficult': [0]}
--------------------
{'filename': '/home/ssac29/aiffel/face_detector/widerface/WIDER_train/images/0--Parade/0_Parade_Parade_0_904.jpg', 'width': 1024, 'height': 1432, 'depth': 3, 'class': [1], 'xmin': [361], 'ymin': [98], 'xmax': [624], 'ymax': [437], 'difficult': [0]}
--------------------
{'filename': '/home/ssac29/aiffel/face_detector/widerface/WIDER_train/images/0--Parade/0_Parade_marchingband_1_799.jpg', 'width': 1024, 'height': 768, 'depth': 3, 'class': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'xmin': [78, 78, 113, 134, 163, 201, 182, 245, 304, 328, 389, 406, 436, 522, 643, 653, 793, 535, 29, 3, 20], 'ymin': [221, 238, 212, 260, 250, 218, 266, 279, 265, 295, 281, 293, 290, 328, 320, 22
```

<br>

## Dataset Preprocessing 2: Building a TensorFlow Dataset <br><br>

To speed up processing of this large dataset, <br>
convert it into the __TFRecord__ format. <br><br>


Each record (one image) is a ```tf.train.Example``` protocol buffer message with the following features: <br><br>

- 'filename' <br>
- 'height' <br>
- 'width' <br>
- 'classes' <br>
- 'x_mins' <br>
- 'y_mins' <br>
- 'x_maxes' <br>
- 'y_maxes' <br>
- 'image_raw'

<br>

In [8]:
```python
# Build one tf.train.Example holding one image and all of its face boxes

def make_example(image_string, image_info_list):
    # Callers pass a single-element list; if more are given, the last entry wins
    for info in image_info_list:
        filename = info['filename']
        width = info['width']
        height = info['height']
        depth = info['depth']
        classes = info['class']
        xmin = info['xmin']
        ymin = info['ymin']
        xmax = info['xmax']
        ymax = info['ymax']

    # Store the encoded JPEG bytes, unwrapping an eager tensor if necessary
    if tf.is_tensor(image_string):
        encoded_image = [image_string.numpy()]
    else:
        encoded_image = [image_string]

    # Keep only the file name (not the full path) in the record
    base_name = [tf.compat.as_bytes(os.path.basename(filename))]

    example = tf.train.Example(features=tf.train.Features(feature={
        'filename':tf.train.Feature(bytes_list=tf.train.BytesList(value=base_name)),
        'height':tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'width':tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'classes':tf.train.Feature(int64_list=tf.train.Int64List(value=classes)),
        'x_mins':tf.train.Feature(float_list=tf.train.FloatList(value=xmin)),
        'y_mins':tf.train.Feature(float_list=tf.train.FloatList(value=ymin)),
        'x_maxes':tf.train.Feature(float_list=tf.train.FloatList(value=xmax)),
        'y_maxes':tf.train.Feature(float_list=tf.train.FloatList(value=ymax)),
        'image_raw':tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded_image))
    }))
    return example
```

In [9]:
```python
# Serialize each Example built by make_example and write it to a binary TFRecord file

import logging
import tqdm

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
rootPath = os.getenv('HOME') + '/aiffel/face_detector'
dataset_path = 'widerface'

# Check the dataset directory where it is actually read from (under rootPath)
if not os.path.isdir(os.path.join(rootPath, dataset_path)):
    logging.info('Please define valid dataset path.')
else:
    logging.info('Loading {}'.format(dataset_path))

logging.info('Reading configuration...')

# TFRecordWriter does not create missing directories
os.makedirs(os.path.join(rootPath, 'dataset'), exist_ok=True)

for split in ['train', 'val']:
    output_file = os.path.join(rootPath, 'dataset', 'train_mask.tfrecord' if split == 'train' else 'val_mask.tfrecord')

    with tf.io.TFRecordWriter(output_file) as writer:

        counter = 0
        skipped = 0
        anno_txt = 'wider_face_train_bbx_gt.txt' if split == 'train' else 'wider_face_val_bbx_gt.txt'
        file_path = 'WIDER_train' if split == 'train' else 'WIDER_val'
        for info in tqdm.tqdm(parse_widerface(os.path.join(rootPath, dataset_path, 'wider_face_split', anno_txt))):
            image_file = os.path.join(rootPath, dataset_path, file_path, 'images', info[0])

            error, image_string, image_data = process_image(image_file)

            if not error:
                # Convert boxes only after a successful decode (image_data is None otherwise)
                boxes = xywh_to_voc(image_file, info[1], image_data)
                tf_example = make_example(image_string, [boxes])

                writer.write(tf_example.SerializeToString())
                counter += 1

            else:
                skipped += 1
                logging.info('Skipped {:d} of {:d} images.'.format(skipped, skipped + counter))

    logging.info('Wrote {} images to {}'.format(counter, output_file))
```

```
100%|██████████| 12880/12880 [00:51<00:00, 251.34it/s]
100%|██████████| 3226/3226 [00:12<00:00, 256.67it/s]
```


<br>

#### Packaging the Preprocessing as a Python Module <br><br>

Preprocessing steps 1 and 2 are packaged into the tf_dataset_preprocess.py module. <br><br>

Run it as follows: <br>

```cd ~/aiffel/face_detector && python tf_dataset_preprocess.py```

<br>

<br>

## Single Shot MultiBox Detector (SSD) Model <br><br>


### Key Feature of SSD: Prior Boxes (Anchor Boxes) <br><br>

The SSD model requires a set of prior boxes (anchor boxes). <br><br>


### Prior Boxes (Anchor Boxes) <br><br>

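- A fixed number of boxes of various sizes, laid out in advance over the image at positions where objects may appear. <br>
    For each prior box, the model predicts box offsets and class scores. <br><br>

- During training, a prior box is matched to a ground-truth bounding box when their IoU is at least a threshold <br>
    (0.5 here; this project's config uses `match_thresh` = 0.45). <br><br>

    This is faster than the sliding-window-style, region-based approach of the R-CNN family, <br>
    while reaching comparable accuracy. <br><br>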
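A minimal sketch of the IoU-based matching described above, with boxes in corner format (x_min, y_min, x_max, y_max). It keeps only the threshold rule; full SSD matching additionally forces each ground-truth box to claim its single best prior, which is omitted here. The function names `iou` and `match_priors` are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two boxes in [x_min, y_min, x_max, y_max] format."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_priors(priors, gt_boxes, threshold=0.5):
    """For each prior, return the index of its best-overlapping ground-truth box,
    or -1 (background) if that best IoU is below the threshold."""
    matches = []
    for prior in priors:
        overlaps = [iou(prior, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: overlaps[j], default=-1)
        matches.append(best if best >= 0 and overlaps[best] >= threshold else -1)
    return matches


gt = [[0, 0, 10, 10]]
priors = [[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]]
print(match_priors(priors, gt))  # [0, -1, -1]
```

The second prior overlaps the face with IoU 25 / 175 ≈ 0.14, so it stays background even though it touches the box.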


References: <br>
[Understand Single Shot MultiBox Detector (SSD) and Implement It in Pytorch](https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad) <br>
[Understanding SSD MultiBox Real-Time Object Detection in Deep Learning](https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab)

<br>

<br>

### Model Implementation: Prior Boxes

<br>

In [10]:
```python
# Collect the configuration values used throughout this project into a dict

cfg = {
    # general setting
    "batch_size": 32,
    "input_size": (240, 320),  # (h, w)

    # training dataset
    "dataset_path": 'dataset/train_mask.tfrecord',  # 'dataset/trainval_mask.tfrecord'
    "val_path": 'dataset/val_mask.tfrecord',
    "dataset_len": 12880,  # number of training images (WIDER FACE train split)
    "val_len": 3226,  # number of validation images
    "using_crop": True,
    "using_bin": True,
    "using_flip": True,
    "using_distort": True,
    "using_normalizing": True,
    "labels_list": ['background', 'face'],  # index 0 is background

    # anchor setting
    "min_sizes":[[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]],
    "steps": [8, 16, 32, 64],
    "match_thresh": 0.45,
    "variances": [0.1, 0.2],
    "clip": False,

    # network
    "base_channel": 16,

    # training setting
    "resume": False,  # if False, train from scratch
    "epoch": 100,
    "init_lr": 1e-2,
    "lr_decay_epoch": [50, 70],
    "lr_rate": 0.1,
    "warmup_epoch": 5,
    "min_lr": 1e-4,

    "weights_decay": 5e-4,
    "momentum": 0.9,
    "save_freq": 10,  # frequency of saving model weights

    # inference
    "score_threshold": 0.5,
    "nms_threshold": 0.4,
    "max_number_keep": 200
}

cfg
```

```
{'batch_size': 32,
 'input_size': (240, 320),
 'dataset_path': 'dataset/train_mask.tfrecord',
 'val_path': 'dataset/val_mask.tfrecord',
 'dataset_len': 12880,
 'val_len': 3226,
 'using_crop': True,
 'using_bin': True,
 'using_flip': True,
 'using_distort': True,
 'using_normalizing': True,
 'labels_list': ['background', 'face'],
 'min_sizes': [[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]],
 'steps': [8, 16, 32, 64],
 'match_thresh': 0.45,
 'variances': [0.1, 0.2],
 'clip': False,
 'base_channel': 16,
 'resume': False,
 'epoch': 100,
 'init_lr': 0.01,
 'lr_decay_epoch': [50, 70],
 'lr_rate': 0.1,
 'warmup_epoch': 5,
 'min_lr': 0.0001,
 'weights_decay': 0.0005,
 'momentum': 0.9,
 'save_freq': 10,
 'score_threshold': 0.5,
 'nms_threshold': 0.4,
 'max_number_keep': 200}
```
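The `variances` entry is typically used in SSD-style box encoding: a matched ground-truth box is stored as offsets relative to its prior, both in (cx, cy, w, h) form, scaled by the variances. The sketch below shows the standard SSD encode/decode pair, assuming this project follows that convention; `encode` and `decode` are hypothetical names and the boxes are made-up normalized values.

```python
import math


def encode(gt, prior, variances=(0.1, 0.2)):
    """Encode a ground-truth box against a prior; both are (cx, cy, w, h)."""
    g_cx, g_cy, g_w, g_h = gt
    p_cx, p_cy, p_w, p_h = prior
    return (
        (g_cx - p_cx) / (variances[0] * p_w),  # centre offsets, scaled by prior size
        (g_cy - p_cy) / (variances[0] * p_h),
        math.log(g_w / p_w) / variances[1],    # log-scale size ratios
        math.log(g_h / p_h) / variances[1],
    )


def decode(offsets, prior, variances=(0.1, 0.2)):
    """Invert encode(): recover (cx, cy, w, h) from offsets and the prior."""
    d_cx, d_cy, d_w, d_h = offsets
    p_cx, p_cy, p_w, p_h = prior
    return (
        p_cx + d_cx * variances[0] * p_w,
        p_cy + d_cy * variances[0] * p_h,
        p_w * math.exp(d_w * variances[1]),
        p_h * math.exp(d_h * variances[1]),
    )


prior = (0.5, 0.5, 0.1, 0.1)
gt = (0.52, 0.49, 0.12, 0.08)
offsets = encode(gt, prior)
print([round(v, 3) for v in decode(offsets, prior)])  # [0.52, 0.49, 0.12, 0.08]
```

Dividing by the small variances enlarges the regression targets, which is the usual motivation given for this scaling.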

In [11]:
```python
# Check the config values related to prior box generation

image_sizes = cfg['input_size']
min_sizes = cfg["min_sizes"]
steps = cfg["steps"]
clip = cfg["clip"]

# Accept either a single int (square input) or an (h, w) tuple
if isinstance(image_sizes, int):
    image_sizes = (image_sizes, image_sizes)
elif not isinstance(image_sizes, tuple):
    raise TypeError('Input image size must be a tuple or an int.')

print(image_sizes)
```

```
(240, 320)
```
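With `image_sizes`, `min_sizes` and `steps` in hand, prior boxes can be laid out on one grid per step. A minimal sketch, assuming the common SSD-style layout: the feature map for each step has ceil(size / step) cells per axis, each cell gets one square prior per entry of the matching `min_sizes` group, and priors are stored as normalized (cx, cy, w, h). `generate_priors` is a hypothetical name; the project's own implementation may differ in detail.

```python
import math
from itertools import product


def generate_priors(image_sizes, min_sizes, steps, clip=False):
    """Hypothetical sketch: return a list of normalized (cx, cy, w, h) priors."""
    img_h, img_w = image_sizes
    priors = []
    for step, sizes in zip(steps, min_sizes):
        # Grid size of the feature map that corresponds to this step
        fm_h, fm_w = math.ceil(img_h / step), math.ceil(img_w / step)
        for i, j in product(range(fm_h), range(fm_w)):
            for size in sizes:
                # Cell centre and box size, normalized by the image size
                cx = (j + 0.5) * step / img_w
                cy = (i + 0.5) * step / img_h
                priors.append((cx, cy, size / img_w, size / img_h))
    if clip:
        priors = [tuple(min(max(v, 0.0), 1.0) for v in p) for p in priors]
    return priors


priors = generate_priors((240, 320), [[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]], [8, 16, 32, 64])
print(len(priors))  # 4420
```

The count works out as 30×40×3 + 15×20×2 + 8×10×2 + 4×5×3 = 3600 + 600 + 160 + 60 = 4420; the 240/32 and 240/64 grids are rounded up to 8 and 4 rows.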
