# Getting the region of interest (ROI)

When we look at an image, we usually identify a person by their face. An image may contain multiple faces, and a face may be obstructed or unclear. The first step in our pre-processing pipeline is therefore to detect faces in each image. Once a face is detected, we look for eyes inside it. We keep the image only if at least two eyes are detected, and discard it otherwise.

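How do we detect faces and eyes? We use OpenCV's pre-trained Haar cascade classifiers: one trained for frontal faces and one trained for eyes.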

In [4]:
import numpy as np
import cv2
import matplotlib
from matplotlib import pyplot as plt

In [5]:
face_cascade = cv2.CascadeClassifier('./opencv/haarcascades/haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('./opencv/haarcascades/haarcascade_eye.xml')

In [6]:
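def get_cropped_image_if_2_eyes(image_path):
    """Return the first face crop (BGR) in which at least two eyes are detected, else None."""
    img = cv2.imread(image_path)
    if img is None:  # unreadable or non-image file: discard it
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor=1.3, minNeighbors=5; each face is an (x, y, w, h) rectangle
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        # NumPy indexes rows (y) first, then columns (x)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        if len(eyes) >= 2:
            return roi_color
    return None  # no face with two detectable eyes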

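The function detects faces in the image and, for each detected face, searches for eyes inside that face region only. As soon as it finds a face with at least two eyes, it returns the cropped color face. If no face qualifies, it returns None, which is how an image gets discarded. The call detectMultiScale(gray, 1.3, 5) passes scaleFactor=1.3 and minNeighbors=5.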
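The scaleFactor argument controls how finely detectMultiScale searches across face sizes. At each pyramid level the image is reduced by that factor, which is equivalent to enlarging the detector window. The sketch below is a simplified model of that search, not OpenCV's exact internals. It assumes a 24×24 base window (the training size stored in haarcascade_frontalface_default.xml) and ignores the minSize/maxSize options. detection_window_sizes is a hypothetical helper, not an OpenCV function.

```python
def detection_window_sizes(image_w, image_h, base=24, scale_factor=1.3):
    """Approximate side lengths (pixels) of the square windows a cascade checks.

    Simplified model (an assumption): start from a base x base window and
    enlarge it by scale_factor per pyramid level until it no longer fits
    inside the image.
    """
    sizes = []
    factor = 1.0
    while base * factor <= min(image_w, image_h):
        sizes.append(round(base * factor))
        factor *= scale_factor
    return sizes

print(detection_window_sizes(200, 200))  # [24, 31, 41, 53, 69, 89, 116, 151, 196]
print(len(detection_window_sizes(200, 200, scale_factor=1.1)))  # many more levels
```

A smaller factor such as 1.1 checks more window sizes. This is slower, but it typically misses fewer faces, while 1.3 trades some recall for speed. The minNeighbors=5 setting is a separate control. It keeps a candidate rectangle only if enough overlapping neighboring detections support it, which suppresses false positives.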

In [8]:
path_to_data = "./images_dataset/"
path_to_cr_data = "./cropped/"

In [9]:
import os
img_dirs = []
for entry in os.scandir(path_to_data):
    if entry.is_dir():
        img_dirs.append(entry.path)

It scans the directory path_to_data and adds the path of every subdirectory (one per celebrity) to the list img_dirs.
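os.scandir yields entries in an arbitrary, filesystem-dependent order, so the alphabetical listing shown next is not guaranteed on every machine. Here is a minimal sketch of a deterministic variant, assuming that sorting by name is acceptable. list_subdirs is a hypothetical helper name, and the usage builds a throwaway folder layout in a temporary directory.

```python
import os
import tempfile

def list_subdirs(root):
    """Return the paths of the immediate subdirectories of root, sorted by name."""
    with os.scandir(root) as entries:
        return sorted(entry.path for entry in entries if entry.is_dir())

# Usage: create a few class folders plus a stray file, then list only the folders
with tempfile.TemporaryDirectory() as root:
    for name in ["virat_kohli", "lionel_messi", "roger_federer"]:
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, "notes.txt"), "w").close()  # a file, not a directory
    names = [os.path.basename(d) for d in list_subdirs(root)]
    print(names)  # ['lionel_messi', 'roger_federer', 'virat_kohli']
```

A fixed ordering makes runs reproducible. This becomes useful later if class labels are assigned by enumerating these folders.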

In [11]:
img_dirs

['./images_dataset/lionel_messi',
 './images_dataset/maria_sharapova',
 './images_dataset/roger_federer',
 './images_dataset/serena_williams',
 './images_dataset/virat_kohli']

In [12]:
import shutil
if os.path.exists(path_to_cr_data):
    shutil.rmtree(path_to_cr_data)
os.mkdir(path_to_cr_data)

It deletes the folder at path_to_cr_data if it exists, then creates a new empty folder with the same name.

In [14]:
cropped_image_dirs = []
celebrity_file_names_dict = {}

for img_dir in img_dirs:
    count = 1
    # os.path.basename is portable, unlike splitting on '/'
    celebrity_name = os.path.basename(img_dir)
    print(celebrity_name)

    celebrity_file_names_dict[celebrity_name] = []

    for entry in os.scandir(img_dir):
        if not entry.is_file():  # skip nested folders
            continue
        roi_color = get_cropped_image_if_2_eyes(entry.path)
        if roi_color is not None:
            cropped_folder = os.path.join(path_to_cr_data, celebrity_name)
            # Create the per-celebrity folder the first time a crop is saved
            if not os.path.exists(cropped_folder):
                os.makedirs(cropped_folder)
                cropped_image_dirs.append(cropped_folder)
                print("Generating cropped images in folder: ", cropped_folder)

            cropped_file_name = celebrity_name + str(count) + ".png"
            cropped_file_path = os.path.join(cropped_folder, cropped_file_name)

            cv2.imwrite(cropped_file_path, roi_color)
            celebrity_file_names_dict[celebrity_name].append(cropped_file_path)
            count += 1

lionel_messi
Generating cropped images in folder:  ./cropped/lionel_messi
maria_sharapova
Generating cropped images in folder:  ./cropped/maria_sharapova
roger_federer
Generating cropped images in folder:  ./cropped/roger_federer
serena_williams
Generating cropped images in folder:  ./cropped/serena_williams
virat_kohli
Generating cropped images in folder:  ./cropped/virat_kohli


It loops through each celebrity's image directory and keeps only the faces in which at least two eyes are detected. It saves these cropped faces into a separate folder per celebrity under path_to_cr_data and records each saved file path in celebrity_file_names_dict.

Manually examine the cropped folders and delete any unwanted images. Removing these bad samples helps improve model accuracy.
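After files are deleted by hand, celebrity_file_names_dict still lists the removed paths. Here is a minimal sketch of rebuilding it from what is actually on disk, assuming the one-folder-per-celebrity layout with .png crops created above. rebuild_file_names_dict is a hypothetical helper, and the usage simulates a cleaned-up folder in a temporary directory instead of touching ./cropped/.

```python
import os
import tempfile

def rebuild_file_names_dict(cropped_root):
    """Map each folder name under cropped_root to a sorted list of its .png paths."""
    file_names = {}
    with os.scandir(cropped_root) as folders:
        for folder in folders:
            if not folder.is_dir():
                continue
            with os.scandir(folder.path) as files:
                file_names[folder.name] = sorted(
                    f.path for f in files
                    if f.is_file() and f.name.lower().endswith(".png")
                )
    return file_names

# Usage: one class folder where image 2 was removed during manual review
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "lionel_messi"))
    for n in (1, 3):
        open(os.path.join(root, "lionel_messi", f"lionel_messi{n}.png"), "w").close()
    rebuilt = rebuild_file_names_dict(root)
    kept = [os.path.basename(p) for p in rebuilt["lionel_messi"]]
    counts = {name: len(paths) for name, paths in rebuilt.items()}
    print(counts)  # {'lionel_messi': 2}
```

The sort is lexicographic, so lionel_messi10.png would come before lionel_messi2.png. That ordering is harmless for building a training set, but it differs from the numeric order in which the files were written.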