
<a href="https://blog.cloudcommander.net" target="_parent"><img src="https://raw.githubusercontent.com/cloud-commander/hexoblog/master/cloud.png" alt="Visit my Blog">
</a>
<br> 
# <span style="font-family:Didot; font-size:3em;"> Cloud Commander </span>


<a href="https://colab.research.google.com/github/cloud-commander/face-mask-detection/blob/master/1_Prepare_Data_Annotate_Images_Part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
&nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://github.com/cloud-commander/face-mask-detection/blob/master/1_Prepare_Data_Annotate_Images_Part1.ipynb" target="_parent"><img src="https://img.shields.io/static/v1?logo=GitHub&label=&color=333333&style=flat&message=View%20on%20GitHub" alt="View in GitHub"></a>



## Automatic Face Image Annotation  ##

We start off the data preparation phase by pre-processing images in our dataset for the unmasked category.

We have a selection of facial images and we need to draw a bounding box around each face. We could do that manually using a tool such as [labelImg](https://github.com/tzutalin/labelImg), but that would be slow, tedious and unnecessary for our purposes.

Instead, we will use a face detection algorithm (an LBP cascade classifier, run through OpenCV) to detect faces automatically and generate an accompanying XML file for each image containing the coordinates of the face. This method is far from foolproof, as it only works reliably on full frontal images of faces, but it's a good starting point.
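
OpenCV's cascade detector reports each detection as `(x, y, width, height)`, while the annotation XML stores the corners of the box. The conversion can be sketched as below. The helper name `to_voc_box` is hypothetical and not part of this notebook, and clipping to the image size is an added assumption.

```python
def to_voc_box(x, y, w, h, img_w, img_h):
    """Convert an (x, y, width, height) detection to (xmin, ymin, xmax, ymax).

    The result is clipped so the box never extends past the image borders.
    """
    xmin = max(0, int(x))
    ymin = max(0, int(y))
    xmax = min(int(img_w), int(x) + int(w))
    ymax = min(int(img_h), int(y) + int(h))
    return xmin, ymin, xmax, ymax


# A 100x120 detection whose top-left corner is at (40, 30) in a 640x480 image
print(to_voc_box(40, 30, 100, 120, 640, 480))  # (40, 30, 140, 150)
```

Note that `xmax` is `x + width` and `ymax` is `y + height`; using the width or height on its own produces a box that is too small or inverted.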



### Import required libraries

In [0]:
#!pip install lxml
#!pip install opencv-python
from lxml import etree as ET


import os
import cv2
import numpy as np
import glob
from pathlib import Path
import shutil

from PIL import Image, ImageFile

### Prepare dataset ###

#### Connect to Google Drive

In [3]:
from google.colab import drive

drive.mount('/content/drive/')


Mounted at /content/drive/


#### Download faces dataset and extract to folder


In [2]:
!mkdir -p /content/dataset/unmasked
!wget https://raw.githubusercontent.com/cloud-commander/face-mask-detection/master/data/1k_faces_00.zip

#Use 7zip to extract only the image files from our archive into the unmasked folder
!7z e 1k_faces_00.zip -o/content/dataset/unmasked *.jpg *.png *.jpeg -r

--2020-06-05 12:52:12--  https://raw.githubusercontent.com/cloud-commander/face-mask-detection/master/data/1k_faces_00.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23867089 (23M) [application/zip]
Saving to: ‘1k_faces_00.zip’


2020-06-05 12:52:13 (31.8 MB/s) - ‘1k_faces_00.zip’ saved [23867089/23867089]

Archive:  1k_faces_00.zip
  inflating: /content/dataset/unmasked/0004STET6P.jpg  
  inflating: /content/dataset/unmasked/000N7AIAFT.jpg  
  inflating: /content/dataset/unmasked/00H858UYSD.jpg  
  inflating: /content/dataset/unmasked/00KPGHV40E.jpg  
  inflating: /content/dataset/unmasked/00P2LUXJW3.jpg  
  inflating: /content/dataset/unmasked/00PYA83V1P.jpg  
  inflating: /content/dataset/unmasked/00SD82OK2A.jpg  
  inflating: /content/dataset/unmasked/00
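
After extraction it can be worth confirming how many images actually landed in the folder. The following is a minimal sketch using only the standard library. The helper name `count_images` is hypothetical, and the set of suffixes simply mirrors the extensions passed to `7z` above.

```python
from collections import Counter
from pathlib import Path

# Same extensions that the extraction step asks for
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files under folder (recursively), grouped by lowercase suffix."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.suffix.lower()] += 1
    return counts
```

In this notebook it would be called as `count_images("/content/dataset/unmasked")`.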

#### Set image directory ####

In [0]:
dataset_dir = "dataset/unmasked"

#### Download required dependency 

In [5]:
%cd /content/
!wget https://raw.githubusercontent.com/cloud-commander/face-mask-detection/master/utils/lbpcascade_frontalface_improved.xml

/content
--2020-06-05 12:52:51--  https://raw.githubusercontent.com/cloud-commander/face-mask-detection/master/utils/lbpcascade_frontalface_improved.xml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54039 (53K) [text/plain]
Saving to: ‘lbpcascade_frontalface_improved.xml’


2020-06-05 12:52:51 (1.97 MB/s) - ‘lbpcascade_frontalface_improved.xml’ saved [54039/54039]



In [6]:
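path = os.getcwd()
filename = path + "/" + dataset_dir  # absolute path to the image directory
face_cascade = cv2.CascadeClassifier('lbpcascade_frontalface_improved.xml')
# Fail early if the cascade file could not be loaded
if face_cascade.empty():
    raise IOError("Could not load lbpcascade_frontalface_improved.xml")
dataset_labels_folder = dataset_dir + "-labels"
dataset_labels_path = os.getcwd() + "/" + dataset_labels_folder
print(dataset_labels_path)
# exist_ok avoids an error when this cell is run more than once
os.makedirs(dataset_labels_folder, exist_ok=True)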


/content/dataset/unmasked-labels


In [0]:
for subdir, dirs, files in os.walk(filename):
    for file in files:
        img_path = os.path.join(subdir, file)
        img_name = os.path.basename(img_path)

        # Only process image files
        if not img_path.lower().endswith(('.png', '.jpg', '.jpeg')):
            continue

        img = cv2.imread(img_path)
        if img is None:
            # Unreadable or corrupt image: flag it for deletion
            with open("delete.txt", "a") as myfile:
                myfile.write(img_path + "\n")
            continue

        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        h, w, bpp = np.shape(img)
        # scaleFactor=1.3, minNeighbors=5; each detection is (x, y, width, height)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)

        if len(faces) == 1:
            try:
                # Exactly one face: write a Pascal VOC style annotation
                x, y, face_w, face_h = [int(v) for v in faces[0]]
                subdir_path, subdir_name = os.path.split(subdir)

                root = ET.Element("annotation", verified="yes")
                ET.SubElement(root, "folder").text = filename
                ET.SubElement(root, "filename").text = img_name
                ET.SubElement(root, "path").text = img_path

                source = ET.SubElement(root, "source")
                ET.SubElement(source, "database").text = "unknown"

                size = ET.SubElement(root, "size")
                ET.SubElement(size, "width").text = str(w)
                ET.SubElement(size, "height").text = str(h)
                ET.SubElement(size, "depth").text = str(bpp)

                ET.SubElement(root, "segmented").text = "0"

                # The class label is the name of the image folder, e.g. "unmasked"
                obj = ET.SubElement(root, "object")
                ET.SubElement(obj, "name").text = subdir_name
                ET.SubElement(obj, "pose").text = "Unspecified"
                ET.SubElement(obj, "truncated").text = "0"
                ET.SubElement(obj, "difficult").text = "0"

                # Convert (x, y, width, height) to corner coordinates
                box = ET.SubElement(obj, "bndbox")
                ET.SubElement(box, "xmin").text = str(x)
                ET.SubElement(box, "ymin").text = str(y)
                ET.SubElement(box, "xmax").text = str(x + face_w)
                ET.SubElement(box, "ymax").text = str(y + face_h)

                tree = ET.ElementTree(root)
                tree.write(os.path.join(dataset_labels_path, os.path.splitext(img_name)[0] + '.xml'))

            except RuntimeError:
                with open("delete.txt", "a") as myfile:
                    myfile.write(img_path + "\n")

        else:
            # No face or several faces detected: flag the image for deletion
            with open("delete.txt", "a") as myfile:
                myfile.write(img_path + "\n")
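
Because the annotations are generated automatically, a quick sanity check on the written boxes can catch coordinate mistakes before training. The sketch below assumes the XML layout produced by the loop above and uses the standard library's `xml.etree.ElementTree` rather than `lxml` so that it runs anywhere. `SAMPLE`, `example.jpg` and `box_is_valid` are hypothetical names used for illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the same layout as the generated annotation files
SAMPLE = """<annotation verified="yes">
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>unmasked</name>
    <bndbox><xmin>40</xmin><ymin>30</ymin><xmax>140</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""


def box_is_valid(xml_text):
    """Return True if the first bounding box is non-empty and inside the image."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    box = root.find("object/bndbox")
    xmin, ymin, xmax, ymax = (
        int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
    )
    return 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height


print(box_is_valid(SAMPLE))  # True
```

To check a file on disk, read its contents and pass the text to `box_is_valid`.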
                    

### Save Dataset

In [8]:
source1 = filename + "/"
source2 = dataset_labels_path + "/"

destination = os.getcwd() + "/training_demo" + "/images"
os.makedirs(destination, exist_ok=True)


def move_files(source, destination):
    # Move every image and annotation file from source into destination
    for f in os.listdir(source):
        if f.lower().endswith(('.png', '.jpg', '.jpeg', '.xml')):
            shutil.move(os.path.join(source, f), destination)


move_files(source1, destination)
move_files(source2, destination)

# Deletes the source directories, use as needed.
# os.rmdir only removes empty directories and raises OSError otherwise.
#os.rmdir(source2)
#os.rmdir(source1)

OSError: ignored
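
The cell above moves every image, including those for which no annotation was written. If only annotated images are wanted in the training folder, one possible variation is to move an image only when an XML file with the same stem exists. This is a sketch under that assumption, not the notebook's own method, and `move_annotated_pairs` is a hypothetical helper.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def move_annotated_pairs(image_dir, label_dir, destination):
    """Move each image that has a matching .xml annotation, together with that file.

    Returns the names of the images that were moved.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() takes a snapshot of the listing before any file is moved
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = Path(label_dir) / (image.stem + ".xml")
        if not label.exists():
            continue  # no annotation was written for this image
        shutil.move(str(image), str(destination / image.name))
        shutil.move(str(label), str(destination / label.name))
        moved.append(image.name)
    return moved
```

With the variables from this notebook it would be called as `move_annotated_pairs(filename, dataset_labels_path, destination)`; images without an annotation stay in the source folder.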

### Compress Dataset (Optional) ###

Follow this step if you wish to archive your dataset for future use.

In [9]:
!zip -r unmasked-dataset.zip /content/training_demo/images

  adding: content/training_demo/images/ (stored 0%)
  adding: content/training_demo/images/2OKEGQH2XR.jpg (deflated 0%)
  adding: content/training_demo/images/3QOQ2TMK9U.jpg (deflated 0%)
  adding: content/training_demo/images/0G8BKB200W.jpg (deflated 0%)
  adding: content/training_demo/images/4B5AC7FB7W.jpg (deflated 0%)
  adding: content/training_demo/images/0NZVBZ17MN.jpg (deflated 0%)
  adding: content/training_demo/images/2MAKY44FDM.jpg (deflated 0%)
  adding: content/training_demo/images/0TNTAUJVU8.xml (deflated 47%)
  adding: content/training_demo/images/3U6YDZ3ME8.xml (deflated 47%)
  adding: content/training_demo/images/3F8M5W3XFL.xml (deflated 47%)
  adding: content/training_demo/images/4CNDRH3OX7.xml (deflated 47%)
  adding: content/training_demo/images/1GSOH70PZB.xml (deflated 47%)
  adding: content/training_demo/images/00XN46VW5G.jpg (deflated 0%)
  adding: content/training_demo/images/2TAJDIE93R.xml (deflated 47%)
  adding: content/training_demo/images/1TJ1CYQWP7.jpg (def
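
The same archive step can be done from Python with the standard library instead of the `zip` command. A minimal sketch follows; `archive_folder` is a hypothetical helper name, and unlike the command above it stores paths relative to the folder's parent (for example `images/...`) rather than the full `content/training_demo/images/...` path.

```python
import shutil
from pathlib import Path


def archive_folder(folder, archive_stem):
    """Zip folder into <archive_stem>.zip and return the path of the archive."""
    folder = Path(folder)
    return shutil.make_archive(
        str(archive_stem),            # output path without the .zip extension
        "zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # only this folder is added
    )
```

In this notebook it would be called as `archive_folder("/content/training_demo/images", "/content/unmasked-dataset")`.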

#### Save archive to Google Drive


In [11]:
!gsutil cp unmasked-dataset.zip /content/drive/My\ Drive/Datasets/

Copying file://unmasked-dataset.zip...
/ [1 files][ 23.2 MiB/ 23.2 MiB]
Operation completed over 1 objects/23.2 MiB.                                     
