# **Installing Requirements**

Since the project is developed by several people, we install all the requirements from the requirements.txt file, which specifies the version of every package that must be installed.

In [None]:
%pip install -r ../requirements.txt

# **Downloading Files from GDrive**

In [16]:
import gdown


########## dlib_face_recognition_resnet_model_v1.dat ################

# URL of the Google Drive file
url_1 = 'https://drive.google.com/uc?id=1tXD6dha1ZD4fceLWsGlI89t8HeHlkJYC' 

# Path where the downloaded file will be saved
output_1 = '../Models/dlib_face_recognition_resnet_model_v1.dat'

gdown.download(url_1, output_1, quiet=False)



########## shape_predictor_68_face_landmarks.dat ###################

# URL of the Google Drive file
url_2 = 'https://drive.google.com/uc?id=1dvIeJtWhObCgSYJt8WKnjIlHhw5Y9ioN'

# Path where the downloaded file will be saved
output_2 = '../Models/shape_predictor_68_face_landmarks.dat'

gdown.download(url_2, output_2, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1tXD6dha1ZD4fceLWsGlI89t8HeHlkJYC
To: c:\Users\marco\Documents\computer_vision\ComputerVisionProject\Models\dlib_face_recognition_resnet_model_v1.dat
100%|██████████| 22.5M/22.5M [00:01<00:00, 11.5MB/s]
Downloading...
From: https://drive.google.com/uc?id=1dvIeJtWhObCgSYJt8WKnjIlHhw5Y9ioN
To: c:\Users\marco\Documents\computer_vision\ComputerVisionProject\Models\shape_predictor_68_face_landmarks.dat
100%|██████████| 99.7M/99.7M [00:08<00:00, 11.2MB/s]


'../Models/shape_predictor_68_face_landmarks.dat'
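Re-running the cell above downloads both files again. A minimal sketch of a guard that skips files already on disk is shown below. The helper name `download_if_missing` and its `downloader` parameter are hypothetical and not part of gdown; the download call is passed in as a callable, for example `lambda u, o: gdown.download(u, o, quiet=False)`.

```python
import os

def download_if_missing(url, output_path, downloader):
    """Download `url` to `output_path` unless the file already exists.

    `downloader` is any callable taking (url, output_path); it is a
    hypothetical parameter used here to keep the sketch library-agnostic.
    Returns True if a download was triggered, False otherwise.
    """
    if os.path.exists(output_path):
        return False  # File already present, nothing to do

    # Make sure the target directory (e.g. ../Models) exists
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    downloader(url, output_path)
    return True
```

With this helper, each `gdown.download(url, output, quiet=False)` call could be replaced by `download_if_missing(url, output, lambda u, o: gdown.download(u, o, quiet=False))`.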

# **Face Recognition**

Face recognition is a computer vision task that involves identifying and verifying a person's identity based on their facial features. This process can be broken down into these steps:

1. **Detection**: Identifying faces in images or video frames.
2. **Feature Extraction**: Capturing unique facial characteristics.
3. **Representation**: Creating a distinctive template for each face.
4. **Model Training**: Associating templates with known identities during training.
5. **Matching**: Comparing a new face's template to stored ones for identification.
6. **Decision**: Determining a match based on a similarity threshold.

Nowadays, these steps are usually performed by deep learning models. The following sections provide a simple implementation based on a pre-trained model, followed by our implementation of the paper (further details in the next sections).
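To make the last two steps (matching and decision) concrete, here is a minimal sketch on toy data. It assumes each identity is stored as a single 3-d template and uses a nearest-neighbour rule; the `identify` function, the `gallery` dictionary and the threshold value are illustrative assumptions, not the rule used later in this notebook.

```python
import math

def identify(template, gallery, threshold):
    """Return the closest enrolled identity, or None if nobody is close enough."""
    best_name, best_dist = None, math.inf

    # Matching: compare the new template with every stored template
    for name, stored in gallery.items():
        dist = math.dist(template, stored)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist

    # Decision: accept the best candidate only if it is within the threshold
    return best_name if best_dist <= threshold else None


# Toy gallery: one hand-written 3-d template per (hypothetical) identity
gallery = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 0.0, 0.0]}

print(identify([0.1, 0.0, 0.9], gallery, threshold=0.5))  # alice
print(identify([5.0, 5.0, 5.0], gallery, threshold=0.5))  # None
```

Real systems use much higher-dimensional templates (128-d in the dlib model used below), but the matching and decision logic has the same shape.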

## **Pre-trained Model**

The following technique is a simple face recognition pipeline built on dlib's pre-trained models. The face detector relies on classical computer vision and machine learning techniques (dlib's frontal face detector combines HOG features with a linear classifier). The aim of this part is to provide a simple baseline against which the more sophisticated implementation described in the paper can be compared.

### **Load all the user faces**

To recognize users, we first have to load all their faces from the dataset (a directory containing one sub-directory of images per user). For each picture, we compute the associated features (**embeddings**) that will be compared during the recognition step.

In [19]:
import os
import cv2
import dlib
import numpy as np

# Function used to get the faces in an image
def face_rects(image, face_detector):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # Convert the image to grayscale
    rects = face_detector(gray, 1) # Detect faces in the grayscale image

    return rects

# Function used to get the landmarks of a face
def face_landmarks(image, shape_predictor, face_detector):
    return [shape_predictor(image, face_rect) for face_rect in face_rects(image, face_detector)] # Compute the face landmarks


# Function used to get the encodings (features) of a face
def face_encodings(image, face_encoder, shape_predictor, face_detector):
    # Compute the facial embeddings for each face (128-d vector that describes the face in an image)
    return [np.array(face_encoder.compute_face_descriptor(image, face_landmark)) for face_landmark in face_landmarks(image, shape_predictor, face_detector)]




face_detector = dlib.get_frontal_face_detector() # Model used to detect faces
shape_predictor = dlib.shape_predictor("../Models/shape_predictor_68_face_landmarks.dat") # Model used to get the landmarks in a face
face_encoder = dlib.face_recognition_model_v1("../Models/dlib_face_recognition_resnet_model_v1.dat") # Feature extractor

known_faces = {} # Dict that will store the user's embeddings

base_directory = "../UserFaces/" # Directory containing user faces

# Iterate through the user directories
for user_name in os.listdir(base_directory):

    user_path = os.path.join(base_directory, user_name)
    if not os.path.isdir(user_path): # Skip stray files that are not user directories
        continue

    # Iterate through face images in each user directory
    for filename in os.listdir(user_path):
        image_path = os.path.join(user_path, filename)

        img = cv2.imread(image_path) # Read the image (None if the file is not a readable image)
        if img is None:
            continue

        new_encodings = face_encodings(img, face_encoder, shape_predictor, face_detector) # Get the embeddings

        # Add the embeddings to the ones already saved for this user
        known_faces.setdefault(user_name, []).extend(new_encodings)


print(known_faces.keys())

dict_keys(['Daniele', 'Giacomo', 'Giuseppe', 'Marco'])


### **User Recognition**

In this part we use the stored embeddings to perform real-time recognition through the webcam. Each face spotted in the frame is compared with the known faces by computing the Euclidean distance between their encodings; a known encoding counts as a match when the distance is at most 0.6. The user with the highest number of matching encodings is selected, and the face is labeled "Unknown" if no encoding matches.

In [20]:
# Function used to count how many encodings of a known user are close to the face that must be recognized
def nb_of_matches(known_encodings, unknown_encoding):
    if len(known_encodings) == 0: # No face was encoded for this user
        return 0

    distances = np.linalg.norm(np.asarray(known_encodings) - unknown_encoding, axis=1) # Compute the Euclidean distance between the current face encoding and all the face encodings of this user
    small_distances = distances <= 0.6  # Flag the distances that are within the threshold

    return int(np.sum(small_distances))


cap = cv2.VideoCapture(0) # Open a connection to the webcam (0 represents the default camera)

while True:
    ret, frame = cap.read() # Read a frame from the webcam
    if not ret: # Stop if no frame could be read
        break

    frame_encodings = face_encodings(frame, face_detector=face_detector, face_encoder=face_encoder, shape_predictor=shape_predictor) # Get the face encodings of the unknown face
    names = []

    for encoding in frame_encodings:
        counts = {}

        for (name, known_encodings) in known_faces.items(): # Compare the encodings between every face in the user dataset and the current one
            counts[name] = nb_of_matches(known_encodings, encoding)
        
        if all(count == 0 for count in counts.values()): # If there are no matches, the user is unknown
            name = "Unknown"
        else: # Pick the user with the highest number of matches
            name = max(counts, key=counts.get)

        names.append(name)

    for face, name in zip(face_rects(frame, face_detector), names): # Loop on the faces in the frame and assign the associated name

        x1, y1, x2, y2 = face.left(), face.top(), face.right(), face.bottom() # Get the bounding box for each face
        
        # Draw the bounding box of the face and the associated name of the person
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

    cv2.imshow("Face Recognition", frame) # Display the result

    if cv2.waitKey(1) & 0xFF == ord('q'): # Break the loop if the 'q' key is pressed
        break

# Release the webcam and close all windows
cap.release()
cv2.destroyAllWindows()
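The matching rule used in the loop above can be checked in isolation, without a webcam. The sketch below reproduces the same counting logic on toy 2-d vectors (real dlib embeddings are 128-d); the names `count_matches`, `pick_user` and `toy_faces` are illustrative assumptions and do not appear elsewhere in the notebook.

```python
import numpy as np

def count_matches(known_encodings, unknown_encoding, threshold=0.6):
    """Count the known encodings within `threshold` of the unknown one."""
    known = np.asarray(known_encodings, dtype=float)
    if known.size == 0:  # No encoding stored for this user
        return 0
    distances = np.linalg.norm(known - unknown_encoding, axis=1)
    return int(np.count_nonzero(distances <= threshold))

def pick_user(known_faces, unknown_encoding, threshold=0.6):
    """Return the user with the most matches, or "Unknown" if nobody matches."""
    counts = {
        name: count_matches(encodings, unknown_encoding, threshold)
        for name, encodings in known_faces.items()
    }
    if not counts or max(counts.values()) == 0:
        return "Unknown"
    return max(counts, key=counts.get)


# Toy database: 2-d stand-ins for the real 128-d embeddings
toy_faces = {
    "A": [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.0, 0.2])],
    "B": [np.array([1.0, 1.0]), np.array([0.5, 0.0])],
}

print(pick_user(toy_faces, np.array([0.05, 0.05])))  # A (3 matches against 1 for B)
print(pick_user(toy_faces, np.array([3.0, 3.0])))    # Unknown (no encoding within 0.6)
```

Because the rule counts matches rather than picking the single closest encoding, a user with many enrolled images can outvote a user with fewer but closer ones; this is worth keeping in mind when the number of images per user is unbalanced.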


### **Problems with this approach**

This approach is slow and not very robust. With a relatively low-resolution camera (which yields noisy frames), the detector may fail to find faces at all (and this is only the detection step!). Even when faces are detected, the noise can cause them to be recognized as another user or not recognized at all. Fine-tuning parameters (such as the maximum Euclidean distance accepted as a match) or applying classical computer vision techniques may reduce noise and help both the detector and the recognizer, but the fundamental problem persists: the pipeline is computationally inefficient. For example, the loop above runs the face detector twice on every frame, once to compute the encodings and once to draw the bounding boxes. If we intend to deploy this technique on low-spec hardware, the slowness of the system will adversely affect the accuracy of detection and recognition as well.
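One way to quantify the slowness is to measure how many frames per second the per-frame processing can sustain. The sketch below is a generic timing helper using only the standard library; `measure_fps` is a hypothetical name, and the 10 ms sleep is a stand-in workload rather than a measurement of the actual pipeline.

```python
import time

def measure_fps(process_frame, frames):
    """Average number of frames per second achieved by `process_frame`."""
    start = time.perf_counter()
    n_frames = 0
    for frame in frames:
        process_frame(frame)
        n_frames += 1
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


# Stand-in workload: pretend each frame takes about 10 ms to process
fps = measure_fps(lambda frame: time.sleep(0.01), range(20))
print(f"{fps:.1f} FPS")
```

In the notebook, `process_frame` could wrap the call to `face_encodings` and `frames` could be a list of images captured beforehand, so that the measurement is not affected by the webcam's own frame rate.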