# Improving Accuracy of Rekognition Face Search with User Vectors
### Face search of images against a collection of users' faces
![results](./results.jpeg)

---
This notebook is an end-to-end example of how to build a face search system using [Amazon Rekognition](https://aws.amazon.com/rekognition/).<br>
Amazon Rekognition enables you to achieve very high face search accuracy with a single face image. In some cases, you can use multiple images of the same person's face to create user vectors and improve accuracy even further. This is especially helpful when images have variations in lighting, poses, and appearances.<br>

It guides you through creating a collection, storing face vectors in that collection, aggregating those face vectors into user vectors, and then comparing the results of searching against the individual face vectors and against the user vectors.

In June 2023, [AWS launched user vectors, a new capability that significantly improves face search accuracy](https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-rekognition-face-search-accuracy-user-vectors/) by using multiple face images of a user. Customers can now create user vectors, which aggregate multiple face vectors of the same user. Because a user vector combines images with varying lighting, sharpness, pose, and appearance, it is a more robust representation of the user and gives higher face search accuracy than searching against individual face vectors.



**NOTE: You can run this notebook in SageMaker Studio, JupyterLab, or on your local machine**

---

# Contents

1. [Installation](#Installation)
2. [Environment Creation](#Environment-Creation)
3. [Face search of image against a collection of individual face vectors](#Face-search-of-image-against-a-collection-of-individual-face-vectors)
4. [Face search of image against a collection of user vectors](#Face-search-of-image-against-a-collection-of-user-vectors)
5. [Cleanup](#Cleanup)

# Installation

In [None]:
!pip3 install Pillow --upgrade
!pip3 install boto3

### Permissions required to run this notebook
You need permission to call the Rekognition APIs and to access an S3 bucket that stores the images.<br>
Add this minimal policy to your IAM role to run the code in this notebook.<br>
Make sure to replace <i>your-bucket-name</i> with your actual bucket name.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RekognitionPermissions",
            "Effect": "Allow",
            "Action": [
                "rekognition:CreateCollection",
                "rekognition:DeleteCollection",
                "rekognition:CreateUser",
                "rekognition:IndexFaces",
                "rekognition:DetectFaces",
                "rekognition:AssociateFaces",
                "rekognition:SearchUsersByImage",
                "rekognition:SearchFacesByImage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket-name/*",
                "arn:aws:s3:::your-bucket-name"
            ]
        }
    ]
}
```
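
If you prefer not to edit the bucket name by hand, the same policy can be generated in Python. This is a minimal sketch: `build_notebook_policy` is a hypothetical helper that is not part of this notebook, and it only reproduces the policy above with your bucket name filled in.

```python
import json


def build_notebook_policy(bucket_name):
    """Return the minimal IAM policy above as a JSON string for `bucket_name`.

    Hypothetical helper for illustration; it does not attach the policy anywhere.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RekognitionPermissions",
                "Effect": "Allow",
                "Action": [
                    "rekognition:CreateCollection",
                    "rekognition:DeleteCollection",
                    "rekognition:CreateUser",
                    "rekognition:IndexFaces",
                    "rekognition:DetectFaces",
                    "rekognition:AssociateFaces",
                    "rekognition:SearchUsersByImage",
                    "rekognition:SearchFacesByImage",
                ],
                "Resource": "*",
            },
            {
                "Sid": "S3BucketPermissions",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                # Object-level actions apply to the /* ARN; ListBucket applies to the bucket ARN
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            },
        ],
    }
    return json.dumps(policy, indent=4)


print(build_notebook_policy("my-example-bucket"))
```

You could then attach the returned string as an inline policy, for example with the boto3 IAM client's `put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=...)`; the role and policy names are your own choice.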

# Environment Creation

In this section we will create our environment resources.<br>
The steps are:
* Upload the images to an S3 bucket
* Create a Rekognition collection
* Populate our collection

In [None]:
# Bucket where we will store our images
bucket = '<replace_with_your_bucket>'

# The Rekognition collection id
collection_id = "faces-collection"

### Upload the images folder to S3

In [None]:
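import os
import boto3

def upload_to_s3(path, bucket):
    """Upload every file under `path` to `bucket`, keeping the local folder structure.

    Keys are prefixed with the folder name: ./images/<user>/<file> -> images/<user>/<file>
    """
    s3_bucket = boto3.resource('s3').Bucket(bucket)
    folder_name = os.path.basename(os.path.normpath(path))
    for root, dirs, files in os.walk(path):
        for file in files:
            local_path = os.path.join(root, file)
            # S3 keys always use '/', so don't build them with os.path.join ('\\' on Windows)
            relative_key = os.path.relpath(local_path, path).replace("\\", "/")
            s3_key = f"{folder_name}/{relative_key}"
            s3_bucket.upload_file(local_path, s3_key)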

In [None]:
upload_to_s3(path='./images', bucket=bucket)
print(f"Uploaded images to: {bucket}")

### Define helper functions
* create_collection - create a new collection
* delete_collection - delete a collection
* create_user - create a new user in a collection
* add_faces_to_collection - add the face in an S3 image to the collection and return its face ID
* associate_faces - associate face IDs with a user in a collection
* get_subdirs - get all subdirectories under an S3 prefix
* get_files - get all files under an S3 prefix
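
The S3 helpers live in `helpers.py`. As a rough sketch of how `get_subdirs` and `get_files` could be written with boto3's `list_objects_v2` paginator (an assumption; `helpers.py` may implement them differently): passing `Delimiter='/'` makes S3 group keys into `CommonPrefixes`, which behave like sub-directories. The optional `s3_client` argument is added here only so the functions can be exercised without AWS credentials.

```python
def _s3_client(s3_client=None):
    """Return the given client, or a default boto3 S3 client."""
    if s3_client is not None:
        return s3_client
    import boto3  # imported lazily so the sketch can be tested without boto3
    return boto3.client("s3")


def get_subdirs(bucket, prefix, s3_client=None):
    """Return the names of the 'folders' directly under `prefix` (one per user)."""
    paginator = _s3_client(s3_client).get_paginator("list_objects_v2")
    subdirs = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common_prefix in page.get("CommonPrefixes", []):
            # "images/<user>/" -> "<user>"
            subdirs.append(common_prefix["Prefix"][len(prefix):].rstrip("/"))
    return subdirs


def get_files(bucket, prefix, s3_client=None):
    """Return the full keys of all objects under `prefix`, skipping folder markers."""
    paginator = _s3_client(s3_client).get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys
```

Using the paginator rather than a single `list_objects_v2` call means folders with more than 1,000 objects are still listed completely.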


In [None]:
!pygmentize -g helpers.py

### Create Rekognition Collection for the faces and users

In [None]:
import helpers
helpers.create_collection(collection_id)
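
`CreateCollection` raises a `ResourceAlreadyExistsException` if the collection ID is already in use, which is common when the notebook is re-run without the cleanup step. A minimal sketch of a tolerant variant, assuming the boto3 Rekognition client; `ensure_collection` is a hypothetical name, and the `client` argument exists only so the function can be tested without AWS.

```python
def ensure_collection(collection_id, client=None):
    """Create the collection, or reuse it if it already exists (e.g. on a re-run).

    Returns True if a new collection was created, False if it already existed.
    """
    if client is None:
        import boto3  # lazy import so the sketch can be tested without AWS
        client = boto3.client("rekognition")
    try:
        client.create_collection(CollectionId=collection_id)
        return True
    except client.exceptions.ResourceAlreadyExistsException:
        return False
```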

### Populate our collection
Our S3 bucket has a directory for each user that stores that user's images.<br>
We will:
* Create a user for each user directory in S3<br>
* Add the face in each image to the collection as an individual face vector and collect its face_id<br>
* Associate the face_ids with the corresponding user vector
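
These three steps map onto the Rekognition `CreateUser`, `IndexFaces`, and `AssociateFaces` APIs. The sketch below shows one plausible way to call them with the boto3 Rekognition client; the `enroll_user` function and its injectable `client` argument are hypothetical, and the notebook's `helpers.py` may differ. Setting `MaxFaces=1` on `IndexFaces` indexes only the largest face in each enrollment photo.

```python
def _rekognition_client(client=None):
    """Return the given client, or a default boto3 Rekognition client."""
    if client is not None:
        return client
    import boto3  # lazy import so the sketch can be exercised without AWS
    return boto3.client("rekognition")


def enroll_user(collection_id, user_id, bucket, image_keys, client=None):
    """Create `user_id`, index one face per S3 image, and associate those faces with the user.

    Returns the list of face IDs that Rekognition reports as associated.
    """
    rekognition = _rekognition_client(client)
    rekognition.create_user(CollectionId=collection_id, UserId=user_id)

    face_ids = []
    for key in image_keys:
        response = rekognition.index_faces(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxFaces=1,  # index only the largest face in each image
        )
        # FaceRecords is empty if no face was indexed (e.g. filtered out for low quality)
        for record in response["FaceRecords"]:
            face_ids.append(record["Face"]["FaceId"])

    if not face_ids:
        return []
    response = rekognition.associate_faces(
        CollectionId=collection_id, UserId=user_id, FaceIds=face_ids
    )
    return [face["FaceId"] for face in response["AssociatedFaces"]]
```

Skipping images that yield no `FaceRecords` keeps a single unusable photo from breaking the `AssociateFaces` call for the whole user.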

In [None]:
prefix = 'images/'

# Get all the users directories from s3 containing the images
folder_list = helpers.get_subdirs(bucket, prefix)
print(f"Found users folders: {folder_list}")
print()

for user_id in folder_list:
    face_ids = []
    helpers.create_user(collection_id, user_id)
    # Get all files per user under the s3 user directory
    images = helpers.get_files(bucket, prefix + user_id + "/")
    print(f"Found images={images} for {user_id}")
    for image in images:
        face_id = helpers.add_faces_to_collection(bucket, image, collection_id)
        face_ids.append(face_id)
    helpers.associate_faces(collection_id, user_id, face_ids)
    print()

We would like to take a new photo with multiple people and attempt to match their faces against our collection, first against the individual face vectors and then against the user vectors.

# Face search of image against a collection of individual face vectors

In this section we will take a new image containing multiple faces and attempt to match those faces against **individual faces** in our collection.<br>
For this purpose we will use the <i>detect_users.detect_faces_in_image</i> method.<br>

The function detects faces in an image and, for each face, it will:
* Print its bounding box location
* Check whether the face exists in our collection and print the matching user or 'Unknown'
* Print the similarity score

The logic:
1. We call the Amazon Rekognition [DetectFaces](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectFaces.html) API on our image to detect all faces in it and obtain their bounding box locations.
2. For each detected face, we crop the face using its bounding box and send the cropped image to the Amazon Rekognition [SearchFacesByImage](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_SearchFacesByImage.html) API. This API compares the face in the cropped image against the individual faces in our collection and returns the faces that match.
<i>SearchFacesByImage</i> searches only the largest face in the input image. Therefore, if the image contains multiple faces, we need to crop each face and send it to the API individually.
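
Step 2 relies on converting each `BoundingBox` returned by `DetectFaces` (its `Left`, `Top`, `Width`, and `Height` are ratios of the image width and height) into pixel coordinates for cropping. A minimal sketch with Pillow; the function names are hypothetical and `detect_users.py` may do this differently. Clamping the box to the image is a precaution rather than something the original code is known to do.

```python
import io

from PIL import Image


def load_image(image_bytes):
    """Open raw image bytes (e.g. the body of an S3 GetObject response) with Pillow."""
    return Image.open(io.BytesIO(image_bytes))


def bounding_box_to_pixels(bounding_box, image_width, image_height):
    """Convert a Rekognition BoundingBox (ratios) to a PIL (left, top, right, bottom) box."""
    left = max(0, int(bounding_box["Left"] * image_width))
    top = max(0, int(bounding_box["Top"] * image_height))
    right = min(image_width, int((bounding_box["Left"] + bounding_box["Width"]) * image_width))
    bottom = min(image_height, int((bounding_box["Top"] + bounding_box["Height"]) * image_height))
    return left, top, right, bottom


def crop_faces(image, face_details):
    """Yield (bounding_box, cropped face image) for each entry of DetectFaces' FaceDetails."""
    for face in face_details:
        box = bounding_box_to_pixels(face["BoundingBox"], image.width, image.height)
        yield face["BoundingBox"], image.crop(box)


def to_jpeg_bytes(image):
    """Encode a PIL image as JPEG bytes, usable as Image={'Bytes': ...} in Rekognition calls."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG")
    return buffer.getvalue()
```

In the notebook flow, `face_details` would be the `FaceDetails` list from a `detect_faces` call on the group photo, and each cropped face would be sent to the search API as JPEG bytes.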

We will use a similarity score threshold of 99%, which is a common setting for identity verification (IDV) use cases.
<br/>
With a threshold of 99%, Dr. Werner Vogels should be flagged as Unknown.
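
The threshold maps onto the `FaceMatchThreshold` parameter of `SearchFacesByImage`: matches with a lower similarity are not returned, so an empty `FaceMatches` list means the face is reported as 'Unknown'. A minimal sketch, assuming a boto3 Rekognition client is passed in and that matched faces were associated with a user, so their `Face` object carries a `UserId`; `search_face` and `best_face_match` are hypothetical names.

```python
UNKNOWN = "Unknown"


def best_face_match(face_matches):
    """Return (user_id or face_id, similarity) for the best match, or ("Unknown", None)."""
    if not face_matches:
        return UNKNOWN, None
    best = max(face_matches, key=lambda match: match["Similarity"])
    # Faces associated with a user carry a UserId; fall back to the FaceId otherwise
    return best["Face"].get("UserId", best["Face"]["FaceId"]), best["Similarity"]


def search_face(rekognition, collection_id, face_bytes, threshold=99.0):
    """Search one cropped face (JPEG bytes) against the individual face vectors."""
    response = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": face_bytes},
        FaceMatchThreshold=threshold,  # matches below this similarity are not returned
        MaxFaces=1,
    )
    return best_face_match(response["FaceMatches"])
```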

In [None]:
import detect_users
from IPython.display import display

# The image we would like to match faces against our internal collection.
file_key = prefix + "photo.jpeg"

img = detect_users.detect_faces_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
display(img)

Let's lower the threshold to 90% and run again.

In [None]:
import detect_users
from IPython.display import display

# The image we would like to match faces against our internal collection.
file_key = prefix + "photo.jpeg"

img = detect_users.detect_faces_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=90
)
display(img)

Let's check whether user vectors can raise the similarity score above our original 99% threshold.

# Face search of image against a collection of user vectors

In this section we will take an image containing multiple faces and attempt to match those faces against our collection using **User Vectors**<br>
For this purpose we will use the <i>detect_users.detect_users_in_image</i> method<br>

The function detects faces in an image and, for each face, it will:
* Print its bounding box location
* Check whether the face matches a user in our collection and print the user or 'Unknown'
* Print the similarity score

The logic:
1. We call the Amazon Rekognition [DetectFaces](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectFaces.html) API on our image to detect all faces in it and obtain their bounding box locations.
2. For each detected face, we crop the face using its bounding box and send the cropped image to the Amazon Rekognition [SearchUsersByImage](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_SearchUsersByImage.html) API, which compares the face against the user vectors in our collection and returns the users that match.
<i>SearchUsersByImage</i> searches only the largest face in the input image. Therefore, if the image contains multiple faces, we need to crop each face and send it to the API individually.
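
The call has the same structure as the `SearchFacesByImage` search in the previous section; the parameter and response names differ. `SearchUsersByImage` takes a `UserMatchThreshold` and returns `UserMatches`, each with a `Similarity` and a `User` holding the `UserId`. It also lists the faces it did not search in `UnsearchedFaces`, which is why each face is cropped first. A minimal sketch with a hypothetical `search_user` function and an injected boto3 Rekognition client:

```python
def search_user(rekognition, collection_id, face_bytes, threshold=99.0):
    """Search one cropped face (JPEG bytes) against the user vectors in the collection.

    Returns (user_id, similarity) for the best match, or ("Unknown", None).
    """
    response = rekognition.search_users_by_image(
        CollectionId=collection_id,
        Image={"Bytes": face_bytes},
        UserMatchThreshold=threshold,  # users below this similarity are not returned
        MaxUsers=1,
    )
    matches = response.get("UserMatches", [])
    if not matches:
        return "Unknown", None
    best = max(matches, key=lambda match: match["Similarity"])
    return best["User"]["UserId"], best["Similarity"]
```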

Here is the code:

In [None]:
!pygmentize -g detect_users.py

### Find users in an image

In [None]:
import detect_users
from IPython.display import display

# The image we would like to match faces against our internal collection.
file_key = prefix + "photo.jpeg"

img = detect_users.detect_users_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
display(img)

# Cleanup

In [None]:
helpers.delete_collection(collection_id)
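
Deleting the collection removes the face vectors and user vectors stored in it, but the images uploaded to S3 remain in your bucket. A minimal sketch for removing them as well, assuming the boto3 S3 client; `delete_prefix` is a hypothetical helper, and it needs the `s3:DeleteObject` permission, which the minimal policy at the top of this notebook does not grant.

```python
def delete_prefix(bucket, prefix, s3_client=None):
    """Delete every object under `prefix` (e.g. the 'images/' folder uploaded earlier).

    Returns the number of keys submitted for deletion.
    """
    if s3_client is None:
        import boto3  # lazy import so the sketch can be tested without AWS
        s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1,000 keys, which is also the DeleteObjects limit
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3_client.delete_objects(
                Bucket=bucket, Delete={"Objects": objects, "Quiet": True}
            )
            deleted += len(objects)
    return deleted
```

For example, `delete_prefix(bucket, prefix)` would remove everything uploaded by `upload_to_s3(path='./images', bucket=bucket)`.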