## The Application of AWS Rekognition to the Identification of Missing Children
### Introduction and Motivation

Kidnappings and missing children are serious problems in our society. Services such as child abduction alert systems and missing-persons websites can help parents find their missing children, yet some children are never found. According to statistics from the Federal Bureau of Investigation (FBI), 365,348 missing-children cases were registered in 2020 and 421,394 in 2019 ("Key Facts", 2021). About 89% to 92% of all missing persons are found, either dead or alive, which leaves roughly 10% of cases unresolved and leaves those parents with a heavy loss and only the hope of finding their children in the future. After many years, however, a person can be very hard to recognize. To see whether AWS Rekognition could help alleviate this worldwide problem, we designed a project that explores how well the tool can detect similarities between images of the same person as a child and as an adult.

### The Reason Why We Use Amazon Rekognition

AWS Rekognition extracts data from a picture, summarizing information about objects, backgrounds, human faces, facial expressions, and so on. Given two photos of the same person, it returns a similarity score, which is usually high, and so correctly judges that they show the same person. There is, however, one challenge when dealing with missing children: they grow. If they are not found quickly, their appearance may change. To determine whether AWS Rekognition is useful in missing-children cases, we set out to answer the following questions. First, does AWS Rekognition match the face of a grown-up with his or her childhood photo? Second, how accurate is AWS Rekognition when part of the face is hidden, or when the individual's facial structure has changed with maturity, and does the accuracy decrease as the age difference increases? Third, are there ways to improve the tool's accuracy?

We hypothesized that AWS Rekognition can recognize the same person at different ages. However, since facial features can change drastically over the years, we also expected the accuracy to decrease when the age difference between two pictures of the same individual is large. As the first part of our project, we selected 5 sets of pictures of 5 people (3 males and 2 females). Three sets come from a news website article titled "This Family Took The Same Picture Every Year For Over Two Decades", and the remaining two come from work published by Dr. Reynolds and Dr. Zhao. All data sources are publicly available online without copyright problems. We considered using photos of our own group members as a data source, but we could not gather photos from far enough back. Within each of the five sets, we test the similarity between pictures to see how well AWS Rekognition detects the same person at different ages. For each set, we added one control picture that is similar in color, gesture, race, and facial expression, so that we can be sure the similarity scores are not driven by those attributes. Next, we call Rekognition's face detection to extract an estimated age range for the person in each picture. If AWS Rekognition proves to be highly accurate on such tests, looking for missing children might become more viable, since age is one of the few attributes that searchers know for certain.

![Data Example](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/01.png?raw=true)

### How It Works
![Architecture](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/02.jpg?raw=true)


To start with, we build an Amazon S3 bucket.

In [None]:
# make a bucket
!aws s3 mb s3://image-api-example-finalversion

make_bucket failed: s3://image-api-example-finalversion An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
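
The error above arises because S3 bucket names share one namespace across all AWS accounts, so a name someone else has taken cannot be reused. One way to avoid the collision is to append a random suffix to a base name. The following is a minimal sketch; the helper `unique_bucket_name` and its default base name are illustrative assumptions and are not part of the original notebook.

```python
import re
import uuid


def unique_bucket_name(base="image-api-example"):
    """Return `base` plus a random suffix (hypothetical helper, for illustration)."""
    suffix = uuid.uuid4().hex[:12]
    name = "{}-{}".format(base.lower(), suffix)
    # S3 bucket names must be between 3 and 63 characters long
    if not 3 <= len(name) <= 63:
        raise ValueError("bucket name must be 3-63 characters: " + name)
    # This sketch only allows lowercase letters, digits, and hyphens,
    # and requires the name to start and end with a letter or digit
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*[a-z0-9]", name):
        raise ValueError("invalid characters in bucket name: " + name)
    return name


bucket_name = unique_bucket_name()
print(bucket_name)
```

The printed name could then be used in place of the fixed name in the `aws s3 mb` command and in the later cells.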


In [None]:
# Import important package
import boto3

Next, we need to loop through the images in the bucket. To do this, we first make a Python list of all the images in the bucket. We use boto3 to create an s3_resource object that lets us communicate with S3.


In [None]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion')
summaries = my_bucket.objects.all()
summaries

s3.Bucket.objectsCollection(s3.Bucket(name='image-api-example-finalversion'), s3.ObjectSummary)

We create a list called images and use a loop to add the name of each image to the list for later use.

In [None]:
images = []
for image in summaries:
    images.append(image.key)
images
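
Rekognition's image operations accept JPEG and PNG files, so any other object in the bucket would only produce failed calls later on. As a hedged sketch, the list of keys could be filtered by file extension before use. The helper `image_keys` and the example keys are hypothetical and only illustrate the idea.

```python
def image_keys(keys, extensions=(".jpg", ".jpeg", ".png")):
    """Keep only the keys whose extension looks like a JPEG or PNG file."""
    return [key for key in keys if key.lower().endswith(extensions)]


# Hypothetical keys, used only to show the filter
example_keys = ["child-91.png", "notes.txt", "female-92.JPG", "folder/"]
print(image_keys(example_keys))  # ['child-91.png', 'female-92.JPG']
```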

Then we create a client object for Rekognition using the boto3 package. It allows us to communicate with and make requests to the Rekognition service from Python.

In [None]:
# Create an instance client of the client object in the boto3 package for rekognition

client=boto3.client('rekognition')

#### Dataset 1

In the code chunk below, we import two important packages, numpy and pandas, for later use, create an empty dataframe called df with pd, use boto3 to make an s3_resource object that lets us communicate with S3, and loop through the bucket's object summaries to add the image names to the df dataframe under the column "Name".


In [None]:
import numpy as np
import pandas as pd
df = pd.DataFrame()
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion')
summaries = my_bucket.objects.all()
image_names = [image.key for image in summaries]
df["Name"] = image_names

We also make a new dataframe called qf with pd and add the image names to it under the column "Name".

In [None]:
qf = pd.DataFrame()
qf["Name"] = image_names

We then create a list called small_name that holds one column label for each image; the loop below uses these labels as column names. A second list, big_name, is defined alongside it but is not used in the code that follows.

In [None]:
small_name = ["Similarity-child-91","Similarity-child-92","Similarity-child-93","Similarity-child-94","Similarity-child-95","Similarity-child-96","Similarity-child-97","Similarity-child-98","Similarity-child-99","Similarity-child-00","Similarity-child-02","Similarity-child-03","Similarity-child-04","Similarity-child-05","Similarity-child-06","Similarity-child-07","Similarity-child-08","Similarity-child-09","Similarity-child-10","Similarity-child-11","Similarity-child-12","Similarity-child-13",
              "Similarity-female-91","Similarity-female-92","Similarity-female-93","Similarity-female-94","Similarity-female-95","Similarity-female-96","Similarity-female-97","Similarity-female-98","Similarity-female-99","Similarity-female-00","Similarity-female-02","Similarity-female-03","Similarity-female-04","Similarity-female-05","Similarity-female-06","Similarity-female-07","Similarity-female-08","Similarity-female-09","Similarity-female-10","Similarity-female-11","Similarity-female-12","Similarity-female-13",
             "Similarity-male-91","Similarity-male-92","Similarity-male-93","Similarity-male-94","Similarity-male-95","Similarity-male-96","Similarity-male-97","Similarity-male-98","Similarity-male-99","Similarity-male-00","Similarity-male-02","Similarity-male-03","Similarity-male-04","Similarity-male-05","Similarity-male-06","Similarity-male-07","Similarity-male-08","Similarity-male-09","Similarity-male-10","Similarity-male-11","Similarity-male-12","Similarity-male-13"]

big_name = ["range-child-91","range-child-92","range-child-93","range-child-94","range-child-95","range-child-96","range-child-97","range-child-98","range-child-99","range-child-00","range-child-02","range-child-03","range-child-04","range-child-05","range-child-06","range-child-07","range-child-08","range-child-09","range-child-10","range-child-11","range-child-12","range-child-13",
              "Similarity-female-91","Similarity-female-92","Similarity-female-93","Similarity-female-94","Similarity-female-95","Similarity-female-96","Similarity-female-97","Similarity-female-98","Similarity-female-99","Similarity-female-00","Similarity-female-02","Similarity-female-03","Similarity-female-04","Similarity-female-05","Similarity-female-06","Similarity-female-07","Similarity-female-08","Similarity-female-09","Similarity-female-10","Similarity-female-11","Similarity-female-12","Similarity-female-13",
             "Similarity-male-91","Similarity-male-92","Similarity-male-93","Similarity-male-94","Similarity-male-95","Similarity-male-96","Similarity-male-97","Similarity-male-98","Similarity-male-99","Similarity-male-00","Similarity-male-02","Similarity-male-03","Similarity-male-04","Similarity-male-05","Similarity-male-06","Similarity-male-07","Similarity-male-08","Similarity-male-09","Similarity-male-10","Similarity-male-11","Similarity-male-12","Similarity-male-13"]

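The labels in small_name follow a regular pattern: three people (child, female, male) and two-digit years from 91 to 13, with 01 absent. As a hedged alternative to typing them by hand, the same labels could be generated. The variable name `generated_names` is an illustrative assumption.

```python
# Two-digit year suffixes used in the labels: 1991-2000 and 2002-2013 (no 2001)
years = ["{:02d}".format(y % 100)
         for y in list(range(1991, 2001)) + list(range(2002, 2014))]
people = ["child", "female", "male"]

# Same order as the hand-written list: all child labels, then female, then male
generated_names = ["Similarity-{}-{}".format(person, year)
                   for person in people for year in years]

print(len(generated_names), generated_names[0], generated_names[-1])
# 66 Similarity-child-91 Similarity-male-13
```
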

For this chunk, we define a function called extract_similarity that extracts the similarity value for a pair of images using Rekognition's compare_faces function. We then loop over the images with a counter i that starts at 0. For each source photo, we compute the image_score against every image and store the scores in the df dataframe under the column name at position i of small_name.

In [None]:
def extract_similarity(source, target):
    """Return the similarity score between two images in the bucket, or NaN."""
    try:
        comparison = client.compare_faces(
            SourceImage={'S3Object': {'Bucket': "image-api-example-finalversion", 'Name': source}},
            TargetImage={'S3Object': {'Bucket': "image-api-example-finalversion", 'Name': target}})
        # FaceMatches is empty when no face in the target matches the source
        image_score = comparison['FaceMatches'][0]['Similarity']
    except Exception:
        # No match or failed API call: record a missing value
        image_score = np.nan
    return image_score

# One column per source photo, one row per target image
for i, photo in enumerate(images):
    df[small_name[i]] = [extract_similarity(photo, name) for name in df["Name"]]
    print(i + 1)
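
One detail is worth noting here. compare_faces takes an optional SimilarityThreshold parameter that defaults to 80, and pairs scoring below it are left out of FaceMatches. With the code above, such pairs are therefore recorded as NaN rather than as a low score. A minimal sketch, assuming we would rather keep the low scores, passes a threshold of 0. The helper `similarity_score` and the `FakeClient` stand-in are hypothetical; in the notebook, the real Rekognition client would be passed instead.

```python
import math


def similarity_score(client, bucket, source, target, threshold=0.0):
    """Return the highest face similarity between two S3 images, or NaN."""
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target}},
        SimilarityThreshold=threshold,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return math.nan
    return max(match["Similarity"] for match in matches)


class FakeClient:
    """Stand-in for boto3.client('rekognition') so the sketch runs offline."""

    def compare_faces(self, SourceImage, TargetImage, SimilarityThreshold):
        # Made-up score, used only to exercise the helper
        score = 42.5
        matches = [{"Similarity": score}] if score >= SimilarityThreshold else []
        return {"FaceMatches": matches}


print(similarity_score(FakeClient(), "my-bucket", "a.png", "b.png"))        # 42.5
print(similarity_score(FakeClient(), "my-bucket", "a.png", "b.png", 80.0))  # nan
```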

Here we define a function called extract_age that extracts the age range for each image using Rekognition's detect_faces function.

In [None]:
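def extract_age(image):
    """Return the estimated age range of the first face in an image, or NaN."""
    try:
        response = client.detect_faces(Image={'S3Object':{'Bucket':"image-api-example-finalversion",'Name':image}}, Attributes=["ALL"])
        # AgeRange is a dict with 'Low' and 'High' keys
        age_range = response["FaceDetails"][0]["AgeRange"]
    except Exception:
        # No face detected or failed API call: record a missing value
        age_range = np.nan
    return age_range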

After we get the age_range, we store it in the df dataframe under the column name "Age_Range".

In [None]:
df["Age_Range"] = [extract_age(name) for name in qf["Name"]]
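
Each AgeRange value returned by detect_faces is a dictionary with a Low and a High key, which is awkward to compare with a known age. As a hedged sketch, the range could be reduced to its midpoint. The helper `age_midpoint` is an illustrative assumption, not part of the original analysis.

```python
import math


def age_midpoint(age_range):
    """Return the midpoint of an AgeRange dict, or NaN if the range is missing."""
    if not isinstance(age_range, dict):
        return math.nan
    return (age_range["Low"] + age_range["High"]) / 2.0


print(age_midpoint({"Low": 4, "High": 8}))  # 6.0
print(age_midpoint(float("nan")))           # nan
```

In the notebook, this could be applied to the stored column, for example with df["Age_Range"].apply(age_midpoint).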

Finally, we write the result to a CSV file called Newresult.csv.

In [None]:
df.to_csv("Newresult.csv")


#### Dataset 2

First, we make a second Amazon S3 bucket.

In [None]:
!aws s3 mb s3://image-api-example-finalversion2

make_bucket: image-api-example-finalversion2


Then we need to loop through the images in the bucket. To do this we will first make a Python list of all the images in the bucket.
First we use boto3 to make an instance of an object s3_resource that will allow us to communicate with S3.


In [None]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion2')
summaries = my_bucket.objects.all()
summaries

s3.Bucket.objectsCollection(s3.Bucket(name='image-api-example-finalversion2'), s3.ObjectSummary)

We create a list called images and use a loop to add the name of each image to the list for later use.

In [None]:
images = []
for image in summaries:
    images.append(image.key)
images

[u'1age10.png',
 u'1age16.png',
 u'1age3.png',
 u'1age4.png',
 u'1age6.png',
 u'1age8.png',
 u'1age9.png',
 u'2age15.png',
 u'2age16.png',
 u'2age18.png',
 u'2age20.png',
 u'2age21.png',
 u'2age23.png',
 u'2age26.png',
 u'2age29.png',
 u'2age36.png',
 u'2age38.png',
 u'2age4.png',
 u'2age5.png',
 u'2age7.png']
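
The file names in this bucket appear to encode both the person and the age, which is what the age-difference analysis needs. The following is a minimal sketch that assumes the leading number identifies the person and the number after "age" is the age in years; that reading of the names, and the helper `parse_key`, are assumptions rather than something the notebook states.

```python
import re

# Assumed pattern: <person id>age<age in years>.png
NAME_PATTERN = re.compile(r"^(\d+)age(\d+)\.png$")


def parse_key(key):
    """Split a key such as '2age15.png' into (person_id, age)."""
    match = NAME_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected file name: " + key)
    return int(match.group(1)), int(match.group(2))


print(parse_key("1age10.png"))  # (1, 10)
print(parse_key("2age4.png"))   # (2, 4)
```

With both ages parsed, the age gap between two pictures of the same person is simply the absolute difference of the two ages.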

Then, we create a client object for Rekognition using the boto3 package. It allows us to communicate with and make requests to the Rekognition service from Python.


In [None]:
client=boto3.client('rekognition')

We import two important packages, numpy and pandas, for later use, create an empty dataframe called ch with pd, and use boto3 to make an s3_resource object that lets us communicate with S3.
We then loop through the bucket's object summaries and add the image names to the ch dataframe under the column "Name".



In [None]:
import numpy as np
import pandas as pd
ch = pd.DataFrame()
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion2')
summaries = my_bucket.objects.all()
image_names = [image.key for image in summaries]
ch["Name"] = image_names




We create a list called t_name and use a loop to add the name of each image to it for later use as column names.

In [None]:
t_name = []
for photo in images:
    t_name.append(photo)


For this chunk, we define a function called extract_similarity that extracts the similarity value for a pair of images using Rekognition's compare_faces function. We then loop over the images with a counter i that starts at 0. For each source photo, we compute the image_score against every image and store the scores in the ch dataframe under the column name at position i of t_name.


In [None]:
def extract_similarity(source, target):
    """Return the similarity score between two images in the bucket, or NaN."""
    try:
        comparison = client.compare_faces(
            SourceImage={'S3Object': {'Bucket': "image-api-example-finalversion2", 'Name': source}},
            TargetImage={'S3Object': {'Bucket': "image-api-example-finalversion2", 'Name': target}})
        # FaceMatches is empty when no face in the target matches the source
        image_score = comparison['FaceMatches'][0]['Similarity']
    except Exception:
        # No match or failed API call: record a missing value
        image_score = np.nan
    return image_score

# One column per source photo, one row per target image
for i, photo in enumerate(images):
    ch[t_name[i]] = [extract_similarity(photo, name) for name in ch["Name"]]
    print(i + 1)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20


We define a function called extract_age that extracts the age range for each image using Rekognition's detect_faces function.

In [None]:
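def extract_age(image):
    """Return the estimated age range of the first face in an image, or NaN."""
    try:
        response = client.detect_faces(Image={'S3Object':{'Bucket':"image-api-example-finalversion2",'Name':image}}, Attributes=["ALL"])
        # AgeRange is a dict with 'Low' and 'High' keys
        age_range = response["FaceDetails"][0]["AgeRange"]
    except Exception:
        # No face detected or failed API call: record a missing value
        age_range = np.nan
    return age_range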

After we get the age_range, we store it in the ch dataframe under the column name "Age_Range".


In [None]:
ch["Age_Range"] = [extract_age(name) for name in ch["Name"]]

We write the result to a CSV file called "resultsection3.csv".

In [None]:
ch.to_csv("resultsection3.csv")

### Results
After testing all five data sets, we see a clear trend: the larger the age difference, the lower AWS Rekognition's confidence that the two pictures show the same person. In the diagrams below, the further toward the upper right a confidence level lies, the lower it is.

![Test Data](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/03_2.png?raw=true)
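
One way to put a number on such a trend is a correlation coefficient between the age gap and the similarity score. The original analysis does not say that it computed one, so the following is only a sketch of how it might be done. The helper `pearson` is a plain-Python illustration, and the numbers are made up rather than taken from our results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only, not results from this project
age_gaps = [1, 5, 10, 20]
similarities = [99.0, 97.0, 92.0, 85.0]
print(round(pearson(age_gaps, similarities), 3))
```

A value close to -1 would indicate that similarity falls steadily as the age gap grows.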

So the result agrees with our hypothesis: AWS Rekognition can match people across different ages, and its accuracy decreases once the age difference becomes too large. An exact benchmark for the age difference is hard to establish at the scale of this project because, according to our results, different people show very different amounts of facial change over a similar period.
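
To illustrate why a single benchmark is hard to fix, the sketch below finds, for each person, the smallest age gap at which the similarity falls below a cutoff. The cutoff of 90, the helper `first_gap_below`, and the records are hypothetical and serve only to show the kind of per-person spread described above.

```python
def first_gap_below(records, cutoff):
    """Map each person to the smallest age gap whose similarity is below cutoff.

    `records` holds (person_id, age_gap, similarity) tuples; a person whose
    scores never drop below the cutoff maps to None.
    """
    result = {}
    for person, gap, score in sorted(records, key=lambda r: (r[0], r[1])):
        result.setdefault(person, None)
        if score < cutoff and result[person] is None:
            result[person] = gap
    return result


# Hypothetical records, not results from this project
records = [(1, 2, 99.0), (1, 10, 88.0),
           (2, 2, 98.0), (2, 10, 96.0), (2, 20, 85.0)]
print(first_gap_below(records, cutoff=90.0))  # {1: 10, 2: 20}
```

In this made-up example the two people cross the same cutoff at very different age gaps, which is the situation that prevents a single threshold.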

![Test Data2](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/04.png?raw=true)


![Visualization](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/05.png?raw=true)

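We also made several other attempts. First, we tried to establish an age cut-off, but because each person changes at a different rate while growing up, it is hard to set a clear standard, and the result was not significant. Second, we tried to track how the smiling confidence changes with age, but suitable datasets are very hard to find, so we did not have enough data to reach a clear result. Third, we tried to identify distinct categories such as 'old man', 'man', and 'young man' from changes in the similarity score, but, as before, the differing rates of change between individuals made a clear standard hard to set, and the result was not significant.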

### Conclusion
According to the results, our conclusion is consistent with our initial assumption: although AWS Rekognition can recognize photos of the same person at different ages, its detection accuracy decreases as the age difference increases. Moreover, because of individual differences, we cannot give an exact threshold past which the accuracy of identification drops. In sum, although AWS Rekognition can be useful, identifying a person becomes harder as time passes.

Some improvements are possible. We did not have enough data to determine a definite threshold. Accuracy admittedly drops as the age difference grows, yet it is still important to determine the benchmark past which AWS Rekognition's results are no longer reliable. Although different people age in different ways, and it is impossible to analyze every difference between individuals, it might be possible to determine a threshold for different races or genders given enough image data. The drop in accuracy might converge to a certain age range.

To improve the accuracy of our current results and to gain better insight into facial identification, we need more images of the same individuals across their life spans. Ideally, these individuals would come from different backgrounds and ethnic groups, which would improve the diversity of the data set and give more reliable results.


### References
*Key Facts.* Missingkids.org. (2021). Retrieved 22 April 2021, from https://www.missingkids.org/footer/media/keyfacts.

O. H. (2019, September 16). *This Family Took The Same Picture Every Year For Over Two Decades.* TheFashionBall. https://www.thefashionball.com/trends/family-photos-fb/.

Ross Reynolds, A. H. (n.d.). *How UW's Age-Progression Software Could Help Find Missing Kids.* KUOW News and Information. http://archive.kuow.org/post/how-uws-age-progression-software-could-help-find-missing-kids.

Zhao, W., & Wang, H. (2016). Strategic Decision-Making Learning from Label Distributions: An Approach for Facial Age Estimation. *Sensors*, 16(7), 994. https://doi.org/10.3390/s16070994
