## The Application of AWS Rekognition to the Identification of Missing Children
### Introduction and Motivation

Kidnappings and missing children are serious problems in our society. Services such as child abduction alert systems and missing-persons websites can help parents find their missing children, yet some children are never found. According to statistics from the Federal Bureau of Investigation (FBI), 365,348 missing children cases were registered in 2020, and 421,394 in 2019 ("Key Facts", 2021). About 89% to 92% of all missing persons are found, either dead or alive, which leaves roughly 10% of cases unresolved, and the parents in those cases live with a heavy loss while hoping to find their children in the future. After many years, however, it can be very hard to recognize a person. To see whether AWS Rekognition can help alleviate this worldwide problem, we designed a project that explores how well the tool can detect the similarity between images of the same person as a child and as an adult.

### The Reason Why We Use Amazon Rekognition

AWS Rekognition extracts data from a picture, summarizing information about objects, backgrounds, human faces, facial expressions, and so on. Given two photos of the same person, it returns a similarity score, which is usually high, and so correctly judges that they show the same person. However, missing children pose one particular challenge: they grow. If they are not found quickly, their appearance may change. To determine whether AWS Rekognition is useful in missing children cases, we set out to answer the following questions. First, does AWS Rekognition match the face of a grown-up with his or her childhood photo? Second, how accurate is AWS Rekognition when part of the face is hidden, or when the individual's facial structure has changed with maturity? Does the accuracy decrease as the age difference increases? Third, are there ways to improve the tool's accuracy?

We hypothesized that AWS Rekognition can recognize the same person at different ages. However, since facial features can change drastically over the years, we also expected the accuracy to decrease when the age difference between two pictures of the same individual is large. For the first part of our project, we selected 5 sets of pictures of 5 people (3 males and 2 females). Three sets were gathered from a news article titled "This Family Took The Same Picture Every Year For Over Two Decades", and the remaining two were gathered from two publications by Dr. Reynolds and Dr. Zhao. All data sources are publicly available online without copyright problems. We considered using photos of our own group members as a data source, but we could not gather photos from far enough back.

Within each of the five sets, we test the similarity between pictures to see how well AWS Rekognition detects the same person at different ages. One control picture, similar in color, gesture, race, and facial expression, was added to each set so that we can be sure the similarity scores are not driven by those attributes. Next, we call the face detection function in AWS Rekognition and extract the age range, so that we know AWS's estimate of the age of the person in each picture. If AWS Rekognition proves to be highly accurate on such tests, looking for missing kids with it becomes more viable, since age is one of the few attributes that people searching for a missing person know for certain.

![Data Example](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/01.png?raw=true)

### How It Works
![Architecture](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/02.jpg?raw=true)


To start, we create an Amazon S3 bucket.

In [1]:
# make a bucket
!aws s3 mb s3://image-api-example-finalversion

make_bucket: image-api-example-finalversion


In [2]:
# Import boto3, the AWS SDK for Python
import boto3

Then we need to loop through the images in the bucket. To do this, we first make a Python list of all the images in the bucket. We start by using boto3 to create an object, s3_resource, that allows us to communicate with S3.


In [4]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion')
summaries = my_bucket.objects.all()

We create a list called images and use a loop to add the name of each image to it for later use.

In [5]:
images = []
for image in summaries:
    images.append(image.key)
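
Because `objects.all()` returns every object in the bucket, any non-image file stored there would also end up in `images`. A minimal sketch of one way to guard against that follows. The helper name `filter_image_keys` is hypothetical and not part of the notebook, and the extension list reflects the JPEG and PNG formats that Rekognition accepts.

```python
import os

# Hypothetical helper for illustration; not part of the original notebook.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def filter_image_keys(keys):
    """Keep only keys whose extension looks like a PNG or JPEG image."""
    kept = []
    for key in keys:
        extension = os.path.splitext(key)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            kept.append(key)
    return kept


# Example with made-up object keys
example_keys = ["c1992.png", "notes.txt", "c1991.PNG", "f1991.jpg"]
print(filter_image_keys(example_keys))
```

In the notebook, the same helper could be applied to the list built above, for example `images = filter_image_keys(images)`.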

Then we create client, an instance of the Rekognition client object from the boto3 package. It allows us to communicate with and make requests to the Rekognition service from Python.

In [6]:
# Create an instance client of the client object in the boto3 package for rekognition

client=boto3.client('rekognition')

#### Test set 1 

In the chunk of code below, we import two packages, numpy and pandas, for later use and create an empty dataframe called df with pandas. We again use boto3 to create the s3_resource object that allows us to communicate with S3, loop through the object summaries of the bucket, and add the image names to df under the column "Name".


In [7]:
import numpy as np
import pandas as pd
df = pd.DataFrame()
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion')
summaries = my_bucket.objects.all()
image_names = [image.key for image in summaries]
df["Name"] = image_names

We also make a new dataframe called qf with pandas and add the image names to it under the column "Name".

In [8]:
qf = pd.DataFrame()
qf["Name"] = image_names

We then create a list called small_name that holds one column label per image, and use it later for the column names inside the loop. The cell also defines a second list, big_name.

In [9]:
small_name = ["Similarity-child-91","Similarity-child-92","Similarity-child-93","Similarity-child-94","Similarity-child-95","Similarity-child-96","Similarity-child-97","Similarity-child-98","Similarity-child-99","Similarity-child-00","Similarity-child-02","Similarity-child-03","Similarity-child-04","Similarity-child-05","Similarity-child-06","Similarity-child-07","Similarity-child-08","Similarity-child-09","Similarity-child-10","Similarity-child-11","Similarity-child-12","Similarity-child-13",
              "Similarity-female-91","Similarity-female-92","Similarity-female-93","Similarity-female-94","Similarity-female-95","Similarity-female-96","Similarity-female-97","Similarity-female-98","Similarity-female-99","Similarity-female-00","Similarity-female-02","Similarity-female-03","Similarity-female-04","Similarity-female-05","Similarity-female-06","Similarity-female-07","Similarity-female-08","Similarity-female-09","Similarity-female-10","Similarity-female-11","Similarity-female-12","Similarity-female-13",
             "Similarity-male-91","Similarity-male-92","Similarity-male-93","Similarity-male-94","Similarity-male-95","Similarity-male-96","Similarity-male-97","Similarity-male-98","Similarity-male-99","Similarity-male-00","Similarity-male-02","Similarity-male-03","Similarity-male-04","Similarity-male-05","Similarity-male-06","Similarity-male-07","Similarity-male-08","Similarity-male-09","Similarity-male-10","Similarity-male-11","Similarity-male-12","Similarity-male-13"]

big_name = ["range-child-91","range-child-92","range-child-93","range-child-94","range-child-95","range-child-96","range-child-97","range-child-98","range-child-99","range-child-00","range-child-02","range-child-03","range-child-04","range-child-05","range-child-06","range-child-07","range-child-08","range-child-09","range-child-10","range-child-11","range-child-12","range-child-13",
              "Similarity-female-91","Similarity-female-92","Similarity-female-93","Similarity-female-94","Similarity-female-95","Similarity-female-96","Similarity-female-97","Similarity-female-98","Similarity-female-99","Similarity-female-00","Similarity-female-02","Similarity-female-03","Similarity-female-04","Similarity-female-05","Similarity-female-06","Similarity-female-07","Similarity-female-08","Similarity-female-09","Similarity-female-10","Similarity-female-11","Similarity-female-12","Similarity-female-13",
             "Similarity-male-91","Similarity-male-92","Similarity-male-93","Similarity-male-94","Similarity-male-95","Similarity-male-96","Similarity-male-97","Similarity-male-98","Similarity-male-99","Similarity-male-00","Similarity-male-02","Similarity-male-03","Similarity-male-04","Similarity-male-05","Similarity-male-06","Similarity-male-07","Similarity-male-08","Similarity-male-09","Similarity-male-10","Similarity-male-11","Similarity-male-12","Similarity-male-13"]

In this chunk, we first initialize a counter i = 0 for use in the loop. Inside the loop, we define a function called extract_similarity that uses Rekognition's compare_faces function to extract the similarity between the current photo and each image in the set; when no match is returned or the call fails, it records NaN. We store the resulting scores in the df dataframe under the column name found at position i of small_name.

In [10]:
i = 0
for photo in images:
    # Compare the current source photo against one target image
    def extract_similarity(image):
        try:
            comparison = client.compare_faces(
                SourceImage={'S3Object': {'Bucket': "image-api-example-finalversion", 'Name': photo}},
                TargetImage={'S3Object': {'Bucket': "image-api-example-finalversion", 'Name': image}})
            face_match = comparison['FaceMatches']
            image_score = face_match[0]['Similarity']
        except Exception:
            # No match returned (empty FaceMatches) or the API call failed
            image_score = np.nan
        return image_score
    # One column per source photo, one row per target image
    df[small_name[i]] = [extract_similarity(name) for name in df["Name"]]
    i = i + 1
    print(i)  # progress counter

1
2
3
...
65
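
Many cells in the resulting table are empty. One likely reason is that `compare_faces` only lists a pair in `FaceMatches` when its similarity reaches `SimilarityThreshold`, which defaults to 80, so lower-scoring pairs end up in the `except` branch and become NaN. The sketch below is not the code used for the results in this post; it shows one way to keep those lower scores by passing a threshold of 0. The function name `pairwise_similarity` and the `FakeClient` used for the offline demonstration are hypothetical, and the scores are made up. In practice `client` would be `boto3.client('rekognition')`.

```python
def pairwise_similarity(client, bucket, keys, threshold=0.0):
    """Return {(source, target): similarity or None} for every ordered pair."""
    scores = {}
    for source in keys:
        for target in keys:
            response = client.compare_faces(
                SourceImage={"S3Object": {"Bucket": bucket, "Name": source}},
                TargetImage={"S3Object": {"Bucket": bucket, "Name": target}},
                SimilarityThreshold=threshold,
            )
            matches = response.get("FaceMatches", [])
            # None means no face in the target reached the threshold
            scores[(source, target)] = matches[0]["Similarity"] if matches else None
    return scores


class FakeClient(object):
    """Offline stand-in for the Rekognition client, for demonstration only."""

    def __init__(self, table):
        self.table = table

    def compare_faces(self, SourceImage, TargetImage, SimilarityThreshold):
        source = SourceImage["S3Object"]["Name"]
        target = TargetImage["S3Object"]["Name"]
        similarity = self.table[(source, target)]
        if similarity >= SimilarityThreshold:
            return {"FaceMatches": [{"Similarity": similarity}]}
        return {"FaceMatches": []}


# Made-up scores for two hypothetical images
made_up = {
    ("a.png", "a.png"): 100.0, ("a.png", "b.png"): 42.0,
    ("b.png", "a.png"): 42.0, ("b.png", "b.png"): 100.0,
}
fake = FakeClient(made_up)
with_default = pairwise_similarity(fake, "demo-bucket", ["a.png", "b.png"], threshold=80.0)
keep_all = pairwise_similarity(fake, "demo-bucket", ["a.png", "b.png"], threshold=0.0)
print(with_default[("a.png", "b.png")])  # None
print(keep_all[("a.png", "b.png")])      # 42.0
```

Unlike the notebook cell, this sketch does not catch exceptions, so a real call that fails (for example, because no face is found in the source image) would stop the loop instead of producing NaN.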


Here we define a function called extract_age that uses Rekognition's detect_faces function to extract the estimated age range for each image.

In [11]:
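def extract_age(image):
    try:
        # Ask Rekognition for all facial attributes of the image
        response = client.detect_faces(Image={'S3Object':{'Bucket':"image-api-example-finalversion",'Name':image}}, Attributes = ["ALL"])
        # Use the first detected face; AgeRange is a dict with 'Low' and 'High'
        age_range = response["FaceDetails"][0]["AgeRange"]
    except Exception:
        # No face detected or the API call failed
        age_range = np.nan
    return age_range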

Once we have the age ranges, we store them in the df dataframe under the column name "Age_Range".

In [12]:
df["Age_Range"] = [extract_age(name) for name in qf["Name"]]
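
Each cell of "Age_Range" holds a dictionary (or NaN), which is awkward to compare against a real age once it has been written to CSV. As a minimal sketch, the range could be split into two numbers. The helper name `split_age_range` is hypothetical, and the sample values are copied from the first rows of the output table shown further down.

```python
def split_age_range(age_range):
    """Return (low, high) from an AgeRange dict, or (None, None) if missing."""
    if isinstance(age_range, dict):
        return age_range.get("Low"), age_range.get("High")
    # The notebook stores NaN when no age range could be extracted
    return None, None


# Two age ranges from the table plus one missing value
age_ranges = [{"High": 3, "Low": 0}, float("nan"), {"High": 9, "Low": 3}]
lows, highs = zip(*(split_age_range(r) for r in age_ranges))
print(lows)   # (0, None, 3)
print(highs)  # (3, None, 9)
```

With the notebook's dataframe, the two tuples could be assigned to separate columns, for example under the hypothetical names "Age_Low" and "Age_High".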

Finally, we write the result to a CSV file called Newresult.csv.

In [14]:
df.to_csv("Newresult.csv")

In [15]:
df

Unnamed: 0,Name,Similarity-child-91,Similarity-child-92,Similarity-child-93,Similarity-child-94,Similarity-child-95,Similarity-child-96,Similarity-child-97,Similarity-child-98,Similarity-child-99,...,Similarity-male-04,Similarity-male-05,Similarity-male-06,Similarity-male-07,Similarity-male-08,Similarity-male-09,Similarity-male-10,Similarity-male-11,Similarity-male-12,Age_Range
0,c1991.png,100.000000,83.227959,,,,,,,,...,,,,,,,,,,"{u'High': 3, u'Low': 0}"
1,c1992.png,83.227959,100.000000,,99.697456,,,,,,...,,,,,,,,,,"{u'High': 3, u'Low': 0}"
2,c1993.png,,,100.000000,92.199646,97.935844,91.140457,97.984833,99.370941,85.452332,...,,,,,,,,,,"{u'High': 4, u'Low': 0}"
3,c1994.png,,99.697456,92.199646,100.000000,95.998489,,94.482971,85.566383,,...,,,,,,,,,,"{u'High': 4, u'Low': 0}"
4,c1995.png,,,97.935844,95.998489,100.000000,,99.932747,99.772377,99.470650,...,,,,,,,,,,"{u'High': 9, u'Low': 3}"
5,c1996.png,,,91.140457,,,100.000000,94.908005,91.835197,88.275841,...,,,,,,,,,,"{u'High': 15, u'Low': 5}"
6,c1997.png,,,97.984833,94.482971,99.932747,94.908005,100.000000,99.988731,99.917061,...,,,,,,,,,,"{u'High': 7, u'Low': 1}"
7,c1998.png,,,99.370941,85.566383,99.772377,91.835197,99.988731,100.000000,99.996399,...,,,,,,,,,,"{u'High': 19, u'Low': 9}"
8,c1999.png,,,85.452332,,99.470650,88.275841,99.917061,99.996399,100.000000,...,,,,,,,,,,"{u'High': 9, u'Low': 3}"
9,c2000.png,,,90.742043,,90.211113,,98.173210,99.925995,99.988846,...,,,,,,,,,,"{u'High': 14, u'Low': 4}"


#### Test set 2

First, we make an Amazon S3 bucket for the new test set.

In [16]:
!aws s3 mb s3://image-api-example-finalversion2

make_bucket: image-api-example-finalversion2


Then we need to loop through the images in the bucket. To do this, we first make a Python list of all the images in the bucket.
We start by using boto3 to create an object, s3_resource, that allows us to communicate with S3.


In [23]:
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion2')
summaries = my_bucket.objects.all()

We create a list called images and use a loop to add the name of each image to it for later use.

In [24]:
images = []
for image in summaries:
    images.append(image.key)

Then we create client, an instance of the Rekognition client object from the boto3 package. It allows us to communicate with and make requests to the Rekognition service from Python.


In [19]:
client=boto3.client('rekognition')

We import two packages, numpy and pandas, for later use, create an empty dataframe called ch with pandas, and use boto3 to create the s3_resource object that allows us to communicate with S3.
We then loop through the object summaries of the bucket and add the image names to the ch dataframe under the column "Name".



In [25]:
import numpy as np
import pandas as pd
ch = pd.DataFrame()
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket('image-api-example-finalversion2')
summaries = my_bucket.objects.all()
image_names = [image.key for image in summaries]
ch["Name"] = image_names

We create a list called t_name and use a loop to add the name of each image to it; these names serve as the column names later.

In [26]:
t_name = []
for photo in images:
    t_name.append(photo)

In this chunk, we first initialize a counter i = 0 for use in the loop. Inside the loop, we define a function called extract_similarity that uses Rekognition's compare_faces function to extract the similarity between the current photo and each image in the set; when no match is returned or the call fails, it records NaN. We store the resulting scores in the ch dataframe under the column name found at position i of t_name.


In [27]:
i = 0
for photo in images:
    # Compare the current source photo against one target image
    def extract_similarity(image):
        try:
            comparison = client.compare_faces(
                SourceImage={'S3Object': {'Bucket': "image-api-example-finalversion2", 'Name': photo}},
                TargetImage={'S3Object': {'Bucket': "image-api-example-finalversion2", 'Name': image}})
            face_match = comparison['FaceMatches']
            image_score = face_match[0]['Similarity']
        except Exception:
            # No match returned (empty FaceMatches) or the API call failed
            image_score = np.nan
        return image_score
    # One column per source photo, one row per target image
    ch[t_name[i]] = [extract_similarity(name) for name in ch["Name"]]
    i = i + 1
    print(i)  # progress counter

1
2
3
...
20


We define a function called extract_age that uses Rekognition's detect_faces function to extract the estimated age range for each image.

In [29]:
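def extract_age(image):
    try:
        # Ask Rekognition for all facial attributes of the image
        response = client.detect_faces(Image={'S3Object':{'Bucket':"image-api-example-finalversion2",'Name':image}}, Attributes = ["ALL"])
        # Use the first detected face; AgeRange is a dict with 'Low' and 'High'
        age_range = response["FaceDetails"][0]["AgeRange"]
    except Exception:
        # No face detected or the API call failed
        age_range = np.nan
    return age_range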

Once we have the age ranges, we store them in the ch dataframe under the column name "Age_Range".


In [30]:
ch["Age_Range"] = [extract_age(name) for name in ch["Name"]]
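
The file names in this test set, such as 1age10.png, appear to encode a subject number and the real age in the photo. Assuming that reading is right, a small helper could recover both for later comparison with the estimated range. The name `parse_subject_and_age` and the pattern are hypothetical and only cover names of this exact shape.

```python
import re

# Assumed naming scheme: <subject number>age<real age>.png
FILENAME_PATTERN = re.compile(r"^(\d+)age(\d+)\.png$")


def parse_subject_and_age(filename):
    """Return (subject, age) as integers, or None if the name does not fit."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_subject_and_age("1age10.png"))  # (1, 10)
print(parse_subject_and_age("2age4.png"))   # (2, 4)
print(parse_subject_and_age("c1991.png"))   # None
```

Names from test set 1, such as c1991.png, follow a different scheme and are deliberately left unparsed here.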

We write the result to a CSV file called "resultsection3.csv".

In [31]:
ch.to_csv("resultsection3.csv")

In [32]:
ch

Unnamed: 0,Name,1age10.png,1age16.png,1age3.png,1age4.png,1age6.png,1age8.png,1age9.png,2age15.png,2age16.png,...,2age21.png,2age23.png,2age26.png,2age29.png,2age36.png,2age38.png,2age4.png,2age5.png,2age7.png,Age_Range
0,1age10.png,100.0,99.65184,,,99.073639,99.994667,99.989685,,,...,,,,,,,,,,"{u'High': 22, u'Low': 12}"
1,1age16.png,99.65184,100.0,,,,97.318108,97.061943,,,...,,,,,,,,,,"{u'High': 23, u'Low': 13}"
2,1age3.png,,,100.0,99.535858,99.935539,93.992844,83.269951,,,...,,,,,,,,,,"{u'High': 7, u'Low': 1}"
3,1age4.png,,,99.535858,100.0,99.987579,98.765175,94.841209,,,...,,,,,,,,,,"{u'High': 8, u'Low': 2}"
4,1age6.png,99.073639,,99.935539,99.987579,100.0,99.966721,98.580513,,,...,,,,,,,,,,"{u'High': 8, u'Low': 2}"
5,1age8.png,99.994667,97.318108,93.992844,98.765175,99.966721,100.0,99.986679,,,...,,,,,,,,,,"{u'High': 14, u'Low': 4}"
6,1age9.png,99.989685,97.061943,83.269951,94.841209,98.580513,99.986679,100.0,,,...,,,,,,,,,,"{u'High': 16, u'Low': 6}"
7,2age15.png,,,,,,,,100.0,96.911491,...,89.88784,86.672089,,90.291344,92.386841,86.178375,,92.897202,,"{u'High': 22, u'Low': 12}"
8,2age16.png,,,,,,,,96.911491,100.0,...,99.978516,97.828194,95.237656,99.943939,,97.937271,,,97.212181,"{u'High': 19, u'Low': 9}"
9,2age18.png,,,,,,,,99.892708,99.991905,...,99.976921,99.186554,99.718231,99.895706,98.574669,99.212357,90.59446,92.151489,99.102921,"{u'High': 30, u'Low': 18}"


### Results
After testing all five data sets, we see a clear trend: the bigger the age difference, the lower AWS Rekognition's confidence that the two pictures show the same person. (In the diagrams, the closer a cell lies to the upper right, the lower the confidence.)

![Test Data](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/03_2.png?raw=true)
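
To make the trend concrete, the sketch below orders one row of the test set 2 table by age gap. The scores are copied from the 1age3.png row shown earlier, with None marking an empty cell, and it assumes the number in each file name is the real age. The function name `by_age_gap` is hypothetical.

```python
# Similarity of each photo of subject 1 to the age-3 photo, keyed by age
similarity_to_age3 = {
    3: 100.0,
    4: 99.535858,
    6: 99.935539,
    8: 93.992844,
    9: 83.269951,
    10: None,
    16: None,
}


def by_age_gap(reference_age, scores):
    """Return (age gap, similarity) pairs sorted by increasing gap."""
    pairs = [(abs(age - reference_age), score) for age, score in scores.items()]
    return sorted(pairs, key=lambda pair: pair[0])


for gap, score in by_age_gap(3, similarity_to_age3):
    label = "no match returned" if score is None else "%.2f" % score
    print("gap of %2d years: %s" % (gap, label))
```

For this one subject, the scores stay above 99 for gaps of up to 3 years, fall to about 94 and 83 at gaps of 5 and 6 years, and no match is returned at gaps of 7 and 13 years.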

So the result is in accordance with our hypothesis: AWS Rekognition is able to recognize people across different ages, and the accuracy drops once the age difference becomes too large. An exact benchmark for the age difference is hard to establish at the scale of this project because, according to our results, different people show very different amounts of facial change over a similar time period. One noteworthy finding is that the control pictures yielded 0 confidence in all picture sets, so AWS Rekognition is not simply matching on the expression or color of the pictures. This finding makes our results more reliable.

Note that we also assessed the age detection function in AWS Rekognition, which returns an estimated lower and upper bound for the age of the person in each picture. The data listed below compares the real age with the estimated age for the 3 children's data sets. Since the point of our study is to look for missing people, using only the children's picture sets is more appropriate: those sets record each person's appearance through puberty, when facial appearance changes the most in a person's life. Also, missing kids are harder to find than adults.

In the processed data, cases where the real age falls within the range estimated by AWS Rekognition are highlighted in blue. For cases where the real age falls outside the range, we added a measure of the percentage difference; the average of these differences is about 21%. Of all 42 pictures, AWS Rekognition's range captured the real age for 73.81%, which is not a good success rate.

![Test Data2](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/04.png?raw=true)
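
The check behind these numbers can be sketched as follows. The exact formula used for the percentage difference is not stated above, so `percent_miss` below is only one plausible definition (distance from the real age to the nearest bound, relative to the real age), and both function names are hypothetical. The three sample rows come from the test set 2 table, assuming the number in each file name is the real age.

```python
def age_captured(true_age, low, high):
    """True when the real age lies inside the estimated range (inclusive)."""
    return low <= true_age <= high


def percent_miss(true_age, low, high):
    """Gap to the nearest bound as a percentage of the real age (assumed definition)."""
    if age_captured(true_age, low, high):
        return 0.0
    if true_age == 0:
        return None  # undefined under this definition
    nearest_bound = low if true_age < low else high
    return abs(true_age - nearest_bound) / float(true_age) * 100.0


# (file name, real age, estimated low, estimated high)
rows = [
    ("1age10.png", 10, 12, 22),
    ("1age16.png", 16, 13, 23),
    ("1age3.png", 3, 1, 7),
]
for name, age, low, high in rows:
    print(name, age_captured(age, low, high), percent_miss(age, low, high))

# A capture rate of 73.81% over 42 pictures corresponds to 31 pictures
print(round(31 / 42.0 * 100, 2))  # 73.81
```

Under this definition, the age-10 photo misses its estimated range of 12 to 22 by 20%, while the other two sample photos are captured.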

Looking further into the data, we found that most of the mismatches come from pictures of the same person, for whom AWS Rekognition produced wrong estimates for 7 consecutive years. So, again, the accuracy of the face detection function differs drastically depending on the pictures chosen and on how quickly each individual's appearance changes. Looking back at the raw data, the child in test set 1 changed his hairstyle and clothing more noticeably between ages 15 and 21 than the people in the other data sets, or than he did in other periods. Also, recall that for the similarity data, the table generated from the test set 1 child was the only one with a large empty area in the upper right, indicating that AWS Rekognition was not able to recognize that person after just several years.

![Visualization](https://github.com/AlisiaTian/SP2021QTM350/blob/main/blog/05.png?raw=true)

Finally, we visualize the estimates. Although the estimation accuracy was not satisfying (73.81%), the AWS function does a good job in general. The width of the age range returned by the function was fairly consistent across samples and, except for the test set 1 child, the upper and lower bounds captured almost the whole sample for the other two. So the age detection function is still applicable when looking for missing people, as a reference rather than as a guarantee. During the search, there is also a lot we can do to improve accuracy, such as choosing pictures of the suspected candidate with clothing or a hairstyle similar to those in the given childhood picture.

Besides age detection, we also tried other approaches, but none of them yielded a significant result:

1. We tried detecting changes in smiling confidence with respect to age, but sets of smiling pictures spanning a wide age range are hard to gather, and we could not reach a clear result from the 2 data sets we had.

2. We tried identifying distinct categories such as 'old man', 'man', and 'young man' using changes in the similarity score. But again, the rate of change differs so much between individuals that we could not set a standard threshold.


### Conclusion
According to the results, the conclusion is consistent with our initial hypothesis: although AWS Rekognition can recognize photos of the same person at different ages, the accuracy of detection decreases as the age difference increases. Moreover, we cannot give an exact age difference at which the accuracy of identification drops, because individuals grow at different rates. The similarity check in AWS Rekognition can be used as a first-round check to filter down to a smaller candidate pool, and the age detection technique can then serve as a second-round check. With the smaller candidate pool, we can manually select candidates whose pictures show clothing similar to that in the childhood pictures to narrow the pool further. Real-life applications can be more complicated and may face more problems. **But the data from our project already shows that AWS Rekognition is accurate enough to detect the same face at different ages.**

Some improvements can be made. We did not have enough data to determine an exact threshold. Admittedly, accuracy drops as the age difference grows, yet it is still important to determine the benchmark past which AWS Rekognition's results are no longer reliable. Data collection for this study is difficult because it is hard to gather pictures of one person over a wide age range. Although different people age in different ways, it might be possible to determine thresholds for different races or genders with enough image data.

To improve the accuracy of our current results and to gain better insight into facial identification, we need more image data of the same individuals across their life spans. Ideally, they would come from different backgrounds and ethnic groups, which would improve the diversity of the data set and give more reliable results.


### References
*Key Facts.* Missingkids.org. (2021). Retrieved 22 April 2021, from https://www.missingkids.org/footer/media/keyfacts.

O. H. (2019, September 16). *This Family Took The Same Picture Every Year For Over Two Decades.* TheFashionBall. https://www.thefashionball.com/trends/family-photos-fb/.

Ross Reynolds, A. H. (n.d.). *How UW's Age-Progression Software Could Help Find Missing Kids.* KUOW News and Information. http://archive.kuow.org/post/how-uws-age-progression-software-could-help-find-missing-kids.

Zhao, W., & Wang, H. (2016). Strategic Decision-Making Learning from Label Distributions: An Approach for Facial Age Estimation. *Sensors*, 16(7), 994. https://doi.org/10.3390/s16070994
