# Jamie Sims - Revolut Analysis Task

The following is the data analysis conducted to support my final presentation. 

**S**ituation: The pass rate has decreased significantly in the recent period <br>
**T**ask: To write a report that identifies <br> - root causes for this drop in success rate <br> - suggested solutions 

### Definitions

- **Recent Period**: Need to define recent period <br>
- **Pass Rate**: number of successful KYC checks (Success) / number of attempts (attempts)
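
One way to pin down the recent period is to track the pass rate week by week and look for where it starts to fall. A minimal sketch, assuming a `created_at` timestamp and a `result` column where `"clear"` marks a success (both appear in the report files loaded below). The toy rows and the weekly granularity are my assumptions, not a fixed definition:

```python
import pandas as pd

# Toy data standing in for a report file; values are illustrative only.
toy = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            ["2017-09-04", "2017-09-05", "2017-09-12", "2017-09-13"]
        ),
        "result": ["clear", "clear", "clear", "consider"],
    }
)

# Flag successes as 1.0 / 0.0, then average the flag per calendar week
# (pandas "W" bins end on Sunday) to get a weekly pass rate in percent.
weekly_pass_rate = (
    toy.assign(passed=toy["result"].eq("clear").astype(float))
    .set_index("created_at")["passed"]
    .resample("W")
    .mean()
    * 100
)
print(weekly_pass_rate)
```

In this toy data the first week passes every attempt and the second week passes half of them, so the drop would be dated to the second week.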

In [5]:
import pandas as pd

docs_report_file = 'doc_reports_sample.csv'
faces_report_file = 'face_reports_sample.csv'

documents = pd.read_csv(docs_report_file, parse_dates=["created_at"])
faces = pd.read_csv(faces_report_file, parse_dates=["created_at"])

Variables to be used in calculations defined below:

In [6]:
unique_users = documents["user_id"].nunique()
document_failures = documents[documents["result"]!="clear"]
faces_failures = faces[faces["result"]!="clear"]

## High Level Overall Analysis 
A selection of high level queries that show overall trends in the data

In [7]:
# Number of failed ("consider") document checks; significantly higher than the number of failed face checks
document_failures["result"].count()

1474

In [28]:
# Successful document checks
document_success = documents[documents["result"]=="clear"]
document_success["result"].count()

4406

In [29]:
# Number of failed face checks (significantly lower than the number of failed document checks)
faces_failures["result"].count()

367

In [30]:
# Successful face checks
faces_success = faces[faces["result"]=="clear"]
faces_success["result"].count()

5513

In [31]:
# Pass rate: Documents
document_pass_rate = document_success["result"].count()/documents["result"].count()*100
document_pass_rate

74.93197278911565

In [32]:
# Pass Rate: Faces
faces_pass_rate = faces_success["result"].count()/faces["result"].count()*100
faces_pass_rate

93.75850340136054
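
The two pass rates above are per check type, whereas the definition of pass rate is per KYC attempt. A hedged sketch of an overall rate, assuming that each `attempt_id` appears once in both files (the sample rows further down share `attempt_id` values across the two reports) and that an attempt passes only when both checks are clear. The toy frames are illustrative, not real data:

```python
import pandas as pd

# Toy stand-ins for the two report files; values are illustrative only.
toy_documents = pd.DataFrame(
    {"attempt_id": ["a1", "a2", "a3"], "result": ["clear", "consider", "clear"]}
)
toy_faces = pd.DataFrame(
    {"attempt_id": ["a1", "a2", "a3"], "result": ["clear", "clear", "consider"]}
)

# One row per attempt, with the document and face results side by side.
attempts = toy_documents.merge(
    toy_faces, on="attempt_id", suffixes=("_document", "_face")
)

# Assumption: an attempt passes only when both checks are clear.
attempts["passed"] = attempts["result_document"].eq("clear") & attempts[
    "result_face"
].eq("clear")
overall_pass_rate = attempts["passed"].mean() * 100
print(round(overall_pass_rate, 2))  # 33.33
```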

## Analysis of the Document Upload Failures

In [33]:
documents.head(1)

Unnamed: 0.1,Unnamed: 0,user_id,result,visual_authenticity_result,image_integrity_result,face_detection_result,image_quality_result,created_at,supported_document_result,conclusive_document_quality_result,colour_picture_result,data_validation_result,data_consistency_result,data_comparison_result,attempt_id,police_record_result,compromised_document_result,properties,sub_result
0,27241,8190909e566647a5b6afeee9b4ec6c6a,clear,clear,clear,clear,clear,2017-05-25 08:38:56,clear,,,clear,,clear,30e11e95e30748f485a2271ca5e6abb8,clear,,"{'gender': 'Female', 'document_type': 'driving...",clear
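
The `properties` column looks like a stringified Python dict, so it could be parsed to split the pass rate by document type. A minimal sketch, assuming the column is stored as a Python-literal string and that `document_type` is one of its keys (as the truncated output above suggests). The toy values are illustrative only:

```python
import ast

import pandas as pd

# Toy rows mimicking the stringified dicts in `properties`; values are illustrative.
toy = pd.DataFrame(
    {
        "result": ["clear", "consider", "clear"],
        "properties": [
            "{'gender': 'Female', 'document_type': 'passport'}",
            "{}",
            "{'gender': 'Male', 'document_type': 'passport'}",
        ],
    }
)

# literal_eval parses Python literals without executing arbitrary code.
parsed = toy["properties"].apply(ast.literal_eval)
toy["document_type"] = parsed.apply(lambda d: d.get("document_type"))

# Pass rate per document type; rows with no extracted type are grouped as "unknown".
by_type = (
    toy.assign(passed=toy["result"].eq("clear").astype(float))
    .groupby(toy["document_type"].fillna("unknown"))["passed"]
    .mean()
    * 100
)
print(by_type)
```

Rows where nothing could be extracted (`{}`) are kept as their own group, since the retry samples below show empty `properties` on rejected checks.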


### Retry Analysis

In [11]:
# Rows beyond each user's first document check, i.e. the number of repeat attempts
documents_retry = documents[documents.duplicated(subset=["user_id"])]
documents_retry["user_id"].count()

32
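
The count above is the number of repeat rows, not the number of users who retried, and it does not say whether a retry eventually succeeded. A sketch of a per-user view, assuming attempts are ordered by `created_at`. The toy data and the column names `attempts`, `first_result` and `last_result` are hypothetical:

```python
import pandas as pd

# Toy attempts, illustrative only: user "a" fails then passes, user "b" passes first time.
toy = pd.DataFrame(
    {
        "user_id": ["a", "a", "b"],
        "created_at": pd.to_datetime(
            ["2017-09-01 10:00", "2017-09-01 10:05", "2017-09-02 09:00"]
        ),
        "result": ["consider", "clear", "clear"],
    }
).sort_values(["user_id", "created_at"])

# Per user: number of attempts plus the first and last result in time order.
per_user = toy.groupby("user_id")["result"].agg(
    attempts="count",
    first_result="first",
    last_result="last",
)

# Users with more than one attempt, and how many of them cleared on their final attempt.
retriers = per_user[per_user["attempts"] > 1]
recovered = int((retriers["last_result"] == "clear").sum())
print(len(retriers), recovered)
```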

In [9]:
documents[documents.duplicated(keep=False, subset=["user_id"])].sort_values(by=["user_id","created_at"],ascending=[True, True]).head(6)

Unnamed: 0.1,Unnamed: 0,user_id,result,visual_authenticity_result,image_integrity_result,face_detection_result,image_quality_result,created_at,supported_document_result,conclusive_document_quality_result,colour_picture_result,data_validation_result,data_consistency_result,data_comparison_result,attempt_id,police_record_result,compromised_document_result,properties,sub_result
2989,88899,0a743f7f87884a51bd8c165e0d3e70ed,consider,,consider,,unidentified,2017-09-30 09:55:07,clear,,,,,,98959d065b7d4d30af6785533a256eaa,,,{},rejected
3755,88935,0a743f7f87884a51bd8c165e0d3e70ed,consider,,consider,,unidentified,2017-09-30 10:06:16,clear,,,,,,7477c4c591d747abb6103941afdf5dfb,,,{},rejected
2914,104674,0b3fe48a14554fa687e5152a1c20d768,clear,clear,clear,clear,clear,2017-09-19 17:39:42,clear,clear,clear,clear,clear,,b6cd0902fdd74e72a0263781551cb8b7,clear,,"{'gender': 'Male', 'nationality': 'BEL', 'docu...",clear
3367,104039,0b3fe48a14554fa687e5152a1c20d768,clear,clear,clear,clear,clear,2017-09-19 17:58:58,clear,clear,clear,clear,clear,,5d02c7f7ad30435caeb88a0563393059,clear,,"{'gender': 'Male', 'nationality': 'BEL', 'docu...",clear
1588,127513,0b677d16a072467eb95dd396e25840d9,consider,,consider,,unidentified,2017-08-31 22:30:14,clear,,,,,,7bd292b207c34f569082c1ff87963619,,,{},rejected
1542,127521,0b677d16a072467eb95dd396e25840d9,clear,clear,clear,clear,clear,2017-08-31 22:38:25,clear,clear,clear,clear,clear,,344fecf0bcbc42e3a8bd92bc469e1ddd,clear,,"{'gender': 'Male', 'nationality': 'TWN', 'docu...",clear


### High Level Analysis

In [34]:
# Documents: Document Failures - Visual Authenticity 
# Asserts whether visual, non-textual, elements are correct given the type of document
visual_authenticity_fail = documents[documents["visual_authenticity_result"]!="clear"]
visual_authenticity_fail["visual_authenticity_result"].count()

97

In [35]:
# Asserts whether data on the document is consistent with data provided by an applicant 
# (either through Veritas’s applicant form or when creating an applicant through the API)
data_comparison_fail = documents[documents["data_comparison_result"]!="clear"]
data_comparison_fail["data_comparison_result"].count()

5

In [36]:
# Asserts whether data represented in multiple places on the document is consistent 
# e.g. between MRZ lines and OCR extracted text on passports
data_consistency_fail = documents[documents["data_consistency_result"]!="clear"]
data_consistency_fail["data_consistency_result"].count()

5

In [37]:
# Data validation: asserts whether algorithmically-validatable elements are correct
# e.g. MRZ lines and document numbers
data_validation_fail = documents[documents["data_validation_result"]!="clear"]
data_validation_fail["data_validation_result"].count()

56

In [38]:
# Asserts whether the image of the document has been found in our internal database of compromised documents
compromised_document = documents[documents["compromised_document_result"]!="clear"]
compromised_document["compromised_document_result"].count()

0

###  1)  Image Integrity Failures

In [39]:
# DOCUMENTS: Primary Document Failures - image integrity 
# Asserts whether the document was of sufficient quality to verify
image_integrity_fail = documents[documents["image_integrity_result"]!="clear"]
image_integrity_fail["image_integrity_result"].count()

1337

1) ii - Image Integrity Failure Sub-Categories

In [40]:
# Image Integrity - sub Error, Image Quality
image_quality_fail = documents[documents["image_quality_result"]!="clear"]
image_quality_fail["image_quality_result"].count()

834

In [41]:
# Image Integrity - sub Error, supported Document
supported_document_fail = documents[documents["supported_document_result"]!="clear"]
supported_document_fail["supported_document_result"].count()

55

In [42]:
# Image Integrity - sub Error, Colour Picture
colour_picture_fail = documents[documents["colour_picture_result"]!="clear"]
colour_picture_fail["colour_picture_result"].count()

2

In [43]:
# Image Integrity - sub Error, Conclusive Document Quality
conclusive_document_quality_fail = documents[documents["conclusive_document_quality_result"]!="clear"]
conclusive_document_quality_fail["conclusive_document_quality_result"].count()

446

Other Features

In [44]:
# DOCUMENTS: Primary Document Failures - face Detection
face_detection_fail = documents[documents["face_detection_result"]!="clear"]
face_detection_fail["face_detection_result"].count()

23

In [45]:
# Asserts whether the document has been identified as lost, stolen or otherwise compromised
police_record = documents[documents["police_record_result"]!="clear"]
police_record["police_record_result"].count()

0
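
The per-column cells above all follow the same filter-then-count pattern. Filtering on `!= "clear"` also keeps rows where the check is missing, but `.count()` then ignores those missing values, so each figure is the number of checks that ran and did not clear. A sketch that produces the same kind of tally for every sub-check at once, on toy data (illustrative only):

```python
import pandas as pd

# Toy frame, illustrative only; None marks a sub-check that was not run.
toy = pd.DataFrame(
    {
        "result": ["clear", "consider", "consider"],
        "image_integrity_result": ["clear", "consider", None],
        "data_validation_result": ["clear", None, "consider"],
    }
)

# Sub-check columns end in "_result"; the overall "result" column does not match,
# and "sub_result" is excluded explicitly because it is not a sub-check.
sub_checks = [
    c for c in toy.columns if c.endswith("_result") and c != "sub_result"
]

# notna() & ne("clear") counts only checks that ran and did not clear,
# which is what filter-then-count() reports for each column.
non_clear = pd.Series(
    {c: int((toy[c].notna() & toy[c].ne("clear")).sum()) for c in sub_checks}
).sort_values(ascending=False)
print(non_clear)
```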

### Sub result analysis


In [46]:
# Rejected sub-results count 
rejected_sub_result = documents[documents["sub_result"] == "rejected"]
rejected_sub_result['sub_result'].count()

889

In [47]:
# Clear sub-result count
clear_sub_result = documents[documents["sub_result"] == "clear"]
clear_sub_result['sub_result'].count()

4406

In [48]:
# Suspected sub-result count
suspected_sub_result = documents[documents['sub_result'] == "suspected"]
suspected_sub_result["sub_result"].count()

60
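
The three counts above sum to 5,355, while the document pass-rate cells imply 5,880 checks in total (4,406 clear plus 1,474 failed), so the remaining rows presumably hold other or missing `sub_result` values. A single `value_counts` call would surface them. A sketch on a toy series (illustrative values only):

```python
import pandas as pd

# Toy sub_result values; None stands for a missing entry.
toy = pd.Series(
    ["clear", "rejected", "clear", "suspected", None], name="sub_result"
)

# dropna=False keeps missing values as their own row in the tally.
counts = toy.value_counts(dropna=False)
print(counts)
```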

## Analysis of the Face Check Failures

In [49]:
faces.head(5)

Unnamed: 0.1,Unnamed: 0,user_id,result,face_comparison_result,created_at,facial_image_integrity_result,visual_authenticity_result,properties,attempt_id
0,58,ecee468d4a124a8eafeec61271cd0da1,clear,clear,2017-06-20 17:50:43,clear,clear,{},9e4277fc1ddf4a059da3dd2db35f6c76
1,76,1895d2b1782740bb8503b9bf3edf1ead,clear,clear,2017-06-20 13:28:00,clear,clear,{},ab259d3cb33b4711b0a5174e4de1d72c
2,217,e71b27ea145249878b10f5b3f1fb4317,clear,clear,2017-06-18 21:18:31,clear,clear,{},2b7f1c6f3fc5416286d9f1c97b15e8f9
3,221,f512dc74bd1b4c109d9bd2981518a9f8,clear,clear,2017-06-18 22:17:29,clear,clear,{},ab5989375b514968b2ff2b21095ed1ef
4,251,0685c7945d1349b7a954e1a0869bae4b,clear,clear,2017-06-18 19:54:21,clear,clear,{},dd1b0b2dbe234f4cb747cc054de2fdd3


### Retry Analysis

In [59]:
# Rows beyond each user's first face check, i.e. the number of repeat attempts
faces_retry = faces[faces.duplicated(subset=["user_id"])]
faces_retry["user_id"].count()

32

In [79]:
faces[faces.duplicated(keep=False, subset=["user_id"])].sort_values(by=["user_id","created_at"],ascending=[True, True])


Unnamed: 0.1,Unnamed: 0,user_id,result,face_comparison_result,created_at,facial_image_integrity_result,visual_authenticity_result,properties,attempt_id
2874,88899,0a743f7f87884a51bd8c165e0d3e70ed,consider,,2017-09-30 09:55:07,consider,,{},98959d065b7d4d30af6785533a256eaa
2879,88935,0a743f7f87884a51bd8c165e0d3e70ed,clear,clear,2017-09-30 10:06:15,clear,,{},7477c4c591d747abb6103941afdf5dfb
3350,104674,0b3fe48a14554fa687e5152a1c20d768,clear,clear,2017-09-19 17:39:43,clear,clear,{},b6cd0902fdd74e72a0263781551cb8b7
3334,104039,0b3fe48a14554fa687e5152a1c20d768,clear,clear,2017-09-19 17:58:58,clear,clear,{},5d02c7f7ad30435caeb88a0563393059
4085,127513,0b677d16a072467eb95dd396e25840d9,clear,clear,2017-08-31 22:30:14,clear,,{},7bd292b207c34f569082c1ff87963619
...,...,...,...,...,...,...,...,...,...
3345,104370,e96cd0bcab7c4dfbb6b9294f17afc577,clear,clear,2017-09-19 20:43:47,clear,clear,{},cad8b8deac2340f283f1ccd0b1920462
5437,169000,edc77146e73c48be994bf45036a8c48c,consider,,2017-07-17 13:26:25,consider,clear,{},6e09c41e3f154dc4b0ad69be95fc2ded
5439,169023,edc77146e73c48be994bf45036a8c48c,consider,,2017-07-17 13:39:47,consider,clear,{},59aff42d707148d4a72560a066ef86e4
5344,166044,f218379dd76b4058958450cb9aaea143,clear,clear,2017-07-20 15:39:32,clear,consider,{},e33e9fc3588d43a8aa22ef91a5e56da7


### Summary of Failure Reasons
Analysis of the reasons for failure at the facial recognition step

In [50]:
# Facial Image Integrity - Asserts whether the quality of the uploaded files and 
# the content contained within them was sufficient to perform a face comparison 
facial_image_integrity_fail = faces[faces["facial_image_integrity_result"]!="clear"]
facial_image_integrity_fail["facial_image_integrity_result"].count()

326

In [51]:
# Facial Image Integrity % Fail (share of all failed face checks)
facial_image_integrity_fail_proportion = facial_image_integrity_fail["facial_image_integrity_result"].count()/faces_failures["result"].count()*100
facial_image_integrity_fail_proportion

88.8283378746594

In [52]:
# Comparison Fail - Asserts whether the face in the document matches the face in the live photo or live video
face_comparison_fail = faces[faces["face_comparison_result"]!="clear"]
face_comparison_fail["face_comparison_result"].count()

22

In [53]:
# Authenticity Fail - Asserts whether the live photo or live video is not a spoof 
# (such as photos of printed photos or photos of digital screens)
# Named separately so it does not overwrite the documents variable of the same name
faces_visual_authenticity_fail = faces[faces["visual_authenticity_result"]!="clear"]
faces_visual_authenticity_fail["visual_authenticity_result"].count()

97