# Fraud Detector - Basic Prediction API Example 

## Setup
------
First, set up your AWS credentials so that Fraud Detector can store and access training data and supporting detector artifacts.

https://docs.aws.amazon.com/frauddetector/latest/ug/set-up.html

To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own.

## Plan
------

1. Detector name and version
    - You'll need the name of the detector and the version you deployed. You can look both up in the AFD console.

2. Model variables
    - You also need the list of variables that your model expects. You can look this up in the AFD console.

3. Call the prediction API
    - You can score a single record, or loop over a file of records.
    - You can optionally write the predictions to a file.


### Setup Python Libraries

In [1]:
from IPython.display import display, HTML, clear_output
display(HTML("<style>.container { width:90% }</style>"))
# ------------------------------------------------------------------

# -- pandas and numpy stuff -- 
import numpy as np
np.seterr(divide='ignore', invalid='ignore')
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# -- standard python stuff -- 
import time 

# -- AWS python client -- 
import boto3

## Initialize AWS Fraud Detector Client 
------

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/frauddetector.html 

```python

client = boto3.client(
    'frauddetector',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)

```

In [2]:
# -- fraud detector client --
client = boto3.client('frauddetector')

### Detector, Model, and Identifiers 
-----
<div class="alert alert-info"> 💡 <strong> Detector, Model and Versions </strong>

- DETECTOR_NAME & VERSION correspond to the name and version of your deployed Fraud Detector  
- MODEL_NAME & VERSION correspond to the name and version of the model deployed with your Fraud Detector   
- S3_FILE is the URL of the S3 file you wish to apply your detector to. Alternatively, if the file is available locally, you can substitute its local path.   
</div>

```python 
DETECTOR_NAME = "your_fraud_detector_name"
DETECTOR_VER  = '1.0'

# -- input file of data to be scored -- 
S3_FILE       = "s3://your-bucket-name/your-file-to-predict.csv"
```

In [3]:
# -- name and version of your detector -- 
DETECTOR_NAME = "detector_gadget_20200420"
DETECTOR_VER  = "1.0"

# -- input file of data to be scored -- 
S3_FILE       = "s3://afd-samples/synthitic_newaccount_data_1k_test.csv"


#### Load Data to be Scored 
-----
<div class="alert alert-info"> 💡 <strong> Check the first 5 Records </strong>

- Does your data look correct? 
- Do you need to rename any columns? In this example, I renamed credit_card_bin to cc_bin; the column names must match the field names used by the model.

</div>

In [4]:
df = pd.read_csv(S3_FILE)
# -- rename columns if necessary --
df = df.rename(columns={"credit_card_bin":"cc_bin"})
df.head(5)

Unnamed: 0,order_amt,ip_address,email_address,cc_bin,billing_postal,shipping_postal,event_timestamp,customer_name,billing_address,shipping_address,is_fraud
0,8036.0,192.18.59.93,synth_patrickjennings@gmail.com,42785,17740-2745,20950-6945,2019-03-31 11:21:22,Jeremy Dougherty,"4429 Ann Center\nDonnachester, GA",689 Jessica Centers Suite 969\nNorth Timothypo...,0
1,7839.0,192.88.102.55,synth_nicholas60@yahoo.com,30004,81975-4358,10975-4292,2019-06-23 02:13:27,Scott Keller,"451 Corey Hollow\nLake Vincentview, WA","6574 Wyatt Common\nLanestad, NC",0
2,3225.0,192.52.207.254,synth_chill@yahoo.com,54517,96275-0682,89722-4734,2019-04-13 23:55:51,Stacy Riggs,"2704 Laura Spurs\nEast Kathyland, NH","237 Butler Stream Suite 076\nHendersonview, WY",0
3,8109.0,198.10.49.139,synth_ericksonrandy@yahoo.com,35933,49934-1837,31347-4011,2020-01-03 18:29:06,Angela Robinson,"663 Simpson Ramp Apt. 033\nSouth Matthew, VT","189 Lynn Course\nBillyville, MD",0
4,4926.0,192.0.116.87,synth_gwade@hotmail.com,54658,88645-7360,03075-4962,2019-08-16 07:03:38,Caroline Herrera PhD,"6614 Seth Mountains Suite 667\nEast Erinland, LA","PSC 7948, Box 4183\nAPO AE",0
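
The column-name check described above can be automated. The following is a minimal sketch, not part of the original notebook: `check_columns` is a hypothetical helper, and the `raw_columns` list assumes the sample file's header used `credit_card_bin` before the rename.

```python
def check_columns(columns, model_variables):
    """Return the model variables that are missing from the data's columns."""
    available = set(columns)
    return [v for v in model_variables if v not in available]

# Assumed header of the sample file before the rename step.
raw_columns = ['order_amt', 'ip_address', 'email_address', 'credit_card_bin',
               'billing_postal', 'shipping_postal', 'event_timestamp', 'customer_name']

# Variable names the model is expected to receive.
expected = ['order_amt', 'ip_address', 'email_address', 'cc_bin',
            'billing_postal', 'shipping_postal', 'event_timestamp', 'customer_name']

missing = check_columns(raw_columns, expected)
print(missing)  # lists 'cc_bin' until credit_card_bin is renamed
```

In the notebook itself, the same helper could be called as `check_columns(df.columns, model_variables)` once both names are defined; an empty list means every expected variable is present.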


## Run Predictions  
-----
The following applies **get_prediction** to a single record.

<div class="alert alert-info"> 💡 <strong>get_prediction </strong>

To use the get_prediction API, you need to specify the following:

- DETECTOR_NAME and VERSION
- EVENT_ATTRIBUTES: the event attributes are the "record" that you want to "predict"
- EVENT_IDENTIFIER: this identifies the prediction so that you can later match it to the actual fraud / not fraud label for retraining. I like to use email addresses, but you could use almost anything.

</div>

This is all you need to run predictions:

<b>client.get_prediction(detectorId=DETECTOR_NAME, detectorVersionId=DETECTOR_VER, eventId = SOME_IDENTIFIER, eventAttributes = RECORD)</b>

Example of what a **record** would look like: 

```python
RECORD = {'order_amt': '8036.0',
  'ip_address': '192.18.59.93',
  'email_address': 'synth_patrickjennings@gmail.com',
  'cc_bin': '42785',
  'billing_postal': '17740-2745',
  'shipping_postal': '20950-6945',
  'event_timestamp': '2019-03-31 11:21:22',
  'customer_name': 'Jeremy Dougherty'}
```
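
Email addresses can repeat across events, so when each prediction must be matched to exactly one record later, a generated identifier may be safer. The sketch below is one possible approach, not the notebook's method: `tag_records` is a hypothetical helper, and it assumes a 32-character hex string is acceptable as an event identifier for your detector.

```python
import uuid

def tag_records(records):
    """Pair each record with a freshly generated identifier (hypothetical helper)."""
    return [(uuid.uuid4().hex, record) for record in records]

# The example record shown above, used as sample input.
RECORD = {'order_amt': '8036.0',
          'ip_address': '192.18.59.93',
          'email_address': 'synth_patrickjennings@gmail.com',
          'cc_bin': '42785',
          'billing_postal': '17740-2745',
          'shipping_postal': '20950-6945',
          'event_timestamp': '2019-03-31 11:21:22',
          'customer_name': 'Jeremy Dougherty'}

tagged = tag_records([RECORD, RECORD])
for event_id, record in tagged:
    # Each event_id could be passed as eventId and stored for later label matching.
    print(event_id, record['email_address'])
```

Storing the generated identifier next to the record is what makes the later join to fraud / not fraud labels possible.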

In [5]:
# -- this is all that's needed to make a prediction, just a call to the get_prediction API 
client.get_prediction(detectorId = DETECTOR_NAME, 
                      detectorVersionId = DETECTOR_VER, 
                      eventId = 'some unique identifier', 
                      eventAttributes = {'order_amt': '2296.0', 
                                        'ip_address': '198.33.251.116', 
                                        'email_address': 
                                        'synth_vdavidson@yahoo.com', 
                                        'cc_bin': '37340', 
                                        'billing_postal': '18658-5200', 
                                        'shipping_postal': '60238-4248', 
                                        'event_timestamp': '2020-01-23 16:01:07', 
                                        'customer_name': 'Margaret Salazar'})


{'outcomes': ['approve'],
 'modelScores': [{'modelVersion': {'modelId': 'shutterstock_model20200420',
    'modelType': 'ONLINE_FRAUD_INSIGHTS',
    'modelVersionNumber': '1.0'},
   'scores': {'shutterstock_model20200420_insightscore': 54.0}}],
 'ResponseMetadata': {'RequestId': '46ad81aa-bece-47b0-a082-04eaee23746c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Mon, 04 May 2020 20:01:51 GMT',
   'x-amzn-requestid': '46ad81aa-bece-47b0-a082-04eaee23746c',
   'content-length': '311',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}
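
The response is a nested dictionary, so it can help to flatten the parts you care about. Here is a minimal sketch that pulls the outcomes, the scores, and the HTTP status out of a response shaped like the one above. `summarize_prediction` is a hypothetical helper, and `sample` is a trimmed copy of the output shown, not a live API call.

```python
def summarize_prediction(pred):
    """Flatten a get_prediction-style response into outcomes, scores, and status."""
    scores = {}
    # Merge the score dictionaries of every model attached to the detector.
    for model in pred.get('modelScores', []):
        scores.update(model.get('scores', {}))
    return {
        'outcomes': pred.get('outcomes', []),
        'scores': scores,
        'status': pred.get('ResponseMetadata', {}).get('HTTPStatusCode'),
    }

# Trimmed copy of the response printed above.
sample = {
    'outcomes': ['approve'],
    'modelScores': [{'modelVersion': {'modelId': 'shutterstock_model20200420',
                                      'modelType': 'ONLINE_FRAUD_INSIGHTS',
                                      'modelVersionNumber': '1.0'},
                     'scores': {'shutterstock_model20200420_insightscore': 54.0}}],
    'ResponseMetadata': {'HTTPStatusCode': 200},
}

summary = summarize_prediction(sample)
print(summary)
```

Using `.get` with defaults means a response with a missing key yields empty values rather than raising a `KeyError`.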

### A Small Batch of Predictions  
-----
The following applies the **get_prediction** to a CSV file by serially looping through the CSV file imported above. 

<div class="alert alert-info"> 💡 <strong> Specify </strong>


- model_variables: these are the variables used to train your model. The names need to match what the model expects.
- record_count: you can specify a sample of records; by default it predicts on the whole file.



</div>

In [6]:
# -- specify the model variables from your file 
model_variables = ['order_amt', 'ip_address', 'email_address', 'cc_bin', 'billing_postal', 'shipping_postal', 'event_timestamp', 'customer_name']

# -- specify the number of records to score. 
record_count = df.shape[0]

# -- no need to change anything below -- 
start = time.time()
def _predict(record):
    """Score one record; on failure, return it marked as failed instead of raising."""
    stime = time.time()
    try:
        pred  = client.get_prediction(detectorId=DETECTOR_NAME, detectorVersionId=DETECTOR_VER, eventId = record['email_address'], eventAttributes = record)
        score_id = pred['modelScores'][0]['modelVersion']['modelId'] + '_insightscore'
        etime = time.time()
        record['detector_outcome'] = pred['outcomes']
        record['model_status'] = pred['ResponseMetadata']['HTTPStatusCode']
        record['model_score']  = pred['modelScores'][0]['scores'][score_id]
        record['score_ms'] = ((etime - stime)*1000)
        return record
    except Exception as err:
        # -- do not call the API again here: a second failure would escape this handler --
        etime = time.time()
        # -- a botocore ClientError carries the HTTP status in err.response; otherwise use -1 --
        status = getattr(err, 'response', {}).get('ResponseMetadata', {}).get('HTTPStatusCode', -1)
        record['detector_outcome'] = '-- failed --'
        record['model_status']  = status
        record['model_score']   =  -1 
        record['score_ms'] = ((etime - stime)*1000)
        return record


# -- convert the dataframe to a list of record dicts (all values as strings) --
predict_data  = df[model_variables].head(record_count).astype(str).to_dict(orient='records')

predict_score = []
i=0
# --loop through it. 
for record in predict_data:
    clear_output(wait=True)
    rec = _predict(record)
    predict_score.append(rec)
    i += 1
    print("current progress: ", round((i/record_count)*100,2), "%" )
    

# Calculate time taken and print results
time_taken = time.time() - start
print ('Process took %0.2f seconds' %time_taken)
print ('Scored %d records' %len(predict_score))




current progress:  100.0 %
Process took 137.64 seconds
Scored 1000 records
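
The serial loop spends most of its time waiting on the network. One option is to issue calls from a thread pool. The sketch below illustrates the pattern with a stand-in function rather than a real API call: `score_concurrently` and `fake_predict` are hypothetical names, and the worker count is an arbitrary assumption. boto3 documents its low-level clients as generally thread safe, but service rate limits still apply, so treat this as a starting point.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def score_concurrently(records, predict_fn, max_workers=8):
    """Apply predict_fn to every record using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(predict_fn, records))

def fake_predict(record):
    """Stand-in for _predict: simulates latency and makes no AWS call."""
    time.sleep(0.01)
    scored = dict(record)          # copy so the input record is left untouched
    scored['model_score'] = 0.0    # placeholder value, not a real score
    return scored

records = [{'email_address': f'user{i}@example.com'} for i in range(20)]
results = score_concurrently(records, fake_predict, max_workers=5)
print(len(results))
```

In the notebook, `_predict` and `predict_data` would take the place of `fake_predict` and `records`. The per-record progress display would need a different approach, since results arrive together when the pool finishes.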



### Take a look at your predictions
-----
Each record has a model score, the time (in ms) it took to score, and the detector outcome, followed by the model variables that were sent.

In [7]:
predictions = pd.DataFrame.from_dict(predict_score, orient='columns')
predictions[['model_score', 'score_ms', 'detector_outcome'] + model_variables].head()

Unnamed: 0,model_score,score_ms,detector_outcome,order_amt,ip_address,email_address,cc_bin,billing_postal,shipping_postal,event_timestamp,customer_name
0,95.0,204.097986,[approve],8036.0,192.18.59.93,synth_patrickjennings@gmail.com,42785,17740-2745,20950-6945,2019-03-31 11:21:22,Jeremy Dougherty
1,109.0,160.988092,[investigate],7839.0,192.88.102.55,synth_nicholas60@yahoo.com,30004,81975-4358,10975-4292,2019-06-23 02:13:27,Scott Keller
2,113.0,256.751299,[investigate],3225.0,192.52.207.254,synth_chill@yahoo.com,54517,96275-0682,89722-4734,2019-04-13 23:55:51,Stacy Riggs
3,43.0,293.711662,[approve],8109.0,198.10.49.139,synth_ericksonrandy@yahoo.com,35933,49934-1837,31347-4011,2020-01-03 18:29:06,Angela Robinson
4,273.0,227.987289,[decline],4926.0,192.0.116.87,synth_gwade@hotmail.com,54658,88645-7360,03075-4962,2019-08-16 07:03:38,Caroline Herrera PhD
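
A quick tally of outcomes is often the first thing to check. The sketch below counts outcomes across scored records using only the standard library. `count_outcomes` is a hypothetical helper, and `sample` reuses the five rows shown above in place of the full `predict_score` list.

```python
from collections import Counter

def count_outcomes(scored_records):
    """Count how often each detector outcome appears across scored records."""
    counts = Counter()
    for rec in scored_records:
        outcome = rec.get('detector_outcome')
        if isinstance(outcome, str):
            # Failed records store a plain string such as '-- failed --'.
            counts[outcome] += 1
        else:
            # Successful records store a list of outcome names.
            counts.update(outcome or [])
    return counts

# The five rows shown above, reduced to the relevant fields.
sample = [
    {'detector_outcome': ['approve'], 'model_score': 95.0},
    {'detector_outcome': ['investigate'], 'model_score': 109.0},
    {'detector_outcome': ['investigate'], 'model_score': 113.0},
    {'detector_outcome': ['approve'], 'model_score': 43.0},
    {'detector_outcome': ['decline'], 'model_score': 273.0},
]

counts = count_outcomes(sample)
print(counts.most_common())
```

Calling `count_outcomes(predict_score)` in the notebook would also surface how many records ended up as `-- failed --`.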


### Optionally Write Predictions to File

<div class="alert alert-info"> <strong> Write Predictions </strong>

- You can write your prediction dataset to a CSV to manually review predictions
- Simply add a cell below and copy the code below

</div>



```python

# -- optionally write predictions to a CSV file -- 
predictions.to_csv("filename.csv", index=False)
# -- or to an XLSX file (pandas needs an Excel writer such as openpyxl installed) -- 
predictions.to_excel("filename.xlsx", index=False)

```

In [8]:
predictions.to_csv("predicted_data_today.csv", index=False)

## Model Variables 
-----
<div class="alert alert-info"> 💡 <strong> Model Variables </strong>
- Here is a helper function to identify which variables are used by your detector. 
</div>

```python
def get_model_variables(model_name):
    """Return the list of variable names used by a model."""
    response = client.get_models(modelType='ONLINE_FRAUD_INSIGHTS',
                                 modelId=model_name)
    return [v['name'] for v in response['models'][0]['modelVariables']]

# -- MODEL_NAME is the name of the model deployed with your detector; set it before running --
model_variables = get_model_variables(MODEL_NAME)
print("\n -- model variables -- ")
print(model_variables)
```