## How-to guide for Transaction Fraud use-case on Abacus.AI platform

This notebook provides you with a hands-on environment to build a fraud detection model using the Abacus.AI Python Client Library.

We'll be using the [Credit Card Fraud Transactions Dataset](https://s3.amazonaws.com/realityengines.exampledatasets/fraud_transactions/creditcard.csv), which contains attributes of credit card transactions along with a label (`Class`) indicating whether each transaction was fraudulent. We will predict whether a transaction with specified attributes is fraudulent.

1. Install the Abacus.AI library.

In [None]:
!pip install abacusai

We'll also import pandas and pprint to preview data and neatly print outputs in this notebook.

In [1]:
import pandas as pd # A tool we'll use to download and preview CSV files
import pprint # A tool to pretty print dictionary outputs
pp = pprint.PrettyPrinter(indent=2)

2. Add your Abacus.AI [API Key](https://abacus.ai/app/profile/apikey) generated using the API dashboard as follows:

In [2]:
#@title Abacus.AI API Key

api_key = ''  #@param {type: "string"}  # Paste your own key here; never share or commit a real key
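Rather than pasting the key into a notebook that may be shared, you can read it from an environment variable. Below is a minimal sketch; the variable name `ABACUS_API_KEY` and the helper `load_api_key` are hypothetical choices for this example, not part of the Abacus.AI library.

```python
import os


def load_api_key(var_name: str = "ABACUS_API_KEY") -> str:
    """Read the Abacus.AI API key from an environment variable.

    ABACUS_API_KEY is a hypothetical variable name chosen for this sketch;
    any name works as long as the key itself stays out of the notebook.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Abacus.AI API key.")
    return key
```

With the variable set in your shell, `api_key = load_api_key()` can replace the literal assignment, and the rest of the notebook stays unchanged.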

3. Import the Abacus.AI library and instantiate a client

In [3]:
from abacusai import ApiClient
client = ApiClient(api_key)

## 1. Create a Project

Abacus.AI projects are containers that have datasets and trained models. By specifying a business **Use Case**, Abacus.AI tailors the deep learning algorithms to produce the best performing model possible for your data.

We'll call the `list_use_cases` method to retrieve the Use Cases currently available on the Abacus.AI platform.

In [4]:
client.list_use_cases()

[UseCase(use_case='UCPLUGANDPLAY',
   pretty_name='Plug & Play Your Tensorflow Model',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='EMBEDDINGS_ONLY',
   pretty_name='Vector Matching Engine',
   description='Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime'),
 UseCase(use_case='MODEL_WITH_EMBEDDINGS',
   pretty_name='Tensorflow Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='TORCH_MODEL_WITH_EMBEDDINGS',
   pretty_name='PyTorch Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. H
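The list is long, so it can help to filter it by keyword. This sketch assumes the returned `UseCase` objects expose the `use_case` and `pretty_name` attributes visible in the output above; the `namedtuple` stand-in and the `find_use_cases` helper are hypothetical and only there so the example runs on its own.

```python
from collections import namedtuple

# Stand-in for the UseCase objects returned by client.list_use_cases();
# only the three fields visible in the output above are modelled here.
UseCase = namedtuple("UseCase", ["use_case", "pretty_name", "description"])


def find_use_cases(use_cases, keyword):
    """Return (use_case, pretty_name) pairs whose ID or name contains `keyword`."""
    keyword = keyword.lower()
    return [
        (uc.use_case, uc.pretty_name)
        for uc in use_cases
        if keyword in uc.use_case.lower() or keyword in uc.pretty_name.lower()
    ]
```

In the notebook you would call it as `find_use_cases(client.list_use_cases(), "fraud")`.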

For this workshop, we're going to create a fraud prediction model using the Credit Card Transactions dataset. The 'FRAUD_TRANSACTIONS' use case is best tailored for this situation.

In [5]:
#@title Abacus.AI Use Case

use_case = 'FRAUD_TRANSACTIONS'  #@param {type: "string"}

By calling the `describe_use_case_requirements` method, we can view which datasets are required for this use case.

In [6]:
for requirement in client.describe_use_case_requirements(use_case):
  pp.pprint(requirement.to_dict())

{ 'allowed_feature_mappings': { 'FRAUD_YN': { 'allowed_feature_types': [ 'CATEGORICAL'],
                                              'description': 'This specifies '
                                                             'whether a '
                                                             'particular '
                                                             'transaction was '
                                                             'fraudulent or '
                                                             'not. You will '
                                                             'need to have '
                                                             'some specific '
                                                             'examples of '
                                                             'fraud in order '
                                                             'to train a model '
                                                        

Finally, let's create the project.

In [7]:
fraud_project = client.create_project(name='Credit Card Fraud', use_case=use_case)
fraud_project.to_dict()

{'project_id': '165b7ea22c',
 'name': 'Credit Card Fraud',
 'use_case': 'FRAUD_TRANSACTIONS',
 'created_at': '2021-11-23T19:37:00+00:00',
 'feature_groups_enabled': True}

**Note: When feature_groups_enabled is True, the use case supports feature groups (collections of ML features). Feature groups are created at the organization level and can be attached to a project to train ML models.**

## 2. Add Datasets to your Project

Abacus.AI can read datasets directly from `AWS S3` or `Google Cloud Storage` buckets, or you can upload and store your datasets directly with Abacus.AI. For this workshop, we will have Abacus.AI read the dataset directly from a public S3 bucket.

We are using one dataset for this notebook. We'll tell Abacus.AI how the dataset should be used when creating it by tagging the dataset with a special Abacus.AI **Dataset Type**.
- [Credit Card Fraud Transactions](https://s3.amazonaws.com/realityengines.exampledatasets/fraud_transactions/creditcard.csv) (**TRANSACTIONS**): 
This dataset contains information about fraud transactions made in the past.

### Add the dataset to Abacus.AI

First we'll use Pandas to preview the file, then add it to Abacus.AI.

In [8]:
pd.read_csv('https://s3.amazonaws.com/realityengines.exampledatasets/fraud_transactions/creditcard.csv')

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,49,1.098608,0.202424,0.525456,1.323436,-0.130486,0.039924,0.028379,0.072841,-0.097869,...,-0.024972,0.154264,-0.063147,0.253205,0.629405,-0.345345,0.040469,0.010264,13.18,0
1,283,-0.529996,0.766554,1.759393,-1.160074,-0.501040,-1.404513,0.679279,-0.242594,0.520868,...,-0.163031,-0.219408,0.016959,0.934128,-0.327383,0.668479,0.114264,-0.091385,2.31,0
2,292,1.252189,-0.126779,0.280285,0.579416,-0.374125,-0.215217,-0.193078,0.011076,0.770448,...,-0.360296,-0.959573,-0.023837,-0.462201,0.381732,0.340518,-0.034929,0.007525,23.88,0
3,297,-1.148038,0.367626,2.769717,-0.356562,-0.268456,0.323423,-0.108718,0.452755,0.166990,...,0.063334,0.168210,-0.194521,0.228655,0.107744,0.222331,-0.109160,0.089767,6.20,0
4,373,1.149246,0.018358,0.430440,0.537503,-0.430754,-0.394699,-0.151398,0.135031,-0.004959,...,-0.124932,-0.449760,0.135781,0.185130,0.129062,0.177780,-0.034565,0.001487,10.29,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5487,172459,2.085175,0.393051,-4.508201,-0.311771,3.510117,2.453299,0.220469,0.543377,-0.100434,...,-0.067217,-0.072642,-0.036584,0.529693,0.414685,0.735870,-0.058233,-0.026658,0.76,0
5488,172520,-7.227073,6.358022,-3.731533,-1.952405,-1.023688,-1.092663,0.047049,1.160109,4.936109,...,-0.819786,-0.048771,-0.004170,-0.605925,1.113937,0.544539,1.452076,0.242312,4.79,0
5489,172682,1.896128,0.668121,-0.851863,3.820147,0.566866,-0.485940,0.579556,-0.292022,-1.225735,...,-0.122086,-0.339227,0.206989,0.013203,-0.023135,-0.183272,-0.050450,-0.045053,29.99,0
5490,172688,2.001831,0.570453,-2.364605,1.455708,1.169517,-0.855711,0.792066,-0.412883,-0.086356,...,-0.018417,0.181531,-0.039637,0.519640,0.537885,-0.521880,0.004388,-0.016896,27.08,0
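Every preview row above has `Class` 0, so before training it is worth checking how rare the other class is. Here is a minimal standard-library sketch; the `class_balance` helper is hypothetical, and it assumes the label column is named `Class` as in the preview.

```python
import csv
from collections import Counter


def class_balance(lines, target="Class"):
    """Return the fraction of rows for each value of the `target` column.

    `lines` is any iterable of CSV text lines whose first line is the header.
    """
    counts = Counter(row[target] for row in csv.DictReader(lines))
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())} if total else {}
```

Since pandas is already loaded in this notebook, the equivalent one-liner is `pd.read_csv(url)['Class'].value_counts(normalize=True)`. A heavily skewed result is typical for fraud data and is one reason to look beyond plain accuracy when reviewing metrics later.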


Using the Create Dataset API, we can tell Abacus.AI the public S3 URI where the dataset lives. We will also give the dataset a Refresh Schedule, which tells Abacus.AI when to refresh it (take an updated copy of the dataset).

If you're unfamiliar with Cron Syntax, Crontab Guru can help translate the syntax back into natural language: [https://crontab.guru/#0_12_\*_\*_\*](https://crontab.guru/#0_12_*_*_*)

**Note: This cron string will be evaluated in UTC time zone**
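The schedule `0 12 * * *` fires once a day at 12:00 UTC. The sketch below computes the next firing time for that simple daily pattern only; `parse_daily_cron` and `next_daily_run` are hypothetical helpers for illustration, not a general cron parser and not something Abacus.AI provides.

```python
from datetime import datetime, timedelta, timezone


def parse_daily_cron(expr):
    """Split a daily cron string 'M H * * *' into (hour, minute)."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' schedules are supported")
    return int(hour), int(minute)


def next_daily_run(now, hour=12, minute=0):
    """Next UTC time strictly after `now` (a timezone-aware datetime) at hour:minute."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_daily_run(datetime(2021, 11, 23, 19, 37, tzinfo=timezone.utc), *parse_daily_cron('0 12 * * *'))` returns 12:00 UTC on 2021-11-24, because noon had already passed that day.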

In [9]:
# Add the dataset to Abacus.AI
fraud_dataset = client.create_dataset_from_file_connector(name='Credit Card Fraud Transactions', table_name='Credit_Card_Fraud_Transactions',
                                     location='s3://realityengines.exampledatasets/fraud_transactions/creditcard.csv',
                                     refresh_schedule='0 12 * * *')
datasets = [fraud_dataset]

## 3. Create Feature Groups and add them to your Project

Datasets are created at the organization level and can be used to create feature groups as follows:

In [10]:
feature_group = client.create_feature_group(table_name='transaction_fraud', sql='SELECT * FROM Credit_Card_Fraud_Transactions')

Adding Feature Group to the project:

In [11]:
client.add_feature_group_to_project(feature_group_id=feature_group.feature_group_id, project_id=fraud_project.project_id)

Setting the Feature Group type according to the use case requirements:

In [12]:
client.set_feature_group_type(feature_group_id=feature_group.feature_group_id, project_id=fraud_project.project_id, feature_group_type='TRANSACTIONS')

Check current Feature Group schema:

In [13]:
client.get_feature_group_schema(feature_group_id=feature_group.feature_group_id)

[Feature(name='Time',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='INTEGER',
   columns=None,
   point_in_time_info=None),
 Feature(name='V1',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None),
 Feature(name='V2',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None),
 Feature(name='V3',
   select_clause=None,
   feature_mapping=None,
   source_table=

For each **Use Case**, there are special **Feature Mappings** that must be applied to a column to fulfill the use case requirements. We can find the list of available mappings by calling the *Describe Use Case Requirements* API:

In [15]:
client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings

{'FRAUD_YN': {'description': 'This specifies whether a particular transaction was fraudulent or not. You will need to have some specific examples of fraud in order to train a model that can identify transaction fraud.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'IGNORE': {'description': 'Ignore this column in training',
  'multiple': True,
  'required': False}}
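Because `set_feature_mapping` only accepts names from this dictionary, a quick local check can catch a typo or an unsupported name (for instance a generic `TARGET`) before calling the API. A minimal sketch; `required_mappings` and `check_feature_mapping` are hypothetical helpers that operate on the dictionary printed above.

```python
def required_mappings(allowed_mappings):
    """Names of the mappings the use case marks as required."""
    return sorted(name for name, spec in allowed_mappings.items() if spec.get("required"))


def check_feature_mapping(allowed_mappings, feature_mapping):
    """Raise ValueError if `feature_mapping` is not allowed for this use case.

    `allowed_mappings` is the dict returned by
    client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings.
    """
    if feature_mapping not in allowed_mappings:
        raise ValueError(
            f"{feature_mapping!r} is not an allowed mapping; choose one of {sorted(allowed_mappings)}"
        )
```

For this use case, `required_mappings(...)` gives `['FRAUD_YN']`, which is the mapping the label column must receive.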

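We'll map the `Class` column, which labels each transaction as fraudulent or not, to the required `FRAUD_YN` mapping:

In [16]:
client.set_feature_mapping(project_id=fraud_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='Class', feature_mapping='FRAUD_YN')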


[Feature(name='Time',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='INTEGER',
   columns=None,
   point_in_time_info=None),
 Feature(name='V1',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None),
 Feature(name='V2',
   select_clause=None,
   feature_mapping=None,
   source_table='Credit_Card_Fraud_Transactions',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None),
 Feature(name='V3',
   select_clause=None,
   feature_mapping=None,
   source_table=

For each required Feature Group Type within the use case, you must assign the feature group to be used for training the model:

In [None]:
client.use_feature_group_for_training(project_id=fraud_project.project_id, feature_group_id=feature_group.feature_group_id)

Now that our feature group is assigned, we're almost ready to train a model!

To be sure that our project is ready to go, let's call `project.validate()` to confirm that all the project requirements have been met:

In [None]:
fraud_project.validate()

## 4. Train a Model

For each **Use Case**, Abacus.AI offers a number of training options. We can call the *Get Training Config Options* API to see what is available.

In [None]:
fraud_project.get_training_config_options()

In this notebook, we'll just train with the default options, but definitely feel free to experiment, especially if you have familiarity with Machine Learning.

In [None]:
fraud_model = fraud_project.train_model(training_config={})
fraud_model.to_dict()

After we start training the model, we can use this blocking call, which routinely checks the status of the model until it is trained and evaluated:

In [None]:
fraud_model.wait_for_evaluation()

**Note that model training might take anywhere from a few minutes to several hours, depending on the size of the datasets, the complexity of the models being trained, and a variety of other factors.**
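A blocking wait like `wait_for_evaluation()` follows a familiar polling pattern: check a status, sleep, and repeat until done or out of time. The sketch below is a generic illustration of that pattern under stated assumptions (a caller-supplied `is_done` callable, hypothetical timeout and interval defaults); it is not how the Abacus.AI client implements its wait.

```python
import time


def wait_until(is_done, timeout_s=3600.0, poll_interval_s=30.0):
    """Call `is_done()` every `poll_interval_s` seconds until it returns True.

    Raises TimeoutError if `timeout_s` seconds pass first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not done after {timeout_s} seconds")
        time.sleep(poll_interval_s)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline correct even if the system clock changes during a long training run.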

## **Checkpoint** [Optional]
Because model training can take hours to complete, your page could time out or you might hit the refresh button. This section helps you restore your progress:

In [None]:
!pip install abacusai
import pandas as pd
import pprint
pp = pprint.PrettyPrinter(indent=2)
api_key = ''  #@param {type: "string"}
from abacusai import ApiClient
client = ApiClient(api_key)
fraud_project = next(project for project in client.list_projects() if project.name == 'Credit Card Fraud')  # must match the name used in create_project
fraud_model = fraud_project.list_models()[-1]
fraud_model.wait_for_evaluation()
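If no project matches the name, `next()` without a default raises a bare `StopIteration`, which is easy to misread. A slightly safer lookup passes `None` as the default and raises a descriptive error instead. In this sketch the `Project` namedtuple is a hypothetical stand-in for the objects returned by `client.list_projects()`, modelling only the `project_id` and `name` attributes seen earlier.

```python
from collections import namedtuple

# Stand-in for the Project objects returned by client.list_projects().
Project = namedtuple("Project", ["project_id", "name"])


def find_project(projects, name):
    """Return the first project called `name`, or raise a readable LookupError."""
    project = next((p for p in projects if p.name == name), None)
    if project is None:
        raise LookupError(f"No project named {name!r}; check the name passed to create_project")
    return project
```

In the checkpoint cell, this would read `fraud_project = find_project(client.list_projects(), 'Credit Card Fraud')`.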

## Evaluate your Model Metrics

After your model is done training, you can inspect its quality by reviewing the model's metrics.


In [None]:
pp.pprint(fraud_model.get_metrics().to_dict())

To get a better understanding of what these metrics mean, visit our [documentation](https://abacus.ai/app/help/useCases/FRAUD_ACCOUNT/training) page.

## 5. Deploy Model

After the model has been trained, we need to deploy it to start making predictions. Deploying a model reserves cloud resources to host the model for real-time and/or batch predictions.

In [None]:
fraud_deployment = client.create_deployment(name='Credit Card Fraud Deployment',description='Credit Card Fraud Deployment',model_id=fraud_model.model_id)
fraud_deployment.wait_for_deployment()

After the model is deployed, we need to create a deployment token for authenticating prediction requests. This token is only authorized to predict on deployments in this project, so it's safe to embed it inside a user-facing application or website.


In [None]:
deployment_token = fraud_project.create_deployment_token().deployment_token
deployment_token

## 6. Predict


Now that you have an active deployment and a deployment token to authenticate requests, you can call the `predict_fraud` method below.

This method returns the probability that the transaction belongs to each class (fraudulent or not). The prediction is based on patterns the model learned from past fraudulent and legitimate credit card transactions with similar attributes.


In [None]:
# The deployment token authenticates this request, so no API key is passed to ApiClient here
ApiClient().predict_fraud(deployment_token=deployment_token,
                          deployment_id=fraud_deployment.deployment_id,
                          query_data={"Time": 37569, "V1": -1.9863495, "V2": 1.6931525, "V3": 0.6006504, "V4": 0.33007008, "V5": 0.6902556, "V6": 0.20807104, "V7": 1.169273, "V8": -0.7722932})
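Instead of typing `query_data` by hand, you can build it from a row of the CSV. This is a minimal sketch assuming rows come from `csv.DictReader` (values as strings, any unnamed index column under an empty header); the `to_query_data` helper is hypothetical. It drops the `Class` label, since that is what we are predicting, and keeps `Time` as an integer to match its `INTEGER` type in the feature group schema.

```python
def to_query_data(row, exclude=("Class",)):
    """Convert one CSV row (a dict of strings) into a query_data dict.

    Skips columns named in `exclude`, columns with an empty or missing
    header (such as an unnamed index column), and empty values.
    """
    query = {}
    for name, value in row.items():
        if not name or name in exclude or value in ("", None):
            continue
        number = float(value)
        # Time is typed INTEGER in the feature group schema; the rest are floats.
        query[name] = int(number) if name == "Time" else number
    return query
```

If your row comes from pandas instead and includes an index column such as `Unnamed: 0`, pass `exclude=("Class", "Unnamed: 0")`.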