## **Fraud and detecting on banking system::**

Some introduction text, formatted in heading 2 style
Fraudulent behavior can be seen across many different fields such as e-commerce, healthcare, payment and banking systems. Fraud is a billion-dollar business and it is increasing every year. The PwC global economic crime survey of 2018 found that half (49 percent) of the 7,200 companies they surveyed had experienced fraud of some kind.

Even if fraud seems to be scary for businesses it can be detected using intelligent systems such as rules engines or machine learning. Here we are trying to explain and demonstrate A rules engine is a software system that executes one or more business rules in a runtime production environment. These rules are generally written by domain experts for transferring the knowledge of the problem to the rules engine and from there to production. Two rules examples for fraud detection would be limiting the number of transactions in a time period (velocity rules), denying the transactions which come from previously known fraudulent IP's and/or domains.

Rules are great for detecting some type of frauds but they can fire a lot of false positives or false negatives in some cases because they have predefined threshold values. For example let's think of a rule for denying a transaction which has an amount that is bigger than 10000 dollars for a specific user. If this user is an experienced fraudster, he/she may be aware of the fact that the system would have a threshold and he/she can just make a transaction just below the threshold value (9999 dollars).

For these type of problems ML comes for help and reduce the risk of frauds and the risk of business to lose money. With the combination of rules and machine learning, detection of the fraud would be more precise and confident.


We detect the fraudulent transactions from the Banksim dataset. This synthetically generated dataset consists of payments from various customers made in different time periods and with different amounts.

Here what we'll do in this kernel:
1. [Exploratory Data Analysis](#Explaratory-Data-Analysis)
2. [Install Required Prerequisites Packages](#Install-Required-Prerequisites-Packages)
3. [Data Preprocessing](#Data-Preprocessing)
4. [XGBoost Classifier](#XGBoost-Classifier)
5. [Logistic Regression Classifier](#Logistic-Regression-Classifier)
6. [ASHAScheduler](#ASHAScheduler)
7. [accuracy_score](#accuracy_score)
8. [precision_score](#precision_score)
9. [recall_score](#recall_score)
10. [f1_score](#f1_score)
11. [roc_auc_score](#roc_auc_score)
12. [Conclusion](#Conclusion)

In this chapter we will perform an EDA on the data and try to gain some insight from it.

## **Explaratory Data Analysis**

Here, we will perform an EDA on the data and try to gain some insight from it.

**Data**
As we can see in the first rows below the dataset has 9 feature columns and a target column. 
The feature columms are :
* **Step**: This feature represents the day from the start of simulation. It has 180 steps so simulation ran for virtually 6 months.
* **Customer**: This feature represents the customer id
* **zipCodeOrigin**: The zip code of origin/source.
* **Merchant**: The merchant's id
* **zipMerchant**: The merchant's zip code
* **Age**: Categorized age 
    * 0: <= 18, 
    * 1: 19-25, 
    * 2: 26-35, 
    * 3: 36-45,
    * 4: 46:55,
    * 5: 56:65,
    * 6: > 65
    * U: Unknown
* **Gender**: Gender for customer
     * E : Enterprise,
     * F: Female,
     * M: Male,
     * U: Unknown
* **Category**: Category of the purchase. I won't write all categories here, we'll see them later in the analysis.
* **Amount**: Amount of the purchase
* **Fraud**: Target variable which shows if the transaction fraudulent(1) or Kind(unfraudulent)(0)

In [1]:
from imblearn.over_sampling import SMOTE
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import xgboost as xgb
import matplotlib.pyplot as plt
from minio import Minio
import urllib3
import uuid
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
import time
import pickle

In [2]:
#upload files to s3
def _set_object(filename: str, model_name: str, client: object, output_bucket: str, s3_path: str)->dict:
    print("MINIO_CLIENT=======>", client)
    print("filename===========>", filename)
    print("model_name=========>", model_name)
    print("output_bucket======>", output_bucket)
    print("s3_path============>",s3_path)

    if filename != None:
        head, tail = os.path.split(filename)
        print("head =======>", head)
        print("tail =======>", tail)
        file_size = os.stat(filename).st_size
        print("file_size======>", file_size)
        object = client.fput_object(bucket_name=output_bucket, object_name=f"{s3_path}"+tail, file_path=filename)
        print(f"The Fraud indentification modle {model_name} classifier finalized model Upload Completed!")

In [3]:
@ray.remote(num_cpus=3)
class RayFraudDetectionExperiment:
    def __init__(self):
        self.data = None
        self.preprocessed_data = None
        self.models = []
            
    #@ray.remote(num_gpus = 0.1)
    def load_data(self):
        # Load the data from the provided data_path
        MINIO_CLIENT_DATASET = Minio(
        endpoint= "home.hpe-staging-ezaf.com:31900",
        access_key= "minioadmin",
        secret_key= "minioadmin",
        secure=True,
        http_client = urllib3.PoolManager(cert_reqs='CERT_NONE'))
        print("MINIO_CLIENT", MINIO_CLIENT_DATASET)
        print("=============:Fetching", "."*10)

        buckets = MINIO_CLIENT_DATASET.list_buckets()
        for bucket in buckets:
            print("=============:", bucket.name, bucket.creation_date)

        csv_file = MINIO_CLIENT_DATASET.get_object("experiments", "/source/feed.csv")
        self.data = pd.read_csv(csv_file)
    
    #@ray.remote(num_gpus = 0.1)
    def preprocess_data(self):
        # Implement data preprocessing steps here
        preprocessed_data = self.data.copy()
        # Remove rows with missing values
        preprocessed_data = preprocessed_data.dropna()  
        # Remove duplicate rows
        preprocessed_data = preprocessed_data.drop_duplicates()  
        
        # Reset the index
        preprocessed_data = preprocessed_data.reset_index(drop=True)  
        # Additional preprocessing steps based on specific requirements
        # ... we can add here
        print("=============:after preprocess:", preprocessed_data)
        self.preprocessed_data = preprocessed_data

    #@ray.remote(num_gpus = 0.1)
    def data_splitting(self):
        data_reduced = self.preprocessed_data.drop(['zipcodeOri','zipMerchant'],axis=1)
        data_reduced.loc[:,['customer','merchant','category']].astype('category')

        # turning object columns type to categorical for easing the transformation process
        col_categorical = data_reduced.select_dtypes(include= ['object']).columns
        for col in col_categorical:
            data_reduced[col] = data_reduced[col].astype('category')
        # categorical values ==> numeric values
        data_reduced[col_categorical] = data_reduced[col_categorical].apply(lambda x: x.cat.codes)
        data_reduced.head(5)

        # Implement data splitting strategies here
        X = data_reduced.drop(['fraud'],axis=1)
        X.fillna(0, inplace=True)
        y = self.preprocessed_data['fraud']
        y.fillna(0, inplace=True)
        
        # Split the data into train and test sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
        print("=============:AFTER SPLIT", X_train, X_test, y_train, y_test)

        return X_train, X_test, y_train, y_test

    #@ray.remote(num_gpus = 0.1)
    def train_models(self, config, checkpoint_dir=None):
        X_train, X_test, y_train, y_test = self.data_splitting()
        print("X_train, X_test, y_train, y_test :::" , X_train, X_test, y_train, y_test)
        model = None
        model_name = config["model"]
        
        if model_name == "LogisticRegression":
            model = LogisticRegression(
                C=config.get("C", 1.0),
                max_iter=config.get("max_iter", 999),
                solver=config.get("solver", "lbfgs"),
            )
        elif model_name == "xgboost":
            model = xgb.XGBClassifier(
                silent=None, 
                seed=42, 
                colsample_bynode=1, 
                max_depth=6, 
                learning_rate=0.05, 
                n_estimators=400,
                objective="binary:hinge",
                booster='gbtree',
                missing=1,
                n_jobs=-1,
                nthread=None,
                gamma=0,
                min_child_weight=1,
                max_delta_step=0,
                subsample=1,
                colsample_bytree=1,
                colsample_bylevel=1,
                reg_alpha=0,
                reg_lambda=1,
                base_score=0.5,
                random_state=42, 
                verbosity=1)
        # other models....

        if model:
            # Evaluate the model
            print("=============:avi: system reached phase-3")
            model.fit(X_train, y_train)
            self.models.append(model)
            print("===========:", self.models)
            y_pred = model.predict(X_test)
            accuracy = accuracy_score(y_test, y_pred)
            precision = precision_score(y_test, y_pred)
            recall = recall_score(y_test, y_pred)
            f1 = f1_score(y_test, y_pred)
            auc_roc = roc_auc_score(y_test, y_pred)
            print("=============:accuracy:", accuracy)
            print("=============:precision:", precision)
            print("=============:recall:", recall)
            print("=============:f1:", f1)
            print("=============:auc_roc:", auc_roc)
            print("=============:avi: system reached final phase-4")
            
            '''
                Here upload model into our object store (s3/minio)
            '''
            print("Model path========>", f"{model_name}.pkl")
            client = Minio(
                endpoint= "home.hpe-staging-ezaf.com:31900", 
                access_key= "minioadmin", 
                secret_key= "minioadmin", 
                secure=True, 
                http_client = urllib3.PoolManager(cert_reqs='CERT_NONE'))

            print("MINIO_CLIENT", client)
            
            if model_name == "LogisticRegression":
                run_model_path = f"/home/ray/ray_results/model.pkl"
                pickle.dump(model, open(run_model_path, "wb"))
                object = client.fput_object(bucket_name="experiments", object_name=f"ray/pickels/logisticregression/model/model.pkl", file_path=run_model_path)
                print(f"The Fraud indentification modle {run_model_path} classifier finalized model Upload Completed!")
            elif model_name == "xgboost":
                run_model_path = f"/home/ray/ray_results/model.pkl"
                pickle.dump(model, open(run_model_path, "wb"))
                object = client.fput_object(bucket_name="experiments", object_name=f"ray/pickels/xgboost/model/model.pkl", file_path=run_model_path)
                print(f"The Fraud indentification modle {run_model_path} classifier finalized model Upload Completed!")
                
            tune.report(mean_accuracy=accuracy, mean_precision=precision, mean_recall=recall,
                        mean_f1=f1, mean_auc_roc=auc_roc)

    #@ray.remote(num_gpus = 0.1)
    def run_experiment(self,index):
        try:
            print("=============:avi: system reached phase-2")
            self.load_data()
            self.preprocess_data()
            if index == 0:
                model = tune.choice(["LogisticRegression"])
            else:
                model = tune.choice(["xgboost"])
            config = {
                "model": model,  #xgboost #LogisticRegression
                "estimators": tune.choice([100]),
                "max_depth": tune.choice([8]),
                "C": tune.loguniform(0.01, 10),
                "solver": tune.choice(["lbfgs", "liblinear"]),
                "kernel": tune.choice(["linear", "rbf"]),
                "random_state":tune.choice([42]),
                "verbose":tune.choice([1]),
                "class_weight":tune.choice(["balaced"]),
                "max_iter":tune.choice([999]),
            }


            analysis = tune.run(
                self.train_models,
                config=config,
                resources_per_trial={"cpu": 3},
                metric="mean_accuracy",
                mode="max",
                num_samples=1,
                reuse_actors=True,
                stop={
                    "mean_accuracy": 0.50, 
                    "training_iteration": 1},
                scheduler=ASHAScheduler(max_t=10)
            )


            best_config = analysis.get_best_config(metric="mean_accuracy", mode="max")
            print("Best Configuration:", best_config)


        except Exception as e:
            # Exception handling
            print("An error occurred:", str(e))

In [4]:
def main_experiment():
    # Start timer
    start_time = time.time()
    ray.init(address="ray://kuberay-head-svc.kuberay:10001", 
             runtime_env={
                 #==7.1.13
                 "pip": ["minio","scikit-learn", "imblearn", "xgboost"],
                 #"env_vars":{"http_proxy": "http://10.78.90.46:80", "https_proxy": "http://10.78.90.46:80"} #needed for LR1 network
             }
    )
    # Specify the data path
    print("=============:", ray.cluster_resources())

    # Create the remote RayFraudDetectionExperiment actors
    
    # fraud_detection_experiments = [RayFraudDetectionExperiment.remote() for _ in range(int(ray.cluster_resources()["CPU"]))]
    # Run the experiments in parallel
    # ray.get([experiment.run_experiment.remote() for experiment in fraud_detection_experiments])
    
    model_loop = ["LogisticRegression", "xgboost"]
    for _ in range(len(model_loop)):
        fraud_detection_experiments = RayFraudDetectionExperiment.remote()
        ray.get([fraud_detection_experiments.run_experiment.remote(_)])
    # Run one experiment at once.
    # fraud_detection_experiments = RayFraudDetectionExperiment.remote()
    # ray.get([fraud_detection_experiments.run_experiment.remote()])

    # Stop timer
    end_time = time.time()
    execution_time = end_time - start_time
    print("=============: Execution Time:", execution_time, "seconds")
    ray.shutdown()

In [5]:
if __name__ == "__main__":
    main_experiment()

[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m MINIO_CLIENT <minio.api.Minio object at 0x7f1ff8a73430>




[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 0          0  C1093826151   4  ...  es_transportation    4.55     0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 1          0   C352968107   2  ...  es_transportation   39.68     0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 2          0  C2054744914   4  ...  es_transportation   26.89     0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 3          0  C1760612790   3  ...  es_transportation   17.25     0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 4          0   C757503768   5  ...  es_transportation   35.72     0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m ...      ...          ...  ..  ...                ...     ...   ...
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 594638   179  C1753498738   3  ...  es_transportation   20.53     0
[2m[36m(RayFraudDetectionExperiment pid

[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 2023-07-06 11:55:44,308	INFO registry.py:96 -- Detected unknown callable for trainable. Converting to class.
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m from ray.air import session
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m def train(config):
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m     # ...
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m     session.report({"metric": metric}, checkpoint=checkpoint)
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m For more information please see https://docs.ray.io/en/master/tune/api_docs/trainable.html
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244

[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m == Status ==
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Current time: 2023-07-06 11:55:53 (running for 00:00:07.70)
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Memory usage on this node: 17.6/123.8 GiB 
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Using AsyncHyperBand: num_stopped=0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Bracket: Iter 4.000: None | Iter 1.000: None
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/14.9 GiB heap, 0.0/4.38 GiB objects
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Result logdir: /home/ray/ray_results/train_models_2023-07-06_11-55-46
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Number of trials: 1/1 (1 PENDING)
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[



[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 37107     14       377    2       1        18        12   12.15
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 163300    57      2909    3       1        18        12   27.94
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 108691    39      3077    6       2        30        12   52.25
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 429389   135      3759    3       2        18        12   49.98
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 222059    75      1548    5       1        41        14   27.66
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m ...      ...       ...  ...     ...       ...       ...     ...
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 110268    40      1981    5       1        30        12   38.12
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 259178    86       747    2       1        30        12   38.63
[2m[36m(train_models pid=4182, ip=10.244.3.10)[0m 365838   11



[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m == Status ==
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Current time: 2023-07-06 11:56:41 (running for 00:00:55.03)
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Memory usage on this node: 17.0/123.8 GiB 
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Using AsyncHyperBand: num_stopped=0
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Bracket: Iter 4.000: None | Iter 1.000: 0.9940973020241826
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Resources requested: 0/6 CPUs, 0/0 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m Current best trial: b7105_00000 with mean_accuracy=0.9940973020241826 and parameters={'model': 'LogisticRegression', 'estimators': 100, 'max_depth': 8, 'C': 0.04527074055799663, 'solver': 'lbfgs', 'kernel': 'linear', 'random

[2m[36m(RayFraudDetectionExperiment pid=6755, ip=10.244.3.114)[0m 2023-07-06 11:56:41,298	INFO tune.py:762 -- Total run time: 56.99 seconds (55.03 seconds for the tuning loop).


[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m MINIO_CLIENT <minio.api.Minio object at 0x7f56ccead550>




[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 0          0  C1093826151   4  ...  es_transportation    4.55     0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 1          0   C352968107   2  ...  es_transportation   39.68     0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 2          0  C2054744914   4  ...  es_transportation   26.89     0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 3          0  C1760612790   3  ...  es_transportation   17.25     0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 4          0   C757503768   5  ...  es_transportation   35.72     0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m ...      ...          ...  ..  ...                ...     ...   ...
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 594638   179  C1753498738   3  ...  es_transportation   20.53     0
[2m[36m(RayFraudDetectionExperiment pid=5007, 

[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 2023-07-06 11:56:43,891	INFO registry.py:96 -- Detected unknown callable for trainable. Converting to class.
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m from ray.air import session
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m def train(config):
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m     # ...
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m     session.report({"metric": metric}, checkpoint=checkpoint)
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m 
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m For more information please see https://docs.ray.io/en/master/tune/api_docs/trainable.html
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0

[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m == Status ==
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Current time: 2023-07-06 11:56:54 (running for 00:00:08.75)
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Memory usage on this node: 16.8/123.8 GiB 
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Using AsyncHyperBand: num_stopped=0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Bracket: Iter 4.000: None | Iter 1.000: None
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Resources requested: 3.0/6 CPUs, 0/0 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Result logdir: /home/ray/ray_results/train_models_2023-07-06_11-56-45
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Number of trials: 1/1 (1 RUNNING)
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m +--



[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m == Status ==
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Current time: 2023-07-06 11:58:43 (running for 00:01:57.52)
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Memory usage on this node: 16.9/123.8 GiB 
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Using AsyncHyperBand: num_stopped=0
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Bracket: Iter 4.000: None | Iter 1.000: 0.9963507536730701
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Resources requested: 0/6 CPUs, 0/0 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
[2m[36m(RayFraudDetectionExperiment pid=5007, ip=10.244.3.10)[0m Current best trial: dac57_00000 with mean_accuracy=0.9963507536730701 and parameters={'model': 'xgboost', 'estimators': 100, 'max_depth': 8, 'C': 0.14088458982157973, 'solver': 'liblinear', 'kernel': 'linear', 'random_state': 42, '

In [6]:
#kserve
from kubernetes import client
from kserve import KServeClient
from kserve import constants
from kserve import utils
from kserve import V1beta1InferenceService
from kserve import V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec
from kserve import V1beta1SKLearnSpec

default_model_spec = V1beta1InferenceServiceSpec(predictor=V1beta1PredictorSpec(
    service_account_name="fraud-detection-kserver-service",  
    sklearn=V1beta1SKLearnSpec(
        storage_uri="s3://experiments/ray", 
        protocol_version="v2"
    )))

isvc = V1beta1InferenceService(api_version=constants.KSERVE_V1BETA1,
                          kind=constants.KSERVE_KIND,
                          metadata=client.V1ObjectMeta(name="ray-fraud-detection-lr-7062023-2",
                                                       namespace="hpedemo-user01"),
                          spec=default_model_spec)

# print(isvc)
kserve = KServeClient()
kserve.create(isvc)

{'apiVersion': 'serving.kserve.io/v1beta1',
 'kind': 'InferenceService',
 'metadata': {'creationTimestamp': '2023-07-06T18:58:48Z',
  'generation': 1,
  'labels': {'modelClass': 'mlserver_sklearn.SKLearnModel'},
  'managedFields': [{'apiVersion': 'serving.kserve.io/v1beta1',
    'fieldsType': 'FieldsV1',
    'fieldsV1': {'f:spec': {'.': {},
      'f:predictor': {'.': {},
       'f:serviceAccountName': {},
       'f:sklearn': {'.': {},
        'f:name': {},
        'f:protocolVersion': {},
        'f:storageUri': {}}}}},
    'manager': 'OpenAPI-Generator',
    'operation': 'Update',
    'time': '2023-07-06T18:58:45Z'}],
  'name': 'ray-fraud-detection-lr-7062023-2',
  'namespace': 'hpedemo-user01',
  'resourceVersion': '136416993',
  'uid': '401bcd2e-7a05-4d0c-bf90-e3b51e0db5bf'},
 'spec': {'predictor': {'model': {'env': [{'name': 'MLSERVER_MODEL_NAME',
      'value': 'ray-fraud-detection-lr-7062023-2'},
     {'name': 'MLSERVER_MODEL_URI', 'value': '/mnt/models'}],
    'modelFormat': {'n

In [8]:
#---------------- ::: get-generated-data-test ::: ----------------
MINIO_CLIENT_INFR = Minio(
    endpoint= "home.hpe-staging-ezaf.com:31900", 
    access_key= "minioadmin", 
    secret_key= "minioadmin", 
    secure=True,
    http_client = urllib3.PoolManager(cert_reqs='CERT_NONE'))

print("MINIO_CLIENT", MINIO_CLIENT_INFR)
csv_file = MINIO_CLIENT_INFR.get_object("experiments", "/source/generated-data.csv")
data = pd.read_csv(csv_file)
data.head(5)
data_reduced = data.drop(['zipcodeOri','zipMerchant'],axis=1)
data_reduced.loc[:,['customer','merchant','category']].astype('category')

# turning object columns type to categorical for easing the transformation process
col_categorical = data_reduced.select_dtypes(include= ['object']).columns
for col in col_categorical:
    data_reduced[col] = data_reduced[col].astype('category')
data_reduced[col_categorical] = data_reduced[col_categorical].apply(lambda x: x.cat.codes)
data_reduced.head(5)

#---------------- ::: inference input ::: ----------------
# In contrast, model inference is the process of using a trained model to infer a result from live data.
X = data_reduced.drop(['fraud'], axis=1)
y = data_reduced['fraud']
print("shape==============", [len(X.values), len(X.values[0])])
print("X.values[0]==============", X.values[0], "=======",  list(X.values[0]))

inference_request = {
    "inputs" : [{
        "name" : "ray-fraud-detection-infer-001",
        "datatype": "FP32",
        # !!! Multiple record infer !!!
        # "data": [list(item) for item in X.values],
        # "shape": [len(X.values), len(X.values[0])],
 
        # !!! One record infer !!!
        "shape": [1, 7],
        # "data": [list(item) for item in X.values][14], #Non-Fraud Transaction Dtls
        "data": [list(item) for item in X.values][17], #Fraud Transaction Dtls
    }]
}
print("data::", inference_request)

import requests
import json
EZAF_ENV = "hpe-staging-ezaf" 
token_url = f"https://keycloak.{EZAF_ENV}.com/realms/UA/protocol/openid-connect/token"

config_data = {
    "username" : "hpedemo-user01",
    "password" : "Hpepoc@123",
    "grant_type" : "password",
    "client_id" : "ua-grant",
}

token_responce = requests.post(token_url, data=config_data, allow_redirects=True, verify=False)

token = token_responce.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}
print("token", token)

#---------------- ::: Trigger Kserving ::: ----------------
KServe = KServeClient()
server_isvc_resp = KServe.get(
    "ray-fraud-detection-lr-7062023-2", 
    namespace="hpedemo-user01").get("status").get("components").get('predictor').get('url').replace("http","https")
print("server_isvc_resp", server_isvc_resp)
print("inference::", f"{server_isvc_resp}/v2/models/ray-fraud-detection-lr-7062023-2/infer")

session = requests.Session()
message = {"message":"", "value":""}
response = session.post(
    f"{server_isvc_resp}/v2/models/ray-fraud-detection-lr-7062023-2/infer",
    json = inference_request,
    headers=headers,
    verify=False)
if response.status_code == 200:
    if json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'][0] != None and json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'][0] == 1:
        message['message'] = "Fraud Banking Transaction !"
        message['value'] = json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'][0]        
        print('\033[91m' "Prediction Result:", json.dumps(message))
    elif len(json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'])>1:
        print("Model-Infer-dtl:[data]:\n", json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'])
    else:
        message['message'] = "Non-fraud Banking Transaction !"
        message['value'] = json.loads(response.__dict__.get('_content')).get('outputs')[0]['data'][0]   
        print('\033[92m'  "Prediction Result:", json.dumps(message))
else:
    print("service issue::", response.status_code)
    print("service issue::", response.content)

MINIO_CLIENT <minio.api.Minio object at 0x7f58f4a3b0d0>
data:: {'inputs': [{'name': 'ray-fraud-detection-infer-001', 'datatype': 'FP32', 'shape': [1, 7], 'data': [1.0, 4.0, 4.0, 0.0, 4.0, 1.0, 255.14]}]}
token eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJSQ2JxQ0ZwS05jRXV4WlZid2psajZ0dTVwYWNBWGRJcFRCbUw0NnA0ZWZrIn0.eyJleHAiOjE2ODg3NTYzODIsImlhdCI6MTY4ODY2OTk4MiwianRpIjoiNDI3Y2M5NTUtZTA5NC00OTNmLWE1Y2EtMTQ4NWYyYWNlNmE0IiwiaXNzIjoiaHR0cHM6Ly9rZXljbG9hay5ocGUtc3RhZ2luZy1lemFmLmNvbS9yZWFsbXMvVUEiLCJhdWQiOiJhY2NvdW50Iiwic3ViIjoiNjEzNTA0ZmYtOGI2Zi00NjQ4LThmMTktZWY5MGJmMjU1YTQ5IiwidHlwIjoiQmVhcmVyIiwiYXpwIjoidWEtZ3JhbnQiLCJzZXNzaW9uX3N0YXRlIjoiYjZkMjljMWEtNTA0ZS00NTc2LTgxNjktNjExZWI4ZDEyNGZkIiwiYWNyIjoiMSIsImFsbG93ZWQtb3JpZ2lucyI6WyIvKiJdLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsidWEtZW5hYmxlZCIsIm9mZmxpbmVfYWNjZXNzIiwiYWRtaW4iLCJ1bWFfYXV0aG9yaXphdGlvbiIsImRlZmF1bHQtcm9sZXMtdWEiXX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsInNpZCI6ImI2ZDI5YzFhLTUwNGUtNDU3Ni04MTY5LTYxMWViOGQxMjRmZCIsInJlc291cmNlX2FjY2Vzcy