## I. Introduction
Name : Ayudha Amari Hirtranusi  
Batch : BSD-007  

Objective: This notebook performs model inference. The model itself is trained in a separate training notebook.

## II. Data Loading

Import the required libraries as follows:


In [1]:
# Import the libraries needed for inference

# Data manipulation library
import pandas as pd

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Libraries for loading the saved models and column list
import pickle
import json

The data for this inference is stored in a dict as follows:


In [2]:
# load the inference data

data = {
    "CUST_ID": 9.0,
    "BALANCE": 686.657879,
    "BALANCE_FREQUENCY": 1.0,
    "PURCHASES": 2476.45,
    "ONEOFF_PURCHASES": 1624.5,
    "INSTALLMENTS_PURCHASES": 851.95,
    "CASH_ADVANCE": 253.273353,
    "PURCHASES_FREQUENCY": 1.0,
    "ONEOFF_PURCHASES_FREQUENCY": 0.75,
    "PURCHASES_INSTALLMENTS_FREQUENCY": 1.0,
    "CASH_ADVANCE_FREQUENCY": 0.083333,
    "CASH_ADVANCE_TRX": 1.0,
    "PURCHASES_TRX": 35.0,
    "CREDIT_LIMIT": 4000.0,
    "PAYMENTS": 1600.734366,
    "MINIMUM_PAYMENTS": 224.265608,
    "PRC_FULL_PAYMENT": 0.083333,
    "TENURE": 12.0
}


# convert into df
df = pd.DataFrame([data])

# show the dataframe
df

Unnamed: 0,CUST_ID,BALANCE,BALANCE_FREQUENCY,PURCHASES,ONEOFF_PURCHASES,INSTALLMENTS_PURCHASES,CASH_ADVANCE,PURCHASES_FREQUENCY,ONEOFF_PURCHASES_FREQUENCY,PURCHASES_INSTALLMENTS_FREQUENCY,CASH_ADVANCE_FREQUENCY,CASH_ADVANCE_TRX,PURCHASES_TRX,CREDIT_LIMIT,PAYMENTS,MINIMUM_PAYMENTS,PRC_FULL_PAYMENT,TENURE
0,9.0,686.657879,1.0,2476.45,1624.5,851.95,253.273353,1.0,0.75,1.0,0.083333,1.0,35.0,4000.0,1600.734366,224.265608,0.083333,12.0
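
Because the fitted scaler and models expect the same feature columns as in training, it can help to check an incoming record before converting it to a DataFrame. The following is a minimal sketch: `EXPECTED_FEATURES` and `missing_features` are hypothetical names, and the column list is simply copied from the record above rather than read from the training notebook.

```python
# Hypothetical helper: not part of the original notebook.
# Feature names are copied from the inference record shown above.
EXPECTED_FEATURES = [
    "BALANCE", "BALANCE_FREQUENCY", "PURCHASES", "ONEOFF_PURCHASES",
    "INSTALLMENTS_PURCHASES", "CASH_ADVANCE", "PURCHASES_FREQUENCY",
    "ONEOFF_PURCHASES_FREQUENCY", "PURCHASES_INSTALLMENTS_FREQUENCY",
    "CASH_ADVANCE_FREQUENCY", "CASH_ADVANCE_TRX", "PURCHASES_TRX",
    "CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS", "PRC_FULL_PAYMENT",
    "TENURE",
]


def missing_features(record):
    """Return the expected feature names that are absent from the record."""
    return [name for name in EXPECTED_FEATURES if name not in record]


# A record with only two fields is missing the other fifteen features.
incomplete = {"BALANCE": 686.657879, "TENURE": 12.0}
print(missing_features(incomplete))
```

An empty list means the record can be passed on to the preprocessing steps.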


In [3]:
# apply the same preprocessing steps as in the training notebook; here we drop the CUST_ID column
# create a backup of the data
df_original = df.copy()

# drop the CUST_ID column
df = df.drop(columns='CUST_ID')
df

Unnamed: 0,BALANCE,BALANCE_FREQUENCY,PURCHASES,ONEOFF_PURCHASES,INSTALLMENTS_PURCHASES,CASH_ADVANCE,PURCHASES_FREQUENCY,ONEOFF_PURCHASES_FREQUENCY,PURCHASES_INSTALLMENTS_FREQUENCY,CASH_ADVANCE_FREQUENCY,CASH_ADVANCE_TRX,PURCHASES_TRX,CREDIT_LIMIT,PAYMENTS,MINIMUM_PAYMENTS,PRC_FULL_PAYMENT,TENURE
0,686.657879,1.0,2476.45,1624.5,851.95,253.273353,1.0,0.75,1.0,0.083333,1.0,35.0,4000.0,1600.734366,224.265608,0.083333,12.0


## III. Model Loading

First, we load the fitted PCA and K-Means models, together with the scaler, the outlier handler, and the list of outlier columns.

In [4]:
# load the fitted PCA model
filename = 'model_pca.pkl'
with open(filename, 'rb') as f:
    model_pca = pickle.load(f)

# load the fitted scaler
with open('scaler.pkl', 'rb') as file_2:
    scaler = pickle.load(file_2)

# load the fitted outlier handler
with open('outlier_handler.pkl', 'rb') as file_3:
    outlier_handler = pickle.load(file_3)

# load the list of columns handled by the outlier handler (stored as JSON)
with open('outliers_cols_list.txt', 'r') as file_8:
    outliers_cols_list = json.load(file_8)

# load the fitted K-Means model
filename2 = 'model_kmeans.pkl'
with open(filename2, 'rb') as f:
    model_kmeans = pickle.load(f)
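
If any of these files is missing, `open` raises `FileNotFoundError` at the first one it reaches. A small sketch of a pre-flight check is shown here. `find_missing_artifacts` is a hypothetical helper, and the file names are the ones used in the cell above, assumed to sit in the working directory.

```python
from pathlib import Path

# File names taken from the loading cell above.
ARTIFACT_FILES = [
    "model_pca.pkl",
    "scaler.pkl",
    "outlier_handler.pkl",
    "outliers_cols_list.txt",
    "model_kmeans.pkl",
]


def find_missing_artifacts(directory="."):
    """Return the artifact file names that do not exist in the directory."""
    base = Path(directory)
    return [name for name in ARTIFACT_FILES if not (base / name).is_file()]


missing = find_missing_artifacts()
if missing:
    print("Missing artifacts:", missing)
else:
    print("All artifacts found.")
```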

In [5]:
# show our PCA model
model_pca

In [6]:
# show our K-Means model
model_kmeans

## IV. Outliers Handling and Scaling

Next, we handle outliers and scale the data. Outlier handling is applied only to the columns saved during training, using the method fitted for each column, and scaling uses the fitted scaler.

In [7]:
# handle the outlier with the outlier handler and transform the data
df[outliers_cols_list] = outlier_handler.transform(df[outliers_cols_list])
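
The fitted `outlier_handler` is loaded as an opaque object, so its exact method is not visible in this notebook. Assuming it caps values at bounds learned during training (a common choice, but an assumption here), the transform behaves roughly like the sketch below. `cap_value` and the bounds are hypothetical and purely illustrative.

```python
def cap_value(value, lower, upper):
    """Clip a value into the [lower, upper] range learned at training time."""
    return max(lower, min(value, upper))


# Hypothetical bounds for one column; real bounds live inside the fitted handler.
lower_bound, upper_bound = 0.0, 2000.0

print(cap_value(2476.45, lower_bound, upper_bound))  # above the upper bound, so capped
print(cap_value(851.95, lower_bound, upper_bound))   # inside the range, so unchanged
```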

In [8]:
# scale the features with the fitted scaler
df_scaled = scaler.transform(df)
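
The type of the saved `scaler` is not stated in this notebook. If it is a standard (z-score) scaler, which is an assumption, `transform` subtracts the training mean and divides by the training standard deviation for each column. A minimal sketch with made-up statistics:

```python
def standardize(value, train_mean, train_std):
    """Z-score a value using statistics computed on the training data."""
    return (value - train_mean) / train_std


# Hypothetical training statistics for CREDIT_LIMIT, for illustration only.
credit_limit_mean = 4500.0
credit_limit_std = 1000.0

print(standardize(4000.0, credit_limit_mean, credit_limit_std))  # -0.5
```

The point to note is that inference reuses the statistics from training; the scaler is not refitted on the inference record.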

The data is now fully prepared for inference, so we can proceed to the next step.

## V. Predict the Data

Here, we predict the cluster using the models that we have loaded. First, we use PCA to reduce the dimensionality of the data.

In [9]:
# transform with pca
results_PCA = model_pca.transform(df_scaled)
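
For a fitted PCA without whitening, `transform` centers the input with the training mean and projects it onto the principal components. The sketch below reproduces that arithmetic in plain Python. The mean vector and components are toy values, not those stored in `model_pca`, and `pca_transform` is a hypothetical name.

```python
def pca_transform(row, mean, components):
    """Project one row onto the principal components after centering."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * v for c, v in zip(component, centered))
            for component in components]


# Toy values: 3 input features reduced to 2 components.
toy_mean = [1.0, 1.0, 1.0]
toy_components = [
    [0.0, 0.0, 1.0],  # first component picks out the third feature
    [1.0, 0.0, 0.0],  # second component picks out the first feature
]

print(pca_transform([2.0, 3.0, 4.0], toy_mean, toy_components))  # [3.0, 1.0]
```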

The data has now been reduced in dimension, so we can predict its cluster with the loaded K-Means model.

In [10]:
# copy the original dataframe so the cluster label can be shown alongside the original features
df_final = df_original.copy()

# predict with the fitted K-Means model (no refitting) and store the result in the cluster column
df_final['cluster'] = model_kmeans.predict(results_PCA)+1 # add 1 to start the cluster from 1
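
K-Means `predict` assigns each point to its nearest fitted centroid and returns a zero-based index, which is why the cell above adds 1. A minimal sketch of that rule follows, using hypothetical centroids rather than those stored in `model_kmeans`.

```python
import math


def nearest_cluster(point, centroids):
    """Return the 1-based label of the centroid closest to the point."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    return distances.index(min(distances)) + 1  # shift 0-based index to 1-based


# Hypothetical centroids in a 2-component PCA space.
toy_centroids = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]]

print(nearest_cluster([0.9, 1.1], toy_centroids))  # 3
```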

The prediction, combined with the original data, is as follows:

In [11]:
df_final

Unnamed: 0,CUST_ID,BALANCE,BALANCE_FREQUENCY,PURCHASES,ONEOFF_PURCHASES,INSTALLMENTS_PURCHASES,CASH_ADVANCE,PURCHASES_FREQUENCY,ONEOFF_PURCHASES_FREQUENCY,PURCHASES_INSTALLMENTS_FREQUENCY,CASH_ADVANCE_FREQUENCY,CASH_ADVANCE_TRX,PURCHASES_TRX,CREDIT_LIMIT,PAYMENTS,MINIMUM_PAYMENTS,PRC_FULL_PAYMENT,TENURE,cluster
0,9.0,686.657879,1.0,2476.45,1624.5,851.95,253.273353,1.0,0.75,1.0,0.083333,1.0,35.0,4000.0,1600.734366,224.265608,0.083333,12.0,3


For easier interpretation, we map the predicted cluster to a profile label, as we did in the training notebook.

In [12]:
# Map the cluster labels to profile categories
cluster_mapping = {
    1: "Conservative Users",
    2: "Borrowers Users",
    3: "Power Users"
}

# Apply the mapping to the final dataframe
df_final['cluster_label'] = df_final['cluster'].map(cluster_mapping)

# Display the final dataframe
df_final

Unnamed: 0,CUST_ID,BALANCE,BALANCE_FREQUENCY,PURCHASES,ONEOFF_PURCHASES,INSTALLMENTS_PURCHASES,CASH_ADVANCE,PURCHASES_FREQUENCY,ONEOFF_PURCHASES_FREQUENCY,PURCHASES_INSTALLMENTS_FREQUENCY,CASH_ADVANCE_FREQUENCY,CASH_ADVANCE_TRX,PURCHASES_TRX,CREDIT_LIMIT,PAYMENTS,MINIMUM_PAYMENTS,PRC_FULL_PAYMENT,TENURE,cluster,cluster_label
0,9.0,686.657879,1.0,2476.45,1624.5,851.95,253.273353,1.0,0.75,1.0,0.083333,1.0,35.0,4000.0,1600.734366,224.265608,0.083333,12.0,3,Power Users
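
`Series.map` with a dict yields `NaN` for any value that has no key in the dict, so a cluster number outside 1 to 3 would silently produce a missing label. A hedged sketch of a lookup with an explicit fallback follows; `label_for` and the `"Unknown"` default are hypothetical additions, not part of the training notebook.

```python
cluster_mapping = {
    1: "Conservative Users",
    2: "Borrowers Users",
    3: "Power Users",
}


def label_for(cluster, default="Unknown"):
    """Look up the profile label, falling back to a default for unmapped clusters."""
    return cluster_mapping.get(cluster, default)


print(label_for(3))  # Power Users
print(label_for(7))  # Unknown
```

With pandas, a similar effect can be obtained by chaining `.fillna("Unknown")` after `.map(cluster_mapping)`.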


After mapping the label, we can report the prediction as follows.

In [13]:
# print the predicted category for the inference record
print("Based on data given for inference, the customer is considered as category of:", df_final['cluster_label'].values[0])

Based on data given for inference, the customer is considered as category of: Power Users


This means that, for the input data used in this inference, the model assigns the customer to the **Power Users** cluster.

**Therefore**, based on the inference data, our user belongs to the following profile:

Power Users are high-value, active credit users. They have the highest credit limits, make frequent and large purchases (both regular and installment), and have the highest payment amounts. They use their cards extensively but also tend to pay in full more often, which suggests that they may be reward-seekers or use credit for convenience rather than financing.