### MODEL INFERENCE FOR BANK CUSTOMER CHURN PREDICTION

Name : Nisrina Tsany Sulthanah

Batch : FTDS-RMT038

In [22]:
# Import the libraries needed for inference
import cloudpickle
import pandas as pd

In [23]:
# Load the saved pipeline (preprocessing + model) that was fitted and saved during model training
with open('xgboost_smote_pipeline.pkl', 'rb') as model_file:
    loaded_pipeline = cloudpickle.load(model_file)

# Check if the pipeline is loaded successfully
print(loaded_pipeline)

Pipeline(steps=[('smote', SMOTE(random_state=42)),
                ('preprocessor',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('dynamic_scaler',
                                                                   DynamicScaler())]),
                                                  ['CreditScore', 'Age',
                                                   'Balance',
                                                   'NumOfProducts']),
                                                 ('cat',
                                                  FunctionTransformer(func=<function <lambda> at 0x000001DA16D6CF70>),
                                                  ['Geography']),
                                                 ('binary', 'passthrough',
                                                  ['IsActiveMember'])])),
                ('classifi...
                               feature_types=None, gamma=None

In [25]:
# Example of new unseen data for inference 
data_inf = pd.DataFrame({
    'RowNumber': [1, 2],  # Identifier columns (RowNumber, CustomerId, Surname) are dropped before prediction
    'CustomerId': [15634602, 15647311],
    'Surname': ['Hargrave', 'Hill'],
    'CreditScore': [600, 800],
    'Age': [30, 45],
    'Tenure': [3, 5],
    'Balance': [150000, 250000],
    'NumOfProducts': [2, 1],
    'EstimatedSalary': [100000, 150000],
    'Geography': ['France', 'Germany'],
    'Gender': ['Female', 'Male'],
    'HasCrCard': [1, 1],
    'IsActiveMember': [1, 0]
})

data_inf_ori = data_inf.copy()

In [26]:
# Rename column 'HasCrCard' to 'HasCreditCard' to match the training data
data_inf.rename(columns={'HasCrCard': 'HasCreditCard'}, inplace=True)


In [27]:
# Drop columns that should not be used in preprocessing (they were not part of the training data)
data_inf_for_prediction = data_inf.drop(columns=['RowNumber', 'CustomerId', 'Surname'])


In [28]:
# Retain only the columns that match the selected features during training
selected_features = [
    'CreditScore', 'Age', 'Balance', 'NumOfProducts',
    'Geography', 'IsActiveMember'
]
data_inf_for_prediction = data_inf_for_prediction[selected_features]
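Indexing with `selected_features` fails with a `KeyError` if any column is absent from the incoming data. A minimal sketch of an explicit check with a clearer error message follows. The helper name `select_model_features` and the one-row sample frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Feature list copied from the cell above.
SELECTED_FEATURES = [
    'CreditScore', 'Age', 'Balance', 'NumOfProducts',
    'Geography', 'IsActiveMember',
]

def select_model_features(df, features=SELECTED_FEATURES):
    """Return df restricted to `features`, in that order (hypothetical helper)."""
    missing = [col for col in features if col not in df.columns]
    if missing:
        raise KeyError(f"Missing feature columns: {missing}")
    return df[list(features)]

# Illustrative input only, mirroring the first row of data_inf.
sample = pd.DataFrame({
    'CustomerId': [15634602],
    'CreditScore': [600], 'Age': [30], 'Balance': [150000],
    'NumOfProducts': [2], 'Geography': ['France'], 'IsActiveMember': [1],
})
features_only = select_model_features(sample)
```

Extra columns such as `CustomerId` are silently left out, while a missing feature stops the run before the pipeline is called.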

In [29]:
# Perform predictions using the pipeline
predictions = loaded_pipeline.predict(data_inf_for_prediction)
predictions_proba = loaded_pipeline.predict_proba(data_inf_for_prediction)[:, 1]
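`predict` returns hard 0/1 labels, while `predict_proba` exposes the score behind them, so a different cutoff can be applied when the cost of missing a churner is high. The sketch below assumes a default cutoff of 0.5. The helper `label_churn`, its `threshold` parameter, and the two example probabilities (rounded from the result printed at the end of this notebook) are illustrative and not part of the original notebook.

```python
import numpy as np

def label_churn(probabilities, threshold=0.5):
    """Map churn probabilities to labels using an adjustable cutoff (hypothetical helper)."""
    probabilities = np.asarray(probabilities, dtype=float)
    return np.where(probabilities >= threshold, 'Churn', 'No Churn')

# Illustrative probabilities, not fresh pipeline output.
example_proba = [0.0622, 0.977]

default_labels = label_churn(example_proba)                 # cutoff 0.5
strict_labels = label_churn(example_proba, threshold=0.05)  # flag more customers
```

Lowering the cutoff flags more customers for retention outreach at the price of more false alarms.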


In [30]:
# Add predictions to the original data for reference
data_inf['Exited_Prediction'] = ['Churn' if pred == 1 else 'No Churn' for pred in predictions]
data_inf['Churn_Probability'] = predictions_proba

In [31]:
# Display the formatted results
display_data = data_inf[['RowNumber', 'CustomerId', 'Surname', 'Exited_Prediction', 'Churn_Probability']]
display_data = display_data.rename(columns={
    'RowNumber': 'Row Number',
    'CustomerId': 'Customer ID',
    'Surname': 'Customer Name',
    'Exited_Prediction': 'Churn Status',
    'Churn_Probability': 'Churn Probability (%)'
})
display_data['Churn Probability (%)'] = (display_data['Churn Probability (%)'] * 100).round(2)



In [32]:
# Show the formatted DataFrame
print(display_data)

   Row Number  Customer ID Customer Name Churn Status  Churn Probability (%)
0           1     15634602      Hargrave     No Churn               6.220000
1           2     15647311          Hill        Churn              97.699997
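The value `97.699997` in the output above is consistent with the probabilities being stored as 32-bit floats, which is an assumption about what this pipeline's `predict_proba` returns. One way to get cleanly rounded percentages, sketched below, is to cast to `float64` before scaling and rounding. The input values are illustrative, not real pipeline output.

```python
import numpy as np
import pandas as pd

# Illustrative float32 probabilities standing in for predict_proba output.
proba32 = pd.Series(np.array([0.0622, 0.977], dtype=np.float32))

# Cast to float64 first so the rounded percentages are stored at double precision.
pct = (proba32.astype('float64') * 100).round(2)
print(pct.tolist())
```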


I applied the trained pipeline to new, unseen data to predict customer churn. Several classifiers were tried during training; the saved pipeline combines the preprocessing steps (scaling and encoding) with the trained XGBoost model, so inference only needs the selected feature columns.

For the two rows in `data_inf`, the model returns a binary prediction, churn (1) or no churn (0), together with a churn probability. Predictions like these can support customer retention strategies by showing which customers are most at risk of leaving.

Customer 1 (Hargrave) is predicted to stay (No Churn), with a churn probability of only 6.22%. Customer 2 (Hill) is predicted to leave (Churn), with a churn probability of 97.70%. These probabilities reflect the model's estimate of how likely each customer is to leave the bank, not a direct measure of customer satisfaction.

