<A name="Deploy1"> </A> <h1>Deploy Random Forest Classifier using Streamlit</h1>

## Author: Conrado Zárate Badillo.
### Creation date: Tuesday, October 25, 2022.

This short program simulates the deployment of a Random Forest Classifier model that predicts fraud in insurance claims.
As its input it takes "HIC5.csv", which was previously created by the program "InsuranceFraudsPredictionsBySML_CZB.ipynb".

The original dataset was obtained free of charge from the electronic supplementary material of Chapter 25 of the book ["Essentials of Business Analytics"](https://link.springer.com/chapter/10.1007/978-3-319-68837-4_25):

<https://link.springer.com/chapter/10.1007%2F978-3-319-68837-4_25>

The idea of deploying the machine learning model as a web service, and part of the code, were taken from the book: Singh, P. (2022). Deploy Machine Learning Models to Production: With Flask, Streamlit, Docker, and Kubernetes on Google Cloud Platform. Apress.
https://link.springer.com/book/10.1007/978-1-4842-6546-8

In [1]:
import pandas as pd
import numpy as np
import joblib
import streamlit

In [2]:
HIC5=pd.read_csv("HIC5.csv")

In [3]:
HICT10=HIC5[['fraud','member_id_fctzd', 'days_claimdt_startpolicy', 
             'days_claimdt_discharge', 'days_claimdt_endpolicy']]

In [4]:
HICT10.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99991 entries, 0 to 99990
Data columns (total 5 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   fraud                     99991 non-null  int64  
 1   member_id_fctzd           99991 non-null  int64  
 2   days_claimdt_startpolicy  99991 non-null  float64
 3   days_claimdt_discharge    99991 non-null  float64
 4   days_claimdt_endpolicy    99991 non-null  float64
dtypes: float64(3), int64(2)
memory usage: 3.8 MB


In [5]:
HICT10.head()

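|   | fraud | member_id_fctzd | days_claimdt_startpolicy | days_claimdt_discharge | days_claimdt_endpolicy |
|---|-------|-----------------|--------------------------|------------------------|------------------------|
| 0 | 1 | 0 | 0.297848 | 0.276459 | 0.588551 |
| 1 | 0 | 1 | 0.178369 | 0.176395 | 0.670841 |
| 2 | 1 | 2 | 0.37316 | 0.273894 | 0.516995 |
| 3 | 0 | 2 | 0.373726 | 0.230917 | 0.516547 |
| 4 | 0 | 2 | 0.357871 | 0.272611 | 0.52907 |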


In [19]:
HICT10.tail()

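|       | fraud | member_id_fctzd | days_claimdt_startpolicy | days_claimdt_discharge | days_claimdt_endpolicy |
|-------|-------|-----------------|--------------------------|------------------------|------------------------|
| 99986 | 0 | 86290 | 0.319366 | 0.374599 | 0.571556 |
| 99987 | 1 | 86291 | 0.302378 | 0.318794 | 0.584973 |
| 99988 | 0 | 86292 | 0.36863 | 0.430404 | 0.532648 |
| 99989 | 1 | 86293 | 0.339751 | 0.322643 | 0.555456 |
| 99990 | 0 | 86294 | 0.403737 | 0.429763 | 0.504919 |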


In [4]:
# separate the features and targets
yGenT10=HICT10['fraud']  # selects the first column fraud as target
XGenT10=HICT10.drop(['fraud'], axis=1) 

In [5]:
# train/test split (default sizes: 75% train, 25% test)
from sklearn.model_selection import train_test_split
X_trainT10, X_testT10, y_trainT10, y_testT10 = train_test_split(XGenT10, yGenT10, random_state=0)

In [6]:
from imblearn.over_sampling import RandomOverSampler
rosT10 = RandomOverSampler(random_state=0)

In [7]:
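# Oversample the minority class. NOTE: this resamples the full dataset, so rows of X_testT10
# also reach the training data and the test accuracy below is optimistic; pass
# X_trainT10, y_trainT10 instead to keep the test set unseen.
X_train_rosT10, y_train_rosT10 = rosT10.fit_resample(XGenT10, yGenT10)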
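
To make the resampling step more concrete, here is a minimal NumPy sketch of what random oversampling does: rows of the smaller class are drawn again, with replacement, until every class has as many rows as the largest one. It is an illustration only and not imblearn's implementation; the function name `random_oversample` and the toy arrays are hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement from the rows of this class
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


# Toy data (hypothetical): 8 rows of class 0 and 2 rows of class 1
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # [8 8]
```

Because the added rows are exact copies of existing ones, oversampling is normally applied to the training split only; a copy that ends up in both the training and the test data makes the test score look better than it is.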

In [17]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Fit the random forest on the oversampled data
RFClf = RandomForestClassifier().fit(X_train_rosT10, y_train_rosT10)
print("Accuracy on training set: {:.3f}".format(RFClf.score(X_train_rosT10, y_train_rosT10)))
print("Accuracy on test set: {:.3f}".format(RFClf.score(X_testT10, y_testT10)))

# Predict the test split; predict() already returns integer class labels,
# so np.round leaves them unchanged
y_pred = RFClf.predict(X_testT10)
y_pred = np.round(y_pred)

# Rows are the true classes, columns are the predicted classes
cf_matrix = confusion_matrix(y_testT10, y_pred)
print(cf_matrix)

Accuracy on training set: 0.999
Accuracy on test set: 0.999
[[19411    31]
 [    3  5553]]
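
As a worked reading of the matrix above, the following short sketch derives accuracy, precision and recall for the fraud class from the four printed counts. It assumes scikit-learn's layout for binary labels (rows are true classes, columns are predicted classes), so the flattened matrix is `tn, fp, fn, tp`; the numbers are copied from the output above.

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class
cf_matrix = np.array([[19411, 31],
                      [3, 5553]])
tn, fp, fn, tp = cf_matrix.ravel()

accuracy = (tp + tn) / cf_matrix.sum()   # share of all test claims classified correctly
precision = tp / (tp + fp)               # share of predicted frauds that are real frauds
recall = tp / (tp + fn)                  # share of real frauds that were detected

print("Accuracy:  {:.3f}".format(accuracy))   # 0.999
print("Precision: {:.3f}".format(precision))  # 0.994
print("Recall:    {:.3f}".format(recall))     # 0.999
```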


In [18]:
joblib.dump(RFClf,'RandomForestClassifierModel.pkl')

['RandomForestClassifierModel.pkl']

# To run the deployed model
1. Open a terminal (File, New, Terminal).

2. Change to the current working directory of this notebook. For example:

"cd C:\Users\conza\EssBusinessAnalytics\Ideal_Insurance"

3. Run the following command in the terminal:

$ streamlit run WebAppFraudIns.py
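
The script `WebAppFraudIns.py` itself is not shown in this notebook. The following is only a hypothetical sketch of what such a script could look like, assuming it loads `RandomForestClassifierModel.pkl` from the working directory and asks for the four features the model was trained on. The helper `predict_fraud`, the widget labels and the default values are illustrative assumptions, not the original code.

```python
from pathlib import Path

import pandas as pd

MODEL_PATH = Path("RandomForestClassifierModel.pkl")

# Feature columns in the same order used to train the model above
FEATURES = ["member_id_fctzd", "days_claimdt_startpolicy",
            "days_claimdt_discharge", "days_claimdt_endpolicy"]


def predict_fraud(model, member_id, start, discharge, end):
    """Build a one-row frame in training column order and return the predicted class."""
    row = pd.DataFrame([[member_id, start, discharge, end]], columns=FEATURES)
    return int(model.predict(row)[0])


def main():
    # Imported here so the helper above can be reused without Streamlit installed
    import joblib
    import streamlit as st

    st.title("Insurance claim fraud prediction")
    model = joblib.load(MODEL_PATH)

    member_id = st.number_input("member_id_fctzd", min_value=0, value=0, step=1)
    start = st.number_input("days_claimdt_startpolicy", value=0.0, format="%.6f")
    discharge = st.number_input("days_claimdt_discharge", value=0.0, format="%.6f")
    end = st.number_input("days_claimdt_endpolicy", value=0.0, format="%.6f")

    if st.button("Predict"):
        label = predict_fraud(model, member_id, start, discharge, end)
        st.write("Predicted class (1 = fraud, 0 = no fraud):", label)


# Start the UI only when the saved model is available in the working directory
if __name__ == "__main__" and MODEL_PATH.exists():
    main()
```

Keeping the prediction logic in a plain function separates it from the widgets, so it can be checked with any object that has a `predict` method before the web app is started.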