# Model Inference
> This section explains how to load the model and the other files saved in the previous chapter and use them for inference. Since the inference set contains only 10 rows, outlier handling is skipped. We then compare the actual values of `default_payment_next_month` with the predicted classes.

## Loading Models

In [6]:
# Import Libraries
import pandas as pd

# Save and Load Model
import pickle
import json

In [7]:
# Loading Model Files

with open('m_svm.pkl', 'rb') as file_1:
  m_svm = pickle.load(file_1)

with open('s_MinMaxScaler.pkl', 'rb') as file_2:
  s_MinMaxScaler = pickle.load(file_2)

with open('e_OneHotEncoder.pkl', 'rb') as file_3:
  e_OneHotEncoder = pickle.load(file_3)

with open('t_OrdinalEncoder.pkl', 'rb') as file_4:
  t_OrdinalEncoder = pickle.load(file_4)
  
with open('num_col.txt','r') as file_5:
  num_col = json.load(file_5)

with open('cat_col.txt','r') as file_6:
  cat_col = json.load(file_6)
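
The loading cell above assumes that the files were written with `pickle.dump` (binary mode) and `json.dump` (text mode) in the previous chapter. A minimal round-trip sketch of that convention follows. The dictionary and the shortened column list are stand-ins, not the real fitted objects, and the files are written to a temporary directory.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Stand-ins (assumptions): the real notebook would save the fitted SVM
# and the full list of numerical column names instead.
m_svm_stub = {"kind": "stand-in for the fitted model"}
num_col_example = ["limit_balance", "age", "bill_amt_1"]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "m_svm.pkl"
    cols_path = Path(tmp) / "num_col.txt"

    # Pickle needs binary mode ('wb'); JSON is written as text ('w').
    with open(model_path, "wb") as f:
        pickle.dump(m_svm_stub, f)
    with open(cols_path, "w") as f:
        json.dump(num_col_example, f)

    # Reload with the same modes the loading cell uses ('rb' and 'r').
    with open(model_path, "rb") as f:
        m_svm_loaded = pickle.load(f)
    with open(cols_path, "r") as f:
        num_col_loaded = json.load(f)

print(m_svm_loaded == m_svm_stub, num_col_loaded == num_col_example)
```

Pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code.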

## Feature Selection

In [8]:
# data inference
df_inf = pd.read_csv('h8dsft_p1m1_Ahmad Luay Adnani_inference.csv')
df_inf_copy = df_inf.copy()
df_inf

Unnamed: 0.1,Unnamed: 0,limit_balance,sex,education_level,marital_status,age,pay_0,pay_2,pay_3,pay_4,...,bill_amt_4,bill_amt_5,bill_amt_6,pay_amt_1,pay_amt_2,pay_amt_3,pay_amt_4,pay_amt_5,pay_amt_6,default_payment_next_month
0,118,300000.0,1,1,2,25.0,0.0,0.0,0.0,0.0,...,19507.0,18169.0,18533.0,4103.0,1427.0,600.0,1200.0,2500.0,18000.0,0
1,252,170000.0,2,1,2,29.0,0.0,0.0,0.0,0.0,...,66496.0,36000.0,25167.0,3600.0,3000.0,4710.0,1500.0,1000.0,1000.0,0
2,599,420000.0,1,2,1,36.0,0.0,0.0,0.0,0.0,...,22304.0,28465.0,38182.0,15005.0,10013.0,10000.0,10000.0,20000.0,20000.0,0
3,723,130000.0,2,2,2,31.0,0.0,0.0,0.0,0.0,...,103750.0,105818.0,98401.0,4000.0,5300.0,3700.0,3600.0,3700.0,3500.0,0
4,1626,50000.0,1,2,2,28.0,-1.0,2.0,-1.0,0.0,...,937.0,-3.0,894.0,0.0,3141.0,2.0,0.0,897.0,906.0,0
5,1652,450000.0,2,3,2,40.0,-1.0,-1.0,-1.0,-1.0,...,14602.0,18065.0,19239.0,26731.0,55367.0,15174.0,10528.0,10037.0,48551.0,0
6,2337,240000.0,2,2,1,41.0,1.0,-1.0,-1.0,-1.0,...,9795.0,11756.0,12522.0,40529.0,3211.0,9795.0,11756.0,12522.0,6199.0,0
7,2414,30000.0,2,2,2,22.0,-1.0,0.0,-1.0,-1.0,...,3312.0,3145.0,3022.0,1009.0,5572.0,3321.0,3154.0,3031.0,3339.0,0
8,2629,100000.0,1,2,2,30.0,-2.0,-2.0,-2.0,-2.0,...,0.0,1756.0,0.0,1475.0,0.0,0.0,1756.0,0.0,0.0,0
9,2675,260000.0,2,3,2,49.0,-2.0,-2.0,-2.0,-2.0,...,2735.0,316.0,305.0,217773.0,200304.0,2759.0,316.0,305.0,2596.0,0


The variables in the dataset and their definitions are listed below.

Variable | Definition
---|---
`limit_balance` | Amount of given credit in NT dollars (includes individual and family/supplementary credit)
`sex` | Gender (1=male, 2=female)
`education_level` | Education Level (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
`marital_status` | Marital status (1=married, 2=single, 3=others)
`age` | Age in years
**Repayment Status** | Scale: (-2=no consumption, -1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
`pay_0` | Repayment status in September, 2005 (scale same as above)
`pay_2` | Repayment status in August, 2005 (scale same as above)
`pay_3` | Repayment status in July, 2005 (scale same as above)
`pay_4` | Repayment status in June, 2005 (scale same as above)
`pay_5` | Repayment status in May, 2005 (scale same as above)
`pay_6` | Repayment status in April, 2005 (scale same as above)
**Billing Amount** | in NT dollar
`bill_amt_1` | Amount of bill statement in September, 2005 (NT dollar)
`bill_amt_2` | Amount of bill statement in August, 2005 (NT dollar)
`bill_amt_3` | Amount of bill statement in July, 2005 (NT dollar)
`bill_amt_4` | Amount of bill statement in June, 2005 (NT dollar)
`bill_amt_5` | Amount of bill statement in May, 2005 (NT dollar)
`bill_amt_6` | Amount of bill statement in April, 2005 (NT dollar)
**Previous Payment** | in NT dollar
`pay_amt_1` | Amount of previous payment in September, 2005 (NT dollar)
`pay_amt_2` | Amount of previous payment in August, 2005 (NT dollar)
`pay_amt_3` | Amount of previous payment in July, 2005 (NT dollar)
`pay_amt_4` | Amount of previous payment in June, 2005 (NT dollar)
`pay_amt_5` | Amount of previous payment in May, 2005 (NT dollar)
`pay_amt_6` | Amount of previous payment in April, 2005 (NT dollar)
**Default Payment** | Target
`default_payment_next_month` | Default payment (1=yes, 0=no)

In [9]:
# Split Numerical Features
df_inf_num = df_inf[num_col]

# Split Categorical Features
df_inf_cat = df_inf[cat_col]
df_inf_num

Unnamed: 0,limit_balance,age,bill_amt_1,bill_amt_2,bill_amt_3,bill_amt_4,bill_amt_5,bill_amt_6,pay_amt_1,pay_amt_2,pay_amt_3,pay_amt_4,pay_amt_5,pay_amt_6
0,300000.0,25.0,76918.0,41773.0,31180.0,19507.0,18169.0,18533.0,4103.0,1427.0,600.0,1200.0,2500.0,18000.0
1,170000.0,29.0,90450.0,82581.0,81703.0,66496.0,36000.0,25167.0,3600.0,3000.0,4710.0,1500.0,1000.0,1000.0
2,420000.0,36.0,56068.0,55115.0,19304.0,22304.0,28465.0,38182.0,15005.0,10013.0,10000.0,10000.0,20000.0,20000.0
3,130000.0,31.0,97544.0,99208.0,102946.0,103750.0,105818.0,98401.0,4000.0,5300.0,3700.0,3600.0,3700.0,3500.0
4,50000.0,28.0,2809.0,187.0,3135.0,937.0,-3.0,894.0,0.0,3141.0,2.0,0.0,897.0,906.0
5,450000.0,40.0,5453.0,25948.0,53938.0,14602.0,18065.0,19239.0,26731.0,55367.0,15174.0,10528.0,10037.0,48551.0
6,240000.0,41.0,0.0,40529.0,3211.0,9795.0,11756.0,12522.0,40529.0,3211.0,9795.0,11756.0,12522.0,6199.0
7,30000.0,22.0,2293.0,3158.0,5547.0,3312.0,3145.0,3022.0,1009.0,5572.0,3321.0,3154.0,3031.0,3339.0
8,100000.0,30.0,914.0,1170.0,0.0,0.0,1756.0,0.0,1475.0,0.0,0.0,1756.0,0.0,0.0
9,260000.0,49.0,-5684.0,211466.0,200304.0,2735.0,316.0,305.0,217773.0,200304.0,2759.0,316.0,305.0,2596.0


### Feature Scaling: MinMaxScaler

In [10]:
# Scale the numerical columns with the loaded (already fitted) MinMaxScaler
df_inf_num_scaled = s_MinMaxScaler.transform(df_inf_num)
df_inf_num_scaled = pd.DataFrame(df_inf_num_scaled, columns = df_inf_num.columns, index = df_inf_num.index)
df_inf_num_scaled

Unnamed: 0,limit_balance,age,bill_amt_1,bill_amt_2,bill_amt_3,bill_amt_4,bill_amt_5,bill_amt_6,pay_amt_1,pay_amt_2,pay_amt_3,pay_amt_4,pay_amt_5,pay_amt_6
0,0.591837,0.102564,0.482543,0.489711,0.325615,0.359393,0.376825,0.462692,0.366037,0.12891,0.059973,0.125981,0.262895,1.850947
1,0.326531,0.205128,0.563934,0.67255,0.616151,0.614746,0.480522,0.496613,0.321163,0.271009,0.470788,0.157476,0.105158,0.10283
2,0.836735,0.384615,0.357136,0.54949,0.257321,0.374593,0.436702,0.56316,1.338627,0.904537,0.99955,1.049841,2.10316,2.056608
3,0.244898,0.25641,0.606603,0.747047,0.73831,0.817196,0.886553,0.871065,0.356848,0.478782,0.369834,0.377943,0.389085,0.359906
4,0.081633,0.179487,0.036798,0.303386,0.16434,0.258478,0.271145,0.372502,0.0,0.283746,0.0002,0.0,0.094327,0.093164
5,0.897959,0.487179,0.052701,0.418808,0.456486,0.332738,0.37622,0.466302,2.384727,5.001649,1.516717,1.105273,1.055471,4.992519
6,0.469388,0.512821,0.019903,0.484137,0.164777,0.306615,0.33953,0.431957,3.615675,0.29007,0.979059,1.234193,1.316788,0.637446
7,0.040816,0.025641,0.033694,0.316698,0.17821,0.271384,0.289452,0.383383,0.090015,0.503354,0.331951,0.33112,0.318734,0.343351
8,0.183673,0.230769,0.0254,0.307791,0.146312,0.253386,0.281374,0.367931,0.131588,0.0,0.0,0.184352,0.0,0.0
9,0.510204,0.717949,-0.014285,1.250016,1.298174,0.268249,0.273,0.369491,19.427972,18.094718,0.275776,0.033175,0.032073,0.266948
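
Several scaled values above fall outside the range 0 to 1, for example 19.427972 for `pay_amt_1` and -0.014285 for `bill_amt_1` in row 9. This is expected whenever an inference value lies outside the range the scaler saw when it was fitted, because min-max scaling with the default range computes `(x - min) / (max - min)` from the fitted bounds. The sketch below illustrates this in plain Python. The bounds and input values are hypothetical, not the ones stored in `s_MinMaxScaler`.

```python
def min_max_transform(values, fitted_min, fitted_max):
    """Apply (x - min) / (max - min) using bounds learned at fit time."""
    span = fitted_max - fitted_min
    return [(v - fitted_min) / span for v in values]

# Hypothetical bounds standing in for what the scaler saw during training.
train_min, train_max = 0.0, 10000.0

# One value inside the training range, one above it, one below it.
scaled = min_max_transform([2500.0, 25000.0, -500.0], train_min, train_max)
print(scaled)  # the second value exceeds 1 and the third is negative
```

Whether such out-of-range values matter depends on the model. With only 10 rows they are left as they are here.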


### Feature Encoding: OneHotEncoder

In [11]:
# dropping unnecessary features
df_inf_cat = df_inf_cat.drop(['sex','education_level','marital_status','default_payment_next_month'],axis=1).sort_index()
df_inf_cat

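index | pay_0 | pay_2 | pay_3 | pay_4 | pay_5 | pay_6
---|---|---|---|---|---|---
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0
4 | -1.0 | 2.0 | -1.0 | 0.0 | 0 | -1
5 | -1.0 | -1.0 | -1.0 | -1.0 | 0 | 0
6 | 1.0 | -1.0 | -1.0 | -1.0 | -1 | -1
7 | -1.0 | 0.0 | -1.0 | -1.0 | -1 | -1
8 | -2.0 | -2.0 | -2.0 | -2.0 | -2 | -2
9 | -2.0 | -2.0 | -2.0 | -2.0 | -2 | -2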


In [12]:
# Encode the categorical columns with the loaded (already fitted) OneHotEncoder
df_inf_cat_encoded = e_OneHotEncoder.transform(df_inf_cat).toarray()
df_inf_cat_encoded = pd.DataFrame(df_inf_cat_encoded, index = df_inf_cat.index)
df_inf_cat_encoded

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,46,47,48,49,50,51,52,53,54,55
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [13]:
# Show feature names
df_inf_cat_encoded.columns = e_OneHotEncoder.get_feature_names_out()
df_inf_cat_encoded.head(10)

Unnamed: 0,pay_0_-2.0,pay_0_-1.0,pay_0_0.0,pay_0_1.0,pay_0_2.0,pay_0_3.0,pay_0_4.0,pay_0_5.0,pay_0_7.0,pay_0_8.0,...,pay_5_6,pay_5_7,pay_6_-2,pay_6_-1,pay_6_0,pay_6_2,pay_6_3,pay_6_4,pay_6_6,pay_6_7
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
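
The column names above follow the pattern `<column>_<category>`, which is why float columns such as `pay_0` yield names like `pay_0_-2.0` while integer columns such as `pay_6` yield `pay_6_-2`. The plain-Python sketch below mimics that naming and the 0/1 encoding for a single row. The category lists are shortened and hypothetical; the real encoder uses the categories it learned during training.

```python
def one_hot_row(row, categories):
    """Encode one row; `categories` maps column name -> ordered category list."""
    names, values = [], []
    for col, cats in categories.items():
        for cat in cats:
            names.append(f"{col}_{cat}")  # e.g. 'pay_0_-1.0'
            values.append(1.0 if row[col] == cat else 0.0)
    return names, values

# Hypothetical, shortened category lists (float for pay_0, int for pay_6).
categories = {"pay_0": [-2.0, -1.0, 0.0, 1.0], "pay_6": [-2, -1, 0]}

# Values taken from row 4 of the inference data: pay_0 = -1.0, pay_6 = -1.
names, values = one_hot_row({"pay_0": -1.0, "pay_6": -1}, categories)
for name, value in zip(names, values):
    print(name, value)
```

For this row only `pay_0_-1.0` and `pay_6_-1` are set to 1.0, which matches row 4 of the encoded output above.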


In [14]:
# Concatenate Numerical Columns and Categorical Columns
df_inf_final = pd.concat([df_inf_cat_encoded,df_inf_num_scaled], axis=1)
df_inf_final

Unnamed: 0,pay_0_-2.0,pay_0_-1.0,pay_0_0.0,pay_0_1.0,pay_0_2.0,pay_0_3.0,pay_0_4.0,pay_0_5.0,pay_0_7.0,pay_0_8.0,...,bill_amt_3,bill_amt_4,bill_amt_5,bill_amt_6,pay_amt_1,pay_amt_2,pay_amt_3,pay_amt_4,pay_amt_5,pay_amt_6
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.325615,0.359393,0.376825,0.462692,0.366037,0.12891,0.059973,0.125981,0.262895,1.850947
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.616151,0.614746,0.480522,0.496613,0.321163,0.271009,0.470788,0.157476,0.105158,0.10283
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.257321,0.374593,0.436702,0.56316,1.338627,0.904537,0.99955,1.049841,2.10316,2.056608
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.73831,0.817196,0.886553,0.871065,0.356848,0.478782,0.369834,0.377943,0.389085,0.359906
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.16434,0.258478,0.271145,0.372502,0.0,0.283746,0.0002,0.0,0.094327,0.093164
5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.456486,0.332738,0.37622,0.466302,2.384727,5.001649,1.516717,1.105273,1.055471,4.992519
6,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.164777,0.306615,0.33953,0.431957,3.615675,0.29007,0.979059,1.234193,1.316788,0.637446
7,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.17821,0.271384,0.289452,0.383383,0.090015,0.503354,0.331951,0.33112,0.318734,0.343351
8,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.146312,0.253386,0.281374,0.367931,0.131588,0.0,0.0,0.184352,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.298174,0.268249,0.273,0.369491,19.427972,18.094718,0.275776,0.033175,0.032073,0.266948


## Model Prediction

In [15]:
# Predict using the loaded SVM model

y_pred_inf = m_svm.predict(df_inf_final)  # m_svm was loaded from m_svm.pkl
y_pred_inf

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [16]:
# Create default_payment_next_month Prediction DataFrame

y_pred_inf_df = pd.DataFrame(y_pred_inf, columns=['default_payment_next_month_prediction'],index=df_inf_final.index)
round(y_pred_inf_df.head(10),2)

index | default_payment_next_month_prediction
---|---
0 | 0.0
1 | 0.0
2 | 0.0
3 | 0.0
4 | 0.0
5 | 0.0
6 | 0.0
7 | 0.0
8 | 0.0
9 | 0.0


In [17]:
# Comparing the real default_payment_next_month and predicted classification
model_inf = pd.concat([df_inf_copy['default_payment_next_month'], y_pred_inf_df['default_payment_next_month_prediction']], axis=1).sort_index()
model_inf

index | default_payment_next_month | default_payment_next_month_prediction
---|---|---
0 | 0 | 0.0
1 | 0 | 0.0
2 | 0 | 0.0
3 | 0 | 0.0
4 | 0 | 0.0
5 | 0 | 0.0
6 | 0 | 0.0
7 | 0 | 0.0
8 | 0 | 0.0
9 | 0 | 0.0


The model predicts all 10 inference rows correctly. Note, however, that the inference set has only 10 rows, all of them in the non-default class (0), so it is too small to represent the model's accuracy.
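
The comparison can be summarised numerically with a short sketch. The two lists are copied from the table above; the variable names are illustrative and do not appear in the notebook.

```python
# Actual and predicted labels, copied from the comparison table.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Share of rows where the prediction equals the actual label.
matches = sum(a == p for a, p in zip(actual, predicted))
accuracy = matches / len(actual)

# Number of actual defaults (class 1) in the inference set.
positives = sum(actual)

print(f"accuracy = {accuracy:.2f}, actual defaults = {positives}")
```

Because the set contains no actual defaults, this check says nothing about how well the model detects class 1.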