# BentoML Example: H2O Loan Default Prediction

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastructure*

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.

This notebook demonstrates how to use BentoML to __turn an H2O model into a Docker image containing a REST API server__ serving this model, as well as how to distribute your model as a command line tool or a pip-installable PyPI package.

The notebook was built based on: https://github.com/kguruswamy/H2O3-Driverless-AI-Code-Examples/blob/master/Lending%20Club%20Data%20-%20H2O3%20Auto%20ML%20-%20Python%20Tutorial.ipynb

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

In [None]:
!pip install -q bentoml "h2o>=3.24.0.2" "xlrd>=1.2.0" "scikit-learn>=0.23.2" "pandas>=1.1.1" "numpy>=1.18.4"

In [2]:
import h2o
import bentoml
import numpy as np
import pandas as pd

import requests
import math
from sklearn import model_selection

h2o.init(strict_version_check=False)

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "9.0.1"; Java(TM) SE Runtime Environment (build 9.0.1+11); Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
  Starting server from /usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd
  JVM stdout: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd/h2o_bozhaoyu_started_from_python.out
  JVM stderr: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd/h2o_bozhaoyu_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O cluster uptime:,02 secs
H2O cluster timezone:,America/Los_Angeles
H2O data parsing timezone:,UTC
H2O cluster version:,3.24.0.2
H2O cluster version age:,"1 year, 5 months and 5 days !!!"
H2O cluster name:,H2O_from_python_bozhaoyu_392ekt
H2O cluster total nodes:,1
H2O cluster free memory:,4 Gb
H2O cluster total cores:,8
H2O cluster allowed cores:,8


## Prepare Dataset

In [3]:
%%bash

# Download training dataset
if [ ! -f ./LoanStats3c.csv.zip ]; then
    curl -O https://resources.lendingclub.com/LoanStats3c.csv.zip
fi

In [4]:
pd.set_option('expand_frame_repr', True)
pd.set_option('max_colwidth',9999)
pd.set_option('display.max_columns',9999)
pd.set_option('display.max_rows',9999)

data_dictionary = pd.read_excel("https://resources.lendingclub.com/LCDataDictionary.xlsx")
data_dictionary

Unnamed: 0,LoanStatNew,Description
0,acc_now_delinq,The number of accounts on which the borrower is now delinquent.
1,acc_open_past_24mths,Number of trades opened in past 24 months.
2,addr_state,The state provided by the borrower in the loan application
3,all_util,Balance to credit limit on all trades
4,annual_inc,The self-reported annual income provided by the borrower during registration.
5,annual_inc_joint,The combined self-reported annual income provided by the co-borrowers during registration
6,application_type,Indicates whether the loan is an individual application or a joint application with two co-borrowers
7,avg_cur_bal,Average current balance of all accounts
8,bc_open_to_buy,Total open to buy on revolving bankcards.
9,bc_util,Ratio of total current balance to high credit/credit limit for all bankcard accounts.


In [5]:
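# The very first row of the file is not header data, so skip it when reading into a data frame.
# The issue_d column holds Mon-Year strings; parse it into dates so it is readable.
from datetime import datetime

def parse_dates(x):
    # Reference helper for Mon-Year strings (four-digit year assumed, e.g. "Dec-2014").
    # It is not called here: read_csv parses issue_d through its parse_dates argument.
    return datetime.strptime(x, "%b-%Y")

lc = pd.read_csv("LoanStats3c.csv.zip", skiprows=1, verbose=False, parse_dates=['issue_d'], low_memory=False)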
lc.shape

(235631, 144)
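
The `Mon-Year` format mentioned in the cell above can be checked in isolation with the standard library. This is a minimal sketch, assuming the raw values carry a four-digit year; the sample strings are hypothetical and not copied from the file.

```python
from datetime import datetime


def parse_month_year(value):
    """Parse a Mon-Year string such as 'Dec-2014'; the day defaults to 1."""
    return datetime.strptime(value, "%b-%Y")


# Hypothetical sample values, used only to illustrate the format
samples = ["Jan-2014", "Dec-2014"]
parsed = [parse_month_year(s) for s in samples]
print(parsed[0].date(), parsed[1].date())
```

A two-digit year would need `%y` instead of `%Y`, so it is worth inspecting a few raw values before relying on either format.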

In [6]:
lc.loan_status.unique()

array(['Fully Paid', 'Charged Off', 'Current', 'In Grace Period',
       'Late (31-120 days)', 'Default', 'Late (16-30 days)', nan],
      dtype=object)

In [7]:
# Keep just "Fully Paid" and "Charged Off" to make it a simple 'Yes' or 'No' - binary classification problem

lc = lc[lc.loan_status.isin(['Fully Paid','Charged Off'])]
lc.loan_status.unique()

array(['Fully Paid', 'Charged Off'], dtype=object)

In [8]:
# Drop the target-leakage columns from the data frame.
# Target-leakage columns are generally created in hindsight by analysts/data engineers/operations
# after an outcome was detected in historical data. If we don't remove them now, they would climb
# to the top of the feature list after a model is built and falsely inflate the accuracy to around 95%.
#
# In a production or real-life scoring environment, these columns will not be available at scoring
# time, that is, when someone applies for a loan. So we don't train on them.

ignored_cols = [ 
                'out_prncp',                 # Remaining outstanding principal for total amount funded
                'out_prncp_inv',             # Remaining outstanding principal for portion of total amount 
                                             # funded by investors
                'total_pymnt',               # Payments received to date for total amount funded
                'total_pymnt_inv',           # Payments received to date for portion of total amount 
                                             # funded by investors
                'total_rec_prncp',           # Principal received to date 
                'total_rec_int',             # Interest received to date
                'total_rec_late_fee',        # Late fees received to date
                'recoveries',                # post charge off gross recovery
                'collection_recovery_fee',   # post charge off collection fee
                'last_pymnt_d',              # Last month payment was received
                'last_pymnt_amnt',           # Last total payment amount received
                'next_pymnt_d',              # Next scheduled payment date
                'last_credit_pull_d',        # The most recent month LC pulled credit for this loan
                'settlement_term',           # The number of months that the borrower will be on the settlement plan
                'settlement_date',           # The date that the borrower agrees to the settlement plan
                'settlement_amount',         # The loan amount that the borrower has agreed to settle for
                'settlement_percentage',     # The settlement amount as a percentage of the payoff balance amount on the loan
                'settlement_status',         # The status of the borrower’s settlement plan. Possible values are: 
                                             # COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAF
                'debt_settlement_flag',      # Flags whether or not the borrower, who has charged-off, is working with 
                                             # a debt-settlement company.
                'debt_settlement_flag_date'  # The most recent date that the Debt_Settlement_Flag has been set
                ]

lc = lc.drop(columns=ignored_cols)
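
Before dropping, it can help to confirm which leakage columns are actually present, since dropping a missing column raises an error by default. The following is a small standard-library sketch of that check; the helper name `partition_columns` and the shortened column lists are illustrative assumptions, not part of the original notebook.

```python
def partition_columns(all_columns, leakage_columns):
    """Return (columns to keep, leakage columns absent from the data)."""
    leakage = set(leakage_columns)
    kept = [c for c in all_columns if c not in leakage]
    missing = sorted(leakage - set(all_columns))
    return kept, missing


# Shortened, illustrative column lists
columns = ["loan_amnt", "int_rate", "total_pymnt", "recoveries", "loan_status"]
leakage = ["total_pymnt", "recoveries", "settlement_term"]

kept, missing = partition_columns(columns, leakage)
print("kept:", kept)
print("not found:", missing)
```

With the real data, `lc.columns` and `ignored_cols` would take the place of the two lists.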

In [9]:
# After dropping the target-leakage columns, we have about 235K rows and 124 columns
lc.shape

(235543, 124)

In [10]:
import os

train_path = os.getcwd() + "/train_lc.csv.zip"
test_path = os.getcwd() + "/test_lc.csv.zip"

# Stratify on loan_status so the train and test sets keep the same class balance
train_lc, test_lc = model_selection.train_test_split(lc, test_size=0.2, random_state=10, stratify=lc['loan_status'])
train_lc.to_csv(train_path, index=False, compression="zip")
test_lc.to_csv(test_path, index=False, compression="zip")
print('Train LC shape', train_lc.shape)
print('Test LC shape', test_lc.shape)

# Load the two CSV files written above into H2O frames
train = h2o.load_dataset(train_path)
test = h2o.load_dataset(test_path)

train.describe()

Train LC shape (188434, 124)
Test LC shape (47109, 124)
Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
Rows:188434
Cols:124




Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_act_il,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount
type,int,int,int,int,int,enum,real,real,enum,enum,enum,enum,enum,real,enum,time,enum,enum,int,string,enum,enum,enum,enum,real,int,time,int,int,int,int,int,int,real,int,enum,int,int,int,enum,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,real,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,real,real,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,enum,enum,enum,enum,int,real,time,time,time,int,int,enum,real,real,real
mins,,,1000.0,1000.0,950.0,,0.06,23.36,,,,,,3000.0,,1388534400000.0,,,,,,,,,0.0,0.0,-820540800000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,,0.0,0.0,1.0,,,,,0.0,0.0,0.0,,,,,,,,,,,,0.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,16.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,3.0,1.47,1485907200000.0,1491004800000.0,1485907200000.0,3.0,0.0,,4.41,174.15,0.04
mean,0.0,0.0,14884.780480168118,14884.780480168118,14879.984769203027,,0.13768163813324627,443.02308527123586,,,,,,74842.1021655859,,1403772635222.944,,,0.0,,,,,,18.038431917806744,0.3439559739749727,878010488536.0388,0.7575596760669507,33.40950190769873,70.73781512605042,11.671577316195588,0.2224598533173423,16517.317517008567,0.5562211189754319,26.019354256662876,,0.015474914293598825,42.4452214452214,1.0,,0.0,0.0,0.0,0.0056200048823460734,280.02962841100884,139916.78260292622,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30777.311727183012,0.0,0.0,0.0,4.403568358151916,13425.63404110787,8488.266061096774,64.59277420047215,0.010783616544784914,9.796506999798337,128.53483025519225,185.81975121262616,13.078101616481097,8.003645838861342,1.853216510820764,24.440105868864123,39.5963653177332,6.918806067907228,35.46866324059305,0.5053122048038038,3.686240275109585,5.803241453240905,4.646629589139977,8.54615939798552,8.57417451203075,8.277009456892086,15.304217922455612,5.767600326904917,11.622318689833035,0.0009551360519945328,0.0036511457592578833,0.09438848615430329,2.0082309986520546,94.24337699141348,50.68730479940775,0.13524098623390737,0.05543054862710553,170413.48133033328,48481.2246834436,20070.11836505092,39932.08656611851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,3.0,114.21573863636364,1509994881818.182,1516242681818.182,1510936200000.0,3.0,14.59375,,339.8909328358209,7947.003380681818,187.48076704545457
maxs,,,35000.0,35000.0,35000.0,,0.2606,1409.99,,,,,,7500000.0,,1417392000000.0,,,,,,,,,39.99,22.0,1320105600000.0,6.0,188.0,121.0,84.0,63.0,2560703.0,8.923,156.0,,20.0,188.0,1.0,,,,,3.0,9152545.0,3840795.0,,,,,,,,,,,,9999999.0,,,,53.0,497484.0,260250.0,255.2,7.0,65000.0,561.0,842.0,372.0,226.0,37.0,616.0,170.0,25.0,180.0,30.0,26.0,38.0,35.0,61.0,150.0,62.0,105.0,38.0,84.0,2.0,3.0,22.0,26.0,100.0,100.0,7.0,63.0,9999999.0,2688920.0,1090700.0,1027358.0,,,,,,,,,,,,,,,,3.0,344.24,1556668800000.0,1561939200000.0,1556668800000.0,3.0,32.0,,1032.72,20321.15,713.04
sigma,-0.0,-0.0,8444.529842237767,8444.529842237767,8441.74965734052,,0.043236714795126696,245.55956150039466,,,,,,55879.6254552511,,8618925772.810982,,,-0.0,,,,,,8.023289119934871,0.9000809312845888,235685775429.60028,1.035364539099023,21.777363524527555,28.5075895180931,5.280407394680721,0.6053968655503703,21598.80009195033,0.23102402224846186,11.891471363727664,,0.14101201867640875,20.880974652987707,0.0,,-0.0,-0.0,-0.0,0.07950349603856371,21174.68581402208,153006.06065658265,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,38384.00007004573,-0.0,-0.0,-0.0,2.8642976762774732,16026.702776462502,13412.411414410766,26.42401766142629,0.1173765114760668,565.4116442360041,51.34992108739378,93.00236919417999,16.134812986440643,8.759052551750663,2.1611275634408718,30.30743682243691,22.573879361083183,5.9354275780436705,22.304832497335703,1.2700994420156504,2.1527147426857067,3.1405449886489047,2.7200394282746365,4.812694785276978,7.3089541963331675,4.318223682539508,8.047168776251766,3.1221446203671164,5.278551769075766,0.03211042594205823,0.06423450317302083,0.49334840182534967,1.6084979574620704,8.481447276838344,34.90775600526837,0.3756866400997473,0.4123003808101904,172512.507823174,46113.3699393363,20243.507596068586,41490.84884582259,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,,,,,0.0,77.0094025673706,14500662214.885572,14471630662.23209,14520502785.954859,0.0,9.499873480938898,,240.1814929663525,4637.06552271656,147.17888898625247
zeros,0,0,0,0,0,,0,0,,,,,,0,,0,,,0,0,,,,,60,149487,34,100647,222,3,2,155114,448,480,0,,185745,37,0,,0,0,0,187438,159941,34,0,0,0,0,0,0,0,0,0,0,0,66,0,0,0,7733,31,3801,1595,186638,187742,2,0,2977,3084,72408,1245,97,15431,157,143588,3405,453,1839,330,5758,53,0,448,2,182006,187787,176622,32108,0,32367,164655,181918,3,68,2036,25148,0,0,0,0,0,0,0,0,0,0,0,,,,,0,0,0,0,0,0,68,,0,0,0
missing,188434,188434,0,0,0,0,0,0,0,0,10581,9589,0,0,0,0,0,0,188434,176217,0,0,0,0,0,0,0,0,92769,155114,0,0,0,101,0,0,0,135238,0,0,188434,188434,188434,0,0,0,188434,188434,188434,188434,188434,188434,188434,188434,188434,188434,188434,0,188434,188434,188434,0,3,1943,2074,0,0,5748,0,0,0,0,1788,138691,17436,120718,0,0,0,0,0,0,0,0,0,0,6261,0,0,0,0,2036,0,0,0,0,0,0,188434,188434,188434,188434,188434,188434,188434,188434,188434,188434,188434,0,188082,188082,188082,188082,188082,188082,188082,188082,188082,188082,188082,188166,188082,188082
0,,,18700.0,18700.0,18700.0,60 months,0.1629,457.64,D,D2,Assistant Manager,4 years,MORTGAGE,52000.0,Not Verified,2014-05-01 00:00:00,Fully Paid,n,,,credit_card,Credit card refinancing,630xx,MO,11.65,0.0,1999-08-01 00:00:00,5.0,59.0,,20.0,0.0,16920.0,0.502,37.0,w,0.0,59.0,1.0,Individual,,,,0.0,0.0,117999.0,,,,,,,,,,,,33700.0,,,,6.0,6210.0,4113.0,80.4,0.0,0.0,177.0,123.0,1.0,1.0,1.0,8.0,59.0,1.0,59.0,1.0,5.0,6.0,9.0,10.0,17.0,16.0,19.0,6.0,20.0,0.0,0.0,0.0,2.0,91.7,50.0,0.0,0.0,141329.0,51831.0,21000.0,39404.0,,,,,,,,,,,,N,,,,,,,,,,,,,,
1,,,20000.0,20000.0,20000.0,36 months,0.0917,637.58,B,B1,Engineering,4 years,RENT,93000.0,Source Verified,2014-08-01 00:00:00,Fully Paid,n,,,debt_consolidation,Debt consolidation,334xx,FL,19.15,0.0,1996-09-01 00:00:00,0.0,43.0,,9.0,0.0,10597.0,0.609,24.0,f,1.0,43.0,1.0,Individual,,,,0.0,2305.0,61086.0,,,,,,,,,,,,17400.0,,,,2.0,6787.0,2303.0,82.1,0.0,0.0,215.0,178.0,33.0,5.0,0.0,33.0,,9.0,,2.0,3.0,3.0,4.0,9.0,11.0,5.0,12.0,3.0,9.0,0.0,0.0,0.0,1.0,91.7,50.0,0.0,0.0,79108.0,61086.0,12900.0,61708.0,,,,,,,,,,,,N,,,,,,,,,,,,,,
2,,,11000.0,11000.0,11000.0,36 months,0.1099,360.08,B,B2,Teacher director,10+ years,RENT,30000.0,Not Verified,2014-02-01 00:00:00,Fully Paid,n,,,debt_consolidation,Debt consolidation,010xx,MA,27.84,0.0,1994-11-01 00:00:00,1.0,,,10.0,0.0,11523.0,0.546,21.0,w,0.0,,1.0,Individual,,,,0.0,90.0,17778.0,,,,,,,,,,,,21100.0,,,,2.0,1778.0,2200.0,71.8,0.0,0.0,133.0,231.0,6.0,6.0,0.0,63.0,,6.0,,0.0,4.0,9.0,4.0,9.0,5.0,9.0,16.0,9.0,10.0,0.0,0.0,0.0,1.0,100.0,75.0,0.0,0.0,35076.0,17778.0,7800.0,13976.0,,,,,,,,,,,,N,,,,,,,,,,,,,,


## Model Training

In [13]:
from h2o.automl import H2OAutoML

# Identify predictors and response
x = train.columns
y = "loan_status"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Run AutoML; it stops as soon as either max_models or max_runtime_secs is reached
aml = H2OAutoML(project_name='LP', 
                max_models=1,           # Train a single base model (demo setting)
                balance_classes=True,   # Oversample the minority class to correct class imbalance
                max_runtime_secs=3600,  # Time budget of 1 hour (demo setting); too small a budget may leave no time to train a model
                seed=1234)              # Set a seed for reproducibility
aml.train(x=x, y=y, training_frame=train)

AutoML progress: |████████████████████████████████████████████████████████| 100%


### View the AutoML Leaderboard

In [14]:
lb = aml.leaderboard
lb.head(rows=lb.nrows)  # Print all rows instead of default (10 rows)

model_id,auc,logloss,mean_per_class_error,rmse,mse
XGBoost_1_AutoML_20200922_122203,0.705681,0.4271,0.49588,0.366057,0.133997




In [15]:
test_pc = aml.predict(test)
test_pc

xgboost prediction progress: |████████████████████████████████████████████| 100%


predict,Charged Off,Fully Paid
Fully Paid,0.0300723,0.969928
Fully Paid,0.038023,0.961977
Fully Paid,0.0306047,0.969395
Fully Paid,0.242732,0.757268
Fully Paid,0.0589932,0.941007
Fully Paid,0.133116,0.866884
Fully Paid,0.34577,0.65423
Charged Off,0.432134,0.567866
Fully Paid,0.379319,0.620681
Fully Paid,0.0814494,0.918551
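
Note that the eighth row is labeled `Charged Off` even though its `Fully Paid` probability is above 0.5. H2O binary classifiers generally choose the label with a model-specific probability threshold (by default the one that maximizes F1) rather than a fixed 0.5. The sketch below only illustrates the idea: the threshold of 0.6 is an assumed value picked to be consistent with the ten rows shown, not the threshold this model actually used.

```python
# (predicted label, P(Charged Off), P(Fully Paid)) copied from the ten rows above
rows = [
    ("Fully Paid", 0.0300723, 0.969928),
    ("Fully Paid", 0.038023, 0.961977),
    ("Fully Paid", 0.0306047, 0.969395),
    ("Fully Paid", 0.242732, 0.757268),
    ("Fully Paid", 0.0589932, 0.941007),
    ("Fully Paid", 0.133116, 0.866884),
    ("Fully Paid", 0.34577, 0.65423),
    ("Charged Off", 0.432134, 0.567866),
    ("Fully Paid", 0.379319, 0.620681),
    ("Fully Paid", 0.0814494, 0.918551),
]

ASSUMED_THRESHOLD = 0.6  # assumption for illustration only


def label_from_probability(p_fully_paid, threshold=ASSUMED_THRESHOLD):
    """Label a row 'Fully Paid' only when its probability reaches the threshold."""
    return "Fully Paid" if p_fully_paid >= threshold else "Charged Off"


relabeled = [label_from_probability(p_paid) for _, _, p_paid in rows]
all_match = all(new == old for new, (old, _, _) in zip(relabeled, rows))
print("labels reproduced with assumed threshold:", all_match)
```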




## Define BentoService for model serving

In [16]:
%%writefile loan_prediction.py

import h2o

from bentoml import api, env, artifacts, BentoService
from bentoml.frameworks.h2o import H2oModelArtifact
from bentoml.adapters import DataframeInput

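# Dependencies recorded for the saved bundle; h2o is pinned to the version used for training
@env(
    pip_packages=['h2o==3.24.0.2', 'pandas'],
    conda_channels=['h2oai'],
    conda_dependencies=['h2o==3.24.0.2']
)
@artifacts([H2oModelArtifact('model')])
class LoanPrediction(BentoService):
    """Prediction service that scores loan records with the packed H2O model."""

    @api(input=DataframeInput(), batch=True)
    def predict(self, df):
        # Convert the incoming pandas DataFrame to an H2OFrame, treating 'NaN' strings as missing
        h2o_frame = h2o.H2OFrame(df, na_strings=['NaN'])
        predictions = self.artifacts.model.predict(h2o_frame)
        # Return a pandas DataFrame so the API response can be serialized
        return predictions.as_data_frame()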

Overwriting loan_prediction.py


## Save BentoService to file archive

In [17]:
# 1) import the custom BentoService defined above
from loan_prediction import LoanPrediction

# 2) `pack` it with required artifacts
bentoml_svc = LoanPrediction()
bentoml_svc.pack('model', aml.leader)

# 3) save your BentoService
saved_path = bentoml_svc.save()

[2020-09-22 12:39:14,867] INFO - Using default docker base image: `None` specified inBentoML config file or env var. User must make sure that the docker base image either has Python 3.7 or conda installed.
[2020-09-22 12:39:15,844] INFO - Detected non-PyPI-released BentoML installed, copying local BentoML modulefiles to target saved bundle path..


no previously-included directories found matching 'e2e_tests'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'benchmark'


UPDATING BentoML-0.9.0rc0+3.gcebf2015/bentoml/_version.py
set BentoML-0.9.0rc0+3.gcebf2015/bentoml/_version.py to '0.9.0.pre+3.gcebf2015'
[2020-09-22 12:39:19,606] INFO - BentoService bundle 'LoanPrediction:20200922123915_EEBBD2' saved to: /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2


## REST API Model Serving


To start a REST API model server with the BentoService saved above, use the `bentoml serve` command:

In [25]:
!bentoml serve LoanPrediction:latest

[2020-09-22 17:48:06,148] INFO - Getting latest version LoanPrediction:20200922123915_EEBBD2
[2020-09-22 17:48:06,148] INFO - Starting BentoML API server in development mode..
[2020-09-22 17:48:06,777] INFO - Using default docker base image: `None` specified inBentoML config file or env var. User must make sure that the docker base image either has Python 3.7 or conda installed.
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
--------------------------  ---------------------------------------------------
H2O cluster uptime:         5 hours 27 mins
H2O cluster timezone:       America/Los_Angeles
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.2
H2O cluster version age:    1 year, 5 months and 5 days !!!
H2O cluster name:           H2O_from_python_bozhaoyu_392ekt
H2O cluster total nodes:    1
H2O cluster free memory:    3.906 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healt

If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to gain access to the API endpoint via a public endpoint managed by [ngrok](https://ngrok.com/):

In [None]:
!bentoml serve LoanPrediction:latest --run-with-ngrok

Open http://127.0.0.1:5000 in your browser to see more information about the REST API server. You can also send a prediction request from the command line with `curl`:

```bash
curl -i \
    --request POST \
    --header "Content-Type: text/csv" \
    --data @sample_data.csv \
    localhost:5000/predict
```
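
The same request can also be sent from Python. The sketch below is a minimal example using only the standard library. The helper names `build_predict_request` and `send_predict_request` are hypothetical, and the code assumes the dev server started above is listening on `localhost:5000` and that `sample_data.csv` exists in the working directory.

```python
import urllib.request


def build_predict_request(csv_text, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request mirroring the curl call above."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def send_predict_request(csv_path, base_url="http://localhost:5000"):
    """Read a CSV file, POST it to /predict, and return the response body as text."""
    with open(csv_path, encoding="utf-8") as f:
        request = build_predict_request(f.read(), base_url)
    # Assumption: the server is reachable; urlopen raises URLError otherwise.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send_predict_request("sample_data.csv"))` should print the predictions returned by the service.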

## Containerize model server with Docker


A common way to distribute this model API server for production deployment is as a Docker container, and BentoML provides a convenient way to build one.

Note that Docker is **not available in Google Colab**. To try out containerization, download this notebook and run it locally.

If you already have Docker configured, run the following command to build a Docker image serving the LoanPrediction service created above:

In [27]:
!bentoml containerize LoanPrediction:latest

[2020-09-22 17:52:45,149] INFO - Getting latest version LoanPrediction:20200922123915_EEBBD2
Found Bento: /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2
Tag not specified, using tag parsed from BentoService: 'loanprediction:20200922123915_EEBBD2'
Building Docker image loanprediction:20200922123915_EEBBD2 from LoanPrediction:latest 
-we in here
processed docker file
(None, None)
root in create archive /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2 ['Dockerfile', 'LoanPrediction', 'LoanPrediction/__init__.py', 'LoanPrediction/__pycache__', 'LoanPrediction/__pycache__/loan_prediction.cpython-37.pyc', 'LoanPrediction/artifacts', 'LoanPrediction/artifacts/__init__.py', 'LoanPrediction/artifacts/model', 'LoanPrediction/bentoml.yml', 'LoanPrediction/loan_prediction.py', 'MANIFEST.in', 'README.md', 'bentoml-init.sh', 'bentoml.yml', 'bundled_pip_dependencies', 'bundled_pip_dependencies/BentoML-0.9.0rc0+3.gcebf2015.tar.gz', 'docker-entrypoint.sh'

In [None]:
!docker run -p 5000:5000 loanprediction:20200922123915_EEBBD2
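
The `containerize` log above shows the image tagged as `loanprediction:20200922123915_EEBBD2`: the lower-cased service name followed by the Bento version. The sketch below reproduces that mapping so the full tag can be passed to `docker run`. `docker_image_tag` is a hypothetical helper that only mirrors the log output. It is not BentoML's own implementation.

```python
def docker_image_tag(bento_name, bento_version):
    """Return '<lower-cased name>:<version>', matching the tag in the log above.

    Assumption: this only mirrors the observed log line; BentoML's actual
    tag-derivation logic may handle more cases.
    """
    # Docker repository names must be lowercase; the tag part may keep uppercase.
    return f"{bento_name.lower()}:{bento_version}"


tag = docker_image_tag("LoanPrediction", "20200922123915_EEBBD2")
print(f"docker run -p 5000:5000 {tag}")
```

Passing the full tag matters because an image name without a tag resolves to `:latest`, which the build above did not create.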

## Load saved BentoService

`bentoml.load` is the API for loading a BentoML packaged model in Python:

In [125]:
import pandas as pd

loaded_bentoml_svc = bentoml.load(saved_path)
sample_data = pd.read_csv('sample_data.csv')
result = loaded_bentoml_svc.predict(sample_data)
print(result)

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


H2O cluster uptime:         2 hours 30 mins
H2O cluster timezone:       America/Los_Angeles
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.2
H2O cluster version age:    10 months and 7 days !!!
H2O cluster name:           H2O_from_python_bozhaoyu_7bamxr
H2O cluster total nodes:    1
H2O cluster free memory:    3.805 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8


Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost prediction progress: |████████████████████████████████████████████| 100%
       predict  Charged Off  Fully Paid
0  Charged Off     0.436739    0.563261
1   Fully Paid     0.056414    0.943586
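
In the output above, the first row is labelled `Charged Off` even though its `Charged Off` probability (0.436739) is below 0.5, so the label is not a plain 0.5 cutoff. To the best of our understanding, H2O binomial models choose the label with a threshold derived from model metrics, commonly the one maximizing F1. If a different cutoff suits your use case, the probabilities can be relabelled after prediction. A minimal sketch in plain Python, using the two rows printed above; the `relabel` helper and the 0.3 cutoff are illustrative assumptions:

```python
# Probabilities copied from the prediction output above.
rows = [
    {"Charged Off": 0.436739, "Fully Paid": 0.563261},
    {"Charged Off": 0.056414, "Fully Paid": 0.943586},
]


def relabel(rows, charged_off_cutoff):
    """Label a loan 'Charged Off' when its default probability reaches the cutoff."""
    return [
        "Charged Off" if row["Charged Off"] >= charged_off_cutoff else "Fully Paid"
        for row in rows
    ]


# Hypothetical cutoff; choose one that reflects the cost of a missed default.
print(relabel(rows, charged_off_cutoff=0.3))
```

With a cutoff of 0.3 the first loan stays flagged as `Charged Off`. With 0.5 both loans would be labelled `Fully Paid`.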


## Launch inference job from CLI

The BentoML CLI can load and run a packaged model directly. With the DataframeInput adapter, the command reads input DataFrame data from a CLI argument or from a local CSV or JSON file:

In [127]:
!bentoml run LoanPrediction:latest predict --input-file sample_data.csv

[2020-02-24 17:30:05,013] INFO - Getting latest version LoanPrediction:20200224153935_977ED8
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
--------------------------  ---------------------------------------------------
H2O cluster uptime:         2 hours 38 mins
H2O cluster timezone:       America/Los_Angeles
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.2
H2O cluster version age:    10 months and 7 days !!!
H2O cluster name:           H2O_from_python_bozhaoyu_7bamxr
H2O cluster total nodes:    1
H2O cluster free memory:    3.805 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.7.3 final
--------------------------  --------------------------------------
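
For scripted batch jobs, the same CLI call can be driven from Python. The sketch below only assembles the argument list shown in the cell above and hands it to `subprocess.run`. `build_run_command` and `run_inference` are hypothetical helper names, and the code assumes the `bentoml` executable is on your `PATH`.

```python
import subprocess


def build_run_command(bento_tag, api_name, input_file):
    """Assemble the `bentoml run` argument list used in the cell above."""
    return ["bentoml", "run", bento_tag, api_name, "--input-file", input_file]


def run_inference(bento_tag, api_name, input_file):
    """Run the CLI command and return its standard output; raises on failure."""
    completed = subprocess.run(
        build_run_command(bento_tag, api_name, input_file),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )
    return completed.stdout


# Show the command without executing it.
print(" ".join(build_run_command("LoanPrediction:latest", "predict", "sample_data.csv")))
```

Calling `run_inference("LoanPrediction:latest", "predict", "sample_data.csv")` would then execute the job and return whatever the CLI prints.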

# Deployment Options

If you are on a small team with limited engineering or DevOps resources, try automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
- [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
- [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
- [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If your cloud platform is not on the list above, try these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
- [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
- [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
- [Azure Container Instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
- [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team operating a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:
- [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
- [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
- [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
- [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
- [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)

