# Week 7 Homework

The goal of this homework is to familiarize yourself with BentoML and learn how to build and test an ML production service.

## Background

You are a new recruit at ACME corp. Your manager is emailing you about your first assignment.

## Email from your manager

Good morning, recruit! It's good to have you here! I have an assignment for you. One of our data scientists has built a credit risk model in a Jupyter notebook. I need you to run the notebook, save the model with BentoML, and see how big the model is. If it's larger than a certain size, I'm going to have to request additional resources from our infra team. Please let me know how big it is.

Thanks.

Mr McManager

## Question 1

- Install BentoML
- What version of BentoML did you install?

```
!pip3 show bentoml
```

```
Name: bentoml
Version: 1.0.7
Summary: BentoML: The Unified Model Serving Framework
Home-page: 
Author: 
Author-email: BentoML Team <contact@bentoml.com>
License: Apache-2.0
Location: /Users/Frank/opt/anaconda3/lib/python3.9/site-packages
Requires: opentelemetry-semantic-conventions, prometheus-client, PyYAML, watchfiles, psutil, python-dateutil, packaging, Jinja2, simple-di, aiohttp, opentelemetry-instrumentation-asgi, cattrs, attrs, opentelemetry-api, opentelemetry-instrumentation, rich, cloudpickle, deepmerge, pip-tools, click, opentelemetry-util-http, opentelemetry-sdk, uvicorn, python-dotenv, schema, opentelemetry-instrumentation-aiohttp-client, circus, python-multipart, pynvml, pathspec, numpy, starlette, fs, requests
Required-by: 
```
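The same check can be done from Python without shelling out to pip. A minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the BentoML version string, or None if BentoML is not installed
# in the current interpreter.
print("bentoml", installed_version("bentoml"))
```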


## Question 2

Run the notebook containing the XGBoost model from module 6 (the previous module) and save the model with BentoML.

```python
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

import xgboost as xgb
```

```python
df = pd.read_csv('CreditScoring.csv')
```

```python
df.columns = df.columns.str.lower()

status_values = {
    1: 'ok',
    2: 'default',
    0: 'unk'
}

df.status = df.status.map(status_values)

home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

df.home = df.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

df.marital = df.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

df.records = df.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

df.job = df.job.map(job_values)

for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)

df = df[df.status != 'unk'].reset_index(drop=True)
```
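To see what the cleaning cell does, here is the same sequence of steps on a tiny made-up frame (the values are illustrative, not rows from `CreditScoring.csv`): codes are mapped to labels, the `99999999` sentinel becomes NaN, and rows whose status is `'unk'` are dropped before the index is renumbered.

```python
import numpy as np
import pandas as pd

# Toy data with capitalized column names, mimicking the raw CSV (values made up).
toy = pd.DataFrame({
    'Status': [1, 2, 0, 1],
    'Income': [100, 99999999, 80, 120],
})

toy.columns = toy.columns.str.lower()

# Integer codes -> readable labels.
toy.status = toy.status.map({1: 'ok', 2: 'default', 0: 'unk'})

# 99999999 marks a missing value, so turn it into NaN.
toy['income'] = toy['income'].replace(to_replace=99999999, value=np.nan)

# Drop rows with unknown status and renumber the index from 0.
toy = toy[toy.status != 'unk'].reset_index(drop=True)

print(toy)
```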

```python
df_train, df_test = train_test_split(df, test_size=0.2, random_state=11)

df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = (df_train.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values

del df_train['status']
del df_test['status']
```

```python
dv = DictVectorizer(sparse=False)

train_dicts = df_train.fillna(0).to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

test_dicts = df_test.fillna(0).to_dict(orient='records')
X_test = dv.transform(test_dicts)
```

### XGBoost

```python
dtrain = xgb.DMatrix(X_train, label=y_train)
```

```python
xgb_params = {
    'eta': 0.1,
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=175)
```

### BentoML

```python
import bentoml
```

```python
bentoml.xgboost.save_model(
    'credit_risk_model_homework',
    model,
    custom_objects={
        'dictVectorizer': dv
    },
)
```

```
Model(tag="credit_risk_model_homework:sowdigcrp2hncjv5", path="/Users/Frank/bentoml/models/credit_risk_model_homework/sowdigcrp2hncjv5/")
```

Approximately how big is the saved BentoML model? The size can vary slightly depending on your local development environment. Choose the size closest to your model's.

The size of the model is approximately 197 KB.
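One way to check the size is `bentoml models list`, which reports a size for each saved model. Alternatively, here is a minimal Python sketch that sums the file sizes under a model's directory; the helper names `dir_size_bytes` and `format_kib` are hypothetical.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def format_kib(num_bytes: int) -> str:
    """Format a byte count in KiB (1 KiB = 1024 bytes) with two decimals."""
    return f"{num_bytes / 1024:.2f} KiB"
```

Assuming the model saved above, it could be called as `format_kib(dir_size_bytes(bentoml.models.get("credit_risk_model_homework:latest").path))`.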

### Another email from your manager

Great job, recruit! Looks like I won't have to go back to the procurement team. Thanks for the information.

However, I just got word from one of the teams using one of our ML services that our service is "broken", and they're trying to blame our model. I looked at the data they're sending and it's completely bogus. I don't want them to send us bad data and then blame our models. Could you write a pydantic schema for the data they should be sending? That way, next time it will tell them it's their data that's bad and not our model.

Thanks

Mr McManager



## Question 3

Say you have the following data that you're sending to your service:

```json
{
  "name": "Tim",
  "age": 37,
  "country": "US",
  "rating": 3.14
}
```



What would the pydantic class look like? You can name the class UserProfile.

```python
from pydantic import BaseModel
```
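A minimal sketch of the `UserProfile` schema, with one field per key in the JSON above and types inferred from the sample values. Pydantic raises a `ValidationError` when a value cannot be coerced to its declared type, which is the behavior the manager is after:

```python
from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float


# Valid payload from the example above.
profile = UserProfile(name="Tim", age=37, country="US", rating=3.14)
print(profile)

# A bogus payload: "age" cannot be parsed as an integer.
try:
    UserProfile(name="Tim", age="thirty-seven", country="US", rating=3.14)
except ValidationError as err:
    print(err)
```

In BentoML 1.0, a class like this can be attached to an endpoint through `bentoml.io.JSON(pydantic_model=UserProfile)`, so incoming requests are validated against the schema.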

### Email from your CEO

Good morning! I hear you're the one to go to if I need something done well! We've got a new model that a big client needs deployed ASAP. I need you to build a service with it and test it against the old model and make sure that it performs better, otherwise we're going to lose this client. All our hopes are with you!

Thanks,

CEO of Acme Corp

## Question 4

We've prepared a model for you that you can import with `bentoml models import`.

What version of scikit-learn was this model trained with?

In a terminal, run:

```bash
bentoml models get mlzoomcamp_homework:qtzdz3slg6mwwdu5
```

The output lists `scikit-learn: 1.1.1`, so the model was trained with scikit-learn 1.1.1.

## Question 5

Create a bento out of this scikit-learn model. The output type for this endpoint should be `NumpyNdarray()`.

Send this array to the Bento:

```
[[6.4, 3.5, 4.5, 1.2]]
```

The bento returns `1`.
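A standard-library sketch of sending the array to the running bento, assuming the default `bentoml serve` address `http://localhost:3000` and an endpoint named `classify` (adjust both to your service). A `NumpyNdarray()` input accepts a JSON array in the request body. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed address of a locally running bento; change the endpoint name if needed.
DEFAULT_URL = "http://localhost:3000/classify"


def build_classify_request(vector, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body is the nested list `vector`."""
    body = json.dumps(vector).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(vector, url: str = DEFAULT_URL):
    """Send `vector` to the service and return the decoded JSON response."""
    with urllib.request.urlopen(build_classify_request(vector, url)) as resp:
        return json.loads(resp.read())


# Needs the bento to be running: classify([[6.4, 3.5, 4.5, 1.2]])
```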

## Question 6

Make sure to serve your bento with `--production` for this question.

Use the following Locust file: [locustfile.py](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/07-bento-production/locustfile.py)

Make sure it is pointed at your bento's endpoint (in case you didn't name your endpoint `classify`).

Configure 100 users with a spawn rate of 10 users per second. Click "Start Swarming" and make sure it is working.

Now download a second model with this command:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel
```

Now import the model:

```bash
bentoml models import coolmodel2.bentomodel
```

Update your bento's runner tag and test with both models. Which model allows more traffic (more throughput) as you ramp up the traffic?

Hint 1: Remember to stop and restart your bento service after changing the model tag. Use Ctrl-C to stop the service between trials.

Hint 2: Increase the number of concurrent users to see which one has higher throughput

Which model has better performance at higher volumes?



The second model (`coolmodel2`) has better performance at higher volumes.
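Throughput here means completed requests per second, which Locust reports as RPS. As a rough illustration only (this is not Locust and does not replace it), the sketch below fires a fixed number of calls at a function from a thread pool and divides by the elapsed time. `measure_throughput` and the dummy `fake_request` are hypothetical; against a real service, `fake_request` would be replaced by an HTTP call to the endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(call, n_requests: int, concurrency: int) -> float:
    """Run `call()` n_requests times on `concurrency` threads; return calls/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every call to finish and re-raises any exception.
        list(pool.map(lambda _: call(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed


def fake_request() -> None:
    """Hypothetical stand-in for one request that takes about 10 ms."""
    time.sleep(0.01)


for users in (1, 10, 50):
    rps = measure_throughput(fake_request, n_requests=100, concurrency=users)
    print(f"{users:>3} concurrent callers: {rps:,.0f} calls/s")
```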

## Email from marketing

Hello ML person! I hope this email finds you well. I've heard there's this cool new ML model called Stable Diffusion. I hear if you give it a description of a picture it will generate an image. We need a new company logo and I want it to be fierce but also cool, think you could help out?

Thanks,

Mike Marketer

## Question 7

Go to this Bento deployment of [Stable Diffusion](http://54.176.205.174/):

Use the `txt2image` endpoint and update the prompt to "A cartoon dragon with sunglasses". Don't change the seed; it should be 0 by default.

What is the resulting image?

Number 3