# Homework for the DataTalks.Club Machine Learning Zoomcamp
## Week 5: Deployment

### Question 1

    Install Pipenv
    What's the version of pipenv you installed?
    Use --version to find out


In [1]:
!pipenv --version

pipenv, version 2022.10.12

### Question 2

    Use Pipenv to install Scikit-Learn version 1.0.2
    What's the first hash for scikit-learn you get in Pipfile.lock?

Note: you should create an empty folder for homework and do it there.

### Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

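from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df and y are the course's training data and target (not provided here)
features = ['reports', 'share', 'expenditure', 'owner']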
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)

    Note: You don't need to train the model. This code is just for your reference.

And then saved with Pickle. Download them:

    DictVectorizer
    LogisticRegression


With wget:

PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin


In [2]:
PREFIX='https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework'
!wget $PREFIX/model1.bin
!wget $PREFIX/dv.bin

--2022-10-28 20:55:00--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 889 [application/octet-stream]
Saving to: ‘model1.bin’


2022-10-28 20:55:00 (28.3 MB/s) - ‘model1.bin’ saved [889/889]

--2022-10-28 20:55:01--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 333

### Question 3

Let's use these models!

    Write a script for loading these models with pickle
    Score this client:

{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

What's the probability that this client will get a credit card?

    0.162
    0.391
    0.601
    0.993

If you're getting errors when unpickling the files, check their checksum:

$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
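
On systems without `md5sum`, the same check can be done from Python. This is a minimal sketch using only the standard library; the helper names `md5_of_file` and `verify` are made up for illustration, and the expected checksums are the ones listed above.

```python
import hashlib

# Checksums quoted in the homework text above.
EXPECTED = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected=EXPECTED):
    """Map each file name to True if its MD5 matches the expected value."""
    return {name: md5_of_file(name) == checksum
            for name, checksum in expected.items()}

# Usage (assumes both files are in the working directory):
#   print(verify())
```

A `False` entry would suggest a corrupted or incomplete download, in which case re-running `wget` is the obvious next step.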

In [4]:
import pickle

dv_file = 'dv.bin'
model_file = 'model1.bin'

with open(dv_file, 'rb') as f_in:
    dv = pickle.load(f_in)

with open(model_file, 'rb') as f_in:
    model = pickle.load(f_in)

dv, model

(DictVectorizer(sparse=False), LogisticRegression(solver='liblinear'))

In [5]:
test_data = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

In [8]:
# Apply the DictVectorizer (fitted before model training) to the test data.
# No explicit import is needed: pickle.load imports the class automatically,
# as long as scikit-learn is installed.
transformed_test_data = dv.transform([test_data])
transformed_test_data

array([[0.12    , 0.      , 1.      , 0.      , 0.001694]])

In [10]:
# Predict Probability
model.predict_proba(transformed_test_data)

array([[0.83786586, 0.16213414]])

The second column is the probability of the positive class, so the probability that this customer gets a credit card is 0.16213414, which matches the option 0.162.

### Question 4

Now let's serve this model as a web service

    Install Flask and gunicorn (or waitress, if you're on Windows)
    Write Flask code for serving the model
    Now score this client using requests:

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

What's the probability that this client will get a credit card?

    0.274
    0.484
    0.698
    0.928


In [17]:
import requests

url = 'http://localhost:9696/predict'

client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}

response = requests.post(url, json=client).json()
response

{'credit_card': True, 'probability_of_getting_credit_card': 0.9282218018527452}
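
The Flask code behind this response is not shown in the notebook. Below is a minimal sketch of what such a service might look like. The `create_app` factory, the `load_pickle` helper, and the 0.5 decision threshold are assumptions for illustration; the `/predict` route, port 9696, and the response keys are taken from the request and output above.

```python
import pickle

from flask import Flask, jsonify, request


def load_pickle(path):
    """Load a pickled object (e.g. dv.bin or model1.bin) from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def create_app(dv, model):
    """Build a Flask app that scores one client per POST request."""
    app = Flask("credit_card")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        X = dv.transform([client])
        # Column 1 of predict_proba is the positive-class probability.
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({
            # Assumed threshold of 0.5; not stated in the homework.
            "credit_card": bool(probability >= 0.5),
            "probability_of_getting_credit_card": probability,
        })

    return app

# Usage (assumes dv.bin and model1.bin are in the working directory):
#   app = create_app(load_pickle("dv.bin"), load_pickle("model1.bin"))
#   app.run(host="0.0.0.0", port=9696)
```

Passing the vectorizer and model into a factory keeps file loading out of import time, which makes the endpoint easy to exercise with Flask's test client. To serve it with gunicorn, a module-level `app` object would be needed instead.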

### Docker

Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: svizor/zoomcamp-model:3.9.12-slim. You'll need to use it (see Question 5 for an example).

This image is based on python:3.9.12-slim and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]

We already built it and then pushed it to svizor/zoomcamp-model:3.9.12-slim.

Note: You don't need to build this docker image, it's just for your reference.

### Question 5

Download the base image svizor/zoomcamp-model:3.9.12-slim. You can easily get it with the docker pull command.

So what's the size of this base image?

    15 Mb
    125 Mb
    275 Mb
    415 Mb

You can get this information when running docker images - it'll be in the "SIZE" column.

I ran

    docker run -it --rm --entrypoint=bash svizor/zoomcamp-model:3.9.12-slim

in the terminal, which pulls the image if it is not already present locally.

The following command prints the image size in bytes:

    docker image inspect svizor/zoomcamp-model:3.9.12-slim --format='{{.Size}}'

This gives 124694386 bytes, which is about 124.7 MB (or about 118.9 MiB), so the closest option is 125 Mb.
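
The gap between roughly 119 and 125 comes down to units: dividing by 1024² gives mebibytes, dividing by 1000² gives decimal megabytes. As far as I know, the "SIZE" column of `docker images` uses decimal units, which is why 125 is the matching option. A quick check of the arithmetic:

```python
size_bytes = 124694386  # value reported by `docker image inspect` above

size_mb = size_bytes / 1000**2   # decimal megabytes
size_mib = size_bytes / 1024**2  # binary mebibytes

print(f"{size_mb:.1f} MB")    # prints "124.7 MB"
print(f"{size_mib:.1f} MiB")  # prints "118.9 MiB"
```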
  

### Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

FROM svizor/zoomcamp-model:3.9.12-slim
#### add your stuff here

Now complete it:

    Install all the dependencies from the Pipenv file
    Copy your Flask script
    Run it with Gunicorn

After that, you can build your docker image.

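---> I wrote the Dockerfile and ran:
1. docker build -t svizor/zoomcamp-model:3.9.12-slim .
2. docker run -it --rm --entrypoint=bash svizor/zoomcamp-model:3.9.12-slim  # later without overriding the entrypoint

Note that step 1 reuses the base image's tag for the new image, so the local tag now points to my build rather than the downloaded base image.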

### Question 6

Let's run your docker container!

After running it, score this client once again:

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

What's the probability that this client will get a credit card now?

    0.289
    0.502
    0.769
    0.972


In [20]:
url = 'http://localhost:9696/predict'

client2 = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}

response = requests.post(url, json=client2).json()
response

{'credit_card': True, 'probability_of_getting_credit_card': 0.9282218018527452}

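---> Hm, same probability as in Question 4, and 0.928 is not among the options listed here. The base image ships a different model (model2.bin), so this request was probably still answered by the Question 4 setup: either the service loaded model1.bin, or, since the docker run command above publishes no port, the request never reached the container.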