In [1]:
PREFIX="https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework"

!wget -P data {PREFIX}/model1.bin
!wget -P data {PREFIX}/dv.bin

--2024-10-30 20:17:19--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850 [application/octet-stream]
Saving to: ‘data/model1.bin’


2024-10-30 20:17:19 (174 MB/s) - ‘data/model1.bin’ saved [850/850]

--2024-10-30 20:17:19--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 560 [app

# Loading the DictVectorizer and Logistic Regression model with Pickle

In [1]:
import pickle
from os.path import join

FOLDER = 'data'

with open(join(FOLDER, 'model1.bin'), 'rb') as f:
    model = pickle.load(f)

with open(join(FOLDER, 'dv.bin'), 'rb') as f:
    dv = pickle.load(f)

display(model)
display(dv)

### Q3. Probability of subscription

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "management", "duration": 400, "poutcome": "success"}
```

What's the probability that this client will get a subscription? 

In [2]:
client = {"job": "management", "duration": 400, "poutcome": "success"}

client_encoded = dv.transform(client)
client_encoded

array([[400.,   0.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   1.,   0.]])
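
To see why the encoded row has exactly three non-zero entries, here is a minimal pure-Python sketch of what `DictVectorizer` does with a flat record: numeric values keep their own column, and each string value switches on a one-hot column named `key=value`. The feature list below is an assumption, inferred from the 17-column output above and the usual categories of the Bank Marketing dataset; the real order is whatever `dv.get_feature_names_out()` reports. The `encode` helper is hypothetical and not part of scikit-learn.

```python
# Assumed feature order (hypothetical): check dv.get_feature_names_out() for the real one.
JOBS = ["admin.", "blue-collar", "entrepreneur", "housemaid", "management",
        "retired", "self-employed", "services", "student", "technician",
        "unemployed", "unknown"]
POUTCOMES = ["failure", "other", "success", "unknown"]
FEATURE_NAMES = (["duration"]
                 + [f"job={j}" for j in JOBS]
                 + [f"poutcome={p}" for p in POUTCOMES])


def encode(record, feature_names):
    """Encode one flat dict roughly the way DictVectorizer does."""
    index = {name: i for i, name in enumerate(feature_names)}
    row = [0.0] * len(feature_names)
    for key, value in record.items():
        if isinstance(value, str):
            # Categorical value: one-hot column named "<key>=<value>".
            column = f"{key}={value}"
            if column in index:  # categories unseen during fitting are ignored
                row[index[column]] = 1.0
        elif key in index:
            # Numeric value: copied into its own column.
            row[index[key]] = float(value)
    return row


client = {"job": "management", "duration": 400, "poutcome": "success"}
encoded = encode(client, FEATURE_NAMES)
print([i for i, v in enumerate(encoded) if v != 0.0])  # indices of non-zero columns
```

Under the assumed feature order, the non-zero positions are 0 (`duration`), 5 (`job=management`) and 15 (`poutcome=success`), which lines up with the array printed by `dv.transform`.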

In [3]:
y_pred = model.predict_proba(client_encoded)[:, 1]
y_pred

array([0.75909665])
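
For a binary logistic regression, `predict_proba(...)[:, 1]` is the sigmoid of a linear score, $\sigma(b + w \cdot x)$. The sketch below spells that out in plain Python. The weights and bias are made-up numbers for illustration, not the ones stored in `model1.bin`; with the real model they would come from `model.coef_[0]` and `model.intercept_[0]`.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def positive_probability(x, weights, bias):
    """P(y = 1 | x) for a binary logistic regression."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


# Hypothetical three-feature example; these numbers are NOT taken from model1.bin.
x = [400.0, 1.0, 1.0]
weights = [0.004, -0.2, 1.5]
bias = -2.0
print(positive_probability(x, weights, bias))
```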

### Q4. Probability of subscription (Flask)

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?
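
The server side is not shown in this notebook, so here is a minimal sketch of what the Flask code could look like. It assumes the `/predict` route and the `subscription_probability` response key seen in the request cell for this question; the `create_app` factory and the `load` helper are hypothetical names chosen for illustration.

```python
import pickle

from flask import Flask, jsonify, request


def load(path):
    """Unpickle one artifact (only do this with files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(dv, model):
    """Build a Flask app that scores one client per POST request."""
    app = Flask("subscription")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        X = dv.transform([client])
        # Column 1 holds the probability of the positive class.
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({"subscription_probability": probability})

    return app
```

To serve it, a module (say, a hypothetical `predict.py`) would add `app = create_app(load("data/dv.bin"), load("data/model1.bin"))` at module level and could then be started with `gunicorn --bind 0.0.0.0:9696 predict:app`. Passing the vectorizer and model into a factory also makes the route easy to exercise with Flask's built-in test client.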


In [11]:
import requests

url = "http://127.0.0.1:9696/predict"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()

{'subscription_probability': 0.33480703475511053}

### Q5. Size of base image

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`. 
You'll need to use it (see Question 6 for an example).

This image is based on `python:3.11.5-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.


Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can get it with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 130 MB
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [12]:
!docker images

REPOSITORY              TAG           IMAGE ID       CREATED       SIZE
svizor/zoomcamp-model   3.11.5-slim   15d61790363f   13 days ago   197MB


197MB

### Q6. Probability of subscription (Docker)

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription now?


In [25]:
url = "http://0.0.0.0:9696/predict"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()

{'subscription_probability': 0.756743795240796}