In [1]:
import pickle

# Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [5]:
!pipenv --version

pipenv, version 2024.2.0


# Question 2

* Use Pipenv to install Scikit-Learn version 1.5.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: Create an empty folder for the homework and run Pipenv there.

sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445
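
The hash can also be read programmatically. `Pipfile.lock` is a JSON file; packages added with a plain `pipenv install` are listed under its `default` key, each with a `hashes` list. A minimal sketch follows. The helper names `load_lock` and `first_hash` are illustrative and not part of Pipenv.

```python
import json


def load_lock(path="Pipfile.lock"):
    """Parse a Pipfile.lock file (plain JSON) into a dict."""
    with open(path) as f_in:
        return json.load(f_in)


def first_hash(lock, package="scikit-learn"):
    """Return the first hash listed for `package` in the `default` section."""
    return lock["default"][package]["hashes"][0]
```

Run from the homework folder, `first_hash(load_lock())` returns the first entry of the `hashes` list for scikit-learn.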


## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

The vectorizer and the model were then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```

In [8]:
!wget https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/model1.bin
!wget https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/dv.bin

--2024-10-29 12:46:33--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850 [application/octet-stream]
Saving to: ‘model1.bin’


2024-10-29 12:46:33 (39,2 MB/s) - ‘model1.bin’ saved [850/850]

--2024-10-29 12:46:33--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 

In [12]:
!ls -l


total 128
-rw-rw-r-- 1 noname noname    560 oct 29 12:46 dv.bin
-rw-rw-r-- 1 noname noname 122444 oct 29 12:50 hw04.ipynb
-rw-rw-r-- 1 noname noname    850 oct 29 12:46 model1.bin


# Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "management", "duration": 400, "poutcome": "success"}
```

What's the probability that this client will get a subscription? 

* 0.359
* 0.559
* 0.759
* 0.959 ✅


In [29]:
model_file = "model1.bin"
with open(model_file, "rb") as f_in:
    model = pickle.load(f_in)

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [30]:
model

In [31]:
dv_file = "dv.bin"
with open(dv_file, "rb") as f_in:
    dv = pickle.load(f_in)



In [32]:
dv

In [48]:
client = {"job": "management", "duration": 400, "poutcome": "success"}

In [52]:
X = dv.transform([client])

In [53]:
X

array([[400.,   0.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   1.,   0.]])

In [54]:
model.predict_proba(X)

array([[0.09149911, 0.90850089]])
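
The cells above can be collected into a small script, as Question 3 asks. This is a sketch under assumptions: the function names `load` and `predict_single` are hypothetical, and the positive class ("will subscribe") is assumed to be the second column of `predict_proba`.

```python
import pickle


def load(path):
    """Unpickle one object from `path`. Only unpickle files you trust."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def predict_single(client, dv, model):
    """Return the probability of the positive class for one client dict."""
    X = dv.transform([client])  # one-row feature matrix
    # Row 0 is our single client; column 1 is assumed to be the positive class.
    return float(model.predict_proba(X)[0][1])
```

With `dv = load("dv.bin")` and `model = load("model1.bin")`, calling `predict_single(client, dv, model)` should return the second value of the `predict_proba` output shown above.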

# Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?

* 0.335 ✅
* 0.535
* 0.735
* 0.935


In [37]:
%autosave 0

Autosave disabled


In [38]:
import requests

In [39]:
url = "http://localhost:5000/predict"
client2 = {"job": "student", "duration": 280, "poutcome": "failure"}

In [47]:
requests.post(url, json=client2).json()

{'subscription_probability': 0.20212822258505542}
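
The service behind this request is not shown in the notebook. A minimal sketch of what it could look like follows, assuming the `/predict` route and the `subscription_probability` key seen in the request and response above. The names `score` and `create_app` and the app name `"subscription"` are hypothetical, and this is not necessarily the script that produced the output.

```python
def score(client, dv, model):
    """Build the JSON-serializable response for one client dict."""
    X = dv.transform([client])
    # Column 1 is assumed to be the positive class ("will subscribe").
    probability = float(model.predict_proba(X)[0][1])
    return {"subscription_probability": probability}


def create_app(dv, model):
    """Create a Flask app that scores clients with the given dv and model."""
    # Imported here so that `score` can be used and tested without Flask.
    from flask import Flask, jsonify, request

    app = Flask("subscription")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()  # body sent via requests.post(url, json=...)
        return jsonify(score(client, dv, model))

    return app
```

For a local check, load `dv` and `model` with pickle and call `create_app(dv, model).run(host="0.0.0.0", port=5000)`. Flask's built-in server is meant for development, which is why the homework asks for Gunicorn or waitress.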

## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.11.5-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.

# Question 5

Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 130 MB ✅
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.




In [55]:
!docker images svizor/zoomcamp-model

REPOSITORY              TAG           IMAGE ID       CREATED       SIZE
svizor/zoomcamp-model   3.11.5-slim   975e7bdca086   10 days ago   130MB


## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

In [60]:
!docker build -t churn-prediction .


[1A[1B[0G[?25l[+] Building 0.0s (0/0)  docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.0s (0/0)  docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.1s (3/3)                                    docker:desktop-linux
[34m => [internal] load build definition from Dockerfile                       0.1s
[0m[34m => => transferring dockerfile: 282B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/svizor/zoomcamp-model:3.11.5-s  0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.2s (4/9)                                    docker:desktop-linux
[34m => [internal] load build definition from Dockerfile                       0.1s
[0m[34m => => transferring dockerfile: 282B                                       0.0s
[0m[34m => [internal] 

# Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription now?

* 0.287
* 0.530
* 0.757 ✅
* 0.960

In [63]:
url = "http://localhost:5000/predict"
client3 = {"job": "management", "duration": 400, "poutcome": "success"}

In [65]:
requests.post(url, json=client3).json()

{'subscription_probability': 0.7590966516879658}
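
None of the computed probabilities matches an answer option exactly. Assuming the intended answer is the option closest to the computed value, the check marks in this notebook can be reproduced as follows. The helper `closest_option` is illustrative and not part of the homework.

```python
def closest_option(value, options):
    """Return the option with the smallest absolute distance to `value`."""
    return min(options, key=lambda option: abs(option - value))


# Probabilities taken from the cell outputs in this notebook.
q3 = closest_option(0.90850089, [0.359, 0.559, 0.759, 0.959])           # 0.959
q4 = closest_option(0.20212822258505542, [0.335, 0.535, 0.735, 0.935])  # 0.335
q6 = closest_option(0.7590966516879658, [0.287, 0.530, 0.757, 0.960])   # 0.757
```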