## Homework

> Note: sometimes your answer doesn't match one of the options exactly. 
> That's fine. 
> Select the option that's closest to your solution.

> Note: we recommend using Python 3.11 for this homework.

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [5]:
!pipenv --version

pipenv, version 2024.2.0


**Ans:** _version 2024.2.0_

## Question 2

* Use Pipenv to install Scikit-Learn version 1.5.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: you should create an empty folder for the homework and do it there.

In [8]:
! pipenv install scikit-learn==1.5.2

To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing scikit-learn==1.5.2...
Installation Succeeded
Building requirements...
[    ] Locking packages...
Resolving dependencies...

**Ans:** _sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445_
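
The hash can also be read programmatically instead of by opening the file. The sketch below assumes the usual `Pipfile.lock` layout, where regular dependencies sit under the `"default"` key and each package lists its `"hashes"`; the helper name `first_hash` is hypothetical.

```python
import json
from pathlib import Path


def first_hash(lock_path, package, section="default"):
    """Return the first hash listed for `package` in a Pipfile.lock file."""
    lock = json.loads(Path(lock_path).read_text())
    # Assumption: regular dependencies live under "default",
    # dev dependencies under "develop".
    return lock[section][package]["hashes"][0]
```

Run from the homework folder, `first_hash("Pipfile.lock", "scikit-learn")` should return the value to compare against the answer above.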

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
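
If `wget` isn't available, the same download can be sketched with the Python standard library. The `download` helper and its parameters are hypothetical; only the prefix URL and the file names come from the commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

PREFIX = (
    "https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/"
    "master/cohorts/2024/05-deployment/homework"
)


def download(filenames, prefix=PREFIX, dest_dir="."):
    """Fetch each file from `prefix` into `dest_dir` and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        target = dest / name
        # urlretrieve streams the response body straight to the target file
        urlretrieve(f"{prefix}/{name}", target)
        paths.append(target)
    return paths
```

Calling `download(["model1.bin", "dv.bin"])` should leave both files in the current directory.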

In [9]:
import pickle

In [10]:
with open('model1.bin', 'rb') as f_in:
    model = pickle.load(f_in)

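Note: if your installed scikit-learn version differs from the one used to create the pickle, loading it prints a version warning that refers to https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations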


In [14]:
with open('dv.bin', 'rb') as f_in:
    dv = pickle.load(f_in)


## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "management", "duration": 400, "poutcome": "success"}
```

What's the probability that this client will get a subscription? 

* 0.359
* 0.559
* 0.759
* 0.959

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3d8bb28974e55edefa000fe38fd3ed12  model1.bin
7d37616e00aa80f2152b8b0511fc2dff  dv.bin
```
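
On systems without `md5sum`, the same check can be done in Python. This is a minimal sketch: the expected checksums are copied from the output above, while the `md5_of` helper is a hypothetical name.

```python
import hashlib

# Checksums as listed in the homework text
EXPECTED_MD5 = {
    "model1.bin": "3d8bb28974e55edefa000fe38fd3ed12",
    "dv.bin": "7d37616e00aa80f2152b8b0511fc2dff",
}


def md5_of(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A file is intact when `md5_of(name) == EXPECTED_MD5[name]` for both `model1.bin` and `dv.bin`.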

In [17]:
client = {"job": "management", "duration": 400, "poutcome": "success"}

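# transform takes an iterable of dicts, so wrap the single client in a list
trans = dv.transform([client])
# the second column holds the probability of the positive class
model.predict_proba(trans)[:, 1]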

array([0.75909665])

**Ans:** _0.759_
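
The question asks for a script, so the cells above could be gathered into something like the sketch below. The function names `load_pickle` and `predict_single` are hypothetical, and the code assumes `dv.bin` and `model1.bin` sit in the working directory. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle


def load_pickle(path):
    """Load a pickled object from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def predict_single(client, dv, model):
    """Return the positive-class probability for one client dict."""
    X = dv.transform([client])  # one-row feature matrix
    return float(model.predict_proba(X)[0][1])


# Example usage (requires the downloaded files and scikit-learn):
# dv = load_pickle("dv.bin")
# model = load_pickle("model1.bin")
# client = {"job": "management", "duration": 400, "poutcome": "success"}
# print(predict_single(client, dv, model))
```

Keeping the prediction in its own function makes it easy to reuse when the model is served as a web service.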

## Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?

* 0.335
* 0.535
* 0.735
* 0.935

In [33]:
import requests

In [29]:
client = {"job": "student", "duration": 280, "poutcome": "failure"}

In [30]:
url = "http://localhost:9696/run"

In [34]:
response = requests.post(url, json=client).json()

In [35]:
response

{'Decision': False, 'Probability': 0.33480703475511053}

**Ans:** _0.335_
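
The service code itself isn't shown in this notebook. As a rough sketch, the reply could be built as below. The `Decision` and `Probability` keys, the `/run` route and port 9696 come from the cells above; the 0.5 threshold, the `build_response` helper and the Flask wiring in the comments are assumptions.

```python
def build_response(probability, threshold=0.5):
    """Shape the reply like the service output shown in this notebook."""
    # Convert to built-in types so the result is JSON-serializable
    # even when the probability arrives as a NumPy scalar.
    probability = float(probability)
    return {"Decision": probability >= threshold, "Probability": probability}


# Hypothetical Flask wiring (assumes Flask is installed and that `dv` and
# `model` were loaded from dv.bin and model1.bin beforehand):
#
# from flask import Flask, jsonify, request
#
# app = Flask("subscription")
#
# @app.route("/run", methods=["POST"])
# def run():
#     client = request.get_json()
#     probability = model.predict_proba(dv.transform([client]))[0, 1]
#     return jsonify(build_response(probability))
#
# if __name__ == "__main__":
#     app.run(debug=True, host="0.0.0.0", port=9696)
```

With a 0.5 threshold, a probability of about 0.335 gives `Decision: False`, which is consistent with the response above.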

## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.11.5-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 130 MB
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [36]:
! docker pull svizor/zoomcamp-model:3.11.5-slim

3.11.5-slim: Pulling from svizor/zoomcamp-model
a803e7c4b030: Pulling fs layer
bf3336e84c8e: Pulling fs layer
eb76b60fbb0c: Pulling fs layer
a2cee97f4fbd: Pulling fs layer
0358d4e17ae3: Pulling fs layer
fb37f8d7a667: Pulling fs layer
4e69cd59a5af: Pulling fs layer
0358d4e17ae3: Waiting
fb37f8d7a667: Waiting
4e69cd59a5af: Waiting
a2cee97f4fbd: Waiting
bf3336e84c8e: Verifying Checksum
bf3336e84c8e: Download complete
a2cee97f4fbd: Download complete
0358d4e17ae3: Verifying Checksum
0358d4e17ae3: Download complete
fb37f8d7a667: Verifying Checksum
fb37f8d7a667: Download complete
4e69cd59a5af: Download complete
eb76b60fbb0c: Verifying Checksum
eb76b60fbb0c: Download complete
a803e7c4b030: Verifying Checksum
a803e7c4b030: Download complete
a803e7c4b030: Pull complete
bf3336e84c8e: Pull complete
eb76b60fbb0c: Pull complete
a2cee97f4fbd: Pull complete
0358d4e17ae3: Pull complete
fb37f8d7a667: Pull complete
4e69cd59a5af: Pull complete
Digest: sha256:15d61790363f892dfdef55f47b78feed751cb59704d47ea

In [37]:
! docker images

REPOSITORY                                      TAG           IMAGE ID       CREATED        SIZE
zoomcamp                                        latest        678481ff22c8   45 hours ago   429MB
svizor/zoomcamp-model                           3.11.5-slim   975e7bdca086   9 days ago     130MB
streamlit-app                                   latest        9f154fde93f7   2 months ago   745MB
gcr.io/deployment-432907/streamlit-app          latest        9f154fde93f7   2 months ago   745MB
gcr.io/deployment-432907/streamlit-app          <none>        c0ab239e59b1   2 months ago   745MB
docker.elastic.co/elasticsearch/elasticsearch   8.4.3         ce2b9dc7fe85   2 years ago    1.26GB
elasticsearch                                   8.4.3         ce2b9dc7fe85   2 years ago    1.26GB


**Ans:** _130 MB_
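
When many images are installed, the row of interest can be picked out of the listing with a few lines of Python. This is only a sketch: it assumes the default table layout of `docker images`, where columns are separated by at least two spaces, and the `image_size` helper is a hypothetical name.

```python
import re


def image_size(docker_images_output, repository, tag):
    """Return the SIZE column for repository:tag, or None if it is not listed."""
    lines = docker_images_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Columns are padded with 2+ spaces; values like "9 days ago"
        # only contain single spaces, so they stay in one piece.
        columns = re.split(r"\s{2,}", line.strip())
        if columns[0] == repository and columns[1] == tag:
            return columns[-1]
    return None
```

Applied to the listing above, `image_size(output, "svizor/zoomcamp-model", "3.11.5-slim")` should give `130MB`.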

## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription now?

* 0.287
* 0.530
* 0.757
* 0.960

In [43]:
url = "http://localhost:9696/run"

In [44]:
client2 = {"job": "management", "duration": 400, "poutcome": "success"}

In [45]:
ans = requests.post(url, json=client2).json()

In [46]:
ans

{'Decision': True, 'Probability': 0.756743795240796}

**Ans:** _0.757_
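
The raw probability rarely matches an option exactly, so the homework asks for the closest one. A tiny sketch of that selection, with the hypothetical helper `closest_option` and the option values taken from Question 6:

```python
def closest_option(value, options):
    """Return the option with the smallest absolute distance to `value`."""
    return min(options, key=lambda option: abs(option - value))


options = [0.287, 0.530, 0.757, 0.960]
print(closest_option(0.756743795240796, options))  # prints 0.757
```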