# Imports

In [32]:
import os
import json
import pickle
import requests

## Homework


> Note: sometimes your answer doesn't match one of the options exactly. 
> That's fine. 
> Select the option that's closest to your solution.

> Note: we recommend using Python 3.11 for this homework.

In this homework, we will use the Bank Marketing dataset. Download it from [here](https://archive.ics.uci.edu/static/public/222/bank+marketing.zip).

You can do it with `wget`:

```bash
wget https://archive.ics.uci.edu/static/public/222/bank+marketing.zip
unzip bank+marketing.zip 
unzip bank.zip
```

We need `bank-full.csv`.

You can also download a copy of `bank-full.csv` directly:

```bash
wget https://github.com/alexeygrigorev/datasets/raw/refs/heads/master/bank-full.csv
```

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [2]:
!pipenv --version

pipenv, version 2024.1.0


## Question 2

* Use Pipenv to install Scikit-Learn version 1.5.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: Create an empty folder for the homework and run the installation there.


In [3]:
import sklearn
print(sklearn.__version__)

1.5.2


In [7]:
# Load the Pipfile.lock file
with open('../../Pipfile.lock') as f:
    lock_data = json.load(f)

# Name of the library whose hashes we want to look up
lib_name = 'scikit-learn'

# Extract the list of hashes
lib_hashes = lock_data['default'].get(lib_name, {}).get('hashes', [])

print(f"Hash of {lib_name}: {lib_hashes[0]}")

Hash of scikit-learn: sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445


In [8]:
#"sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445"
lib_hashes[0]

'sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445'

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
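from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df (the bank-full.csv data) and y (the target) are assumed to exist already
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)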
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```


In [47]:
!ls -lh

total 40K
-rw-rw-r-- 1 aztleclan aztleclan  645 oct 25 12:51 app.py
-rw-rw-r-- 1 aztleclan aztleclan  560 oct 25 12:44 dv.bin
drwxrwxr-x 2 aztleclan aztleclan 4,0K oct 25 11:54 homework
-rw-rw-r-- 1 aztleclan aztleclan  13K oct 25 12:52 Homework.ipynb
-rw-rw-r-- 1 aztleclan aztleclan 4,9K oct 25 11:53 homework.md
-rw-rw-r-- 1 aztleclan aztleclan  850 oct 25 12:44 model1.bin
-rw-rw-r-- 1 aztleclan aztleclan    0 oct 25 12:42 run_app.sh


## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "management", "duration": 400, "poutcome": "success"}
```


What's the probability that this client will get a subscription? 

* 0.359
* 0.559
* 0.759
* 0.959

In [62]:
input_data = {"job": "management", "duration": 400, "poutcome": "success"}

In [63]:
with open('dv.bin', 'rb') as f:
    dv = pickle.load(f)

In [64]:
with open('model1.bin', 'rb') as f:
    model1 = pickle.load(f)

In [65]:
dv_input_data = dv.transform(input_data)

In [66]:
dv_input_data

array([[400.,   0.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   1.,   0.]])

In [67]:
pred_proba = model1.predict_proba(dv_input_data)

In [68]:
pred = pred_proba[:, 1]

In [69]:
pred

array([0.75909665])
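
The question asks for a script, while the cells above run the steps one at a time. A minimal sketch of how the same steps might be collected into a script follows. The function names `load_pickle`, `score_client`, and `main` are hypothetical, and `main` assumes `dv.bin` and `model1.bin` sit in the current directory.

```python
import pickle


def load_pickle(path):
    """Load a pickled object (the vectorizer or the model) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def score_client(dv, model, client):
    """Return the positive-class probability for a single client dict."""
    # Wrap the client in a list so the vectorizer sees one record.
    X = dv.transform([client])
    # Row 0 is our only client; column 1 is the positive class.
    return float(model.predict_proba(X)[0][1])


def main():
    # Assumes both files are in the current working directory.
    dv = load_pickle("dv.bin")
    model = load_pickle("model1.bin")
    client = {"job": "management", "duration": 400, "poutcome": "success"}
    print(round(score_client(dv, model, client), 3))
```

Calling `main()` from the folder that holds the two files should print the same probability the cells above computed, rounded to three decimals.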

What's the probability that this client will get a subscription? 

* 0.359
* 0.559
* **0.759** <strong style="font-size: 24px;">&larr;</strong>
* 0.959

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3d8bb28974e55edefa000fe38fd3ed12  model1.bin
7d37616e00aa80f2152b8b0511fc2dff  dv.bin
```


In [48]:
!md5sum model1.bin dv.bin

3d8bb28974e55edefa000fe38fd3ed12  model1.bin
7d37616e00aa80f2152b8b0511fc2dff  dv.bin
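
On systems without `md5sum`, the same check can be sketched in Python with the standard `hashlib` module. The helper names `md5_of_file` and `verify_files` are hypothetical; the expected checksums are the ones listed above.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the homework text above.
EXPECTED = {
    "model1.bin": "3d8bb28974e55edefa000fe38fd3ed12",
    "dv.bin": "7d37616e00aa80f2152b8b0511fc2dff",
}


def verify_files(expected=EXPECTED):
    """Map each file name to True if its checksum matches, else False."""
    return {name: md5_of_file(name) == checksum
            for name, checksum in expected.items()}
```

Calling `verify_files()` in the folder with both files returns a dictionary; a `False` value points at a corrupted or incomplete download.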


## Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```




What's the probability that this client will get a subscription?

* 0.335
* 0.535
* 0.735
* 0.935
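
The Flask code itself lives in `app.py` and is not reproduced in this notebook. The following is only a sketch of what such a service might look like, not the actual file. It assumes a `/predict` route (matching the URL used in the request below) and a one-element list as the response (matching the output below); the names `load_pickle` and `create_app` are hypothetical.

```python
import pickle

from flask import Flask, jsonify, request


def load_pickle(path):
    """Load a pickled object (the vectorizer or the model) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(dv, model):
    """Build a Flask app that scores one client per POST request."""
    app = Flask("subscription")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        X = dv.transform([client])
        # Column 1 of predict_proba is the positive-class probability.
        proba = float(model.predict_proba(X)[0][1])
        # Return a one-element list, as in the notebook's response.
        return jsonify([proba])

    return app


# Assumed module-level entry point for app.py, so that
# `gunicorn --bind 0.0.0.0:5000 app:app` can find the application:
# app = create_app(load_pickle("dv.bin"), load_pickle("model1.bin"))
```

Building the app in a factory keeps the model loading separate from the routing, which makes the endpoint easy to exercise with stand-in objects before the real files are in place.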

In [70]:
url = "http://127.0.0.1:5000/predict"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()

[0.33480703475511053]

What's the probability that this client will get a subscription?

* **0.335** <strong style="font-size: 24px;">&larr;</strong>
* 0.535
* 0.735
* 0.935

## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.11.5-slim` and contains a logistic regression model 
(a different one) as well as a dictionary vectorizer. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.


So what's the size of this base image?

* 45 MB
* 130 MB
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [52]:
!docker images

REPOSITORY                                      TAG           IMAGE ID       CREATED         SIZE
svizor/zoomcamp-model                           3.11.5-slim   975e7bdca086   6 days ago      130MB
ollama/ollama                                   latest        1577d5e882da   4 weeks ago     3.27GB
mlops-magic-platform                            latest        7d30dd20e974   4 months ago    4.52GB
pgvector/pgvector                               0.6.0-pg16    a608718e732f   8 months ago    427MB
apache/airflow                                  2.5.1         282394d57c1e   21 months ago   1.23GB
redis                                           latest        19c51d4327cf   21 months ago   117MB
postgres                                        13            beb2ef252f25   21 months ago   373MB
docker.elastic.co/elasticsearch/elasticsearch   8.4.3         ce2b9dc7fe85   2 years ago     1.26GB
elasticsearch                                   7.10.1        558380375f1a   3 years ago     774MB


So what's the size of this base image?

* 45 MB
* **130 MB** <strong style="font-size: 24px;">&larr;</strong>
* 245 MB
* 330 MB

## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.


## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```



What's the probability that this client will get a subscription now?

* 0.287
* 0.530
* 0.757
* 0.960

In [59]:
!docker ps -a

CONTAINER ID   IMAGE                                                 COMMAND                  CREATED          STATUS                      PORTS                                       NAMES
6c9640589774   zoomcamp-hw05:3.10.12-slim                            "pipenv run flask ru…"   13 seconds ago   Up 12 seconds               0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   awesome_euclid
21b001723a42   ollama/ollama                                         "/bin/ollama serve"      12 hours ago     Exited (0) 11 hours ago                                                 ollama
919a85b61f15   docker.elastic.co/elasticsearch/elasticsearch:8.4.3   "/bin/tini -- /usr/l…"   12 hours ago     Exited (143) 11 hours ago                                               elasticsearch


In [61]:
url = "http://127.0.0.1:5000/predict"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()

[0.7590966516879658]


What's the probability that this client will get a subscription now?

* 0.287
* 0.530
* **0.757** <strong style="font-size: 24px;">&larr;</strong>
* 0.960

## Submit the results

* Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2024/homework/hw05
* If your answer doesn't match options exactly, select the closest one
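
Picking the closest option can also be done mechanically. A small sketch, using the Question 6 options and the value returned by the service there; the helper name `closest_option` is hypothetical.

```python
def closest_option(value, options):
    """Return the option with the smallest absolute distance to value."""
    return min(options, key=lambda option: abs(option - value))


options = [0.287, 0.530, 0.757, 0.960]
print(closest_option(0.7590966516879658, options))  # prints 0.757
```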