# Homework5 - datatalks.club - Rui Pinto

In [32]:
import pickle
import glob
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction import DictVectorizer

## Question 1
- Install Pipenv
- What's the version of pipenv you installed?
- Use --version to find out

In [5]:
# Note: the course recommends Python 3.11 for this homework.
#!pipenv install --python 3.11

In [6]:
!pipenv --version

pipenv, version 2024.2.0


## Question 2
- Use Pipenv to install Scikit-Learn version 1.5.2
- What's the first hash for scikit-learn you get in Pipfile.lock?

In [7]:
#!pipenv install scikit-learn==1.5.2

In [13]:
# check the first hash for scikit-learn in Pipfile.lock
!grep -A 2 scikit-learn Pipfile.lock

        "scikit-learn": {
            "hashes": [
                "sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445",

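The same lookup can be done in Python instead of grep. The sketch below assumes the usual Pipfile.lock layout, where runtime packages sit under the top-level `"default"` key; the helper names `first_hash` and `first_hash_from_file` are made up for illustration.

```python
import json


def first_hash(lock, package, section="default"):
    """Return the first hash listed for `package` in a parsed Pipfile.lock."""
    return lock[section][package]["hashes"][0]


def first_hash_from_file(path, package="scikit-learn"):
    """Parse the lock file at `path` (it is plain JSON) and look up the first hash."""
    with open(path) as f_in:
        return first_hash(json.load(f_in), package)
```

Calling `first_hash_from_file("Pipfile.lock")` from the project directory would print the same first entry that grep shows above.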

## Question 3
Let's use these models!

- Write a script for loading these models with pickle
- Score this client:

```
{"job": "management", "duration": 400, "poutcome": "success"}
```

What's the probability that this client will get a subscription?

- 0.359
- 0.559
- 0.759
- 0.959

In [44]:
# Download the model and vectorizer using the raw file URLs
PREFIX="https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework"
#!wget $PREFIX/model1.bin
#!wget $PREFIX/dv.bin

--2024-10-27 14:25:33--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850 [application/octet-stream]
Saving to: ‘model1.bin’


2024-10-27 14:25:33 (75.1 MB/s) - ‘model1.bin’ saved [850/850]

--2024-10-27 14:25:33--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 560 [application/

In [45]:
# load the model and the vectorizer
model_file = "model1.bin"
dv_file = "dv.bin"

with open(model_file, "rb") as f_in:
    model = pickle.load(f_in)
    
with open(dv_file, "rb") as f_in:
    dv = pickle.load(f_in)
    
model, dv

(LogisticRegression(max_iter=250), DictVectorizer(sparse=False))

In [48]:
# testing the model
customer = {"job": "management", "duration": 400, "poutcome": "success"}

X = dv.transform([customer])
y_pred = model.predict_proba(X)[0, 1]


print(f"The probability of the customer subscribing to the term deposit is {y_pred:.3f}")

The probability of the customer subscribing to the term deposit is 0.759
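
Question 3 asks for a script, so the two cells above could be wrapped into reusable functions. This is a minimal sketch, not the course's reference solution; `load_pickle` and `subscription_probability` are illustrative names, and it assumes model1.bin and dv.bin sit in the working directory.

```python
import pickle


def load_pickle(path):
    """Load a pickled object (here: the model or the DictVectorizer) from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def subscription_probability(model, dv, client):
    """Vectorize one client dict and return the positive-class probability."""
    X = dv.transform([client])
    # predict_proba returns one row per client; column 1 is the positive class
    return float(model.predict_proba(X)[0][1])
```

With scikit-learn installed, `subscription_probability(load_pickle("model1.bin"), load_pickle("dv.bin"), client)` performs the same steps as the cells above. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.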


In [50]:
# testing the model with curl
!curl -X POST "http://127.0.0.1:9696/predict" -H "Content-Type: application/json" -d '{"job": "management", "duration": 400, "poutcome": "success"}'

{
  "subscription_probability": 0.7590966516879658
}


## Question 4
Now let's serve this model as a web service

- Install Flask and gunicorn (or waitress, if you're on Windows)
- Write Flask code for serving the model
- Now score this client using requests:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?

- 0.335
- 0.535
- 0.735
- 0.935

In [54]:
# using test-predict.py script

!python test-predict.py

Prediction: {'subscription_probability': 0.33480703475511053}
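
The Flask service behind these requests is not shown in the notebook. A minimal sketch of what it might look like follows; the /predict route and the `subscription_probability` key are taken from the outputs above, while the helper `build_response`, the factory `create_app` and the default file paths are assumptions.

```python
import pickle


def build_response(model, dv, client):
    """Score one client dict and wrap the result in a JSON-ready payload."""
    X = dv.transform([client])
    probability = float(model.predict_proba(X)[0][1])
    return {"subscription_probability": probability}


def create_app(model_path="model1.bin", dv_path="dv.bin"):
    """Build the Flask app around the pickled model and vectorizer."""
    # Flask is imported here so build_response can be used without it installed.
    from flask import Flask, jsonify, request

    with open(model_path, "rb") as f_in:
        model = pickle.load(f_in)
    with open(dv_path, "rb") as f_in:
        dv = pickle.load(f_in)

    app = Flask("subscription")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()  # parsed JSON body of the POST request
        return jsonify(build_response(model, dv, client))

    return app
```

Saved under a hypothetical name such as predict.py, this could be started locally with `create_app().run(host="0.0.0.0", port=9696)` or served with `gunicorn --bind 0.0.0.0:9696 "predict:create_app()"`.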


## Docker
Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: svizor/zoomcamp-model:3.11.5-slim. You'll need to use it (see Question 5 for an example).

This image is based on python:3.11.5-slim and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to svizor/zoomcamp-model:3.11.5-slim.

In [55]:
!docker pull svizor/zoomcamp-model:3.11.5-slim

3.11.5-slim: Pulling from svizor/zoomcamp-model

e7c4b030: Pulling fs layer
36e84c8e: Pulling fs layer
b60fbb0c: Pulling fs layer
e97f4fbd: Pulling fs layer
d4e17ae3: Waiting fs layer
f8d7a667: Pulling fs layer
Digest: sha256:15d61790363f892dfdef55f47b78feed751cb59704d47ea911df0ef3e9300c06
Status: Downloaded newer image for svizor/zoomcamp-model:3.11.5-slim
docker.io/svizor/zoomcamp-model:3.11.5-slim


## Question 5
Download the base image svizor/zoomcamp-model:3.11.5-slim. You can get it with the docker pull command.

So what's the size of this base image?

- 45 MB
- 130 MB
- 245 MB
- 330 MB

You can get this information when running docker images - it'll be in the "SIZE" column.

In [56]:
!docker images

REPOSITORY              TAG           IMAGE ID       CREATED       SIZE
zoomcamp-test           latest        ac3a0367711e   3 hours ago   501MB
<none>                  <none>        15147a5c2862   3 hours ago   501MB
python                  3.12-slim     fd162521da09   8 days ago    124MB
svizor/zoomcamp-model   3.11.5-slim   975e7bdca086   8 days ago    130MB


### Dockerfile
Now create your own Dockerfile based on the image we prepared.

It should start like this:

```
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

- Install all the dependencies from the Pipenv file
- Copy your Flask script
- Run it with Gunicorn

After that, you can build your docker image.

In [None]:
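# build the image first (assumed command; the tag matches the run command below)
#!docker build -t homework05 .
# then run the container
!docker run -it --rm -p 9696:9696 homework05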

## Question 6
Let's run your docker container!

After running it, score this client once again:

```
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription now?

- 0.287
- 0.530
- 0.757
- 0.960

In [57]:
!curl -X POST "http://localhost:9696/predict" -H "Content-Type: application/json" -d '{"job": "management", "duration": 400, "poutcome": "success"}'

{"subscription_probability":0.7590966516879658}
