# **Deployment**

**Question 1**  
- Install Pipenv
- What's the version of pipenv you installed?
- Use --version to find out

In [1]:
!pipenv --version

pipenv, version 2024.2.0


**Question 2**  
- Use Pipenv to install Scikit-Learn version 1.5.2
- What's the first hash for scikit-learn you get in Pipfile.lock?

In [3]:
!pipenv install scikit-learn==1.5.2

To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing scikit-learn==1.5.2...
Installation Succeeded
Building requirements...
[    ] Locking packages...
Resolving dependencies...

In [14]:
# What's the first hash for scikit-learn you get in Pipfile.lock?
with open("Pipfile.lock") as file:
    found_scikit_learn = False
    lines_to_print = 3  # Number of lines to print once "scikit-learn" is found

    for line in file:
        if "scikit-learn" in line:
            found_scikit_learn = True
        if found_scikit_learn:
            # Drop the trailing newline so print() does not emit blank lines
            print(line.rstrip())
            lines_to_print -= 1
        if lines_to_print == 0:
            break

        "scikit-learn": {
            "hashes": [
                "sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445",
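
Scanning the lock file line by line depends on how it happens to be formatted. Since `Pipfile.lock` is JSON, the same lookup can be sketched by parsing it instead. The helper name `first_hash` and the inline sample are illustrative assumptions; the sample keeps only the one hash printed above, not the full list from the real file.

```python
import json


def first_hash(lock: dict, package: str, section: str = "default") -> str:
    """Return the first hash recorded for `package` in a parsed Pipfile.lock."""
    return lock[section][package]["hashes"][0]


# Stand-in for the real file contents: a trimmed lock file that keeps only the
# first scikit-learn hash shown above (the real "hashes" list is longer).
sample_text = """
{
    "default": {
        "scikit-learn": {
            "hashes": [
                "sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445"
            ],
            "version": "==1.5.2"
        }
    }
}
"""

lock = json.loads(sample_text)
scikit_hash = first_hash(lock, "scikit-learn")
print(scikit_hash)
```

With the real file, `lock` would come from `json.load(open("Pipfile.lock"))` rather than from the inline sample.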



### Models
We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:


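```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df is the training DataFrame and y the target vector (neither is shown here)
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```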
> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)
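
For reference, saving and restoring an object with pickle follows the pattern below. This is a minimal sketch: a plain dictionary stands in for the fitted `dv` and `model` objects, and the file name `example.bin` is made up for the illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted object such as `dv` or `model` (assumption: any
# picklable Python object is saved and restored the same way).
artifact = {"features": ["job", "duration", "poutcome"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.bin")  # hypothetical file name

    # Save: open the file in binary write mode and dump the object.
    with open(path, "wb") as f_out:
        pickle.dump(artifact, f_out)

    # Load: open the file in binary read mode and read the object back.
    with open(path, "rb") as f_in:
        restored = pickle.load(f_in)

# The restored object compares equal to the one that was saved.
print(restored == artifact)
```

Unpickling the real files additionally requires scikit-learn to be installed, because pickle stores references to the classes rather than their code.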



In [5]:
PREFIX="https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework"
!wget $PREFIX/model1.bin
!wget $PREFIX/dv.bin

--2024-10-29 16:20:21--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850 [application/octet-stream]
Saving to: 'model1.bin'

     0K                                                       100% 13.4M=0s

2024-10-29 16:20:23 (13.4 MB/s) - 'model1.bin' saved [850/850]

--2024-10-29 16:20:23--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 560 [application/oc

**Question 3**  
Let's use these models!

Write a script for loading these models with pickle   
Score this client:  
`{"job": "management", "duration": 400, "poutcome": "success"}`  

What's the probability that this client will get a subscription?

- 0.359
- 0.559
- 0.759
- 0.959

If you're getting errors when unpickling the files, check their checksum:
```
$ md5sum model1.bin dv.bin
3d8bb28974e55edefa000fe38fd3ed12  model1.bin
7d37616e00aa80f2152b8b0511fc2dff  dv.bin
```
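
On systems without `md5sum` (Windows, for example), the same check can be sketched in Python with `hashlib`. The helper name `md5_of_file` is an assumption, and the demo hashes a throwaway temporary file rather than the real model files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file so the sketch runs without model1.bin or dv.bin.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")  # hypothetical file name
    with open(demo_path, "wb") as f:
        f.write(b"hello")
    demo_digest = md5_of_file(demo_path)

print(demo_digest)
```

For the real files, compare `md5_of_file("model1.bin")` and `md5_of_file("dv.bin")` with the checksums listed above.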



In [11]:
# Write a script for loading these models with pickle  and score this client: {"job": "management", "duration": 400, "poutcome": "success"}

import pickle

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Load the DictVectorizer
with open('dv.bin', 'rb') as f_dv:
    dv = pickle.load(f_dv)

# Load the LogisticRegression model
with open('model1.bin', 'rb') as f_model:
    model = pickle.load(f_model)

# Client data
client = {"job": "management", "duration": 400, "poutcome": "success"}

# Transform the client data using the DictVectorizer
X = dv.transform([client])

# Make a prediction using the loaded model
y_pred = model.predict_proba(X)[0, 1]

# Print the probability
y_pred

0.7590966516879658

**Question 4**  
Now let's serve this model as a web service

- Install Flask and gunicorn (or waitress, if you're on Windows)
- Write Flask code for serving the model
- Now score this client using requests:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```
What's the probability that this client will get a subscription?

- 0.335
- 0.535
- 0.735
- 0.935


In [13]:
import requests

url = 'http://localhost:9696/predict'
client = {"job": "student", "duration": 280, "poutcome": "failure"}

# Send the client to the running service once and reuse the parsed JSON response
response = requests.post(url, json=client).json()
print(response)

if response['deposit']:
    print("Client has a running term deposit subscription")
else:
    print("Client doesn't have a running term deposit subscription")

{'deposit': False, 'deposit_probability': 0.20212822258505542}
Client doesn't have a running term deposit subscription
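
The Flask code behind `http://localhost:9696/predict` is not shown in this notebook. Below is a minimal sketch of what such a service could look like. It assumes the response keys `deposit` and `deposit_probability` seen in the output above, a 0.5 decision threshold, and a hypothetical `create_app` factory. Stub objects replace the unpickled `dv` and `model` so the example runs without the model files.

```python
from flask import Flask, jsonify, request


def create_app(dv, model, threshold=0.5):
    """Build a Flask app that scores one client per POST request.

    `dv` needs a transform() method and `model` a predict_proba() method,
    as the unpickled scikit-learn objects do. The threshold is an assumption.
    """
    app = Flask("deposit")

    @app.route("/predict", methods=["POST"])
    def predict():
        client_data = request.get_json()
        X = dv.transform([client_data])
        # Probability of the positive class for the single submitted client
        proba = float(model.predict_proba(X)[0][1])
        return jsonify({
            "deposit": bool(proba >= threshold),
            "deposit_probability": proba,
        })

    return app


class StubVectorizer:
    """Stand-in for the pickled DictVectorizer (illustration only)."""

    def transform(self, records):
        return [[record["duration"]] for record in records]


class StubModel:
    """Stand-in for the pickled model; always returns the same probabilities."""

    def predict_proba(self, X):
        return [[0.25, 0.75] for _ in X]


app = create_app(StubVectorizer(), StubModel())

# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as test_client:
    payload = {"job": "student", "duration": 280, "poutcome": "failure"}
    result = test_client.post("/predict", json=payload).get_json()

print(result)
```

In a real service, `dv` and `model` would be loaded from `dv.bin` and `model1.bin` with pickle, and the app could be served with something like `gunicorn --bind 0.0.0.0:9696 predict:app`, where the module name `predict` is an assumption.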


### **Docker**  
Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: svizor/zoomcamp-model:3.11.5-slim. You'll need to use it (see Question 5 for an example).  

This image is based on python:3.11.5-slim and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:


```docker
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```
We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.


**Question 5**  

Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 130 MB
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [1]:
import subprocess

# Run the docker images command to check for image size
result = subprocess.run(["docker", "images", "--format", "{{.Repository}}:{{.Tag}} {{.Size}}"], capture_output=True, text=True)

# Filter for the specific image name
for line in result.stdout.splitlines():
    if "svizor/zoomcamp-model:3.11.5-slim" in line:
        print("Image Size:", line)


Image Size: svizor/zoomcamp-model:3.11.5-slim 130MB


## **Dockerfile**

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv files (`Pipfile` and `Pipfile.lock`)
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.
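
One way the completed Dockerfile might look is sketched below. It is an illustration, not the reference solution: the script name `predict.py`, the Flask object name `app`, and port 9696 are assumptions, and the script is expected to load `model2.bin` and `dv.bin`, which the base image already places in `/app`.

```docker
FROM svizor/zoomcamp-model:3.11.5-slim

# Pipenv is only needed to install the locked dependencies
RUN pip install pipenv

# The base image already sets WORKDIR /app, so relative paths land there
COPY ["Pipfile", "Pipfile.lock", "./"]

# Install into the system interpreter; --deploy fails if Pipfile.lock is out of date
RUN pipenv install --system --deploy

# Hypothetical name for the Flask script
COPY ["predict.py", "./"]

EXPOSE 9696

ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
```

The image can then be built with `docker build -t zoomcamp-homework .` and started with `docker run -it --rm -p 9696:9696 zoomcamp-homework`, where the tag `zoomcamp-homework` is a made-up name.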

**Question 6**  

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription now?

* 0.287
* 0.530
* 0.757
* 0.960

In [4]:
import requests

url = 'http://localhost:9696/predict'
client = {"job": "management", "duration": 400, "poutcome": "success"}

# Send the client to the containerized service once and reuse the parsed JSON response
response = requests.post(url, json=client).json()
print(response)

if response['deposit']:
    print("Client has a running term deposit subscription")
else:
    print("Client doesn't have a running term deposit subscription")

{'deposit': True, 'deposit_probability': 0.7590966516879658}
Client has a running term deposit subscription
