# Homework 5 Till Meineke


In [1]:
import pandas as pd

> Note: sometimes your answer doesn't match one of the options exactly.
> That's fine. Select the option that's closest to your solution.

> Note: we recommend using python 3.11 in this homework.

In this homework, we will use the Bank Marketing dataset. Download it from [here](https://archive.ics.uci.edu/static/public/222/bank+marketing.zip).

You can do it with `wget`:

```bash
wget https://archive.ics.uci.edu/static/public/222/bank+marketing.zip
unzip bank+marketing.zip
unzip bank.zip
```

We need `bank-full.csv`.

You can also access a copy of `bank-full.csv` directly:

```bash
wget https://github.com/alexeygrigorev/datasets/raw/refs/heads/master/bank-full.csv
```


In [3]:
df = pd.read_csv("../03-classification/data/bank-full.csv", delimiter=";")
df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| age | 58 | 44 | 33 | 47 | 33 |
| job | management | technician | entrepreneur | blue-collar | unknown |
| marital | married | single | married | married | single |
| education | tertiary | secondary | secondary | unknown | unknown |
| default | no | no | no | no | no |
| balance | 2143 | 29 | 2 | 1506 | 1 |
| housing | yes | yes | yes | yes | no |
| loan | no | no | yes | no | no |
| contact | unknown | unknown | unknown | unknown | unknown |
| day | 5 | 5 | 5 | 5 | 5 |


## Question 1


- Install Pipenv
- What's the version of pipenv you installed?
- Use `--version` to find out


In [None]:
# !pip install pipenv

Pipenv is already listed in my [environment.yml](../environment.yml).


In [4]:
!pipenv --version

pipenv, version 2024.1.0


## Question 2


- Use Pipenv to install Scikit-Learn version 1.5.2
- What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: you should create an empty folder for homework
> and do it there.


"sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445"
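
The hash can also be read programmatically instead of by opening the file. The sketch below assumes the usual `Pipfile.lock` layout, in which regular dependencies sit under the `default` key and each package entry carries a `hashes` list. The helper name `first_hash` is hypothetical and not part of the homework.

```python
import json
from pathlib import Path


def first_hash(lock_path: str, package: str = "scikit-learn") -> str:
    """Return the first hash listed for `package` in a Pipfile.lock.

    Assumes the package is a regular (non-dev) dependency, which
    Pipfile.lock stores under the "default" key.
    """
    lock = json.loads(Path(lock_path).read_text())
    return lock["default"][package]["hashes"][0]


# Example call, assuming Pipfile.lock is in the current folder:
# print(first_hash("Pipfile.lock"))
```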

## Models


We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

And then saved with Pickle. Download them:

- [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
- [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```


I downloaded the files manually into the `./model` folder.

## Question 3


Let's use these models!

- Write a script for loading these models with pickle

In [6]:
import pickle


def load(filename: str):
    with open(filename, "rb") as f_in:
        return pickle.load(f_in)


dv = load("./model/dv.bin")
model = load("./model/model1.bin")

- Score this client:

```json
{ "job": "management", "duration": 400, "poutcome": "success" }
```

In [9]:

client = {"job": "management", "duration": 400, "poutcome": "success"}

X = dv.transform([client])
y_pred = model.predict_proba(X)[0, 1]

print(f"Probability client will get subscription: {y_pred:.3f}")

Probability client will get subscription: 0.759



What's the probability that this client will get a subscription?

- 0.359
- 0.559
- **0.759**
- 0.959

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3d8bb28974e55edefa000fe38fd3ed12  model1.bin
7d37616e00aa80f2152b8b0511fc2dff  dv.bin
```


In [5]:
!md5sum ./model/model1.bin ./model/dv.bin

3d8bb28974e55edefa000fe38fd3ed12  ./model/model1.bin
7d37616e00aa80f2152b8b0511fc2dff  ./model/dv.bin
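
If `md5sum` is not available (for example on Windows), the same check can be done in Python with `hashlib`. This is a minimal sketch: the helper name `md5_of_file` is hypothetical, and the expected values are the checksums quoted above.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums published in the homework text.
EXPECTED = {
    "model1.bin": "3d8bb28974e55edefa000fe38fd3ed12",
    "dv.bin": "7d37616e00aa80f2152b8b0511fc2dff",
}

# Example check, assuming the files are in ./model:
# assert md5_of_file("./model/dv.bin") == EXPECTED["dv.bin"]
```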


## Question 4


Now let's serve this model as a web service

- Install Flask and gunicorn (or waitress, if you're on Windows)

In [10]:
!pip list | grep "Flask\|gunicorn"

Flask                     3.0.3
gunicorn                  23.0.0


- Write Flask code for serving the model

In [12]:
# import pickle

# from flask import Flask
# from flask import request
# from flask import jsonify


# def load(filename: str):
#     with open(filename, "rb") as f_in:
#         return pickle.load(f_in)


# dv = load("./model/dv.bin")
# model = load("./model/model1.bin")

# app = Flask("get-credit")


# @app.route("/predict", methods=["POST"])
# def predict():
#     client = request.get_json()

#     X = dv.transform([client])
#     y_pred = model.predict_proba(X)[0, 1]
#     get_credit = y_pred >= 0.5

#     result = {"get_credit_probability": float(y_pred), "get_credit": bool(get_credit)}

#     return jsonify(result)


# if __name__ == "__main__":
#     app.run(debug=True, host="0.0.0.0", port=9696)

- Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

In [14]:
import requests

url = "http://localhost:9696/predict"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()

{'get_credit': False, 'get_credit_probability': 0.33480703475511053}

What's the probability that this client will get a subscription?

- **0.335**
- 0.535
- 0.735
- 0.935
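
The service returns more digits than the answer options show, so the note at the top of the homework applies: pick the closest option. A small sketch of that selection is below. The function name `closest_option` is hypothetical, and the numbers are the ones shown above.

```python
def closest_option(value: float, options: list[float]) -> float:
    """Return the option with the smallest absolute distance to `value`."""
    return min(options, key=lambda option: abs(option - value))


options = [0.335, 0.535, 0.735, 0.935]
probability = 0.33480703475511053  # value returned by the service above

print(closest_option(probability, options))  # prints 0.335
```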


## Docker


Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md).
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`.
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.11.5-slim` and has a logistic regression model
(a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.


## Question 5


Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

- 45 MB
- **130 MB**
- 245 MB
- 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.


In [15]:
!docker images

REPOSITORY              TAG           IMAGE ID       CREATED       SIZE
svizor/zoomcamp-model   3.11.5-slim   975e7bdca086   2 days ago    130MB
churn-predictor         latest        acb6169f1507   10 days ago   508MB


## Dockerfile


Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.11.5-slim
# add your stuff here
```

Now complete it:

- Install all the dependencies from the Pipenv file
- Copy your Flask script
- Run it with Gunicorn

After that, you can build your docker image.


## Question 6


Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()
```

In [21]:
import requests

url = "http://ec2-18-159-215-56.eu-central-1.compute.amazonaws.com:9696/predict"
client = {"job": "management", "duration": 400, "poutcome": "success"}
requests.post(url, json=client).json()

{'get_credit': True, 'get_credit_probability': 0.756743795240796}

What's the probability that this client will get a subscription now?

- 0.287
- 0.530
- **0.757**
- 0.960


## Homework URL

[Homework 05](https://github.com/TillMeineke/ML_Zoomcamp2024_hw/blob/main/05-deployment/homework_05_till_meineke.ipynb)

## Learning in public links (optional) 

- [x] Learn in public 1: weekly learning [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7256627802806833152-tmg_?utm_source=share&utm_medium=member_desktop) on 28 October 2024
- [x] Learn in public 2: python in zoo [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-hagenbecks-activity-7257087338097352704-U8IV?utm_source=share&utm_medium=member_desktop) on 29 October 2024
- [x] Learn in public 3: deployment [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7257110420946075649-WP47?utm_source=share&utm_medium=member_desktop) on 29 October 2024
- [x] Learn in public 4: deployment flask [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7257112284295036929-1sLr?utm_source=share&utm_medium=member_desktop) on 29 October 2024
- [x] Learn in public 5: course leaderboard top10 [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7257290745663868928-8EyW?utm_source=share&utm_medium=member_desktop) on 30 October 2024
- [x] Learn in public 6: docker [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7257292555875205120-jWzI?utm_source=share&utm_medium=member_desktop) on 30 October 2024
- [x] Learn in public 7: beanstalk [LinkedIn](https://www.linkedin.com/posts/tillmeineke_mlzoomcamp-activity-7257294376551206912-eI6t?utm_source=share&utm_medium=member_desktop) on 30 October 2024

## Time spent on lectures (hours) (optional)

## Time spent on homework (hours) (optional)

## FAQ contribution (FAQ document, optional)

Added a link for HW 2024.