# Set up the environment

You need to install Python, NumPy, Pandas, Matplotlib and Seaborn.

Done and ready to go.

## Imports

In [1]:
import jupyter_black
jupyter_black.load()

In [9]:
import numpy as np
import pandas as pd

import pickle

import seaborn as sns
from matplotlib import pyplot as plt

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mutual_info_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split


%matplotlib inline

---

## Homework

In this homework, we will use Credit Card Data from [the previous homework](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/04-evaluation/homework.md).

> Note: sometimes your answer doesn't match one of the options exactly. That's fine. 
Select the option that's closest to your solution.

---

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [2]:
!pipenv --version

[1mpipenv[0m, version 2022.10.4
[0m

---

## Question 2

* Use Pipenv to install Scikit-Learn version 1.0.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

Note: you should create an empty folder for homework
and do it there.

`sha256:08ef968f6b72033c16c479c966bf37ccd49b06ea91b765e1cc27afefe723920b`
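Instead of reading the hash off by eye, it can be pulled from the lock file with a few lines of Python. This is a minimal sketch that assumes the usual `Pipfile.lock` layout, where each package under `"default"` has a `"hashes"` list. The function name `first_hash` is made up for this example.

```python
import json


def first_hash(lock_path, package, section="default"):
    """Return the first hash listed for `package` in a Pipfile.lock file."""
    with open(lock_path, encoding="utf-8") as f:
        lock = json.load(f)
    # Each package entry holds a "hashes" list and a pinned "version".
    return lock[section][package]["hashes"][0]
```

Run from the folder where Pipenv created the lock file, `first_hash("Pipfile.lock", "scikit-learn")` should return the first entry of the hash list.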

---

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```

In [3]:
%pwd

'/Users/davidcolton/git/github/mlbookcamp-homework/week_05'

In [4]:
%cd "./homework"

/Users/davidcolton/git/github/mlbookcamp-homework/week_05/homework


In [8]:
PREFIX = "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework"
model_path = f"{PREFIX}/model1.bin"
dv_path = f"{PREFIX}/dv.bin"
!wget $model_path
!wget $dv_path

--2022-10-06 17:02:10--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 889 [application/octet-stream]
Saving to: ‘model1.bin’


2022-10-06 17:02:10 (23.6 MB/s) - ‘model1.bin’ saved [889/889]

--2022-10-06 17:02:10--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework/dv.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 333

In [10]:
%cd ".."

/Users/davidcolton/git/github/mlbookcamp-homework/week_05


---

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
```

What's the probability that this client will get a credit card? 

* **0.162**
* 0.391
* 0.601
* 0.993

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
```
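If `md5sum` is not available (for example on Windows), the same check can be done with Python's standard library. This is a rough sketch: the expected digests are the ones quoted above, while the helper names `md5_of` and `verify` are invented for illustration.

```python
import hashlib
from pathlib import Path

# Checksums quoted in the homework text above.
EXPECTED_MD5 = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(folder="."):
    """Map each expected file name to True if present with a matching digest."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(folder) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

In this notebook the files were downloaded into `./homework`, so `verify("./homework")` would be the call to try. A `False` value means the file is missing or differs from the published one.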

In [18]:
# The model file
model_file = "./homework/model1.bin"

# Read in the model
with open(model_file, "rb") as fm:
    model = pickle.load(fm)

# The DictVectorizer file
dv_file = "./homework/dv.bin"

# Read in the DictVectorizer
with open(dv_file, "rb") as fd:
    dv = pickle.load(fd)

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [20]:
# The client to score
client = {"reports": 0, 
          "share": 0.001694,
          "expenditure": 0.12, 
          "owner": "yes",}

In [21]:
# Transform the client with the DictVectorizer
X = dv.transform([client])

In [23]:
# Predict the client
model.predict_proba(X)[0, 1]

0.16213414434326598

```
(homework)
mlbookcamp-homework/week_05/homework  main ✗                                                                        4d16h ◒

▶ python q3.py
Prediction: 0.162
```

---
## Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
import requests

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card?

* 0.274
* 0.484
* 0.698
* **0.928**

```
(homework)
mlbookcamp-homework/week_05/homework  main ✗                                                                        4d16h ◒
▶ python q4_flask.py
 * Serving Flask app 'churn'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9696
 * Running on http://192.168.68.73:9696
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 208-714-865
127.0.0.1 - - [06/Oct/2022 17:52:21] "POST /predict HTTP/1.1" 200 -
```

```
mlbookcamp-homework/week_05/homework  main ✗                                                                        4d16h ◒
▶ python q4_test.py
{'churn': True, 'churn_probability': 0.9282218018527452}
```
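The `q4_flask.py` script itself is not shown here. The sketch below is one way such a service could look, not the exact script that produced the output above. The app name `churn`, the `/predict` route and the response keys are taken from that output. The 0.5 decision threshold and the helper names `predict_single` and `create_app` are assumptions made for this example.

```python
def predict_single(dv, model, client, threshold=0.5):
    """Score one client dict and return a JSON-serializable result."""
    X = dv.transform([client])
    # Column 1 of predict_proba holds the probability of the positive class.
    probability = float(model.predict_proba(X)[0][1])
    return {
        # The 0.5 threshold is an assumption for this sketch.
        "churn": bool(probability >= threshold),
        "churn_probability": probability,
    }


def create_app(dv, model):
    """Build a Flask app exposing POST /predict for the given dv and model."""
    # Imported here so predict_single can be reused without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask("churn")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        return jsonify(predict_single(dv, model, client))

    return app
```

To use it, load `dv.bin` and `model1.bin` with `pickle` as in Question 3, then call `create_app(dv, model).run(debug=True, host="0.0.0.0", port=9696)` for the development server. Gunicorn needs a module-level `app` object instead, so a script served that way would assign `app = create_app(dv, model)` at import time.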

---

## Docker

Install [Docker](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/06-docker.md). We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.9.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.9.12-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.9.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.

---

## Question 5

Download the base image `svizor/zoomcamp-model:3.9.12-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 15 Mb
* **125 Mb**
* 275 Mb
* 415 Mb

You can get this information when running `docker images` - it'll be in the "SIZE" column.

```
mlbookcamp-homework/week_05/homework  main ✗                                                                        4d22h ◒
▶ docker images
REPOSITORY              TAG           IMAGE ID       CREATED       SIZE
svizor/zoomcamp-model   3.9.12-slim   571a6fdc554b   4 days ago    125MB
ml-zoomcamp             latest        9350f5e3bc8c   4 weeks ago   8.12GB
```

---

## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.9.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

---

## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
import requests

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card now?

* 0.289
* 0.502
* **0.769**
* 0.972

```
(week_05)
github/mlbookcamp-homework/week_05  main ✗                                                                          4d22h ◒
▶ docker run -it --rm -p 9696:9696 q6-week05
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[2022-10-06 22:27:48 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-10-06 22:27:48 +0000] [1] [INFO] Listening at: http://0.0.0.0:9696 (1)
[2022-10-06 22:27:48 +0000] [1] [INFO] Using worker: sync
[2022-10-06 22:27:48 +0000] [8] [INFO] Booting worker with pid: 8
```

```
(homework)
mlbookcamp-homework/week_05/homework  main ✗                                                                        4d22h ◒
▶ python q6_test.py
{'churn': True, 'churn_probability': 0.7692649226628628}
```

---

## Submit the results

* Submit your results here: https://forms.gle/jU2we8f9WeLgX3qa6
* You can submit your solution multiple times. In this case, only the last submission will be used 
* If your answer doesn't match options exactly, select the closest one
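Picking the closest option comes down to minimizing the absolute difference between the computed value and each listed choice. Here is a small sketch using the probabilities reported earlier in this notebook. The helper name `closest_option` is hypothetical.

```python
def closest_option(value, options):
    """Return the listed option with the smallest absolute distance to `value`."""
    return min(options, key=lambda option: abs(option - value))


# Probabilities reported earlier in this notebook, with their answer options.
print(closest_option(0.16213414434326598, [0.162, 0.391, 0.601, 0.993]))  # 0.162
print(closest_option(0.9282218018527452, [0.274, 0.484, 0.698, 0.928]))  # 0.928
print(closest_option(0.7692649226628628, [0.289, 0.502, 0.769, 0.972]))  # 0.769
```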


## Deadline

The deadline for submitting is **10 October 2022 (Monday), 23:00 CEST (Berlin time)**. 

After that, the form will be closed.

---

# Learning in Public

* https://twitter.com/David__Colton/status/1576364788053004288?s=20&t=8JD-Sn-Y1Kxi12t4N6_LqQ
* https://twitter.com/David__Colton/status/1576519724543811584?s=20&t=SdNTxJ6pyDPzSMCAQa0FTw
* https://twitter.com/David__Colton/status/1576560687417262080?s=20&t=XBa9sEeXAuD9Tk_V_S9pSQ
* https://twitter.com/David__Colton/status/1576573328806268928?s=20&t=faOvvCLuCb93Gz6CZ12SEw
* https://twitter.com/David__Colton/status/1576705300278939649?s=20&t=E_WEpToDsOPWXdr31Iu55Q
* https://twitter.com/David__Colton/status/1577057996697669632?s=20&t=7-BKWo1gzO-OCwSMY5TrvQ
* https://twitter.com/David__Colton/status/1577420741284347905?s=20&t=xYlzI9HIGkyQvUxv4yBxKQ
* https://twitter.com/David__Colton/status/1577421939131682816?s=20&t=0bfJNxPv27WzmemQdAlgog