# 5.10 Homework
- [homework](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/homework.md)
- files: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/chapter-05-deployment

## Question 1: Version of Pipenv

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [6]:
!pip install pipenv

In [7]:
# pipenv version check: pipenv, version 2021.5.29
!pipenv --version

## Question 2: Checksum for Scikit-Learn 1.0
* Use Pipenv to install Scikit-Learn version 1.0
* What's the first hash for scikit-learn you get in Pipfile.lock? 

In [8]:
!pipenv install scikit-learn==1.0

In [9]:
!cat Pipfile

In [10]:
!cat Pipfile.lock
# "scikit-learn": {
#             "hashes": [
#                 "sha256:121f78d6564000dc5e968394f45aac87981fcaaf2be40cfcd8f07b2baa1e1829",
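Instead of scanning the `cat` output by eye, the lock file can be read as JSON. Below is a minimal sketch. It assumes the usual Pipfile.lock layout, where runtime packages sit under the `"default"` key and each entry has a `"hashes"` list. `first_hash` is a hypothetical helper name.

```python
import json
from pathlib import Path


def first_hash(lock_text, package, section="default"):
    """Return the first hash listed for `package` in Pipfile.lock contents."""
    lock = json.loads(lock_text)
    entry = lock.get(section, {}).get(package)
    if entry is None:
        raise KeyError(f"{package!r} not found in section {section!r}")
    return entry["hashes"][0]


if __name__ == "__main__":
    path = Path("Pipfile.lock")  # assumed to be in the current directory
    if path.exists():
        print(first_hash(path.read_text(), "scikit-learn"))
```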

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```
features = ['tenure', 'monthlycharges', 'contract']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/homework/model1.bin?raw=true)

With wget:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```

In [2]:
%%bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin

In [8]:
!ls

In [7]:
!md5sum model1.bin dv.bin
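The training snippet in the Models section relies on DictVectorizer's default encoding. String values become one-hot columns named `feature=value`, numeric values pass through unchanged, and the column names are sorted. The pure-Python sketch below illustrates that encoding. It is not scikit-learn's implementation, `fit_vocabulary` and `transform_one` are hypothetical names, and the training rows are made up.

```python
def fit_vocabulary(records):
    """Collect column names: 'key=value' for strings, 'key' for numbers (sorted)."""
    names = set()
    for rec in records:
        for key, value in rec.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    return sorted(names)


def transform_one(record, vocabulary):
    """Encode one record as a dense row; unseen keys are dropped, missing ones stay 0."""
    index = {name: i for i, name in enumerate(vocabulary)}
    row = [0.0] * len(vocabulary)
    for key, value in record.items():
        name = f"{key}={value}" if isinstance(value, str) else key
        if name in index:
            row[index[name]] = 1.0 if isinstance(value, str) else float(value)
    return row


# Made-up training rows using the homework's three features.
train = [
    {"contract": "month-to-month", "tenure": 1, "monthlycharges": 29.85},
    {"contract": "one_year", "tenure": 34, "monthlycharges": 56.95},
    {"contract": "two_year", "tenure": 45, "monthlycharges": 42.3},
]
vocab = fit_vocabulary(train)
print(vocab)
print(transform_one({"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}, vocab))
```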

## Question 3: Probability of churning (Script)
* 0.115
* 0.315
* 0.515
* 0.715

Let's use these models!
* Write a script for loading these models with pickle
* Score this customer:

```json
{"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}
```

What's the probability that this customer is churning? 

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
5868e129bfbb309ba60bf750263afab1  model1.bin
c49b69f8a5a3c560882ff5daa3c0ff4d  dv.bin
```
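Some systems have no `md5sum` command, such as Windows or a bare notebook kernel. On those, Python's `hashlib` can do the same check. Below is a minimal sketch. The expected values are copied from the listing above, and `md5_of` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

# Expected checksums, copied from the md5sum output above.
EXPECTED = {
    "model1.bin": "5868e129bfbb309ba60bf750263afab1",
    "dv.bin": "c49b69f8a5a3c560882ff5daa3c0ff4d",
}


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if Path(name).exists():
            actual = md5_of(name)
            print(name, "OK" if actual == expected else f"MISMATCH ({actual})")
        else:
            print(name, "missing")
```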


In [3]:
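import pickle

# Load the fitted DictVectorizer and LogisticRegression (files are closed after loading)
with open("dv.bin", "rb") as f_in:
    dv = pickle.load(f_in)
with open("model1.bin", "rb") as f_in:
    model = pickle.load(f_in)

def predict_single(customer, dv, model):
    """Return the churn probability (class 1) for one customer dict."""
    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[:, 1]
    return y_pred[0]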

In [4]:
customer = {"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}
print("What's the probability that this customer is churning?",
      round(predict_single(customer, dv, model), 3))
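For a binary `LogisticRegression`, column 1 of `predict_proba` is the sigmoid of the linear score w·x + b, and column 0 is one minus that. The pure-Python sketch below shows the computation. The weights and intercept are made up for illustration and are not the values stored in `model1.bin` (those live in `model.coef_` and `model.intercept_`). The column order is also an assumption, namely the sorted names a DictVectorizer would produce for these three features.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(x, weights, intercept):
    """P(churn) for one encoded row x: sigmoid(w . x + b)."""
    score = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


# Hypothetical coefficients, assumed column order:
# [contract=month-to-month, contract=one_year, contract=two_year, monthlycharges, tenure]
weights = [0.5, -0.4, -1.2, 0.01, -0.05]
intercept = -0.1
x = [0.0, 0.0, 1.0, 19.7, 12.0]  # the Question 3 customer, encoded
p = churn_probability(x, weights, intercept)
print(round(p, 3), round(1 - p, 3))  # column 1 and column 0 of predict_proba
```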

## Question 4: Probability of churning (Flask) *
* 0.398
* 0.598
* 0.798
* 0.998

Now let's serve this model as a web service.

* Install Flask and Gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this customer using `requests`:

```python
url = "YOUR_URL"
customer = {"contract": "two_year", "tenure": 1, "monthlycharges": 10}
requests.post(url, json=customer).json()
```

What's the probability that this customer is churning?
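`requests` is a third-party package. If it isn't installed in the environment doing the scoring, the standard library's `urllib.request` can make the same call. Below is a minimal sketch. `post_json` is a hypothetical helper, and `StubPredict` is a throwaway stand-in for the Flask app (fixed placeholder response, no model) so the client can be tried on its own. Against the real service, call `post_json("http://localhost:9696/predict", customer)` instead.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def post_json(url, payload, timeout=5.0):
    """POST `payload` as JSON and return the decoded JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class StubPredict(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Flask /predict route: no model, fixed answer."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        customer = json.loads(self.rfile.read(length))
        # Placeholder value, not a prediction; the request body is echoed back.
        data = json.dumps({"churn_probability": 0.5, "received": customer}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def demo(customer):
    """Start the stub on a free port, post one customer, then shut the stub down."""
    server = HTTPServer(("127.0.0.1", 0), StubPredict)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return post_json(f"http://127.0.0.1:{server.server_port}/predict", customer)
    finally:
        server.shutdown()
        server.server_close()


print(demo({"contract": "two_year", "tenure": 1, "monthlycharges": 10}))
```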


In [20]:
!pipenv install flask

In [22]:
!pipenv

In [25]:
from flask import Flask

In [15]:
%%bash
cat << EOF > churn.py
import pickle

from flask import Flask
from flask import request
from flask import jsonify


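# model2.bin ships inside the agrigorev/zoomcamp-model image (Question 6);
# for the local run in Question 4, load model1.bin here instead
with open("dv.bin", "rb") as f_in:
    dv = pickle.load(f_in)
with open("model2.bin", "rb") as f_in:
    model = pickle.load(f_in)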

app = Flask('churn')

@app.route('/predict', methods=['POST'])
def predict():
    customer = request.get_json()

    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[0, 1]
    churn = y_pred >= 0.5

    result = {
        'churn_probability': float(y_pred),
        'churn': bool(churn)
    }

    return jsonify(result)


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=9696)
EOF
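This service is vulnerable to one quiet failure mode. `DictVectorizer.transform` silently ignores keys it did not see during fitting and leaves missing features at 0, so a request with a misspelled field (say `monthly_charges`) still returns a probability. Below is a minimal validation sketch that could run before `dv.transform`. `REQUIRED` and `validate_customer` are hypothetical names, and the field types are assumptions based on the example customers.

```python
# Assumed schema, inferred from the example customers in this homework.
REQUIRED = {
    "contract": str,
    "tenure": (int, float),
    "monthlycharges": (int, float),
}


def validate_customer(payload):
    """Return a list of problems with a request body; an empty list means it looks valid."""
    if not isinstance(payload, dict):
        return ["body must be a JSON object"]
    errors = []
    for field, expected in REQUIRED.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int, so reject it explicitly
        elif isinstance(payload[field], bool) or not isinstance(payload[field], expected):
            errors.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    for field in payload:
        if field not in REQUIRED:
            errors.append(f"unexpected field: {field}")
    return errors
```

In the Flask route, a non-empty list could be returned with `return jsonify({"errors": errors}), 400` before the model is called.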

In [16]:
!cat churn.py

In [13]:
!pip install gunicorn

In [26]:
!gunicorn --bind 0.0.0.0:9696 churn:app

In [30]:
%%bash
cat << EOF > churn_test.py

import requests  # HTTP client used to send the POST request

customer = {"contract": "two_year", "tenure": 1, "monthlycharges": 10}

url = 'http://localhost:9696/predict'  # the /predict route defined in churn.py

response = requests.post(url, json=customer)  # send the customer as JSON
result = response.json()  # decode the server's JSON response
print(result)

EOF

In [31]:
!cat churn_test.py

In [None]:
# using "model1.bin"
# 0.998

## Docker

Install [Docker](06-docker.md). We will use it for the next two questions.

For these questions, I prepared a base image: `agrigorev/zoomcamp-model:3.8.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.8.12-slim` and contains a logistic regression model
(a different one) as well as a dictionary vectorizer.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.8.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

I already built it and then pushed it to [`agrigorev/zoomcamp-model:3.8.12-slim`](https://hub.docker.com/r/agrigorev/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.

## Question 5: Digest for the base image (Docker): "f0f43f7bc6e0"

Now create your own Dockerfile based on the image I prepared.

It should start like this:

```docker
FROM agrigorev/zoomcamp-model:3.8.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with gunicorn 


When you build your image, what's the image ID for `agrigorev/zoomcamp-model:3.8.12-slim`?

Look at the first step of your build log. It should look something like this:

```
$ docker some-command-for-building
Sending build context to Docker daemon  2.048kB
Step 1/N : FROM agrigorev/zoomcamp-model:3.8.12-slim
 ---> XXXXXXXXXXXX
Step 2/N : ....
```

You need this `XXXXXXXXXXXX`.

Alternatively, you can get this information when running `docker images` - it'll be in the "IMAGE ID" column.
Submitting DIGEST (long string starting with "sha256") is also fine.
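If the build output is long, a regular expression can pull the base image ID out of the log text. The sketch below assumes the classic (non-BuildKit) log format shown above, where the line after `Step 1/N : FROM ...` starts with ` ---> `. BuildKit prints a different format, and there `docker images` is the simpler route. `base_image_id` and `short_id` are hypothetical helpers. `short_id` relies on `docker images` showing the first 12 hex characters of the full ID that `docker images --no-trunc` prints as `sha256:...`.

```python
import re

# "Step 1/N : FROM <image>" followed on the next line by " ---> <id>".
_STEP1 = re.compile(r"^Step 1/\d+ : FROM (\S+)[ \t]*\n[ \t]*---> ([0-9a-f]+)", re.MULTILINE)


def base_image_id(build_log):
    """Return (image, short_id) from the first build step, or None if not found."""
    match = _STEP1.search(build_log)
    return (match.group(1), match.group(2)) if match else None


def short_id(full_id):
    """Shorten a full image ID ('sha256:<hex>') to the 12-character form."""
    return full_id.split(":", 1)[-1][:12]


log = """Sending build context to Docker daemon     47MB
Step 1/1 : FROM agrigorev/zoomcamp-model:3.8.12-slim
 ---> f0f43f7bc6e0
"""
print(base_image_id(log))
```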

#### Dockerfile

In [12]:
%%bash
cat << EOF > Dockerfile
FROM agrigorev/zoomcamp-model:3.8.12-slim

RUN pip install pipenv
RUN pip install gunicorn
RUN pip install flask

WORKDIR /app

COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY ["churn.py","./"]

EXPOSE 9696

ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "churn:app"]
EOF

### examples for testing
- `docker run -it --rm python:3.8.12-slim` 
- `docker run -it --rm --entrypoint=bash python:3.8.12-slim`
- `docker run -it --rm agrigorev/zoomcamp-model:3.8.12-slim`

### commands for the exercise
- `docker build -t zoomcamp-tag .` # build the image from the Dockerfile in the current directory
- `docker run -it --rm -p 9696:9696 zoomcamp-tag` # run the image, mapping container port 9696 to host port 9696

```
$ docker build -t zoomcamp-tag .
Sending build context to Docker daemon     47MB
Step 1/1 : FROM agrigorev/zoomcamp-model:3.8.12-slim
 ---> f0f43f7bc6e0
```


## Question 6: Probability of churning (Docker) *
* 0.329
* 0.529
* 0.728
* 0.928

Let's run your docker container!

After running it, score this customer:

```python
url = "YOUR_URL"
customer = {"contract": "two_year", "tenure": 12, "monthlycharges": 10}
requests.post(url, json=customer).json()
```

What's the probability that this customer is churning?
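Right after `docker run`, gunicorn may need a moment before it accepts connections, and a request sent too early fails with a connection error. The small stdlib helper below polls the port first. `wait_for_port` is a hypothetical name, and its defaults are assumptions: host `localhost`, port 9696 to match the `-p 9696:9696` mapping used above, and a 10-second timeout.

```python
import socket
import time


def wait_for_port(host="localhost", port=9696, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; try again shortly
    return False
```

Once `wait_for_port()` returns True, the `requests.post(url, json=customer)` call can be sent.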

In [None]:
%%bash
cat << EOF > churn_test.py

import requests  # HTTP client used to send the POST request

customer = {"contract": "two_year", "tenure": 12, "monthlycharges": 10}

url = 'http://localhost:9696/predict'  # the /predict route defined in churn.py

response = requests.post(url, json=customer)  # send the customer as JSON
result = response.json()  # decode the server's JSON response
print(result)

EOF

In [11]:
# using "model2.bin"
# 0.728 {'churn_probability': 0.7284944888182928, 'churn': True}