## Question 1

* Install `uv`
* What's the version of uv you installed?
* Use `--version` to find out

In [54]:
#%pip install uv

In [55]:
#%uv --version

## Initialize an empty uv project

Create an empty folder for the homework
and initialize the project there.


In [56]:
#uv init /Users/Nina/code/codeNHerz/ml-zoomcamp-coursework/mein_projekt/

## Question 2

* Use uv to install Scikit-Learn version 1.6.1 
* What's the first hash for Scikit-Learn you get in the lock file?
* Include the entire string starting with sha256:, don't include quotes


In [57]:
#uv add scikit-learn==1.6.1

In [58]:
print('sha256:3faa5c39054b2f03ca547da9b2f52fde67c06240c31853f306aea97f13647b55')

sha256:3faa5c39054b2f03ca547da9b2f52fde67c06240c31853f306aea97f13647b55
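
The hash above was copied from `uv.lock` by hand. As a rough sketch, the lookup could also be scripted. It assumes the lock file lists each dependency under a `[[package]]` table with a `name = "..."` line, followed by `sha256:` hashes; the helper name `first_hash` is hypothetical and not part of uv.

```python
import re
from typing import Optional


def first_hash(lock_text: str, package: str) -> Optional[str]:
    """Return the first sha256 hash listed for `package`, or None if absent.

    Assumption: every dependency lives in its own [[package]] table and
    its name appears on a line of the form: name = "<package>".
    """
    name_line = re.compile(rf'^name = "{re.escape(package)}"\s*$', re.MULTILINE)
    for block in lock_text.split("[[package]]"):
        if name_line.search(block):
            # Take the first hash in text order inside this table
            match = re.search(r"sha256:[0-9a-f]{64}", block)
            return match.group(0) if match else None
    return None


# Usage (run from the project folder that contains uv.lock):
#   from pathlib import Path
#   print(first_hash(Path("uv.lock").read_text(), "scikit-learn"))
```

Scanning the text in order, rather than parsing the TOML, mirrors the question's wording ("the first hash you get in the lock file").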


## Models

We have prepared a pipeline with a dictionary vectorizer and a model.

It was trained (roughly) using this code:

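```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# df and y_train are the course's training data (not included here)
categorical = ['lead_source']
numeric = ['number_of_courses_viewed', 'annual_income']

df[categorical] = df[categorical].fillna('NA')
df[numeric] = df[numeric].fillna(0)

train_dict = df[categorical + numeric].to_dict(orient='records')

pipeline = make_pipeline(
    DictVectorizer(),
    LogisticRegression(solver='liblinear')
)

pipeline.fit(train_dict, y_train)
```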
Note: You don't need to train the model. This code is just for your reference.

The pipeline was then saved with pickle. Download it [here](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2025/05-deployment/pipeline_v1.bin).

With wget:

wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin


## Question 3

Let's use the model!

* Write a script for loading the pipeline with pickle
* Score this record:

```json
{
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
```

What's the probability that this lead will convert? 

* 0.333
* 0.533
* 0.733
* 0.933

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum pipeline_v1.bin
7d17d2e4dfbaf1e408e1a62e6e880d49 *pipeline_v1.bin
```
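
On systems without `md5sum`, the same check can be done from Python. The following is a minimal sketch using only the standard library; the helper name `md5_of_file` is hypothetical, and the expected value is the checksum quoted above.

```python
import hashlib

# Checksum quoted in the homework text for pipeline_v1.bin
EXPECTED_MD5 = "7d17d2e4dfbaf1e408e1a62e6e880d49"


def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, once the file has been downloaded:
#   assert md5_of_file("pipeline_v1.bin") == EXPECTED_MD5
```

If the digests differ, the download was most likely incomplete or saved an HTML page instead of the binary.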

In [59]:
!wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin -O pipeline_v1.bin

--2025-10-27 21:36:34--  https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin [following]
--2025-10-27 21:36:34--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1300 (1.3K) [application/octet-stream]
Saving to: ‘pipeline_v1.bin’


2025-10-27 21:

In [60]:
import pickle
import pandas as pd
import inspect

from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline

In [61]:
# --- Load the trained pipeline ---
with open("pipeline_v1.bin", "rb") as f:
    pipeline = pickle.load(f)

print(pipeline)
print("Model pipeline loaded successfully!")

# --- Define the input record ---
record = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}

Pipeline(steps=[('dictvectorizer', DictVectorizer()),
                ('logisticregression', LogisticRegression(solver='liblinear'))])
Model pipeline loaded successfully!


In [62]:
categorical = ['lead_source']
numeric = ['number_of_courses_viewed', 'annual_income']

print(categorical + numeric)
print("correct feature order")

['lead_source', 'number_of_courses_viewed', 'annual_income']
correct feature order


In [63]:
pred_df = pd.DataFrame(record, index=[0])
pred_df

  lead_source  number_of_courses_viewed  annual_income
0    paid_ads                         2        79276.0


In [64]:
pred_dict = pred_df[categorical + numeric].to_dict(orient='records')
pred_dict

[{'lead_source': 'paid_ads',
  'number_of_courses_viewed': 2,
  'annual_income': 79276.0}]

In [65]:
pipeline.predict_proba(pred_dict)

array([[0.46639273, 0.53360727]])

In [None]:
y_pred = pipeline.predict_proba(pred_dict)
y_pred = y_pred[0, 1]
print(f"Probability that this lead will convert: {y_pred:.3f}")

Probability that this lead will convert: 0.534


In [67]:
# --- Optional: Map to nearest answer choice ---
choices = [0.333, 0.533, 0.733, 0.933]
closest = min(choices, key=lambda x: abs(x - y_pred))
print(f" Closest match: {closest}")

 Closest match: 0.533


In [None]:
#uv run python score_lead.py
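
The command above refers to a `score_lead.py` script that the notebook does not show. A minimal sketch of what such a script might contain is given here; the function names are hypothetical, and it assumes `pipeline_v1.bin` sits in the working directory and that scikit-learn is installed in the uv environment so the pickle can be loaded.

```python
import pickle
from pathlib import Path

# The record from Question 3
RECORD = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0,
}


def load_pipeline(path):
    """Unpickle the fitted pipeline from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def score(pipeline, record):
    """Return the probability of the positive class for one record."""
    # predict_proba yields one row per record: [P(class 0), P(class 1)]
    return float(pipeline.predict_proba([record])[0][1])


def main(path="pipeline_v1.bin"):
    if not Path(path).exists():
        print(f"{path} not found; download it first")
        return
    pipeline = load_pipeline(path)
    print(f"Probability that this lead will convert: {score(pipeline, RECORD):.3f}")


if __name__ == "__main__":
    main()
```

Passing the record as a plain dict skips the DataFrame round trip used in the cells above, since the pipeline's `DictVectorizer` accepts a list of dicts directly.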

## Question 4

Now let's serve this model as a web service

* Install FastAPI
* Write FastAPI code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this lead will convert?

* 0.334
* 0.534
* 0.734
* 0.934


## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `agrigorev/zoomcamp-model:2025`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.13.5-slim-bookworm` and contains
a pipeline with logistic regression (a different one)
as well as a dictionary vectorizer.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

We already built it and then pushed it to [`agrigorev/zoomcamp-model:2025`](https://hub.docker.com/r/agrigorev/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.


## Question 5

Download the base image `agrigorev/zoomcamp-model:2025`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 121 MB
* 245 MB
* 330 MB

You can get this information by running `docker images`; it's in the "SIZE" column.


## Dockerfile

Now create your own `Dockerfile` based on the image we prepared.

It should start like this:

```docker
FROM agrigorev/zoomcamp-model:2025
# add your stuff here
```

Now complete it:

* Install all the dependencies from pyproject.toml
* Copy your FastAPI script
* Run it with uvicorn 

After that, you can build your docker image.


## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this lead will convert?

* 0.39
* 0.59
* 0.79
* 0.99

